Our primary focus is currently the benipal social e-commerce platform. The first module of the platform is now available online as an alpha version at www.benipal.com for user feedback and trial.
social e-commerce platform
Project Requirements:
Problem: Match, sort and categorize over 5,000,000,000 data points from over 3,000 merchants.
Solution: HBase and MapReduce. Using HBase, we created a custom solution that sorts all data into a textbook linear structure, which allows for easy mapping. With MySQL, our team was unable to finish the mapping, since each data point could by itself match any of the other 4,999,999,999 points; MySQL was not the right choice for this job. By moving to HBase and MapReduce, we were able to sort the entire data set and create a base table with the data points individually sorted. Running the same data against that base table, we finished the job in 4 hours on our in-house 35-node Hadoop cluster.
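The idea of sorting first so that matching becomes a linear pass, rather than comparing every point with every other point, can be illustrated with a small sketch. It runs in plain Python with no Hadoop or HBase involved; the `normalize` key function, the record fields and the sample data are hypothetical and only stand in for whatever matching key the real job used.

```python
from itertools import groupby
from operator import itemgetter

def normalize(title):
    """Hypothetical matching key: lowercase, keep alphanumerics, sort tokens."""
    tokens = "".join(c.lower() if c.isalnum() else " " for c in title).split()
    return " ".join(sorted(tokens))

def map_phase(records):
    # Emit (key, record) pairs, as a mapper would.
    for rec in records:
        yield normalize(rec["title"]), rec

def reduce_phase(pairs):
    # The sort stands in for the MapReduce shuffle. Once sorted, records
    # sharing a key sit next to each other, so one linear pass groups them.
    ordered = sorted(pairs, key=itemgetter(0))
    return {
        key: [rec for _, rec in group]
        for key, group in groupby(ordered, key=itemgetter(0))
    }

# Made-up sample records from three merchants.
records = [
    {"merchant": "A", "title": "Acme Laptop 15"},
    {"merchant": "B", "title": "laptop acme 15"},
    {"merchant": "C", "title": "Acme Phone"},
]
matches = reduce_phase(map_phase(records))
```

Here the two laptop listings end up under the same key without ever being compared directly with the phone listing, which is the property that makes a sorted store attractive for this kind of workload.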
Problem: Index a 500 GB HBase table with over 100 million rows and thousands of column families in Lucene.
Solution: We forked the Lucene and Solr source code and created an HBase connector for Solr that integrates with the existing Solr user interface for easier management. We heavily modified the existing MySQL data import connector to work against HBase. The HBase table schema was matched against the Solr indexing schema, and each attribute was indexed both individually and in combination. We fine-tuned Lucene and Solr to reach indexing speeds of up to 3,000 documents per second when fewer attributes are indexed. Under a full import of the entire database, our current speed is roughly 2,000 documents per second. We hope to release the HBase connector to the open source community after more testing.
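As an illustration of matching an HBase schema against a Solr schema, the sketch below flattens one HBase-style row into a Solr-style document. It is not the connector itself and uses no HBase or Solr API: the row and the document are plain dictionaries, and the column names, field names and the combined `text` field are assumptions made for the example.

```python
def row_to_document(row_key, cells, schema):
    """Flatten one HBase-style row into a Solr-style document (a plain dict).

    cells:  {"family:qualifier": value}
    schema: {"family:qualifier": "solr_field_name"}  (hypothetical mapping)
    """
    doc = {"id": row_key}
    for column, field in schema.items():
        if column in cells:
            # Each mapped attribute becomes its own searchable field.
            doc[field] = cells[column]
    # One combined field so the attributes can also be searched together.
    doc["text"] = " ".join(str(doc[f]) for f in schema.values() if f in doc)
    return doc

# Made-up schema mapping and row; unmapped columns are simply skipped.
schema = {"info:title": "title", "info:brand": "brand", "price:amount": "price"}
cells = {"info:title": "Acme Laptop 15", "info:brand": "Acme", "meta:crawl": "x"}
doc = row_to_document("sku-001", cells, schema)
```

Keeping the mapping in a separate `schema` dictionary means columns can be added to or dropped from the index without touching the flattening code.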
Problem: Search the Lucene index to provide a better search experience.
Solution: When the standard and dismax query handlers proved inadequate for the highly accurate results we required, our team created a custom handler using heuristic algorithms developed in-house. By separating out each individual attribute of over 100,000,000 products, we were able to fine-tune our algorithm and create a solution that accurately finds what you are searching for. The question was simple: does your search query mean you are looking for a book, a movie or a computer? A four-member team worked around the clock for over 6 months to achieve the search accuracy that can currently be seen at www.benipal.com.
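To make the "book, movie or computer" question concrete, here is a toy heuristic that scores a query against small per-category vocabularies. It is only a sketch of the general idea and not the in-house algorithm; the `CATEGORY_TERMS` vocabularies and the scoring rule are invented for illustration.

```python
# Hypothetical per-category vocabularies; a real system would presumably
# derive these from the indexed product attributes.
CATEGORY_TERMS = {
    "book": {"paperback", "hardcover", "author", "novel", "edition"},
    "movie": {"dvd", "director", "season", "trilogy", "widescreen"},
    "computer": {"laptop", "ram", "ssd", "cpu", "ghz"},
}

def guess_category(query):
    """Return the category whose vocabulary overlaps the query most, or None."""
    tokens = set(query.lower().split())
    scores = {cat: len(tokens & terms) for cat, terms in CATEGORY_TERMS.items()}
    best = max(scores, key=scores.get)
    # No overlap with any vocabulary means the heuristic cannot decide.
    return best if scores[best] > 0 else None
```

For example, `guess_category("acme laptop 8gb ram")` picks the computer category because two of its tokens appear in that vocabulary, while a query with no known terms returns `None` and could fall back to an ordinary search.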
Problem: Map over 400 GB of updated merchant data on a daily basis.
Solution: MapReduce using HBase.
Problem: Set up a base computing infrastructure that would scale with growth.
Solution: Spring with Terracotta and Ehcache.
Ongoing Projects: Mobile apps for barcode scanning.
Problem: Low-light barcode scanning is mostly ineffective.
Solution: Ongoing.