Last updated: 21/03/2013
Closing date: 31/10/2013
MapReduce, and especially Hadoop MapReduce (the most popular open-source implementation), is used for novel solutions on massive datasets, such as web or real-time analytics and data mining. Scientists from Saarland University have developed substantial enhancements of the Hadoop Distributed File System (HDFS) and Hadoop MapReduce which dramatically improve the runtime of MapReduce jobs. Partners for further development and licensing of the technology are sought.
MapReduce, and especially Hadoop MapReduce (the most popular open-source implementation), has become the de facto standard for large-scale analytics in enterprises. It is used for novel solutions on massive datasets, such as web or real-time analytics and data mining.
However, Hadoop MapReduce has one major drawback: slow response times. These are mainly due to the full-scan data access of MapReduce jobs, which read every record of the input even when only a small fraction is relevant.
Scientists from Saarland University have developed substantial enhancements of Hadoop Distributed File System (HDFS) and Hadoop MapReduce which dramatically improve the runtime of MapReduce jobs.
The key idea of smart replication is to keep the already existing physical replicas of an HDFS block in different layouts, sort orders and/or different (clustered) indexes.
This means that, for the default replication factor of three, up to three different sort orders, indexes and/or data layouts are available for MapReduce job processing. The modified HDFS upload pipeline creates the indexes, layouts and sort orders during data upload itself. Thus, the likelihood of finding a suitable index and/or data layout increases, and consequently the runtime of the workload decreases.
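The idea can be illustrated with a small in-memory sketch. It is not the Saarland University implementation and does not use any Hadoop API; the record fields (`user_id`, `timestamp`, `url_id`), the sample data and the function names `build_replicas` and `select_equal` are all hypothetical. Each replica of a block holds the same records in a different sort order, and a selection picks the replica sorted on the filtered field, falling back to a full scan when none exists.

```python
from bisect import bisect_left, bisect_right
from operator import itemgetter

# Hypothetical schema and records of a single block (illustration only).
FIELDS = ("user_id", "timestamp", "url_id")
BLOCK = [(3, 50, 7), (1, 20, 9), (2, 40, 8), (1, 10, 7), (3, 30, 9)]


def build_replicas(block, sort_fields):
    """Create one replica per entry in sort_fields, each sorted on that field.

    The sorted key column kept next to the rows plays the role of a
    clustered index for that replica.
    """
    replicas = {}
    for name in sort_fields:
        pos = FIELDS.index(name)
        rows = sorted(block, key=itemgetter(pos))
        keys = [row[pos] for row in rows]
        replicas[name] = (keys, rows)
    return replicas


def select_equal(replicas, field, value):
    """Return (matching rows, access path) for the predicate field == value."""
    if field in replicas:
        # A replica sorted on this field exists: binary search, no full scan.
        keys, rows = replicas[field]
        lo, hi = bisect_left(keys, value), bisect_right(keys, value)
        return rows[lo:hi], "index scan"
    # No suitable replica: read every record of an arbitrary replica.
    pos = FIELDS.index(field)
    _, rows = next(iter(replicas.values()))
    return [row for row in rows if row[pos] == value], "full scan"


# Replication factor three: three physical copies, three sort orders.
replicas = build_replicas(BLOCK, ("user_id", "timestamp", "url_id"))
rows, path = select_equal(replicas, "url_id", 7)
print(path, rows)
```

In this sketch every additional sort order raises the chance that a query finds a matching replica, which mirrors the argument made above for a higher likelihood of a suitable index.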
Benchmark experiments indicate that smart replication typically creates a win-win situation compared with standard Hadoop: it improves both data upload to HDFS (by up to 60%) and the runtime of the actual Hadoop MapReduce job (by up to a factor of 68).
- modifications of HDFS and MapReduce related to smart replication are almost invisible to the user
- smart replication is easy to integrate into existing Hadoop based systems
- smart replication can be implemented for distributed systems in general
Partner expertise sought:
- Type of partner sought: commercial and industrial partners
- Specific area of activity of the partner: software development, IT
- Task to be performed by the partner sought: further development and/or application of the technology through a licence
Listed under: Electronics, Microelectronics \ Software