Any progress on this, Fabian? HBase bulk loading is a common task for us, and having to run a separate YARN job just to accomplish it is annoying and inconvenient...
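For context, the part of the bulk-import job that seems hardest to carry over is the custom partitioner. As far as I understand it, the bulk-load setup routes every row key to the reducer responsible for the HBase region that contains it, using the sorted region boundaries as split points. Below is a minimal sketch of that routing in plain Java, with no Hadoop or Flink dependency. The class name `RegionPartitionSketch`, its methods, and the split keys "g" and "p" are made up for illustration; this is not the HBase or Flink implementation.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of total-order partitioning by region boundaries.
final class RegionPartitionSketch {

    // Lexicographic comparison of two byte arrays, treating bytes as unsigned.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    // splitKeys must be sorted ascending; n split keys define n + 1 partitions.
    // Returns the number of split keys that are <= rowKey (binary search).
    static int partitionFor(byte[][] splitKeys, byte[] rowKey) {
        int lo = 0;
        int hi = splitKeys.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (compareUnsigned(splitKeys[mid], rowKey) <= 0) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    // Helper to turn a string into a UTF-8 row key.
    static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Assumed region boundaries, chosen only for this example.
        byte[][] splitKeys = { bytes("g"), bytes("p") };
        for (String row : new String[] { "apple", "g", "kiwi", "zebra" }) {
            System.out.println(row + " -> partition "
                    + partitionFor(splitKeys, bytes(row)));
        }
    }
}
```

Whatever runs the job has to honour this key-to-partition mapping and keep keys sorted within each partition, which is why the custom partitioner and the custom sorting/grouping comparators discussed below matter for bulk loading.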
On 10 Apr 2015 12:26, "Flavio Pompermaier" <pomperma...@okkam.it> wrote:

Great! That will be awesome.
Thank you Fabian

On Fri, Apr 10, 2015 at 12:14 PM, Fabian Hueske <fhue...@gmail.com> wrote:

> Hmm, that's a tricky question ;-) I would need to have a closer look. But
> getting custom comparators for sorting and grouping into the Combiner is
> not that trivial because it touches API, Optimizer, and Runtime code.
> However, I did that before for the Reducer, and with the recent addition of
> groupCombine the Reducer changes might simply be applied to combine.
>
> I'll be gone next week, but if you want to, we can have a closer look at
> the problem after that.
>
> 2015-04-10 12:07 GMT+02:00 Flavio Pompermaier <pomperma...@okkam.it>:
>
>> I think I could also take care of it if somebody can help me and guide me
>> a little bit..
>> How long do you think it would take to complete such a task?
>>
>> On Fri, Apr 10, 2015 at 12:02 PM, Fabian Hueske <fhue...@gmail.com>
>> wrote:
>>
>>> We had an effort to execute any HadoopMR program by simply specifying
>>> the JobConf and executing it (even embedded in regular Flink programs).
>>> We got quite far but did not complete it (counters and custom grouping /
>>> sorting functions for Combiners are missing, if I remember correctly).
>>> I don't think that anybody is working on that right now, but it would
>>> definitely be a cool feature.
>>>
>>> 2015-04-10 11:55 GMT+02:00 Flavio Pompermaier <pomperma...@okkam.it>:
>>>
>>>> Hi guys,
>>>>
>>>> I have a nice question about Hadoop compatibility.
>>>> In https://flink.apache.org/news/2014/11/18/hadoop-compatibility.html
>>>> you say that you can reuse existing mapreduce programs.
>>>> Would it also be possible to handle complex mapreduce programs like
>>>> HBase BulkImport that use, for example, a custom partitioner
>>>> (org.apache.hadoop.mapreduce.Partitioner)?
>>>>
>>>> In the bulk-import examples there is a call to
>>>> HFileOutputFormat2.configureIncrementalLoadMap
>>>> that sets a series of job parameters (like partitioner, mapper, reducers,
>>>> etc.) -> http://pastebin.com/8VXjYAEf.
>>>> The full code of it can be seen at
>>>> https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/HFileOutputFormat2.java.
>>>>
>>>> Do you think there's any chance to make it run in Flink?
>>>>
>>>> Best,
>>>> Flavio