Looking at my previous mail, which mentions changes to the API, optimizer, and runtime code of the DataSet API, this would be a major and non-trivial effort, and it would also require that a committer spend a good amount of time on it.
2018-01-16 10:07 GMT+01:00 Flavio Pompermaier <pomperma...@okkam.it>:

> Do you think it is that complex to support? I think we can try to
> implement it if someone could give us some support (at least the big
> picture).
>
> On Tue, Jan 16, 2018 at 10:02 AM, Fabian Hueske <fhue...@gmail.com> wrote:
>
>> No, I'm not aware of anybody working on extending the Hadoop
>> compatibility support.
>> I'll also have no time to work on this any time soon :-(
>>
>> 2018-01-13 1:34 GMT+01:00 Flavio Pompermaier <pomperma...@okkam.it>:
>>
>>> Any progress on this, Fabian? HBase bulk loading is a common task for us,
>>> and it's very annoying and uncomfortable to run a separate YARN job to
>>> accomplish it...
>>>
>>> On 10 Apr 2015 12:26, "Flavio Pompermaier" <pomperma...@okkam.it> wrote:
>>>
>>> Great! That will be awesome.
>>> Thank you Fabian
>>>
>>> On Fri, Apr 10, 2015 at 12:14 PM, Fabian Hueske <fhue...@gmail.com> wrote:
>>>
>>>> Hmm, that's a tricky question ;-) I would need to have a closer look.
>>>> Getting custom comparators for sorting and grouping into the Combiner
>>>> is not trivial because it touches API, optimizer, and runtime code.
>>>> However, I did that before for the Reducer, and with the recent addition
>>>> of groupCombine, the Reducer changes might just be applied to combine.
>>>>
>>>> I'll be gone next week, but if you want, we can have a closer look
>>>> at the problem after that.
>>>>
>>>> 2015-04-10 12:07 GMT+02:00 Flavio Pompermaier <pomperma...@okkam.it>:
>>>>
>>>>> I think I could also take care of it if somebody can help and guide
>>>>> me a little bit.
>>>>> How long do you think it would take to complete such a task?
>>>>>
>>>>> On Fri, Apr 10, 2015 at 12:02 PM, Fabian Hueske <fhue...@gmail.com> wrote:
>>>>>
>>>>>> We had an effort to execute any Hadoop MR program by simply specifying
>>>>>> the JobConf and executing it (even embedded in regular Flink programs).
>>>>>> We got quite far but did not complete it (counters and custom grouping /
>>>>>> sorting functions for Combiners are missing, if I remember correctly).
>>>>>> I don't think that anybody is working on that right now, but it would
>>>>>> definitely be a cool feature.
>>>>>>
>>>>>> 2015-04-10 11:55 GMT+02:00 Flavio Pompermaier <pomperma...@okkam.it>:
>>>>>>
>>>>>>> Hi guys,
>>>>>>>
>>>>>>> I have a question about Hadoop compatibility.
>>>>>>> In https://flink.apache.org/news/2014/11/18/hadoop-compatibility.html
>>>>>>> you say that you can reuse existing mapreduce programs.
>>>>>>> Would it be possible to also manage complex mapreduce programs like
>>>>>>> the HBase BulkImport, which uses for example a custom partitioner
>>>>>>> (org.apache.hadoop.mapreduce.Partitioner)?
>>>>>>>
>>>>>>> The bulk-import examples call
>>>>>>> HFileOutputFormat2.configureIncrementalLoadMap,
>>>>>>> which sets a series of job parameters (like partitioner, mapper,
>>>>>>> reducers, etc.) -> http://pastebin.com/8VXjYAEf.
>>>>>>> The full code can be seen at
>>>>>>> https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/HFileOutputFormat2.java
>>>>>>>
>>>>>>> Do you think there's any chance to make it run in Flink?
>>>>>>>
>>>>>>> Best,
>>>>>>> Flavio
>
> --
> Flavio Pompermaier
> Development Department
>
> OKKAM S.r.l.
> Tel. +(39) 0461 041809
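For readers following the thread who are unfamiliar with what a custom org.apache.hadoop.mapreduce.Partitioner actually has to do (the part Fabian notes is hard to map onto Flink's runtime), here is a minimal standalone sketch of the partitioning contract. This is plain Java with no Hadoop or Flink dependency; the class and method names only mirror the shape of Hadoop's API, and the body mirrors the logic of Hadoop's default HashPartitioner, not HBase's region-boundary partitioner.

```java
// Standalone sketch of the Hadoop Partitioner contract:
// given a record's key and the number of reduce tasks, pick a partition.
// Mirrors Hadoop's default HashPartitioner; nothing here imports Hadoop.
public class PartitionerSketch {

    // Analogous to Partitioner<K, V>.getPartition(key, value, numPartitions)
    static int getPartition(String key, int numPartitions) {
        // Mask off the sign bit so the result is always non-negative,
        // exactly as Hadoop's HashPartitioner does.
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        int p = getPartition("row-0001", 4);
        System.out.println("key 'row-0001' -> partition " + p);
        // The contract: deterministic, and always within [0, numPartitions).
        assert p == getPartition("row-0001", 4);
        assert p >= 0 && p < 4;
    }
}
```

The difficulty discussed above is that an HBase bulk load replaces this hash scheme with a TotalOrderPartitioner keyed on region boundaries, and the job's correctness depends on Flink honoring that custom partitioning (plus the matching sort/group comparators) inside its own shuffle.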