Looking at my previous mail, which mentions changes to the API, optimizer, and runtime code of the DataSet API, this would be a major and non-trivial effort, and it would also require that a committer spend a good amount of time on it.
2018-01-16 10:07 GMT+01:00 Flavio Pompermaier <pomperma...@okkam.it>:

> Do you think it is that complex to support? I think we can try to
> implement it if someone could give us some support (at least the big
> picture).
>
> On Tue, Jan 16, 2018 at 10:02 AM, Fabian Hueske <fhue...@gmail.com> wrote:
>
>> No, I'm not aware of anybody working on extending the Hadoop
>> compatibility support.
>> I'll also have no time to work on this any time soon :-(
>>
>> 2018-01-13 1:34 GMT+01:00 Flavio Pompermaier <pomperma...@okkam.it>:
>>
>>> Any progress on this, Fabian? HBase bulk loading is a common task for us,
>>> and it's very annoying and uncomfortable to run a separate YARN job to
>>> accomplish it...
>>>
>>> On 10 Apr 2015 12:26, "Flavio Pompermaier" <pomperma...@okkam.it> wrote:
>>>
>>> Great! That will be awesome.
>>> Thank you Fabian
>>>
>>> On Fri, Apr 10, 2015 at 12:14 PM, Fabian Hueske <fhue...@gmail.com> wrote:
>>>
>>>> Hmm, that's a tricky question ;-) I would need to have a closer look.
>>>> Getting custom comparators for sorting and grouping into the Combiner
>>>> is not trivial because it touches API, optimizer, and runtime code.
>>>> However, I did that before for the Reducer, and with the recent addition
>>>> of groupCombine, the Reducer changes might just be applied to combine.
>>>>
>>>> I'll be gone next week, but if you want, we can have a closer look
>>>> at the problem after that.
>>>>
>>>> 2015-04-10 12:07 GMT+02:00 Flavio Pompermaier <pomperma...@okkam.it>:
>>>>
>>>>> I think I could also take care of it if somebody can help and guide
>>>>> me a little bit.
>>>>> How long do you think it would take to complete such a task?
>>>>>
>>>>> On Fri, Apr 10, 2015 at 12:02 PM, Fabian Hueske <fhue...@gmail.com> wrote:
>>>>>
>>>>>> We had an effort to execute any Hadoop MR program by simply specifying
>>>>>> the JobConf and executing it (even embedded in regular Flink programs).
>>>>>> We got quite far but did not complete it (counters and custom grouping /
>>>>>> sorting functions for Combiners are missing, if I remember correctly).
>>>>>> I don't think that anybody is working on that right now, but it would
>>>>>> definitely be a cool feature.
>>>>>>
>>>>>> 2015-04-10 11:55 GMT+02:00 Flavio Pompermaier <pomperma...@okkam.it>:
>>>>>>
>>>>>>> Hi guys,
>>>>>>>
>>>>>>> I have a question about Hadoop compatibility.
>>>>>>> In https://flink.apache.org/news/2014/11/18/hadoop-compatibility.html
>>>>>>> you say that you can reuse existing mapreduce programs.
>>>>>>> Would it be possible to also manage complex mapreduce programs like
>>>>>>> the HBase BulkImport, which uses for example a custom partitioner
>>>>>>> (org.apache.hadoop.mapreduce.Partitioner)?
>>>>>>>
>>>>>>> The bulk-import examples call
>>>>>>> HFileOutputFormat2.configureIncrementalLoadMap,
>>>>>>> which sets a series of job parameters (like partitioner, mapper,
>>>>>>> reducers, etc.) -> http://pastebin.com/8VXjYAEf.
>>>>>>> The full code can be seen at
>>>>>>> https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/HFileOutputFormat2.java
>>>>>>>
>>>>>>> Do you think there's any chance to make it run in Flink?
>>>>>>>
>>>>>>> Best,
>>>>>>> Flavio
>
> --
> Flavio Pompermaier
> Development Department
>
> OKKAM S.r.l.
> Tel. +(39) 0461 041809
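For readers following the thread who are unfamiliar with what a custom org.apache.hadoop.mapreduce.Partitioner actually has to do (the part Fabian notes is hard to map onto Flink's runtime), here is a minimal standalone sketch of the partitioning contract. This is plain Java with no Hadoop or Flink dependency; the class and method names only mirror the shape of Hadoop's API, and the body mirrors the logic of Hadoop's default HashPartitioner, not HBase's region-boundary partitioner.

```java
// Standalone sketch of the Hadoop Partitioner contract:
// given a record's key and the number of reduce tasks, pick a partition.
// Mirrors Hadoop's default HashPartitioner; nothing here imports Hadoop.
public class PartitionerSketch {

    // Analogous to Partitioner<K, V>.getPartition(key, value, numPartitions)
    static int getPartition(String key, int numPartitions) {
        // Mask off the sign bit so the result is always non-negative,
        // exactly as Hadoop's HashPartitioner does.
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        int p = getPartition("row-0001", 4);
        System.out.println("key 'row-0001' -> partition " + p);
        // The contract: deterministic, and always within [0, numPartitions).
        assert p == getPartition("row-0001", 4);
        assert p >= 0 && p < 4;
    }
}
```

The difficulty discussed above is that an HBase bulk load replaces this hash scheme with a TotalOrderPartitioner keyed on region boundaries, and the job's correctness depends on Flink honoring that custom partitioning (plus the matching sort/group comparators) inside its own shuffle.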