Re: Hadoop compatibility and HBase bulk loading

Flavio Pompermaier Fri, 12 Jan 2018 16:35:22 -0800

Any progress on this, Fabian? HBase bulk loading is a common task for us, and
it's very annoying and inconvenient to have to run a separate YARN job just to
accomplish it...


On 10 Apr 2015 12:26, "Flavio Pompermaier" <pomperma...@okkam.it> wrote:

Great! That will be awesome.
Thank you, Fabian

On Fri, Apr 10, 2015 at 12:14 PM, Fabian Hueske <fhue...@gmail.com> wrote:

> Hmm, that's a tricky question ;-) I would need to have a closer look. But
> getting custom comparators for sorting and grouping into the Combiner is
> not that trivial, because it touches API, Optimizer, and Runtime code.
> However, I did that before for the Reducer, and with the recent addition of
> groupCombine the Reducer changes could probably just be applied to combine.
>
> I'll be gone next week, but if you want to, we can have a closer look at
> the problem after that.
>
> 2015-04-10 12:07 GMT+02:00 Flavio Pompermaier <pomperma...@okkam.it>:
>
>> I think I could also take care of it if somebody can help and guide me
>> a little bit.
>> How long do you think such a task would take?
>>
>> On Fri, Apr 10, 2015 at 12:02 PM, Fabian Hueske <fhue...@gmail.com>
>> wrote:
>>
>>> We had an effort to execute any HadoopMR program by simply specifying
>>> the JobConf and executing it (even embedded in regular Flink programs).
>>> We got quite far but did not complete it (counters and custom grouping /
>>> sorting functions for Combiners are missing, if I remember correctly).
>>> I don't think that anybody is working on that right now, but it would
>>> definitely be a cool feature.
>>>
>>> 2015-04-10 11:55 GMT+02:00 Flavio Pompermaier <pomperma...@okkam.it>:
>>>
>>>> Hi guys,
>>>>
>>>> I have a question about Hadoop compatibility.
>>>> In https://flink.apache.org/news/2014/11/18/hadoop-compatibility.html
>>>> you say that you can reuse existing MapReduce programs.
>>>> Would it also be possible to handle complex MapReduce programs like
>>>> HBase BulkImport, which use, for example, a custom partitioner
>>>> (org.apache.hadoop.mapreduce.Partitioner)?
>>>>
>>>> The bulk-import examples call
>>>> HFileOutputFormat2.configureIncrementalLoadMap,
>>>> which sets a series of job parameters (like partitioner, mapper, reducers,
>>>> etc.) -> http://pastebin.com/8VXjYAEf.
>>>> The full code can be seen at
>>>> https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/HFileOutputFormat2.java.
>>>>
>>>> Do you think there's any chance to make it run in Flink?
>>>>
>>>> Best,
>>>> Flavio
>>>>
>>>
>>>
>>
>
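
Fabian's point about Combiners is easier to see with a concrete model. The following is a minimal, framework-free Java sketch (it is not Flink's or Hadoop's API) of what a combine step has to do once a job brings its own comparators: sort the partial records with a custom sort comparator, then fold each run of records that a separate grouping comparator treats as equal. The Rec record, the SORT and GROUP comparators, and the SUM function are hypothetical examples chosen for illustration; the record syntax needs Java 16 or later.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.BiFunction;

/** Hypothetical sketch of a combine step driven by separate sort and grouping comparators. */
public class CombinerSketch {

    // Hypothetical record: a row key, a timestamp, and a value to aggregate.
    record Rec(String row, long ts, long value) {}

    // Sort by row, then newest timestamp first (a secondary sort).
    static final Comparator<Rec> SORT =
            Comparator.comparing(Rec::row).thenComparing(Comparator.comparingLong(Rec::ts).reversed());

    // Group by row only; the timestamp is ignored when deciding group boundaries.
    static final Comparator<Rec> GROUP = Comparator.comparing(Rec::row);

    // Sum values and keep the accumulator's (i.e. the group's newest) timestamp.
    static final BiFunction<Rec, Rec, Rec> SUM =
            (acc, r) -> new Rec(acc.row(), acc.ts(), acc.value() + r.value());

    static List<Rec> combine(List<Rec> input, Comparator<Rec> sortCmp,
                             Comparator<Rec> groupCmp, BiFunction<Rec, Rec, Rec> combineFn) {
        List<Rec> sorted = new ArrayList<>(input);
        sorted.sort(sortCmp);
        List<Rec> out = new ArrayList<>();
        Rec acc = null;
        for (Rec r : sorted) {
            if (acc != null && groupCmp.compare(acc, r) == 0) {
                acc = combineFn.apply(acc, r);   // same group: fold into the accumulator
            } else {
                if (acc != null) {
                    out.add(acc);                // group boundary: emit the previous group
                }
                acc = r;
            }
        }
        if (acc != null) {
            out.add(acc);                        // emit the last group
        }
        return out;
    }

    public static void main(String[] args) {
        List<Rec> input = List.of(
                new Rec("b", 1, 10), new Rec("a", 2, 5), new Rec("a", 1, 7), new Rec("b", 3, 1));
        // Prints [Rec[row=a, ts=2, value=12], Rec[row=b, ts=3, value=11]]
        System.out.println(combine(input, SORT, GROUP, SUM));
    }
}
```

Because grouping looks only at the row while sorting also orders by timestamp, the first (newest) record of each group becomes the accumulator. Honoring two distinct comparators, rather than a single key, is the kind of Combiner support the thread describes as missing.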

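For the custom-partitioner question, the core routing idea behind HBase bulk loading can be sketched without Hadoop. According to its Javadoc, HFileOutputFormat2.configureIncrementalLoad configures a total order partitioner from the table's region boundaries and sets the number of reduce tasks to match the number of regions, so each reducer produces output for one region. The Java sketch below (a hypothetical RegionPartitionerSketch class, not HBase or Hadoop code) assumes region start keys are compared as unsigned bytes and maps a row key to the index of the region that owns it.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical sketch: routes row keys to regions the way a total-order partitioner would. */
public class RegionPartitionerSketch {

    // Start keys of all regions except the first, sorted as unsigned bytes.
    private final byte[][] splitPoints;

    public RegionPartitionerSketch(byte[][] regionStartKeys) {
        // The first region's start key is empty, so it is not a split point.
        this.splitPoints = Arrays.stream(regionStartKeys)
                .filter(k -> k.length > 0)
                .sorted(Arrays::compareUnsigned)
                .toArray(byte[][]::new);
    }

    /** Number of partitions, i.e. number of regions (and of reduce tasks). */
    public int numPartitions() {
        return splitPoints.length + 1;
    }

    /** Index of the region whose [startKey, nextStartKey) range contains rowKey. */
    public int getPartition(byte[] rowKey) {
        // Binary search for the count of split points <= rowKey.
        int lo = 0, hi = splitPoints.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (Arrays.compareUnsigned(splitPoints[mid], rowKey) <= 0) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Three hypothetical regions: [, g), [g, p), [p, ).
        RegionPartitionerSketch p = new RegionPartitionerSketch(
                new byte[][] {bytes(""), bytes("g"), bytes("p")});
        for (String row : new String[] {"apple", "g", "kiwi", "zebra"}) {
            System.out.println(row + " -> region " + p.getPartition(bytes(row)));
        }
    }
}
```

With these three regions, "apple" maps to region 0, "g" and "kiwi" to region 1, and "zebra" to region 2; a region's start key is inclusive. Running such a job on another engine would require the same routing (same split points, same unsigned byte order) so that each parallel task writes data for a single region.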