Ted Yu, you misunderstood: I said incremental load from HBase to Hive; in other words, incremental import from HBase.
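For reference, a minimal sketch of what the scheduled Spark job for this HBase-to-Hive path could look like (Spark 2.x with Hive support, reading HBase through TableInputFormat). The table name "events", column family "cf", qualifier "flag", and the Hive table "db.events_incremental" are illustrative assumptions, not names from this thread:

import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.sql.SparkSession

object HBaseToHiveIncremental {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hbase-to-hive-incremental")
      .enableHiveSupport()
      .getOrCreate()
    import spark.implicits._

    val conf = HBaseConfiguration.create()
    conf.set(TableInputFormat.INPUT_TABLE, "events")

    // Scan the HBase table as (row key, Result) pairs.
    val rdd = spark.sparkContext.newAPIHadoopRDD(
      conf,
      classOf[TableInputFormat],
      classOf[ImmutableBytesWritable],
      classOf[Result])

    // Keep only rows whose flag is still 0 or 1 (2 = already acknowledged).
    val pending = rdd.map { case (_, result) =>
      val rowKey = Bytes.toString(result.getRow)
      val flag = Option(result.getValue(Bytes.toBytes("cf"), Bytes.toBytes("flag")))
        .map(Bytes.toString)
        .getOrElse("0") // treat a missing flag cell as the default 0
      (rowKey, flag)
    }.filter { case (_, flag) => flag == "0" || flag == "1" }
      .toDF("rowkey", "flag")

    pending.write.mode("append").saveAsTable("db.events_incremental")
    spark.stop()
  }
}

Scheduling the job and pushing the flag predicate down to HBase as a scan filter (instead of filtering in Spark) are left out for brevity.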
On Wed, Dec 21, 2016 at 10:04 PM, Ted Yu <yuzhih...@gmail.com> wrote:

> Incremental load traditionally means generating HFiles and using
> org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles to load the
> data into HBase.
>
> For your use case, the producer needs to find rows where the flag is
> 0 or 1. After such rows are obtained, it is up to you how the result
> of processing is delivered to HBase.
>
> Cheers
>
> On Wed, Dec 21, 2016 at 8:00 AM, Chetan Khatri <
> chetan.opensou...@gmail.com> wrote:
>
>> Ok, sure, will ask.
>>
>> But what would be a generic best-practice solution for incremental
>> load from HBase?
>>
>> On Wed, Dec 21, 2016 at 8:42 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>>
>>> I haven't used Gobblin. You can consider asking the Gobblin mailing
>>> list about the first option.
>>>
>>> The second option would work.
>>>
>>> On Wed, Dec 21, 2016 at 2:28 AM, Chetan Khatri <
>>> chetan.opensou...@gmail.com> wrote:
>>>
>>>> Hello guys,
>>>>
>>>> I would like to understand the different approaches for distributed
>>>> incremental load from HBase. Is there any *tool / incubator tool*
>>>> which satisfies this requirement?
>>>>
>>>> *Approach 1:*
>>>>
>>>> Write a Kafka producer, manually maintain a flag column for events,
>>>> and ingest them with LinkedIn Gobblin to HDFS / S3.
>>>>
>>>> *Approach 2:*
>>>>
>>>> Run a scheduled Spark job: read from HBase, do the transformations,
>>>> and maintain the flag column at the HBase level.
>>>>
>>>> In both approaches above, I need to maintain column-level flags,
>>>> such as 0 = default, 1 = sent, 2 = sent and acknowledged, so next
>>>> time the producer will take another batch of 1000 rows where the
>>>> flag is 0 or 1.
>>>>
>>>> I am looking for a best-practice approach with any distributed tool.
>>>>
>>>> Thanks.
>>>>
>>>> - Chetan Khatri
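To make the flag handling in the quoted thread concrete, here is a minimal sketch against the plain HBase client API: scan the next batch of up to 1000 rows whose flag is still 0 or 1, process each row, and mark it 2 (sent and acknowledged). The table name, column family/qualifier, and the string encoding of the flag values are assumptions for illustration:

import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Put, Scan}
import org.apache.hadoop.hbase.filter.{CompareFilter, FilterList, SingleColumnValueFilter}
import org.apache.hadoop.hbase.util.Bytes

object FlagBatchScan {
  private val Cf   = Bytes.toBytes("cf")
  private val Flag = Bytes.toBytes("flag")

  def main(args: Array[String]): Unit = {
    val conn  = ConnectionFactory.createConnection(HBaseConfiguration.create())
    val table = conn.getTable(TableName.valueOf("events"))

    // Match rows whose flag is "0" (default) or "1" (sent);
    // "2" (sent and acknowledged) is skipped.
    def flagEquals(v: String) = {
      val f = new SingleColumnValueFilter(Cf, Flag,
        CompareFilter.CompareOp.EQUAL, Bytes.toBytes(v))
      f.setFilterIfMissing(true) // skip rows that have no flag cell at all
      f
    }
    val filter = new FilterList(FilterList.Operator.MUST_PASS_ONE,
      flagEquals("0"), flagEquals("1"))

    val scan = new Scan()
    scan.setFilter(filter)
    scan.setCaching(1000)

    val scanner = table.getScanner(scan)
    try {
      var taken  = 0
      var result = scanner.next()
      while (result != null && taken < 1000) { // next batch of 1000 rows
        // ... deliver the row downstream here ...

        // Mark the row processed so the next batch skips it.
        val put = new Put(result.getRow)
        put.addColumn(Cf, Flag, Bytes.toBytes("2"))
        table.put(put)

        taken += 1
        result = scanner.next()
      }
    } finally {
      scanner.close()
      table.close()
      conn.close()
    }
  }
}

Updating the flag one row at a time keeps the sketch simple but is chatty; collecting the Puts and writing them with table.put(java.util.List[Put]) would cut the RPC overhead per batch.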