Thank you guys for the information. -cheers Nishan
On Wed, Oct 8, 2014 at 12:49 PM, Andrey Stepachev <oct...@gmail.com> wrote:

> For that use case I'd prefer to write new filtered HFiles with MapReduce
> and then import that data into HBase using bulk import. Keep in mind that
> the incremental load tool moves the files, it does not copy them, so once
> they are written you will not do any additional writes (except for those
> regions that were split while you were filtering the data). If the imported
> data set is small, that should not be a problem.
>
> On Wed, Oct 8, 2014 at 8:45 PM, Nishanth S <nishanth.2...@gmail.com> wrote:
>
> > Thanks Andrey. In the current system the HBase CFs have a TTL of 30 days
> > and data gets deleted after that (with Snappy compression). Below is what
> > I am trying to achieve:
> >
> > 1. Export the data from the HBase table before it gets deleted.
> > 2. Store it in a format that supports maximum compression (storage cost
> >    is my primary concern here), so I am looking at Parquet.
> > 3. Load a subset of this data back into HBase based on certain rules (say
> >    I want to load all rows that have a particular string in one of the
> >    fields).
> >
> > I was thinking of bulk loading this data back into HBase, but I am not
> > sure how I can load a subset of the data using
> > org.apache.hadoop.hbase.mapreduce.Driver import.
> >
> > On Wed, Oct 8, 2014 at 10:20 AM, Andrey Stepachev <oct...@gmail.com>
> > wrote:
> >
> > > Hi Nishanth.
> > >
> > > It is not clear what exactly you are building.
> > > Can you share a more detailed description of what you are building and
> > > how the Parquet files are supposed to be ingested?
> > > Some questions arise:
> > > 1. Is this an online import or a bulk load?
> > > 2. Why do the rules need to be deployed to the cluster? Do you intend
> > > to do the reading inside the HBase region server?
> > >
> > > As for deploying filters, you can try to use coprocessors instead. They
> > > can be configurable and loadable (but not unloadable, so you need to
> > > think about some class-loading magic like ClassWorlds).
> > > For bulk imports you can create HFiles directly and add them
> > > incrementally:
> > > http://hbase.apache.org/book/arch.bulk.load.html
> > >
> > > On Wed, Oct 8, 2014 at 8:13 PM, Nishanth S <nishanth.2...@gmail.com>
> > > wrote:
> > >
> > > > I was thinking of using org.apache.hadoop.hbase.mapreduce.Driver
> > > > import. I could see that we can pass filters to this utility, but it
> > > > looks less flexible, since you need to deploy a new filter every time
> > > > the rules for processing records change. Is there some way we could
> > > > define a rules engine?
> > > >
> > > > Thanks,
> > > > -Nishan
> > > >
> > > > On Wed, Oct 8, 2014 at 9:50 AM, Nishanth S <nishanth.2...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hey folks,
> > > > >
> > > > > I am evaluating loading an HBase table from Parquet files based on
> > > > > some rules that would be applied to the Parquet file records. Could
> > > > > someone help me with what would be the best way to do this?
> > > > >
> > > > > Thanks,
> > > > > Nishan
> > > >
> > >
> > > --
> > > Andrey.
> >
>
> --
> Andrey.
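
For reference, a minimal sketch of the filtered-HFile plus bulk load approach Andrey describes above, assuming HBase 1.x client APIs and a recent parquet-avro for reading the exported files. The table name ("my_table"), column family, the "rowkey"/"message" fields and the contains-string rule are all placeholders, not anything from the thread, and the final step moves (does not copy) the generated HFiles, as noted above.

import java.io.IOException;

import org.apache.avro.generic.GenericRecord;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.RegionLocator;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2;
import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.parquet.avro.AvroParquetInputFormat;

public class ParquetFilterBulkLoad {

  // Emits a Put only for Parquet records whose (hypothetical) "message" field
  // contains the target string; everything else is dropped before any HFile
  // is written.
  public static class FilterMapper
      extends Mapper<Void, GenericRecord, ImmutableBytesWritable, Put> {

    private static final byte[] CF = Bytes.toBytes("d");

    @Override
    protected void map(Void key, GenericRecord record, Context ctx)
        throws IOException, InterruptedException {
      Object msg = record.get("message");          // placeholder field name
      if (msg == null || !msg.toString().contains("ERROR")) {
        return;                                    // rule: keep matching rows only
      }
      byte[] row = Bytes.toBytes(record.get("rowkey").toString());  // placeholder
      Put put = new Put(row);
      put.addColumn(CF, Bytes.toBytes("message"), Bytes.toBytes(msg.toString()));
      ctx.write(new ImmutableBytesWritable(row), put);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = Job.getInstance(conf, "parquet-filter-bulkload");
    job.setJarByClass(ParquetFilterBulkLoad.class);

    job.setInputFormatClass(AvroParquetInputFormat.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // exported Parquet dir

    job.setMapperClass(FilterMapper.class);
    job.setMapOutputKeyClass(ImmutableBytesWritable.class);
    job.setMapOutputValueClass(Put.class);

    Path hfileDir = new Path(args[1]);                       // HFile staging dir
    FileOutputFormat.setOutputPath(job, hfileDir);

    TableName name = TableName.valueOf("my_table");
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Table table = conn.getTable(name);
         RegionLocator locator = conn.getRegionLocator(name);
         Admin admin = conn.getAdmin()) {

      // Configures the reducer, partitioner and HFileOutputFormat2 so the
      // output HFiles line up with the table's current region boundaries.
      HFileOutputFormat2.configureIncrementalLoad(job, table, locator);

      if (!job.waitForCompletion(true)) {
        System.exit(1);
      }

      // Moves the generated HFiles into the table's regions (no extra writes).
      new LoadIncrementalHFiles(conf).doBulkLoad(hfileDir, admin, table, locator);
    }
  }
}

Because the rule runs in the mapper, changing it only means resubmitting the job with a new jar (or driving the rule from job configuration), rather than deploying a new Filter or coprocessor to the region servers.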