Thank you guys for the information. -cheers Nishan
On Wed, Oct 8, 2014 at 12:49 PM, Andrey Stepachev <oct...@gmail.com> wrote:

> For that use case I'd prefer to write new filtered HFiles with MapReduce
> and then import that data into HBase using bulk import. Keep in mind that
> the incremental load tool moves the files, it does not copy them, so once
> they are written you will not do any additional writes (except for those
> regions that were split while you were filtering the data). If the imported
> data set is small, that should not be a problem.
>
> On Wed, Oct 8, 2014 at 8:45 PM, Nishanth S <nishanth.2...@gmail.com> wrote:
>
> > Thanks Andrey. In the current system the HBase CFs have a TTL of 30 days
> > and data gets deleted after that (with Snappy compression). Below is what
> > I am trying to achieve:
> >
> > 1. Export the data from the HBase table before it gets deleted.
> > 2. Store it in a format that supports maximum compression (storage cost
> >    is my primary concern here), so I am looking at Parquet.
> > 3. Load a subset of this data back into HBase based on certain rules (say
> >    I want to load all rows that have a particular string in one of the
> >    fields).
> >
> > I was thinking of bulk loading this data back into HBase, but I am not
> > sure how I can load a subset of the data using
> > org.apache.hadoop.hbase.mapreduce.Driver import.
> >
> > On Wed, Oct 8, 2014 at 10:20 AM, Andrey Stepachev <oct...@gmail.com>
> > wrote:
> >
> > > Hi Nishanth.
> > >
> > > It is not clear what exactly you are building.
> > > Can you share a more detailed description of what you are building and
> > > how the Parquet files are supposed to be ingested?
> > > Some questions arise:
> > > 1. Is this an online import or a bulk load?
> > > 2. Why do the rules need to be deployed to the cluster? Do you intend
> > > to do the reading inside the HBase region server?
> > >
> > > As for deploying filters, you can try to use coprocessors instead. They
> > > can be configurable and loadable (but not unloadable, so you need to
> > > think about some class-loading magic like ClassWorlds).
> > > For bulk imports you can create HFiles directly and add them
> > > incrementally:
> > > http://hbase.apache.org/book/arch.bulk.load.html
> > >
> > > On Wed, Oct 8, 2014 at 8:13 PM, Nishanth S <nishanth.2...@gmail.com>
> > > wrote:
> > >
> > > > I was thinking of using org.apache.hadoop.hbase.mapreduce.Driver
> > > > import. I could see that we can pass filters to this utility, but it
> > > > looks less flexible, since you need to deploy a new filter every time
> > > > the rules for processing records change. Is there some way we could
> > > > define a rules engine?
> > > >
> > > > Thanks,
> > > > -Nishan
> > > >
> > > > On Wed, Oct 8, 2014 at 9:50 AM, Nishanth S <nishanth.2...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hey folks,
> > > > >
> > > > > I am evaluating loading an HBase table from Parquet files based on
> > > > > some rules that would be applied to the Parquet file records. Could
> > > > > someone help me with what would be the best way to do this?
> > > > >
> > > > > Thanks,
> > > > > Nishan
> > > >
> > >
> > > --
> > > Andrey.
> >
>
> --
> Andrey.
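
For reference, a minimal sketch of the filtered-HFile plus bulk load approach Andrey describes above, assuming HBase 1.x client APIs and a recent parquet-avro for reading the exported files. The table name ("my_table"), column family, the "rowkey"/"message" fields and the contains-string rule are all placeholders, not anything from the thread, and the final step moves (does not copy) the generated HFiles, as noted above.

import java.io.IOException;

import org.apache.avro.generic.GenericRecord;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.RegionLocator;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2;
import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.parquet.avro.AvroParquetInputFormat;

public class ParquetFilterBulkLoad {

  // Emits a Put only for Parquet records whose (hypothetical) "message" field
  // contains the target string; everything else is dropped before any HFile
  // is written.
  public static class FilterMapper
      extends Mapper<Void, GenericRecord, ImmutableBytesWritable, Put> {

    private static final byte[] CF = Bytes.toBytes("d");

    @Override
    protected void map(Void key, GenericRecord record, Context ctx)
        throws IOException, InterruptedException {
      Object msg = record.get("message");          // placeholder field name
      if (msg == null || !msg.toString().contains("ERROR")) {
        return;                                    // rule: keep matching rows only
      }
      byte[] row = Bytes.toBytes(record.get("rowkey").toString());  // placeholder
      Put put = new Put(row);
      put.addColumn(CF, Bytes.toBytes("message"), Bytes.toBytes(msg.toString()));
      ctx.write(new ImmutableBytesWritable(row), put);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = Job.getInstance(conf, "parquet-filter-bulkload");
    job.setJarByClass(ParquetFilterBulkLoad.class);

    job.setInputFormatClass(AvroParquetInputFormat.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // exported Parquet dir

    job.setMapperClass(FilterMapper.class);
    job.setMapOutputKeyClass(ImmutableBytesWritable.class);
    job.setMapOutputValueClass(Put.class);

    Path hfileDir = new Path(args[1]);                       // HFile staging dir
    FileOutputFormat.setOutputPath(job, hfileDir);

    TableName name = TableName.valueOf("my_table");
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Table table = conn.getTable(name);
         RegionLocator locator = conn.getRegionLocator(name);
         Admin admin = conn.getAdmin()) {

      // Configures the reducer, partitioner and HFileOutputFormat2 so the
      // output HFiles line up with the table's current region boundaries.
      HFileOutputFormat2.configureIncrementalLoad(job, table, locator);

      if (!job.waitForCompletion(true)) {
        System.exit(1);
      }

      // Moves the generated HFiles into the table's regions (no extra writes).
      new LoadIncrementalHFiles(conf).doBulkLoad(hfileDir, admin, table, locator);
    }
  }
}

Because the rule runs in the mapper, changing it only means resubmitting the job with a new jar (or driving the rule from job configuration), rather than deploying a new Filter or coprocessor to the region servers.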