Re: Pig HBase integration

2014-09-28 Thread Krishna Kalyan
Thank you so much Serega. Regards, Krishna On Sun, Sep 28, 2014 at 11:01 PM, Serega Sheypak wrote: > > https://pig.apache.org/docs/r0.11.0/api/org/apache/pig/backend/hadoop/hbase/HBaseStorage.html > I'm not sure how Pig HBaseStorage works. I suppose it would read all > data and then join i

Re: Pig HBase integration

2014-09-28 Thread Serega Sheypak
https://pig.apache.org/docs/r0.11.0/api/org/apache/pig/backend/hadoop/hbase/HBaseStorage.html I'm not sure how Pig HBaseStorage works. I suppose it would read all the data and then join it as a usual dataset. So you should expect serious HBase performance degradation during reads; you would get key-by-key
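Serega only supposes that HBaseStorage reads the whole table and then joins it like any other dataset. The two access patterns being weighed can be sketched in plain Python; the dict standing in for the HBase table and both function names are hypothetical, and this is not how HBaseStorage is implemented. A full scan costs one read per table row, while key-by-key access costs one random get per weblog record, which is what makes it expensive against a multi-terabyte log.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical stand-in for an HBase table: rowkey -> column values.
Table = Dict[str, dict]
Joined = List[Tuple[str, dict, dict]]


def join_by_full_scan(table: Table, weblog: Iterable[Tuple[str, dict]]) -> Tuple[Joined, int]:
    """Read every table row once (like a full scan), then hash-join in memory."""
    index: Table = {}
    rows_read = 0
    for rowkey, row in table.items():  # one sequential pass over the table
        index[rowkey] = row
        rows_read += 1
    joined = [(key, rec, index[key]) for key, rec in weblog if key in index]
    return joined, rows_read


def join_by_point_lookups(table: Table, weblog: Iterable[Tuple[str, dict]]) -> Tuple[Joined, int]:
    """Issue one random get per weblog record (key-by-key access)."""
    joined: Joined = []
    gets_issued = 0
    for key, rec in weblog:
        gets_issued += 1  # every record costs a round trip, even repeated keys
        row = table.get(key)
        if row is not None:
            joined.append((key, rec, row))
    return joined, gets_issued


locations = {"L1": {"city": "Paris"}, "L2": {"city": "Oslo"}, "L3": {"city": "Rome"}}
weblog = [("L1", {"url": "/a"}), ("L2", {"url": "/b"}), ("L1", {"url": "/c"}), ("L9", {"url": "/d"})]
scan_result, rows_read = join_by_full_scan(locations, weblog)
lookup_result, gets_issued = join_by_point_lookups(locations, weblog)
```

Both produce the same joined rows; the scan's cost grows with the table size, the lookups' cost with the log size.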

Re: Pig HBase integration

2014-09-28 Thread Krishna Kalyan
We actually have 2 data sets in HDFS, location (3-5 GB, approx 10 columns in each record) and weblog (2-3 TB, approx 50 columns in each record). We need to join the data sets using the locationId, which is in both the data-sets. We have 2 options: 1. Have both the data-sets in HDFS only and JOIN t
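Option 1 (both datasets in HDFS, joined in Pig) maps onto Pig's fragment-replicate join, `JOIN weblog BY locationId, location BY locationId USING 'replicated'`, in which every relation after the first is loaded into memory on each task. Whether a 3-5 GB location set fits in task memory is an assumption you would have to check. The idea behind it, sketched in Python under a hypothetical layout where locationId is the first CSV column of both files:

```python
import csv
import io
from typing import Dict, Iterable, Iterator, List


def load_locations(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Build an in-memory hash table keyed by locationId (assumed first column)."""
    table: Dict[str, List[str]] = {}
    for row in csv.reader(lines):
        table[row[0]] = row[1:]
    return table


def replicated_join(weblog_lines: Iterable[str], locations: Dict[str, List[str]]) -> Iterator[List[str]]:
    """Stream the large weblog once and probe the small table per record."""
    for row in csv.reader(weblog_lines):
        match = locations.get(row[0])  # assumption: locationId is column 0
        if match is not None:  # inner join: drop weblog rows without a location
            yield row + match


locs = load_locations(io.StringIO("L1,Paris\nL2,Oslo\n"))
joined = list(replicated_join(io.StringIO("L1,GET /a\nL3,GET /b\nL2,GET /c\n"), locs))
```

Only the small side is held in memory; the large side is never shuffled, which is why this join avoids a reduce phase.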

Re: Pig HBase integration

2014-09-28 Thread Serega Sheypak
Store location to HDFS, store weblog to HDFS, join them, and then use the HBase bulk load tool to load the join result into HBase. What's the reason to keep the location dataset in HBase and the weblogs in HDFS? You can expect a data load performance improvement. For me it takes a few minutes to bulk load 500.000.000 records to 10
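The bulk-load path needs the join output sorted by rowkey and split along region boundaries before HFiles are written. In MapReduce, `HFileOutputFormat.configureIncrementalLoad` sets this up with a total-order partitioner over the table's region start keys, and `completebulkload` then moves the finished files into the regions. A minimal Python sketch of only the partition-and-sort step, assuming string rowkeys (HBase actually compares raw bytes) and a hypothetical list of split keys:

```python
import bisect
from typing import List, Sequence, Tuple

Record = Tuple[str, object]


def partition_for_bulk_load(records: Sequence[Record], split_keys: Sequence[str]) -> List[List[Record]]:
    """Group (rowkey, value) pairs into per-region buckets sorted by rowkey.

    split_keys: sorted start keys of every region except the first.
    Region i holds rowkeys in [split_keys[i-1], split_keys[i]).
    """
    buckets: List[List[Record]] = [[] for _ in range(len(split_keys) + 1)]
    for rowkey, value in records:
        # bisect_right sends a key equal to a start key into that region
        region = bisect.bisect_right(split_keys, rowkey)
        buckets[region].append((rowkey, value))
    for bucket in buckets:
        bucket.sort(key=lambda kv: kv[0])  # HFiles must be written in key order
    return buckets


regions = partition_for_bulk_load([("m", 1), ("b", 2), ("z", 3), ("g", 4), ("a", 5)], ["g", "p"])
```

Each bucket corresponds to the file set one region would receive.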

Re: Pig HBase integration

2014-09-28 Thread Krishna Kalyan
Thanks Serega. Our use case details: We have a location table which will be stored in HBase with locationID as the rowkey / join key. We intend to join this table with a transactional WebLog file in HDFS (expected size around 2 TB). The join query will be issued from Pig. Can we expect a perfor
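If the location table stays in HBase and is probed once per weblog record, the number of round trips scales with the 2 TB log rather than with the number of distinct locations. One common mitigation is to look up each distinct key once, in batches, and cache the results. A sketch, where `fetch_batch` is a hypothetical stand-in for a multi-get call and the batch size is an arbitrary assumption:

```python
from typing import Callable, Dict, Iterable, List, Tuple


def lookup_with_batching(
    keys: Iterable[str],
    fetch_batch: Callable[[List[str]], Dict[str, dict]],  # hypothetical multi-get
    batch_size: int = 100,
) -> Tuple[Dict[str, dict], int]:
    """Resolve each distinct key once, in batches; return (cache, round_trips)."""
    distinct = list(dict.fromkeys(keys))  # drop duplicates, keep first-seen order
    cache: Dict[str, dict] = {}
    round_trips = 0
    for start in range(0, len(distinct), batch_size):
        cache.update(fetch_batch(distinct[start:start + batch_size]))
        round_trips += 1
    return cache, round_trips


table = {"L1": {"city": "Paris"}, "L2": {"city": "Oslo"}}
log_keys = ["L1", "L2", "L1", "L3", "L2"]
cache, trips = lookup_with_batching(
    log_keys, lambda batch: {k: table[k] for k in batch if k in table}, batch_size=2
)
```

Five log records cost two round trips here instead of five; keys missing from the table (`L3`) simply stay out of the cache.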

Re: Pig HBase integration

2014-09-27 Thread Serega Sheypak
It depends on the dataset sizes and the HBase workload. The best way is to do the join in Pig, store the result, and then use the HBase bulk load tool. It's a general recommendation; I have no idea about your task details. 2014-09-27 7:32 GMT+04:00 Krishna Kalyan : > Hi, > We have a use case that involves ETL on data comin

Pig HBase integration

2014-09-26 Thread Krishna Kalyan
Hi, We have a use case that involves ETL on data coming from several different sources using Pig. We plan to store the final output table in HBase. What will be the performance impact if we do a join with an external CSV table using Pig? Regards, Krishna

Re: Pig + Hbase integration

2012-10-29 Thread Manu S
Hi Jean, This issue had been solved by following the suggestions of Cheolsoo *1) ClassNotFoundError Even though you're "registering" jars in your script, they're not present in classpath. So you're seeing that ClassNotFound error. Can you try this? PIG_CLASSPATH=/hbase-0.94.1.jar:/ lib/zookeepe

Re: Pig + Hbase integration

2012-10-29 Thread Jean-Daniel Cryans
On Thu, Oct 25, 2012 at 7:44 AM, Manu S wrote: > Hi, > > I am using Pig-0.10.0 & hbase-0.94.2. > > I am trying to store the processed output to Hbase cluster using pig > script. > > I registered the required .jar and set the mapreduce and zookeeper > parameters within the script itself. > > *# cat

Re: LeaseException while extracting data via pig/hbase integration

2012-02-20 Thread Mikael Sitruk
gt;> (via Tom White) >> >> >> - Original Message - >> > From: Mikael Sitruk >> > To: user@hbase.apache.org; Andrew Purtell >> > Cc: >> > Sent: Wednesday, February 15, 2012 11:32 PM >> > Subject: Re: LeaseException while extrac

Re: LeaseException while extracting data via pig/hbase integration

2012-02-16 Thread Mikael Sitruk
(via Tom White) > > > - Original Message - > > From: Mikael Sitruk > > To: user@hbase.apache.org; Andrew Purtell > > Cc: > > Sent: Wednesday, February 15, 2012 11:32 PM > > Subject: Re: LeaseException while extracting data via pig/hbase > integ

Re: LeaseException while extracting data via pig/hbase integration

2012-02-16 Thread Andrew Purtell
Hein (via Tom White) - Original Message - > From: Mikael Sitruk > To: user@hbase.apache.org; Andrew Purtell > Cc: > Sent: Wednesday, February 15, 2012 11:32 PM > Subject: Re: LeaseException while extracting data via pig/hbase integration > > Andy hi > > Not sure what

Re: LeaseException while extracting data via pig/hbase integration

2012-02-15 Thread Mikael Sitruk
ds, > > - Andy > > Problems worthy of attack prove their worth by hitting back. - Piet Hein > (via Tom White) > > > > - Original Message - > > From: Jean-Daniel Cryans > > To: user@hbase.apache.org > > Cc: > > Sent: Wednesday, February 1

Re: LeaseException while extracting data via pig/hbase integration

2012-02-15 Thread Andrew Purtell
ginal Message - > From: Jean-Daniel Cryans > To: user@hbase.apache.org > Cc: > Sent: Wednesday, February 15, 2012 10:17 AM > Subject: Re: LeaseException while extracting data via pig/hbase integration > > You would have to grep the lease's id, in your first email it w

Re: LeaseException while extracting data via pig/hbase integration

2012-02-15 Thread Mikael Sitruk
OK, I don't have this log anymore, but since the problem was reproduced in another log (which I kept), here is the grep: 2012-02-08 14:13:02,970 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: org.apache.hadoop.hbase.regionserver.LeaseException: lease '-6992210222685255354' does not exist
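J-D's advice is to grep for the lease's id across the region server logs. A Python equivalent, assuming plain-text log lines like the one quoted here; the function name is hypothetical. Matching the id as a whole number avoids false hits on longer ids that merely share a prefix, which a bare substring search would not.

```python
import re
from typing import Iterable, List


def grep_lease(log_lines: Iterable[str], lease_id: str) -> List[str]:
    """Return every log line that mentions lease_id as a whole number.

    Like `grep -- "<lease id>" <log>`, but the lookarounds stop "-123" from
    matching inside "-1234" or "9-123".
    """
    pattern = re.compile(r"(?<![\d-])" + re.escape(lease_id) + r"(?!\d)")
    return [line for line in log_lines if pattern.search(line)]


lines = [
    "2012-02-08 14:13:02,970 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: "
    "org.apache.hadoop.hbase.regionserver.LeaseException: lease '-6992210222685255354' does not exist",
    "some unrelated line",
    "lease '-69922102226852553541' (longer id, should not match)",
]
hits = grep_lease(lines, "-6992210222685255354")
```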

Re: LeaseException while extracting data via pig/hbase integration

2012-02-15 Thread Jean-Daniel Cryans
You would have to grep the lease's id, in your first email it was "-7220618182832784549". About the time it takes to process each row, I meant client (pig) side not in the RS. J-D On Tue, Feb 14, 2012 at 1:33 PM, Mikael Sitruk wrote: > Please see answer inline > Thanks > Mikael.S > > On Tue, Fe
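J-D's point about client-side (Pig) time per row can be made concrete. A scanner only calls the region server when its client-side row cache runs dry, so with `hbase.client.scanner.caching` set to N rows and the client spending t ms per row, the lease sits idle for roughly N × t ms between `next()` calls. If that gap exceeds the lease period (one minute by default, per J-D's earlier reply), the lease expires. A back-of-the-envelope sketch that ignores RPC and server-side time (an assumption); the function names and the 50% safety margin are hypothetical:

```python
def idle_gap_ms(caching: int, per_row_ms: float) -> float:
    """Approximate server-side idle time between two scanner next() RPCs."""
    return caching * per_row_ms


def lease_at_risk(caching: int, per_row_ms: float, lease_period_ms: float = 60_000) -> bool:
    """True if the client may not come back before the lease expires."""
    return idle_gap_ms(caching, per_row_ms) >= lease_period_ms


def max_safe_caching(per_row_ms: float, lease_period_ms: float = 60_000, safety: float = 0.5) -> int:
    """Largest caching value whose idle gap stays within safety * lease period."""
    return max(1, int(lease_period_ms * safety // per_row_ms))


risky = lease_at_risk(1000, 100)  # 1000 rows x 100 ms = 100 s idle
fine = lease_at_risk(100, 100)    # 100 rows x 100 ms = 10 s idle
suggested = max_safe_caching(100)
```

With 100 ms per row, caching 1000 rows leaves the lease idle for about 100 s, well past a 60 s period; staying within half the period allows at most 300 rows per fetch.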

Re: LeaseException while extracting data via pig/hbase integration

2012-02-14 Thread Mikael Sitruk
Please see answer inline Thanks Mikael.S On Tue, Feb 14, 2012 at 8:30 PM, Jean-Daniel Cryans wrote: > On Tue, Feb 14, 2012 at 2:01 AM, Mikael Sitruk > wrote: > > hi, > > Well no, i can't figure out what is the problem, but i saw that someone > > else had the same problem (see email: "LeaseExcept

Re: LeaseException while extracting data via pig/hbase integration

2012-02-14 Thread Jean-Daniel Cryans
On Tue, Feb 14, 2012 at 2:01 AM, Mikael Sitruk wrote: > hi, > Well no, i can't figure out what is the problem, but i saw that someone > else had the same problem (see email: "LeaseException despite high > hbase.regionserver.lease.period") > What can i tell is the following: > Last week the problem

Re: LeaseException while extracting data via pig/hbase integration

2012-02-14 Thread Mikael Sitruk
Hi, well, no, I can't figure out what the problem is, but I saw that someone else had the same problem (see email: "LeaseException despite high hbase.regionserver.lease.period"). What I can tell is the following: Last week the problem was consistent: 1. I updated hbase.regionserver.lease.period=30

Re: LeaseException while extracting data via pig/hbase integration

2012-02-13 Thread Jean-Daniel Cryans
Late answer, did you figure it out? This exception happens when you don't use your scanner lease for more than the lease time (default one minute). AFAIK that didn't change, so maybe something else got slow? Or maybe some special configurations you had didn't make it during the upgrade? J-D On M
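The mechanism J-D describes can be modeled as a table of leases that every `next()` call renews and that expires after one lease period without activity. This is a toy model with hypothetical names, not the region server's actual lease code (which expires leases from a background thread); the exception message only mimics the log line quoted in this thread.

```python
from typing import Callable, Dict


class LeaseException(Exception):
    """Raised when a scanner's lease has already expired."""


class ScannerLeases:
    """Toy model: each next() renews the lease; idle scanners lose it."""

    def __init__(self, period_s: float, clock: Callable[[], float]) -> None:
        self.period_s = period_s
        self.clock = clock  # injectable clock so the model is testable
        self._last_renewed: Dict[str, float] = {}

    def open(self, scanner_id: str) -> None:
        self._last_renewed[scanner_id] = self.clock()

    def next(self, scanner_id: str) -> None:
        self._expire()
        if scanner_id not in self._last_renewed:
            raise LeaseException(f"lease '{scanner_id}' does not exist")
        self._last_renewed[scanner_id] = self.clock()  # renew on every call

    def _expire(self) -> None:
        now = self.clock()
        expired = [s for s, t in self._last_renewed.items() if now - t > self.period_s]
        for s in expired:
            del self._last_renewed[s]


now = [0.0]
leases = ScannerLeases(60.0, lambda: now[0])
leases.open("s1")
now[0] = 30.0
leases.next("s1")  # 30 s idle: lease renewed
now[0] = 100.0     # 70 s idle since the last renewal: lease gone
```

The final call to `leases.next("s1")` at t = 100 s raises `LeaseException`, because the client took longer than the lease period to come back.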

LeaseException while extracting data via pig/hbase integration

2012-02-06 Thread Mikael Sitruk
Hi all, recently I upgraded my cluster from HBase 0.90.1 to 0.90.4 (using Cloudera, from cdh3u0 to cdh3u2). Everything was OK until I ran a Pig extract on the new cluster; from the old cluster everything worked well. Now each time I run the extract in conjunction with other work performed on the clus