RE: How to find input file associated with failed map task?

2011-02-16 Thread Vivek Padmanabhan
This is a common concern. A MapReduce JIRA has been filed for it (https://issues.apache.org/jira/browse/MAPREDUCE-2076). One way I find which inputs went to a map task is as follows: a) get the input split locations from the task log; b) go to the location and, from the datanode logs, grep for the a

Re: FLATTEN custom bags

2011-02-16 Thread Daniel Dai
Hi Aniket, Does myLoader implement LoadMetadata? If it does, what schema does it return? I suspect that your schema for the bag does not set the twolevelaccess flag (though we are working to drop it in 0.9). Daniel Aniket Mokashi wrote: Hi, I have a custom loader that creates and returns a tuple of i

FLATTEN custom bags

2011-02-16 Thread Aniket Mokashi
Hi, I have a custom loader that creates and returns a tuple of an id and bags. I want to open these bags and get their contents. For example: data = load 'loc' using myLoader() as (id, bag1, bag2); bag1Content = foreach data generate FLATTEN(bag1); This works. But when I do bag1Content = foreach data g
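To make the FLATTEN behaviour concrete, here is a plain-Python model (not Pig code) of what `foreach data generate FLATTEN(bag1)` does to a relation of (id, bag1, bag2) tuples. The function names and sample data are hypothetical; the Pig rule it encodes is that each tuple inside the bag becomes its own output row, and a row whose bag is empty produces no output at all.

```python
# Plain-Python model of Pig's FLATTEN on a bag column (illustration only, not Pig code).
# A relation is modelled as a list of tuples; a bag as a list of tuples.
from typing import Any, List, Tuple

Row = Tuple[Any, ...]


def flatten_bag(relation: List[Row], bag_index: int) -> List[Row]:
    """Model `foreach data generate FLATTEN(bagN)`.

    Every tuple inside the bag becomes a separate output row. A row whose
    bag is empty yields nothing, mirroring Pig dropping such records.
    """
    out: List[Row] = []
    for row in relation:
        for inner in row[bag_index]:
            out.append(tuple(inner))
    return out


def flatten_with_key(relation: List[Row], key_index: int, bag_index: int) -> List[Row]:
    """Model `foreach data generate id, FLATTEN(bagN)`: the key is repeated
    once per inner tuple (a cross product of the key with the bag)."""
    out: List[Row] = []
    for row in relation:
        for inner in row[bag_index]:
            out.append((row[key_index],) + tuple(inner))
    return out


if __name__ == "__main__":
    # Hypothetical loader output: (id, bag1, bag2)
    data = [
        (1, [("a",), ("b",)], [("x", 10)]),
        (2, [], [("y", 20), ("z", 30)]),
    ]
    print(flatten_bag(data, 1))          # [('a',), ('b',)]
    print(flatten_with_key(data, 0, 2))  # [(1, 'x', 10), (2, 'y', 20), (2, 'z', 30)]
```

Note that id 2 disappears from the first result because its bag1 is empty.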

Problems with union, projection producing unexpected results

2011-02-16 Thread James Kebinger
Hello all, I've been scratching my head over a problem with a pig script I'm having, and hoping another set of eyeballs will help. I'm using Pig 0.8 in local mode. Here's my simplified use case: I have a log file with events on pages, and the id of the event can be a user's login or a user's numeri

Re: Multi column Left join in Pig

2011-02-16 Thread sonia gehlot
Ok thanks On Wed, Feb 16, 2011 at 3:35 PM, Ramesh, Amit wrote: > > No, joins are only possible on fields common to all the aliases in the > join. > > > On 2/16/11 2:56 PM, "sonia gehlot" wrote: > > > Thank you very much Amit. > > > > One more question in the same way if I want to join multiple

Re: Multi column Left join in Pig

2011-02-16 Thread Ramesh, Amit
No, joins are only possible on fields common to all the aliases in the join. On 2/16/11 2:56 PM, "sonia gehlot" wrote: > Thank you very much Amit. > > One more question in the same way if I want to join multiple tables > >> select blah, blah >> From >> page_events pe > Left Join referrer r
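Because one JOIN statement needs the same key for every input, the three-way query can be expressed as two successive two-way LEFT OUTER joins, each with its own key; the Pig Latin reference likewise says outer joins only work as two-way joins. The sketch below is a plain-Python model of that chaining, not Pig code: `left_outer_join`, the `ref::`/`pe_pre::` prefixes and the sample rows are hypothetical, and the second key uses only the two conditions (day, session_id) visible in the truncated message.

```python
# Plain-Python model (not Pig code): a multi-way LEFT OUTER join done as two
# successive two-way joins, each on a different key.
from collections import defaultdict
from typing import Callable, Dict, Hashable, List

Row = Dict[str, object]


def left_outer_join(
    left: List[Row],
    right: List[Row],
    left_key: Callable[[Row], Hashable],
    right_key: Callable[[Row], Hashable],
    right_prefix: str,
    right_fields: List[str],
) -> List[Row]:
    """Keep every left row; attach matching right rows, or None if none match.

    Right-hand fields are renamed with `right_prefix` (loosely like Pig's
    `alias::field` naming) so the two sides never collide. Keys are assumed
    to be non-null.
    """
    index: Dict[Hashable, List[Row]] = defaultdict(list)
    for right_row in right:
        index[right_key(right_row)].append(right_row)

    out: List[Row] = []
    for left_row in left:
        matches = index.get(left_key(left_row), [])
        if not matches:
            # Unmatched: emit the left row once, with null right-hand fields.
            out.append({**left_row, **{right_prefix + f: None for f in right_fields}})
        for right_row in matches:
            out.append({**left_row, **{right_prefix + f: right_row[f] for f in right_fields}})
    return out


if __name__ == "__main__":
    # Hypothetical sample rows; field names follow the thread, values are made up.
    page_events = [
        {"id": 1, "day": "2011-02-16", "session_id": "s1"},
        {"id": 2, "day": "2011-02-16", "session_id": "s2"},
    ]
    referrer = [{"id": 1, "url": "http://example.com/"}]
    page_events_pre = [{"day": "2011-02-16", "session_id": "s1", "page_seq_num": 1}]

    # Step 1: join on id only.
    pe_ref = left_outer_join(page_events, referrer,
                             lambda r: r["id"], lambda r: r["id"], "ref::", ["url"])
    # Step 2: join that result on a different, multi-column key.
    result = left_outer_join(pe_ref, page_events_pre,
                             lambda r: (r["day"], r["session_id"]),
                             lambda r: (r["day"], r["session_id"]),
                             "pe_pre::", ["page_seq_num"])
    for row in result:
        print(row)
```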

Re: Multi column Left join in Pig

2011-02-16 Thread sonia gehlot
Thank you very much Amit. One more question along the same lines: what if I want to join multiple tables? > select blah, blah > From > page_events pe Left Join referrer ref on ref.id = pe.id > Left Join page_events pe_pre > on pe.day = pe_pre.day > And pe.session_id = pe_pre.session_id > And pe.pag

Re: Multi column Left join in Pig

2011-02-16 Thread Ramesh, Amit
You can just do: join_pe_pre = JOIN page_events BY (day, session_id, page_seq_num) LEFT OUTER, page_events_pre BY (day, session_id, page_seq_num + 1); Amit On 2/16/11 2:09 PM, "sonia gehlot" wrote: > Hi All, > > I am new to Hadoop and I started exploring Pig since last month. I have few > q
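Amit's composite key pairs each page event with the event whose page_seq_num is one lower in the same day and session, and LEFT OUTER keeps events that have no predecessor. A plain-Python model of that join (not Pig code; `join_with_previous` and the sample events are hypothetical) makes the matching rule visible:

```python
# Plain-Python model (not Pig code) of:
#   JOIN page_events BY (day, session_id, page_seq_num) LEFT OUTER,
#        page_events_pre BY (day, session_id, page_seq_num + 1);
# Each event is paired with the event one step earlier in the same session.
from typing import Dict, List, Optional, Tuple

Event = Tuple[str, str, int]  # (day, session_id, page_seq_num)


def join_with_previous(
    events: List[Event], events_pre: List[Event]
) -> List[Tuple[Event, Optional[Event]]]:
    # Index the right-hand side by its computed key: page_seq_num + 1.
    by_key: Dict[Event, List[Event]] = {}
    for day, session, seq in events_pre:
        by_key.setdefault((day, session, seq + 1), []).append((day, session, seq))

    out: List[Tuple[Event, Optional[Event]]] = []
    for ev in events:
        matches = by_key.get(ev)  # the left key is the (day, session, seq) tuple itself
        if matches:
            out.extend((ev, m) for m in matches)
        else:
            out.append((ev, None))  # LEFT OUTER: keep unmatched events with a null side
    return out


if __name__ == "__main__":
    events = [("2011-02-16", "s1", 1), ("2011-02-16", "s1", 2), ("2011-02-16", "s2", 1)]
    # Both sides hold the same events here; page_events_pre in the thread
    # presumably refers to the same data under a second alias.
    for pair in join_with_previous(events, events):
        print(pair)
```

Only the second event finds a partner; the first event of each session comes back paired with None.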

Multi column Left join in Pig

2011-02-16 Thread sonia gehlot
Hi All, I am new to Hadoop and have been exploring Pig since last month. I have a few questions. I have to replicate some SQL queries in Pig that use a left join, for example: select blah, blah From page_events pe Left Join page_events pe_pre on pe.day = pe_pre.day And pe.session_id = pe_pre.session_id An

Re: Pig 0.8: DESCRIBE and DUMP are in disagreement after a GROUP BY and a FLATTEN

2011-02-16 Thread Ramesh, Amit
Thanks for the info, guys! Will look into using a recent snapshot. Thanks! Amit On 2/16/11 11:46 AM, "Daniel Dai" wrote: > Yes, it is fixed by PIG-998. Doing a describe on trunk will get: > > data: {f0: chararray,b1::t1: (f1: chararray,f2: int),b3: {(f3: chararray)}} > > Daniel > > Alan Ga

Re: Pig 0.8: DESCRIBE and DUMP are in disagreement after a GROUP BY and a FLATTEN

2011-02-16 Thread Daniel Dai
Yes, it is fixed by PIG-998. Doing a describe on trunk will get: data: {f0: chararray,b1::t1: (f1: chararray,f2: int),b3: {(f3: chararray)}} Daniel Alan Gates wrote: The issue here is that describe is incorrectly removing the second level of tuple, even though dump is doing the right thing.

Re: LzoTokenizedStorage working, but now I can't get the data back out with LzoTokenizedLoader

2011-02-16 Thread Dmitriy Ryaboy
"no codec" means you didn't get LZO set up right on the cluster. There are instructions on the wiki of the googlecode project for hadoop lzo. D On Wed, Feb 16, 2011 at 11:05 AM, Kris Coward wrote: > > After a bunch of fiddling around (including some pretty heavy use of the > secretDebugCmd--than

LzoTokenizedStorage working, but now I can't get the data back out with LzoTokenizedLoader

2011-02-16 Thread Kris Coward
After a bunch of fiddling around (including some pretty heavy use of the secretDebugCmd--thanks), I finally got the LzoTokenizedStorage working, but now I'm having problems with the LzoTokenizedLoader. I'm still using pig 0.8.0-CDH3B4-SNAPSHOT, and for storage have only had luck with
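Since the reply above traces the "no codec" error to LZO setup on the cluster, one quick check on the configuration side is whether the hadoop-lzo codec classes are registered in `io.compression.codecs`. The sketch below is a hypothetical helper, not part of Hadoop or hadoop-lzo; the class names `com.hadoop.compression.lzo.LzoCodec` and `com.hadoop.compression.lzo.LzopCodec` are the ones the hadoop-lzo project documents, and its wiki remains the authority. A clean result does not prove the native LZO libraries are installed on every node.

```python
# Hypothetical diagnostic: report which hadoop-lzo codec classes are missing
# from io.compression.codecs in a Hadoop configuration file (e.g. core-site.xml).
# It does NOT check that the native LZO libraries are installed.
import xml.etree.ElementTree as ET
from typing import Dict, List

# Class names as published by the hadoop-lzo project.
LZO_CODECS = [
    "com.hadoop.compression.lzo.LzoCodec",
    "com.hadoop.compression.lzo.LzopCodec",
]


def read_properties(xml_text: str) -> Dict[str, str]:
    """Parse <configuration><property><name/><value/></property>... into a dict."""
    root = ET.fromstring(xml_text)
    props: Dict[str, str] = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value", default="")
        if name:
            props[name.strip()] = value.strip()
    return props


def missing_lzo_codecs(xml_text: str) -> List[str]:
    """Return the LZO codec classes absent from io.compression.codecs."""
    codecs = read_properties(xml_text).get("io.compression.codecs", "")
    registered = {c.strip() for c in codecs.split(",") if c.strip()}
    return [c for c in LZO_CODECS if c not in registered]


if __name__ == "__main__":
    sample = """<configuration>
      <property>
        <name>io.compression.codecs</name>
        <value>org.apache.hadoop.io.compress.GzipCodec,
               org.apache.hadoop.io.compress.DefaultCodec</value>
      </property>
    </configuration>"""
    print(missing_lzo_codecs(sample))  # both LZO classes are reported missing
```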

How to find input file associated with failed map task?

2011-02-16 Thread Kester, Scott
This may be better asked on one of the other Hadoop lists, but as the job in question is done with Pig, I thought I would start here. I have a nightly job that runs against around 1000 gzip log files. Around once a week one of the map tasks will fail, reporting some form of gzip error/corruption
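A different way to narrow down the bad input, outside of Hadoop, is to test every file's gzip integrity directly, similar in spirit to `gzip -t`. This is not the task-log approach from the reply above; it is a Python sketch that assumes the logs have been copied to a local directory (for example with `hadoop fs -get`). `check_gzip` and `find_corrupt` are hypothetical names.

```python
# Sketch: find which .gz files in a local directory are corrupt by fully
# decompressing each one. Assumes the logs were first copied locally.
import gzip
import sys
import zlib
from pathlib import Path
from typing import List, Tuple


def check_gzip(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return "" if the file decompresses cleanly, else a short error message."""
    try:
        with gzip.open(path, "rb") as fh:
            # Read to EOF so the CRC/length trailer is verified as well.
            while fh.read(chunk_size):
                pass
    except (OSError, EOFError, zlib.error) as exc:
        # OSError covers gzip.BadGzipFile (bad magic, CRC mismatch);
        # EOFError covers truncated files; zlib.error covers corrupt deflate data.
        return f"{type(exc).__name__}: {exc}"
    return ""


def find_corrupt(directory: Path) -> List[Tuple[Path, str]]:
    """Check every *.gz file in `directory` and list the ones that fail."""
    bad: List[Tuple[Path, str]] = []
    for path in sorted(directory.glob("*.gz")):
        err = check_gzip(path)
        if err:
            bad.append((path, err))
    return bad


if __name__ == "__main__":
    target = Path(sys.argv[1]) if len(sys.argv) > 1 else Path(".")
    for path, err in find_corrupt(target):
        print(f"{path}: {err}")
```

Decompressing to the end matters: a file with a damaged tail can pass a header check yet still fail inside a map task.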