Re: Unable to access Map within a tuple.

2011-03-17 Thread Daniel Dai
Hi Deepak, can you be more specific? I ran a few simple tests and cannot reproduce this. What is your query? Are you using a UDF? Daniel On 03/16/2011 11:24 PM, deepak kumar v wrote: Hi, Below is the list of tuples generated after flattening a bag. (day, age, name, address, ['k1#v1','k2#v2']), (12/2,22,deepak,newy

Re: pig's alike projects

2011-03-17 Thread Charles Gonçalves
Yes indeed. But I will just spot one drawback and stick with it. Anyway, since Pig is not the main theme of my MSc, I can just say 'Because I wanted to!' And Baraa, good luck to you too! On Thu, Mar 17, 2011 at 11:22 PM, Baraa Mohamad < baraa.issa.moha...@gmail.com> wrote: > > hehehe you are right Dmit

Re: pig's alike projects

2011-03-17 Thread Baraa Mohamad
hehehe you are right, Dmitriy Ryaboy, this is an essential part of my PhD thesis and I'm still searching :) Good Luck Baraa On Fri, Mar 18, 2011 at 3:11 AM, Dmitriy Ryaboy wrote: > That sounds like a Master's Thesis in itself :) > > > On Thu, Mar 17, 2011 at 7:06 PM, Charles Gonçalves > wrote: >

Re: pig's alike projects

2011-03-17 Thread Dmitriy Ryaboy
That sounds like a Master's Thesis in itself :) On Thu, Mar 17, 2011 at 7:06 PM, Charles Gonçalves wrote: > Hi Guys, > > Thank you. Since I'm using Pig in my MSc, I just want to be aware of > possible > 'competitors' in case someone asks why I chose Pig and not X. > > On Thu, Mar 17, 2011 at 11

Re: pig's alike projects

2011-03-17 Thread Charles Gonçalves
Hi Guys, Thank you. Since I'm using Pig in my MSc, I just want to be aware of possible 'competitors' in case someone asks why I chose Pig and not X. On Thu, Mar 17, 2011 at 11:04 PM, Baraa Mohamad < baraa.issa.moha...@gmail.com> wrote: > Hi > Yes you have > Hive developed by Facebook, > SCOPE

Re: pig's alike projects

2011-03-17 Thread Baraa Mohamad
Hi, Yes, you have Hive developed by Facebook, SCOPE and DryadLINQ by Microsoft, and Jaql by IBM. I also heard about something called ASTERIX/AQL, but I don't know anything about it. Regards Baraa On Fri, Mar 18, 2011 at 2:19 AM, Charles Gonçalves wrote: > Hi Guys, > > I read the sawzall

Re: pig's alike projects

2011-03-17 Thread Dmitriy Ryaboy
There's an IBM research project called JAQL; DryadLINQ can also be considered somewhat similar. MS also has "SCOPE". Nathan Marz has been pushing his Cascalog library, which builds on top of the Cascading library by Chris Wensel. Hive, of course, is also similar (but different). On Thu, Mar 17, 2011

Re: pig's alike projects

2011-03-17 Thread Baraa Mohamad
Hi, Yes, you have Hive developed by Facebook, SCOPE and DryadLINQ by Microsoft, and Jaql by IBM. I also heard about something called ASTERIX/AQL, but I don't know anything about it. Regards On Fri, Mar 18, 2011 at 2:19 AM, Charles Gonçalves wrote: > Hi Guys, > > I read the sawzall

Re: Schema?

2011-03-17 Thread Daniel Dai
In 0.9, you can use the syntax: m:[{(c:chararray, m1:[chararray])}] Daniel On 03/17/2011 09:18 AM, Alan Gates wrote: Currently there is no way to specify the schema for values in the map up front. You have to cast them when you bring them out of the map. We hope to resolve that in 0.9. Alan.
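A minimal sketch of the casting approach Alan describes, assuming a hypothetical input file 'data' with a single map column m and a hypothetical key 'name' that holds a string. The file name, key, and aliases are illustrative only and do not come from the thread. The 0.9 schema syntax from Daniel's reply is repeated as a comment rather than as a statement, since where exactly it is accepted depends on the Pig version.

```pig
-- Hypothetical input: one untyped map column per record.
a = LOAD 'data' AS (m:map[]);

-- Values come out of an untyped map as bytearray,
-- so cast each value when it is looked up.
b = FOREACH a GENERATE (chararray)m#'name' AS name;

-- Schema syntax quoted in the reply above for 0.9 (not run here):
--   m:[{(c:chararray, m1:[chararray])}]
DUMP b;
```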

pig's alike projects

2011-03-17 Thread Charles Gonçalves
Hi Guys, I read the Sawzall paper today and wonder whether there are any other systems like Pig and Sawzall. Does anyone know of other projects? Thanks -- *Charles Ferreira Gonçalves * http://homepages.dcc.ufmg.br/~charles/ UFMG - ICEx - Dcc Cel.: 55 31 87

Re: reducer throttling?

2011-03-17 Thread Alex Rovner
Dexin, You can control the number of reducers by adding the following to your Pig script: SET default_parallel 29; With the above statement, Pig will run with 29 reducers. As far as the bulk insert goes: We are using MS-SQL as our database, but MySQL would be able to handle the bulk insert the
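As a rough sketch of the setting mentioned above, the script below sets a script-wide default and then overrides it for one operator with the PARALLEL clause. The paths, aliases, and field names are hypothetical and only for illustration. The number of reducers in the final job determines how many tasks write output concurrently.

```pig
-- Default number of reducers for every reduce-side operator in this script.
SET default_parallel 29;

-- Hypothetical input with a key and a numeric value.
a = LOAD 'input' AS (k:chararray, v:int);

-- PARALLEL overrides the default for this one operator only.
g = GROUP a BY k PARALLEL 10;
s = FOREACH g GENERATE group, SUM(a.v);

STORE s INTO 'output';
```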

Re: reducer throttling?

2011-03-17 Thread Dexin Wang
Can you describe your bulk insert technique a bit more? And do you also control the number of reducers by adding an artificial ORDER or GROUP step? Thanks! On Thu, Mar 17, 2011 at 1:33 PM, Alex Rovner wrote: > We use bulk insert technique after the job completes. You can control the

Re: exec doesnot stop multiquery optimizer

2011-03-17 Thread Richard Ding
Hi Shawn, Inserting "exec" in a script forces the Pig statements before "exec" to run. But if there is no store before the "exec" statement, it becomes a no-op. If you want to disable multiquery with "exec" statement, just add "exec" after each store. This way Pig will independently execute eac
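The pattern described above might look like the following sketch, where each store is followed by its own exec so that the statements before it run as a separate batch. The paths and aliases are hypothetical.

```pig
-- First batch: runs when the exec below is reached.
a = LOAD 'input1' AS (x:int);
STORE a INTO 'out1';
exec;

-- Second batch: planned and executed independently of the first.
b = LOAD 'input2' AS (y:int);
STORE b INTO 'out2';
exec;
```

Note that, per the message above, an exec with no store before it does nothing, so this only separates work that ends in a store.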

Re: reducer throttling?

2011-03-17 Thread Alex Rovner
We use a bulk insert technique after the job completes. You can control the size of each bulk insert by controlling the number of reducers. Sent from my iPhone On Mar 17, 2011, at 2:03 PM, Dexin Wang wrote: > We do some processing in hadoop then as the last step, we write the result > to dat

Re: exec doesnot stop multiquery optimizer

2011-03-17 Thread Xiaomeng Wan
Hi Richard, In my case, there is no store or dump involved. My code is like: a = ... b = ... c = ... EXEC; d = ... e = ... ... By adding the "EXEC;", I am trying to stop the multiquery optimizer from combining all of a, b, c, d, e into a single MR plan. In other words, do a, b, c in one MR and d, e in anothe

Re: question about Pig UDF

2011-03-17 Thread Alan Gates
Yes. Are you looking for a way to share info across multiple instances of the UDF? You can share static information via UDFContext. If you want to share state while running that is very difficult and not recommended (since there is no guarantee that your various map or reduce instances

Re: question about Pig UDF

2011-03-17 Thread souri datta
So even if I make the list static, it will be created multiple times, since each instance will be created in a different machine's JVM. Is that correct? On Thu, Mar 17, 2011 at 11:59 PM, Alan Gates wrote: > It will be instantiated multiple times; once for each map or reduce > (depending on which it

Re: com.twitter.elephantbird.mapreduce.input.LzoThriftB64LineInputFormat is not set

2011-03-17 Thread Torben Brodt
The exception still says that the configuration for the classname is not set. I added a "print" to ThriftUtils:setClassConf (I have not set up Pig in Eclipse yet). The variable is set to my Thrift class, but it cannot be accessed later. It seems to be lost somewhere; what could be the reason? INFO org.apach

Re: question about Pig UDF

2011-03-17 Thread Alan Gates
It will be instantiated multiple times; once for each map or reduce (depending on which it is in). Pig itself also constructs your UDF during planning on the machine you launch your job on. Alan. On Mar 17, 2011, at 11:12 AM, souri datta wrote: Hi, If in a UDF, say in the constructo

Re: exec doesnot stop multiquery optimizer

2011-03-17 Thread Richard Ding
What you want is the 'run' command. With the run command, every store triggers execution. Thanks -- Richard On 3/17/11 10:48 AM, "Xiaomeng Wan" wrote: Hi, I tried to use exec to stop the multiquery optimizer from combining too many actions together, which results in a heap space problem. But i

question about Pig UDF

2011-03-17 Thread souri datta
Hi, Suppose that in a UDF, say in the constructor of the class, I initialize a list (say ArrayList namesList) of objects (say names), and in the exec() method I do some processing. When I use this UDF in a 20-node Hadoop cluster, will this list 'namesList' be instantiated multiple times or will

reducer throttling?

2011-03-17 Thread Dexin Wang
We do some processing in Hadoop and then, as the last step, write the result to a database. The database is not good at handling hundreds of concurrent connections and fast writes, so we need to throttle the number of tasks that write to the DB. Since we have no control over the number of mappers, we add a

exec doesnot stop multiquery optimizer

2011-03-17 Thread Xiaomeng Wan
Hi, I tried to use exec to stop the multiquery optimizer from combining too many actions together, which results in a heap space problem. But it seems multiquery just ignores exec and still combines the actions before and after exec. Disabling multiquery works for me; just wondering whether it is

Re: Schema?

2011-03-17 Thread Alan Gates
Currently there is no way to specify the schema for values in the map up front. You have to cast them when you bring them out of the map. We hope to resolve that in 0.9. Alan. On Mar 17, 2011, at 2:11 AM, deepak kumar v wrote: I have a UDF whose output is a tuple of the following format

Schema?

2011-03-17 Thread deepak kumar v
I have a UDF whose output is a tuple of the following format: ( [ 'Key'#{ ( chararray,[ 'key', chararray]) }] ) I am able to specify the output schema for the outer tuple and the inner map. I need to specify the schema for the key and ValueBag within the map, and the schema for the tuples within ValueBag. And items wit