Folks on the list need some time, mate. I have posted a couple of links in your other thread. Check them out and see if they help.

Best Regards,
Tariq
+91-9741563634
https://mtariq.jux.com/

On Tue, Dec 25, 2012 at 11:09 AM, Kshiva Kps <[email protected]> wrote:

> Hi,
>
> Are there any Pig editors where we can write 100 to 150 Pig scripts?
> I believe that is not possible to do in CLI mode.
> Something like an IDE for Java or TOAD for SQL. Please advise, many thanks.
>
> Thanks
>
>
> On Tue, Dec 25, 2012 at 3:45 AM, Cheolsoo Park <[email protected]> wrote:
>
> > Hi Thomas,
> >
> > If I understand your question correctly, what you want is to reduce the
> > number of mappers that spawn streaming processes. The default-parallel
> > controls the number of reducers, so it won't have any effect on the
> > number of mappers. Although the number of mappers is auto-determined by
> > the size of input data, you can try to set "pig.maxCombinedSplitSize" to
> > combine input files into bigger ones. For more details, please refer to:
> > http://pig.apache.org/docs/r0.10.0/perf.html#combine-files
> >
> > You can also read a discussion on a similar topic here:
> >
> > http://search-hadoop.com/m/J5hCw1UdxTa/How+can+I+set+the+mapper+number&subj=How+can+I+set+the+mapper+number+for+pig+script+
> >
> > Thanks,
> > Cheolsoo
> >
> >
> > On Tue, Dec 18, 2012 at 12:00 PM, Thomas Bach <[email protected]> wrote:
> >
> > > Hi,
> > >
> > > I have around 4 million time series. ~1000 of them had a special
> > > occurrence at some point. Now, I want to draw 10 samples for each
> > > special time-series based on a similarity comparison.
> > >
> > > What I have currently implemented is a script in Python which consumes
> > > time-series one-by-one and does a comparison with all 1000 special
> > > time-series. If the similarity is sufficient with one of them I pass
> > > it back to Pig and strike out the corresponding special time-series;
> > > subsequent time-series will not be compared against this one.
> > >
> > > This routine runs, but it takes around 6 hours.
> > >
> > > One of the problems I'm facing is that Pig starts >160 scripts
> > > although 10 would be sufficient. Is there some way to define the
> > > number of scripts Pig starts in a `STREAM THROUGH` step? I tried to
> > > set default_parallel to 10, but it doesn't seem to have any effect.
> > >
> > > I'm also open to any other ideas on how to accomplish the task.
> > >
> > > Regards,
> > > Thomas Bach.
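
As a rough illustration of Cheolsoo's suggestion quoted above: if the goal is about 10 streaming processes, one starting point is to divide the total input size by 10 and try that as "pig.maxCombinedSplitSize". This is only a sketch, not something posted in the thread. The helper names and the 20 GiB input size are hypothetical, and the real number of mappers also depends on how the input files and blocks are laid out, so the result is a value to experiment with rather than a guarantee.

```python
def combined_split_size(total_input_bytes: int, target_mappers: int) -> int:
    """Estimate a byte value to try for pig.maxCombinedSplitSize.

    Assumption: Pig packs small input splits into combined splits of at
    most this size, so roughly total / size mappers get started.
    """
    if total_input_bytes <= 0 or target_mappers <= 0:
        raise ValueError("input size and mapper count must be positive")
    # Integer ceiling division, so the splits cover the whole input.
    return (total_input_bytes + target_mappers - 1) // target_mappers


def pig_set_statement(split_size_bytes: int) -> str:
    """Format the SET line to place at the top of the Pig script."""
    return f"SET pig.maxCombinedSplitSize {split_size_bytes};"


if __name__ == "__main__":
    total = 20 * 1024**3  # hypothetical: 20 GiB of time-series input
    print(pig_set_statement(combined_split_size(total, 10)))
```

The printed line would go at the top of the Pig script, before the `STREAM THROUGH` step. default_parallel is left out on purpose, since per the thread it only controls the number of reducers.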
