Re: Limit number of Streaming Programs

Prasanth J Tue, 25 Dec 2012 04:47:44 -0800

Hi Kshiva

There are several Pig Latin plugins for different IDEs and editors. Check out
https://cwiki.apache.org/PIG/pigtools.html


Thanks
-- Prasanth

On Dec 25, 2012, at 11:09 AM, Kshiva Kps <[email protected]> wrote:

> Hi,
> 
> Is there a Pig editor where we can write 100 to 150 Pig scripts?
> I believe this is not practical in CLI mode.
> Something like an IDE for Java or TOAD for SQL. Please advise, many thanks.
> 
> Thanks
> 
> 
> On Tue, Dec 25, 2012 at 3:45 AM, Cheolsoo Park <[email protected]> wrote:
> 
>> Hi Thomas,
>> 
>> If I understand your question correctly, what you want is to reduce the number
>> of mappers that spawn streaming processes. default_parallel controls
>> the number of reducers, so it won't have any effect on the number of
>> mappers. Although the number of mappers is determined automatically by the size
>> of the input data, you can try setting "pig.maxCombinedSplitSize" to combine
>> input files into bigger ones. For more details, please refer to:
>> http://pig.apache.org/docs/r0.10.0/perf.html#combine-files
>> 
>> You can also read a discussion on a similar topic here:
>> 
>> http://search-hadoop.com/m/J5hCw1UdxTa/How+can+I+set+the+mapper+number&subj=How+can+I+set+the+mapper+number+for+pig+script+
>> 
>> Thanks,
>> Cheolsoo
>> 
>> 
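A rough way to see why combining splits helps: in a map-only `STREAM THROUGH` step, each map task runs its own copy of the streaming script, so fewer, larger combined splits mean fewer script instances. The Python sketch below is a simplified model under stated assumptions, not Pig's actual split-combination code (which also takes data locality into account). `combine_splits` and `split_size_for_target` are hypothetical helpers, and the input of 160 splits of 64 MiB each is an assumed example.

```python
import math
from typing import List


def combine_splits(split_sizes: List[int], max_combined: int) -> List[List[int]]:
    """Greedily pack input splits (sizes in bytes) into combined splits of at
    most max_combined bytes. Simplified model: ignores data locality."""
    combined: List[List[int]] = []
    current: List[int] = []
    current_size = 0
    for size in split_sizes:
        # Close the current combined split if this split would overflow it.
        if current and current_size + size > max_combined:
            combined.append(current)
            current, current_size = [], 0
        current.append(size)
        current_size += size
    if current:
        combined.append(current)
    return combined


def split_size_for_target(total_bytes: int, target_mappers: int) -> int:
    """Cap size that would yield roughly target_mappers combined splits."""
    return math.ceil(total_bytes / target_mappers)


if __name__ == "__main__":
    MiB = 1024 * 1024
    splits = [64 * MiB] * 160  # hypothetical: 160 splits of 64 MiB each
    cap = split_size_for_target(sum(splits), 10)
    print(cap // MiB)                          # 1024
    print(len(combine_splits(splits, cap)))    # 10 map tasks instead of 160
```

Under these assumptions a cap of about 1 GiB gives roughly 10 map tasks. In a Pig script the value is given in bytes, e.g. `SET pig.maxCombinedSplitSize 1073741824;`, although the real task count also depends on how the input files are laid out.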
>> On Tue, Dec 18, 2012 at 12:00 PM, Thomas Bach
>> <[email protected]> wrote:
>> 
>>> Hi,
>>> 
>>> I have around 4 million time series. ~1000 of them had a special
>>> occurrence at some point. Now, I want to draw 10 samples for each
>>> special time-series based on a similarity comparison.
>>> 
>>> What I have currently implemented is a Python script that consumes
>>> time-series one by one and compares each of them with all 1000 special
>>> time-series. If the similarity with one of them is sufficient, I pass
>>> it back to Pig and strike out the corresponding special time-series, so
>>> subsequent time-series will not be compared against it.
>>> 
>>> This routine works, but it takes around 6 hours.
>>> 
>>> One of the problems I'm facing is that Pig starts more than 160 instances
>>> of the script although 10 would be sufficient. Is there some way to define
>>> the number of scripts Pig starts in a `STREAM THROUGH` step? I tried
>>> setting default_parallel to 10, but it doesn't seem to have any effect.
>>> 
>>> I'm also open to any other ideas on how to accomplish the task.
>>> 
>>> Regards,
>>>        Thomas Bach.
>>> 
>>
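The streaming step Thomas describes can be sketched as a small Python filter for Pig's `STREAM THROUGH`. Pig's default streaming serialization writes each tuple to the script's stdin as a tab-delimited line and reads tab-delimited lines back from stdout, which is what this sketch relies on. Everything else is an assumption: the input line format (a series id, a tab, then comma-separated values), Pearson correlation as the similarity measure, the 0.9 threshold, and retiring a special series once it has 10 samples (reading "draw 10 samples" as a per-series quota). All function names are hypothetical.

```python
import math
import sys
from typing import Dict, Iterable, Iterator, List, Tuple


# Hypothetical input format: "<series_id>\t<v1>,<v2>,...", one series per line.
def parse_line(line: str) -> Tuple[str, List[float]]:
    series_id, _, csv = line.rstrip("\n").partition("\t")
    return series_id, [float(v) for v in csv.split(",") if v]


def pearson(a: List[float], b: List[float]) -> float:
    """Pearson correlation; stands in for the (unknown) similarity measure."""
    if len(a) != len(b) or len(a) < 2:
        return 0.0  # treat series of different length as dissimilar
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0  # constant series: correlation is undefined
    return cov / math.sqrt(va * vb)


def sample_matches(lines: Iterable[str], specials: Dict[str, List[float]],
                   threshold: float = 0.9, quota: int = 10) -> Iterator[str]:
    """Yield "<special_id> TAB <series_id>" for each accepted sample and strike
    out a special series once it has collected `quota` samples."""
    remaining = dict(specials)  # copy, so the caller's dict is untouched
    counts = {sid: 0 for sid in remaining}
    for line in lines:
        # Keep draining the input even when nothing is left to match,
        # so Pig can finish writing to this process's stdin.
        if not remaining or not line.strip():
            continue
        series_id, values = parse_line(line)
        for special_id, special_values in list(remaining.items()):
            if pearson(values, special_values) >= threshold:
                counts[special_id] += 1
                yield f"{special_id}\t{series_id}"
                if counts[special_id] >= quota:
                    del remaining[special_id]  # strike out this special series
                break  # assign each series to at most one special series


def load_specials(path: str) -> Dict[str, List[float]]:
    with open(path) as f:
        return dict(parse_line(line) for line in f if line.strip())


def main(special_path: str) -> None:
    specials = load_specials(special_path)
    for out in sample_matches(sys.stdin, specials):
        sys.stdout.write(out + "\n")


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

The path of the special-series file is taken as the first command-line argument, so it could be passed in the command string of a Pig `DEFINE` and shipped to the task nodes with `SHIP`. Each streaming process keeps its own copy of `remaining`, so a special series struck out in one process is still compared in the others; with many parallel processes the quota is only enforced per process.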
