Thanks everyone for helping me out. I figured it was one of those logical
errors which lead to infinite loops. Actually, the indexOf operation doesn't
always return -1 on failure, which was causing this to get into an infinite
loop (I should have thought about this). (i.e. indexOf('[', 187) would
return 187 a
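
A minimal sketch of a loop that cannot get stuck this way, assuming the UDF counts bracketed tokens as described elsewhere in this thread. Java's String.indexOf(ch, fromIndex) includes fromIndex itself in the search, so the start index has to be moved past each match before the next call. The class and method names (BracketCounter, countBracketed) are hypothetical and not taken from the actual UDF.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the parsing part of the UDF; not the original code.
class BracketCounter {

    // Counts each token found between '[' and ']' and renders "a=2;b=1;".
    static String countBracketed(String log) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        int start = 0;
        // >= 0 so that a '[' at position 0 is not skipped
        while ((start = log.indexOf('[', start)) >= 0) {
            int end = log.indexOf(']', start + 1);
            if (end < 0) {
                break; // unmatched '[': stop instead of looping forever
            }
            counts.merge(log.substring(start + 1, end), 1, Integer::sum);
            start = end + 1; // always advance past the token just consumed
        }
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            out.append(e.getKey()).append('=').append(e.getValue()).append(';');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(countBracketed("...[a].[b]...[a]...")); // prints a=2;b=1;
    }
}
```

The important line is `start = end + 1`: without it, the next indexOf call starts on the same '[' and returns the same index again.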
This is a map-side UDF.
The Pig script loads a log file and grabs the contents inside square brackets.
a = load; b = foreach a generate F(a); dump b;
I see the following on the tasktrackers:
2011-02-23 18:01:25,992 INFO
org.apache.pig.impl.util.SpillableMemoryManager: first memory handler call
- Collection thresho
Hi, Aniket,
What is your Pig script? Is the UDF on the map side or the reduce side?
Daniel
Dmitriy Ryaboy wrote:
That's a max of 3.3K single-character strings. Even with the Java overhead
that shouldn't be more than a meg, right?
None of these should make it out of young gen, assuming the list "cats"
doesn't stick around outside the UDF.
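
A rough back-of-the-envelope check of that estimate, as a sketch only. The 64 bytes per single-character String (object header, fields, and backing char[]) is an assumed, deliberately generous figure; the real cost depends on the JVM version and flags. The class name StringFootprint is hypothetical.

```java
// Hypothetical helper for a rough heap estimate; not a measurement.
class StringFootprint {

    // ASSUMPTION: generous per-String cost including its char[]; varies by JVM.
    static final long ASSUMED_BYTES_PER_STRING = 64;

    static long estimateBytes(long stringCount) {
        return stringCount * ASSUMED_BYTES_PER_STRING;
    }

    public static void main(String[] args) {
        long bytes = estimateBytes(3300); // "3.3K" strings from the message above
        System.out.printf("%d bytes (~%d KiB)%n", bytes, bytes / 1024);
        // prints 211200 bytes (~206 KiB)
    }
}
```

Under that assumption the total stays well below one megabyte.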
On Thu, Feb 24, 2011 at 3:49 PM, Aniket Mokashi wrote:
Hi Jai,
Thanks for your email. I suspect that it's the Strings-in-a-tight-loop reason
you suggested. I have a loop in my UDF that does the following.
while((startInd = someLog.indexOf('[',startInd)) > 0) {
endInd = someLog.indexOf(']', startInd);
Sharing the code would be useful, as mentioned. Also of help would be the heap
settings that the JVM had.
However, off the top of my head, one common situation (esp. in text
processing/tokenizing) is instantiating Strings in a tight loop.
Besides you could also exercise your UDF in a local JVM and
Aniket, share the code?
It really depends on how you create them.
-D
On Wed, Feb 23, 2011 at 7:49 PM, Aniket Mokashi wrote:
I've written a simple UDF that parses a chararray (which looks like
...[a].[b]...[a]...) to capture stuff inside brackets and return them
as a String a=2;b=1; and so on. The input chararrays are rarely more than
1000 characters and are not more than 10 (I've added log.warn in my
UDF to ensure