Sounds like a bug. Can you open a Jira ticket?

Daniel

From: Steve Terrell <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Thursday, April 30, 2015 at 3:29 PM
To: "[email protected]" <[email protected]>
Subject: Missing Records Bug(?) in Pig 0.14.0 With Specific Combination of Commands and Streaming Function

I think I found a bug in versions 0.12.0 through 0.14.0.  I've been trying to
get past it all day.  Can someone please confirm?

The code below is the bare minimum I was able to extract from my original
problem in order to demonstrate the bug, so don't expect it to serve any
practical purpose.  :)

My input file (test_in) is two columns with a tab delimiter:
1   F
2   F
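The two columns must be separated by a literal tab, which is easy to lose when copying from an email. A small Python sketch that recreates the same input file might look like this; the file name test_in comes from the Pig script, while the helper name write_input is just illustrative.

```python
# Write the two-record, tab-delimited input file used by the Pig script.
RECORDS = [("1", "F"), ("2", "F")]


def write_input(path="test_in"):
    # Join fields with a real tab so PigStorage's default delimiter splits them.
    with open(path, "w") as out:
        for fields in RECORDS:
            out.write("\t".join(fields) + "\n")


if __name__ == "__main__":
    write_input()
```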

My streaming function (sf.py) ignores the actual input and simply generates 2 
records:
#!/usr/bin/python
# Ignores its input and always emits the same two records.
if __name__ == '__main__':
    print('x')
    print('y')
(I should mention that in my original problem, input and output were
one-to-one.  I ignore the input here only to reduce the problem to the bare
minimum.)
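For comparison, a one-to-one streaming function like the one in the original problem might look roughly like the sketch below. It assumes Pig's default streaming format (one tab-delimited record per line on stdin and stdout), and the upper-casing transform is a hypothetical placeholder, not the original logic. Unlike the minimal sf.py, it reads all of its input before exiting and flushes stdout explicitly.

```python
#!/usr/bin/env python
# Hypothetical one-to-one streaming function: one output line per input line.
import sys


def transform(line):
    # Placeholder transform (assumption): upper-case every tab-delimited field.
    fields = line.rstrip("\n").split("\t")
    return "\t".join(f.upper() for f in fields)


def process(stream_in, stream_out):
    # Read every record Pig sends, so the process never exits with input unread.
    for line in stream_in:
        stream_out.write(transform(line) + "\n")
    # Flush explicitly before exiting.
    stream_out.flush()


if __name__ == "__main__":
    process(sys.stdin, sys.stdout)
```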

My pig script:
MY_INPUT = load 'test_in' as ( f1, f2);
split MY_INPUT into T if (f2 == 'T'), F otherwise;
T2 = group T by f1;
store T2 into 'test_out/T2';
F2 = group F by f1;
store F2 into 'test_out/F2';  -- optional: the bug appears with or without this line
F3 = stream F2 through `sf.py`;
store F3 into 'test_out/F3';

My expected output for test_out/F3 is two records that come directly from sf.py:
x
y

However, I only get:
x
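To check the result without eyeballing it, something like the sketch below could gather the records from every part file Pig wrote under an output directory and report which expected records are missing. It assumes the usual part-* naming of Pig/Hadoop output files; the helper names are hypothetical, and the directory and expected values are taken from this report.

```python
# Hypothetical checker: compare the records in a Pig output directory with the expected ones.
import glob
import os


def read_records(out_dir):
    # Read all part-* files in the output directory, in name order.
    records = []
    for path in sorted(glob.glob(os.path.join(out_dir, "part-*"))):
        with open(path) as f:
            records.extend(line.rstrip("\n") for line in f)
    return records


def missing_records(out_dir, expected):
    # Return the expected records that do not appear anywhere in the output.
    found = set(read_records(out_dir))
    return [r for r in expected if r not in found]


if __name__ == "__main__":
    missing = missing_records("test_out/F3", ["x", "y"])
    if missing:
        print("missing records:", missing)
    else:
        print("all expected records present")
```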

I've tried all of the following to get the expected behavior:

  *   upgrading Pig from 0.12.0 to 0.14.0
  *   running in local vs. distributed mode
  *   flushing sys.stdout in the streaming function
  *   replacing sf.py with sf.sh, a bash script that runs "echo x; echo y" to do
the same thing.  In this case, the final contents of test_out/F3 varied:
sometimes I got both x and y, and sometimes only x.

Aside from removing the one Pig line that I've marked optional, any other
attempt to simplify the Pig script or input file makes the bug disappear.

I've attached the logs from running "pig -x local test.pig"

Thanks,
    Steve
