Sure 'nuff! https://issues.apache.org/jira/browse/PIG-4548
Thanks!

On Mon, May 18, 2015 at 3:01 PM, Daniel Dai <[email protected]> wrote:

> Sounds like a bug. Can you open a Jira ticket?
>
> Daniel
>
> From: Steve Terrell <[email protected]>
> Reply-To: "[email protected]" <[email protected]>
> Date: Thursday, April 30, 2015 at 3:29 PM
> To: "[email protected]" <[email protected]>
> Subject: Missing Records Bug(?) in Pig 0.14.0 With Specific Combination of
> Commands and Streaming Function
>
> I think I found a bug in versions 0.12.0 through 0.14.0. I've been trying
> to get past it all day. Can someone please confirm?
>
> The below is the bare minimum I was able to extract from my original
> problem in order to demonstrate the bug. So, don't expect the following
> code to serve any practical purpose. :)
>
> My input file (test_in) is two columns with a tab delimiter:
> 1	F
> 2	F
>
> My streaming function (sf.py) ignores the actual input and simply
> generates 2 records:
> #!/usr/bin/python
> if __name__ == '__main__':
>     print 'x'
>     print 'y'
> (But I should mention that in my original problem the input to output was
> one-to-one. I just ignored the input here to get to the bare minimum
> effect.)
>
> My pig script:
> MY_INPUT = load 'test_in' as ( f1, f2);
> split MY_INPUT into T if (f2 == 'T'), F otherwise;
> T2 = group T by f1;
> store T2 into 'test_out/T2';
> F2 = group F by f1;
> store F2 into 'test_out/F2'; -- (this line is actually optional to demo the bug)
> F3 = stream F2 through `sf.py`;
> store F3 into 'test_out/F3';
>
> My expected output for test_out/F3 is two records that come directly from
> sf.py:
> x
> y
>
> However, I only get:
> x
>
> I've tried all of the following to get the expected behavior:
>
> * upgraded Pig from 0.12.0 to 0.14.0
> * ran in local vs. distributed mode
> * flushed sys.stdout in the streaming function
> * replaced sf.py with sf.sh, a bash script that uses "echo x; echo y" to
>   do the same thing. In this case, the final contents of test_out/F3
>   would vary: sometimes I would get both x and y, and sometimes I would
>   just get x.
>
> Aside from removing the one Pig line that I've marked optional, any other
> attempt to simplify the Pig script or input file causes the bug to not
> manifest.
>
> I've attached the logs from running "pig -x local test.pig"
>
> Thanks,
> Steve
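
In case it helps anyone trying to reproduce this, below is a rough Python sketch that writes the two-row input file described above and reports which of the expected records are missing from an output directory. The helper names (write_test_input, read_records, missing_records) are made up for illustration, and the part-* file naming is my assumption about how the store output is laid out; it is not stated in the report.

```python
import glob
import os


def write_test_input(path="test_in"):
    """Write the two-row, tab-delimited input file from the report."""
    with open(path, "w") as fh:
        fh.write("1\tF\n2\tF\n")


def read_records(out_dir):
    """Collect every line from the part files in an output directory.

    Assumption: output files are named part-* (not stated in the report).
    """
    records = []
    for name in sorted(glob.glob(os.path.join(out_dir, "part-*"))):
        with open(name) as fh:
            records.extend(line.rstrip("\n") for line in fh)
    return records


def missing_records(out_dir, expected=("x", "y")):
    """Return the expected records that do not appear in the output."""
    found = read_records(out_dir)
    return [rec for rec in expected if rec not in found]
```

After running the Pig script, missing_records('test_out/F3') should come back empty when both x and y are present. Since the sf.sh variant reportedly gave varying results, it may be worth repeating the run several times and checking each time.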
