Sure 'nuff!

https://issues.apache.org/jira/browse/PIG-4548

Thanks!

On Mon, May 18, 2015 at 3:01 PM, Daniel Dai <[email protected]> wrote:

> Sounds like a bug. Can you open a Jira ticket?
>
> Daniel
>
> From: Steve Terrell <[email protected]>
> Reply-To: "[email protected]" <[email protected]>
> Date: Thursday, April 30, 2015 at 3:29 PM
> To: "[email protected]" <[email protected]>
> Subject: Missing Records Bug(?) in Pig 0.14.0 With Specific Combination of
> Commands and Streaming Function
>
> I think I found a bug in versions 0.12.0 through 0.14.0.  I've been trying
> to get past it all day.  Can someone please confirm?
>
> The below is the bare minimum I was able to extract from my original
> problem in order to demonstrate the bug.  So, don't expect the following
> code to serve any practical purpose.  :)
>
> My input file (test_in) is two columns with a tab delimiter:
> 1   F
> 2   F
>
> My streaming function (sf.py) ignores the actual input and simply
> generates 2 records:
> #!/usr/bin/python
> if __name__ == '__main__':
>     print 'x'
>     print 'y'
> (But I should mention that in my original problem the input to output was
> one-to-one.  I just ignored the input here to get to the bare minimum
> effect.)
>
> My pig script:
> MY_INPUT = load 'test_in' as ( f1, f2);
> split MY_INPUT into T if (f2 == 'T'), F otherwise;
> T2 = group T by f1;
> store T2 into 'test_out/T2';
> F2 = group F by f1;
> store F2 into 'test_out/F2';  -- (this line is actually optional to demo the bug)
> F3 = stream F2 through `sf.py`;
> store F3 into 'test_out/F3';
>
> My expected output for test_out/F3 is two records that come directly from
> sf.py:
> x
> y
>
> However, I only get:
> x
>
> I've tried all of the following to get the expected behavior:
>
>   *   upgraded Pig from 0.12.0 to 0.14.0
>   *   local vs. distributed mode
>   *   flush sys.stdout in the streaming function
>   *   replace sf.py with sf.sh, a bash script that uses "echo x; echo y"
> to do the same thing.  In this case, the final contents of test_out/F3
> would vary: sometimes I would get both x and y, and sometimes I would
> just get x.
>
> Aside from removing the one Pig line that I've marked optional, any other
> attempts to simplify the Pig script or input file cause the bug to not
> manifest.
>
> I've attached the logs from running "pig -x local test.pig".
>
> Thanks,
>     Steve
>
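
For anyone trying to reproduce this, the "flush sys.stdout" variant of sf.py mentioned in the list of attempts might look roughly like the sketch below. It is not the reporter's actual script: the main() helper and its out parameter are assumptions, added only so the function can be exercised outside of Pig, and it uses write() so that it runs under both Python 2 and Python 3.

```python
#!/usr/bin/env python
import sys


def main(out=None):
    """Ignore stdin and emit two records, flushing explicitly.

    'out' is a hypothetical parameter (not in the original sf.py) that
    lets the output stream be swapped for testing; it defaults to stdout.
    """
    if out is None:
        out = sys.stdout
    out.write('x\n')
    out.write('y\n')
    # Explicit flush, so both records are handed to the reader
    # before the process exits.
    out.flush()


if __name__ == '__main__':
    main()
```

Run by hand (for example "python sf.py < test_in"), this should print both records, which gives a baseline to compare against the contents of test_out/F3.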
