Thank you, Daniel.

Follow-up question: is there any reason why bzip is processed by Pig but
gzip is processed in Hadoop?

Thanks, Tomas

On Mon, May 18, 2015 at 8:35 AM, Daniel Dai <[email protected]> wrote:

> The decompression of gzip is on the Hadoop side (TextInputFormat); if Hadoop
> fixed concatenated gzip, Pig should be fixed as well. Bzip, however, is
> processed by Pig code, which does not support concatenation.
>
> It seems we need to update the documentation.
>
> Daniel
>
> On 5/5/15, 3:51 AM, "Tomas Hudik" <[email protected]> wrote:
>
> >Hi,
> >I read a section:
> >https://pig.apache.org/docs/r0.11.1/func.html#handling-compression
> >
> >according to which any concatenated bzip/gzip files will produce strange
> >results.
> >I did a test: I concatenated some files and processed them. However, all
> >the results were identical to the ones produced from non-concatenated
> >files. Why? They should be different...
> >
> >Then I saw: https://issues.apache.org/jira/i#browse/HADOOP-6835
> >
> >My questions:
> >1. Is https://pig.apache.org/docs/r0.11.1/func.html#handling-compression
> >still correct, i.e. will concatenation produce wrong results? Is this true
> >for any concatenated files, or does it happen only occasionally?
> >2. Is there any way to find out whether a tar.gz or tar.bz2 file is
> >concatenated?
>
>
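
On question 2 in the quoted message (how to tell whether a .gz or .bz2 file
is concatenated), one possible check is to decompress the first stream and
see whether any bytes are left over after it ends. The sketch below is plain
Python using the standard zlib and bz2 modules; it is not part of Pig or
Hadoop. The function names are made up for illustration, and it assumes the
whole file fits in memory and has no trailing padding after the last stream.

```python
import bz2
import gzip
import zlib


def count_gzip_members(data: bytes) -> int:
    """Count the gzip members in data (hypothetical helper, not a Pig/Hadoop API)."""
    members = 0
    while data:
        # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
        decomp = zlib.decompressobj(16 + zlib.MAX_WBITS)
        decomp.decompress(data)
        if not decomp.eof:
            raise ValueError("truncated or corrupt gzip member")
        members += 1
        # Bytes left after the end of this member belong to the next one.
        data = decomp.unused_data
    return members


def count_bzip2_streams(data: bytes) -> int:
    """Count the bzip2 streams in data (hypothetical helper, same idea as above)."""
    streams = 0
    while data:
        decomp = bz2.BZ2Decompressor()
        decomp.decompress(data)
        if not decomp.eof:
            raise ValueError("truncated or corrupt bzip2 stream")
        streams += 1
        data = decomp.unused_data
    return streams


if __name__ == "__main__":
    single = gzip.compress(b"first\n")
    joined = single + gzip.compress(b"second\n")  # same as `cat a.gz b.gz`
    print(count_gzip_members(single), count_gzip_members(joined))
```

A count greater than 1 means the file consists of several compressed streams
joined together. For a real file, read it in binary mode and pass the bytes
in, e.g. count_gzip_members(open(path, "rb").read()).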
