The decompression of gzip happens on the Hadoop side (TextInputFormat),
so if Hadoop has fixed concatenated gzip, Pig should be fixed as well.
Bzip, however, is processed by Pig code, which does not support
concatenation.
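
To make "concatenation" concrete: a concatenated gzip or bzip2 file is
just several complete compressed streams written back to back, and a
decoder that stops after the first stream silently drops the rest. The
sketch below uses only the Python standard library, not the Hadoop or
Pig code paths, so it illustrates the file format rather than either
project's implementation. The helper names count_streams and
first_stream_only are made up for this example.

```python
import bz2
import gzip
import zlib


def count_streams(data, new_decoder):
    """Count the complete compressed streams stored back to back in data.

    new_decoder must return a fresh object exposing decompress(), eof and
    unused_data (both zlib decompress objects and bz2.BZ2Decompressor do).
    """
    streams = 0
    while data:
        decoder = new_decoder()
        decoder.decompress(data)       # stops at the end of one stream
        if not decoder.eof:
            raise ValueError("truncated compressed stream")
        streams += 1
        data = decoder.unused_data     # bytes left over after that stream
    return streams


def first_stream_only(data, new_decoder):
    """Mimic a reader without concatenation support: decode one stream."""
    return new_decoder().decompress(data)


def gzip_decoder():
    # wbits=31 makes zlib expect a gzip header and trailer.
    return zlib.decompressobj(31)


gz_concat = gzip.compress(b"alpha\n") + gzip.compress(b"beta\n")
bz_concat = bz2.compress(b"alpha\n") + bz2.compress(b"beta\n")

print(count_streams(gz_concat, gzip_decoder))          # number of gzip members
print(count_streams(bz_concat, bz2.BZ2Decompressor))   # number of bzip2 streams
print(first_stream_only(gz_concat, gzip_decoder))      # only the first part
print(gzip.decompress(gz_concat))                      # all parts
```

A count greater than one means the file is concatenated. The sketch
holds the whole file in memory and does not tolerate trailing padding,
so treat it as a starting point for small test files only.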

It seems we need to update the documentation.

Daniel

On 5/5/15, 3:51 AM, "Tomas Hudik" <[email protected]> wrote:

>Hi,
>I read a section:
>https://pig.apache.org/docs/r0.11.1/func.html#handling-compression
>
>according to which any concatenated bzip/gzip files will produce strange
>results.
>I did a test - concatenated some files and processed them. However, all
>the results were identical to the ones that were produced on
>non-concatenated files. Why? They should be different...
>
>Then I saw: https://issues.apache.org/jira/i#browse/HADOOP-6835
>
>My questions:
>1. Is https://pig.apache.org/docs/r0.11.1/func.html#handling-compression
>still correct, and will concatenation produce wrong results? Is this true
>for any concatenated files, or might it happen only once in a while?
>2. Is there any way to find out whether a tar.gz or tar.bz2 is
>concatenated?
