If I recall this problem correctly, the root cause is that the default zstd
compression block size is 256 KB, and Hadoop's zstd compression will attempt
to use the OS platform's default compression size, if it is available. The
recommended output size is slightly bigger than the input size, to account
for the header overhead in zstd compression:
http://software.icecube.wisc.edu/coverage/00_LATEST/icetray/private/zstd/lib/compress/zstd_compress.c.gcov.html#2982
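
For illustration, here is a minimal standalone C sketch (my own, not taken
from the Hadoop or zstd sources; it assumes the libzstd headers are
installed) showing that zstd's worst-case output bound is always slightly
larger than the input size:

/* Prints zstd's worst-case compressed size for a few input sizes.
 * The bound exceeds the input to leave room for frame and block
 * headers when the data turns out to be incompressible.
 * Build with: cc bound.c -lzstd
 */
#include <stdio.h>
#include <zstd.h>

int main(void) {
    size_t const sizes[] = { 4096, 8192, 131072, 262144 };
    size_t i;
    for (i = 0; i < sizeof(sizes) / sizeof(sizes[0]); i++) {
        printf("input=%zu  worst-case output=%zu\n",
               sizes[i], ZSTD_compressBound(sizes[i]));
    }
    return 0;
}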

Meanwhile, the Hadoop code at
https://github.com/apache/hadoop/blame/trunk/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/io/compress/zstd/ZStandardCompressor.c#L259
sets the output size equal to the input size when the input size is bigger
than the output size. Manually setting the buffer size to a small value
keeps the input size below the recommended output size, which keeps the
system working. Returning ZSTD_CStreamOutSize() from getStreamSize may
enable the system to work without a predefined default; a sketch of the
idea is below.
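
As a rough illustration (a standalone sketch under my own assumptions, not
the actual Hadoop patch; the buffer handling is simplified), sizing the
output buffer with ZSTD_CStreamOutSize() lets a single ZSTD_compressStream()
call always make progress, regardless of the input size:

/* Stream-compresses one dummy input chunk using the library's
 * recommended buffer sizes rather than buffers sized to the input.
 * Build with: cc stream.c -lzstd
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <zstd.h>

int main(void) {
    size_t const inSize  = ZSTD_CStreamInSize();   /* recommended input buffer size  */
    size_t const outSize = ZSTD_CStreamOutSize();  /* recommended output buffer size */
    printf("recommended sizes: in=%zu out=%zu\n", inSize, outSize);

    ZSTD_CStream *cs = ZSTD_createCStream();
    if (cs == NULL || ZSTD_isError(ZSTD_initCStream(cs, 3))) {
        fprintf(stderr, "failed to set up compression stream\n");
        return 1;
    }

    char *in  = malloc(inSize);
    char *out = malloc(outSize);
    memset(in, 'x', inSize);                       /* dummy payload */

    ZSTD_inBuffer  input  = { in,  inSize,  0 };
    ZSTD_outBuffer output = { out, outSize, 0 };

    /* An output buffer of ZSTD_CStreamOutSize() bytes is documented to
     * be large enough to flush at least one complete compressed block. */
    size_t const ret = ZSTD_compressStream(cs, &output, &input);
    if (ZSTD_isError(ret))
        fprintf(stderr, "compress error: %s\n", ZSTD_getErrorName(ret));
    else
        printf("consumed %zu bytes, produced %zu bytes\n", input.pos, output.pos);

    ZSTD_freeCStream(cs);
    free(in);
    free(out);
    return 0;
}

A real writer would loop until the input is fully consumed and finish with
ZSTD_endStream(); the sketch only shows the buffer-sizing point.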

On Mon, May 11, 2020 at 2:29 PM Wei-Chiu Chuang
<weic...@cloudera.com.invalid> wrote:

> Thanks for the pointer, it does look similar. However, we are roughly on
> the latest of branch-3.1, and this fix is in our branch. I'm pretty sure
> we have all the zstd fixes.
>
> I believe the libzstd version used is 1.4.4, but I need to confirm. I
> suspected it was a library version issue because we've been using zstd
> compression for over a year, and this (reproducible) bug has only started
> happening consistently recently.
>
> On Mon, May 11, 2020 at 1:57 PM Ayush Saxena <ayush...@gmail.com> wrote:
>
> > Hi Wei Chiu,
> > What is the Hadoop version being used?
> > Check whether HADOOP-15822 is there; it had a similar error.
> >
> > -Ayush
> >
> > > On 11-May-2020, at 10:11 PM, Wei-Chiu Chuang <weic...@apache.org> wrote:
> > >
> > > Hadoop devs,
> > >
> > > A colleague of mine recently hit a strange issue where the zstd
> > > compression codec crashes:
> > >
> > > Caused by: java.lang.InternalError: Error (generic)
> > >     at org.apache.hadoop.io.compress.zstd.ZStandardCompressor.deflateBytesDirect(Native Method)
> > >     at org.apache.hadoop.io.compress.zstd.ZStandardCompressor.compress(ZStandardCompressor.java:216)
> > >     at org.apache.hadoop.io.compress.CompressorStream.compress(CompressorStream.java:81)
> > >     at org.apache.hadoop.io.compress.CompressorStream.write(CompressorStream.java:76)
> > >     at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:57)
> > >     at java.io.DataOutputStream.write(DataOutputStream.java:107)
> > >     at org.apache.tez.runtime.library.common.sort.impl.IFile$Writer.writeKVPair(IFile.java:617)
> > >     at org.apache.tez.runtime.library.common.sort.impl.IFile$Writer.append(IFile.java:480)
> > >
> > > Is anyone out there hitting a similar problem?
> > >
> > > A temporary workaround is to set the buffer size: "set
> > > io.compression.codec.zstd.buffersize=8192;"
> > >
> > > We suspect it's a bug in the zstd library, but couldn't verify. Just
> > > wanted to send this out and see if I get lucky.
> >
>
