A similar bug was reported: HADOOP-17096 <https://issues.apache.org/jira/browse/HADOOP-17096>
On Mon, May 11, 2020 at 3:48 PM Eric Yang <ey...@apache.org> wrote:

> If I recall this problem correctly, the root cause is that the default
> zstd compression block size is 256 KB, and Hadoop zstd compression will
> attempt to use the OS platform's default compression size if it is
> available. The recommended output size is slightly bigger than the input
> size to account for the header size in zstd compression.
> http://software.icecube.wisc.edu/coverage/00_LATEST/icetray/private/zstd/lib/compress/zstd_compress.c.gcov.html#2982
>
> The Hadoop code at
> https://github.com/apache/hadoop/blame/trunk/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/io/compress/zstd/ZStandardCompressor.c#L259
> sets the output size to the same value as the input size if the input
> size is bigger than the output size. By manually setting the buffer size
> to a small value, the input size stays smaller than the recommended
> output size, which keeps the system working. Returning
> ZSTD_CStreamOutSize() from getStreamSize() may let the system work
> without a predefined default.
>
> On Mon, May 11, 2020 at 2:29 PM Wei-Chiu Chuang
> <weic...@cloudera.com.invalid> wrote:
>
>> Thanks for the pointer, it does look similar. However, we are roughly
>> on the latest of branch-3.1 and this fix is in our branch. I'm pretty
>> sure we have all the zstd fixes.
>>
>> I believe the libzstd version used is 1.4.4, but I need to confirm. I
>> suspected it's a library version issue because we've been using zstd
>> compression for over a year, and this (reproducible) bug started
>> happening consistently only recently.
>>
>> On Mon, May 11, 2020 at 1:57 PM Ayush Saxena <ayush...@gmail.com> wrote:
>>
>> > Hi Wei-Chiu,
>> > What is the Hadoop version being used?
>> > Check whether HADOOP-15822 is there; it had a similar error.
>> >
>> > -Ayush
>> >
>> > > On 11-May-2020, at 10:11 PM, Wei-Chiu Chuang <weic...@apache.org> wrote:
>> > >
>> > > Hadoop devs,
>> > >
>> > > A colleague of mine recently hit a strange issue where the zstd
>> > > compression codec crashes.
>> > >
>> > > Caused by: java.lang.InternalError: Error (generic)
>> > >   at org.apache.hadoop.io.compress.zstd.ZStandardCompressor.deflateBytesDirect(Native Method)
>> > >   at org.apache.hadoop.io.compress.zstd.ZStandardCompressor.compress(ZStandardCompressor.java:216)
>> > >   at org.apache.hadoop.io.compress.CompressorStream.compress(CompressorStream.java:81)
>> > >   at org.apache.hadoop.io.compress.CompressorStream.write(CompressorStream.java:76)
>> > >   at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:57)
>> > >   at java.io.DataOutputStream.write(DataOutputStream.java:107)
>> > >   at org.apache.tez.runtime.library.common.sort.impl.IFile$Writer.writeKVPair(IFile.java:617)
>> > >   at org.apache.tez.runtime.library.common.sort.impl.IFile$Writer.append(IFile.java:480)
>> > >
>> > > Is anyone out there hitting a similar problem?
>> > >
>> > > A temporary workaround is to set the buffer size:
>> > > set io.compression.codec.zstd.buffersize=8192;
>> > >
>> > > We suspect it's a bug in the zstd library, but couldn't verify. Just
>> > > wanted to send this out and see if I can get some luck.
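
For reference, below is a minimal standalone C sketch of the output-buffer
sizing rule Eric describes above. It is an illustration only, not the actual
Hadoop native code: it uses libzstd's one-shot API (ZSTD_compress) instead of
the streaming API the Hadoop codec uses, and the input payload is made up.
The point it shows is that the destination buffer should be sized with
ZSTD_compressBound(), which is always slightly larger than the input, rather
than being set equal to the input size; ZSTD_CStreamOutSize() is printed for
comparison as the recommended streaming output buffer size.

    /* Sketch only: one-shot libzstd compression with a correctly sized
     * output buffer. ZSTD_compressBound(srcSize) is a little larger than
     * srcSize ("slightly bigger than input" per the thread); an output
     * buffer equal to the input size can be too small for data that does
     * not compress. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <zstd.h>

    int main(void) {
        const char *src = "example payload for the sizing illustration";
        size_t srcSize = strlen(src);

        /* Correct capacity: slightly larger than the input, never equal. */
        size_t dstCapacity = ZSTD_compressBound(srcSize);
        void *dst = malloc(dstCapacity);
        if (dst == NULL) return 1;

        size_t written = ZSTD_compress(dst, dstCapacity, src, srcSize, 3);
        if (ZSTD_isError(written)) {
            fprintf(stderr, "compress failed: %s\n", ZSTD_getErrorName(written));
            free(dst);
            return 1;
        }
        printf("input=%zu bound=%zu written=%zu recommended stream out=%zu\n",
               srcSize, dstCapacity, written, ZSTD_CStreamOutSize());
        free(dst);
        return 0;
    }

Assuming the libzstd headers and library are installed, this should build with
something like: cc -o sizing sizing.c -lzstd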