Zhang Jiawei created AVRO-4172:
----------------------------------

             Summary: [C++] Fix ZSTD codec compatibility with Java Avro
                 Key: AVRO-4172
                 URL: https://issues.apache.org/jira/browse/AVRO-4172
             Project: Apache Avro
          Issue Type: Bug
            Reporter: Zhang Jiawei
         Attachments: image-2025-08-10-18-27-06-588.png

We have identified two cross-language compatibility issues related to the ZSTD codec in Avro:

# Different codec names
 • In Java Avro (and the other language bindings that follow it) the codec is written into the file metadata as {{"zstandard"}},
 • while the C++ implementation writes {{"zstd"}}.

This makes a data file produced by one language unreadable by the other.

Java: [https://github.com/apache/avro/blob/dc7bbd086283bb61dfabd8fcdf980d22f30c7a93/lang/java/avro/src/main/java/org/apache/avro/file/DataFileConstants.java#L40]
C++: [https://github.com/apache/avro/blob/dc7bbd086283bb61dfabd8fcdf980d22f30c7a93/lang/c%2B%2B/impl/DataFile.cc#L57]

# Streaming vs. single-shot encoding

Java Avro writes ZSTD data in streaming mode, whereas the C++ implementation can only decode single-shot ZSTD frames. As a result, a ZSTD-compressed file generated by Java Avro cannot be read by the current C++ library.

Reference: [https://github.com/apache/avro/blob/dc7bbd086283bb61dfabd8fcdf980d22f30c7a93/lang/c%2B%2B/impl/DataFile.cc#L494]
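
For the codec-name mismatch, one possible direction is for the C++ side to write the name Java uses while still accepting the older C++ spelling on read. The following is only a minimal sketch of that idea: {{CodecKind}}, {{parseCodecName}} and {{codecNameForWriting}} are hypothetical names for illustration and are not taken from {{DataFile.cc}}, and the sketch covers only a subset of Avro's codecs.

```cpp
#include <optional>
#include <string_view>

// Hypothetical enum for illustration; not the actual C++ Avro codec type.
enum class CodecKind { Null, Deflate, Zstandard };

// Name written by Java Avro, per this report.
constexpr std::string_view kZstandardName = "zstandard";
// Name written by the current C++ implementation, per this report.
constexpr std::string_view kLegacyZstdName = "zstd";

// Map the codec string found in file metadata to a codec kind.
// Both spellings are accepted so that files written by either
// implementation can be recognised.
std::optional<CodecKind> parseCodecName(std::string_view name) {
    if (name == "null") return CodecKind::Null;
    if (name == "deflate") return CodecKind::Deflate;
    if (name == kZstandardName || name == kLegacyZstdName) {
        return CodecKind::Zstandard;
    }
    return std::nullopt;  // codec not covered by this sketch
}

// Always emit the Java-compatible name when writing new files.
std::string_view codecNameForWriting(CodecKind kind) {
    switch (kind) {
        case CodecKind::Null:      return "null";
        case CodecKind::Deflate:   return "deflate";
        case CodecKind::Zstandard: return kZstandardName;
    }
    return "null";  // unreachable for valid enum values
}
```

Accepting both names on read would keep files already written by the C++ library readable, while writing only {{"zstandard"}} would let other bindings open new files. This addresses only the naming issue; the streaming-frame issue needs a separate change in the decoder.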