[
https://issues.apache.org/jira/browse/AVRO-1873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15480604#comment-15480604
]
Ryan Blue commented on AVRO-1873:
---------------------------------
I wrote the same content from Java and from Ruby and hexdumped the results. The
Ruby payload was missing its last 4 bytes, but the
rest of the Snappy-encoded data looked identical. From looking at [Java's
SnappyCodec|https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/file/SnappyCodec.java],
it looks like those last 4 bytes are a CRC32 checksum. Adding the checksum
(using Zlib.crc32) fixed compatibility, so Avro blocks written by
Java and Ruby are now identical.
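As a rough illustration of the write-side change, here is a minimal sketch, not the actual patch. The helper name frame_block and the idea of passing the compressor in as a callable are assumptions made so the example runs without the snappy gem. It assumes the checksum is the CRC32 of the uncompressed data, stored big-endian, as Java's SnappyCodec does.

```ruby
require 'zlib'

# Hypothetical helper (not part of the avro gem): compress a block and
# append the 4-byte big-endian CRC32 of the *uncompressed* data.
def frame_block(data, compress)
  compress.call(data) + [Zlib.crc32(data)].pack('N')
end

# Stand-in compressor so the sketch runs without the snappy gem.
# With the gem installed, Snappy.method(:deflate) would be passed instead
# (assumption: the snappy gem's Snappy.deflate).
identity = ->(bytes) { bytes }

framed = frame_block('avro'.b, identity)
# The trailing 4 bytes decode back to the checksum of the original data.
trailer = framed[-4..-1].unpack1('N')
```

With the identity stand-in, framed is the 4 payload bytes followed by the 4 checksum bytes, which matches the "last 4 bytes" difference seen in the hexdump.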
For the read path, I implemented the check but the code doesn't throw an error
if the checksum doesn't match. Instead, it assumes that it is reading an older
Ruby file and decompresses the entire incoming buffer and passes the result
along. I don't think there's a way to both validate the checksum and detect old
files, so this seems reasonable to me.
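The fallback described above could look roughly like the following sketch. The helper name read_block and the injected decompressor are assumptions for illustration, and rescuing a decompression error on the truncated buffer is also an assumption rather than something stated in the comment. Zlib stands in for Snappy here so the example runs without extra gems.

```ruby
require 'zlib'

# Hypothetical helper (not part of the avro gem): try to read the block as
# "compressed data + 4-byte big-endian CRC32". If the checksum does not
# match, or the shortened buffer does not decompress, assume an older Ruby
# file without a checksum and decompress the whole buffer.
def read_block(block, decompress)
  if block.bytesize >= 4
    begin
      data = decompress.call(block[0...-4])
      return data if Zlib.crc32(data) == block[-4..-1].unpack1('N')
    rescue StandardError
      # Assumption: a failed decompression also means "no checksum present".
    end
  end
  decompress.call(block)
end

# Stand-in codec: Zlib instead of Snappy, for illustration only.
inflate = ->(bytes) { Zlib::Inflate.inflate(bytes) }
payload = 'hello avro'

new_block = Zlib::Deflate.deflate(payload) + [Zlib.crc32(payload)].pack('N')
old_block = Zlib::Deflate.deflate(payload) # older style, no checksum

from_new = read_block(new_block, inflate)
from_old = read_block(old_block, inflate)
```

The trade-off is the one noted above: a corrupted checksum is indistinguishable from an old file, so this path never raises on a mismatch.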
> avro gem doesn't compatible with other languages with snappy compression
> ------------------------------------------------------------------------
>
> Key: AVRO-1873
> URL: https://issues.apache.org/jira/browse/AVRO-1873
> Project: Avro
> Issue Type: Bug
> Components: ruby
> Affects Versions: 1.8.1
> Environment: CentOS 6.8 64bit, Snappy 1.1.0, Python 3.5, Ruby 2.2.3
> Reporter: Pumsuk Cho
> Priority: Blocker
> Fix For: 1.8.2
>
>
> I tested the avro gem today and found some weird results.
> With a Python library, "fastavro", I generated a snappy-compressed Avro
> file. This file works fine with avro-tools-1.8.1.jar:
> java -jar avro-tools-1.8.1.jar tojson testing.avro returns what I expected.
> But it is NOT compatible with Ruby: reading it with the avro gem returns an
> "Invalid Input" message.
> And a snappy-compressed Avro file made with the avro gem doesn't work with
> avro-tools, nor in Python with avro-python3 or fastavro.
> My Ruby code is below:
> require 'avro'
> # Parse the schema, then write one record using the snappy codec.
> schema = Avro::Schema.parse(File.read('test.avsc'))
> avrofile = File.open('test.avro', 'wb')
> writer = Avro::IO::DatumWriter.new(schema)
> datawriter = Avro::DataFile::Writer.new(avrofile, writer, schema, 'snappy')
> datawriter << {"title" => "Avro", "author" => "Apache Foundation"}
> datawriter.close