[ https://issues.apache.org/jira/browse/AVRO-1873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15480604#comment-15480604 ]

Ryan Blue commented on AVRO-1873:
---------------------------------

I wrote the same content from Java and from Ruby and hexdumped the result. The 
problem was that the last 4 bytes were missing from the Ruby payload; the rest 
of the Snappy-encoded data looked identical. Judging from [Java's 
SnappyCodec|https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/file/SnappyCodec.java],
those last 4 bytes are a CRC32 checksum of the uncompressed block data, stored 
big-endian, which is what the Avro spec requires after each Snappy-compressed 
block. Adding the checksum (using Zlib.crc32) fixed compatibility, and Avro 
blocks written by Java and Ruby are now identical.
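
A minimal sketch of the write-side framing, assuming the block has already been 
Snappy-compressed (for example with the snappy gem's Snappy.deflate, which this 
sketch does not load). append_crc32 is a hypothetical helper name. The checksum 
covers the uncompressed data, and pack('N') emits it as 4 big-endian bytes.

```ruby
require 'zlib'

# Hypothetical helper: append the 4-byte, big-endian CRC32 of the
# *uncompressed* data to an already-compressed block, matching the
# block layout that Java's SnappyCodec writes.
def append_crc32(uncompressed, compressed)
  # .b makes a binary (ASCII-8BIT) copy so the raw bytes concatenate cleanly.
  compressed.b + [Zlib.crc32(uncompressed)].pack('N')
end
```

Because the checksum is taken before compression, a reader can only verify it 
after decompressing the block.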

For the read path, I implemented the check, but the code doesn't throw an error 
if the checksum doesn't match. Instead, it assumes that it is reading a file 
written by an older version of the Ruby library, decompresses the entire 
incoming buffer, and passes the result along. I don't think there's a way to 
both validate the checksum and detect old files, so this seems reasonable to me.
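
A minimal sketch of that fallback logic. read_block and its inflate parameter 
are hypothetical names; inflate stands in for a real Snappy decompressor such 
as ->(s) { Snappy.inflate(s) } from the snappy gem. Rescuing StandardError is 
an assumption standing in for whatever error the Snappy binding actually raises.

```ruby
require 'zlib'

# Hypothetical helper: +inflate+ is any callable that decompresses a raw
# Snappy buffer (assumption: the real code would pass the snappy gem's
# Snappy.inflate; this sketch does not load that gem).
def read_block(data, inflate)
  # Too short to carry a trailing checksum: treat it as a legacy block.
  return inflate.call(data) if data.bytesize < 4

  body = data.byteslice(0, data.bytesize - 4)
  expected = data.byteslice(-4, 4).unpack('N').first
  begin
    uncompressed = inflate.call(body)
    return uncompressed if Zlib.crc32(uncompressed) == expected
  rescue StandardError
    # Dropping 4 bytes from a checksum-less block can leave an invalid
    # stream, so a decompression error also means "legacy block".
  end
  # Checksum missing or mismatched: assume an older Ruby-written block
  # and decompress the whole buffer instead of raising.
  inflate.call(data)
end
```

The trade-off is the one described above: a block whose checksum genuinely 
fails is not rejected with a checksum error; it falls through to the legacy 
path instead.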

> avro gem doesn't compatible with other languages with snappy compression
> ------------------------------------------------------------------------
>
>                 Key: AVRO-1873
>                 URL: https://issues.apache.org/jira/browse/AVRO-1873
>             Project: Avro
>          Issue Type: Bug
>          Components: ruby
>    Affects Versions: 1.8.1
>         Environment: CentOS 6.8 64bit, Snappy 1.1.0, Python 3.5, Ruby 2.2.3
>            Reporter: Pumsuk Cho
>            Priority: Blocker
>             Fix For: 1.8.2
>
>
> I tested the avro gem today and found a strange result.
> Using a Python library, "fastavro", I generated a Snappy-compressed avro 
> file. This file works fine with avro-tools-1.8.1.jar:
> java -jar avro-tools-1.8.1.jar tojson testing.avro returns what I expected.
> But reading it from Ruby with the avro gem fails with an "Invalid Input" 
> message. Likewise, a Snappy-compressed avro file written with the avro gem 
> can't be read by avro-tools, nor in Python by avro-python3 or fastavro.
> My Ruby code is below:
> require 'avro'
> # Parse the writer schema from the .avsc file
> schema = Avro::Schema.parse(File.read('test.avsc'))
> avrofile = File.open('test.avro', 'wb')
> writer = Avro::IO::DatumWriter.new(schema)
> # The fourth argument selects the block compression codec
> datawriter = Avro::DataFile::Writer.new(avrofile, writer, schema, 'snappy')
> datawriter << {"title" => "Avro", "author" => "Apache Foundation"}
> datawriter.close



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
