Mengran Wang created AVRO-2882:
----------------------------------

             Summary: Validate input data format before decoding it
                 Key: AVRO-2882
                 URL: https://issues.apache.org/jira/browse/AVRO-2882
             Project: Apache Avro
          Issue Type: Improvement
          Components: java
    Affects Versions: 1.9.2, 1.8.2
            Reporter: Mengran Wang
         Attachments: Screen Shot 2020-06-18 at 5.48.39 PM.png

When decoding a byte array with the Avro BinaryDecoder and 
SpecificDatumReader, is it possible to use the schema to check whether the 
input matches the definition before allocating a memory buffer to process the 
data? 

One bug we hit in production involves a payload type that consists of two 
parts: the first part is a fixed-size byte array and the second part is a 
record of variable-length strings. During deserialization, we extract the 
byte array first (using schema A) and then read the strings (using schema B). 
However, we accidentally created a malformed payload that left out the byte 
array part. We assumed Avro would throw some kind of RuntimeException when 
decoding this malformed payload, but it ended up allocating a huge memory 
buffer, *scratchUtf8*, to read the string, which eventually caused a JVM OOM 
error on our end. 
{code:java}
fixed MD5(16); // fixed-length type, always 16 bytes
record A {
  MD5 hash;
}

record B {
  string name1;
  string name2;
  union {null, string} name3 = null;
}
{code}
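
To illustrate the kind of check being asked for: in the Avro binary encoding, a 
string is written as a zig-zag variable-length long (the byte count) followed 
by that many UTF-8 bytes. If the 16-byte prefix is missing, the bytes that end 
up in the length position are arbitrary and can decode to a very large value. 
The following is a minimal standalone sketch, not Avro's BinaryDecoder; the 
class and method names (LengthPrefixCheck, readZigZagLong, readCheckedString) 
are hypothetical. It rejects a declared length that exceeds the bytes actually 
remaining, before any buffer is allocated.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of the Avro API.
final class LengthPrefixCheck {

    /** Decodes one zig-zag variable-length long, as used for Avro lengths. */
    public static long readZigZagLong(ByteBuffer in) {
        long raw = 0;
        int shift = 0;
        while (true) {
            if (!in.hasRemaining()) {
                throw new IllegalArgumentException("truncated varint");
            }
            int b = in.get() & 0xff;
            raw |= (long) (b & 0x7f) << shift;
            if ((b & 0x80) == 0) {
                break; // high bit clear: last byte of the varint
            }
            shift += 7;
            if (shift > 63) {
                throw new IllegalArgumentException("varint too long");
            }
        }
        // Undo zig-zag encoding: (n << 1) ^ (n >> 63)
        return (raw >>> 1) ^ -(raw & 1);
    }

    /** Reads a length-prefixed UTF-8 string, validating the length first. */
    public static String readCheckedString(ByteBuffer in) {
        long len = readZigZagLong(in);
        // Validate before allocating: the input cannot hold more than
        // in.remaining() bytes, so a larger length means malformed data.
        if (len < 0 || len > in.remaining()) {
            throw new IllegalArgumentException(
                "declared string length " + len + " exceeds "
                + in.remaining() + " remaining bytes");
        }
        byte[] bytes = new byte[(int) len];
        in.get(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Well-formed: length 5 (zig-zag encoded as 0x0A) followed by "alice".
        ByteBuffer ok = ByteBuffer.wrap(new byte[] {0x0A, 'a', 'l', 'i', 'c', 'e'});
        System.out.println(readCheckedString(ok));

        // Malformed: the length field decodes to a value far larger than the
        // single byte that follows it.
        ByteBuffer bad = ByteBuffer.wrap(new byte[] {
            (byte) 0xFE, (byte) 0xFF, (byte) 0xFF, (byte) 0xFF, 0x0F, 'x'});
        try {
            readCheckedString(bad);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

A check like this only applies when the total input size is known up front, 
as it is for a byte array; for an unbounded stream, a configurable upper limit 
on string length would be the closer equivalent.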

--
This message was sent by Atlassian Jira
(v8.3.4#803005)