David Mollitor created AVRO-3183:
------------------------------------

             Summary: Do Not Double Buffer Data in DataFileWriter
                 Key: AVRO-3183
                 URL: https://issues.apache.org/jira/browse/AVRO-3183
             Project: Apache Avro
          Issue Type: Improvement
          Components: java
    Affects Versions: 1.10.0
            Reporter: David Mollitor
            Assignee: David Mollitor


{code:java|title=DataFileWriter.java}
  private void init(OutputStream outs) throws IOException {
    this.underlyingStream = outs;
    this.out = new BufferedFileOutputStream(outs);
    EncoderFactory efactory = new EncoderFactory();
    // binaryEncoder returns a buffered Encoder and is wrapping a BufferedFileOutputStream
    this.vout = efactory.binaryEncoder(out, null);
    dout.setSchema(schema);
    buffer = new NonCopyingByteArrayOutputStream(Math.min((int) (syncInterval * 1.25), Integer.MAX_VALUE / 2 - 1));
    // binaryEncoder returns a buffered Encoder and is wrapping a NonCopyingByteArrayOutputStream
    this.bufOut = efactory.binaryEncoder(buffer, null);
    if (this.codec == null) {
      this.codec = CodecFactory.nullCodec().createInstance();
    }
    this.isOpen = true;
  }
{code}

The {{DataFileWriter}} double-buffers its output: {{init}} wraps the stream in 
a {{BufferedFileOutputStream}} and then wraps that in the buffered encoder 
returned by {{binaryEncoder}}. The second layer only adds redundant overhead, 
and the buffering done by the encoder is simpler and less effective than the 
buffering in {{BufferedFileOutputStream}}.

Remove this double buffering by using a 'direct' (unbuffered) binary encoder, 
such as the one returned by {{EncoderFactory.directBinaryEncoder}}.
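
The cost of stacking two buffers can be illustrated with plain JDK classes. 
The sketch below is an analogy only and is not Avro code: 
{{java.io.BufferedOutputStream}} stands in for both 
{{BufferedFileOutputStream}} and the buffered encoder, and the names 
{{DoubleBufferSketch}} and {{CountingSink}}, as well as the buffer sizes 64 
and 16, are hypothetical. Under those assumptions, it suggests that the outer 
buffer does not reduce the number of writes reaching the sink; it only adds a 
second in-memory copy of every byte.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;

final class DoubleBufferSketch {

  /** Hypothetical sink that records how many write calls and bytes reach it. */
  static final class CountingSink extends OutputStream {
    int writeCalls;
    long bytes;

    @Override
    public void write(int b) {
      writeCalls++;
      bytes++;
    }

    @Override
    public void write(byte[] b, int off, int len) {
      writeCalls++;
      bytes += len;
    }
  }

  /** Writes n single bytes through one buffer in front of the sink. */
  static CountingSink singleBuffered(int n) throws IOException {
    CountingSink sink = new CountingSink();
    try (OutputStream out = new BufferedOutputStream(sink, 64)) {
      for (int i = 0; i < n; i++) {
        out.write(i);
      }
    }
    return sink;
  }

  /** Writes n single bytes through two stacked buffers in front of the sink. */
  static CountingSink doubleBuffered(int n) throws IOException {
    CountingSink sink = new CountingSink();
    // Inner buffer: plays the role of BufferedFileOutputStream (assumption).
    OutputStream inner = new BufferedOutputStream(sink, 64);
    // Outer buffer: plays the role of the buffered encoder (assumption).
    try (OutputStream out = new BufferedOutputStream(inner, 16)) {
      for (int i = 0; i < n; i++) {
        out.write(i);
      }
    }
    return sink;
  }

  public static void main(String[] args) throws IOException {
    CountingSink one = singleBuffered(256);
    CountingSink two = doubleBuffered(256);
    System.out.println("single: " + one.writeCalls + " writes, " + one.bytes + " bytes");
    System.out.println("double: " + two.writeCalls + " writes, " + two.bytes + " bytes");
  }
}
```

Both configurations deliver the same bytes to the sink in the same number of 
write calls, since the inner buffer alone decides when data is flushed. The 
outer layer therefore contributes an extra copy per byte and nothing else, 
which is the overhead a direct encoder would avoid.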



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
