Hi Ryan,

Thank you so much for your reply! You were right about the encoder in the
serializer method; that was my mistake. I attached a PNG rather than plain
text because I thought the syntax highlighting would help.
I may not have been very clear about my question. I understand that with
DatumWriter and DatumReader I can serialize and deserialize a given Avro
GenericRecord, respectively.

My question is, consider several GenericRecords all concatenated into a
single byte array as follows:

*[serializedGenericRecord1, serializedGenericRecord2,
serializedGenericRecord3, etc...]*

How can I deserialize them using the DatumReader API? If this is possible
out of the box, can you point me in the right direction?
Does this make sense?

See the code below (in text this time :) ) if it helps:

public void serialize(final List<Event> events, final UUID schemaId,
                      final ByteBuffer buffer) throws IOException {
    final Schema schema = getAvroSchema(schemaId);
    final ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
    final Encoder encoder = EncoderFactory.get().binaryEncoder(outputStream, null);
    final GenericDatumWriter<GenericRecord> datumWriter = new GenericDatumWriter<>(schema);

    for (final Event event : events) {
        final GenericData.Record record = new GenericData.Record(schema);
        // populate record object
        datumWriter.write(record, encoder);
    }

    // binaryEncoder() buffers its output, so flush before reading the bytes
    encoder.flush();
    outputStream.close();
    buffer.put(outputStream.toByteArray());
}

public List<Event> deserialize(final ByteBuffer buffer, final UUID schemaId)
        throws IOException {
    final List<Event> events = new ArrayList<>();
    final Schema schema = getAvroSchema(schemaId);
    final BinaryDecoder decoder = DecoderFactory.get().binaryDecoder(buffer.array(), null);
    final GenericDatumReader<GenericRecord> datumReader = new GenericDatumReader<>(schema);
    GenericRecord record = new GenericData.Record(schema);

    // How do I loop?
    record = datumReader.read(record, decoder);
    // populate Event object and add to list

    return events;
}
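One way to fill in that loop, sketched under stated assumptions: the records are written back-to-back by a single encoder with no framing, the ByteBuffer is heap-backed (so `array()` is available), and the hypothetical one-field schema and class name below stand in for the real event schema. `BinaryDecoder.isEnd()` reports whether the decoder has consumed all of its input, so reading records until it returns true recovers them in order. The decoder is built from the buffer's position and remaining length rather than from the whole backing array.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;

public class ConcatenatedRecords {
    // Hypothetical schema standing in for the real event schema.
    static final Schema SCHEMA = SchemaBuilder.record("Event").fields()
            .requiredLong("id").endRecord();

    // Writes each record back-to-back with one encoder: no header, no framing.
    static ByteBuffer write(List<Long> ids) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
        GenericDatumWriter<GenericRecord> writer = new GenericDatumWriter<>(SCHEMA);
        for (long id : ids) {
            GenericRecord record = new GenericData.Record(SCHEMA);
            record.put("id", id);
            writer.write(record, encoder);
        }
        encoder.flush(); // the default binary encoder buffers; flush before taking the bytes
        return ByteBuffer.wrap(out.toByteArray());
    }

    // Reads records until the decoder reports that its input is exhausted.
    static List<Long> read(ByteBuffer buffer) throws IOException {
        BinaryDecoder decoder = DecoderFactory.get().binaryDecoder(
                buffer.array(), buffer.arrayOffset() + buffer.position(),
                buffer.remaining(), null);
        GenericDatumReader<GenericRecord> reader = new GenericDatumReader<>(SCHEMA);
        List<Long> ids = new ArrayList<>();
        GenericRecord record = null;
        while (!decoder.isEnd()) {
            record = reader.read(record, decoder); // reuses the record object
            ids.add((Long) record.get("id"));
        }
        return ids;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(read(write(List.of(1L, 2L, 3L)))); // [1, 2, 3]
    }
}
```

Because the record object is reused between iterations, each field has to be copied out (here into the `ids` list) before the next `read` call.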


Thank you once again for your help!

Cheers
Pedro Cardoso

Research Data Engineer

pedro.card...@feedzai.com




[image: Follow Feedzai on Facebook.] <https://www.facebook.com/Feedzai/>[image:
Follow Feedzai on Twitter!] <https://twitter.com/feedzai>[image: Connect
with Feedzai on LinkedIn!] <https://www.linkedin.com/company/feedzai/>
<https://feedzai.com/>[image: Feedzai in Forbes Fintech 50!]
<https://www.forbes.com/fintech/list/>


On Wed, Jan 29, 2020 at 5:34 PM Ryan Skraba <r...@skraba.com> wrote:

> Hello!
>
> It's a bit difficult to figure out what's going wrong. I'm not sure that
> the code in the image corresponds to the exception you are encountering!
> Notably, there's no reference to DataFileStream...  Code is usually much
> easier to work with as TXT than as PNG!
>
> It is definitely possible to serialize Avro GenericRecords into bytes!
> The example code looks like it's using the DataFileWriter (and ignoring the
> Encoder).  Keep in mind that this creates an Avro file (also known as an
> Avro Object Container File or .avro file).  This is more than just "pure"
> serialized bytes: it contains some header information and sync markers,
> which make it easier to split a single file and process it on multiple
> nodes in big data systems.
>
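A minimal sketch of the container-file route kept entirely in memory, assuming a hypothetical one-field schema and class name: `DataFileWriter` writes the header, data blocks and sync markers into a `ByteArrayOutputStream`, and `DataFileStream` reads the same bytes back, taking the schema from the header. Because both sides use the container format, the reader finds the header and sync markers it expects. The output begins with the container magic bytes `O`, `b`, `j`, `1`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;
import org.apache.avro.file.DataFileStream;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;

public class ContainerRoundTrip {
    // Hypothetical schema used only for illustration.
    static final Schema SCHEMA = SchemaBuilder.record("Event").fields()
            .requiredLong("id").endRecord();

    // Writes an Avro object container (header, blocks, sync markers) into memory.
    static byte[] write(List<Long> ids) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (DataFileWriter<GenericRecord> writer =
                     new DataFileWriter<>(new GenericDatumWriter<GenericRecord>(SCHEMA))) {
            writer.create(SCHEMA, out);
            for (long id : ids) {
                GenericRecord record = new GenericData.Record(SCHEMA);
                record.put("id", id);
                writer.append(record);
            }
        } // close() writes the final block
        return out.toByteArray();
    }

    // Reads the container back; the writer's schema comes from the file header.
    static List<Long> read(byte[] bytes) throws IOException {
        List<Long> ids = new ArrayList<>();
        try (DataFileStream<GenericRecord> stream = new DataFileStream<>(
                new ByteArrayInputStream(bytes), new GenericDatumReader<GenericRecord>())) {
            for (GenericRecord record : stream) {
                ids.add((Long) record.get("id"));
            }
        }
        return ids;
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = write(List.of(1L, 2L, 3L));
        System.out.println(read(bytes)); // [1, 2, 3]
    }
}
```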
> If you were to use a DatumWriter and an encoder, you could obtain just the
> "pure" binary data without any framing bytes.  If that is your goal, I
> suggest looking into the DatumWriter / DatumReader classes (as opposed to
> the DataFileXxx classes).
>
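To see what the "pure" bytes look like, here is a minimal sketch, again with a hypothetical one-field schema and class name, that encodes a single record with a `GenericDatumWriter` and decodes it with a `GenericDatumReader`. Avro writes a long as a zig-zag varint, so a record whose only field holds 1 becomes exactly one byte, 0x02. There is no header, so the reader has to be given the schema separately.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;

public class PureBinary {
    // Hypothetical one-field schema.
    static final Schema SCHEMA = SchemaBuilder.record("Event").fields()
            .requiredLong("id").endRecord();

    // Encodes one record with a DatumWriter; the output has no header or sync marker.
    static byte[] encode(long id) throws IOException {
        GenericRecord record = new GenericData.Record(SCHEMA);
        record.put("id", id);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
        new GenericDatumWriter<GenericRecord>(SCHEMA).write(record, encoder);
        encoder.flush();
        return out.toByteArray();
    }

    // Decodes the bytes; the schema must be known out of band.
    static long decode(byte[] bytes) throws IOException {
        GenericRecord record = new GenericDatumReader<GenericRecord>(SCHEMA)
                .read(null, DecoderFactory.get().binaryDecoder(bytes, null));
        return (Long) record.get("id");
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = encode(1L);
        // zig-zag maps 1 to 2, which fits in a single varint byte
        System.out.println(bytes.length + " byte(s), first = " + bytes[0]); // 1 byte(s), first = 2
        System.out.println(decode(bytes)); // 1
    }
}
```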
> From the given exception "Invalid sync" it looks like you might be writing
> pure Avro bytes and attempting to read the file format.
>
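The mismatch can be reproduced with a small sketch (hypothetical schema and class names): raw `DatumWriter` output does not start with the container magic bytes, so `DataFileStream` throws an `IOException` when asked to open it. This does not reproduce the exact "Invalid sync!" path from the stack trace, which depends on how the bytes were assembled; it only shows that the two formats are not interchangeable.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;
import org.apache.avro.file.DataFileStream;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.EncoderFactory;

public class FormatMismatch {
    // Hypothetical one-field schema.
    static final Schema SCHEMA = SchemaBuilder.record("Event").fields()
            .requiredLong("id").endRecord();

    // Raw DatumWriter output: only the encoded fields, no "Obj" magic header.
    static byte[] rawBytes() throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
        GenericDatumWriter<GenericRecord> writer = new GenericDatumWriter<>(SCHEMA);
        for (long id = 1; id <= 10; id++) {
            GenericRecord record = new GenericData.Record(SCHEMA);
            record.put("id", id);
            writer.write(record, encoder);
        }
        encoder.flush();
        return out.toByteArray();
    }

    // Returns true if DataFileStream refuses to open the bytes as a container file.
    static boolean rejectedAsContainer(byte[] bytes) {
        try (DataFileStream<GenericRecord> stream = new DataFileStream<>(
                new ByteArrayInputStream(bytes), new GenericDatumReader<GenericRecord>())) {
            return false;
        } catch (IOException e) {
            System.out.println("Rejected: " + e.getMessage());
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(rejectedAsContainer(rawBytes())); // true
    }
}
```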
> Since the DatumWriter API uses OutputStream (instead of ByteBuffer),
> there's a utility class called ByteBufferOutputStream that you might find
> interesting.  It permits writing to a series of 8K java.nio.ByteBuffer
> instances, which might be OK for your use case.  There are other
> implementations of ByteBuffer-backed OutputStreams available that might be
> better suited.
>
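A minimal sketch of the ByteBufferOutputStream route, assuming a hypothetical one-field schema and class name: records are encoded into `org.apache.avro.util.ByteBufferOutputStream`, whose `getBufferList()` returns the written chunks already flipped for reading, and the chunks are then copied into a single heap `ByteBuffer`, roughly the shape the `serialize(List<Event>, UUID, ByteBuffer)` method in this thread needs. Copying is one design choice; keeping the list of buffers avoids the copy if the caller can work with several buffers.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.util.List;

import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.EncoderFactory;
import org.apache.avro.util.ByteBufferOutputStream;

public class ByteBufferOutput {
    // Hypothetical one-field schema standing in for the real one.
    static final Schema SCHEMA = SchemaBuilder.record("Event").fields()
            .requiredLong("id").endRecord();

    // Encodes records into ByteBufferOutputStream, then copies its chunks into one heap ByteBuffer.
    static ByteBuffer write(List<Long> ids) throws IOException {
        ByteBufferOutputStream out = new ByteBufferOutputStream();
        BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
        GenericDatumWriter<GenericRecord> writer = new GenericDatumWriter<>(SCHEMA);
        for (long id : ids) {
            GenericRecord record = new GenericData.Record(SCHEMA);
            record.put("id", id);
            writer.write(record, encoder);
        }
        encoder.flush(); // push the encoder's internal buffer into the stream
        List<ByteBuffer> chunks = out.getBufferList(); // chunks come back flipped, ready to read
        int total = 0;
        for (ByteBuffer chunk : chunks) {
            total += chunk.remaining();
        }
        ByteBuffer target = ByteBuffer.allocate(total);
        for (ByteBuffer chunk : chunks) {
            target.put(chunk);
        }
        target.flip(); // position 0, limit = total bytes written
        return target;
    }

    public static void main(String[] args) throws IOException {
        ByteBuffer buffer = write(List.of(1L, 2L, 3L));
        // Each small long is a one-byte zig-zag varint: 1 -> 2, 2 -> 4, 3 -> 6.
        System.out.println(buffer.remaining() + " bytes: "
                + buffer.get(0) + " " + buffer.get(1) + " " + buffer.get(2)); // 3 bytes: 2 4 6
    }
}
```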
> I hope this is useful, Ryan
>
>
> On Wed, Jan 29, 2020 at 4:22 PM Pedro Cardoso <pedro.card...@feedzai.com>
> wrote:
>
>> Hello,
>>
>> I am trying to write a sequence of Avro GenericRecords into a Java
>> ByteBuffer and later deserialize them. I have tried using
>> FileWriter/Readers and copying the content of the underlying buffer to my
>> target object. The alternative is to split the ByteBuffer into the
>> individual serialized GenericRecords and use a BinaryDecoder to read each
>> property of a record one at a time.
>>
>> Please see the attached example of the former approach.
>> The presented code fails with
>>
>> org.apache.avro.AvroRuntimeException: java.io.IOException: Invalid sync!
>> at org.apache.avro.file.DataFileStream.hasNext(DataFileStream.java:223)
>> at com.feedzai.research.experiments.bookkeeper.Avro.main(Avro.java:97)
>> Caused by: java.io.IOException: Invalid sync!
>> at org.apache.avro.file.DataFileStream.nextRawBlock(DataFileStream.java:318)
>> at org.apache.avro.file.DataFileStream.hasNext(DataFileStream.java:212)
>> ... 1 more
>>
>> Hence my questions are:
>>  - Is it at all possible to serialize/deserialize lists of Avro records
>> to a ByteBuffer and back?
>>  - If so, can anyone point me in the right direction?
>>  - If not, can anyone point me to code examples of alternative solutions?
>>
>> Thank you and have a good day.
>>
>> Pedro Cardoso
>>
>> Research Data Engineer
>>
>> pedro.card...@feedzai.com
>>
>> *The content of this email is confidential and intended for the recipient
>> specified in message only. It is strictly prohibited to share any part of
>> this message with any third party, without a written consent of the sender.
>> If you received this message by mistake, please reply to this message and
>> follow with its deletion, so that we can ensure such a mistake does not
>> occur in the future.*
>
>
