Regarding "self-imposed rules":

1) The binary protocol is essentially Ignite's type system. It is used in many places: Java, .NET, C++, SQL (native, JDBC, ODBC), and the thin client. Of those five parties, only Java has the "ByteBuffer" concept. How should the other platforms work with this type?

2) We compare objects using their binary representation (BinaryArrayIdentityResolver#equals0). So if one user writes data as a byte array, and another writes the same data as a ByteBuffer, the resulting objects will not be equal.

These are clear examples of why the type system should be kept as small as possible - we simply cannot afford to extend it for every minor use case. The binary protocol is an internal matter and has nothing to do with user experience.
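The identity-resolver point can be made concrete with a minimal sketch. This is not Ignite code; the type codes and one-byte-tag layout are hypothetical, standing in for how a binary protocol tags values, but the effect is the same: a byte-wise comparison sees two different objects even when the logical payload is identical.

```java
import java.util.Arrays;

public class BinaryEqualityDemo {
    // Hypothetical type codes, mimicking how a binary protocol tags values.
    static final byte TYPE_BYTE_ARRAY = 12;
    static final byte TYPE_BYTE_BUFFER = 42; // the proposed new type

    // Serialize: one type-code byte followed by the raw payload.
    static byte[] serialize(byte typeCode, byte[] payload) {
        byte[] out = new byte[payload.length + 1];
        out[0] = typeCode;
        System.arraycopy(payload, 0, out, 1, payload.length);
        return out;
    }

    public static void main(String[] args) {
        byte[] payload = {1, 2, 3};

        byte[] asArray = serialize(TYPE_BYTE_ARRAY, payload);
        byte[] asBuffer = serialize(TYPE_BYTE_BUFFER, payload);

        // A byte-wise identity resolver compares serialized forms,
        // so the same logical data written under two type tags differs.
        System.out.println(Arrays.equals(asArray, asBuffer)); // false
    }
}
```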
What is worse, ByteBuffer is a terrible abstraction, as recognized by many millions of Java users - it is stateful. One cannot read data from it without changing its state. And Ignite gives no guarantee that a particular object will be serialized only once per operation. One example is BinaryMetadataCollector - Ignite may go through an object twice per serialization cycle to collect metadata. How are we going to maintain idempotence in the general case?

What could improve UX is public interfaces. A rough example of what we could do:

1) Define generic wrappers over our streams that give users the ability to read/write byte arrays however they want. E.g.:

interface BinaryArrayWriter {
    void write(byte[] src, int offset, int len);
    void write(ByteBuffer src, ...);
}

interface BinaryArrayReader {
    int readLength();
    byte[] read();
    void read(byte[] dest, int off, int len);
    void read(ByteBuffer dest);
}

2) Then add new methods to the reader/writer/builder. E.g.:

// Consumes the original field and an array writer:
BinaryWriter.writeByteArray(String name, BiConsumer<Object, BinaryArrayWriter> consumer);

// Takes an array reader, produces the original field:
BinaryReader.readByteArray(String name, Function<BinaryArrayReader, Object> function);

3) Last, introduce a field annotation for user convenience:

class MyClass {
    @BinaryArray(writerClass = MyWriterBiConsumer.class, readerClass = MyReaderFunction.class)
    private ByteBuffer buf;
}

Now all the user needs to do is implement the write consumer and the read function.

Vladimir.

On Tue, Nov 7, 2017 at 10:16 AM, Vladimir Ozerov <voze...@gridgain.com> wrote:

> ByteBuffer is not a fundamental data type. It is a thin wrapper over an array.
> The protocol should contain only fundamental types.
>
> On Tue, Nov 7, 2017 at 10:05 AM, Andrey Kornev <andrewkor...@hotmail.com> wrote:
>
>> Vladimir,
>>
>> Could you please elaborate as to why adding ByteBuffer to the Binary
>> protocol is "not a good idea"? What is inherently wrong about it?
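The statefulness problem described above is easy to demonstrate. In this minimal sketch, the drain helper is hypothetical and merely stands in for a serializer pass over a field; if Ignite walks the object twice (e.g. once for metadata collection, once for the actual write), the second pass sees an empty buffer.

```java
import java.nio.ByteBuffer;

public class ByteBufferStateDemo {
    // Stand-in for a serializer pass that consumes a buffer's remaining bytes.
    static byte[] drain(ByteBuffer buf) {
        byte[] out = new byte[buf.remaining()];
        buf.get(out); // advances the buffer's position
        return out;
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.wrap(new byte[] {1, 2, 3});

        // First pass (e.g. metadata collection) reads all three bytes...
        System.out.println(drain(buf).length); // 3

        // ...so a second pass over the "same" object sees nothing.
        System.out.println(drain(buf).length); // 0
    }
}
```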
>>
>> As for your suggestion, while I certainly see your point, it comes with
>> the inconvenience of implementing Binarylizable for every entity type in
>> an application, which makes me wonder if self-imposed rules such as "the
>> type system should be kept as small as possible" could be relaxed in the
>> name of the greater good and user friendliness. The Ignite API is not easy
>> to grok as it is already.
>>
>> Thanks
>> Andrey
>> _____________________________
>> From: Vladimir Ozerov <voze...@gridgain.com>
>> Sent: Monday, November 6, 2017 10:22 PM
>> Subject: Re: Deserialization of BinaryObject's byte arrays
>> To: <dev@ignite.apache.org>
>> Cc: Valentin Kulichenko <valentin.kuliche...@gmail.com>
>>
>> Hi Andrey,
>>
>> While we definitely need to support byte buffers, adding a new type to the
>> protocol is not a very good idea. The binary protocol is the heart of our
>> system, which is used not only for plain Java, but for other platforms and
>> SQL. And ByteBuffer is essentially the same byte array, but with a
>> different API. What you describe is a special case of byte array
>> read/write. For this reason our type system should be kept as small as
>> possible, and it doesn't require a new type to be introduced. Instead,
>> we'd better handle everything at the API level only. For example, we can
>> add new methods to the reader/writer/builder which will handle this. E.g.:
>>
>> BinaryWriter.writeByteArray(<some producer here>);
>> BinaryReader.readByteArray(<some consumer here>);
>>
>> Vladimir.
>>
>> On Mon, Nov 6, 2017 at 9:25 PM, Andrey Kornev <andrewkor...@hotmail.com>
>> wrote:
>>
>> > Will do! Thanks!
>> > ________________________________
>> > From: Valentin Kulichenko <valentin.kuliche...@gmail.com>
>> > Sent: Monday, November 6, 2017 9:17 AM
>> > To: dev@ignite.apache.org
>> > Subject: Re: Deserialization of BinaryObject's byte arrays
>> >
>> > This use case has been discussed a couple of times already, so it seems
>> > to be pretty important for users. And I like the approach suggested by
>> > Andrey.
>> >
>> > Andrey, can you create a Jira ticket for this change? Would you like to
>> > contribute it?
>> >
>> > -Val
>> >
>> > On Fri, Nov 3, 2017 at 4:18 PM, Dmitriy Setrakyan <dsetrak...@apache.org>
>> > wrote:
>> >
>> > > This makes sense to me. Sounds like a useful feature.
>> > >
>> > > It would be nice to hear what others in the community think.
>> > >
>> > > D.
>> > >
>> > > On Fri, Nov 3, 2017 at 1:07 PM, Andrey Kornev <andrewkor...@hotmail.com>
>> > > wrote:
>> > >
>> > > > Hello,
>> > > >
>> > > > We store lots of byte arrays (serialized graph-like data structures)
>> > > > as fields of BinaryObject. Later on, while reading such a field, the
>> > > > BinaryInputStream implementation creates an on-heap array and copies
>> > > > the bytes from the BinaryObject's internal backing array to the new
>> > > > array.
>> > > >
>> > > > While in the general case this works just fine, in our case a lot of
>> > > > CPU is spent on copying the bytes, and a significant amount of
>> > > > garbage is generated as well.
>> > > >
>> > > > I'd like the Ignite community to consider the following proposal:
>> > > > introduce support for serialization/deserialization of the
>> > > > ByteBuffer class. A BinaryInputStream implementation would
>> > > > special-case ByteBuffer fields the same way it currently
>> > > > special-cases various Java types (collections, dates, timestamps,
>> > > > enums, etc.)
>> > > >
>> > > > The application developer would simply call
>> > > > BinaryObjectBuilder.setField(...), passing in an instance of
>> > > > ByteBuffer. During serialization, the ByteBuffer's bytes would
>> > > > simply be copied to the backing array (similar to how regular byte
>> > > > arrays are serialized today), and in the binary object's metadata
>> > > > the field would be marked as, say, GridBinaryMarshaller.BYTE_BUFFER.
>> > > >
>> > > > During deserialization of the field, instead of allocating a new
>> > > > byte array and transferring the bytes from the BinaryObject's
>> > > > backing array, BinaryInputStream would simply wrap the required
>> > > > sub-array in a read-only instance of ByteBuffer using "public static
>> > > > ByteBuffer wrap(byte[] array, int offset, int length)" and return
>> > > > the instance to the application.
>> > > >
>> > > > This approach doesn't require any array allocation or data copying;
>> > > > it is faster and significantly reduces pressure on the garbage
>> > > > collector. The benefits are especially great when an application
>> > > > stores a significant amount of its data as byte arrays.
>> > > >
>> > > > Clearly, this approach applies equally to arrays of other primitive
>> > > > types.
>> > > >
>> > > > A minor complication does arise when dealing with off-heap
>> > > > BinaryObjects, but nothing that can't be solved with a little bit of
>> > > > Java reflection wizardry (or by other means).
>> > > >
>> > > > Thanks,
>> > > > Andrey
>> > > >
>> > >
>> >
>>
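Andrey's zero-copy idea quoted above can be sketched in isolation. The backing-array layout and offsets below are hypothetical (not Ignite's actual binary format); the point is the contrast between today's copying read and a read that wraps the sub-range read-only without allocating or copying the payload.

```java
import java.nio.ByteBuffer;

public class ZeroCopyReadDemo {
    // Hypothetical backing array of a binary object: some header/metadata
    // bytes, with a byte-array field of length 4 starting at offset 3.
    static final byte[] BACKING = {9, 9, 9, 10, 20, 30, 40, 9};

    // Copying read: allocates a new array and copies the bytes
    // (today's behavior when reading a byte-array field).
    static byte[] readCopy(byte[] backing, int off, int len) {
        byte[] out = new byte[len];
        System.arraycopy(backing, off, out, 0, len);
        return out;
    }

    // Zero-copy read: wraps the sub-range in a read-only view;
    // no payload allocation, no copying.
    static ByteBuffer readWrapped(byte[] backing, int off, int len) {
        return ByteBuffer.wrap(backing, off, len).asReadOnlyBuffer();
    }

    public static void main(String[] args) {
        ByteBuffer view = readWrapped(BACKING, 3, 4);

        System.out.println(view.remaining());  // 4
        System.out.println(view.isReadOnly()); // true
    }
}
```

Note that wrap(array, offset, length) sets the buffer's position to offset and its limit to offset + length, so the view exposes exactly the field's bytes while sharing the original array.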