Re: [IMPORTANT] Future of Binary Objects

Alexey Zinoviev Tue, 20 Nov 2018 23:10:40 -0800

I like @Vyacheslav Daradur's approach.

Maybe somebody could have a look at UnsafeRow in Spark:
https://github.com/apache/spark/blob/master/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeRow.java
UnsafeRow is a concrete InternalRow that represents a mutable internal
raw-memory (and hence unsafe) binary row format.
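Here is a minimal Java sketch of the kind of layout UnsafeRow uses. It is a simplified illustration, not Spark's code. The class SketchRow and its methods are hypothetical, only long and string fields are supported, and the caller sizes the variable-length region up front. The row starts with a null bitset. Every field then gets a fixed 8-byte slot, so reading any field is a constant-time offset calculation, and a string's slot holds (offset << 32 | size) pointing into a trailing variable-length region. The row carries no field names or type IDs; the schema lives outside it.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical, simplified UnsafeRow-style layout (not Spark's implementation):
 *   [null bitset, 8-byte words][one 8-byte slot per field][variable-length region]
 * Fixed-width values sit in their slot; a string's slot holds (offset << 32 | size).
 */
final class SketchRow {
    private final ByteBuffer buf;
    private final int bitsetBytes;
    private int cursor; // next free byte in the variable-length region

    SketchRow(int numFields, int varCapacity) {
        this.bitsetBytes = ((numFields + 63) / 64) * 8;
        this.buf = ByteBuffer.allocate(bitsetBytes + numFields * 8 + varCapacity)
                .order(ByteOrder.LITTLE_ENDIAN);
        this.cursor = bitsetBytes + numFields * 8;
    }

    private int slot(int ordinal) { return bitsetBytes + ordinal * 8; }

    void setNull(int ordinal) {
        int word = (ordinal >> 6) * 8;
        buf.putLong(word, buf.getLong(word) | (1L << (ordinal & 63)));
    }

    boolean isNull(int ordinal) {
        return (buf.getLong((ordinal >> 6) * 8) & (1L << (ordinal & 63))) != 0;
    }

    void setLong(int ordinal, long value) { buf.putLong(slot(ordinal), value); }

    long getLong(int ordinal) { return buf.getLong(slot(ordinal)); }

    void setString(int ordinal, String value) {
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        int offset = cursor;
        buf.position(offset);
        buf.put(bytes);
        cursor += (bytes.length + 7) & ~7; // keep the variable-length region 8-byte aligned
        buf.putLong(slot(ordinal), ((long) offset << 32) | bytes.length);
    }

    String getString(int ordinal) {
        long offsetAndSize = buf.getLong(slot(ordinal));
        int offset = (int) (offsetAndSize >>> 32);
        int size = (int) offsetAndSize;
        byte[] bytes = new byte[size];
        for (int i = 0; i < size; i++) bytes[i] = buf.get(offset + i);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    int sizeInBytes() { return cursor; }
}
```

For example, a row with a long, a null and the string "ignite" takes 8 bytes of null bitset, 24 bytes of slots and 8 bytes of (padded) string data, 40 bytes in total.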


P.S. If anyone is interested in this approach, I can share more
information.

On Tue, 20 Nov 2018 at 11:33, Sergi Vladykin <sergi.vlady...@gmail.com> wrote:

> I really like the Protobuf format. It is probably not what we need for O(1)
> field access, but we can borrow a lot from it for compact data
> representation.
>
> Also, IMO, restricting field type changes is an absolutely sane idea.
> The correct way to evolve a schema in the common case is to add new fields
> and gradually deprecate the old ones. If default/null fields can be skipped
> in the binary format, this approach will not introduce any noticeable
> performance or size overhead.
>
> Sergi
>
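Sergi's point about skipping default/null fields can be illustrated with a Protobuf-style tag/varint encoding. The sketch follows the Protobuf wire format (each field is prefixed with a varint key equal to (fieldNumber << 3) | wireType, where wire type 0 is a varint and 2 is length-delimited), but TagEncoder and its methods are hypothetical and are not the protobuf-java API. Fields equal to their default (0 or an empty string) are simply not written, and a reader can skip any field, including one it does not know about, using only its wire type. That is what lets old readers tolerate newly added fields.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical encoder following the Protobuf wire format for varint and length-delimited fields. */
final class TagEncoder {
    static final int WIRE_VARINT = 0;
    static final int WIRE_LEN = 2;

    private final ByteArrayOutputStream out = new ByteArrayOutputStream();

    /** Base-128 varint: 7 bits per byte, high bit set on every byte except the last. */
    private void writeVarint(long v) {
        while ((v & ~0x7FL) != 0) {
            out.write((int) ((v & 0x7F) | 0x80));
            v >>>= 7;
        }
        out.write((int) v);
    }

    private void writeTag(int fieldNumber, int wireType) {
        writeVarint(((long) fieldNumber << 3) | wireType);
    }

    /** Writes a long field; 0 (the default) is not written at all. Negative values take 10 bytes. */
    TagEncoder putLong(int fieldNumber, long value) {
        if (value != 0) {
            writeTag(fieldNumber, WIRE_VARINT);
            writeVarint(value);
        }
        return this;
    }

    /** Writes a string field; null and "" are not written at all. */
    TagEncoder putString(int fieldNumber, String value) {
        if (value != null && !value.isEmpty()) {
            byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
            writeTag(fieldNumber, WIRE_LEN);
            writeVarint(bytes.length);
            out.write(bytes, 0, bytes.length);
        }
        return this;
    }

    byte[] toByteArray() { return out.toByteArray(); }

    /** Lists the field numbers present, skipping each value by its wire type alone. */
    static List<Integer> fieldNumbers(byte[] msg) {
        List<Integer> result = new ArrayList<>();
        int[] pos = {0};
        while (pos[0] < msg.length) {
            long tag = readVarint(msg, pos);
            int wireType = (int) (tag & 0x7);
            result.add((int) (tag >>> 3));
            if (wireType == WIRE_VARINT) {
                readVarint(msg, pos);
            } else if (wireType == WIRE_LEN) {
                int len = (int) readVarint(msg, pos); // read first: pos[0] is advanced by the call
                pos[0] += len;
            } else {
                throw new IllegalArgumentException("unsupported wire type " + wireType);
            }
        }
        return result;
    }

    private static long readVarint(byte[] msg, int[] pos) {
        long result = 0;
        for (int shift = 0; ; shift += 7) {
            byte b = msg[pos[0]++];
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) return result;
        }
    }
}
```

Encoding field 1 = 150 produces the bytes 08 96 01, and a message whose only non-default field is field 3 = 5 takes just 2 bytes.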
> On Tue, 20 Nov 2018 at 11:12, Vyacheslav Daradur <daradu...@gmail.com> wrote:
>
> > I think one possible way to reduce overhead and TCO is the SQL schema
> > approach.
> >
> > It assumes that metadata is stored separately from the serialized data
> > to reduce size. In this case, most of the advantages of Binary Objects,
> > such as O(1) field access and access without deserialization, can be
> > preserved.
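A minimal sketch of the "metadata stored separately" idea, assuming fixed-width fields only. RowSchema is a hypothetical per-type object, built once and shared, that maps each field name to a byte offset. Rows then contain only the values themselves, and a field is read in O(1) with one absolute ByteBuffer read, without deserializing the row.

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical per-type schema, built once and shared by all rows of that type. */
final class RowSchema {
    private final Map<String, Integer> offsets = new HashMap<>();
    private final int rowSize;

    /** @param fieldWidths field name -> width in bytes (4 = int, 8 = long), in storage order */
    RowSchema(LinkedHashMap<String, Integer> fieldWidths) {
        int offset = 0;
        for (Map.Entry<String, Integer> e : fieldWidths.entrySet()) {
            offsets.put(e.getKey(), offset);
            offset += e.getValue();
        }
        this.rowSize = offset;
    }

    int rowSize() { return rowSize; }

    // Each accessor is one lookup in the shared schema plus one absolute read or write on the row.
    long readLong(ByteBuffer row, String field) { return row.getLong(offsets.get(field)); }
    int readInt(ByteBuffer row, String field) { return row.getInt(offsets.get(field)); }
    void writeLong(ByteBuffer row, String field, long v) { row.putLong(offsets.get(field), v); }
    void writeInt(ByteBuffer row, String field, int v) { row.putInt(offsets.get(field), v); }
}
```

With an 8-byte id and a 4-byte age, each row is exactly 12 bytes: field names and type information are paid for once per schema rather than once per row.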
> > On Tue, Nov 20, 2018 at 10:56 AM Vladimir Ozerov <voze...@gridgain.com> wrote:
> > >
> > > Hi Alexey,
> > >
> > > Binary Objects only.
> > >
> > > On Tue, Nov 20, 2018 at 10:50 AM Alexey Zinoviev <zaleslaw....@gmail.com> wrote:
> > >
> > > > Are we discussing only core features here, or the roadmap for all
> > > > components?
> > > >
> > > > On Tue, 20 Nov 2018 at 10:05, Vladimir Ozerov <voze...@gridgain.com> wrote:
> > > >
> > > > > Igniters,
> > > > >
> > > > > It is very likely that Apache Ignite 3.0 will be released next year,
> > > > > so we need to start thinking about major product improvements. I'd
> > > > > like to start with binary objects.
> > > > >
> > > > > Currently they are one of the main limiting factors for the product.
> > > > > They are fat: 30+ bytes of overhead on average, which makes the TCO of
> > > > > Apache Ignite high compared to other vendors. They are slow: not
> > > > > suitable for SQL at all.
> > > > >
> > > > > I would like to ask all of you who have worked with binary objects to
> > > > > share your feedback and ideas, so that we understand what they should
> > > > > look like in AI 3.0. This is a brainstorm: let's accumulate ideas
> > > > > first and keep criticism to a minimum. Then we will work on the ideas
> > > > > in separate threads.
> > > > >
> > > > > 1) Historical background
> > > > >
> > > > > BO were implemented around 2014 (Apache Ignite 1.5), when we started
> > > > > working on the .NET and CPP clients. During the design we had several
> > > > > ideas in mind:
> > > > > - ability to read object fields in O(1) without deserialization
> > > > > - interoperability between Java, .NET and CPP.
> > > > >
> > > > > Since then, a number of other concepts have been mixed into the cocktail:
> > > > > - Affinity key fields
> > > > > - Strict typing for existing fields (aka metadata)
> > > > > - Binary Object as storage format
> > > > >
> > > > > 2) My proposals
> > > > >
> > > > > 2.1) Introduce "Data Row Format" interface
> > > > > Binary Objects are terrible candidates for storage: too fat, too slow.
> > > > > Efficient storage typically has <10 bytes of overhead per row (no
> > > > > metadata, no length, no hash code, etc.), allows super-fast field
> > > > > access, supports different string formats (ASCII, UTF-8, etc.),
> > > > > supports different temporal types (date, time, timestamp, timestamp
> > > > > with time zone, etc.), and stores these types as efficiently as
> > > > > possible.
> > > > >
> > > > > What we need is an interface that converts a key-value pair into a
> > > > > row. This row will be used both to store data and to read fields from
> > > > > it. If you care about memory consumption and need SQL and a strict
> > > > > schema, use one format. If you need flexibility and prefer key-value
> > > > > access, use another format that stores binary objects unchanged (the
> > > > > current behavior).
> > > > >
> > > > > interface DataRowFormat {
> > > > >     DataRow create(Object key, Object value); // primitives or binary objects
> > > > >     DataRowMetadata metadata();
> > > > > }
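A runnable sketch of the proposed DataRowFormat interface. DataRow, DataRowMetadata, PassThroughFormat and SchemaFormat are hypothetical names invented for illustration, and a java.util.Map stands in for a binary object. PassThroughFormat mirrors the current behavior (the key-value pair is stored unchanged), while SchemaFormat flattens the value into a positional array in schema order, which is what would enable compact storage and cheap SQL field access.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

/** Hypothetical row abstraction: positional access to the stored fields. */
interface DataRow {
    Object field(int idx);
}

/** Hypothetical metadata: the field names a format exposes, in positional order. */
interface DataRowMetadata {
    List<String> fieldNames();
}

/** The interface from proposal 2.1. */
interface DataRowFormat {
    DataRow create(Object key, Object value); // primitives or binary objects
    DataRowMetadata metadata();
}

/** Mirrors the current behavior: the key-value pair is kept unchanged. */
final class PassThroughFormat implements DataRowFormat {
    private static final List<String> NAMES =
            Collections.unmodifiableList(Arrays.asList("_key", "_val"));

    @Override
    public DataRow create(Object key, Object value) {
        Object[] kv = {key, value};
        return idx -> kv[idx];
    }

    @Override
    public DataRowMetadata metadata() {
        return () -> NAMES;
    }
}

/** Strict schema: the value is flattened into a positional array in schema order. */
final class SchemaFormat implements DataRowFormat {
    private final List<String> columns; // value columns in storage order

    SchemaFormat(String... columns) {
        this.columns = Collections.unmodifiableList(Arrays.asList(columns));
    }

    @Override
    public DataRow create(Object key, Object value) {
        Map<?, ?> fields = (Map<?, ?>) value; // stand-in for a binary object
        Object[] row = new Object[columns.size() + 1];
        row[0] = key;
        for (int i = 0; i < columns.size(); i++) {
            String col = columns.get(i);
            if (!fields.containsKey(col))
                throw new IllegalArgumentException("missing column: " + col);
            row[i + 1] = fields.get(col);
        }
        return idx -> row[idx];
    }

    @Override
    public DataRowMetadata metadata() {
        List<String> names = new ArrayList<>();
        names.add("_key");
        names.addAll(columns);
        return () -> Collections.unmodifiableList(names);
    }
}
```

The cache would pick one format per table, so the choice between "compact and strict" and "flexible key-value" is made once in configuration rather than per object.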
> > > > >
> > > > > 2.2) Remove affinity field from metadata
> > > > > Affinity rules are governed by the cache, not the type. We should
> > > > > remove "affinityFieldName" from the metadata.
> > > > >
> > > > > 2.3) Remove restrictions on changing field type
> > > > > I do not know why we did that in the first place. This restriction
> > > > > prevents type evolution and confuses users.
> > > > >
> > > > > 2.4) Use bitmaps for "null" and default values; put fixed-length
> > > > > fields before variable-length ones.
> > > > > Motivation: to save space.
> > > > >
> > > > > What else? Please share your ideas.
> > > > >
> > > > > Vladimir.
> > > > >
> > > >
> >
> >
> >
> > --
> > Best Regards, Vyacheslav D.
> >
>
