I like @Vyacheslav Daradur's approach. It may be worth having a look at UnsafeRow in Spark:
https://github.com/apache/spark/blob/master/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeRow.java

UnsafeRow is a concrete InternalRow that represents a mutable, internal, raw-memory (and hence unsafe) binary row format.
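
To make the idea concrete, here is a rough sketch of that kind of layout: a null bitmap, then one fixed 8-byte slot per field, then a variable-length region. It is a simplified illustration and not Spark's actual implementation. The class name RowSketch and all of its methods are made up for this example, and it uses a heap ByteBuffer instead of raw memory.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical sketch of an UnsafeRow-like layout (not Spark's code):
 * [null bitmap, in 8-byte words][one 8-byte slot per field][variable-length data]
 */
final class RowSketch {
    private final ByteBuffer buf;
    private final int bitmapBytes;
    private int varCursor; // next free byte in the variable-length region

    RowSketch(int fieldCount, int varDataCapacity) {
        // One bit per field, rounded up to whole 8-byte words.
        this.bitmapBytes = ((fieldCount + 63) / 64) * 8;
        this.varCursor = bitmapBytes + 8 * fieldCount;
        this.buf = ByteBuffer.allocate(varCursor + varDataCapacity);
    }

    /** Slot position is pure arithmetic on the ordinal, hence O(1) access. */
    private int slot(int i) {
        return bitmapBytes + 8 * i;
    }

    void setNull(int i) {
        int word = (i >> 6) * 8;
        buf.putLong(word, buf.getLong(word) | (1L << (i & 63)));
    }

    boolean isNull(int i) {
        return (buf.getLong((i >> 6) * 8) & (1L << (i & 63))) != 0;
    }

    /** Fixed-length values live directly in the slot. */
    void putLong(int i, long v) {
        buf.putLong(slot(i), v);
    }

    long getLong(int i) {
        return buf.getLong(slot(i));
    }

    /** Variable-length values: the slot holds (offset << 32) | length. */
    void putString(int i, String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        // Throws IndexOutOfBoundsException if the variable region is full.
        System.arraycopy(bytes, 0, buf.array(), varCursor, bytes.length);
        buf.putLong(slot(i), ((long) varCursor << 32) | bytes.length);
        varCursor += bytes.length;
    }

    String getString(int i) {
        long v = buf.getLong(slot(i));
        int offset = (int) (v >>> 32);
        int length = (int) v;
        return new String(buf.array(), offset, length, StandardCharsets.UTF_8);
    }
}
```

Any field is located by arithmetic on its ordinal, so a read touches only the bitmap word and one slot and does not need to deserialize the rest of the row.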

P.S. If anybody is interested in this approach, I can share more information.

On Tue, Nov 20, 2018 at 11:33, Sergi Vladykin <sergi.vlady...@gmail.com> wrote:

> I really like the Protobuf format. It is probably not what we need for
> O(1) field access, but for compact data representation we can derive a
> lot from it.
>
> Also, IMO, restricting field type changes is an absolutely sane idea.
> The correct way to evolve a schema in the common case is to add new
> fields and gradually deprecate the old ones. If you can skip
> default/null fields in the binary format, this approach will not
> introduce any noticeable performance/size overhead.
>
> Sergi
>
> On Tue, Nov 20, 2018 at 11:12, Vyacheslav Daradur <daradu...@gmail.com>
> wrote:
>
> > I think one possible way to reduce overhead and TCO is the SQL schema
> > approach.
> >
> > It assumes that metadata will be stored separately from serialized
> > data to reduce size.
> > In this case, most of the advantages of Binary Objects, like access
> > in O(1) and access without deserialization, may be achieved.
> >
> > On Tue, Nov 20, 2018 at 10:56 AM Vladimir Ozerov
> > <voze...@gridgain.com> wrote:
> > >
> > > Hi Alexey,
> > >
> > > Binary Objects only.
> > >
> > > On Tue, Nov 20, 2018 at 10:50 AM Alexey Zinoviev
> > > <zaleslaw....@gmail.com> wrote:
> > >
> > > > Do we discuss here Core features only or the roadmap for all
> > > > components?
> > > >
> > > > On Tue, Nov 20, 2018 at 10:05, Vladimir Ozerov
> > > > <voze...@gridgain.com> wrote:
> > > >
> > > > > Igniters,
> > > > >
> > > > > It is very likely that Apache Ignite 3.0 will be released next
> > > > > year. So we need to start thinking about major product
> > > > > improvements. I'd like to start with binary objects.
> > > > >
> > > > > Currently they are one of the main limiting factors for the
> > > > > product. They are fat - 30+ bytes overhead on average, high TCO
> > > > > of Apache Ignite compared to other vendors. They are slow - not
> > > > > suitable for SQL at all.
> > > > >
> > > > > I would like to ask all of you who worked with binary objects
> > > > > to share your feedback and ideas, so that we understand what
> > > > > they should look like in AI 3.0. This is a brainstorm - let's
> > > > > accumulate ideas first and minimize criticism. Then we will
> > > > > work on ideas in separate topics.
> > > > >
> > > > > 1) Historical background
> > > > >
> > > > > BO were implemented around 2014 (Apache Ignite 1.5) when we
> > > > > started working on .NET and CPP clients. During design we had
> > > > > several ideas in mind:
> > > > > - ability to read object fields in O(1) without deserialization
> > > > > - interoperability between Java, .NET and CPP.
> > > > >
> > > > > Since then a number of other concepts were mixed into the
> > > > > cocktail:
> > > > > - Affinity key fields
> > > > > - Strict typing for existing fields (aka metadata)
> > > > > - Binary Object as storage format
> > > > >
> > > > > 2) My proposals
> > > > >
> > > > > 2.1) Introduce "Data Row Format" interface
> > > > > Binary Objects are terrible candidates for storage. Too fat,
> > > > > too slow. Efficient storage typically has <10 bytes overhead
> > > > > per row (no metadata, no length, no hash code, etc), allows
> > > > > super-fast field access, supports different string formats
> > > > > (ASCII, UTF-8, etc), supports different temporal types (date,
> > > > > time, timestamp, timestamp with timezone, etc), and stores
> > > > > these types as efficiently as possible.
> > > > >
> > > > > What we need is to introduce an interface which will convert a
> > > > > pair of key-value objects into a row. This row will be used to
> > > > > store data and to get fields from it. Care about memory
> > > > > consumption, need SQL and strict schema - use one format. Need
> > > > > flexibility and prefer key-value access - use another format
> > > > > which will store binary objects unchanged (current behavior).
> > > > >
> > > > > interface DataRowFormat {
> > > > >     DataRow create(Object key, Object value); // primitives or binary objects
> > > > >     DataRowMetadata metadata();
> > > > > }
> > > > >
> > > > > 2.2) Remove affinity field from metadata
> > > > > Affinity rules are governed by cache, not type. We should
> > > > > remove "affintiyFieldName" from metadata.
> > > > >
> > > > > 2.3) Remove restrictions on changing field type
> > > > > I do not know why we did that in the first place. This
> > > > > restriction prevents type evolution and confuses users.
> > > > >
> > > > > 2.4) Use bitmaps for "null" and default values and for
> > > > > fixed-length fields, put fixed-length fields before
> > > > > variable-length.
> > > > > Motivation: to save space.
> > > > >
> > > > > What else? Please share your ideas.
> > > > >
> > > > > Vladimir.
> > > > >
> >
> > --
> > Best Regards, Vyacheslav D.
> >