In short: during marshalling, each field is represented by a BinaryFieldAccessor, which manages its marshalling. The accessor checks whether the field is marked with the @BinaryCompression annotation; if it is, the field's binary representation (a byte array) is compressed and tagged with a type constant (GridBinaryMarshaller.COMPRESSED). The compressed byte array is then included in the binary representation of the whole object. Note that the header of the marshalled object is not compressed; compression affects only the representation of the object's fields.
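To make the scheme above easier to discuss, here is a minimal, self-contained sketch of the write path and the matching read path. It is not the code from the PR: the class name, the marker values and the byte layout are hypothetical, and java.util.zip's Deflater/Inflater stand in for whichever compressor is finally chosen.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Illustrative sketch only; these are not the actual Ignite classes. */
final class FieldCompressionSketch {
    /** Hypothetical type markers; the real GridBinaryMarshaller constants may differ. */
    static final byte RAW = 0;
    static final byte COMPRESSED = 1;

    /** Write path: wraps already-serialized field bytes, compressing them only if the field is annotated. */
    static byte[] writeField(byte[] fieldBytes, boolean annotated) {
        if (!annotated)
            return ByteBuffer.allocate(1 + fieldBytes.length).put(RAW).put(fieldBytes).array();

        Deflater deflater = new Deflater();
        deflater.setInput(fieldBytes);
        deflater.finish();

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!deflater.finished())
            out.write(buf, 0, deflater.deflate(buf));
        deflater.end();

        byte[] packed = out.toByteArray();

        // Assumed layout: marker byte, original length (int), compressed payload.
        return ByteBuffer.allocate(1 + 4 + packed.length)
            .put(COMPRESSED).putInt(fieldBytes.length).put(packed).array();
    }

    /** Read path: returns the serialized field bytes, decompressing when the marker says so. */
    static byte[] readField(byte[] stored) {
        ByteBuffer in = ByteBuffer.wrap(stored);

        if (in.get() == RAW)
            return Arrays.copyOfRange(stored, 1, stored.length);

        byte[] result = new byte[in.getInt()];
        Inflater inflater = new Inflater();
        inflater.setInput(stored, 5, stored.length - 5);

        try {
            int off = 0;
            while (off < result.length) {
                int n = inflater.inflate(result, off, result.length - off);
                if (n == 0)
                    throw new IllegalStateException("Truncated compressed field");
                off += n;
            }
        }
        catch (DataFormatException e) {
            throw new IllegalStateException("Corrupted compressed field", e);
        }
        finally {
            inflater.end();
        }

        return result;
    }
}
```

In this sketch a field that is not annotated costs one extra marker byte and no compression work, which is the "no changes for existing users" property discussed in this thread; the object header is never passed through writeField at all.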
Objects in IgniteCache are represented as BinaryObject, which is a wrapper over the byte array of the marshalled object. BinaryObject provides some useful methods that are used by Ignite subsystems. For example, queries use the BinaryObject#field method, which deserializes a single field without deserializing the whole object. If BinaryObject#field encounters the compressed-type constant during deserialization, it decompresses the byte array and then continues unmarshalling as usual.

For now, I have introduced the Compressor interface in IgniteConfiguration; it allows users to plug in their own compressor implementation, which is a requirement of the task [1]. As far as I know, Vladimir Ozerov doesn't like the idea of giving this option to the user. In that case we can choose a compression algorithm to provide by default and move the interface into the internals of the binary infrastructure. For this case I've prepared the benchmarks that I sent earlier.

I vote for the ZSTD algorithm [2]: it provides a good compression ratio and good throughput. It has implementations in Java, .NET and C++ and an ASF-friendly license, so we can use it on all Ignite platforms. You can look at an assessment of this algorithm in my benchmarks.

[1] https://issues.apache.org/jira/browse/IGNITE-3592
[2] https://github.com/facebook/zstd

2017-06-06 16:02 GMT+03:00 Антон Чураев <churaev...@gmail.com>:

> Looks good to me.
>
> Could you propose the design of the implementation in a couple of sentences,
> so that we can estimate the completeness and complexity of the proposal?
>
> 2017-06-06 15:26 GMT+03:00 Vyacheslav Daradur <daradu...@gmail.com>:
>
>> Anton,
>>
>> Of course, the solution does not affect the existing implementation. I mean,
>> there are no changes if the user does not use the @BinaryCompression annotation.
>> (no performance changes)
>> Only if the user decides to use compression on a specific field or fields
>> of a class will compression be applied at marshalling, and only to the
>> annotated fields.
>>
>> 2017-06-06 15:10 GMT+03:00 Антон Чураев <churaev...@gmail.com>:
>>
>>> Vyacheslav,
>>>
>>> Is it possible to propose an implementation that can be switched on on demand?
>>> In this case it should not affect the performance of the current solution.
>>>
>>> I mean that users should decide what is more important for them:
>>> throughput or memory/network usage.
>>> Maybe they will choose to compress not all objects, or only some attributes
>>> of objects.
>>>
>>> 2017-06-06 14:48 GMT+03:00 Vyacheslav Daradur <daradu...@gmail.com>:
>>>
>>>> Conclusion:
>>>> The provided solution allows reducing the size of an object in IgniteCache
>>>> at the cost of a throughput reduction (small in some cases); it depends on
>>>> which part of the object is compressed and on the compression algorithm.
>>>> I mean, we can make more effective use of memory, and in some cases it can
>>>> reduce the load on the interconnect (replication, rebalancing).
>>>>
>>>> It will be particularly useful for object fields that hold large text
>>>> (>~ 250 bytes) and can be compressed effectively.
>>>>
>>>> 2017-06-06 12:00 GMT+03:00 Антон Чураев <churaev...@gmail.com>:
>>>>
>>>>> Vyacheslav, thank you! But could you please provide conclusions or
>>>>> proposals based on these benchmarks?
>>>>>
>>>>> 2017-06-06 11:28 GMT+03:00 Vyacheslav Daradur <daradu...@gmail.com>:
>>>>>
>>>>>> Dmitry,
>>>>>>
>>>>>> Excel-pages:
>>>>>>
>>>>>> 1). "Compression ratio (2)" - shows object size, with compression and
>>>>>> without compression.
>>>>>> (Conditions: literal text)
>>>>>> 1st graph shows compression ratios of different compression algorithms
>>>>>> depending on the size of the compressed field.
>>>>>> 2nd graph shows the size of objects depending on sizes and compression
>>>>>> algorithms.
>>>>>>
>>>>>> 2). "Compression ratio (1)" - shows object size, with compression and
>>>>>> without compression. (Conditions: badly compressed character sequence)
>>>>>> 1st graph shows compression ratios of different compression algorithms
>>>>>> depending on the size of the compressed field.
>>>>>> 2nd graph shows the size of objects depending on sizes and compression
>>>>>> algorithms.
>>>>>>
>>>>>> 3) "put-avg" - shows average time of the "put" operation depending on
>>>>>> size and compression algorithms.
>>>>>>
>>>>>> 4) "put-thrpt" - shows throughput of the "put" operation depending on
>>>>>> size and compression algorithms.
>>>>>>
>>>>>> 5) "get-avg" - shows average time of the "get" operation depending on
>>>>>> size and compression algorithms.
>>>>>>
>>>>>> 6) "get-thrpt" - shows throughput of the "get" operation depending on
>>>>>> size and compression algorithms.
>>>>>>
>>>>>> 2017-06-06 10:59 GMT+03:00 Dmitriy Setrakyan <dsetrak...@apache.org>:
>>>>>>
>>>>>>> Vladimir, I am not sure how to interpret the graphs? What are we
>>>>>>> looking at?
>>>>>>>
>>>>>>> On Tue, Jun 6, 2017 at 12:33 AM, Vyacheslav Daradur <daradu...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi, Igniters.
>>>>>>>>
>>>>>>>> I've prepared some benchmarking. Results [1].
>>>>>>>>
>>>>>>>> And I've prepared the evaluation in the form of diagrams [2].
>>>>>>>>
>>>>>>>> I hope that helps to interest the community and accelerates a reaction
>>>>>>>> to this improvement :)
>>>>>>>>
>>>>>>>> [1] https://github.com/daradurvs/ignite-compression/tree/master/src/main/resources/result
>>>>>>>> [2] https://drive.google.com/file/d/0B2CeUAOgrHkoMklyZ25YTEdKcEk/view
>>>>>>>>
>>>>>>>> 2017-05-24 9:49 GMT+03:00 Vyacheslav Daradur <daradu...@gmail.com>:
>>>>>>>>
>>>>>>>>> Guys, any thoughts?
>>>>>>>>>
>>>>>>>>> 2017-05-16 13:40 GMT+03:00 Vyacheslav Daradur <daradu...@gmail.com>:
>>>>>>>>>
>>>>>>>>>> Hi guys,
>>>>>>>>>>
>>>>>>>>>> I've prepared the PR to show my idea.
>>>>>>>>>> https://github.com/apache/ignite/pull/1951/files
>>>>>>>>>>
>>>>>>>>>> About querying - I've just copied existing tests and have annotated
>>>>>>>>>> the testing data.
>>>>>>>>>> https://github.com/apache/ignite/pull/1951/files#diff-c19a9df4058141d059bb577e75244764
>>>>>>>>>>
>>>>>>>>>> It means fields which are marked with @BinaryCompression will be
>>>>>>>>>> compressed at marshalling via BinaryMarshaller.
>>>>>>>>>>
>>>>>>>>>> This solution has no effect on existing data or project architecture.
>>>>>>>>>>
>>>>>>>>>> I'll be glad to see your thoughts.
>>>>>>>>>>
>>>>>>>>>> 2017-05-15 19:18 GMT+03:00 Vyacheslav Daradur <daradu...@gmail.com>:
>>>>>>>>>>
>>>>>>>>>>> Dmitriy,
>>>>>>>>>>>
>>>>>>>>>>> I have a ready prototype. I want to show it.
>>>>>>>>>>> It is always easier to discuss with an example.
>>>>>>>>>>>
>>>>>>>>>>> 2017-05-15 19:02 GMT+03:00 Dmitriy Setrakyan <dsetrak...@apache.org>:
>>>>>>>>>>>
>>>>>>>>>>>> Vyacheslav,
>>>>>>>>>>>>
>>>>>>>>>>>> I think it is a bit premature to provide a PR without getting a
>>>>>>>>>>>> community consensus on the dev list. Please allow some time for the
>>>>>>>>>>>> community to respond.
>>>>>>>>>>>>
>>>>>>>>>>>> D.
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, May 15, 2017 at 6:36 AM, Vyacheslav Daradur
>>>>>>>>>>>> <daradu...@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> I created the ticket:
>>>>>>>>>>>>> https://issues.apache.org/jira/browse/IGNITE-5226
>>>>>>>>>>>>>
>>>>>>>>>>>>> I'll prepare a PR with the described solution in a couple of days.
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2017-05-15 15:05 GMT+03:00 Vyacheslav Daradur <daradu...@gmail.com>:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi, Igniters!
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Apache 2.0 is released.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Let's continue the discussion about a compression design.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> At the moment, I have found only one solution which is compatible
>>>>>>>>>>>>>> with querying and indexing: per-object-field compression.
>>>>>>>>>>>>>> Per-field compression means that the metadata (the header) of an
>>>>>>>>>>>>>> object won't be compressed; only the serialized values of the
>>>>>>>>>>>>>> object's fields (in byte array form) will be compressed.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This solution has some contentious issues:
>>>>>>>>>>>>>> - small values, like primitives and short arrays - it makes no
>>>>>>>>>>>>>>   sense to compress them;
>>>>>>>>>>>>>> - it is not possible to use compression with Java predefined types.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> We can provide an annotation, @IgniteCompression for example, which
>>>>>>>>>>>>>> users can use to mark the fields to compress.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Any thoughts?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Maybe someone already has a ready design?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2017-04-10 11:06 GMT+03:00 Vyacheslav Daradur <daradu...@gmail.com>:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Alexey,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Yes, I've read it.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Ok, let's discuss the public API design.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I think we need to add a configuration entity to CacheConfiguration,
>>>>>>>>>>>>>>> which will contain the Compressor interface implementation and some
>>>>>>>>>>>>>>> useful parameters.
>>>>>>>>>>>>>>> Or maybe provide a BinaryMarshaller decorator, which will compress
>>>>>>>>>>>>>>> data after marshalling.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 2017-04-10 10:40 GMT+03:00 Alexey Kuznetsov <akuznet...@apache.org>:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Vyacheslav,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Did you read the initial discussion [1] about compression?
>>>>>>>>>>>>>>>> As far as I remember we agreed to add only some "top-level" API in
>>>>>>>>>>>>>>>> order to provide a way for Ignite users to inject some sort of
>>>>>>>>>>>>>>>> custom compression.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> [1] http://apache-ignite-developers.2346864.n4.nabble.com/Data-compression-in-Ignite-2-0-td10099.html
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Mon, Apr 10, 2017 at 2:19 PM, daradurvs <daradu...@gmail.com>
>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hi Igniters!
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I am interested in this task.
>>>>>>>>>>>>>>>>> Provide some kind of pluggable compression SPI support
>>>>>>>>>>>>>>>>> <https://issues.apache.org/jira/browse/IGNITE-3592>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I developed a solution at the BinaryMarshaller level, but the
>>>>>>>>>>>>>>>>> reviewer rejected it.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Let's continue the discussion of task goals and solution design.
>>>>>>>>>>>>>>>>> As I understand it, the main goal of this task is to store data in
>>>>>>>>>>>>>>>>> compressed form.
>>>>>>>>>>>>>>>>> This is what I need from Ignite as its user. Compression provides
>>>>>>>>>>>>>>>>> economy on servers.
>>>>>>>>>>>>>>>>> We can store more data on the same servers at the cost of increased
>>>>>>>>>>>>>>>>> CPU utilization.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I'm researching the possibility of implementing compression at the
>>>>>>>>>>>>>>>>> cache level.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Any thoughts?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>>>>>> Vyacheslav
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>> View this message in context:
>>>>>>>>>>>>>>>>> http://apache-ignite-developers.2346864.n4.nabble.com/Data-compression-in-Ignite-2-0-tp10099p16317.html
>>>>>>>>>>>>>>>>> Sent from the Apache Ignite Developers mailing list archive at
>>>>>>>>>>>>>>>>> Nabble.com.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>> Alexey Kuznetsov
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>> Best Regards, Vyacheslav
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> Best Regards, Vyacheslav
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> Best Regards, Vyacheslav
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Best Regards, Vyacheslav
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Best Regards, Vyacheslav
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Best Regards, Vyacheslav
>>>>>>>>
>>>>>>>> --
>>>>>>>> Best Regards, Vyacheslav
>>>>>>
>>>>>> --
>>>>>> Best Regards, Vyacheslav
>>>>>
>>>>> --
>>>>> Best Regards, Anton Churaev
>>>>
>>>> --
>>>> Best Regards, Vyacheslav
>>>
>>> --
>>> Best Regards, Anton Churaev
>>
>> --
>> Best Regards, Vyacheslav
>
> --
> Best Regards, Anton Churaev

--
Best Regards, Vyacheslav