Dmitriy,

This approach will also work with byte[] arrays and binary objects. It will just be a new addition to the binary object types, and the behaviour will be the same.

What do you mean by custom logic for splitting a blob? By default a blob will be split into chunks of some size, and the chunk size will be a configurable parameter.

The mechanism for uploading blobs will be different from the one we use in our cache API. E.g. we will upload a blob to the primary and backup nodes in parallel on the client side, instead of forwarding pieces of data from the primary to the backups (see the sketch below).

For huge data and streaming data I don't think our cache API is a good choice, as it has no instruments to manage striped blobs (a blob spread across several nodes) or streaming data (of undefined size). Most of the functionality in our cache API is inapplicable or unnecessary for big data manipulation. Instead, I suggest agreeing with Vladimir's approach and trying to refactor and reuse IGFS, if you want to operate only with BLOBs.

In conclusion, the whole storage implementation can be divided into several pieces:
1) Internal storage implementation + metadata cache over it.
2) Communication SPI for storage, upload/download sessions.
3) Applied blobs as part of binary objects, Java objects, etc.
4) Binary/Web API to store BLOBs directly in the storage and operate on them (pre/post processing, transformation, Map-Reduce).
5) Integration with SQL.
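To make the upload mechanism above concrete, here is a minimal sketch of the client-side chunked upload. All types and methods here (BlobStorage, BlobUploadSession, etc.) are hypothetical illustrations of the proposal, not an existing Ignite API:

import java.io.InputStream;

// Sketch only: splits an input stream into chunks of the configurable
// size and uploads each chunk from the client to the primary and backup
// nodes in parallel (no primary-to-backup forwarding).
public class BlobUploadSketch {
    public static void upload(BlobStorage storage, String key, InputStream in, int chunkSize) throws Exception {
        try (BlobUploadSession ses = storage.startUpload(key)) {
            byte[] chunk = new byte[chunkSize]; // e.g. 4 MB, the configurable parameter

            int read;
            while ((read = in.read(chunk)) > 0)
                ses.writeChunk(chunk, 0, read); // sent to primary + backups in parallel

            ses.commit(); // finalize the blob and publish it under its key/link
        }
    }
}

This corresponds to piece 2) of the breakdown above: the upload/download session belongs to the storage communication SPI, not to the cache API.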
Alexey,

In systems that operate with huge data, a concept such as rebalancing is hard to implement and brings little benefit. If the cluster grows, the blob metadata will be rebalanced automatically using the already implemented mechanisms, but the data itself will remain in its previous place. A mechanism for data rebalancing should be implemented separately and invoked manually.

2018-08-04 7:49 GMT+03:00 Denis Magda <dma...@apache.org>:

> Dmitriy,
>
> I would suggest us not limiting the blobs use case to a dedicated cache.
> If you look at other databases, they usually have BLOB/LONGBLOB/etc. as
> a type, meaning that users mix simple and BLOB types in the same tables.
>
> Should we start with Ignite SQL adding blobs through its APIs?
>
> --
> Denis
>
> On Fri, Aug 3, 2018 at 1:52 PM Dmitriy Setrakyan <dsetrak...@apache.org>
> wrote:
>
> > Pavel,
> >
> > Not everything that gets put in Ignite has a class, and not everything
> > can be annotated. For example, what about byte[] or a binary object?
> >
> > I would prefer a separate cache with the specific purpose of storing
> > blobs. It will be easier for users to understand and configure. It can
> > also have custom logic for splitting a blob into multiple batches, to
> > be able to transfer it over the network faster.
> >
> > D.
> >
> > On Fri, Aug 3, 2018 at 3:21 AM, Pavel Kovalenko <jokse...@gmail.com>
> > wrote:
> >
> > > Dmitriy,
> > >
> > > I think we don't need a separate cache implementation like a BLOB
> > > cache. Instead, a user can mark a value class or a value class field
> > > with the special annotation "@BLOB".
> > > During a cache put, the marshaller will place a special placeholder
> > > on such fields, write the byte[] payload of the field to the special
> > > internal blob storage, and place only a reference to the actual
> > > DataEntry in the page memory.
> > > During a cache get, the marshaller will place a special proxy instead
> > > of the actual class, which can be downloaded and unmarshalled on
> > > demand from the internal storage on the user side.
> > > Using such an approach we will also solve the eager/lazy problem, and
> > > it will also give the user the possibility to adapt his own
> > > marshallers (like Jackson, JAXB, etc.) to marshal/unmarshal his blob
> > > classes from/to byte[] arrays.
> > > No major changes in the public API are required; it can be a
> > > pluggable component.
> > >
> > > 2018-08-03 0:25 GMT+03:00 Dmitriy Setrakyan <dsetrak...@apache.org>:
> > >
> > > > On Thu, Aug 2, 2018 at 1:08 AM, Pavel Kovalenko
> > > > <jokse...@gmail.com> wrote:
> > > >
> > > > > Dmitriy,
> > > > >
> > > > > I still don't understand why you think it will be a file system.
> > > > > In all my previous messages I emphasized that this storage
> > > > > shouldn't be considered a file system. It's just a large data
> > > > > storage whose entities can be easily accessed using a key/link
> > > > > (internally, or externally via web/binary protocol interfaces).
> > > > >
> > > > > > Instead, if we must focus on large blobs, I would solve the
> > > > > > problem of supporting large blobs in regular Ignite caches, as
> > > > > > I suggested before.
> > > > >
> > > > > This is impossible. Our page memory can't handle it efficiently
> > > > > by design.
> > > > >
> > > > But our API does. What is stopping us from creating a cache as a
> > > > BLOB cache and using whatever storage we need?
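P.S. For reference, the annotation-based approach quoted above could look roughly like this from the user's side. The @BLOB annotation is taken from the proposal; the value class, accessors, and lazy-download behaviour shown are only a hypothetical sketch:

// A value class whose heavy field is marked with the proposed @BLOB
// annotation. On put, the marshaller writes the byte[] payload to the
// internal blob storage and keeps only a placeholder/reference next to
// the rest of the entry in the page memory.
public class Video {
    private final String title;

    @BLOB
    private final byte[] content;

    public Video(String title, byte[] content) {
        this.title = title;
        this.content = content;
    }

    public String title() { return title; }

    // On get, this field is backed by a proxy and is downloaded and
    // unmarshalled on demand from the blob storage (lazy access).
    public byte[] content() { return content; }
}

// Usage (sketch):
Video v = cache.get(videoId);   // cheap: placeholder + metadata only
byte[] data = v.content();      // triggers the actual blob download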