Dmitriy,

This approach will also work with byte[] arrays and binary objects. It will just be a new addition to the binary object types, and the behaviour will be the same.

What do you mean by custom logic for splitting a blob? By default a blob will be split into chunks of some size, and the chunk size will be a configurable parameter.

The mechanism for uploading blobs will be different from the one we use in our cache API. E.g. we will upload a blob to the primary and backup nodes in parallel on the client side, instead of forwarding pieces of data from the primary to the backups (see the sketch below).

For huge data and streaming data I don't think our cache API is a good choice, as it has no instruments to manage striped blobs (a blob spread across several nodes) or streaming data (of undefined size). Most of the functionality in our cache API is inapplicable or unnecessary for big data manipulation. Instead, I suggest agreeing with Vladimir's approach and trying to refactor and reuse IGFS, if you want to operate only with BLOBs.

In conclusion, the whole storage implementation can be divided into several pieces:
1) Internal storage implementation + metadata cache over it.
2) Communication SPI for storage, upload/download sessions.
3) Applied blobs as part of binary objects, Java objects, etc.
4) Binary/Web API to store BLOBs directly in the storage and operate on them (pre/post processing, transformation, Map-Reduce).
5) Integration with SQL.
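To make the upload mechanism above concrete, here is a minimal sketch of the client-side chunked upload. All types and methods here (BlobStorage, BlobUploadSession, etc.) are hypothetical illustrations of the proposal, not an existing Ignite API:

import java.io.InputStream;

// Sketch only: splits an input stream into chunks of the configurable
// size and uploads each chunk from the client to the primary and backup
// nodes in parallel (no primary-to-backup forwarding).
public class BlobUploadSketch {
    public static void upload(BlobStorage storage, String key, InputStream in, int chunkSize) throws Exception {
        try (BlobUploadSession ses = storage.startUpload(key)) {
            byte[] chunk = new byte[chunkSize]; // e.g. 4 MB, the configurable parameter

            int read;
            while ((read = in.read(chunk)) > 0)
                ses.writeChunk(chunk, 0, read); // sent to primary + backups in parallel

            ses.commit(); // finalize the blob and publish it under its key/link
        }
    }
}

This corresponds to piece 2) of the breakdown above: the upload/download session belongs to the storage communication SPI, not to the cache API.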
Alexey,

In systems that operate with huge data, a concept such as rebalancing is hard to implement and brings little benefit. If the cluster grows, the blob metadata will be rebalanced automatically using the already implemented mechanisms, but the data itself will remain in its previous place. A mechanism for data rebalancing should be implemented separately and invoked manually.

2018-08-04 7:49 GMT+03:00 Denis Magda <dma...@apache.org>:

> Dmitriy,
>
> I would suggest us not limiting the blobs use case to a dedicated cache.
> If you look at other databases, they usually have BLOB/LONGBLOB/etc. as
> a type, meaning that users mix simple and BLOB types in the same tables.
>
> Should we start with Ignite SQL adding blobs through its APIs?
>
> --
> Denis
>
> On Fri, Aug 3, 2018 at 1:52 PM Dmitriy Setrakyan <dsetrak...@apache.org>
> wrote:
>
> > Pavel,
> >
> > Not everything that gets put in Ignite has a class, and not everything
> > can be annotated. For example, what about byte[] or a binary object?
> >
> > I would prefer a separate cache with the specific purpose of storing
> > blobs. It will be easier for users to understand and configure. It can
> > also have custom logic for splitting a blob into multiple batches, to
> > be able to transfer it over the network faster.
> >
> > D.
> >
> > On Fri, Aug 3, 2018 at 3:21 AM, Pavel Kovalenko <jokse...@gmail.com>
> > wrote:
> >
> > > Dmitriy,
> > >
> > > I think we don't need a separate cache implementation like a BLOB
> > > cache. Instead, a user can mark a value class or a value class field
> > > with the special annotation "@BLOB".
> > > During a cache put, the marshaller will place a special placeholder
> > > on such fields, write the byte[] payload of the field to the special
> > > internal blob storage, and place only a reference to the actual
> > > DataEntry in the page memory.
> > > During a cache get, the marshaller will place a special proxy instead
> > > of the actual class, which can be downloaded and unmarshalled on
> > > demand from the internal storage on the user side.
> > > Using such an approach we will also solve the eager/lazy problem, and
> > > it will also give the user the possibility to adapt his own
> > > marshallers (like Jackson, JAXB, etc.) to marshal/unmarshal his blob
> > > classes from/to byte[] arrays.
> > > No major changes in the public API are required; it can be a
> > > pluggable component.
> > >
> > > 2018-08-03 0:25 GMT+03:00 Dmitriy Setrakyan <dsetrak...@apache.org>:
> > >
> > > > On Thu, Aug 2, 2018 at 1:08 AM, Pavel Kovalenko
> > > > <jokse...@gmail.com> wrote:
> > > >
> > > > > Dmitriy,
> > > > >
> > > > > I still don't understand why you think it will be a file system.
> > > > > In all my previous messages I emphasized that this storage
> > > > > shouldn't be considered a file system. It's just a large data
> > > > > storage whose entities can be easily accessed using a key/link
> > > > > (internally, or externally via web/binary protocol interfaces).
> > > > >
> > > > > > Instead, if we must focus on large blobs, I would solve the
> > > > > > problem of supporting large blobs in regular Ignite caches, as
> > > > > > I suggested before.
> > > > >
> > > > > This is impossible. Our page memory can't handle it efficiently
> > > > > by design.
> > > > >
> > > > But our API does. What is stopping us from creating a cache as a
> > > > BLOB cache and using whatever storage we need?
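P.S. For reference, the annotation-based approach quoted above could look roughly like this from the user's side. The @BLOB annotation is taken from the proposal; the value class, accessors, and lazy-download behaviour shown are only a hypothetical sketch:

// A value class whose heavy field is marked with the proposed @BLOB
// annotation. On put, the marshaller writes the byte[] payload to the
// internal blob storage and keeps only a placeholder/reference next to
// the rest of the entry in the page memory.
public class Video {
    private final String title;

    @BLOB
    private final byte[] content;

    public Video(String title, byte[] content) {
        this.title = title;
        this.content = content;
    }

    public String title() { return title; }

    // On get, this field is backed by a proxy and is downloaded and
    // unmarshalled on demand from the blob storage (lazy access).
    public byte[] content() { return content; }
}

// Usage (sketch):
Video v = cache.get(videoId);   // cheap: placeholder + metadata only
byte[] data = v.content();      // triggers the actual blob download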