Re: Ignite as distributed file storage

Vladimir Ozerov Sat, 30 Jun 2018 12:20:19 -0700

Pavel,

Can you provide a competitive analysis against other storage solutions? What
products would we compete with? What would our advantages over them be?


I talked to several people working on solutions involving video and image
processing. They rarely use databases or grids, nor do they need
transactions, synchronous backups, etc. Instead, their concerns are more
about hardware, load balancing, and so on. IMO this is completely out of
scope for Ignite. This is why we need concrete usage scenarios that explain
why we need it.

Also, please note that most use cases for full-text search, XML and JSON do
not need any special storage. They only need new index types. And efficient
CLOBs / BLOBs are a matter of moving these pieces out of BinaryObject. None
of these requires anything radically new in the product.

On Sat, Jun 30, 2018 at 20:59, Pavel Kovalenko <jokse...@gmail.com> wrote:

> Dmitriy,
>
> Yes, I have an approximate design in mind. The main idea is that we already
> have a distributed cache for file metadata (our Atomic cache); the data flow
> and distribution will be controlled by our AffinityFunction and Baseline.
> We already have discovery and communication to keep such local file
> storages in sync. File data will be split into large blocks
> (64-128 MB) (which looks very similar to our WAL). Each block can contain
> one or more file chunks. The tablespace (segment ids, offsets, etc.)
> will be stored in our regular page memory. These are the key ideas for
> implementing the first version of such a storage. We already have similar
> components in our persistence, so this experience can be reused to develop it.
>
> Denis,
>
> Nothing significant should be changed at our memory level. It will be a
> separate, pluggable component on top of the cache. Most of the functions
> that give a performance boost can be delegated to the OS level (memory-mapped
> files, DMA, direct writes from socket to disk and vice versa). Ignite and
> File Storage can be developed independently of each other.
>
> Alexey Stelmak, who has extensive experience developing such systems,
> can provide more low-level information about how it should look.
>
> On Sat, Jun 30, 2018 at 19:40, Dmitriy Setrakyan <dsetrak...@apache.org> wrote:
>
> > Pavel, it definitely makes sense. Do you have a design in mind?
> >
> > D.
> >
> > On Sat, Jun 30, 2018, 07:24 Pavel Kovalenko <jokse...@gmail.com> wrote:
> >
> > > Igniters,
> > >
> > > I would like to start a discussion about designing a new feature,
> > > because I think it's time to start taking steps towards it.
> > > I have noticed that some of our users have tried to store large
> > > homogeneous entries (> 1, 10, 100 MB/GB/TB) in our caches, but without
> > > much success.
> > >
> > > The IGFS project can do this, but in my view it has one big
> > > disadvantage: it is in-memory only, so users face a strict limit on the
> > > size of their data and risk data loss.
> > >
> > > Our durable memory can persist data that doesn't fit in RAM to disk,
> > > but its page structure is not designed to store large pieces of data.
> > >
> > > There are many distributed file system projects, such as HDFS,
> > > GlusterFS, etc. But all of them concentrate on implementing a high-grade
> > > file protocol rather than a user-friendly API, which leads to a high
> > > entry threshold for building anything on top of them.
> > > We shouldn't go this way. Our main goal should be to give users an easy
> > > and fast way to use file storage and processing here and now.
> > >
> > > If we take HDFS as the closest project in terms of functionality, we
> > > have one big advantage over it. We can use our caches as file metadata
> > > storage and scale it without a fixed limit, while HDFS is bounded by
> > > NameNode capacity and has big problems with keeping a large number of
> > > files in the system.
> > >
> > > We gained very good experience with persistence when we developed our
> > > durable memory, and we can combine it with our experience in services,
> > > the binary protocol and I/O, and start designing a new IEP.
> > >
> > > Use cases and features of the project:
> > > 1) Storing XML, JSON, BLOB, CLOB, images, videos, text, etc. without
> > > overhead or the possibility of data loss.
> > > 2) Easy, pluggable, fast and distributed file processing,
> > > transformation and analysis (e.g. an ImageMagick processor for image
> > > transformation, LuceneIndex for text, whatever; it's bounded only by
> > > your imagination).
> > > 3) Scalability out of the box.
> > > 4) A user-friendly API and minimal steps to start using this storage in
> > > production.
> > >
> > > I repeat, this project is not supposed to be a high-grade
> > > distributed file system with full file protocol support.
> > > This project should primarily focus on target users who would like to
> > > use it without complex preparation.
> > >
> > > For example, a user can deploy Ignite with such storage and a web
> > > server with a REST API as an Ignite service and get a scalable,
> > > performant image server out of the box that can be accessed from any
> > > programming language.
> > >
> > > As a long-term goal, we should focus on storing and processing very
> > > large amounts of data, such as movies and streaming, which is a big
> > > trend today.
> > >
> > > I would like to give special thanks to our community members Alexey
> > > Stelmak and Dmitriy Govorukhin, who significantly helped me put
> > > together all the pieces of this puzzle.
> > >
> > > So, I would like to hear your opinions on this proposal.
> > >
> >
>
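
For readers following the thread, the block layout described in Pavel's
quoted reply (file data split into large blocks, each block holding one or
more file chunks, with segment ids and offsets tracked separately) could be
sketched roughly as below. This is only an illustration under assumptions,
not the proposed Ignite implementation: BlockStore, ChunkLocation and the
tablespace dict are hypothetical names, blocks are kept in memory rather
than on disk, and a plain dict stands in for the metadata cache and page
memory.

```python
from dataclasses import dataclass, field

# Lower bound of the 64-128 MB block range mentioned in the thread.
BLOCK_SIZE = 64 * 1024 * 1024


@dataclass(frozen=True)
class ChunkLocation:
    """Hypothetical tablespace entry: where one chunk of a file lives."""
    segment_id: int  # index of the block holding the chunk
    offset: int      # byte offset of the chunk inside that block
    length: int      # chunk length in bytes


@dataclass
class BlockStore:
    """Hypothetical single-node store that packs file chunks into blocks."""
    block_size: int = BLOCK_SIZE
    # In-memory stand-in for on-disk block files.
    blocks: list = field(default_factory=list)
    # Stand-in for the metadata cache: file name -> list of ChunkLocation.
    tablespace: dict = field(default_factory=dict)

    def put(self, name, data):
        """Append data to the current block, spilling into new blocks."""
        locations = []
        pos = 0
        while pos < len(data):
            # Open a new block when there is none or the last one is full.
            if not self.blocks or len(self.blocks[-1]) == self.block_size:
                self.blocks.append(bytearray())
            block = self.blocks[-1]
            take = min(self.block_size - len(block), len(data) - pos)
            locations.append(
                ChunkLocation(len(self.blocks) - 1, len(block), take))
            block.extend(data[pos:pos + take])
            pos += take
        self.tablespace[name] = locations

    def get(self, name):
        """Reassemble a file from its recorded chunk locations."""
        return b"".join(
            bytes(self.blocks[loc.segment_id][loc.offset:loc.offset + loc.length])
            for loc in self.tablespace[name]
        )
```

With a deliberately tiny block size, e.g. BlockStore(block_size=8), storing
b"hello" and then b"world!!!" places the second file in two chunks: the
last 3 bytes of the first block and the first 5 bytes of a second block.
Distribution across nodes, backups and durability are left out of this
sketch entirely.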
