It's worth mentioning that two FUSE drivers have already been written to
work with Riak [1]. They haven't been touched for a while, but at least
one was used heavily in production [2], and they might be a good place to
start for your use case.
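
If you did end up writing one from scratch along the lines Jonathan
describes below, a first cut can be quite small. Here's a rough, untested
sketch using fusepy and Riak's plain HTTP interface -- the bucket name,
permissions, and error handling are placeholders, not a working driver:

    # Untested illustration: maps /<key> under the mountpoint to
    # GET /riak/<bucket>/<key> on a local Riak node's HTTP interface.
    import errno
    import stat
    import sys
    import urllib2

    from fuse import FUSE, FuseOSError, Operations

    RIAK = 'http://127.0.0.1:8098/riak/fs'  # hypothetical bucket named "fs"

    class RiakFS(Operations):
        def _fetch(self, path):
            try:
                return urllib2.urlopen('%s/%s' % (RIAK, path.lstrip('/'))).read()
            except urllib2.HTTPError:
                raise FuseOSError(errno.ENOENT)

        def getattr(self, path, fh=None):
            if path == '/':
                return dict(st_mode=stat.S_IFDIR | 0755, st_nlink=2)
            return dict(st_mode=stat.S_IFREG | 0444, st_nlink=1,
                        st_size=len(self._fetch(path)))

        def read(self, path, size, offset, fh):
            # Naive: fetches the whole value and slices it; ranged reads
            # (see the Luwak discussion further down) would be kinder to RAM.
            return self._fetch(path)[offset:offset + size]

    if __name__ == '__main__':
        FUSE(RiakFS(), sys.argv[1], foreground=True)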

Mark

1 - http://wiki.basho.com/Community-Developed-Libraries-and-Projects.html#Other-Tools-and-Projects
    (towards the bottom of the list)
2 - https://github.com/crucially/riakfuse


On Mon, Sep 26, 2011 at 9:23 AM, Jonathan Langevin <
jlange...@loomlearning.com> wrote:

> If you were to continue to pursue the use of Riak for a distributed FS, and
> if you have any resources to toss at development, it may be possible to
> build a FUSE driver that acts as a Riak client. FUSE = filesystem in
> userspace, and can function across most any Linux/BSD variant (including Mac
> OS X).
>
> More info: http://en.wikipedia.org/wiki/Filesystem_in_Userspace
>
> There is also a list of FUSE drivers at the above URL, several of which
> mention "distributed" in the description. One of those may suffice for you
> (if you've not already reviewed them). Otherwise, you could possibly use
> their FUSE drivers as a basis for your own custom FUSE Riak driver.
>
> Jonathan Langevin
> Systems Administrator
> Loom Inc.
> Wilmington, NC: (910) 241-0433 - jlange...@loomlearning.com -
> www.loomlearning.com - Skype: intel352
>
>
>
> On Sun, Sep 25, 2011 at 4:29 PM, Jeremiah Peschka <
> jeremiah.pesc...@gmail.com> wrote:
>
>> Responses inline
>> ---
>> Jeremiah Peschka - Founder, Brent Ozar PLF, LLC
>> Microsoft SQL Server MVP
>>
>> On Sep 25, 2011, at 5:30 AM, pille wrote:
>>
>> > hi,
>> >
>> > i'm quite new to riak and only know it from the docs available online.
>> > to be honest, i did not search for a key/value store, but for a reliable
>> (HA) distributed, replicated filesystem that allows dynamic growth.
>>
>> To be honest, what you're looking for is a SAN. EMC's Isilon line, Dell's
>> EqualLogic, and HP's LeftHand devices all meet your needs very well. They
>> don't require a lot of administrative knowledge, they're easy to set up and
>> maintain, and they are very easy to expand. SANs provide the features and
>> functionality that you're looking for and won't require any additional
>> development or maintenance. Yes, they cost money, but they do just sorta
>> work straight out of the box.
>>
>> That being said, I answered the rest of these questions as if you weren't
>> willing to just throw a bucket of money and SAN gear at your problem.
>>
>> >
>> > all these filesystems i've dealt with are either immature, abandoned, or
>> limited in features like dynamic scaling and snapshotting, or they fail in
>> out-of-diskspace scenarios (as they don't give you high availability and
>> data protection at the same time).
>> >
>> > somehow i stumbled upon this project and liked its features, despite it
>> not being a filesystem at all. i can live with its flat structure if it'll
>> bring me all the other features i need.
>> >
>> > so i'm now at the point where, after reading the online docs without any
>> hands-on experience, some questions remain unanswered.
>> > since i'm used to storing all data in a filesystem, our application's
>> storage interface would need a complete rewrite to interface with riak and
>> provide the same services as before. therefore i'd like to ask you to share
>> your knowledge and experience.
>> >
>> > 1) are snapshots provided?
>> >   i guess they aren't, but i'm more interested in whether i can use the
>> vector clocks for that.
>> >   i only need one snapshot and the live data to provide a consistent old
>> view of the data for our staging instance.
>>
>> Snapshots are not provided. You could probably cook something up yourself,
>> but there's no snapshotting involved that I know of. Vector clocks are used
>> for determining object lineage and conflict resolution.
>>
>> >
>> > 2) how does riak deal with different storage capacities of the different
>> nodes? is it a problem if some nodes provide less space than others? is
>> data distributed uniformly across all nodes or is each node's capacity taken into
>> account?
>>
>> AFAIK, data is distributed evenly across a number of virtual nodes (64 by
>> default). Those virtual nodes are then distributed evenly across your
>> physical nodes. I don't know of a way to change this, but I've been very
>> wrong before.
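>>
>> If you want to peek at the layout, each node's /stats endpoint reports the
>> ring as JSON. Untested one-liner (stat names can vary between versions):
>>
>>   import json, urllib2
>>
>>   stats = json.load(urllib2.urlopen('http://127.0.0.1:8098/stats'))
>>   print stats['ring_num_partitions'], stats['ring_ownership']
>>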
>> >
>> > 3) we've got quite huge files for a database to store. is that a
>> problem? what storage backend do you propose?
>> >   currently we see the following distribution, but i expect more in the
>> range from 512MB to 4GB to come in the future:
>> >         <   1KB: 64053
>> >     1KB -   1MB: 873795
>> >     1MB -   2MB: 4776
>> >     2MB -   4MB: 3131
>> >     4MB -   8MB: 3136
>> >     8MB -  16MB: 2842
>> >    16MB -  32MB: 3136
>> >    32MB -  64MB: 4032
>> >    64MB - 128MB: 3118
>> >   128MB - 256MB: 3361
>> >   256MB - 512MB: 3221
>> >   512MB -   1GB: 1423
>> >     1GB -   2GB: 75
>>
>> Riak KV's maximum object size for acceptable performance is about 64MB, but
>> performance would probably start degrading before that. Luwak is an
>> application built on top of Riak that probably meets your needs a lot better
>> than plain old Riak KV: http://wiki.basho.com/Luwak.html
>>
>>
>> >
>> > 4) is range access possible to read parts of a file^W value or do i need
>> to stream the whole file through? this would not perform well on huge
>> values.
>>
>> With Luwak it's possible to get a portion of the object using the optional
>> Range header: http://wiki.basho.com/HTTP-Fetch-Luwak-Object.html
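>>
>> For example, something like this should pull only the first megabyte of a
>> stored file (untested; assumes the default HTTP port and a made-up key):
>>
>>   import urllib2
>>
>>   req = urllib2.Request('http://127.0.0.1:8098/luwak/bigfile.iso',
>>                         headers={'Range': 'bytes=0-1048575'})
>>   first_mb = urllib2.urlopen(req).read()  # only the requested byte range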
>>
>> >
>> > 5) to reduce the impact of a disk failure on the storage backend, i'd
>> like each disk of a server to be assigned to its own riak node. i guess
>> healing the failed node after replacement is faster than raid recovery and
>> less data is at risk.
>> >   is it possible to reflect the hardware hierarchy in some way to
>> influence the placement of replicas? CephFS offers this to make sure replicas
>> are held on different hardware or even in different locations.
>> >   e.g. a STORAGE is in a SERVER, which is in a RACK, which is in a
>> DATACENTER. replicas of a file in a STORAGE should never be placed inside
>> the same SERVER (or RACK, or DATACENTER).
>>
>> You can purchase Riak EDS which has multi-site replication. Otherwise,
>> Riak is just going to throw data into N nodes in your cluster and it will be
>> up to you to make sure those nodes are in different racks.
>>
>> >
>> > 6) what happens if fewer than R or W nodes report data? does it mean not
>> found or not available, even if the data is on a currently offline node?
>>
>> If fewer than R nodes respond, your read will fail. The R value means
>> "this many nodes have to respond with data for it to be considered a
>> successful read." Anything less than R therefore counts as a failure.
>>
>> If fewer than W nodes are able to write the data, hinted handoff will
>> occur and fallback nodes will accept the write on their behalf.
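>>
>> To make that concrete, both values can be set per request through the
>> HTTP API. Rough, untested sketch (the bucket, key, and file name are
>> made up for illustration):
>>
>>   import urllib2
>>
>>   # read: require 2 replicas to respond before the GET succeeds
>>   urllib2.urlopen('http://127.0.0.1:8098/riak/images/logo.png?r=2').read()
>>
>>   # write: require 2 replicas to acknowledge the PUT (w=2)
>>   req = urllib2.Request('http://127.0.0.1:8098/riak/images/logo.png?w=2',
>>                         data=open('logo.png', 'rb').read(),
>>                         headers={'Content-Type': 'image/png'})
>>   req.get_method = lambda: 'PUT'
>>   urllib2.urlopen(req)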
>>
>> >
>> > 7) can the client applications connect to some random node?
>> >   should they simply retry the next one in the list upon failure?
>>
>> Client applications should connect to a random node, yes. Even better, you
>> should put a load balancing proxy server in front of your Riak cluster so
>> developers don't have to worry about writing their own load balancing code.
>>
>> I'd retry on failure, but that's up to you. ;)
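>>
>> If you skip the proxy and do it client-side, the logic can stay pretty
>> small. Untested sketch with a hypothetical host list:
>>
>>   import random, urllib2
>>
>>   NODES = ['riak1:8098', 'riak2:8098', 'riak3:8098']  # hypothetical hosts
>>
>>   def fetch(bucket, key, tries=3):
>>       # pick nodes at random, moving on to another one on failure
>>       for host in random.sample(NODES, min(tries, len(NODES))):
>>           try:
>>               return urllib2.urlopen(
>>                   'http://%s/riak/%s/%s' % (host, bucket, key)).read()
>>           except urllib2.URLError:
>>               continue
>>       raise IOError('all Riak nodes tried and failed')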
>>
>> >
>> > 8) is the data reported back on a read compared/verified with all
>> replicas to ensure consistency, or just its metadata (if R>1)?
>>
>> Yes, R nodes have to respond with *the same* copy of the data before a
>> read is considered successful. Riak can do this quickly by comparing
>> vector clocks and other assorted metadata.
>>
>> >
>> > 9) is data integrity in the storage backend secured through checksums?
>>
>> I think that depends on the storage backend implementation. Doing a quick
>> grep through the source code turns up the word "checksum" a lot, though.
>>
>> >
>> > these are the questions puzzling me at the moment.
>> > if you know some filesystem that matches my feature list, please don't
>> hesitate to answer off-topic ;-)
>>
>> Other options include HDFS and MogileFS (http://danga.com/mogilefs/).
>> Last.fm uses MogileFS.
>>
>> >
>> > cheers
>> >  pille
>> >
>
>
_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
