That FUSE driver appears to be a bit more complete than the Ruby version I linked earlier, nice find Mark.
Another possible option is to simply use a solution such as Amazon's EBS <http://aws.amazon.com/ebs/> + S3. You would use S3 for snapshot backups to ensure data persistence.

Jonathan Langevin
I.T. Manager, Loom Inc.
Wilmington, NC: (910) 241-0433 - jlange...@loomlearning.com - www.loomlearning.com - Skype: intel352

On Mon, Sep 26, 2011 at 12:39 PM, Mark Phillips <m...@basho.com> wrote:

> It's worth mentioning that there are already two FUSE drivers written to work with Riak [1]. They haven't been touched for a while, but at least one was used heavily in production [2], and they might be a good place to start for your use case.
>
> Mark
>
> 1 - http://wiki.basho.com/Community-Developed-Libraries-and-Projects.html#Other-Tools-and-Projects (towards the bottom of the list)
> 2 - https://github.com/crucially/riakfuse

On Mon, Sep 26, 2011 at 9:23 AM, Jonathan Langevin <jlange...@loomlearning.com> wrote:

> If you were to continue to pursue the use of Riak for a distributed FS, and if you have any resources to toss at development, it may be possible to build a FUSE driver that acts as a Riak client (a rough sketch of the idea appears at the end of this thread). FUSE = filesystem in userspace; it can run on most any Linux/BSD variant (including Mac OS X).
>
> More info: http://en.wikipedia.org/wiki/Filesystem_in_Userspace
>
> There is also a list of FUSE drivers at the above URL, several of which mention "distributed" in the description. One of those may suffice for you (if you've not already reviewed them). Otherwise, you could possibly use those FUSE drivers as a basis for your own custom FUSE Riak driver.
>
> Jonathan Langevin
> Systems Administrator, Loom Inc.
> Wilmington, NC: (910) 241-0433 - jlange...@loomlearning.com - www.loomlearning.com - Skype: intel352

On Sun, Sep 25, 2011 at 4:29 PM, Jeremiah Peschka <jeremiah.pesc...@gmail.com> wrote:

> Responses inline.
> ---
> Jeremiah Peschka - Founder, Brent Ozar PLF, LLC
> Microsoft SQL Server MVP
>
> On Sep 25, 2011, at 5:30 AM, pille wrote:
>
> > Hi,
> >
> > I'm quite new to Riak and only know it from the docs available online. To be honest, I did not search for a key/value store, but for a reliable (HA) distributed, replicated filesystem that allows dynamic growth.
>
> To be honest, what you're looking for is a SAN. EMC's Isilon line, Dell's Equallogic, and HP's Lefthand devices all meet your needs very well. They don't require a lot of administrative knowledge, they're easy to set up and maintain, and they are very easy to expand. SANs provide the features and functionality that you're looking for and won't require any additional development or maintenance. Yes, they cost money, but they do just sorta work straight out of the box.
>
> That being said, I answered the rest of these questions as if you weren't willing to just throw a bucket of money and SAN gear at your problem.
>
> > All the filesystems I've dealt with are either immature, abandoned, or limited in features like dynamic scaling and snapshotting, or they fail in out-of-diskspace scenarios (as they don't give you high availability and data protection at the same time).
> >
> > Somehow I stumbled upon this project and liked its features, despite it not being a filesystem at all. I can live with its flat structure if it'll bring me all the other features I need.
> >
> > So I'm now at the point where, after reading the online docs without any hands-on experience, some questions remain unanswered.
> > Since I'm used to storing all data in a filesystem, our application's storage interface would need a complete rewrite to interface with Riak and provide the same services as before. Therefore I'd like to ask you to share your knowledge and experience.
> >
> > 1) Are snapshots provided?
> > I guess they aren't, but I'm more interested in whether I can use the vector clocks for that.
> > I only need one snapshot plus the live data to provide a consistent old view of the data for our staging instance.
>
> Snapshots are not provided. You could probably cook something up yourself, but there's no snapshotting involved that I know of. Vector clocks are used for determining object lineage and conflict resolution.
>
> > 2) How does Riak deal with different storage capacities on the different nodes? Is it a problem if some nodes provide less space than others? Is data distributed uniformly across all nodes, or is their capacity taken into account?
>
> AFAIK, data is distributed evenly across a number of virtual nodes (64 by default). Those virtual nodes are then distributed evenly across your physical nodes. I don't know of a way to change this, but I've been very wrong before.
>
> > 3) We've got quite huge files for a database to store. Is that a problem? What storage backend do you propose?
> > Currently we see the following distribution, but I expect more in the range from 512MB to 4GB to come in the future:
> > < 1KB: 64053
> > 1KB - 1MB: 873795
> > 1MB - 2MB: 4776
> > 2MB - 4MB: 3131
> > 4MB - 8MB: 3136
> > 8MB - 16MB: 2842
> > 16MB - 32MB: 3136
> > 32MB - 64MB: 4032
> > 64MB - 128MB: 3118
> > 128MB - 256MB: 3361
> > 256MB - 512MB: 3221
> > 512MB - 1GB: 1423
> > 1GB - 2GB: 75
>
> Riak KV's maximum workable object size is about 64MB, but performance would probably start degrading before that. Luwak is an application built on top of Riak that probably meets your needs a lot better than plain old Riak KV: http://wiki.basho.com/Luwak.html
>
> > 4) Is range access possible to read parts of a file^W value, or do I need to stream the whole file through? That would not perform well on huge values.
>
> With Luwak it's possible to get a portion of the object using the optional Range parameter: http://wiki.basho.com/HTTP-Fetch-Luwak-Object.html (see the short sketch below).
>
> > 5) To reduce the impact of a disk failure on the storage backend, I'd like each disk of a server to be assigned to its own Riak node. I guess healing the failed node after replacement is faster than RAID recovery, and less data is at risk.
> > Is it possible to reflect the hardware hierarchy in some way to influence the placement of replicas? CephFS offers this to make sure replicas are held on different hardware or even in different locations.
> > E.g. a STORAGE is in a SERVER, which is in a RACK, which is in a DATACENTER. Replicas of a file in a STORAGE should never be placed inside the same SERVER (or RACK, or DATACENTER).
>
> You can purchase Riak EDS, which has multi-site replication. Otherwise, Riak is just going to throw data onto N nodes in your cluster and it will be up to you to make sure those nodes are in different racks.
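As a rough illustration of the Luwak interface mentioned in the answer to question 4, here is a minimal Python sketch (not production code) of storing a large file and then fetching only a byte range over HTTP. It assumes Luwak is enabled and reachable at /luwak/<key> on the default HTTP port (8098); the host, key, and file names are invented for the example.

    # Minimal sketch: store a large file in Luwak and read back only a byte
    # range over Riak's HTTP interface. Host, key, and file names are invented.
    import requests

    RIAK = "http://127.0.0.1:8098"  # any node in the cluster

    def store_file(key, path, chunk_size=1024 * 1024):
        # Stream the file from disk so a multi-GB upload never sits in memory.
        with open(path, "rb") as f:
            resp = requests.put(
                f"{RIAK}/luwak/{key}",
                data=iter(lambda: f.read(chunk_size), b""),
                headers={"Content-Type": "application/octet-stream"},
            )
        resp.raise_for_status()

    def read_range(key, start, end):
        # Ask for bytes start..end (inclusive) with a standard HTTP Range
        # header instead of streaming the whole value through the client.
        resp = requests.get(
            f"{RIAK}/luwak/{key}",
            headers={"Range": f"bytes={start}-{end}"},
        )
        resp.raise_for_status()
        return resp.content

    if __name__ == "__main__":
        store_file("backup.tar", "/tmp/backup.tar")
        print(len(read_range("backup.tar", 0, 511)), "bytes fetched")

Streaming the upload keeps client memory flat even for the 512MB to 4GB values mentioned above, and the Range request returns only the requested bytes.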
> > 6) What happens if fewer than R or W nodes report data? Does it mean "not found" or "not available", even if the data is on a currently offline node?
>
> If fewer than R nodes respond, your read will fail. The R value means "this many nodes have to respond with data for it to be considered a successful read." Anything less than R would thus mean there was a failure.
>
> If fewer than W nodes are able to write data, a hinted handoff will occur.
>
> > 7) Can the client applications connect to some random node?
> > Should they simply retry the next one in the list upon failure?
>
> Client applications should connect to a random node, yes. Even better, you should put a load balancing proxy server in front of your Riak cluster so developers don't have to worry about writing their own load balancing code (see the retry sketch at the end of this thread).
>
> I'd retry on failure, but that's up to you. ;)
>
> > 8) Is the data reported back on a read compared/verified against all replicas to ensure consistency, or just its metadata (if R > 1)?
>
> Yes, R nodes have to respond with *the same* copy of the data before a read is successful. You can quickly do this by comparing vector clocks and other assorted metadata.
>
> > 9) Is data integrity in the storage backend secured through checksums?
>
> I think it depends on the storage backend implementation. Doing a quick grep through the source code turns up the word "checksum" a lot, though.
>
> > These are the questions puzzling me at the moment.
> > If you know some filesystem that matches my feature list, please don't hesitate to answer off-topic ;-)
>
> Other options include HDFS and MogileFS (http://danga.com/mogilefs/). Last.fm uses MogileFS.
>
> > Cheers,
> > pille
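To make the "connect to a random node and retry" advice from question 7 (and the R/W values from question 6) concrete, here is a small Python sketch against Riak's HTTP interface. The node addresses, bucket, and key are invented, and in practice a load balancer such as haproxy in front of the cluster is the simpler route.

    # Sketch of random node selection with retry; the r and w query parameters
    # override the bucket's default read/write quorum for a single request.
    # Node addresses and bucket/key names are invented for illustration.
    import random
    import requests

    NODES = ["http://riak1:8098", "http://riak2:8098", "http://riak3:8098"]

    def fetch(bucket, key, r=2, attempts=3):
        last_error = None
        for node in random.sample(NODES, min(attempts, len(NODES))):
            try:
                resp = requests.get(f"{node}/riak/{bucket}/{key}",
                                    params={"r": r}, timeout=5)
                if resp.status_code == 200:
                    return resp.content
                if resp.status_code == 404:
                    return None  # not found, as far as r replicas could tell
                last_error = RuntimeError(f"{node} returned {resp.status_code}")
            except requests.RequestException as exc:
                last_error = exc  # node down or unreachable, try another one
        raise last_error

    def store(bucket, key, value, w=2):
        # w replicas must acknowledge the write before Riak reports success.
        node = random.choice(NODES)
        resp = requests.put(f"{node}/riak/{bucket}/{key}",
                            data=value, params={"w": w},
                            headers={"Content-Type": "application/octet-stream"})
        resp.raise_for_status()

The same retry loop applies to writes; a proxy in front of the cluster simply moves that logic out of every client.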
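Finally, a very rough, read-only sketch of the FUSE-driver-as-Riak-client idea from the top of the thread, using the third-party fusepy bindings and Riak's HTTP interface. It only maps the keys of a single bucket to files; the bucket name and mount point are invented, listing keys this way is expensive on a real cluster, and the existing drivers linked above (e.g. riakfuse) go much further.

    # Read-only FUSE sketch: expose the keys of one Riak bucket as files.
    # Requires the third-party fusepy package (pip install fusepy); bucket
    # name and mount point are invented for illustration.
    import errno
    import stat
    import sys

    import requests
    from fuse import FUSE, FuseOSError, Operations

    RIAK = "http://127.0.0.1:8098"
    BUCKET = "fs"

    class RiakFS(Operations):
        def _get(self, path):
            return requests.get(f"{RIAK}/riak/{BUCKET}/{path.lstrip('/')}")

        def getattr(self, path, fh=None):
            if path == "/":
                return {"st_mode": stat.S_IFDIR | 0o755, "st_nlink": 2}
            resp = self._get(path)
            if resp.status_code != 200:
                raise FuseOSError(errno.ENOENT)
            return {"st_mode": stat.S_IFREG | 0o444, "st_nlink": 1,
                    "st_size": len(resp.content)}

        def readdir(self, path, fh):
            # ?keys=true lists every key in the bucket -- fine for a sketch,
            # expensive on a big production cluster.
            resp = requests.get(f"{RIAK}/riak/{BUCKET}",
                                params={"keys": "true", "props": "false"})
            return [".", ".."] + resp.json().get("keys", [])

        def read(self, path, size, offset, fh):
            resp = self._get(path)
            if resp.status_code != 200:
                raise FuseOSError(errno.ENOENT)
            return resp.content[offset:offset + size]

    if __name__ == "__main__":
        # Usage: python riakfs.py /mnt/riak
        FUSE(RiakFS(), sys.argv[1], foreground=True, ro=True)

Either of the existing drivers linked above is a better starting point than this; the sketch is only meant to show how little glue sits between FUSE callbacks and Riak's HTTP API.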