Well... perhaps my use case isn't well suited for Riak, I'm not sure, but my reasoning was that using Riak as file storage with attached metadata would be a good way of ensuring file availability. I've also considered using S3 for this, which also seems like a pretty good choice, but I would then need to store the metadata somewhere else. I really liked the way I could query Riak through mapred etc., and of course the automatic replication, which seemed like a good fit for ensuring file availability and storage.

I guess what I would like to be able to do is a file listing. Not of ALL the files at once, of course, but I would still need to run a query against ALL the files to get a listing of the ones I want. The only way this seems possible is to create an index of the files to query, which of course leads to the trouble of maintaining that index (i.e. updating an index which is very large). Using links might be a good choice but, as I understand it, it isn't very wise to attach so many links to a key.
I was thinking of doing something like this when adding a new file:

/riak/metadata/somekey LINKS to /riak/files/somefile

PUT file10001 content in /riak/files/file10001 (adding it to another 10 000 file keys)
PUT file10001 metadata in /riak/metadata/file10001, LINKING it to /riak/files/file10001 (adding it to another 10 000 metadata keys)

What I don't know how to do is either:

data = GET /riak/listing/index (which would contain 10 000 keys, i.e. a large dataset)
data.push(file10001-link) (or something like that - updating the JSON document)
PUT /riak/listing/index (getting the index back into Riak)

or, using LINKS, just add the link to /riak/listing/index.

Then I should be able to run mapred against the index to get links to files out of it...

The problem is that:

- it is not recommended to run queries against an entire bucket, which I otherwise could do by querying the metadata bucket
- adding 10 000 LINKS to a key isn't recommended
- updating an index as large as mine would get would mean GET large dataset to application, update large dataset in application, PUT large dataset back into Riak - which seems like an expensive and foolish way of doing it.

If I could run mapred against an entire bucket I'd be fine I suppose, but that is discouraged since it would mean listing all the keys, which is an expensive operation.

So the only solution to this would be to store the index in some other kind of database, like MongoDB or MySQL, which I could of course do - but I would much rather use Riak for it if reasonable.

J

On 4 Sep 2010, at 21:22, Matthew Scott wrote:

> You may want to consider how often you will be accessing all 10,000 files in
> one query.
>
> It is my understanding that with a key/value store such as Riak, it's a good
> idea to analyze what your common queries will be as well as your frequency of
> reading vs writing, and then to structure your data so that when you do a
> write, you precompute and denormalize as needed to satisfy those queries.
>
> For less frequent queries, you use map/reduce as needed to get those results,
> then perhaps determine whether and for how long to cache those results.
>
> This is of course different than the managed indexes given by other styles of
> data stores, but is part of the tradeoff involved.
>
> --
> Matthew Scott
> ElevenCraft, Inc.
> http://11craft.com/
> +1 360 389-2512
>
>
> On Sat, Sep 4, 2010 at 11:51, John Axel Eriksson <j...@insane.se> wrote:
> Yes, I've thought of this but as I understand it there is a bit of a problem
> attaching a large amount of links to a key which
> would be necessary here, am I right? If I had 10 000 files in riak that would
> mean 10 000 links attached to the "listing" key.
>
> On 4 Sep 2010, at 20:44, Matthew Scott wrote:
>
>>
>> On Sat, Sep 4, 2010 at 11:31, John Axel Eriksson <j...@insane.se> wrote:
>> Listing keys in a bucket has been described as "bad" and something you use
>> in development but
>> not in production. I'm just starting out on Riak so I'm a newbie...
>>
>> I'm thinking of building an application using Riak as filestorage and
>> possibly much more than that, but it would
>> at least store lots of files with, perhaps, metadata attached. How would I
>> then list files for display in a webapp if
>> I don't use key listing?
>>
>> I'm still very much a Riak newb myself, but I'll take a shot at answering
>> this one by suggesting the use of links.
>>
>> From what I understand, links have some quantity limits when you attach a large
>> number, but adding and removing them is an inexpensive operation. (Someone
>> fact check me on that please :)
>>
>> - Create a key called 'listing', perhaps even in its own bucket to prevent
>> namespace collision.
>>
>> - Create links from that 'listing' key to metadata keys. Remember that you
>> can attach a tag to each link to differentiate different types of links,
>> such as "metadata".
>>
>> - The metadata keys' values would contain file metadata in JSON form, and in
>> turn have a link to the file contents key, tagged "contents".
>>
>> - Remember to attach the proper mime type to the key containing your file
>> contents.
>>
>> - To get a file listing, do a map/reduce starting at "listing", following
>> its "metadata" links and grabbing those values, then reduce to sort by a key.
>>
>> - To get file contents, follow the "contents" link from the metadata key.
>>
>> --
>> Matthew Scott
>> ElevenCraft, Inc.
>> http://11craft.com/
>> +1 360 389-2512
>>
>>
>
>
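For reference, the listing -> metadata -> contents link layout suggested in the quoted message could be modelled roughly like this. It is only a sketch: a plain Python dict stands in for Riak, no real Riak client or map/reduce job is involved, and the helpers (put, add_link, follow, add_file, list_files, file_contents) as well as the "index" key name are made up for illustration. It says nothing about how well Riak itself copes with 10 000 links on one key.

```python
import json

# Hypothetical in-memory stand-in for Riak (NOT a Riak client):
# {(bucket, key): {"value": ..., "links": [(bucket, key, tag), ...]}}
store = {}

def put(bucket, key, value, links=None):
    """Store a value together with its tagged links."""
    store[(bucket, key)] = {"value": value, "links": list(links or [])}

def add_link(bucket, key, target_bucket, target_key, tag):
    """Attach one tagged link to an existing object."""
    store[(bucket, key)]["links"].append((target_bucket, target_key, tag))

def follow(bucket, key, tag):
    """Return the (bucket, key) targets of all links carrying the given tag."""
    return [(b, k) for (b, k, t) in store[(bucket, key)]["links"] if t == tag]

def add_file(name, content, metadata):
    """Write contents, write metadata linked to them, then link the listing key."""
    put("files", name, content)
    put("metadata", name, json.dumps(metadata),
        links=[("files", name, "contents")])
    add_link("listing", "index", "metadata", name, "metadata")

def list_files(sort_field):
    """Follow the 'metadata' links from the listing key and sort the results."""
    refs = follow("listing", "index", "metadata")
    metas = [json.loads(store[ref]["value"]) for ref in refs]
    return sorted(metas, key=lambda m: m[sort_field])

def file_contents(name):
    """Follow the 'contents' link from a metadata key."""
    (ref,) = follow("metadata", name, "contents")
    return store[ref]["value"]

# The listing key has to exist before links can be added to it.
put("listing", "index", None)
add_file("b.txt", b"second", {"name": "b.txt", "size": 6})
add_file("a.txt", b"first", {"name": "a.txt", "size": 5})
```

Calling list_files("name") walks the "metadata" links and sorts the decoded JSON by the name field, and file_contents("a.txt") follows the "contents" link. In this model adding a file only appends one link to the listing key, but the key still ends up carrying one link per file, which is exactly the concern raised above.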
_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com