Re: Listing keys in a bucket

John Axel Eriksson Sat, 04 Sep 2010 12:53:05 -0700

Well... perhaps my use case isn't well suited for riak, not sure, but my 
reasoning was that using riak as file storage with attached
metadata would be a good way of ensuring file availability. I've also 
considered using S3 for this which also seems like a pretty
good choice, but I would then need to store metadata somewhere else.
I really liked the way I could query riak through mapred etc and of course the 
automatic replication which seemed like a good fit
for ensuring file availability and storage. I guess what I would like to be 
able to do is a file listing, of course not of ALL the files at
once but I would still need to run a query against ALL the files to get a 
listing of ones I want and the only way this seems possible
is to create an index of the files to query - which of course leads to the 
trouble of maintaining that index (i.e. updating an index which is very
large). Using links might be a good choice but as I understand it, it isn't very 
wise to attach so many links to a key.


I was thinking of doing something like this when adding a new file:

/riak/metadata/somekey LINKS to /riak/files/somefile

PUT file10001 content in /riak/files/file10001 (adding it to another 10 000 
file keys)
PUT file10001 metadata in /riak/metadata/file10001 LINKING it to 
/riak/files/file10001 (adding it to another 10 000 metadata keys)
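
To make the two PUTs concrete, here is a minimal sketch in Python that only builds the request descriptions and sends nothing. It assumes the Link header form used by Riak's HTTP interface (`</riak/bucket/key>; riaktag="tag"`); the helper names, the "contents" tag on my side, the sample content type and the sample metadata are made up for illustration.

```python
import json


def riak_link_header(bucket, key, tag):
    """Format one Link header value for the target /riak/<bucket>/<key>."""
    return f'</riak/{bucket}/{key}>; riaktag="{tag}"'


def puts_for_new_file(key, content, content_type, metadata):
    """Describe the two PUTs as plain dicts; nothing is sent over the network."""
    file_put = {
        "method": "PUT",
        "path": f"/riak/files/{key}",
        "headers": {"Content-Type": content_type},
        "body": content,
    }
    metadata_put = {
        "method": "PUT",
        "path": f"/riak/metadata/{key}",
        "headers": {
            "Content-Type": "application/json",
            # The metadata object links to the file contents object.
            "Link": riak_link_header("files", key, "contents"),
        },
        "body": json.dumps(metadata),
    }
    return [file_put, metadata_put]


# Hypothetical sample values, only to show the shape of the two requests.
requests_to_send = puts_for_new_file(
    "file10001", b"...file bytes...", "application/pdf", {"name": "report.pdf"}
)
```

Each dict could then be handed to whatever HTTP client the application uses.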

what I don't know how to do is:

either:
data = GET /riak/listing/index (which would contain 10 000 keys - i.e. a large 
dataset)
data.push(file10001-link) (or something like that - updating the json document)

PUT /riak/listing/index (getting the index back into riak)

or using LINKS just
add the link to /riak/listing/index

then I should be able to run mapred against the index to get links to files out 
of it...
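
Roughly, the mapred job I have in mind would start at the index key and follow its links. The sketch below only builds the JSON body in Python; it assumes the job would be POSTed to Riak's /mapred HTTP endpoint, that the links on the index point into the metadata bucket with a "metadata" tag, and that the built-in JavaScript function Riak.mapValuesJson is available. The function name listing_job is made up.

```python
import json


def listing_job(index_bucket, index_key, target_bucket, tag):
    """Build a map/reduce job that starts at one key and follows tagged links."""
    return {
        # A single bucket/key pair as input, so no key listing is needed.
        "inputs": [[index_bucket, index_key]],
        "query": [
            # Link phase: follow links with this tag into the target bucket.
            {"link": {"bucket": target_bucket, "tag": tag, "keep": False}},
            # Map phase: return each linked object's value parsed as JSON.
            {"map": {"language": "javascript",
                     "name": "Riak.mapValuesJson",
                     "keep": True}},
        ],
    }


job = listing_job("listing", "index", "metadata", "metadata")
body = json.dumps(job)  # request body, sent with Content-Type: application/json
```

This avoids scanning the whole bucket, but it still depends on the index key carrying all the links.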

the problem is that it is not recommended to run queries against an entire 
bucket, which I otherwise could do
by querying the metadata bucket

adding 10 000 LINKS to a key isn't recommended

and updating an index as large as mine would get, would mean GET large dataset 
to application, update
large dataset in application, PUT large dataset back into riak - seems like an 
expensive and foolish way
of doing it.
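
To illustrate why this bothers me, here is a small Python sketch of the GET, update, PUT cycle. It runs against an in-memory stand-in rather than a real riak node; FakeStore, append_to_index and the byte counter are all made up for the example.

```python
import json


class FakeStore:
    """In-memory stand-in for the HTTP store; counts bytes moved each way."""

    def __init__(self):
        self.data = {}
        self.bytes_moved = 0

    def get(self, path):
        body = self.data.get(path, "[]")  # a missing index reads as an empty list
        self.bytes_moved += len(body)
        return body

    def put(self, path, body):
        self.bytes_moved += len(body)
        self.data[path] = body


def append_to_index(store, index_path, link):
    """Fetch the whole index, add one entry, and write the whole index back."""
    entries = json.loads(store.get(index_path))
    entries.append(link)
    store.put(index_path, json.dumps(entries))
    return len(entries)


store = FakeStore()
for i in range(1, 101):
    append_to_index(store, "/riak/listing/index", f"/riak/metadata/file{i}")

print(store.bytes_moved)  # total traffic for only 100 small additions
```

Every added file moves the entire index in both directions, so the cost of each update grows with the number of files already listed.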

If I could run mapred against an entire bucket I'd be fine I suppose, but that 
is discouraged since it would mean
listing all the keys which is an expensive operation.

So the only solution to this would be to store the index in some other kind of 
database, like MongoDB or MySQL, which
I could do, of course - but I would much rather use riak for it if reasonable.

J


On 4 Sep 2010, at 21:22, Matthew Scott wrote:

> You may want to consider how often you will be accessing all 10,000 files in 
> one query.
> 
> It is my understanding that with a key/value store such as Riak, it's a good 
> idea to analyze what your common queries will be as well as your frequency of 
> reading vs. writing, and then to structure your data so that when you do a 
> write, you precompute and denormalize as needed to satisfy those queries.
> 
> For less frequent queries, you use map/reduce as needed to get those results, 
> then perhaps determine whether and for how long to cache those results.
> 
> This is of course different than the managed indexes given by other styles of 
> data stores, but is part of the tradeoff involved.
> 
> --
> Matthew Scott
> ElevenCraft, Inc.
> http://11craft.com/
> +1 360 389-2512
> 
> 
> On Sat, Sep 4, 2010 at 11:51, John Axel Eriksson <j...@insane.se> wrote:
> Yes, I've thought of this but as I understand it there is a bit of a problem 
> attaching a large amount of links to a key which
> would be necessary here am I right? If I had 10 000 files in riak that would 
> mean 10 000 links attached to the "listing" key.
> 
> On 4 Sep 2010, at 20:44, Matthew Scott wrote:
> 
>> 
>> On Sat, Sep 4, 2010 at 11:31, John Axel Eriksson <j...@insane.se> wrote:
>> Listing keys in a bucket has been described as "bad" and something you use 
>> in development but
>> not in production. I'm just starting out on Riak so I'm a newbie...
>> 
>> I'm thinking of building an application using Riak as filestorage and 
>> possibly much more than that, but it would
>> at least store lots of files with, perhaps, metadata attached. How would I 
>> then list files for display in a webapp if
>> I don't use key listing?
>> 
>> I'm still very much a Riak newb myself, but I'll take a shot at answering 
>> this one by suggesting the use of links.
>> 
>> From what I understand, links have some quantity limits when you attach a large 
>> number, but adding and removing them is an inexpensive operation.  (Someone 
>> fact check me on that please :)
>> 
>> - Create a key called 'listing', perhaps even in its own bucket to prevent 
>> namespace collision.
>> 
>> - Create links from that 'listing' key to metadata keys.  Remember that you 
>> can attach a tag to each link to differentiate different types of links, 
>> such as "metadata".
>> 
>> - The metadata keys' values would contain file metadata in JSON form, and in 
>> turn have a link to the file contents key, tagged "contents".
>> 
>> - Remember to attach the proper mime type to the key containing your file 
>> contents.
>> 
>> - To get a file listing, do a map/reduce starting at "listing", following 
>> its "metadata" links and grabbing those values, then reduce to sort by a key.
>> 
>> - To get file contents, follow the "contents" link from the metadata key.
>> 
>> --
>> Matthew Scott
>> ElevenCraft, Inc.
>> http://11craft.com/
>> +1 360 389-2512
>> 
>>  
> 
>

_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
