On Nov 12, 2011, at 2:32 PM, Gordon Tillman wrote:

> Keith, I have an idea that might work for you. This is a bit vague, but I
> would be glad to put together a more concrete example if you like.
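For reference, here is a minimal sketch of the layout this thread works toward. It has two parts:

- A `mydata` bucket holding one object per device per second under a `<device>-<timestamp>` key. Each object is tagged with a `device_bin` secondary index and, per the original question's idea of also tagging the timestamp, a `timestamp_int` index.
- A `devices` bucket holding one registry object per device, tagged `type_bin: device`.

The code only builds request paths and headers for Riak's HTTP interface and sends nothing. The helper names, the `timestamp_int` index name, and the assumption that device ids are URL-safe hex strings (usable as keys without hashing) are all illustrative. None of them comes from either poster or from Riak itself.

```python
# Hypothetical helpers: they only build HTTP paths and 2i headers for the
# layout discussed in this thread; nothing here talks to a Riak node.

DATA_BUCKET = "mydata"     # one bucket for all per-second readings
DEVICE_BUCKET = "devices"  # one registry object per device


def reading_request(device_id: str, timestamp: int) -> tuple[str, dict[str, str]]:
    """PUT path and headers for one second of data from one device."""
    key = f"{device_id}-{timestamp}"  # assumes device ids are URL-safe hex
    path = f"/buckets/{DATA_BUCKET}/keys/{key}"
    headers = {
        "Content-Type": "application/octet-stream",    # the opaque 2-4 KB blob
        "x-riak-index-device_bin": device_id,          # groups entries by device
        "x-riak-index-timestamp_int": str(timestamp),  # assumed name, for range queries
    }
    return path, headers


def registry_request(device_id: str) -> tuple[str, dict[str, str]]:
    """PUT path and header registering a device; the body is the device id."""
    path = f"/buckets/{DEVICE_BUCKET}/keys/{device_id}"
    headers = {
        "Content-Type": "text/plain",
        "x-riak-index-type_bin": "device",
    }
    return path, headers


def device_query_path(device_id: str) -> str:
    """Exact-match 2i query returning the keys of one device's readings."""
    return f"/buckets/{DATA_BUCKET}/index/device_bin/{device_id}"


def registered_devices_path() -> str:
    """Exact-match 2i query returning the keys of all registered devices."""
    return f"/buckets/{DEVICE_BUCKET}/index/type_bin/device"
```

An index query like these returns keys only, not values. The values still have to be fetched, or fed to a map phase, in a separate step.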
Okay, thanks! Not sure I understand everything, though.

> Use secondary indexes to tag each entry with the device id.

I get the tagging part, but I'm not sure what the bucket and key being
tagged would look like. Are you using a single bucket for all data?

    PUT /buckets/mydata/keys/<device>-<timestamp>
    x-riak-index-device_bin: FF06541287AB

Something like that?

> You can then find all of the entries for a given device by using the
> secondary index to feed into a simple map phase operation that returns only
> the entries that you want; i.e., those that are in a given time range.

This I don't know how to do based on my reading of the docs. Something like:

    GET /buckets/mydata/index/device_bin/FF345678912

which would return a list of... what, device-timestamp compound keys? And
then would I feed a potentially huge list of "bucket/key" pairs into a
gigantic JavaScript query for the map/reduce phase?

> In addition, to easily find all of the registered device ids you can
> create one entry for each device. The key can be almost anything (even the
> device id, if you encode it properly, e.g. by hashing it), and you could tag
> each of those entries with a secondary index whose field is something like
> "type" and whose value is "deviceid". The value for each entry could be
> just a simple text/plain value whose contents are the device id of the
> registered device.

Okay, I think I get this. When a device comes in, just do something like:

    PUT /buckets/devices/keys/<device-id>
    x-riak-index-type_bin: device

When I want a list of device IDs, I can do:

    GET /buckets/devices/index/type_bin/device

and get them all, right? Is this more efficient than the various list
functions? That makes sense to me.

I guess I'll have to try a few examples and see what happens. What you're
telling me is that what I want to do is possible, or at least doesn't press
too hard against Riak's particular trade-offs. Or at least I hope that's
what you're telling me.
;)

Keith

> --gordon
>
> On Nov 12, 2011, at 16:19, Keith Irwin wrote:
>
>> Folks,
>>
>> (Apologies up front for the length of this.)
>>
>> I'm wondering if you can let me know whether Riak is a good fit for the
>> simple not-quite-key-value scenario described below. MongoDB or (say)
>> PostgreSQL seem a more natural fit conceptually, but I really, really like
>> Riak's distribution strategy.
>>
>> ## context
>>
>> The basic overview is this:
>>
>> 50K devices push data once a second to web services, which need to store
>> that data in short-term storage (Riak). Once an hour, a sweeper needs to
>> take an hour's worth of data per device (if there is any), ship it off to
>> long-term storage, then delete it from short-term storage. Ideally, there
>> would only ever be slightly more than one hour's worth of data in
>> short-term storage for any given device. The goal is to write the data
>> down as simply and safely as possible, with little or no processing on it.
>>
>> Each second's worth of data is:
>>
>> * A device identifier
>> * A timestamp (epoch seconds, integer) for the slice of time the data
>>   represents
>> * An opaque blob of binary data (2 to 4 KB)
>>
>> Once an hour, I'd like to do something like:
>>
>> * For each device:
>>   * Find (and concatenate) all the data between time1 and time2 (an hour).
>>   * Move that data to long-term storage (not Riak) as a single blob.
>>   * Delete that data from Riak.
>>
>> For an SQL database, this is a really simple problem, conceptually. You
>> can have a table with three columns: device-id, timestamp, blob. You can
>> index the first two columns, roll up the data easily enough, and then
>> delete it via single SQL statements (or buffer as needed). The harder part
>> is partitioning, replication, etc.
>>
>> For MongoDB, it's also fairly simple. Just use a document with the same
>> device-id, timestamp and binary-array data (as JSON), make sure indexes are
>> declared, and query/delete just as in SQL.
>> MongoDB provides sharding, replica sets, recovery, etc. Setup, while less
>> complicated than for an RDBMS, still seems way more complicated than
>> necessary.
>>
>> These solutions also provide sorting (which, while nice, isn't a
>> requirement for my case).
>>
>> ## question
>>
>> I've been reading the Riak docs, and I'm just not sure this simple
>> "queryable" case fits all that well. I'm not so concerned about having to
>> send 50K "deletes" to delete data. I'm more concerned about being able to
>> find it. Given what I've written above, I may be conceptually stuck in the
>> index/query mentality, such that I'm just not seeing the Riak way of doing
>> things.
>>
>> Anyway, I can "tag" (via the secondary index feature) each blob of data
>> with the device-id and the timestamp. I could then do a range query
>> similar to:
>>
>>     GET /buckets/devices/index/timestamp_int/<start>/<end>
>>
>> However, this doesn't allow me to group by device-id. I could create a
>> separate bucket for every device, such that I could do:
>>
>>     GET /buckets/<device-id>/index/timestamp_int/<start>/<end>
>>
>> but if I do this, how can I get the list of device-ids I need to build
>> that specific URL? The docs say listing buckets and keys is problematic.
>>
>> It might be that Riak just isn't a good fit for this sort of thing,
>> especially given that I want to use it for short-term transient data, and
>> that's fine. But I wanted to ask you all just to make sure that I'm not
>> missing something.
>>
>> For instance, might link walking help? How about a map/reduce to find a
>> unique list of device-ids within a given time horizon, and a streaming map
>> job to gather the data for export? Does that seem reasonable?
>>
>> Thanks!
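The hourly sweep described in the question can be sketched against the single-bucket layout discussed in this thread. This is a hypothetical in-memory model, not Riak client code: the `store` dict, `Entry`, and all function names are illustrative.

The per-device index query is modelled as returning keys only. Because each key has the form `<device>-<timestamp>`, the time filter is applied to the keys themselves. That is a swapped-in alternative to the map-phase filter Gordon suggests, and it assumes device ids never contain a hyphen.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    device_id: str  # value stored in the device_bin index
    blob: bytes     # opaque 2-4 KB payload


def key_timestamp(key: str) -> int:
    """Recover the epoch-seconds timestamp from a '<device>-<timestamp>' key."""
    return int(key.rsplit("-", 1)[1])


def query_device_index(store: dict[str, Entry], device_id: str) -> list[str]:
    """Model of GET /buckets/mydata/index/device_bin/<device_id> (keys only)."""
    return [k for k, e in store.items() if e.device_id == device_id]


def sweep_device(store: dict[str, Entry], device_id: str, start: int, end: int) -> bytes:
    """Roll up one device's readings with start <= timestamp <= end, then delete them."""
    keys = [k for k in query_device_index(store, device_id)
            if start <= key_timestamp(k) <= end]
    keys.sort(key=key_timestamp)  # concatenate in time order
    rolled_up = b"".join(store[k].blob for k in keys)
    # A real sweeper would issue one DELETE per key, and only after the
    # rolled-up blob has safely reached long-term storage.
    for k in keys:
        del store[k]
    return rolled_up


def sweep_all(store: dict[str, Entry], device_ids: list[str],
              start: int, end: int) -> dict[str, bytes]:
    """Run the sweep for every registered device; skip devices with no data."""
    exported = {}
    for device_id in device_ids:
        blob = sweep_device(store, device_id, start, end)
        if blob:
            exported[device_id] = blob  # stand-in for shipping to long-term storage
    return exported
```

For scale, the numbers in the question work out as follows:

- Each sweep touches about 50,000 × 3,600 = 180 million entries. That is 180 million individual deletes per hour, not 50K.
- At 2-4 KB per entry, that is roughly 360-720 GB per hour.
- Each per-device index query returns on the order of 3,600 keys for a full hour.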
>>
>> Keith
>> _______________________________________________
>> riak-users mailing list
>> riak-users@lists.basho.com
>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com