Hi, I've come up with an idea for a Riak schema and wanted to know if you see
any downsides to it.

So, I have these devices, which generate logs (documents with stats). Each
log is tagged with a timestamp.

If we store them like that, in a bucket named after the device, retrieving
data is pretty hard. I know that MapReduce over a whole bucket keeps getting
faster, but it's not there yet.
The user needs data from Time1 to Time2. Potentially we could keep the
timestamps as integers and just execute a read for every integer in that
interval - but that's no good when the interval is long and contains, say,
only 2 logs.
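
Just to make the cost concrete, this is the naive read loop I'd like to avoid
(a rough sketch assuming the official Python client, with one key per integer
timestamp):

    def naive_range_read(bucket, t1, t2):
        # One GET per possible integer timestamp in [t1, t2];
        # most of them return nothing if the interval holds only a couple of logs.
        logs = []
        for ts in range(t1, t2 + 1):
            obj = bucket.get(str(ts))
            if obj.data is not None:      # missing keys come back empty
                logs.append(obj.data)
        return logs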

So the idea is to keep something like an index per bucket. Logs would be
sequenced and PUT into the device-name bucket with key = counter, and the
counter gets incremented with every message.
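
The write path would be roughly this (just a sketch assuming the official
Python client; the bucket name and the in-process counter are placeholders):

    import riak

    client = riak.RiakClient()
    device_bucket = client.bucket('device1')   # one bucket per device

    counter = 0   # lives in the single writer process for this device

    def store_log(stats):
        """PUT the next log under the current counter value as its key."""
        global counter
        counter += 1
        device_bucket.new(str(counter), data=stats).store()
        return counter
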
In the meantime, there is a parallel process (on a timer?) tracking time.
Every minute (or whatever frequency = the highest resolution of the user's
queries) it asks for the current counter value and saves it under the current
timestamp in index_bucket.
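
The indexing side could be as simple as this (same caveats; client and
counter are the objects from the previous sketch):

    import time

    index_bucket = client.bucket('device1_index')

    def index_tick():
        # Round down to the minute - the highest resolution a user query can have.
        now = int(time.time()) // 60 * 60
        # Snapshot the writer's current counter under that timestamp.
        index_bucket.new(str(now), data={'counter': counter}).store()

    # e.g. driven by a trivial timer loop:
    # while True:
    #     index_tick()
    #     time.sleep(60)
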
When the user comes and asks "give me logs from device1 from Time1 to Time2",
we go to index_bucket and retrieve Time1.value = counter1 and Time2.value =
counter2. And that's it! We now have the exact interval of keys to query in
the device-name bucket. It may be 2 messages, when e.g. counter1 = 113 and
counter2 = 115, or a million messages, but we won't have any misses on read.
It looks pretty efficient.
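
Roughly (with index_bucket and device_bucket from the sketches above, and
t1/t2 already rounded to the index resolution):

    def logs_between(t1, t2):
        # Resolve Time1/Time2 to counter values via the index ...
        c1 = index_bucket.get(str(t1)).data['counter']   # counter1
        c2 = index_bucket.get(str(t2)).data['counter']   # counter2
        # ... then read exactly the keys written in between, no wasted GETs.
        # counter1 = 113, counter2 = 115  ->  keys 114 and 115, i.e. 2 logs.
        return [device_bucket.get(str(k)).data for k in range(c1 + 1, c2 + 1)]
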
Sequencing the writes may be troublesome (why maintain queues?), so instead we
can have many buckets per device, each with its own index and counter, as in
the sketch below.
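
A sketch of that variant (the number of parts and the naming scheme are made
up; each part would also get its own index bucket, and a range query would
merge the per-part results):

    NUM_PARTS = 4   # e.g. one independent writer per part

    part_buckets  = [client.bucket('device1_part_%d' % i) for i in range(NUM_PARTS)]
    part_counters = [0] * NUM_PARTS   # one counter per part, no global sequencing

    def store_log_part(part, stats):
        part_counters[part] += 1
        part_buckets[part].new(str(part_counters[part]), data=stats).store()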

Time1 and Time2 (the index keys) can be human-readable timestamps, which also
helps.

Do you like it? Or maybe you have different approaches to this kind of
problem? Maybe I'm missing something?

cheers