Is that a realistic strategy under low-latency requirements? Imagine this were some web service, and people generated this query at some reasonable frequency. (Not that I know exactly what Andrew is looking for.)
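For concreteness, here is roughly what the suggested per-identifier fan-out looks like from the Erlang client. This is a sketch only: it assumes a riakc_pb_socket connection, objects stored as JSON, a single bucket with a "from_bin" secondary index, and a client/server version that accepts secondary-index MapReduce inputs.

    %% One MapReduce job per 'from' identifier; a JavaScript map phase
    %% keeps only records newer than the threshold. Bucket and index
    %% names are assumptions, not from the thread.
    fetch_since(Pid, Bucket, Froms, Since) ->
        MapJs = iolist_to_binary(
                  ["function(v) {",
                   "  var rec = JSON.parse(v.values[0].data);",
                   "  return rec.time > ", integer_to_list(Since),
                   " ? [rec] : [];",
                   "}"]),
        Query = [{map, {jsanon, MapJs}, undefined, true}],
        %% ~500 jobs issued back to back: the round trips, not the
        %% per-job work, are what dominate the latency.
        lists:flatmap(
          fun(From) ->
                  Input = {index, Bucket, <<"from_bin">>, From},
                  case riakc_pb_socket:mapred(Pid, Input, Query) of
                      {ok, [{_Phase, Results}]} -> Results;
                      {ok, []}                  -> []
                  end
          end, Froms).

Even if each job is cheap, 500 of them in series means 500 round trips, which is the heart of my question.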
2012/7/25 Yousuf Fauzan <yousuffau...@gmail.com>:

> Since 500 is not that big a number, I think you can run that many M/Rs,
> each emitting only the records having "time" greater than the specified
> value. The input would be:
>
>     {index, <<"bucket">>, <<"from_bin">>, <<"from_field_value">>}
>
> If you decide to split the data into separate buckets based on the "from"
> field, the input would be:
>
>     {index, <<"from_field_value">>, <<"time_bin">>, <<"time_low">>, <<"time_high">>}
>
> --
> Yousuf
>
> On Wed, Jul 25, 2012 at 6:35 PM, Andrew Kondratovich
> <andrew.kondratov...@gmail.com> wrote:
>
>> Hello, Yousuf.
>>
>> Thanks for your reply.
>>
>> We have several million items: about 10,000 unique 'from' values, with
>> roughly 1,000 items each. Usually we need to fetch the items for about
>> 500 'from' identifiers with a 'time' limit; about 5% of the items match.
>>
>> On Wed, Jul 25, 2012 at 1:02 PM, Yousuf Fauzan
>> <yousuffau...@gmail.com> wrote:
>>
>>> Hi Andrew,
>>>
>>> First of all, the correct answer to your question is the proverbial
>>> "it depends". Having said that, here is what I would do in your case:
>>>
>>> 1. If there are enough data points with the same "from" field, I would
>>>    make it a bucket and then index on time.
>>> 2. If the above is not true, I would index on the "from" and "time"
>>>    fields.
>>>    a. If the number of records where "time" is greater than the one
>>>       you require is small, I would run a map/reduce with those
>>>       records as the initial input.
>>>    b. If the number of records having a particular "from" is small, I
>>>       would do the same with the records having that "from" field as
>>>       the initial input. This could be a problem, as Riak only
>>>       supports range and exact queries, so if you want to query
>>>       multiple identifiers you will have to run multiple queries.
>>>
>>> In both cases I would use secondary indexes to get the initial
>>> records. Note that we are using M/R because Riak does not support
>>> querying by multiple indexes.
>>>
>>> What I would also suggest is to partition your data into different
>>> buckets. You will need to understand the queries you will be
>>> supporting and partition accordingly.
>>>
>>> --
>>> Yousuf
>>>
>>> On Wed, Jul 25, 2012 at 2:50 PM, Andrew Kondratovich
>>> <andrew.kondratov...@gmail.com> wrote:
>>>
>>>> Good afternoon.
>>>>
>>>> I am considering several storage solutions for my project, and I am
>>>> now looking at Riak. We work with the following pattern of data:
>>>>
>>>>     {
>>>>       time: unixtime
>>>>       from: int
>>>>       data: binary
>>>>       ...
>>>>     }
>>>>
>>>> The data set is several million items for now, but it is growing. We
>>>> need to handle the following request: for a list of identifiers
>>>> (about 500 of them), return all records where 'from' equals one of
>>>> the identifiers and 'time' is greater than a certain value.
>>>>
>>>> How should we store such data and handle such requests efficiently
>>>> with Riak?
>>>>
>>>> Thanks.
>>>>
>>>> --
>>>> Andrew Kondratovich
>>
>> --
>> Andrew Kondratovich
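For the bucket-per-'from' layout quoted above, the time filter moves into the index range itself, so the map phase only has to return the stored values. A sketch under the same assumptions, with one deliberate change: I use an integer index ("time_int") rather than the "time_bin" from the example above, since integer 2i ranges compare numerically, which is what you want for unix times.

    %% Each 'from' value is its own bucket; records carry a 2i entry on
    %% time. riak_kv_mapreduce:map_object_value is a built-in map phase
    %% that simply returns the stored object values.
    fetch_since_per_bucket(Pid, Froms, Since, Now) ->
        Query = [{map, {modfun, riak_kv_mapreduce, map_object_value},
                  none, true}],
        lists:flatmap(
          fun(From) ->
                  %% Since + 1 because 2i ranges are inclusive and the
                  %% requirement is strictly greater; Now is any upper
                  %% bound past the current time.
                  Input = {index, From, <<"time_int">>, Since + 1, Now},
                  case riakc_pb_socket:mapred(Pid, Input, Query) of
                      {ok, [{_Phase, Values}]} -> Values;
                      {ok, []}                  -> []
                  end
          end, Froms).

It is still one job per identifier, though, so the fan-out latency question stands either way.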
_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com