Is that a realistic strategy for low-latency requirements? Imagine this
were a web service, and people generated this query at some reasonable
frequency.

(not that I know what Andrew is looking for, exactly)

2012/7/25 Yousuf Fauzan <yousuffau...@gmail.com>

> Since 500 is not that big a number, I think you can run that many M/R
> jobs, each emitting only the records whose "time" is greater than the
> specified value. The input would be {index, <<"bucket">>,
> <<"from_bin">>, <<"from_field_value">>}.
>
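> A minimal sketch of one such job with the official Erlang client
> (riakc); the module filter_mod, its map_newer_than function, and the
> cutoff value are invented for illustration, and the map module is
> assumed to be compiled and on the code path of every Riak node:
>
>   %% One job per "from" value: the 2i input streams every matching
>   %% object into a map phase that drops records at or below the cutoff.
>   {ok, Pid} = riakc_pb_socket:start_link("127.0.0.1", 8087),
>   Input = {index, <<"bucket">>, <<"from_bin">>, <<"12345">>},
>   Query = [{map, {modfun, filter_mod, map_newer_than}, 1343200000, true}],
>   {ok, Results} = riakc_pb_socket:mapred(Pid, Input, Query).
>
>   %% In filter_mod.erl, deployed on the Riak nodes:
>   map_newer_than(Object, _KeyData, Cutoff) ->
>       Record = binary_to_term(riak_object:get_value(Object)),
>       case proplists:get_value(time, Record) of
>           Time when Time > Cutoff -> [Record];
>           _ -> []
>       end.
>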
> If you decide to split the data into separate buckets based on the
> "from" field, the input would be {index, <<"from_field_value">>,
> <<"time_bin">>, <<"time_low">>, <<"time_high">>}.
>
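> With that layout no map-phase filter is needed, because the range over
> the time index already narrows the input. A sketch, swapping in an
> integer index (time_int) so the range comparison is numeric; the bucket
> name and bounds are invented:
>
>   Input = {index, <<"from_12345">>, <<"time_int">>, 1343200000, 2000000000},
>   Query = [{map, {modfun, riak_kv_mapreduce, map_object_value}, none, true}],
>   {ok, Results} = riakc_pb_socket:mapred(Pid, Input, Query).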
>
> --
> Yousuf
>
> On Wed, Jul 25, 2012 at 6:35 PM, Andrew Kondratovich <
> andrew.kondratov...@gmail.com> wrote:
>
>> Hello,  Yousuf.
>>
>> Thanks for your reply.
>>
>> We have several million items: about 10,000 unique 'from' values, with
>> roughly 1,000 items for each. A typical request fetches the items for
>> about 500 'from' identifiers with a 'time' lower bound, which matches
>> about 5% of the items.
>>
>> On Wed, Jul 25, 2012 at 1:02 PM, Yousuf Fauzan <yousuffau...@gmail.com> wrote:
>>
>>> Hi Andrew,
>>>
>>> First of all, the correct answer to your question is the proverbial
>>> "it depends". Having said that, here is what I would do in your case:
>>>
>>> 1. If there are enough data points with the same "from" field, I would
>>> make that field a bucket and then index on time.
>>> 2. If the above is not true, I would index on both the "from" and
>>> "time" fields.
>>>     a. If the number of records whose "time" is greater than the one
>>> you require is small, I would run a map/reduce with those records as
>>> the initial input.
>>>     b. If the number of records having a particular "from" is small, I
>>> would do the same, with the records having that "from" field as the
>>> initial input. This could be a problem, as Riak only supports exact and
>>> range queries, so if you want to query multiple identifiers you will
>>> have to run multiple queries (see the sketch after this list).
>>>     In both cases, I would use secondary indexes (2i) to fetch the
>>> initial records.
>>>     Note that we use M/R because Riak does not support querying on
>>> multiple indexes at once.
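>>>
>>> A rough sketch of case 2b with the Erlang client, assuming a connected
>>> riakc Pid; filter_mod:map_newer_than is a hypothetical map function
>>> deployed on the Riak nodes, and all values are illustrative. Because
>>> 2i accepts only one exact-match or range query at a time, each
>>> identifier gets its own job:
>>>
>>>   Froms = [<<"12345">>, <<"12346">>, <<"12347">>],  %% ...up to ~500
>>>   Results =
>>>       [begin
>>>            Input = {index, <<"bucket">>, <<"from_bin">>, F},
>>>            Query = [{map, {modfun, filter_mod, map_newer_than},
>>>                      1343200000, true}],
>>>            {ok, R} = riakc_pb_socket:mapred(Pid, Input, Query),
>>>            R
>>>        end || F <- Froms].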
>>>
>>> I would also suggest partitioning your data into different buckets.
>>> You will need to understand the queries you will be supporting and
>>> partition accordingly; an example follows.
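>>>
>>> For example, a per-"from" bucket name could be derived at write time
>>> (the naming scheme is invented, and Record and a connected Pid are
>>> assumed):
>>>
>>>   From = 12345,
>>>   Bucket = <<"from_", (integer_to_binary(From))/binary>>,
>>>   Obj = riakc_obj:new(Bucket, undefined, term_to_binary(Record)),
>>>   ok = riakc_pb_socket:put(Pid, Obj).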
>>>
>>> --
>>> Yousuf
>>>
>>> On Wed, Jul 25, 2012 at 2:50 PM, Andrew Kondratovich <
>>> andrew.kondratov...@gmail.com> wrote:
>>>
>>>> Good afternoon.
>>>>
>>>> I am considering several storage solutions for my project, and I am
>>>> now looking at Riak.
>>>> We work with the following data pattern:
>>>> {
>>>>   time: unixtime
>>>>   from: int
>>>>   data: binary
>>>>   ...
>>>> }
>>>>
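>>>> As a concrete sketch, one way to write such a record to Riak with
>>>> secondary indexes on both fields, assuming a riakc client version
>>>> that has riakc_obj:set_secondary_index/2; the bucket name and values
>>>> are illustrative, and the "..." fields are omitted:
>>>>
>>>>   {ok, Pid} = riakc_pb_socket:start_link("127.0.0.1", 8087),
>>>>   Record = [{time, 1343200000}, {from, 12345}, {data, <<"payload">>}],
>>>>   Obj0 = riakc_obj:new(<<"bucket">>, undefined, term_to_binary(Record)),
>>>>   MD0 = riakc_obj:get_update_metadata(Obj0),
>>>>   MD1 = riakc_obj:set_secondary_index(MD0,
>>>>             [{{binary_index, "from"}, [<<"12345">>]},    %% from_bin
>>>>              {{integer_index, "time"}, [1343200000]}]),  %% time_int
>>>>   Obj1 = riakc_obj:update_metadata(Obj0, MD1),
>>>>   ok = riakc_pb_socket:put(Pid, Obj1).
>>>>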
>>>> The data set is several million items for now, and it is growing. We
>>>> need to handle the following request: given a list of identifiers
>>>> (about 500 of them), return all records where id = from and time is
>>>> greater than a certain value.
>>>>
>>>> How should we store such data and handle such requests efficiently
>>>> with Riak?
>>>>
>>>> Thanks.
>>>>
>>>> --
>>>> Andrew Kondratovich
>>>>
>>>>
>>>
>>
>>
>> --
>> Andrew Kondratovich
>>
>>
>
