Hi All.
I am currently looking at using Riak as a data store for time series
data. We get about 1.5T of data in JSON format that I intend
to persist in Riak. I am having some difficulty figuring out how to
model it so that I can fulfill the use cases I have been handed.
The data
S3 and we get notified (usually
on an hourly basis, some logs on a 10-min basis) so I can massage it
further, but I am concerned that every place where I buffer is another
opportunity for losing data, and I would like to avoid reprocessing as
much as possible.
Messages will already have the
erts off of the data whose
granularity is most likely going to be of the order of 10 mins. These
are just counters on a single time dimension, so I am assuming that if I
get the model right this will be easy. Yes, we can do this via EMR
but it also requires additional moving parts that we
any other questions.
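To make "counters on a single time dimension" concrete, here is a minimal sketch in plain Python (nothing Riak-specific). It assumes 10-minute windows and a hypothetical key layout of `<metric>:<window start epoch>`, with one counter per key. The metric name and timestamps are made up for illustration.

```python
from collections import Counter

# Assumed rollup granularity: 10 minutes, expressed in seconds.
BUCKET_SECONDS = 600


def bucket_start(ts: int) -> int:
    """Floor a Unix timestamp (seconds) to the start of its 10-minute window."""
    return ts - (ts % BUCKET_SECONDS)


def rollup_key(metric: str, ts: int) -> str:
    """Hypothetical key layout: one key per metric per 10-minute window."""
    return f"{metric}:{bucket_start(ts)}"


def rollup(events):
    """Count (metric, unix_ts) events into per-window counters keyed by rollup_key."""
    counts = Counter()
    for metric, ts in events:
        counts[rollup_key(metric, ts)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample events: two in one window, one in the next.
    events = [
        ("pageviews", 1424754240),
        ("pageviews", 1424754299),
        ("pageviews", 1424754840),
    ]
    for key, n in sorted(rollup(events).items()):
        print(key, n)
```

Because every window maps to a predictable key, reading a range of windows comes down to computing the keys and fetching them directly. Note that if a log is reprocessed and the counters are incremented again, they will double-count, which is one more reason to avoid reprocessing.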
Thanks so much for all the help. I think I have a pretty good idea of
how to move forward.
Thanks again.
AM
Jason
On 24 Feb 2015, at 05:24, AM wrote:
On 2/22/15 6:16 PM, Jason Campbell wrote:
Coming at this from another angle, if you already have a permanent data st