It's probably quite rare for queries against extremely large time series
data to touch the whole set of data. Instead there's almost always a
"between X and Y dates" aspect to nearly every real-time query you might
run against a table like this (with the exception of "most recent N
events"). Because of that, bucketing the partitions by date rarely gets in
the way of real queries.
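As a minimal sketch (table and values are illustrative, modeled on the
events schema quoted later in the thread), such a query constrains both the
partition's date bucket and the clustering time range:

    SELECT event FROM events
    WHERE id = 'user-42'
      AND date = '2015-03-02'
      AND event_time >= '2015-03-02 00:00:00'
      AND event_time <  '2015-03-02 12:00:00';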
Note that using static column(s) for the “head” value, with trailing TTLed
values behind it, is something we’re considering. This is especially nice if
your head state includes, say, a map which is updated by small deltas
(individual keys).
We have not yet studied the effect of static columns on such a workload.
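A minimal sketch of that idea (schema and names are illustrative, not an
agreed design): the current head state lives in a static map on the
partition, individual keys are updated in place, and each small delta is
written as a regular TTLed row behind it.

    CREATE TABLE user_state (
        user_id    text,
        head       map<text, int> STATIC,  -- current "head" state, one per partition
        event_time timeuuid,
        delta      blob,                   -- small delta, ages out via TTL
        PRIMARY KEY (user_id, event_time)
    );

    -- Update a single key of the head in place, and record the delta
    -- itself with a TTL so the trail expires on its own:
    UPDATE user_state SET head['clicks'] = 42 WHERE user_id = 'u1';
    INSERT INTO user_state (user_id, event_time, delta)
    VALUES ('u1', now(), 0xcafe) USING TTL 86400;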
Hi all,
Thanks for the responses, this was very helpful.
I don't know yet what the distribution of clicks and users will be, but I
expect to see a few users with an enormous number of interactions and most
users having very few. The idea of doing some additional manual
partitioning, and then maintaining those partitions ourselves, sounds
promising.
> Here "partition" is a random digit from 0 to (N*M)
> where N=nodes in cluster, and M=arbitrary number.
Hopefully it was obvious, but here (unless you've got hot partitions),
you don't need N.
~mck
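For reference, a sketch of that bucketing scheme (table and names invented
for illustration): writes pick a random bucket, and reads fan out over all
M buckets for the user.

    CREATE TABLE clicks (
        user_id    text,
        bucket     int,       -- random 0..M-1; spreads one hot user over M partitions
        event_time timeuuid,
        payload    blob,
        PRIMARY KEY ((user_id, bucket), event_time)
    ) WITH CLUSTERING ORDER BY (event_time DESC);

    -- Reads must cover every bucket for the user, e.g. with M = 4:
    SELECT payload FROM clicks WHERE user_id = 'u1' AND bucket IN (0, 1, 2, 3);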
Hello
You can use timeuuid as the row key and create a separate CF to be used for
indexing.
The indexing CF can either use user_id as its key, or, better, partition the
rows by a timestamp base.
In the partitioned case you can create a compound key in which you store
user_id and a timestamp base (for example, the day).
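In CQL terms that suggestion might look roughly like this (all names are
illustrative):

    -- Raw events, keyed by timeuuid:
    CREATE TABLE events_raw (
        event_id timeuuid PRIMARY KEY,
        event    blob
    );

    -- Indexing CF: compound partition key of user_id plus a timestamp base
    -- (here, the day), pointing back at the raw events:
    CREATE TABLE events_by_user_day (
        user_id    text,
        day        text,       -- e.g. '2015-03-02'
        event_time timeuuid,
        event_id   timeuuid,
        PRIMARY KEY ((user_id, day), event_time)
    );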
Clint,
> CREATE TABLE events (
> id text,
> date text, // Could also use year+month here or year+week or something else
> event_time timestamp,
> event blob,
> PRIMARY KEY ((id, date), event_time))
> WITH CLUSTERING ORDER BY (event_time DESC);
>
> The downside of this approach is that we can no longer do a simple
> continuous scan to get all of the events for a given user.
I'd recommend using 100K and 10M as rough guidelines for the maximum number
of rows and bytes in a single partition. Sure, Cassandra can technically
handle a lot more than that, but very large partitions can make your life
more difficult. Of course you will have to do a POC to validate the sweet
spot.
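(A quick sanity check on those two numbers, not part of the original advice:
at roughly 100 bytes per event, 100K rows comes to about 10 MB, so the two
ceilings coincide for small events; with larger event blobs the byte limit
will bite first.)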
Hi,
I have not done something similar; however, I have some comments:
On Mon, Mar 2, 2015 at 8:47 PM, Clint Kelly wrote:
> The downside of this approach is that we can no longer do a simple
> continuous scan to get all of the events for a given user.
>
Sure, but would you really do that in real time?
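If the full history is ever needed, one workaround (sketched against the
quoted schema above, with invented values) is to fan the scan out over the
known date buckets rather than one continuous range:

    SELECT event FROM events WHERE id = 'user-42' AND date = '2015-03-01';
    SELECT event FROM events WHERE id = 'user-42' AND date = '2015-03-02';
    -- ...or as a single statement:
    SELECT event FROM events
    WHERE id = 'user-42' AND date IN ('2015-03-01', '2015-03-02');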