Re: Need Column Family Schema Suggestion

2016-01-27 Thread Jack Krupansky
If the goal is to maximize performance/throughput, you need to ensure that data is as contiguous as possible, in other words, so that you can ask Cassandra for a slice of consecutive rows rather than requiring slow and expensive scanning. Typically this means careful attention to partition keys so that the data

Re: Need Column Family Schema Suggestion

2016-01-26 Thread srungarapu vamsi
Jack, this is one of the analytics jobs I have to run. For the given problem, I want to optimize the schema so that, instead of loading the data as an RDD onto the Spark machines, I can get the number directly from Cassandra queries. The rationale behind this is that I want to save on Spark machine types

Re: Need Column Family Schema Suggestion

2016-01-26 Thread Jack Krupansky
Step 1 in data modeling in Cassandra is to define all of your queries. Are these in fact the ONLY queries that you need? If you are doing significant analytics, Spark is indeed the way to go. Cassandra works best for point queries and narrow slice queries (sequence of consecutive rows within a si

Need Column Family Schema Suggestion

2016-01-26 Thread srungarapu vamsi
Hi, I have the following use case: A product (P) has 3 or more devices associated with it. Each device (Di) emits a set of names (the size of the set is less than or equal to 250) every minute. Now the ask is: compute the function foo(product, hour), which is defined as follows: foo(product, hour) = N