Solr Search With Apache Cassandra

2017-11-19 Thread @Nandan@
Hi All, how does Solr Search affect READ operations in Cassandra? I have a table with 100 columns and a UUID primary key. Note: the table has 100 columns because I implemented advanced search on multiple columns, as in e-commerce. My concerns are: 1) whenever do READ fro

Re: How quickly we can bootstrap

2017-11-19 Thread Justin Cameron
If fast bootstrapping is a high priority then using less dense nodes is the tradeoff. For example, rather than having 5 nodes with 3TB each you could deploy 15 (less powerful) nodes with 1TB each. Your cluster will have higher overall disk throughput which would allow you to increase the stream through
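To put rough numbers on the density tradeoff described above, here is a back-of-the-envelope sketch. It assumes a joining node streams about one node's worth of data at a steady rate. The 50 MB/s figure and the function name `bootstrap_hours` are hypothetical and purely illustrative; they do not come from the thread or from any measurement.

```python
def bootstrap_hours(node_density_tb: float, stream_mb_per_s: float) -> float:
    """Estimate hours needed to stream one node's worth of data.

    Assumes decimal units (1 TB = 1,000,000 MB) and a constant
    streaming rate, which real clusters will not sustain exactly.
    """
    if node_density_tb < 0 or stream_mb_per_s <= 0:
        raise ValueError("density must be >= 0 and throughput must be > 0")
    megabytes = node_density_tb * 1_000_000
    seconds = megabytes / stream_mb_per_s
    return seconds / 3600


# Hypothetical streaming rate, chosen only for illustration.
ASSUMED_STREAM_MB_PER_S = 50.0

dense = bootstrap_hours(3.0, ASSUMED_STREAM_MB_PER_S)   # 3TB per node
sparse = bootstrap_hours(1.0, ASSUMED_STREAM_MB_PER_S)  # 1TB per node
print(f"3TB node: {dense:.1f} h, 1TB node: {sparse:.1f} h")
```

Under these assumptions the estimate scales linearly with node density, so a 1TB node needs about a third of the streaming time of a 3TB node.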

Re: Time series modeling in C* for range queries

2017-11-19 Thread Jon Haddad
Hi Junaid, I wrote a blog post a few months ago on massively scalable time series, going into a couple techniques on bucketing that you might find helpful. http://thelastpickle.com/blog/2017/08/02/time-series-data-modeling-massive-scale.html

Re: How quickly we can bootstrap

2017-11-19 Thread Jon Haddad
It sounds like you’re asking how to bootstrap without paying the cost of bootstrapping :) If you want to scale out, you’ll need to deal with the time it takes. You can’t add a node and have it up in 15 minutes; if you’re running 3 TB it’ll take a while. The exact amount of time depends largel

Re: How quickly we can bootstrap

2017-11-19 Thread Anshu Vajpayee
Adding more compute power again means vertical scaling. I understand this is one way to handle the load when demand increases, but it doesn't match Cassandra's philosophy of horizontal scaling. Hitting capacity cannot be restricted to only compute power. Also in case of node fai

Re: Time series modeling in C* for range queries

2017-11-19 Thread Justin Cameron
Hi Junaid, Using a "bucketing" key ("day") is the recommended way to limit the size of partitions. In your case you would probably need something like: PRIMARY KEY ((deviceid, day), datetime). Have you considered computing a running aggregate as the data comes into Cassandra? Rather than execute
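With a key such as PRIMARY KEY ((deviceid, day), datetime), a range query that spans several days has to read one partition per day bucket. The sketch below shows how a client might enumerate those buckets. The helper names `day_buckets` and `partition_keys`, and the choice of an ISO date string for the `day` column, are assumptions made for illustration and are not taken from the thread.

```python
from datetime import date, datetime, timedelta
from typing import List, Tuple


def day_buckets(start: datetime, end: datetime) -> List[date]:
    """Return every calendar day touched by the closed range [start, end]."""
    if end < start:
        raise ValueError("end must not precede start")
    span = (end.date() - start.date()).days
    return [start.date() + timedelta(days=i) for i in range(span + 1)]


def partition_keys(deviceid: str, start: datetime, end: datetime) -> List[Tuple[str, str]]:
    """Build the (deviceid, day) pairs a range query would need to read.

    The day is rendered as an ISO string (YYYY-MM-DD); this encoding is
    an assumption about how the bucket column is stored.
    """
    return [(deviceid, d.isoformat()) for d in day_buckets(start, end)]


keys = partition_keys(
    "dev-1",
    datetime(2017, 11, 18, 22, 0),
    datetime(2017, 11, 20, 1, 0),
)
for key in keys:
    print(key)
```

Each pair would then be queried separately, with the datetime clustering column restricted to the requested range, and the results merged on the client.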

Time series modeling in C* for range queries

2017-11-19 Thread Junaid Nasir
We are building an IoT platform where time series data from millions of devices is collected and then used for analytics pertaining to Business Intelligence/Analytics (BI/BA). Within this context, we are running into an issue with range-based queries, where the granularity of