Cassandra Database using too much space

2014-12-14 Thread Chamila Wijayarathna
Hello all, We are trying to develop a language corpus using Cassandra as its storage medium. https://gist.github.com/cdwijayarathna/7550176443ad2229fae0 shows the types of information we need to extract from the corpus interface. So we designed the schema at https://gist.github.com/cdwijayarathna/6491

Hinted handoff not working

2014-12-14 Thread Robert Wille
I have a cluster with RF=3. If I shut down one node and add a bunch of data to the cluster, I don’t see a bunch of records added to system.hints. Also, du of /var/lib/cassandra/data/system/hints on the nodes that are up shows that hints aren’t being stored. When I start the down node, its data does

Re: Hinted handoff not working

2014-12-14 Thread Rahul Neelakantan
http://www.datastax.com/documentation/cassandra/2.0/cassandra/configuration/configCassandra_yaml_r.html?scroll=reference_ds_qfg_n1r_1k__hinted_handoff_enabled Rahul > On Dec 14, 2014, at 9:46 AM, Robert Wille wrote: > > I have a cluster with RF=3. If I shut down one node, add a bunch of data to

Re: Cassandra Database using too much space

2014-12-14 Thread Ryan Svihla
Well, your data model looks fine at a glance: a lot of tables, but they appear to map to logically obvious query paths. This denormalization will make your queries fast but eat up more disk, and if disk is really a pain point, I'd suggest looking at your economics a bit, and look at your trade

Re: Cassandra Database using too much space

2014-12-14 Thread Chamila Wijayarathna
Hi Ryan, Thank you very much. This helps a lot. On Sun, Dec 14, 2014 at 9:14 PM, Ryan Svihla wrote: > > Well your data model looks fine at a glance, a lot of tables, but they > appear to be mapping to logically obvious query paths. This denormalization > will make your queries fast but eat up mo

Re: Cassandra Database using too much space

2014-12-14 Thread Jack Krupansky
It looks like you will have quite a few “combinatoric explosions” to cope with. In addition to 1.5M words, you have bigrams and trigrams – combinations of two and three words. You need to get a handle on the cardinality of each of your tables. Bigrams and trigrams could give you who knows how many millions
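A minimal sketch of how one might get a handle on that cardinality before loading anything into Cassandra, assuming the corpus is available as a list of tokens. The helper name `ngram_cardinality` and the sample sentence are hypothetical and do not come from the thread; the point is only that total instances and distinct n-grams are separate numbers, and the distinct count is what sets the row count of a per-n-gram table.

```python
from collections import Counter


def ngram_cardinality(tokens, n):
    """Return (total n-gram instances, distinct n-grams) for a token list."""
    grams = Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return sum(grams.values()), len(grams)


# Tiny hypothetical sample; a real corpus would be streamed from disk.
sample = "the cat sat on the mat and the cat slept".split()

for n, name in ((1, "words"), (2, "bigrams"), (3, "trigrams")):
    total, distinct = ngram_cardinality(sample, n)
    print(f"{name}: {total} instances, {distinct} distinct")
```

Running the same count over a representative slice of the real corpus gives a rough estimate of how many rows each frequency table would hold.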

Access to locally partitioned data

2014-12-14 Thread Jason Kania
Hello, I am wondering if there is a way to obtain results from a table where only the results from the local partition are returned in the query? To give some background, my application requires millions of timers and since queue-like implementations are a bad fit/anti-pattern for Cassandra, I

Re: Cassandra Database using too much space

2014-12-14 Thread Chamila Wijayarathna
Hi Jack, Thanks for replying. What I meant by 1.5M words is not 1.5M distinct words; it is the count of all words we added to the corpus (total word instances). In the word_frequency and word_ordered_frequency CFs, we have a row for each distinct word with its frequency (two CFs have same

Re: Hinted handoff not working

2014-12-14 Thread Robert Wille
I’ve got "hinted_handoff_enabled: true" in cassandra.yaml. My settings are all default except for the DC, listen addresses and snitch. I should have mentioned this in my original post. On Dec 14, 2014, at 8:02 AM, Rahul Neelakantan wrote: > http://www.datastax.com/documentation/cassandra/2.0/c

Re: Hinted handoff not working

2014-12-14 Thread Jens Rantil
Hi Robert, Maybe you need to flush your memtables to actually see the disk usage increase? This applies to both hosts. Cheers, Jens On Sun, Dec 14, 2014 at 3:52 PM, Robert Wille wrote: > I have a cluster with RF=3. If I shut down one node, add a bunch of data to > the cluster, I don’t see a

What does the -node argument mean in Cassandra stress tool?

2014-12-14 Thread 孔嘉林
Hi, I am using the Cassandra stress tool provided in the 2.1.2 distribution. I wonder what the "-node" argument means. Does it specify the cluster server node or the stress client node? The documentation says "Splitting up a load over multiple cassandra-stress instances on different nodes: This is