Re: Tables showing up as our_table-147a2090ed4211e480153bc81e542ebd/ in data dir

2015-04-29 Thread Yuki Morishita
The directory structure changed in 2.1 to prevent various problems caused by DROP/re-CREATE of the same table. From NEWS.txt: 2.1 === New features ... - SSTable data directory name is slightly changed. Each directory will

RE: Tables showing up as our_table-147a2090ed4211e480153bc81e542ebd/ in data dir

2015-04-29 Thread Peer, Oded
See SSTable data directory name is slightly changed. Each directory will have hex string appended after CF name, e.g. ks/cf-5be396077b811e3a3ab9dc4b9ac088d/ This hex string part represents unique ColumnFamily ID. Note that existing
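
The naming scheme quoted above (table name, a hyphen, then a hex string identifying the ColumnFamily) can be illustrated with a minimal Python sketch. The helper name `split_table_dir` is hypothetical, and the sketch assumes the table name itself contains no hyphen, so the last hyphen separates the name from the ID. It does not check the length of the hex string.

```python
def split_table_dir(dirname: str):
    """Split a '<table>-<hex id>' data directory name into (table, hex_id).

    Hypothetical helper for illustration only; not part of Cassandra.
    """
    name = dirname.rstrip("/")
    # Assumption: the last hyphen separates the table name from the ID.
    table, sep, hex_id = name.rpartition("-")
    hex_digits = set("0123456789abcdef")
    if not sep or not table or not hex_id or not set(hex_id.lower()) <= hex_digits:
        raise ValueError(f"not a '<table>-<hex id>' directory name: {dirname!r}")
    return table, hex_id


if __name__ == "__main__":
    # Directory name taken from the thread subject.
    print(split_table_dir("our_table-147a2090ed4211e480153bc81e542ebd/"))
```

A directory created before 2.1, which has no hex suffix, makes this helper raise `ValueError`.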

Re: Tables showing up as our_table-147a2090ed4211e480153bc81e542ebd/ in data dir

2015-04-29 Thread Phil Yang
see SSTable data directory name will have hex string appended after CF name 2015-04-29 13:04 GMT+08:00 Donald Smith : > Using 2.1.4, tables in our data/ directory are showing up as > > > our_table-147a2090ed4211e48

Inserting null values

2015-04-29 Thread Matthew Johnson
Hi all, I have some fields that I am storing into Cassandra, but some of them could be null at any given point. As there are quite a lot of them, it makes the code much more readable if I don’t check each one for null before adding it to the INSERT. I can see a few Jiras around CQL 3 supporti

RE: Inserting null values

2015-04-29 Thread Peer, Oded
Inserting a null value creates a tombstone. Tombstones can have major performance implications. You can see the tombstones using sstable2json. If you have a small number of records with null values this seems OK, otherwise I recommend using the QueryBuilder (for Java clients) and waiting for htt

Re: minimum bandwidth requirement between two Geo Redundant sites of Cassandra database

2015-04-29 Thread Alex Major
We run between US/EU regions on AWS with more than 45ms latency without any issues. Just use an appropriate number of replicas in each datacenter and make use of the appropriate consistency level (e.g. LOCAL_QUORUM). On Tue, Apr 28, 2015 at 2:43 PM, Daniels, Kelly wrote: > We will be anxious to co

Query returning tombstones

2015-04-29 Thread horschi
Hi, did anybody ever raise a feature request for selecting tombstones in CQL/thrift? It would be nice if I could use CQLSH to see where my tombstones are coming from. This would be much more convenient than using sstable2json. Maybe someone can point me to an existing jira-ticket, but I also apprec

Re: Inserting null values

2015-04-29 Thread Ali Akhtar
Have you considered adding a 'toSafe' method which checks if the item is null, and if so, returns a default value? E.g. String too = safe(bar, ""); On Apr 29, 2015 3:14 PM, "Matthew Johnson" wrote: > Hi all, > > > > I have some fields that I am storing into Cassandra, but some of them > could be

Re: Inserting null values

2015-04-29 Thread DuyHai Doan
The NULL insert problem was already solved a long time ago with the Insert Strategy in Achilles. However, it's nice to see there will be a flag on the protocol side to handle this problem. On Wed, Apr 29, 2015 at 2:27 PM, Ali Akhtar wrot

Re: Inserting null values

2015-04-29 Thread Robert Wille
I’ve come across the same thing. I have a table with at least half a dozen columns that could be null, in any combination. Having a prepared statement for each permutation of null columns just isn’t going to happen. I don’t want to build custom queries each time because I have a really cool syst

RE: Inserting null values

2015-04-29 Thread Matthew Johnson
Thank you all for the advice! I have decided to use the Insert query builder (com.datastax.driver.core.querybuilder.Insert) which allows me to dynamically insert as many or as few columns as I need, and doesn’t require multiple prepared statements. Then, I will look at Ali’s suggestion – I wi
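
The approach described above (include only the columns that actually have a value, so no null is written and no tombstone is created) can be sketched in plain Python. This is a stand-in for illustration, not the DataStax QueryBuilder API: the function `build_insert` and the table and column names are hypothetical, and the sketch assumes the identifiers are trusted, since it does not quote or escape them.

```python
def build_insert(table, values):
    """Build an INSERT statement that skips columns whose value is None.

    Returns the CQL text with '?' bind markers and the matching parameter list.
    Hypothetical helper: identifiers are assumed trusted and are not escaped.
    """
    present = {col: val for col, val in values.items() if val is not None}
    if not present:
        raise ValueError("no non-null columns to insert")
    columns = ", ".join(present)
    markers = ", ".join("?" for _ in present)
    cql = f"INSERT INTO {table} ({columns}) VALUES ({markers})"
    return cql, list(present.values())


if __name__ == "__main__":
    # 'email' is None, so it is left out of the statement entirely.
    cql, params = build_insert("ks.users", {"id": 1, "email": None, "name": "a"})
    print(cql)
    print(params)
```

The trade-off raised elsewhere in this thread still applies: each distinct combination of columns yields a different statement text, so this suits dynamically built statements better than a fixed set of prepared statements.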

Cassandra hanging in IntervalTree.comparePoints() and in CompactionController.maxPurgeableTimestamp()

2015-04-29 Thread Donald Smith
We deployed a brand new 13-node 2.1.4 C* cluster and used sstableloader to stream about 500GB into Cassandra. The streaming took less than a day but afterwards pending compactions do not decrease. The Cassandra nodes (which have about 500 pending compactions each) seem to spend most of their t

Consistency

2015-04-29 Thread Nikolay Tikhonov
Hi, I am trying to understand how Cassandra supports data consistency and to compare it with other distributed caches. The Hazelcast and Apache Ignite products have primary and backup nodes. This approach supports read/write consistency if client code reads from and writes to the primary node. User's operation wi

RE: Data Modelling Help

2015-04-29 Thread Donald Smith
Secondary indices are inefficient and are deprecated, as far as I know. Unless you store many thousands of emails for a long time (which I recommend against), just use a single table with the partition key being the userid and the timestamp being the clustering (column) key, as in your schema.

Re: Inserting null values

2015-04-29 Thread Eric Stevens
Correct me if I'm wrong, but tombstones are only really problematic if you have them going into clustering keys, then perform a range select on that column, right (assuming it's not a symptom of the antipattern of indefinitely overwriting the same value)? I.E. you're deleting clusters off of a par

Re: Inserting null values

2015-04-29 Thread Jonathan Haddad
Enough tombstones can inflate the size of an SSTable, causing issues during compaction (imagine a multi-TB SSTable w/ 99% tombstones) even if there's no clustering key defined. Perhaps an edge case, but worth considering. On Wed, Apr 29, 2015 at 9:17 AM Eric Stevens wrote: > Correct me if I'm wr

Re: Consistency

2015-04-29 Thread Jonathan Haddad
There's a lot going on, reading through some docs is probably your best bet: On Wed, Apr 29, 2015 at 8:57 AM Nikolay Tikhonov wrote: > Hi, > > I try to understand how to Cassandra supports data consistency and

[RELEASE] Apache Cassandra 2.1.5 released

2015-04-29 Thread Jake Luciani
The Cassandra team is pleased to announce the release of Apache Cassandra version 2.1.5. Apache Cassandra is a fully distributed database. It is the right choice when you need scalability and high availability without compromising performance. Downloads of source an

Re: Inserting null values

2015-04-29 Thread Eric Stevens
But we're talking about a single tombstone on each of a finite (small) set of values, right? We're not talking about INSERTs which are 99% nulls (at least I don't think that's what Matthew was suggesting). Unless you're engaging in the antipattern of repeated overwrite, I'm still struggling to se

Re: Inserting null values

2015-04-29 Thread Philip Thompson
In a way, yes. A tombstone will only be removed after gc_grace iff the compaction is sure that it contains all rows which that tombstone might shadow. When two non-tombstone conflicting rows are compacted, it's always just LWW. On Wed, Apr 29, 2015 at 2:42 PM, Eric Stevens wrote: > But we're tal

Re: Inserting null values

2015-04-29 Thread Robert Coli
On Wed, Apr 29, 2015 at 9:16 AM, Eric Stevens wrote: > In the end, inserting a tombstone into a non-clustered column shouldn't be > appreciably worse (if it is at all) than inserting a value instead. Or am > I missing something here? > There are thresholds (log messages, etc.) which operate on to

Re: Consistency

2015-04-29 Thread Robert Coli
On Wed, Apr 29, 2015 at 8:56 AM, Nikolay Tikhonov wrote: > I try to understand how to Cassandra supports data consistency and compare > it with other distributed caches. > For the record, Cassandra is not a distributed cache. =Rob

Re: Cassandra hanging in IntervalTree.comparePoints() and in CompactionController.maxPurgeableTimestamp()

2015-04-29 Thread Robert Coli
On Wed, Apr 29, 2015 at 8:40 AM, Donald Smith wrote: > We deployed a brand new 13 node 2.1.4 C* cluster and used sstableloader to > stream about 500GB into cassandra. The streaming took less than a day but > afterwards pending compactions do not decrease. The

Re: Data Modelling Help

2015-04-29 Thread Robert Coli
On Wed, Apr 29, 2015 at 9:01 AM, Donald Smith wrote: > Secondary indices are inefficient and are deprecated, as far as I know. > They are not deprecated; the correct summary is that they should only be used in very particular circumstances. If you're not sure

calculation of disk size

2015-04-29 Thread Rahul Bhardwaj
Hi All, We are planning to set up a cluster of 5 nodes with RF 3 for a write-heavy project. Our current database size is around 500 GB, and it is growing at a rate of 15 GB every day. We learnt that Cassandra consumes space for compaction processes, so how can we calculate the amount of disk space we

Re: calculation of disk size

2015-04-29 Thread arun sirimalla
Hi Rahul, If you are expecting 15 GB of data per day, here is the calculation. 1 Day = 15 GB, 1 Month = 450 GB, 1 Year = 5.4 TB, so your raw data size for one year is 5.4 TB. With a replication factor of 3, it would be around 16.2 TB of data for one year. Taking compaction into consideration and you
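
The arithmetic in the reply above can be reproduced with a short Python sketch. It follows the post's own conventions (30-day months, 1 TB = 1000 GB) and covers new growth only. The per-node figure assumes the 5-node cluster from the original question with data spread evenly, which is an assumption added here. Compaction headroom is left out because the reply is cut off before it gives a figure.

```python
def projected_growth_gb(daily_gb, days_per_month=30, months=12):
    """Raw (unreplicated) growth in GB, using 30-day months as in the post."""
    return daily_gb * days_per_month * months


def replicated_gb(raw_gb, replication_factor):
    """Total on-disk data across the cluster once every replica is counted."""
    return raw_gb * replication_factor


if __name__ == "__main__":
    raw = projected_growth_gb(15)    # 15 GB/day from the question
    total = replicated_gb(raw, 3)    # RF 3 from the question
    per_node = total / 5             # assumption: 5 nodes, evenly balanced
    print(f"raw: {raw / 1000:.1f} TB, replicated: {total / 1000:.1f} TB, "
          f"per node: {per_node / 1000:.2f} TB")
    # prints: raw: 5.4 TB, replicated: 16.2 TB, per node: 3.24 TB
```

The existing 500 GB mentioned in the question is not part of this growth figure and would need to be added, again multiplied by the replication factor.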