The directory structure changed in 2.1 to prevent various problems
caused by dropping and re-creating the same table
(https://issues.apache.org/jira/browse/CASSANDRA-5202).
From NEWS.txt:
2.1
===
New features
...
- SSTable data directory name is slightly changed. Each directory will
See https://github.com/apache/cassandra/blob/trunk/NEWS.txt#L173
SSTable data directory name is slightly changed. Each directory will have hex
string appended after CF name, e.g. ks/cf-5be396077b811e3a3ab9dc4b9ac088d/
This hex string part represents unique ColumnFamily ID.
Note that existing
see https://github.com/apache/cassandra/blob/cassandra-2.1/NEWS.txt#L77
SSTable data directory name will have hex string appended after CF name
2015-04-29 13:04 GMT+08:00 Donald Smith :
> Using 2.1.4, tables in our data/ directory are showing up as
>
>
> our_table-147a2090ed4211e48
Hi all,
I have some fields that I am storing into Cassandra, but some of them could
be null at any given point. As there are quite a lot of them, it makes the
code much more readable if I don’t check each one for null before adding it
to the INSERT.
I can see a few Jiras around CQL 3 supporti
Inserting a null value creates a tombstone. Tombstones can have major
performance implications.
You can see the tombstones using sstable2json.
If you have a small number of records with null values this seems OK, otherwise
I recommend using the QueryBuilder (for Java clients) and waiting for
htt
We run between US/EU regions on AWS with more than 45ms latency without any
issues. Just use an appropriate number of replicas in each datacenter and
make use of the appropriate consistency level (e.g. LOCAL_QUORUM).
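For reference, a local quorum is a majority of the replicas in the coordinator's own datacenter, i.e. floor(RF / 2) + 1. A minimal Python sketch of that arithmetic follows; the datacenter names and replication factors are made-up values for illustration, not taken from this thread.

```python
def quorum(replication_factor: int) -> int:
    """Number of replicas that must respond for a quorum: floor(RF / 2) + 1."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    return replication_factor // 2 + 1


# Hypothetical per-datacenter replication factors (assumed for illustration).
replicas_per_dc = {"us-east": 3, "eu-west": 3}

for dc, rf in replicas_per_dc.items():
    needed = quorum(rf)
    # A local quorum only waits on replicas in the local datacenter,
    # so cross-region latency does not sit on the request path.
    print(f"{dc}: RF={rf}, local quorum needs {needed}, tolerates {rf - needed} down")
```

With RF 3 per datacenter, a local quorum read or write needs 2 local replicas and still succeeds with one local replica down.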
On Tue, Apr 28, 2015 at 2:43 PM, Daniels, Kelly
wrote:
> We will be anxious to co
Hi,
Did anybody ever raise a feature request for selecting tombstones in
CQL/thrift?
It would be nice if I could use CQLSH to see where my tombstones are coming
from. This would be much more convenient than using sstable2json.
Maybe someone can point me to an existing jira-ticket, but I also
apprec
Have you considered adding a 'toSafe' method which checks if the item is
null, and if so, returns a default value? E.g. String too = safe(bar, "");
On Apr 29, 2015 3:14 PM, "Matthew Johnson" wrote:
> Hi all,
>
>
>
> I have some fields that I am storing into Cassandra, but some of them
> could be
The problem of NULL inserts was solved a long time ago by the Insert
Strategy in Achilles:
https://github.com/doanduyhai/Achilles/wiki/Insert-Strategy
However, it's nice to see there will be a flag on the protocol side to
handle this problem
On Wed, Apr 29, 2015 at 2:27 PM, Ali Akhtar wrot
I’ve come across the same thing. I have a table with at least half a dozen
columns that could be null, in any combination. Having a prepared statement for
each permutation of null columns just isn’t going to happen. I don’t want to
build custom queries each time because I have a really cool syst
Thank you all for the advice!
I have decided to use the Insert query builder (
*com.datastax.driver.core.querybuilder.Insert*) which allows me to
dynamically insert as many or as few columns as I need, and doesn’t require
multiple prepared statements. Then, I will look at Ali’s suggestion – I
wi
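The idea described above, sending only the columns that actually have values so that no nulls (and therefore no tombstones) are written, can be sketched without any driver. This is a minimal Python illustration, not the DataStax QueryBuilder API; the function name build_insert, the table and column names, and the %s placeholder style are all assumptions for the example.

```python
from typing import Any, Mapping, Tuple


def build_insert(table: str, row: Mapping[str, Any]) -> Tuple[str, tuple]:
    """Build an INSERT statement that names only the non-null columns.

    Returns the statement text and the matching tuple of bind values.
    """
    present = {name: value for name, value in row.items() if value is not None}
    if not present:
        raise ValueError("no non-null columns to insert")
    columns = ", ".join(present)
    placeholders = ", ".join(["%s"] * len(present))
    statement = f"INSERT INTO {table} ({columns}) VALUES ({placeholders})"
    return statement, tuple(present.values())


# Hypothetical row: 'email' is null and is left out of the statement entirely.
statement, values = build_insert(
    "ks.users", {"id": 42, "email": None, "name": "Matthew"}
)
print(statement)
print(values)
```

The trade-off is that each distinct combination of columns produces a different statement text, which is why this style suits dynamically built queries better than a fixed set of prepared statements.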
We deployed a brand new 13 node 2.1.4 C* cluster and used sstableloader to stream
about 500GB into cassandra. The streaming took less than a day but afterwards
pending compactions do not decrease. The Cassandra nodes (which have about
500 pending compactions each) seem to spend most of their t
Hi,
I am trying to understand how Cassandra supports data consistency and to compare
it with other distributed caches. Hazelcast and Apache Ignite have
primary and backup nodes. This approach supports read/write consistency
if client code reads from and writes to the primary node. User's operation wi
Secondary indices are inefficient and are deprecated, as far as I know.
Unless you store many thousands of emails for a long time (which I recommend
against), just use a single table with the partition key being the userid and
the timestamp being the clustering (column) key, as in your schema.
Correct me if I'm wrong, but tombstones are only really problematic if you
have them going into clustering keys, then perform a range select on that
column, right (assuming it's not a symptom of the antipattern of
indefinitely overwriting the same value)? I.E. you're deleting clusters
off of a par
Enough tombstones can inflate the size of an SSTable, causing issues during
compaction (imagine a multi-TB SSTable with 99% tombstones) even if there's
no clustering key defined.
Perhaps an edge case, but worth considering.
On Wed, Apr 29, 2015 at 9:17 AM Eric Stevens wrote:
> Correct me if I'm wr
There's a lot going on, reading through some docs is probably your best
bet:
http://docs.datastax.com/en/cassandra/2.0/cassandra/dml/dml_config_consistency_c.html
On Wed, Apr 29, 2015 at 8:57 AM Nikolay Tikhonov
wrote:
> Hi,
>
> I try to understand how to Cassandra supports data consistency and
The Cassandra team is pleased to announce the release of Apache Cassandra
version 2.1.5.
Apache Cassandra is a fully distributed database. It is the right choice
when you need scalability and high availability without compromising
performance.
http://cassandra.apache.org/
Downloads of source an
But we're talking about a single tombstone on each of a finite (small) set
of values, right? We're not talking about INSERTs which are 99% nulls (at
least I don't think that's what Matthew was suggesting). Unless you're
engaging in the antipattern of repeated overwrite, I'm still struggling to
se
In a way, yes. A tombstone will only be removed after gc_grace iff the
compaction is sure that it contains all rows which that tombstone might
shadow. When two non-tombstone conflicting rows are compacted, it's always
just LWW.
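To make the LWW (last write wins) rule above concrete: when two live versions of the same cell meet during compaction, the one with the newer write timestamp is kept. A minimal Python sketch, assuming each cell carries an integer write timestamp; the Cell type and reconcile function are hypothetical names, and tombstones and timestamp ties are deliberately not modeled.

```python
from typing import NamedTuple


class Cell(NamedTuple):
    value: str
    timestamp: int  # write timestamp, e.g. microseconds since the epoch


def reconcile(a: Cell, b: Cell) -> Cell:
    """Return the cell with the newer write timestamp (last write wins)."""
    # Equal timestamps are not modeled here; this sketch simply keeps `a`.
    return b if b.timestamp > a.timestamp else a


older = Cell(value="first", timestamp=1_000)
newer = Cell(value="second", timestamp=2_000)
# The argument order does not matter; only the timestamps do.
print(reconcile(older, newer).value)
print(reconcile(newer, older).value)
```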
On Wed, Apr 29, 2015 at 2:42 PM, Eric Stevens wrote:
> But we're tal
On Wed, Apr 29, 2015 at 9:16 AM, Eric Stevens wrote:
> In the end, inserting a tombstone into a non-clustered column shouldn't be
> appreciably worse (if it is at all) than inserting a value instead. Or am
> I missing something here?
>
There are thresholds (log messages, etc.) which operate on to
On Wed, Apr 29, 2015 at 8:56 AM, Nikolay Tikhonov wrote:
> I try to understand how to Cassandra supports data consistency and compare
> it with other distributed caches.
>
For the record, Cassandra is not a distributed cache.
=Rob
On Wed, Apr 29, 2015 at 8:40 AM, Donald Smith <
donald.sm...@audiencescience.com> wrote:
> We deployed a brand new 13 node 2.1.4 C* cluster and used sstableloader to
> stream about 500GB into cassandra. The streaming took less than a day but
> afterwards pending compactions do not decrease. The
On Wed, Apr 29, 2015 at 9:01 AM, Donald Smith <
donald.sm...@audiencescience.com> wrote:
> Secondary indices are inefficient and are deprecated, as far as I know.
>
They are not deprecated; the correct summary is that they should only be
used in very particular circumstances. If you're not sure
Hi All,
We are planning to set up a cluster of 5 nodes with RF 3 for a write-heavy
project. Our current database size is around 500 GB, and it is growing at a
rate of 15 GB every day. We learned that Cassandra consumes space for
compaction processes, so how can we calculate the amount of disk space we
Hi Rahul,
If you are expecting 15 GB of data per day, here is the calculation.
1 day = 15 GB, 1 month = 450 GB, 1 year = 5.4 TB, so your raw data size for
one year is 5.4 TB. With a replication factor of 3, it would be around 16.2 TB
of data for one year.
Taking compaction into consideration and you
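The arithmetic above can be reproduced in a few lines. A minimal Python sketch, assuming 30-day months and decimal units (1 TB = 1000 GB), which is what the figures in the message imply. The function name is hypothetical, the 5-node count comes from the original question, the per-node figure assumes data is spread evenly, and compaction headroom is not modeled here.

```python
def yearly_footprint_gb(daily_gb, replication_factor, nodes,
                        days_per_month=30, months=12):
    """Return (raw, replicated, per_node) data sizes in GB for one year."""
    raw = daily_gb * days_per_month * months   # unreplicated data written
    replicated = raw * replication_factor      # total across all replicas
    per_node = replicated / nodes              # assumes an even spread
    return raw, replicated, per_node


raw, replicated, per_node = yearly_footprint_gb(
    daily_gb=15, replication_factor=3, nodes=5
)
# 5400 GB raw, 16200 GB replicated, 3240.0 GB per node
print(f"{raw} GB raw, {replicated} GB replicated, {per_node} GB per node")
```

These figures cover only one year of new writes; the existing 500 GB and any free space reserved for compaction would come on top.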
26 matches