Re: Deduplicating data on a node (RF=1)

2014-11-18 Thread Robert Coli
On Mon, Nov 17, 2014 at 12:04 PM, Alain Vandendorpe wrote: > With bootstrapping and initial compactions finished that node now has what > seems to be duplicate data, with almost exactly 2x the expected disk usage. > CQL returns correct results but we depend on the ability to directly read > the S

Re: Force purging of tombstones

2014-11-18 Thread Rahul Neelakantan
Is this page incorrect then and needs to be updated or am I interpreting it incorrectly ? http://www.datastax.com/documentation/cassandra/2.0/cassandra/dml/dml_about_deletes_c.html Particularly this sentence "After data is marked with a tombstone, the data is automatically removed during the

Re: IF NOT EXISTS on UPDATE statements?

2014-11-18 Thread Sylvain Lebresne
On Mon, Nov 17, 2014 at 10:52 PM, Kevin Burton wrote: > There’s still a lot of weirdness in CQL. > > For example, you can do an INSERT with an UPDATE ... which I’m generally > fine with. Kind of makes sense. > > However, with INSERT you can do IF NOT EXISTS. > > … but you can’t do the same thing

Re: Repair completes successfully but data is still inconsistent

2014-11-18 Thread André Cruz
On 18 Nov 2014, at 01:08, Michael Shuler wrote: > > André, does `nodetool gossipinfo` show all the nodes in schema agreement? > Yes: $ nodetool -h XXX.XXX.XXX.XXX gossipinfo |grep -i schema SCHEMA:8ef63726-c845-3565-9851-91c0074a9b5e SCHEMA:8ef63726-c845-3565-9851-91c0074a9b5e SCHEMA:8ef
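The schema-agreement check in the message above can also be scripted. A minimal Python sketch, assuming the `nodetool gossipinfo` output has already been captured as text and that each node reports one `SCHEMA:<uuid>` line, as in the quoted output. The function name `schema_versions`, the sample addresses, and the sample text are illustrative only and do not come from the thread.

```python
import re

def schema_versions(gossipinfo_text):
    """Return the set of distinct schema version IDs found in gossipinfo output."""
    # Assumption: each node contributes one line of the form "SCHEMA:<uuid>".
    return set(re.findall(r"SCHEMA:([0-9a-fA-F-]+)", gossipinfo_text))

# Hypothetical captured output for two nodes (addresses are made up).
sample = """\
/10.0.0.1
  SCHEMA:8ef63726-c845-3565-9851-91c0074a9b5e
/10.0.0.2
  SCHEMA:8ef63726-c845-3565-9851-91c0074a9b5e
"""

versions = schema_versions(sample)
# All nodes agree on the schema when exactly one distinct version is reported.
in_agreement = len(versions) == 1
```

If more than one version comes back, the nodes disagree and the set shows which versions are in play.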

LCS: sstables grow larger

2014-11-18 Thread Andrei Ivanov
Dear all, I have the following problem: - C* 2.0.11 - LCS with default 160MB - Compacted partition maximum bytes: 785939 (for cf/table xxx.xxx) - Compacted partition mean bytes: 6750 (for cf/table xxx.xxx) I would expect the sstables to be of +- maximum 160MB. Despite this I see files like: 192M

Re: LCS: sstables grow larger

2014-11-18 Thread Marcus Eriksson
I suspect they are getting size tiered in L0 - if you have too many sstables in L0, we will do size tiered compaction on sstables in L0 to improve performance Use tools/bin/sstablemetadata to get the level for those sstables, if they are in L0, that is probably the reason. /Marcus On Tue, Nov 18

Re: LCS: sstables grow larger

2014-11-18 Thread Andrei Ivanov
Marcus, thanks a lot! It explains a lot: those huge tables are indeed at L0. It seems that they start to appear as a result of some "massive" operations (join, repair, rebuild). What's their fate in the future? Will they continue to propagate like this through levels? Is there anything that can be

Re: LCS: sstables grow larger

2014-11-18 Thread Marcus Eriksson
No, they will get compacted into smaller sstables in L1+ eventually (once you have less than 32 sstables in L0, an ordinary L0 -> L1 compaction will happen) But, if you consistently get many files in L0 it means that compaction is not keeping up with your inserts and you should probably expand you
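The rule described in this reply can be sketched as a small decision function. This is only an illustration of the behaviour as stated in the thread, not Cassandra's actual code: the name `l0_compaction_mode` is hypothetical, the figure 32 is taken from the message, and the behaviour at exactly 32 sstables is not stated in the thread and is a guess here.

```python
# Threshold quoted in the message above; not read from the Cassandra source.
L0_STCS_THRESHOLD = 32

def l0_compaction_mode(l0_sstable_count, threshold=L0_STCS_THRESHOLD):
    """Describe which kind of compaction L0 would get, per the thread."""
    # "Once you have less than 32 sstables in L0, an ordinary
    # L0 -> L1 compaction will happen."
    if l0_sstable_count < threshold:
        return "L0 -> L1"
    # With too many sstables in L0, size-tiered compaction runs inside L0.
    # Treating exactly `threshold` as "too many" is an assumption.
    return "STCS in L0"

after_bootstrap = l0_compaction_mode(100)  # many streamed files land in L0
steady_state = l0_compaction_mode(10)      # compaction is keeping up
```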

Re: LCS: sstables grow larger

2014-11-18 Thread Andrei Ivanov
OK, got it. Actually, my problem is not that we constantly have many files at L0. Normally, quite a few of them - that is, nodes are managing to compact incoming writes in a timely manner. But it looks like when we join a new node, it receives tons of files from existing nodes (and they end up

Counter Deletion in a window

2014-11-18 Thread Ahmy Yulrizka
Hi I'm very new with cassandra and I've created a schema like this CREATE TABLE statistics.stats_count ( id text, metric text, resolution text, time timestamp, value counter, PRIMARY KEY ((id, metric, resolution), time) ) Below is an example of data. This data basically j

Cassandra backup via snapshots in production

2014-11-18 Thread Ngoc Minh VO
Hello all, We are looking for a solution to backup data in our C* cluster (v2.0.x, 16 nodes, 4 x 500GB SSD, RF = 6 over 2 datacenters). The main purpose is to protect us from human errors (eg. unexpected manipulations: delete, drop tables, …). We are thinking of: - Backup: add a 2TB H

mysql based columnar DB to Cassandra DB - Migration

2014-11-18 Thread Akshay Ballarpure
I have a MySQL-based columnar DB that I want to migrate to Cassandra. How is this possible? Best Regards Akshay Ballarpure Tata Consultancy Services Cell:- 9985084075 Mailto: akshay.ballarp...@tcs.com Website: http://www.tcs.com Experience

Re: IF NOT EXISTS on UPDATE statements?

2014-11-18 Thread Kevin Burton
> There is no way to mimic IF NOT EXISTS on UPDATE and it's not a bug. INSERT and UPDATE are not totally orthogonal in CQL and you should use INSERT for actual insertion and UPDATE for updates (granted, the database will not reject our query if you break this rule but it's nonetheless the way it's

Re: IF NOT EXISTS on UPDATE statements?

2014-11-18 Thread Robert Stupp
> > There is no way to mimic IF NOT EXISTS on UPDATE and it's not a bug. INSERT > > and UPDATE are not totally orthogonal > in CQL and you should use INSERT for actual insertion and UPDATE for updates > (granted, the database will not reject > our query if you break this rule but it's nonetheles

Re: LCS: sstables grow larger

2014-11-18 Thread Marcus Eriksson
you should stick to as small nodes as possible yes :) There are a few relevant tickets related to bootstrap and LCS: https://issues.apache.org/jira/browse/CASSANDRA-6621 - startup with -Dcassandra.disable_stcs_in_l0=true to not do STCS in L0 https://issues.apache.org/jira/browse/CASSANDRA-7460 - (

Re: Deduplicating data on a node (RF=1)

2014-11-18 Thread Alain Vandendorpe
Thanks all - a little clarification: - The node has fully joined at this point with the duplicates - Cleanup has been run on older nodes - Currently using LCS Rob - thanks for that, I was wondering whether either of those would successfully deduplicate the data. We were hypothesizing that a decom

Re: IF NOT EXISTS on UPDATE statements?

2014-11-18 Thread Brian O'Neill
FWIW, we have the exact same need. And we have been struggling with the differences in CQL between UPDATE and INSERT. Our use case: We do in-memory dimensional aggregations that we want to write to C* using LWT. (so, it's a low-volume of writes, because we are doing aggregations across time w

Re: LCS: sstables grow larger

2014-11-18 Thread Andrei Ivanov
Thanks a lot for your support, Marcus - that is useful beyond all recognition!;-) And I will try #6621 right away. Sincerely, Andrei. On Tue, Nov 18, 2014 at 8:50 PM, Marcus Eriksson wrote: > you should stick to as small nodes as possible yes :) > > There are a few relevant tickets related to bo

Re: LCS: sstables grow larger

2014-11-18 Thread Andrei Ivanov
Amazing how I missed the -Dcassandra.disable_stcs_in_l0=true option - I have LeveledManifest source opened the whole day;-) On Tue, Nov 18, 2014 at 9:15 PM, Andrei Ivanov wrote: > Thanks a lot for your support, Marcus - that is useful beyond all > recognition!;-) And I will try #6621 right away.

Re: IF NOT EXISTS on UPDATE statements?

2014-11-18 Thread Robert Stupp
> > For (2), we would love to see: > UPSERT value=new_value where (not exists || value=read_value) > That would be something like "UPDATE … IF column=value OR NOT EXISTS". Took a look at the C* source and that feels like a LHF (for 3.0) so I opened https://issues.apache.org/jira/browse/CASSANDRA-8335

Re: IF NOT EXISTS on UPDATE statements?

2014-11-18 Thread Brian O'Neill
Exactly. Perfect. Will do. Thanks Robert. -brian --- Brian O'Neill Chief Technology Officer Health Market Science The Science of Better Results 2700 Horizon Drive • King of Prussia, PA • 19406 M: 215.588.6024 • @boneill42 • healthmarketscience.com This i

Re: Trying to build Cassandra for FreeBSD 10.1

2014-11-18 Thread Michael Shuler
William and Graham - I appreciate the notes! Would both of you be so kind as to comment/attach/etc on that jira? I'm a bit out of my element on this particular topic, so this would be super helpful to get your insight :) https://issues.apache.org/jira/browse/CASSANDRA-8325 -- Michael On 11/

Re: Repair completes successfully but data is still inconsistent

2014-11-18 Thread Michael Shuler
`nodetool cleanup` also looks interesting as an option. -- Michael

Re: Trying to build Cassandra for FreeBSD 10.1

2014-11-18 Thread William Arbaugh
Happy to do so - but the ticket indicates that FreeBSD is unsupported and thus this is unlikely to get fixed. > On Nov 18, 2014, at 3:45 PM, Michael Shuler wrote: > > William and Graham - I appreciate the notes! > > Would both of you be so kind as to comment/attach/etc on that jira? I'm a bit

Re: Counter Deletion in a window

2014-11-18 Thread Tyler Hobbs
On Tue, Nov 18, 2014 at 8:38 AM, Ahmy Yulrizka wrote: > > > 1. Is this a proper use of a counter > It seems reasonable. > 2. would the delete operation has impact on performance ? > Depending on how you query the data, no. If you restrict the query to not cover times where you have deleted t

Re: read repair across DC and latency

2014-11-18 Thread Tyler Hobbs
On Sun, Nov 16, 2014 at 5:13 PM, Jimmy Lin wrote: > I have read that read repair suppose to be running as background, but > does the co-ordinator node need to wait for the response(along with other > normal read tasks) before return the entire result back to the caller? > For the 10% of request

Re: Working with legacy data via CQL

2014-11-18 Thread Tyler Hobbs
Thanks, I can reproduce the issue with that, and I should be able to look into it tomorrow. FWIW, I believe the issue is server-side, not in the driver. I may be able to suggest a workaround once I figure out what's going on. On Mon, Nov 17, 2014 at 6:02 AM, Erik Forsberg wrote: > On 2014-11-1

Re: Working with legacy data via CQL

2014-11-18 Thread Robert Coli
On Tue, Nov 18, 2014 at 4:02 PM, Tyler Hobbs wrote: > Thanks, I can reproduce the issue with that, and I should be able to look > into it tomorrow. FWIW, I believe the issue is server-side, not in the > driver. I may be able to suggest a workaround once I figure out what's > going on. > Is the

Re: Repair completes successfully but data is still inconsistent

2014-11-18 Thread Robert Coli
On Tue, Nov 18, 2014 at 12:46 PM, Michael Shuler wrote: > `nodetool cleanup` also looks interesting as an option. I don't understand why cleanup or scrub would help with a case where data is being un-tombstoned. " 1 November - column is deleted - gc_grace_period is 10 days 8 November - all 3 r
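The dates in the quoted timeline can be worked through with a few lines of Python. This is a sketch under stated assumptions: the year 2014 is inferred from the thread date, the function name `tombstone_purgeable_from` is hypothetical, and the only rule modelled is that a tombstone becomes eligible for removal by compaction once the grace period has elapsed.

```python
from datetime import date, timedelta

def tombstone_purgeable_from(deleted_on, gc_grace_days):
    """Earliest date on which compaction may drop the tombstone."""
    # The tombstone must be kept for the whole grace period so that
    # repair has time to propagate the delete to every replica.
    return deleted_on + timedelta(days=gc_grace_days)

deleted_on = date(2014, 11, 1)  # "1 November - column is deleted"; year assumed
purgeable_from = tombstone_purgeable_from(deleted_on, 10)  # 10-day grace period
```

With these inputs `purgeable_from` is 11 November 2014, so an event on 8 November still falls inside the grace period.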

Removing commit log files

2014-11-18 Thread Jacob Rhoden
Hi Guys, Is it correct to assume that if you do a “nodetool drain” on a node and then shutdown a node, you can safely remove all commit logs on that node as long as all nodes are up? I have some VPS’s with low amounts of disk space that could do with it being recovered, I also assume this mea