RE: nodetool cleanup - compaction remaining time

2018-09-07 Thread Steinmaurer, Thomas
I have created https://issues.apache.org/jira/browse/CASSANDRA-14701 - please adapt as needed. Thanks! Thomas

RE: nodetool cleanup - compaction remaining time

2018-09-06 Thread Steinmaurer, Thomas
Alain, compaction throughput is set to 32. Regards, Thomas

Re: nodetool cleanup - compaction remaining time

2018-09-06 Thread Alain RODRIGUEZ
> As far as I can remember, if you have unthrottled compaction, then the message is different: it says "n/a". Ah right! I am now completely convinced this needs a JIRA as well (indeed, if it's not fixed in C* 3+, as Jeff mentioned). Thanks for the feedback, Alex. On Thu, 6 Sep 2018 at 11:06,

Re: nodetool cleanup - compaction remaining time

2018-09-06 Thread Oleksandr Shulgin
On Thu, Sep 6, 2018 at 11:50 AM Alain RODRIGUEZ wrote: > Be aware that this behavior happens when the compaction throughput is set > to 0 (unthrottled/unlimited). I believe the estimate uses the speed > limit for calculation (which is often very much wrong anyway). As far as I can remember

Re: nodetool cleanup - compaction remaining time

2018-09-06 Thread Alain RODRIGUEZ
Hello Thomas. Be aware that this behavior happens when the compaction throughput is set to 0 (unthrottled/unlimited). I believe the estimate uses the speed limit for calculation (which is often very much wrong anyway). I just meant to say, you might want to make sure that it's due to cleanup ty
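Alain's point can be checked directly: the throttle he refers to is the live compaction throughput setting, which `nodetool` can read and change at runtime. A quick sketch (the value 32 matches what Thomas reports; adjust to your hardware):

```shell
# Show the current compaction throughput cap in MB/s.
# 0 means unthrottled - in that case the "remaining time" estimate
# has no rate to divide by and is misleading.
nodetool getcompactionthroughput

# Set a concrete cap (32 MB/s here) so the estimate is at least
# derived from a real rate.
nodetool setcompactionthroughput 32
```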

Re: nodetool cleanup - compaction remaining time

2018-09-05 Thread Jeff Jirsa
Probably worth a JIRA (especially if you can repro in 3.0 or higher, since 2.1 is critical fixes only) On Wed, Sep 5, 2018 at 10:46 PM Steinmaurer, Thomas < thomas.steinmau...@dynatrace.com> wrote: > Hello, > > > > is it a known issue / limitation that cleanup compactions aren’t counted > in the

Re: nodetool cleanup in parallel

2017-09-26 Thread kurt greaves
correct. you can run it in parallel across many nodes if you have capacity. generally see about a 10% CPU increase from cleanups which isn't a big deal if you have the capacity to handle it + the io. on that note on later versions you can specify -j to run multiple cleanup compactions at the same
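The suggestions above could be sketched as follows (the `-j` flag only exists on later nodetool versions, as the post says; host names below are placeholders):

```shell
# Run up to 2 cleanup compactions at once on this node (later versions only).
nodetool cleanup -j 2

# Run cleanup on several nodes of the same rack in parallel, if you have
# the CPU and I/O headroom to spare. Host names are placeholders.
for host in rack1-node1 rack1-node2 rack1-node3; do
  nodetool -h "$host" cleanup &
done
wait
```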

Re: Nodetool cleanup doesn't work

2017-05-11 Thread Jai Bheemsen Rao Dhanwada
ok thank you On Thu, May 11, 2017 at 1:11 PM, Jeff Jirsa wrote: > No, it's not expected, but it's pretty obvious from reading the code > what'll happen. Opened https://issues.apache.org/jira/browse/CASSANDRA-13526 > > On Thu, May 11, 2017 at 12:53 PM, Jai Bheemsen Rao Dhanwada < > jai

Re: Nodetool cleanup doesn't work

2017-05-11 Thread Jeff Jirsa
No, it's not expected, but it's pretty obvious from reading the code what'll happen. Opened https://issues.apache.org/jira/browse/CASSANDRA-13526 On Thu, May 11, 2017 at 12:53 PM, Jai Bheemsen Rao Dhanwada < jaibheem...@gmail.com> wrote: > Yes I have many keyspaces which are not spread across

Re: Nodetool cleanup doesn't work

2017-05-11 Thread Jai Bheemsen Rao Dhanwada
Yes, I have many keyspaces which are not spread across all the data centers (expected by design). In this case, is it the expected behavior that cleanup will not work for all the keyspaces (nodetool cleanup)? Is it going to be fixed in the latest versions? P.S: Thanks for the tip, I can work around this

Re: Nodetool cleanup doesn't work

2017-05-11 Thread Jeff Jirsa
If you didn't explicitly remove a keyspace from one of your datacenters, the next most likely cause is that you have one keyspace that's NOT replicated to one of the datacenters. You can work around this by running 'nodetool cleanup <keyspace>' on all of your other keyspaces individually, skipping the one th
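Jeff's workaround could be scripted roughly like this (listing keyspaces via `cqlsh` and the keyspace name to skip are assumptions; adapt both to your cluster):

```shell
# Clean up every keyspace except the one not replicated to this DC.
SKIP="ks_not_in_this_dc"   # placeholder: the keyspace missing from this DC

for ks in $(cqlsh -e 'DESCRIBE KEYSPACES'); do
  [ "$ks" = "$SKIP" ] && continue
  nodetool cleanup "$ks"
done
```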

Re: Nodetool cleanup doesn't work

2017-05-11 Thread Jai Bheemsen Rao Dhanwada
Thanks Jeff, I have a C* cluster spread across multiple datacenters. Reason for cleanup: I added multiple nodes to the cluster and need to run cleanup on the old nodes so that the redundant data is cleaned up. On Thu, May 11, 2017 at 11:08 AM, Jeff Jirsa wrote: > > > On 2017-05-10 22:44 (-0700), Jai Bh

Re: Nodetool cleanup doesn't work

2017-05-11 Thread Jeff Jirsa
On 2017-05-10 22:44 (-0700), Jai Bheemsen Rao Dhanwada wrote: > Hello, > > I am running into an issue where *nodetool cleanup *fails to cleanup data. > We are running 2.1.16 version of Cassandra. > > > [user@host ~]$ nodetool cleanup > Aborted cleaning up atleast one column family in keyspa

Re: Nodetool cleanup error - cannot run before a node has joined the ring

2017-02-10 Thread Simone Franzini
Thank you Michael. Well, this was apparently my bad. 1. nodetool connects to the local JMX port 7199, which is indeed running on localhost in my case. 2. I did a few more attempts; the message "Aborted cleaning up atleast one column family in keyspace" only appears in the DC where the keyspace is not repli

Re: Nodetool cleanup error - cannot run before a node has joined the ring

2017-02-10 Thread Michael Shuler
By default, yes, nodetool connects to localhost, which your log entries show. Use `nodetool -h $PRIV_IP cleanup ...` to connect to that private IP it's listening on. `nodetool help cleanup` for all options. -- Kind regards, Michael On 02/10/2017 02:22 PM, Simone Franzini wrote: > I am running DS
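For example (the address below is a placeholder for the node's private IP):

```shell
PRIV_IP=10.0.1.12                      # placeholder: the node's private address
nodetool -h "$PRIV_IP" -p 7199 cleanup # 7199 is the default JMX port
```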

Re: Re : Nodetool Cleanup on multiple nodes in parallel

2015-10-09 Thread sai krishnam raju potturi
thanks Jonathan. I see an advantage in doing it one AZ or rack at a time. On Thu, Oct 8, 2015 at 6:41 PM, Jonathan Haddad wrote: > My hunch is the bigger your cluster the less impact it will have, as each > node takes part in smaller and smaller % of total queries. Considering > that compaction

Re: Re : Nodetool Cleanup on multiple nodes in parallel

2015-10-08 Thread Jonathan Haddad
My hunch is the bigger your cluster the less impact it will have, as each node takes part in smaller and smaller % of total queries. Considering that compaction is always happening, I'd wager if you've got a big cluster (as you say you do) you'll probably be ok running several cleanups at a time.

Re: Re : Nodetool Cleanup on multiple nodes in parallel

2015-10-08 Thread sai krishnam raju potturi
We plan to do it during non-peak hours when customer traffic is less. That sums up to 10 nodes a day, which is concerning as we have other data centers to be expanded eventually. Since cleanup is similar to compaction, which is CPU intensive and will affect reads if this data center were to serve

Re: Re : Nodetool Cleanup on multiple nodes in parallel

2015-10-08 Thread Jonathan Haddad
Unless you're close to running out of disk space, what's the harm in it taking a while? How big is your DC? At 45 min per node, you can do 32 nodes a day. Diverting traffic away from a DC just to run cleanup feels like overkill to me. On Thu, Oct 8, 2015 at 2:39 PM sai krishnam raju potturi <

Re : Nodetool Cleanup on multiple nodes in parallel

2015-10-08 Thread sai krishnam raju potturi
hi; our cassandra cluster currently uses DSE 4.6. The underlying cassandra version is 2.0.14. We are planning on adding multiple nodes to one of our datacenters. This requires "nodetool cleanup". The "nodetool cleanup" operation takes around 45 mins for each node. Datastax documentation recomm

Re: Nodetool cleanup takes long time and no progress

2015-07-24 Thread Robert Coli
On Fri, Jul 24, 2015 at 5:03 PM, rock zhang wrote: > It already 2 hours, only progress is 6%, seems it is very slow. Is there > any way to speedup ? Cleanup is a type of compaction; it obeys the compaction throttle. > If I interrupted the process, what gonna happen ? Next time it just > co
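Since cleanup obeys the compaction throttle, one way to speed it up is to raise (or temporarily remove) the limit while it runs; the numbers below are examples only, not recommendations:

```shell
nodetool getcompactionthroughput      # note the current cap first
nodetool setcompactionthroughput 0    # 0 = unthrottled while cleanup runs
# ... wait for the cleanup to finish, then restore the previous cap:
nodetool setcompactionthroughput 16
```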

Re: Nodetool cleanup takes long time and no progress

2015-07-24 Thread rock zhang
Hi Jeff, It has already been 2 hours and progress is only 6%; it seems very slow. Is there any way to speed it up? If I interrupt the process, what will happen? Next time it will just compact again, right? I think by default the compaction occurs every day. Thanks Rock On Jul 24, 2015,

Re: Nodetool cleanup takes long time and no progress

2015-07-24 Thread rock zhang
Thank you Jeff. I just added one more node, so I want to delete moved tokens.

ubuntu@ip-172-31-30-145:~$ nodetool compactionstats
pending tasks: 1413
   compaction type   keyspace   table      completed     total   unit   progress
           Cleanup    rawdata   raw_data   25817918778   5

Re: Nodetool cleanup takes long time and no progress

2015-07-24 Thread Jeff Jirsa
You can check for progress using `nodetool compactionstats` (which will show Cleanup tasks), or check for ‘Cleaned up’ messages in the log (/var/log/cassandra/system.log). However, `nodetool cleanup` has a very specific and limited task - it deletes data no longer owned by the node, typically a
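Jeff's two checks side by side (the log path is the one given in the post; it may differ per install):

```shell
# Live progress: Cleanup tasks appear with completed/total byte counts.
nodetool compactionstats

# Historical record: one 'Cleaned up' line per finished SSTable.
grep 'Cleaned up' /var/log/cassandra/system.log | tail
```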

Re: nodetool cleanup error

2015-03-31 Thread Marcus Eriksson
It should work on 2.0.13. If it fails with that assertion, you should just retry. If that does not work, and you can reproduce this, please file a ticket /Marcus On Tue, Mar 31, 2015 at 9:33 AM, Amlan Roy wrote: > Hi, > > Thanks for the reply. Since nodetool cleanup is not working even after >

Re: nodetool cleanup error

2015-03-31 Thread Amlan Roy
Hi, Thanks for the reply. Since nodetool cleanup is not working even after upgrading to 2.0.13, is it recommended to go back to an older version (2.0.11, for example; it did not work with 2.0.12 either)? Is there any other way of cleaning data from existing nodes after adding a new node? Regards, Amla

Re: nodetool cleanup error

2015-03-30 Thread Yuki Morishita
Looks like the issue is https://issues.apache.org/jira/browse/CASSANDRA-9070. On Mon, Mar 30, 2015 at 6:25 PM, Robert Coli wrote: > On Mon, Mar 30, 2015 at 4:21 PM, Amlan Roy wrote: >> >> Thanks for the reply. I have upgraded to 2.0.13. Now I get the following >> error. > > > If cleanup is still

Re: nodetool cleanup error

2015-03-30 Thread Robert Coli
On Mon, Mar 30, 2015 at 4:21 PM, Amlan Roy wrote: > Thanks for the reply. I have upgraded to 2.0.13. Now I get the following > error. > If cleanup is still excepting for you on 2.0.13 with some sstables you have, I would strongly consider : 1) file a JIRA (http://issues.apache.org) and attach /

Re: nodetool cleanup error

2015-03-30 Thread Amlan Roy
Hi, Thanks for the reply. I have upgraded to 2.0.13. Now I get the following error. Regards, Amlan Exception in thread "main" java.lang.AssertionError: [SSTableReader(path='/data/1/cassandra/data/xxx/xxx/xxx.db'), SSTableReader(path='/data/1/cassandra/data/xxx/xxx/xxx.db')] at org.apa

Re: nodetool cleanup error

2015-03-30 Thread Jeff Ferland
Code problem that was patched in https://issues.apache.org/jira/browse/CASSANDRA-8716. Upgrade to 2.0.13 > On Mar 30, 2015, at 1:12 PM, Amlan Roy wrote: > > Hi, > > I have added new nodes to an existing cluster and ran the “nodetool clea

Re: nodetool cleanup error

2015-03-30 Thread Duncan Sands
Hi Amlan, On 30/03/15 22:12, Amlan Roy wrote: Hi, I have added new nodes to an existing cluster and ran the “nodetool cleanup”. I am getting the following error. Wanted to know if there is any solution to it. Regards, Amlan Error occurred during cleanup java.util.concurrent.ExecutionException

Re: Nodetool cleanup on vnode cluster removes more data then wanted

2014-01-30 Thread Sylvain Lebresne
On Thu, Jan 30, 2014 at 3:23 AM, Edward Capriolo wrote: > Is this only a ByteOrderPartitioner problem? > No, see the comments on https://issues.apache.org/jira/browse/CASSANDRA-6638 for more details. -- Sylvain > > > On Wed, Jan 29, 2014 at 7:34 PM, Tyler Hobbs wrote: > >> Ignace, >> >> Thanks

Re: Nodetool cleanup on vnode cluster removes more data then wanted

2014-01-29 Thread Edward Capriolo
Is this only a ByteOrderPartitioner problem? On Wed, Jan 29, 2014 at 7:34 PM, Tyler Hobbs wrote: > Ignace, > > Thanks for reporting this. I've been able to reproduce the issue with a > unit test, so I opened > https://issues.apache.org/jira/browse/CASSANDRA-6638. I'm not 100% sure > if your f

Re: Nodetool cleanup on vnode cluster removes more data then wanted

2014-01-29 Thread Tyler Hobbs
Ignace, Thanks for reporting this. I've been able to reproduce the issue with a unit test, so I opened https://issues.apache.org/jira/browse/CASSANDRA-6638. I'm not 100% sure if your fix is the correct one, but I should be able to get it fixed quickly and figure out the full set of cases where a

Re: nodetool cleanup / TTL

2014-01-09 Thread Aaron Morton
> Is there some other mechanism for forcing expired data to be removed without > also compacting? (major compaction having obvious problematic side effects, > and user defined compaction being significant work to script up). Tombstone compactions may help here https://issues.apache.org/jira/brow

Re: nodetool cleanup / TTL

2014-01-08 Thread Sylvain Lebresne
> Is there some other mechanism for forcing expired data to be removed without also compacting? (major compaction having obvious problematic side effects, and user defined compaction being significant work to script up). Online scrubs will, as a side effect, purge expired tombstones *w

Re: nodetool cleanup / TTL

2014-01-07 Thread Chris Burroughs
On 01/07/2014 01:38 PM, Tyler Hobbs wrote: On Tue, Jan 7, 2014 at 7:49 AM, Chris Burroughs wrote: This has not reached a consensus in #cassandra in the past. Does `nodetool cleanup` also remove data that has expired from a TTL? No, cleanup only removes rows that the node is not a replica fo

Re: nodetool cleanup / TTL

2014-01-07 Thread Tyler Hobbs
On Tue, Jan 7, 2014 at 7:49 AM, Chris Burroughs wrote: > This has not reached a consensus in #cassandra in the past. Does > `nodetool cleanup` also remove data that has expired from a TTL? No, cleanup only removes rows that the node is not a replica for. -- Tyler Hobbs DataStax

Re: Nodetool cleanup

2013-11-29 Thread Julien Campan
Thanks a lot for your answers. 2013/11/29 John Sanda > Couldn't another reason for doing cleanup sequentially be to avoid data > loss? If data is being streamed from a node during bootstrap and cleanup is > run too soon, couldn't you wind up in a situation with data loss if the new > node be

Re: Nodetool cleanup

2013-11-28 Thread John Sanda
Couldn't another reason for doing cleanup sequentially be to avoid data loss? If data is being streamed from a node during bootstrap and cleanup is run too soon, couldn't you wind up in a situation with data loss if the new node being bootstrapped goes down (permanently)? On Thu, Nov 28, 2013 at

Re: Nodetool cleanup

2013-11-28 Thread Aaron Morton
> I hope I get this right :) Thanks for contributing :) > a repair will trigger a major compaction on your node which will take up a > lot of CPU and IO performance. It needs to do this to build up the data > structure that is used for the repair. After the compaction this is streamed > to the

Re: Nodetool cleanup

2013-11-25 Thread Artur Kronenberg
Hi Julien, I hope I get this right :) a repair will trigger a major compaction on your node which will take up a lot of CPU and IO performance. It needs to do this to build up the data structure that is used for the repair. After the compaction this is streamed to the different nodes in order

Re: nodetool cleanup

2012-10-23 Thread B. Todd Burruss
since SSTABLEs are immutable, it must create new SSTABLEs without the data that the node is no longer a replica for ... but it doesn't remove deleted data. seems like a possible optimization to also remove deleted data and tombstones during cleanup ... but i guess cleanup shouldn't really be used that mu

Re: nodetool cleanup

2012-10-23 Thread aaron morton
> what is the internal memory model used? It sounds like it doesn't have a page > manager? Nodetool cleanup is a maintenance process to remove data on disk that the node is no longer a replica for. It is typically used after the token assignments have been changed. Cheers - Aa

Re: nodetool cleanup

2012-10-22 Thread Will @ SOHO
On 10/23/2012 01:25 AM, Peter Schuller wrote: On Oct 22, 2012 11:54 AM, "B. Todd Burruss" wrote: > > does "nodetool cleanup" perform a major compaction in the process of > removing unwanted data? No. what is the internal memory model used? It sounds like it doesn't

Re: nodetool cleanup

2012-10-22 Thread Peter Schuller
On Oct 22, 2012 11:54 AM, "B. Todd Burruss" wrote: > > does "nodetool cleanup" perform a major compaction in the process of > removing unwanted data? No.

Re: Re: nodetool cleanup - results in more disk use?

2011-04-05 Thread jonathan . colby
I think the key thing to remember is that compaction is performed on *similar* sized sstables. so it makes sense that over time this will have a cascading effect. I think by default it starts out with compacting 4 flushed sstables, then the cycle begins. On Apr 4, 2011 3:42pm, shimi wrote:

Re: nodetool cleanup - results in more disk use?

2011-04-04 Thread shimi
The bigger the file, the longer it will take for it to be part of a compaction again. Compacting a bucket of large files takes longer than compacting a bucket of small files. Shimi On Mon, Apr 4, 2011 at 3:58 PM, aaron morton wrote: > mmm, interesting. My theory was > > t0 - major compaction runs,

Re: nodetool cleanup - results in more disk use?

2011-04-04 Thread aaron morton
mmm, interesting. My theory was t0 - major compaction runs, there is now one sstable t1 - x new sstables have been created t2 - minor compaction runs and determines there are two buckets, one with the x new sstables and one with the single big file. The bucket of many files is compacted int

Re: nodetool cleanup - results in more disk use?

2011-04-04 Thread Jonathan Colby
hi Aaron - The Datastax documentation brought to light the fact that over time, major compactions will be performed on bigger and bigger SSTables. They actually recommend against performing too many major compactions. Which is why I am wary to trigger too many major compactions ... http://

Re: nodetool cleanup - results in more disk use?

2011-04-04 Thread aaron morton
cleanup reads each SSTable on disk and writes a new file that contains the same data with the exception of rows that are no longer in a token range the node is a replica for. It's not compacting the files into fewer files or purging tombstones. But it is re-writing all the data for the CF. Par

Re: nodetool cleanup - results in more disk use?

2011-04-01 Thread Jonathan Colby
I discovered that a garbage collection cleans up the unused old SSTables. But I still wonder whether cleanup really does a full compaction. This would be undesirable if so. On Apr 1, 2011, at 4:08 PM, Jonathan Colby wrote: > I ran node cleanup on a node in my cluster and discovered the disk

Re: nodetool cleanup isn't cleaning up?

2010-06-02 Thread Ran Tavory
getRangeToEndpointMap is very useful, thanks, I didn't know about it... however, I've reconfigured my cluster since (moved some nodes and tokens), so now the problem is gone. I guess I'll use getRangeToEndpointMap next time I see something like this... On Thu, Jun 3, 2010 at 9:15 AM, Jonathan Ellis

Re: nodetool cleanup isn't cleaning up?

2010-06-02 Thread Jonathan Ellis
Then the next step is to check StorageService.getRangeToEndpointMap via jmx On Tue, Jun 1, 2010 at 11:56 AM, Ran Tavory wrote: > I'm using RackAwareStrategy. But it still doesn't make sense I think... > let's see what did I miss... > According to http://wiki.apache.org/cassandra/Operations > > Ra
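The attribute can be read with any generic JMX client pointed at StorageService; on later Cassandra versions, `nodetool describering` exposes similar token-range-to-endpoint information from the command line (the keyspace name below is a placeholder):

```shell
# Shows each token range and which endpoints hold replicas for it.
nodetool describering my_keyspace
```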

Re: nodetool cleanup isn't cleaning up?

2010-06-01 Thread Ran Tavory
I'm using RackAwareStrategy. But it still doesn't make sense, I think... let's see what did I miss... According to http://wiki.apache.org/cassandra/Operations - RackAwareStrategy: replica 2 is placed in the first node along the ring that belongs in *another* data center than the first; th

Re: nodetool cleanup isn't cleaning up?

2010-06-01 Thread Jonathan Ellis
I'm saying that .99 is getting a copy of all the data for which .124 is the primary. (If you are using RackUnawarePartitioner. If you are using RackAware it is some other node.) On Tue, Jun 1, 2010 at 1:25 AM, Ran Tavory wrote: > ok, let me try and translate your answer ;) > Are you saying that

Re: nodetool cleanup isn't cleaning up?

2010-05-31 Thread Ran Tavory
ok, let me try and translate your answer ;) Are you saying that the data that was left on the node is non-primary-replicas of rows from the time before the move? So this implies that when a node moves in the ring, it will affect distribution of: - new keys - old keys primary node -- but will not a

Re: nodetool cleanup isn't cleaning up?

2010-05-31 Thread Jonathan Ellis
well, there you are then. On Mon, May 31, 2010 at 2:34 PM, Ran Tavory wrote: > yes, replication factor = 2 > > On Mon, May 31, 2010 at 10:07 PM, Jonathan Ellis wrote: >> >> you have replication factor > 1 ? >> >> On Mon, May 31, 2010 at 7:23 AM, Ran Tavory wrote: >> > I hope I understand nodeto

Re: nodetool cleanup isn't cleaning up?

2010-05-31 Thread Ran Tavory
yes, replication factor = 2 On Mon, May 31, 2010 at 10:07 PM, Jonathan Ellis wrote: > you have replication factor > 1 ? > > On Mon, May 31, 2010 at 7:23 AM, Ran Tavory wrote: > > I hope I understand nodetool cleanup correctly - it should clean up all > data > > that does not (currently) belong

Re: nodetool cleanup isn't cleaning up?

2010-05-31 Thread Jonathan Ellis
you have replication factor > 1 ? On Mon, May 31, 2010 at 7:23 AM, Ran Tavory wrote: > I hope I understand nodetool cleanup correctly - it should clean up all data > that does not (currently) belong to this node. If so, I think it might not > be working correctly. > Look at nodes 192.168.252.124

Re: nodetool cleanup isn't cleaning up?

2010-05-31 Thread Maxim Kramarenko
Hello! I think (but am not sure, please correct me if required) that after you change the token, nodes just receive the new data but don't immediately delete the old data. It seems like "cleanup" will mark it as tombstones and it will be deleted when you run "compact" after GCGraceSeconds seconds. On 31.05.

Re: nodetool cleanup isn't cleaning up?

2010-05-31 Thread Ran Tavory
Do you think it's the tombstones that take up the disk space? Shouldn't the tombstones be moved along with the data? On Mon, May 31, 2010 at 3:29 PM, Maxim Kramarenko wrote: > Hello! > > You likely need wait for GCGraceSeconds seconds or modify this param. > > http://spyced.blogspot.com/2010/02/d

Re: nodetool cleanup isn't cleaning up?

2010-05-31 Thread Maxim Kramarenko
Hello! You likely need wait for GCGraceSeconds seconds or modify this param. http://spyced.blogspot.com/2010/02/distributed-deletes-in-cassandra.html === Thus, a delete operation can't just wipe out all traces of the data being removed immediately: if we did, and a replica did not receive the