Re: Performance degradation observed through embedded cassandra server - pointers needed

aaron morton Sun, 02 Oct 2011 15:52:20 -0700

Deleting the data may not be the right approach here if you want a clean 
slate to start the next test. It will leave tombstones around, which may 
reduce your performance if you make a lot of deletes. It may sound pedantic, 
but deleting is different from truncating or dropping.
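
To make the tombstone point concrete, here is a rough toy model in Python. It is not Cassandra code: the class, its methods and the TOMBSTONE marker are all invented for illustration, and it ignores compaction and gc_grace entirely. It only shows that a delete leaves a marker behind which a later range scan still has to step over.

```python
# Toy model only, not Cassandra's implementation or API.
# A delete writes a tombstone marker instead of removing the entry.
TOMBSTONE = object()


class ToyColumnFamily:
    def __init__(self):
        # row key -> value, or TOMBSTONE for a deleted row
        self.cells = {}

    def insert(self, key, value):
        self.cells[key] = value

    def delete(self, key):
        # The entry stays in place as a marker.
        self.cells[key] = TOMBSTONE

    def range_scan(self):
        """Return (live_rows, entries_examined)."""
        live = {k: v for k, v in self.cells.items() if v is not TOMBSTONE}
        return live, len(self.cells)

    def truncate(self):
        # Everything goes, markers included.
        self.cells = {}


cf = ToyColumnFamily()
for i in range(1000):
    cf.insert(i, "value")
for i in range(1000):
    cf.delete(i)

live, examined = cf.range_scan()
print(len(live), examined)  # 0 live rows, but 1000 entries examined
```

In this model the scan after the deletes returns nothing but still walks every tombstone, while a scan after truncate() has nothing to walk.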


Truncate does a few more things, which results in something closer to a 
clean slate 
(https://github.com/apache/cassandra/blob/cassandra-0.8.6/src/java/org/apache/cassandra/db/ColumnFamilyStore.java#L1969)

* flushes CF changes to disk
* discards commit logs
* snapshots existing SSTables
* marks the existing SSTables as compacted so they are no longer used in reads. 

(drop keyspace is not too different)
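
The four truncate steps listed above can be sketched as a toy model in Python. This is an illustration under loud assumptions, not the code behind the link: ToyStore, its fields and its methods are all hypothetical names, and real flushing, commit log handling and snapshotting are far more involved.

```python
# Toy model of the four truncate steps; names are invented for illustration
# and are not Cassandra's API.
class ToyStore:
    def __init__(self):
        self.memtable = {}    # unflushed writes
        self.commit_log = []  # write-ahead entries
        self.sstables = []    # each: {"data": dict, "compacted": bool}
        self.snapshots = []   # copies of SSTable data taken at truncate time

    def write(self, key, value):
        self.commit_log.append((key, value))
        self.memtable[key] = value

    def flush(self):
        # Move the memtable contents into a new "SSTable".
        if self.memtable:
            self.sstables.append(
                {"data": dict(self.memtable), "compacted": False}
            )
            self.memtable = {}

    def read(self, key):
        if key in self.memtable:
            return self.memtable[key]
        # Newest SSTable first; compacted ones are skipped.
        for sstable in reversed(self.sstables):
            if not sstable["compacted"] and key in sstable["data"]:
                return sstable["data"][key]
        return None

    def truncate(self):
        self.flush()             # 1. flush CF changes to disk
        self.commit_log.clear()  # 2. discard commit logs
        live = [s for s in self.sstables if not s["compacted"]]
        # 3. snapshot existing SSTables
        self.snapshots.append([dict(s["data"]) for s in live])
        for s in live:
            s["compacted"] = True  # 4. no longer used in reads


store = ToyStore()
store.write("a", 1)
store.flush()
store.write("b", 2)
store.truncate()
print(store.read("a"), store.read("b"))  # None None
```

Note that in this sketch the old data is still held in the snapshot after truncate; it is just no longer visible to reads.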

If it is a clean slate you want, truncate or drop keyspace will be your friends. 

Cheers


-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 1/10/2011, at 5:56 AM, Roshan Dawrani wrote:

> Hi,
> 
> For our Grails + Cassandra application's clean-DB-for-every-test needs, we 
> finally went back from using costly "truncate" calls to a 
> "range-scans-and-delete" approach, and found such a great difference between 
> the performance of the two approaches that I wrote a small blog post here 
> about it: "Grails, Cassandra: Giving each test a clean DB to work with". For 
> someone in a similar situation, it may present an alternative.
> 
> Cheers.
> 
> On Fri, Sep 23, 2011 at 1:29 PM, Roshan Dawrani <roshandawr...@gmail.com> 
> wrote:
> Thanks for sharing your inputs, Edward. Some comments inline below:
> 
> On Thu, Sep 22, 2011 at 7:31 PM, Edward Capriolo <edlinuxg...@gmail.com> 
> wrote:
> 
> 1) You should try to dig in and determine why the truncate is slower. Look 
> for related jira issues on truncation. 
> 
> I should give it a try. I thought I might get some ready-made pointers from 
> people who already know about the 0.7.2 / 0.8.5 differences on whether our 
> approach of truncating for every test has become even slower due to some 
> changes in that area.
>  
> Cassandra had some re-entrant code; you could fork a JVM for each test and 
> use the CassandraServiceDataCleaner. (However, multiple startups could end up 
> causing more overhead than the truncation.)
> 
> I avoid this problem by using a different column family and/or a different 
> keyspace for all my unit tests in a single class. Each class brings up a new 
> embedded cluster and uses the data cleaner to sanitize the data directories. 
> So essentially I never call truncate.
> 
> In both these approaches, won't I need to rebuild the schema for every test 
> too? Certainly in the 2nd case, if I end up creating a new keyspace or 
> different column families for each test. I am not sure what I will gain there 
> in terms of performance. I was hoping that truncating the data while leaving 
> the schema in place would be faster than that.
> 
> -- 
> Roshan
> Blog: http://roshandawrani.wordpress.com/
> Twitter: @roshandawrani
> Skype: roshandawrani
