This is https://issues.apache.org/jira/browse/CASSANDRA-5677.
-- Sylvain

On Tue, Jul 2, 2013 at 6:04 AM, Mohica Jasha <mohica.ja...@gmail.com> wrote:
> Querying a table with 5000 tombstones takes 3 minutes to complete!
> But querying the same table, with the same data pattern, with 10,000 live
> entries takes a fraction of a second to complete!
>
> Details:
>
> 1. Created the following table:
>
> CREATE KEYSPACE test WITH replication = {'class': 'SimpleStrategy',
> 'replication_factor': '1'};
> use test;
> CREATE TABLE job_index ( stage text, "timestamp" text, PRIMARY KEY
> (stage, "timestamp"));
>
> 2. Inserted 5000 entries into the table:
>
> INSERT INTO job_index (stage, timestamp) VALUES ( 'a', '00000001' );
> INSERT INTO job_index (stage, timestamp) VALUES ( 'a', '00000002' );
> ....
> INSERT INTO job_index (stage, timestamp) VALUES ( 'a', '00004999' );
> INSERT INTO job_index (stage, timestamp) VALUES ( 'a', '00005000' );
>
> 3. Flushed the table:
>
> nodetool flush test job_index
>
> 4. Deleted the 5000 entries:
>
> DELETE FROM job_index WHERE stage = 'a' AND timestamp = '00000001' ;
> DELETE FROM job_index WHERE stage = 'a' AND timestamp = '00000002' ;
> ...
> DELETE FROM job_index WHERE stage = 'a' AND timestamp = '00004999' ;
> DELETE FROM job_index WHERE stage = 'a' AND timestamp = '00005000' ;
>
> 5. Flushed the table:
>
> nodetool flush test job_index
>
> 6. Querying the table takes 3 minutes to complete:
>
> cqlsh:test> SELECT * FROM job_index LIMIT 20000;
>
> Tracing: http://pastebin.com/jH2rZN2X
>
> While the query was executing I saw a lot of GC entries in Cassandra's
> log:
>
> DEBUG [ScheduledTasks:1] 2013-07-01 23:47:59,221 GCInspector.java (line
> 121) GC for ParNew: 30 ms for 6 collections, 263993608 used; max is
> 2093809664
> DEBUG [ScheduledTasks:1] 2013-07-01 23:48:00,222 GCInspector.java (line
> 121) GC for ParNew: 29 ms for 6 collections, 186209616 used; max is
> 2093809664
> DEBUG [ScheduledTasks:1] 2013-07-01 23:48:01,223 GCInspector.java (line
> 121) GC for ParNew: 29 ms for 6 collections, 108731464 used; max is
> 2093809664
>
> It seems that something very inefficient is happening in the management
> of tombstones.
>
> If I start with a clean table and do the following:
>
> 1. Insert 5000 entries
> 2. Flush to disk
> 3. Insert 5000 new entries
> 4. Flush to disk
>
> then querying job_index for all 10,000 entries takes a fraction of a
> second to complete.
>
> Tracing: http://pastebin.com/scUN9JrP
>
> The fact that iterating over 5000 tombstones takes 3 minutes while
> iterating over 10,000 live cells takes a fraction of a second suggests
> that something very inefficient is happening in the management of
> tombstones.
>
> I would appreciate it if a developer could look into this.
>
> -M
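
For anyone wanting to reproduce this end to end, the steps above can be
scripted. This is a minimal sketch, not part of the original report: it
uses the DataStax Python driver (cassandra-driver) and shells out to
nodetool, and it assumes a single local node reachable at 127.0.0.1 with
the native protocol enabled and nodetool on the PATH.

#!/usr/bin/env python
# Hypothetical reproduction of the tombstone slowdown described above.
# Assumptions (not from the original report): pip install cassandra-driver,
# one local node at 127.0.0.1, and `nodetool` on the PATH.
import subprocess

from cassandra.cluster import Cluster

cluster = Cluster(['127.0.0.1'])
session = cluster.connect()

# Step 1: same schema as in the report.
session.execute("CREATE KEYSPACE test WITH replication = "
                "{'class': 'SimpleStrategy', 'replication_factor': '1'}")
session.set_keyspace('test')
session.execute('CREATE TABLE job_index ( stage text, "timestamp" text, '
                'PRIMARY KEY (stage, "timestamp"))')

insert = session.prepare(
    'INSERT INTO job_index (stage, "timestamp") VALUES (?, ?)')
delete = session.prepare(
    'DELETE FROM job_index WHERE stage = ? AND "timestamp" = ?')

# Steps 2 and 3: write 5000 live cells, then flush them to an sstable.
for i in range(1, 5001):
    session.execute(insert, ('a', '%08d' % i))
subprocess.check_call(['nodetool', 'flush', 'test', 'job_index'])

# Steps 4 and 5: delete all 5000 entries (5000 tombstones), flush again.
for i in range(1, 5001):
    session.execute(delete, ('a', '%08d' % i))
subprocess.check_call(['nodetool', 'flush', 'test', 'job_index'])

# Step 6: this SELECT now has to walk over all 5000 tombstones.
rows = list(session.execute('SELECT * FROM job_index LIMIT 20000'))
print('live rows returned: %d' % len(rows))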
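
The pastebin traces above were captured with cqlsh's TRACING ON. With a
recent cassandra-driver the same server-side trace can also be fetched
programmatically; this continues the sketch above, and the exact event
descriptions (including any per-sstable tombstone counts) vary by
Cassandra version.

# Re-run the slow SELECT with tracing enabled and dump the trace events.
result = session.execute('SELECT * FROM job_index LIMIT 20000', trace=True)
trace = result.get_query_trace()
print('total duration: %s' % trace.duration)
for event in trace.events:
    # source_elapsed: time since the query started on the source node
    print(event.source_elapsed, event.description)
cluster.shutdown()

Comparing source_elapsed across consecutive events should show where the
3 minutes go while the tombstones are being scanned.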