You’re probably hitting https://issues.apache.org/jira/browse/CASSANDRA-8940 (“Inconsistent select count and select distinct”).

It’s resolved (as I understand, a non-thread-safe object was shared between threads) and the patch will be included in 2.1.6 and 2.0.16.
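To make the version cut-off concrete, here is a minimal Python sketch that checks whether a given Cassandra version string is at or past the releases named above. The function names are hypothetical, the thresholds come only from this thread (2.1.6 and 2.0.16), and release lines the thread does not mention are reported as unknown rather than guessed.

```python
# First releases said to carry the CASSANDRA-8940 patch, per this thread.
FIXED_IN = {(2, 0): (2, 0, 16), (2, 1): (2, 1, 6)}


def parse_version(text):
    """Turn '2.1.4' into (2, 1, 4); a suffix such as '-SNAPSHOT' is ignored."""
    core = text.split("-")[0]
    return tuple(int(part) for part in core.split("."))


def has_8940_fix(version_text):
    """Return True/False for the 2.0 and 2.1 lines, None for any other line."""
    version = parse_version(version_text)
    fixed = FIXED_IN.get(version[:2])
    if fixed is None:
        # Assumption: the thread says nothing about other release lines.
        return None
    return version >= fixed
```

With the version reported in the original question, `has_8940_fix("2.1.4")` returns `False`, which is consistent with the cluster still showing the behaviour.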
It’s a showstopper for me too: while developing, I sometimes need to rebuild things based on the complete dataset (this should become *very* rare in production, but still). As long as this bug is around, I can never be sure that all records are included.

Unfortunately, I don’t see a release schedule for either version…

Luc

From: Josef Lindman Hörnlund [mailto:jo...@appdata.biz]
Sent: Wednesday, 3 June 2015 12:16
To: user@cassandra.apache.org
Subject: Re: Different number of records from COPY command

I ran into that issue a while ago, and it was because I hit the tombstone limit on one of the nodes. Try running `nodetool compact adlog adclicklog20150528` and see if that helps.

Josef Lindman Hörnlund

On 02 Jun 2015, at 17:48, Saurabh Chandolia <s.chando...@gmail.com> wrote:

Still getting an inconsistent number of records at consistency ALL and QUORUM. The output for both consistency levels follows.

cqlsh:adlog> CONSISTENCY ALL;
Consistency level set to ALL.
cqlsh:adlog> copy adclicklog20150528 (imprid) TO 'adclicklog20150528.csv';
Processed 58000 rows; Write: 3065.60 rows/s
58463 rows exported in 21.353 seconds.
cqlsh:adlog> copy adclicklog20150528 (imprid) TO 'adclicklog20150528.csv';
Processed 63000 rows; Write: 3517.03 rows/s
63972 rows exported in 22.885 seconds.

cqlsh:adlog> CONSISTENCY QUORUM ;
Consistency level set to QUORUM.
cqlsh:adlog> copy adclicklog20150528 (imprid) TO 'adclicklog20150528.csv';
Processed 63000 rows; Write: 3443.37 rows/s
63440 rows exported in 21.987 seconds.
cqlsh:adlog> copy adclicklog20150528 (imprid) TO 'adclicklog20150528.csv';
Processed 65000 rows; Write: 3405.90 rows/s
65524 rows exported in 24.053 seconds.

- Saurabh

On Tue, Jun 2, 2015 at 9:09 PM, Anuj Wadehra <anujw_2...@yahoo.co.in> wrote:

I have never exported data myself, but can you try setting 'consistency ALL' in cqlsh before executing the command?
Thanks
Anuj Wadehra

________________________________
From: "Saurabh Chandolia" <s.chando...@gmail.com>
Date: Tue, 2 Jun, 2015 at 8:47 pm
Subject: Different number of records from COPY command

I am seeing a different number of records each time I export a particular table. There were no writes or reads on this table while the data was being exported. I don't understand why this is happening. Am I missing something here?

Cassandra version: 2.1.4
Java driver version: 2.1.5
Cluster size: 4 nodes in the same DC
Keyspace replication factor: 2

The following commands were issued:

cqlsh:adlog> copy adclicklog20150528 (imprid) TO 'adclicklog20150528.csv';
Processed 68000 rows; Write: 3025.93 rows/s
68682 rows exported in 27.737 seconds.
cqlsh:adlog> copy adclicklog20150528 (imprid) TO 'adclicklog20150528.csv';
Processed 65000 rows; Write: 2821.06 rows/s
65535 rows exported in 26.667 seconds.
cqlsh:adlog> copy adclicklog20150528 (imprid) TO 'adclicklog20150528.csv';
Processed 66000 rows; Write: 3285.07 rows/s
66055 rows exported in 26.269 seconds.

cfstats for adlog.adclicklog20150528:
-------------------------------------------
$ nodetool cfstats adlog.adclicklog20150528
Keyspace: adlog
    Read Count: 217
    Read Latency: 2.773073732718894 ms.
    Write Count: 103191
    Write Latency: 0.10233075558915021 ms.
    Pending Flushes: 0
        Table: adclicklog20150528
        SSTable count: 11
        Space used (live): 37981202
        Space used (total): 37981202
        Space used by snapshots (total): 13407843
        Off heap memory used (total): 25580
        SSTable Compression Ratio: 0.26684147550494164
        Number of keys (estimate): 5627
        Memtable cell count: 94620
        Memtable data size: 13459445
        Memtable off heap memory used: 0
        Memtable switch count: 19
        Local read count: 217
        Local read latency: 2.774 ms
        Local write count: 103191
        Local write latency: 0.103 ms
        Pending flushes: 0
        Bloom filter false positives: 0
        Bloom filter false ratio: 0.00000
        Bloom filter space used: 7192
        Bloom filter off heap memory used: 7104
        Index summary off heap memory used: 980
        Compression metadata off heap memory used: 17496
        Compacted partition minimum bytes: 1110
        Compacted partition maximum bytes: 182785
        Compacted partition mean bytes: 27808
        Average live cells per slice (last five minutes): 44.663594470046085
        Maximum live cells per slice (last five minutes): 86.0
        Average tombstones per slice (last five minutes): 0.0
        Maximum tombstones per slice (last five minutes): 0.0
----------------

- Saurabh
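One way to see which rows differ between two runs, rather than only that the totals differ, is to compare the exported keys directly. The following is a minimal Python sketch under a few assumptions: each run was exported to its own file (in the thread every run overwrites 'adclicklog20150528.csv', so the file names here are hypothetical), the key column (`imprid`) is the first CSV column, and the files fit in memory. The helper names are illustrative only.

```python
import csv
from collections import Counter


def load_keys(path):
    """Read the first column of a COPY ... TO export into a Counter."""
    with open(path, newline="") as handle:
        # Skip blank lines; count repeats so duplicate keys stay visible.
        return Counter(row[0] for row in csv.reader(handle) if row)


def compare_exports(path_a, path_b):
    """Return row counts and the keys that appear in only one export."""
    keys_a = load_keys(path_a)
    keys_b = load_keys(path_b)
    return {
        "rows_a": sum(keys_a.values()),
        "rows_b": sum(keys_b.values()),
        "only_in_a": sorted(set(keys_a) - set(keys_b)),
        "only_in_b": sorted(set(keys_b) - set(keys_a)),
    }
```

Calling `compare_exports("run1.csv", "run2.csv")` on two such files (hypothetical names) returns both row counts plus the keys unique to each file. If the keys missing from one run are clustered rather than scattered, that may help tell a paging problem apart from a replica-consistency problem.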