Thanks Bryan. I believe I have a different problem with the DataStax 2.1.6 driver. My problem is not that I make huge selects; it seems to occur on some inserts. I insert MANY rows, and with version 2.1.6 of the driver I seem to be losing some records.
But thanks anyway, I will remember your mail when I bump into the select problem.

Cheers

Jean

On 15 Jun 2015, at 19:13, Bryan Holladay <holla...@longsight.com> wrote:

There's your problem: you're using the DataStax java driver :) I just ran into this issue in the last week and it was incredibly frustrating. If you are doing a simple loop on a "select *" query, then the DataStax java driver will only process 2^31 rows (i.e. the Java Integer max, 2,147,483,647) before it stops without any error or output in the logs. The fact that you said you only had about 2 billion rows but are seeing missing data is a red flag. I found the only way around this is to do your "select *" in chunks based on the token range (see this gist for an example: https://gist.github.com/baholladay/21eb4c61ea8905302195 ). Just loop for every 100 million rows and make a new query: "select * from TABLE where token(key) > lastToken".

Thanks,
Bryan

On Mon, Jun 15, 2015 at 12:50 PM, Jean Tremblay <jean.tremb...@zen-innovations.com> wrote:

Dear all,

I identified a bit more closely the root cause of my missing data. The problem occurs when I use

    <dependency>
        <groupId>com.datastax.cassandra</groupId>
        <artifactId>cassandra-driver-core</artifactId>
        <version>2.1.6</version>
    </dependency>

on my client against Cassandra 2.1.6. I did not have the problem when I was using driver 2.1.4 with C* 2.1.4. Interestingly enough, I don't have the problem with driver 2.1.4 against C* 2.1.6!

So as far as I can locate the problem, I would say that version 2.1.6 of the driver is not working properly and is losing some of my records.

——————

As far as my tombstones are concerned, I don't understand their origin. I removed every place in my code where I delete items, and I do not use TTL anywhere (I don't need this feature in my project). And yet I have many tombstones building up.
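Bryan's token-range workaround can be sketched in plain Java. This is only an illustration (the class and method names are invented, and a real scan would execute each chunk through the driver as "SELECT * FROM tbl WHERE token(key) > ? AND token(key) <= ?"); it just shows how the Murmur3 token ring can be split into contiguous chunks:

```java
import java.math.BigInteger;

public class TokenRangeSplitter {

    // Murmur3Partitioner tokens span the whole signed 64-bit range.
    static final BigInteger MIN = BigInteger.valueOf(Long.MIN_VALUE);
    static final BigInteger MAX = BigInteger.valueOf(Long.MAX_VALUE);

    /**
     * Returns chunk boundaries t0 < t1 < ... < tn covering the full ring,
     * so each "select *" chunk reads far fewer than 2^31 rows.
     * BigInteger avoids the overflow of computing MAX - MIN in a long.
     */
    static long[] boundaries(int chunks) {
        BigInteger span = MAX.subtract(MIN); // 2^64 - 1
        long[] out = new long[chunks + 1];
        for (int i = 0; i <= chunks; i++) {
            out[i] = MIN.add(span.multiply(BigInteger.valueOf(i))
                                 .divide(BigInteger.valueOf(chunks)))
                        .longValueExact();
        }
        return out;
    }

    public static void main(String[] args) {
        long[] b = boundaries(4);
        for (int i = 0; i + 1 < b.length; i++) {
            // each sub-range would become one driver query:
            //   SELECT * FROM tbl WHERE token(key) > b[i] AND token(key) <= b[i+1]
            System.out.println("chunk " + i + ": (" + b[i] + ", " + b[i + 1] + "]");
        }
    }
}
```

Each sub-range is then scanned independently, so no single driver iteration ever approaches the 2^31-row limit Bryan describes.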
Is there another origin for tombstones besides TTLs and deleting items? Could the compaction of LeveledCompactionStrategy be the origin of them?

@Carlos thanks for your guidance.

Kind regards

Jean

On 15 Jun 2015, at 11:17, Carlos Rolo <r...@pythian.com> wrote:

Hi Jean,

The problem behind that warning is that you are reading too many tombstones per request. If you do have tombstones without doing DELETEs, it is probably because you TTL'ed the data when inserting (by mistake? Or did you set default_time_to_live on your table?). You can use nodetool cfstats to see how many tombstones per read slice you have. This is probably also the cause of your missing data: the data was tombstoned, so it is not available.

Regards,

Carlos Juzarte Rolo
Cassandra Consultant
Pythian - Love your data
rolo@pythian | Twitter: cjrolo | Linkedin: linkedin.com/in/carlosjuzarterolo
Mobile: +31 6 159 61 814 | Tel: +1 613 565 8696 x1649
www.pythian.com

On Mon, Jun 15, 2015 at 10:54 AM, Jean Tremblay <jean.tremb...@zen-innovations.com> wrote:

Hi,

I have reloaded the data in my cluster of 3 nodes, RF: 2. I have loaded about 2 billion rows into one table. I use LeveledCompactionStrategy on my table. I use version 2.1.6. I use the default cassandra.yaml; only the IP addresses for the seeds and the throughput have been changed. I loaded my data with simple INSERT statements. This took a bit more than one day to load the data, and one more day to compact the data on all nodes. For me this is quite acceptable since I should not be doing this again. I have done this with previous versions like 2.1.3 and others, and I basically had absolutely no problems. In the log files on the client side I see no warnings and no errors. On the node side I see many WARNINGs, all related to tombstones, but there are no ERRORs.
My problem is that I see *many missing records* in the DB, and I have never observed this with previous versions.

1) Is this a known problem?
2) Do you have any idea how I could track down this problem?
3) What is the meaning of this WARNING (the only type of ERROR | WARN I could find)?

WARN [SharedPool-Worker-2] 2015-06-15 10:12:00,866 SliceQueryFilter.java:319 - Read 2990 live and 16016 tombstone cells in gttdata.alltrades_co_rep_pcode for key: D:07 (see tombstone_warn_threshold). 5000 columns were requested, slices=[388:201001-388:201412:!]

4) Is it possible to have tombstones when we make no DELETE statements?

I'm lost… Thanks for your help.

--
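For what it's worth, the numbers in that WARN line can be read directly: the slice scanned 16016 tombstone cells against only 2990 live cells. A tiny sketch of the arithmetic (assuming the stock cassandra.yaml, where tombstone_warn_threshold defaults to 1000 in 2.1):

```java
public class TombstoneWarnRatio {
    public static void main(String[] args) {
        // figures copied from the WARN line above
        int live = 2990;
        int tombstones = 16016;
        // default tombstone_warn_threshold in cassandra.yaml (Cassandra 2.1)
        int warnThreshold = 1000;

        // fraction of scanned cells that were tombstones
        double deadFraction = (double) tombstones / (live + tombstones);
        System.out.printf("tombstones = %d (%.1f%% of cells scanned)%n",
                tombstones, 100 * deadFraction);
        System.out.println("exceeds warn threshold: " + (tombstones > warnThreshold));
    }
}
```

So roughly 84% of the cells touched by that slice were tombstones, which is why Cassandra flags the read.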