There's your problem: you're using the DataStax Java driver. :) I just ran into this issue in the last week and it was incredibly frustrating. If you are doing a simple loop on a "select *" query, the DataStax Java driver will only process 2^31 - 1 rows (i.e. Java's Integer.MAX_VALUE, 2,147,483,647) before it stops without any error or output in the logs. The fact that you said you have about 2 billion rows, right around that limit, and are seeing missing data is a red flag.

I found the only way around this is to do your "select *" in chunks based on the token range (see this gist for an example: https://gist.github.com/baholladay/21eb4c61ea8905302195 ). Just loop for every 100 million rows and make a new query: "select * from TABLE where token(key) > lastToken".
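For what it's worth, here is a minimal sketch of that chunked scan (an illustration, not the gist itself). It assumes the default Murmur3Partitioner (tokens are signed 64-bit longs), driver 2.1.x, and a made-up table mykeyspace.mytable whose partition key "key" holds one row per partition; if a chunk's LIMIT could cut a multi-row partition in half, you would need to resume more carefully than this does.

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.ResultSet;
    import com.datastax.driver.core.Row;
    import com.datastax.driver.core.Session;

    public class TokenRangeScan {
        public static void main(String[] args) {
            Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
            Session session = cluster.connect("mykeyspace");

            // Murmur3 tokens live in [-2^63, 2^63 - 1] and no row hashes to
            // the minimum, so starting strictly above Long.MIN_VALUE covers
            // the whole ring.
            long lastToken = Long.MIN_VALUE;
            long total = 0;
            boolean more = true;
            while (more) {
                // Each chunk resumes strictly after the last token seen, so
                // no single statement ever iterates anywhere near 2^31 rows.
                ResultSet rs = session.execute(
                        "SELECT token(key), key FROM mytable"
                                + " WHERE token(key) > " + lastToken
                                + " LIMIT 100000000");
                more = false;
                for (Row row : rs) {
                    lastToken = row.getLong(0); // token(key) of this row
                    total++;
                    more = true; // chunk was non-empty; go around again
                }
            }
            System.out.println("Read " + total + " rows");
            cluster.close();
        }
    }

The driver still pages each chunk internally (fetch size), so the LIMIT is only there to bound how far any one statement has to iterate.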
Thanks,
Bryan

On Mon, Jun 15, 2015 at 12:50 PM, Jean Tremblay <jean.tremb...@zen-innovations.com> wrote:

> Dear all,
>
> I identified a bit more closely the root cause of my missing data.
>
> The problem occurs when I use
>
> <dependency>
>     <groupId>com.datastax.cassandra</groupId>
>     <artifactId>cassandra-driver-core</artifactId>
>     <version>2.1.6</version>
> </dependency>
>
> on my client against Cassandra 2.1.6.
>
> I did not have the problem when I was using the driver 2.1.4 with C* 2.1.4.
> Interestingly enough, I don't have the problem with the driver 2.1.4 with C* 2.1.6!
>
> So as far as I can locate the problem, I would say that version 2.1.6 of the driver is not working properly and is losing some of my records!
>
> ——————
>
> As far as my tombstones are concerned, I don't understand their origin.
> I removed all locations in my code where I delete items, and I do not use TTL anywhere (I don't need this feature in my project).
>
> And yet I have many tombstones building up.
>
> Is there another origin for tombstones besides TTL and deleting items?
> Could the compaction of LeveledCompactionStrategy be the origin of them?
>
> @Carlos thanks for your guidance.
>
> Kind regards
>
> Jean
>
>
> On 15 Jun 2015, at 11:17, Carlos Rolo <r...@pythian.com> wrote:
>
> Hi Jean,
>
> The problem with that warning is that you are reading too many tombstones per request.
>
> If you do have tombstones without doing DELETEs, it is probably because you TTL'ed the data when inserting (by mistake? Or did you set default_time_to_live on your table?). You can use nodetool cfstats to see how many tombstones per read slice you have. This is probably also the cause of your missing data: data was tombstoned, so it is not available.
>
> Regards,
>
> Carlos Juzarte Rolo
> Cassandra Consultant
>
> Pythian - Love your data
>
> rolo@pythian | Twitter: cjrolo | Linkedin: linkedin.com/in/carlosjuzarterolo
> Mobile: +31 6 159 61 814 | Tel: +1 613 565 8696 x1649
> www.pythian.com
>
> On Mon, Jun 15, 2015 at 10:54 AM, Jean Tremblay <jean.tremb...@zen-innovations.com> wrote:
>
>> Hi,
>>
>> I have reloaded the data in my cluster of 3 nodes, RF: 2.
>> I have loaded about 2 billion rows in one table.
>> I use LeveledCompactionStrategy on my table.
>> I use version 2.1.6.
>> I use the default cassandra.yaml; only the IP address for seeds and the throughput have been changed.
>>
>> I loaded my data with simple insert statements. This took a bit more than one day to load the data… and one more day to compact the data on all nodes.
>> For me this is quite acceptable since I should not be doing this again.
>> I have done this with previous versions like 2.1.3 and others and I basically had absolutely no problems.
>>
>> Now I read the log files on the client side; there I see no warnings and no errors.
>> On the node side I see many WARNINGs, all related to tombstones, but there are no ERRORs.
>>
>> My problem is that I see *many missing records* in the DB, and I have never observed this with previous versions.
>>
>> 1) Is this a known problem?
>> 2) Do you have any idea how I could track down this problem?
>> 3) What is the meaning of this WARNING (the only type of ERROR | WARN I could find)?
>>
>> WARN [SharedPool-Worker-2] 2015-06-15 10:12:00,866 SliceQueryFilter.java:319 - Read 2990 live and 16016 tombstone cells in gttdata.alltrades_co_rep_pcode for key: D:07 (see tombstone_warn_threshold). 5000 columns were requested, slices=[388:201001-388:201412:!]
>>
>> 4) Is it possible to have tombstones when we make no DELETE statements?
>>
>> I'm lost…
>>
>> Thanks for your help.
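P.S. On the tombstone question above: one quick sanity check is to read a few cells back with the CQL ttl() function. This is just a sketch against a made-up table mykeyspace.mytable with a column "val"; if any of these come back non-null, your inserts are carrying a TTL after all (then check default_time_to_live in the DESCRIBE TABLE output, and tombstone counts with nodetool cfstats, as Carlos said).

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.Row;
    import com.datastax.driver.core.Session;

    public class TtlCheck {
        public static void main(String[] args) {
            Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
            Session session = cluster.connect("mykeyspace");
            // ttl(val) is null when the cell was written without a TTL
            for (Row row : session.execute("SELECT ttl(val) FROM mytable LIMIT 10")) {
                System.out.println(row.isNull(0) ? "no TTL" : "TTL = " + row.getInt(0) + "s");
            }
            cluster.close();
        }
    }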