Hi all, Unfortunately, we still have a problem. I've modified my code, so that it explicitly sets the consistency level to *QUORUM* for each query. However, we found out a few cases when the record is deleted on only *1 node of 3*. In this cases the *delete* query executed ok, and the *select* query that we do right after delete returned *0* rows. Later when we ran a global daily check *select* returned *1* row. How can that be? What can we be missing?
2013/10/7 Jon Haddad <j...@jonhaddad.com> > I haven't used VMWare but it seems odd that it would lock up the ntp port. > try "ps aux | grep ntp" to see if ntpd it's already running. > > On Oct 7, 2013, at 12:23 AM, Alexander Shutyaev <shuty...@gmail.com> > wrote: > > Hi Michał, > > I didn't notice your message at first.. Well this seems like a real cause > candidate.. I'll add an explicit consistency level QUORUM and see if that > helps. Thanks > > > 2013/10/7 Alexander Shutyaev <shuty...@gmail.com> > >> Hi Nick, >> >> Thanks for the note! We have our cassanra instances installed on virtual >> hosts in VMWare and the clock synchronization is handled by the latter, so >> I can't use ntpdate (says that NTP socket is in use). Is there any way to >> check if the clocks are really synchronized? My best attempt was using >> three shell windows with commands already typed thus requiring only >> clicking on the window and hitting enter. The results varied by 100-200 >> msec which I guess is just about the time I need to click and press enter :) >> >> Thanks in advance, >> Alexander >> >> >> 2013/10/7 Nikolay Mihaylov <n...@nmmm.nu> >> >>> Hi >>> >>> my two cents - before doing anything else, make sure clocks are >>> synchronized to the millisecond. >>> ntp will do so. >>> >>> Nick. >>> >>> >>> On Mon, Oct 7, 2013 at 9:02 AM, Alexander Shutyaev >>> <shuty...@gmail.com>wrote: >>> >>>> Hi all, >>>> >>>> We have encountered the following problem with cassandra. >>>> >>>> * We use *cassandra v2.0.0* from *Datastax* community repo. >>>> >>>> * We have *3 nodes* in a cluster, all of them are seed providers. >>>> >>>> * We have a *single keyspace* with *replication factor = 3*: >>>> >>>> *CREATE KEYSPACE bof WITH replication = {* >>>> * 'class': 'SimpleStrategy',* >>>> * 'replication_factor': '3'* >>>> *};* >>>> >>>> * We use *Datastax Java CQL Driver v1.0.3* in our application. >>>> >>>> * We have not modified any *consistency settings* in our app, so I >>>> assume we have the *default QUORUM* (2 out of 3 in our case) >>>> consistency *for reads and writes*. >>>> >>>> * We have 400+ tables which can be divided in two groups (*main* and * >>>> uids*). All tables in a group have the same definition, they vary only >>>> by name. The sample definitions are: >>>> >>>> *CREATE TABLE bookingfile (* >>>> * key text,* >>>> * entity_created timestamp,* >>>> * entity_createdby text,* >>>> * entity_entitytype text,* >>>> * entity_modified timestamp,* >>>> * entity_modifiedby text,* >>>> * entity_status text,* >>>> * entity_uid text,* >>>> * entity_updatepolicy text,* >>>> * version_created timestamp,* >>>> * version_createdby text,* >>>> * version_data blob,* >>>> * version_dataformat text,* >>>> * version_datasource text,* >>>> * version_modified timestamp,* >>>> * version_modifiedby text,* >>>> * version_uid text,* >>>> * version_versionnotes text,* >>>> * version_versionnumber int,* >>>> * versionscount int,* >>>> * PRIMARY KEY (key)* >>>> *) WITH* >>>> * bloom_filter_fp_chance=0.010000 AND* >>>> * caching='KEYS_ONLY' AND* >>>> * comment='' AND* >>>> * dclocal_read_repair_chance=0.000000 AND* >>>> * gc_grace_seconds=864000 AND* >>>> * index_interval=128 AND* >>>> * read_repair_chance=0.100000 AND* >>>> * replicate_on_write='true' AND* >>>> * populate_io_cache_on_flush='false' AND* >>>> * default_time_to_live=0 AND* >>>> * speculative_retry='NONE' AND* >>>> * memtable_flush_period_in_ms=0 AND* >>>> * compaction={'class': 'SizeTieredCompactionStrategy'} AND* >>>> * compression={'sstable_compression': 'LZ4Compressor'};* >>>> >>>> *CREATE TABLE bookingfile_uids (* >>>> * date text,* >>>> * timeanduid text,* >>>> * deleted boolean,* >>>> * PRIMARY KEY (date, timeanduid)* >>>> *) WITH* >>>> * bloom_filter_fp_chance=0.010000 AND* >>>> * caching='KEYS_ONLY' AND* >>>> * comment='' AND* >>>> * dclocal_read_repair_chance=0.000000 AND* >>>> * gc_grace_seconds=864000 AND* >>>> * index_interval=128 AND* >>>> * read_repair_chance=0.100000 AND* >>>> * replicate_on_write='true' AND* >>>> * populate_io_cache_on_flush='false' AND* >>>> * default_time_to_live=0 AND* >>>> * speculative_retry='NONE' AND* >>>> * memtable_flush_period_in_ms=0 AND* >>>> * compaction={'class': 'SizeTieredCompactionStrategy'} AND* >>>> * compression={'sstable_compression': 'LZ4Compressor'};* >>>> * >>>> * >>>> *CREATE INDEX BookingFile_uids_deleted ON bookingfile_uids (deleted);* >>>> >>>> * We don't have any problems with the tables from the *main* group. >>>> >>>> * As for the tables from the *uids* group we have noticed that >>>> sometimes deletes from these tables do not do their job. They don't fail, >>>> they just do nothing. We have confirmed this by adding a select query after >>>> deletes. Most times everything is ok and select returns 0 records. But >>>> sometimes (~5 out of 100,000) it returns the supposedly deleted row. >>>> >>>> * We have logged the ExecutionInfo objects with query tracing that are >>>> returned by Datastax's driver. Here are the details >>>> >>>> *DELETE FROM bookingfile_uids WHERE date=C20131006 AND >>>> timeAndUid=195248590_4762ce41-d2d2-448d-be8c-c7fcb6b7394e* >>>> >>>> *ExecutionInfo: [* >>>> *triedHosts=/10.10.30.23;* >>>> *queriedHost=/10.10.30.23;* >>>> *achievedConsistencyLevel=null;* >>>> *queryTrace=* >>>> * Message received from /10.10.30.23 on /10.10.30.19[Thread-56] at Sun >>>> Oct 06 19:55:57 MSK 2013* >>>> * Acquiring switchLock read lock on /10.10.30.19[MutationStage:49] at >>>> Sun Oct 06 19:55:57 MSK 2013* >>>> * Appending to commitlog on /10.10.30.19[MutationStage:49] at Sun Oct >>>> 06 19:55:57 MSK 2013* >>>> * Adding to bookingfile_uids memtable on /10.10.30.19[MutationStage:49] >>>> at Sun Oct 06 19:55:57 MSK 2013* >>>> * Enqueuing response to /10.10.30.23 on /10.10.30.19[MutationStage:49] >>>> at Sun Oct 06 19:55:57 MSK 2013* >>>> * Sending message to /10.10.30.23 on /10.10.30.19[WRITE-/10.10.30.23] >>>> at Sun Oct 06 19:55:57 MSK 2013* >>>> * Message received from /10.10.30.23 on /10.10.30.20[Thread-34] at Sun >>>> Oct 06 19:55:57 MSK 2013* >>>> * Acquiring switchLock read lock on /10.10.30.20[MutationStage:43] at >>>> Sun Oct 06 19:55:57 MSK 2013* >>>> * Appending to commitlog on /10.10.30.20[MutationStage:43] at Sun Oct >>>> 06 19:55:57 MSK 2013* >>>> * Adding to bookingfile_uids memtable on /10.10.30.20[MutationStage:43] >>>> at Sun Oct 06 19:55:57 MSK 2013* >>>> * Enqueuing response to /10.10.30.23 on /10.10.30.20[MutationStage:43] >>>> at Sun Oct 06 19:55:57 MSK 2013* >>>> * Sending message to /10.10.30.23 on /10.10.30.20[WRITE-/10.10.30.23] >>>> at Sun Oct 06 19:55:57 MSK 2013* >>>> * Determining replicas for mutation on >>>> /10.10.30.23[Native-Transport-Requests:1387368] >>>> at Sun Oct 06 19:55:57 MSK 2013* >>>> * Sending message to /10.10.30.19 on /10.10.30.23[WRITE-/10.10.30.19] >>>> at Sun Oct 06 19:55:57 MSK 2013* >>>> * Acquiring switchLock read lock on /10.10.30.23[MutationStage:46] at >>>> Sun Oct 06 19:55:57 MSK 2013* >>>> * Sending message to /10.10.30.20 on /10.10.30.23[WRITE-/10.10.30.20] >>>> at Sun Oct 06 19:55:57 MSK 2013* >>>> * Message received from /10.10.30.20 on /10.10.30.23[Thread-5] at Sun >>>> Oct 06 19:55:57 MSK 2013* >>>> * Processing response from /10.10.30.20 on >>>> /10.10.30.23[RequestResponseStage:4] >>>> at Sun Oct 06 19:55:57 MSK 2013* >>>> * Message received from /10.10.30.19 on /10.10.30.23[Thread-7] at Sun >>>> Oct 06 19:55:57 MSK 2013* >>>> * Processing response from /10.10.30.19 on >>>> /10.10.30.23[RequestResponseStage:4] >>>> at Sun Oct 06 19:55:57 MSK 2013;* >>>> *]* >>>> >>>> *SELECT * FROM bookingfile_uids WHERE date=C20131006 AND >>>> timeAndUid=195248590_4762ce41-d2d2-448d-be8c-c7fcb6b7394e returned 1 record >>>> * >>>> >>>> the same query 1 second later: >>>> >>>> *DELETE FROM bookingfile_uids WHERE date=C20131006 AND >>>> timeAndUid=195248590_4762ce41-d2d2-448d-be8c-c7fcb6b7394e* >>>> * >>>> * >>>> *ExecutionInfo: [* >>>> *triedHosts=/10.10.30.20;* >>>> *queriedHost=/10.10.30.20;* >>>> *achievedConsistencyLevel=null;* >>>> *queryTrace=* >>>> * Message received from /10.10.30.20 on /10.10.30.19[Thread-57] at Sun >>>> Oct 06 19:55:58 MSK 2013* >>>> * Determining replicas for mutation on >>>> /10.10.30.20[Native-Transport-Requests:1387705] >>>> at Sun Oct 06 19:55:58 MSK 2013* >>>> * Acquiring switchLock read lock on /10.10.30.20[MutationStage:43] at >>>> Sun Oct 06 19:55:58 MSK 2013* >>>> * Appending to commitlog on /10.10.30.20[MutationStage:43] at Sun Oct >>>> 06 19:55:58 MSK 2013* >>>> * Adding to bookingfile_uids memtable on /10.10.30.20[MutationStage:43] >>>> at Sun Oct 06 19:55:58 MSK 2013* >>>> * Sending message to /10.10.30.19 on /10.10.30.20[WRITE-/10.10.30.19] >>>> at Sun Oct 06 19:55:58 MSK 2013* >>>> * Sending message to /10.10.30.23 on /10.10.30.20[WRITE-/10.10.30.23] >>>> at Sun Oct 06 19:55:58 MSK 2013* >>>> * Message received from /10.10.30.19 on /10.10.30.20[Thread-4] at Sun >>>> Oct 06 19:55:58 MSK 2013* >>>> * Processing response from /10.10.30.19 on >>>> /10.10.30.20[RequestResponseStage:6] >>>> at Sun Oct 06 19:55:58 MSK 2013* >>>> * Message received from /10.10.30.20 on /10.10.30.23[Thread-18] at Sun >>>> Oct 06 19:55:58 MSK 2013* >>>> *]* >>>> * >>>> * >>>> *SELECT * FROM bookingfile_uids WHERE date=C20131006 AND >>>> timeAndUid=195248590_4762ce41-d2d2-448d-be8c-c7fcb6b7394e returned 0 >>>> records.* >>>> >>>> * Cassandra's system.log on all 3 nodes lists nothing special during >>>> these queries - just some compaction related *INFO* entries. >>>> >>>> Can anyone help with this? What is our next step? >>>> >>>> Thanks in advance, >>>> Alexander >>>> >>> >>> >> > >