Sorry this is so long after the initial. I wrote a dtest to try to make this happen here: https://github.com/bakerag1/cassandra-dtest/blob/master/collection_update_test.py
This is my first dtest and my second python script, so I am not overly confident that it is doing a good job of this test, so if any dtest expert out there could look it over, I would appreciate it. I couldn't detect this happening with 5 threads concurrently creating and updating the same 200 records, with List<String> and List<UDT>. We are not using batches in our case, but we are using prepared statements. I will add that to the test and let you know. -Andrew On Fri, Oct 14, 2016 at 11:00 PM Russell Spitzer <russell.spit...@gmail.com> wrote: > Are you sure you aren't using batches? These will assign the same > timestamp to your inserts which can lead to unexpected behaviors. > > On Fri, Oct 14, 2016 at 9:45 PM Vladimir Yudovin <vla...@winguzone.com> > wrote: > > Did you try the same quires with Java driver without using prepared > statements? > > > Best regards, Vladimir Yudovin, > > > *Winguzone <https://winguzone.com?from=list> - Hosted Cloud Cassandra on > Azure and SoftLayer.Launch your cluster in minutes.* > > > ---- On Fri, 14 Oct 2016 15:13:38 -0400*Aoi Kadoya <cadyan....@gmail.com > <cadyan....@gmail.com>>* wrote ---- > > Hi Vladimir, > > In fact I am having difficulty to reproduce this issue by cqlsh. > I was reported this issue by one of our developers and he is using his > client application that uses cassandra java driver 3.0.3. (we're using > DSE5.0.1) > > <here is how his application log look like> > > app A: > 2016-10-11 13:28:23,014 [TRACE] [core.QueryLogger.NORMAL] [cluster1] > [HOST1/IP1:9042] Query completed normally, took 5 ms: [8 bound values] > INSERT INTO global.table_name > ("id","alert_to","alert_emails","created_by","created_date","alert_level","updated_by","updated_date") > > VALUES (?,?,?,?,?,?,?,?); > [id:25712, alert_to:[2], alert_emails:NULL, > created_by:'service-worker:ec45afd2-c40a-44d9-a2a1-7416409be6e2', > created_date:1476160103007, alert_level:2, updated_by:NULL, > updated_date:NULL] > > app B: > 2016-10-11 13:28:23,014 [TRACE] [core.QueryLogger.NORMAL] [cluster1] > [HOST2/IP2:9042] Query completed normally, took 6 ms: [8 bound values] > INSERT INTO global.table_name > ("alert_to","alert_emails","created_date","id","created_by","updated_by","updated_date","alert_level") > > VALUES (?,?,?,?,?,?,?,?); > [alert_to:[1], alert_emails:NULL, created_date:1476160103007, > id:25712, > created_by:'service-worker:ec45afd2-c40a-44d9-a2a1-7416409be6e2', > updated_by:NULL, updated_date:NULL, alert_level:1] > > <and the data on cassandra> > <TYPES> > id bigint, > alert_emails list, > alert_level int, > alert_to list, > created_by text, > created_date timestamp, > updated_by text, > updated_date timestamp, > PRIMARY KEY (id) > > > SELECT id, alert_level, alert_to FROM global.table_name WHERE id=25712; > | id | alert_level | alert_to | > | 25712 | 2 | [2, 1] | > > but when I threw the queries like below from cqlsh from different > nodes at the same time in my testing environment, the data(alert_to) > was just [1], which is expected behavior. > > on host 1 > cqlsh> INSERT INTO global.table_name > ("id","alert_to","alert_emails","created_by","created_date","alert_level","updated_by","updated_date") > > VALUES > (25712,[2],NULL,'service-worker:ec45afd2-c40a-44d9-a2a1-7416409be6e2',1476160103007,2,NULL,NULL); > > on host2 > cqlsh> INSERT INTO global.table_name > ("alert_to","alert_emails","created_date","id","created_by","updated_by","updated_date","alert_level") > > VALUES > ([1],NULL,1476160103007,25712,'service-worker:ec45afd2-c40a-44d9-a2a1-7416409be6e2',NULL,NULL,1); > > > > so I wonder if this is something wrong with java driver but I cannot > figure out the way to break this down further. > > > @Andrew > we're not using UDT..but appreciate if you could share your case, too. > > Thanks, > Aoi > > 2016-10-13 11:26 GMT-07:00 Andrew Baker <baker...@gmail.com>: > > I saw evidence of this behavior, but when we created a test to try to > make > > it happen it never did, we assumed it was UDT related and lost interest, > > since it didn't have a big impact. I will try to carve some time to look > > into this some more and let you know if I find anything. > > > > On Wed, Oct 12, 2016 at 9:24 PM Vladimir Yudovin <vla...@winguzone.com> > > wrote: > >> > >> The data is actually appended. not overwritten. > >> Strange, can you send exactly operators? > >> > >> Here is example I do: > >> CREATE KEYSPACE events WITH replication = {'class': 'SimpleStrategy', > >> 'replication_factor': 1}; > >> CREATE TABLE events.data (id int primary key, events list<text>); > >> INSERT INTO events.data (id, events) VALUES ( 0, ['a']); > >> SELECT * FROM events.data ; > >> id | events > >> ----+-------- > >> 0 | ['a'] > >> > >> (1 rows) > >> > >> INSERT INTO events.data (id, events) VALUES ( 0, ['b']); > >> SELECT * FROM events.data ; > >> id | events > >> ----+-------- > >> 0 | ['b'] > >> > >> (1 rows) > >> > >> As you see, 'a' was overwritten by 'b' > >> > >> > >> Best regards, Vladimir Yudovin, > >> Winguzone - Hosted Cloud Cassandra on Azure and SoftLayer. > >> Launch your cluster in minutes. > >> > >> > >> ---- On Wed, 12 Oct 2016 23:58:23 -0400Aoi Kadoya <cadyan....@gmail.com> > > >> wrote ---- > >> > >> yes, that's what I thought. but, when I use these forms, > >> INSERT ... ['A'] > >> INSERT ... ['B'] > >> > >> The data is actually appended. not overwritten. > >> so I guess this is something unexpected? > >> > >> Thanks, > >> Aoi > >> > >> 2016-10-12 20:55 GMT-07:00 Vladimir Yudovin <vla...@winguzone.com>: > >> > If you use form > >> > INSERT ... ['A'] > >> > INSERT ... ['B'] > >> > > >> > latest INSERT will overwrite first, because this insert the whole > list. > >> > It's > >> > better to use UPDATE like: > >> > UPDATE ... SET events = events + ['A'] > >> > UPDATE ... SET events = events + ['B'] > >> > These operations add new elements to the end of existing list. > >> > > >> > > >> > From here > >> > https://docs.datastax.com/en/cql/3.0/cql/cql_using/use_list_t.html > >> > : > >> > > >> > These update operations are implemented internally without any > >> > read-before-write. Appending and prepending a new element to the list > >> > writes > >> > only the new element. > >> > > >> > > >> > Best regards, Vladimir Yudovin, > >> > Winguzone - Hosted Cloud Cassandra on Azure and SoftLayer. > >> > Launch your cluster in minutes. > >> > > >> > > >> > ---- On Wed, 12 Oct 2016 17:39:46 -0400Aoi Kadoya < > cadyan....@gmail.com> > >> > wrote ---- > >> > > >> > Hi, > >> > > >> > When inserting different data into a list type column from different > >> > clients at the same time, is data supposed to be combined into one > >> > list? > >> > > >> > For example, if these 2 queries were requested from clients at the > >> > same timing, how events list should look like after? > >> > > >> > INSERT INTO cycling.upcoming_calendar (year, month, events) VALUES > >> > (2015, 06, ['A']); > >> > INSERT INTO cycling.upcoming_calendar (year, month, events) VALUES > >> > (2015, 06, ['B']); > >> > > >> > In my understanding, each operation should be treated as atomic, > which > >> > makes me think that even if client throw the queries at the same > time, > >> > cassandra would take them separately and the last insert would update > >> > the events list. (= data should be either ['A'] or ['B']) > >> > > >> > In my environment, I found that some data was saved as like ['A',B'] > >> > in the case like above. > >> > Is this expected behavior of list data type? > >> > > >> > I am still new to cassandra and trying to make myself understood how > >> > this happened. > >> > Appreciate if you could help me with figuring this out! > >> > > >> > Thanks, > >> > Aoi > >> > > >> > > >