"The problem is I can't know the combination of set/unset values" --> Just for this requirement, Achilles has a working solution for many years using INSERT_NOT_NULL_FIELDS strategy:
On Thu, Dec 27, 2018 at 10:21 PM Tomas Bartalos <tomas.barta...@gmail.com> wrote:

> Hello,
>
> The problem is I can't know the combination of set/unset values. From my
> perspective every value should be set. The event from Kafka represents the
> complete state of the happening at a certain point in time. In my table I
> want to store the latest event, i.e. the most recent state of the happening
> (in this table I don't care about the history). Actually I used the wrong
> expression, since it's just the opposite of an "incremental update": every
> event carries all data (state) for a specific point in time.
>
> The event is represented as a nested JSON structure. Top-level elements of
> the JSON are table fields with types like text, boolean, timestamp and list,
> and the nested elements are UDT fields.
>
> Simplified example:
> There is a new purchase for the happening, event:
> {total_amount: 50, items : [A, B, C, new_item], purchase_time :
> '2018-12-27 13:30', specials: null, customer : {... }, fare_amount,...}
> I don't know what actually happened for this event: maybe there is a new
> item purchased, maybe some customer info has been changed, maybe the
> specials have been revoked and I have to reset them. I just need to store
> the state as it arrived from Kafka; there might already be an event for
> this happening saved before, or maybe this is the first one.
>
> BR,
> Tomas
>
> On Thu, 27 Dec 2018, 9:36 pm Eric Stevens <migh...@gmail.com> wrote:
>
>> Depending on the use case, creating separate prepared statements for each
>> combination of set / unset values in large INSERT/UPDATE statements may be
>> prohibitive.
>>
>> Instead, you can look into driver-level support for UNSET values.
>> Requires Cassandra 2.2 or later IIRC.
>>
>> See:
>> Java Driver:
>> https://docs.datastax.com/en/developer/java-driver/3.0/manual/statements/prepared/#parameters-and-binding
>> Python Driver:
>> https://www.datastax.com/dev/blog/python-driver-2-6-0-rc1-with-cassandra-2-2-features#distinguishing_between_null_and_unset_values
>> Node Driver:
>> https://docs.datastax.com/en/developer/nodejs-driver/3.5/features/datatypes/nulls/#unset
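To make the UNSET approach concrete, here is a minimal sketch with the DataStax Java driver 3.x against Cassandra 2.2+ (native protocol v4). The happening table layout is borrowed from Sean's example further down; the contact point and keyspace are placeholders:

    import com.datastax.driver.core.BoundStatement;
    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.PreparedStatement;
    import com.datastax.driver.core.Session;

    // With native protocol v4 (Cassandra 2.2+), bound variables that are never set
    // are sent as UNSET: the server neither writes a tombstone nor touches the
    // previously stored value for that column.
    public class UnsetInsertExample {
        public static void main(String[] args) {
            Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
            Session session = cluster.connect("ks"); // keyspace name is a placeholder

            PreparedStatement ps = session.prepare(
                    "INSERT INTO happening (id, event, a, b, c) VALUES (?, ?, ?, ?, ?)");

            // Only id and b are known for this event; event, a and c are left unset.
            BoundStatement bound = ps.bind()
                    .setString("id", "MainEvent")
                    .setString("b", "9:30 pm");
            // Binding null instead (setString("c", null) or setToNull("c")) is
            // exactly what creates the tombstones this thread is trying to avoid.

            session.execute(bound);

            cluster.close();
        }
    }

The Python and Node.js drivers expose the same null/unset distinction through their own unset sentinels (see the links above).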
>> On Thu, Dec 27, 2018 at 3:21 PM Durity, Sean R <sean_r_dur...@homedepot.com> wrote:
>>
>>> You say the events are incremental updates. I am interpreting this to
>>> mean only some columns are updated. Others should keep their original
>>> values.
>>>
>>> You are correct that inserting null creates a tombstone.
>>>
>>> Can you only insert the columns that actually have new values? Just skip
>>> the columns with no information. (Make the insert generator a bit smarter.)
>>>
>>> Create table happening (id text primary key, event text, a text, b text, c text);
>>> Insert into happening (id, event, a, b, c) values ('MainEvent', 'The most complete info we have right now', 'Priceless', '10 pm', 'Grand Ballroom');
>>> -- b changes
>>> Insert into happening (id, b) values ('MainEvent', '9:30 pm');
>>>
>>> Sean Durity
>>>
>>> -----Original Message-----
>>> From: Tomas Bartalos <tomas.barta...@gmail.com>
>>> Sent: Thursday, December 27, 2018 9:27 AM
>>> To: user@cassandra.apache.org
>>> Subject: [EXTERNAL] Howto avoid tombstones when inserting NULL values
>>>
>>> Hello,
>>>
>>> I'd start by describing my use case and how I'd like to use Cassandra to
>>> solve my storage needs.
>>> We're processing a stream of events for various happenings. Every event
>>> has a unique happening_id.
>>> One happening may have many events, usually ~20-100. I'd like to store
>>> only the latest event for the same happening (an event is an incremental
>>> update and contains all up-to-date data about the happening).
>>> Technically the events are streamed from Kafka, processed with Spark and
>>> saved to Cassandra.
>>> In Cassandra we use upserts (insert with the same primary key). So far so
>>> good, however here comes the tombstone...
>>>
>>> When I insert a field with a NULL value, Cassandra creates a tombstone
>>> for this field. As I understand it, this is due to space efficiency:
>>> Cassandra doesn't have to remember there is a NULL value, she just deletes
>>> the respective column, and a delete creates a... tombstone.
>>> I was hoping there could be an option to tell Cassandra not to be so
>>> space-efficient and store the "unset" info without generating tombstones,
>>> something similar to inserting empty strings instead of null values:
>>>
>>> CREATE TABLE happening (id text PRIMARY KEY, event text);
>>> insert into happening (id, event) values ('1', 'event1');
>>> -- tombstone is generated:
>>> insert into happening (id, event) values ('1', null);
>>> -- tombstone is not generated:
>>> insert into happening (id, event) values ('1', '');
>>>
>>> Possible solutions:
>>> 1. Disable tombstones with gc_grace_seconds = 0, or set it to a reasonably
>>> low value (1 hour?). Not good, since phantom data may re-appear.
>>> 2. Ignore NULLs on the Spark side with "spark.cassandra.output.ignoreNulls=true".
>>> Not good, since this will never overwrite a previously inserted event field
>>> with an "empty" one.
>>> 3. On inserts with Spark, find all NULL values and replace them with an
>>> "empty" equivalent (empty string for text, 0 for integer). Very inefficient,
>>> and problematic to find an "empty" equivalent for some data types.
>>>
>>> Until tombstones appeared, Cassandra was the right fit for our use case;
>>> now I'm not sure we're heading in the right direction.
>>> Could you please give me some advice on how to solve this problem?
>>>
>>> Thank you,
>>> Tomas
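As a concrete illustration of option 2 in the list above, this is roughly what the connector-level switch looks like (a sketch only, assuming the DataStax Spark Cassandra Connector; contact point, keyspace, table and input path are placeholders). The caveat from the list still applies: a null arriving later will not clear a value that was written earlier.

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    // "Possible solution 2": have the Spark Cassandra Connector skip null columns
    // entirely instead of writing them (writing them would create tombstones).
    public class IgnoreNullsWrite {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder()
                    .appName("happening-writer")
                    .config("spark.cassandra.connection.host", "127.0.0.1") // placeholder
                    .config("spark.cassandra.output.ignoreNulls", "true")
                    .getOrCreate();

            Dataset<Row> events = spark.read().json("/path/to/events"); // placeholder input

            events.write()
                    .format("org.apache.spark.sql.cassandra")
                    .option("keyspace", "ks")        // placeholder keyspace
                    .option("table", "happening")
                    .mode("append")
                    .save();

            spark.stop();
        }
    }

Depending on the connector version, the same key can reportedly also be passed as a per-write DataFrameWriter option instead of being set on the SparkSession.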