You say the events are incremental updates. I am interpreting this to mean only
some columns are updated. Others should keep their original values.
You are correct that inserting null creates a tombstone.
Can you only insert the columns that actually have new values? Just skip the
columns with no information. (Make the insert generator a bit smarter.)
Create table happening (id text primary key, event text, a text, b text, c text);
Insert into happening (id, event, a, b, c) values ('MainEvent', 'The most
complete info we have right now', 'Priceless', '10 pm', 'Grand Ballroom');
-- b changes
Insert into happening (id, b) values ('MainEvent', '9:30 pm');
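The smarter insert generator Sean suggests can be sketched in Python. This is a minimal sketch. It assumes the generator receives each event as a dict in which None means "no new information". The function name build_insert is hypothetical. The %s placeholders assume a driver that binds positional parameters that way, as the DataStax Python driver does for non-prepared statements.

```python
from typing import Any, Dict, List, Tuple


def build_insert(table: str, row: Dict[str, Any]) -> Tuple[str, List[Any]]:
    """Build a CQL INSERT that names only the columns that carry a value.

    None means "no new information": the column is left out of the statement
    entirely, so no null (and therefore no tombstone) is written for it.
    The primary key column(s) must be non-null, or Cassandra rejects the insert.
    Table and column names are interpolated into the text, so they must come
    from trusted code, never from event payloads.
    """
    present = [(column, value) for column, value in row.items() if value is not None]
    if not present:
        raise ValueError("row has no non-null columns to write")
    columns = ", ".join(column for column, _ in present)
    placeholders = ", ".join(["%s"] * len(present))
    statement = f"INSERT INTO {table} ({columns}) VALUES ({placeholders})"
    return statement, [value for _, value in present]


if __name__ == "__main__":
    # The "b changes" update from the example above, as the generator might see it.
    update = {"id": "MainEvent", "event": None, "a": None, "b": "9:30 pm", "c": None}
    print(build_insert("happening", update))
    # ('INSERT INTO happening (id, b) VALUES (%s, %s)', ['MainEvent', '9:30 pm'])
```

With the DataStax Python driver, the result would be passed as session.execute(statement, params). One consequence of this design is that each distinct set of present columns produces a different statement text, so a single prepared statement cannot simply be reused. Native protocol v4 and later let a bound variable be left "unset" instead, which is worth checking in your driver's documentation.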
Sean Durity
-----Original Message-----
From: Tomas Bartalos <[email protected]>
Sent: Thursday, December 27, 2018 9:27 AM
To: [email protected]
Subject: [EXTERNAL] Howto avoid tombstones when inserting NULL values
Hello,
I'll start by describing my use case and how I'd like to use Cassandra to meet
my storage needs.
We're processing a stream of events for various happenings. Every happening has
a unique happening_id, which each of its events carries.
One happening may have many events, usually ~20-100. I'd like to store only the
latest event for each happening (an event is an incremental update and contains
all up-to-date data about the happening).
Technically, the events are streamed from Kafka, processed with Spark and saved
to Cassandra.
In Cassandra we use upserts (inserts with the same primary key). So far so good,
but then come the tombstones...
When I insert a field with a NULL value, Cassandra creates a tombstone for that
field. As I understand it, this is for space efficiency: Cassandra doesn't have
to remember that there is a NULL value. It just deletes the respective column,
and a delete creates a ... tombstone.
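The difference between writing a null and leaving a column out can be shown with a deliberately simplified in-memory model. It is an illustration only. It assumes last-write-wins per cell and ignores timestamps, sstables and compaction. TOMBSTONE, upsert, visible and tombstone_count are hypothetical names, not Cassandra APIs.

```python
# Hypothetical marker standing in for a cell tombstone; not a Cassandra API.
TOMBSTONE = object()


def upsert(row_store, key, values):
    """Merge `values` into the row stored under `key`, one cell at a time.

    A later write simply replaces the earlier value of the same cell (a
    simplification of last-write-wins). Writing None is treated as deleting
    that cell, which is what leaves a tombstone behind in Cassandra. Columns
    not mentioned in `values` are not touched at all.
    """
    row = row_store.setdefault(key, {})
    for column, value in values.items():
        row[column] = TOMBSTONE if value is None else value
    return row


def visible(row):
    """What a read would return: tombstoned cells come back as null (None)."""
    return {column: (None if value is TOMBSTONE else value) for column, value in row.items()}


def tombstone_count(row):
    """Number of cells in the row that are currently tombstones."""
    return sum(1 for value in row.values() if value is TOMBSTONE)


if __name__ == "__main__":
    store = {}
    # Writing an explicit null: the cell becomes a tombstone.
    upsert(store, "1", {"event": "event1"})
    upsert(store, "1", {"event": None})
    print(visible(store["1"]), tombstone_count(store["1"]))
    # {'event': None} 1

    # Leaving column "c" out of the second write: no tombstone, old value kept.
    upsert(store, "MainEvent", {"b": "10 pm", "c": "Grand Ballroom"})
    upsert(store, "MainEvent", {"b": "9:30 pm"})
    print(visible(store["MainEvent"]), tombstone_count(store["MainEvent"]))
    # {'b': '9:30 pm', 'c': 'Grand Ballroom'} 0
```

The second case is also why skipping nulls is only correct when a missing value really means "unchanged". If an event is a full snapshot and a field was cleared, the stale value survives.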
I was hoping there could be an option to tell Cassandra not to be so space
efficient and to store "unset" info without generating tombstones.
Something similar to inserting empty strings instead of null values:
CREATE TABLE happening (id text PRIMARY KEY, event text);
INSERT INTO happening (id, event) VALUES ('1', 'event1');
-- tombstone is generated
INSERT INTO happening (id, event) VALUES ('1', null);
-- tombstone is not generated
INSERT INTO happening (id, event) VALUES ('1', '');
Possible solutions:
1. Set gc_grace_seconds to 0 or to a reasonably low value (1 hour?) so that
tombstones can be purged almost immediately. Not good, since phantom (deleted)
data may reappear.
2. Ignore NULLs on the Spark side with "spark.cassandra.output.ignoreNulls=true".
Not good, since a field written by an earlier event would never be overwritten
with an "empty" one.
3. On inserts with Spark, find all NULL values and replace them with an "empty"
equivalent (empty string for text, 0 for integer). Very inefficient, and it is
problematic to find an "empty" equivalent for some data types.
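Option 3 can be sketched in Python to show where it breaks down. This is a minimal sketch. It assumes the Spark job can see each column's type. The per-type mapping, the function replace_nulls, and the example columns attendees and starts_at are all hypothetical.

```python
import datetime

# Hypothetical "empty" stand-ins that option 3 would write instead of null.
# Only types with an obvious candidate are listed; for others (timestamps,
# UUIDs, ...) there is no value that cannot be mistaken for real data.
EMPTY_EQUIVALENTS = {str: "", int: 0, float: 0.0}

# Hypothetical schema: column name -> Python type of its values.
EXAMPLE_SCHEMA = {
    "id": str,
    "event": str,
    "attendees": int,
    "starts_at": datetime.datetime,
}


def replace_nulls(row, schema):
    """Return a copy of `row` with every None replaced by its type's "empty" value.

    Raises ValueError for a null in a column whose type has no entry in
    EMPTY_EQUIVALENTS, which is exactly where option 3 gets stuck.
    """
    result = {}
    for column, value in row.items():
        if value is not None:
            result[column] = value
            continue
        column_type = schema[column]
        if column_type not in EMPTY_EQUIVALENTS:
            raise ValueError(
                f"no empty equivalent for column {column!r} of type {column_type.__name__}"
            )
        result[column] = EMPTY_EQUIVALENTS[column_type]
    return result


if __name__ == "__main__":
    print(replace_nulls({"id": "1", "event": None, "attendees": None}, EXAMPLE_SCHEMA))
    # {'id': '1', 'event': '', 'attendees': 0}
    try:
        replace_nulls({"id": "1", "starts_at": None}, EXAMPLE_SCHEMA)
    except ValueError as err:
        print(err)  # no empty equivalent for column 'starts_at' of type datetime
```

Missing equivalents are not the only cost. Every reader of the table must translate '' and 0 back into "no value", and a genuine 0 becomes indistinguishable from a cleared field.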
Until tombstones appeared, Cassandra was the right fit for our use case, but
now I'm not sure we're heading in the right direction.
Could you please give me some advice on how to solve this problem?
Thank you,
Tomas
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
________________________________
The information in this Internet Email is confidential and may be legally
privileged. It is intended solely for the addressee. Access to this Email by
anyone else is unauthorized. If you are not the intended recipient, any
disclosure, copying, distribution or any action taken or omitted to be taken in
reliance on it, is prohibited and may be unlawful. When addressed to our
clients any opinions or advice contained in this Email are subject to the terms
and conditions expressed in any applicable governing The Home Depot terms of
business or client engagement letter. The Home Depot disclaims all
responsibility and liability for the accuracy and content of this attachment
and for any damages or losses arising from any inaccuracies, errors, viruses,
e.g., worms, trojan horses, etc., or other items of a destructive nature, which
may be contained in this attachment and shall not be liable for direct,
indirect, consequential or special damages in connection with this e-mail
message or its attachment.