If you aren't removing elements from the map, you should instead be able to use an UPDATE statement that appends to the map. It will have the same effect as overwriting it, because for every key in the new map the new value takes precedence over the existing one. But it happens without first generating a tombstone.
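To make the difference concrete, here is a minimal toy model in Python. It is an illustration under stated assumptions, not how Cassandra stores data internally. It follows Tyler Hobbs's description in the quoted thread: replacing a whole collection writes a range tombstone followed by the new entries, while appending with `m = m + ...` writes only the new entries. The keyspace, table and column names (`ks.events`, `id`, `m`) and the helper names are hypothetical.

```python
# Toy model (NOT Cassandra's storage engine) of the two ways to write a map
# column, following the thread: replacing the whole collection writes a range
# tombstone and then the new entries; appending writes only the new entries.
from dataclasses import dataclass, field

# Hypothetical keyspace/table/column names, for illustration only.
OVERWRITE_CQL = "INSERT INTO ks.events (id, m) VALUES (?, ?)"
APPEND_CQL = "UPDATE ks.events SET m = m + ? WHERE id = ?"


@dataclass
class ToyMapColumn:
    cells: dict = field(default_factory=dict)  # live map entries
    range_tombstones: int = 0  # collection-wide deletions written so far

    def overwrite(self, new_map: dict) -> None:
        """Models OVERWRITE_CQL: shadow the old collection, then write new_map."""
        self.range_tombstones += 1
        self.cells = dict(new_map)

    def append(self, new_map: dict) -> None:
        """Models APPEND_CQL: merge per key, newer value wins, no tombstone."""
        self.cells.update(new_map)


def append_equals_overwrite(old: dict, new: dict) -> bool:
    """True when appending `new` leaves the same map as overwriting with it."""
    by_overwrite = ToyMapColumn(dict(old))
    by_append = ToyMapColumn(dict(old))
    by_overwrite.overwrite(new)
    by_append.append(new)
    return by_overwrite.cells == by_append.cells


if __name__ == "__main__":
    old = {"last_modified": "2016-05-20", "source": "a"}
    new = {"last_modified": "2016-05-21", "source": "b"}
    print(append_equals_overwrite(old, new))  # True: no key was removed
    print(append_equals_overwrite(old, {"last_modified": "2016-05-21"}))  # False
```

`append_equals_overwrite` is true only when every old key is still present in the new map, which is exactly the "if you aren't removing elements" condition; in that case the append reaches the same state without the range tombstone.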
If you do have to remove elements from the collection during this process, you either face tombstones or have to work out surgically which elements ought to be removed (which also involves tombstones, though at least not range tombstones, so it is a bit cheaper).

On Fri, May 27, 2016, 5:39 AM Matthias Niehoff <matthias.nieh...@codecentric.de> wrote:

> We are processing events in Spark and store the resulting entries
> (containing a map) in Cassandra. The results can be new (no entry for this
> key in Cassandra) or an update (there is already an entry with this key in
> Cassandra). We use the spark-cassandra-connector to store the data in
> Cassandra.
>
> The connector will always do an insert of the data and will rely on the
> upsert capabilities of Cassandra. So every time an event is updated, the
> complete map is replaced, with all the problems of tombstones.
> It seems like we have to implement our own persist logic in which we check
> whether an element already exists and, if so, update the map manually. That
> would require a read before write, which would be nasty. Another option
> would be not to use a collection but (clustering) columns. Do you have
> another idea of doing this?
>
> (The conclusion of this whole thing for me would be: use upsert, but do
> specific updates on collections, as an upsert might replace the whole
> collection and generate tombstones.)
>
> 2016-05-25 17:37 GMT+02:00 Tyler Hobbs <ty...@datastax.com>:
>
>> If you replace an entire collection, whether it's a map, set, or list, a
>> range tombstone will be inserted followed by the new collection. If you
>> only update a single element, no tombstones are generated.
>>
>> On Wed, May 25, 2016 at 9:48 AM, Matthias Niehoff <
>> matthias.nieh...@codecentric.de> wrote:
>>
>>> Hi,
>>>
>>> we have a table with a map field. We do not delete anything in this
>>> table, but do updates on the values including the map field (most of the
>>> time a new value for an existing key, rarely adding new keys).
>>> We now encounter a huge number of tombstones for this table.
>>>
>>> We used sstable2json to take a look into the sstables:
>>>
>>> {"key": "Betty_StoreCatalogLines:7",
>>> "cells": [["276-1-6MPQ0RI-276110031802001001:","",1463820040628001],
>>> ["276-1-6MPQ0RI-276110031802001001:last_modified","2016-05-21 08:40Z",1463820040628001],
>>> ["276-1-6MPQ0RI-276110031802001001:last_modified_by_source:_","276-1-6MPQ0RI-276110031802001001:last_modified_by_source:!",1463040069753999,"t",1463040069],
>>> ["276-1-6MPQ0RI-276110031802001001:last_modified_by_source:_","276-1-6MPQ0RI-276110031802001001:last_modified_by_source:!",1463120708590002,"t",1463120708],
>>> ["276-1-6MPQ0RI-276110031802001001:last_modified_by_source:_","276-1-6MPQ0RI-276110031802001001:last_modified_by_source:!",1463145700735007,"t",1463145700],
>>> ["276-1-6MPQ0RI-276110031802001001:last_modified_by_source:_","276-1-6MPQ0RI-276110031802001001:last_modified_by_source:!",1463157430862000,"t",1463157430],
>>> ["276-1-6MPQ0RI-276110031802001001:last_modified_by_source:_","276-1-6MPQ0RI-276110031802001001:last_modified_by_source:!",1463164595291002,"t",1463164595],
>>> . . .
>>> ["276-1-6MPQ0RI-276110031802001001:last_modified_by_source:_","276-1-6MPQ0RI-276110031802001001:last_modified_by_source:!",1463820040628000,"t",1463820040],
>>> ["276-1-6MPQ0RI-276110031802001001:last_modified_by_source:62657474795f73746f72655f636174616c6f675f6c696e6573","00000154d265c6b0",1463820040628001],
>>> ["276-1-6MPQ0RI-276110031802001001:payload","{\"payload\":{\"Article Id\":\"276110031802001001\",\"Row Id\":\"1-6MPQ0RI\",\"Article #\":\"31802001001\",\"Quote Item Id\":\"1-6MPWPVC\",\"Country Code\":\"276\"}}",1463820040628001]
>>>
>>> Looking at the sstables, it seems like every update of a value in a map
>>> breaks down to a delete and an insert in the corresponding SSTable (see
>>> all the tombstone flags "t" in the sstable2json extract above).
>>>
>>> We are using Cassandra 2.2.5.
>>>
>>> Can you confirm this behavior?
>>>
>>> Thanks!
>>> --
>>> Matthias Niehoff | IT-Consultant | Agile Software Factory | Consulting
>>> codecentric AG | Zeppelinstr 2 | 76185 Karlsruhe | Germany
>>> tel: +49 (0) 721.9595-681 | fax: +49 (0) 721.9595-666 | mobile: +49 (0) 172.1702676
>>> www.codecentric.de | blog.codecentric.de | www.meettheexperts.de | www.more4fi.de
>>>
>>> Registered office: Solingen | HRB 25917 | Wuppertal Local Court
>>> Executive Board: Michael Hochgürtel . Mirko Novakovic . Rainer Vehns
>>> Supervisory Board: Patric Fedlmeier (Chairman) . Klaus Jäger . Jürgen Schütz
>>>
>>> This e-mail, including any attached files, contains confidential and/or
>>> legally protected information. If you are not the intended recipient or
>>> have received this e-mail in error, please notify the sender immediately
>>> and delete this e-mail and any attached files. Unauthorized copying, use,
>>> or opening of any attached files, as well as unauthorized forwarding of
>>> this e-mail, is not permitted.
>>
>> --
>> Tyler Hobbs
>> DataStax <http://datastax.com/>
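
For the case raised in the reply where elements must also be removed, the "surgical" option could look roughly like this Python sketch. It diffs the previously stored map against the new one and emits only element-level changes: new or changed entries go through an append, and vanished keys are removed with `m = m - {keys}`, which writes one element tombstone per key rather than a range tombstone. This assumes a Cassandra version that accepts the `m = m - {keys}` form for maps (deleting `m[key]` per key is an alternative); the table, column and function names are hypothetical.

```python
# Sketch of the "surgical" update: diff the previously stored map (which needs
# a read before write) against the new one and emit element-level changes only.
# Assumes a Cassandra version that accepts "m = m - {keys}" for maps; the
# keyspace/table/column names are hypothetical.

APPEND_CQL = "UPDATE ks.events SET m = m + ? WHERE id = ?"
REMOVE_CQL = "UPDATE ks.events SET m = m - ? WHERE id = ?"


def plan_map_changes(row_id, old: dict, new: dict) -> list:
    """Return (cql, bind_values) pairs that turn `old` into `new` element-wise."""
    # Entries that are new or whose value changed: plain cell writes.
    upserts = {k: v for k, v in new.items() if k not in old or old[k] != v}
    # Keys that disappeared: one element tombstone each, no range tombstone.
    removed = set(old) - set(new)

    statements = []
    if upserts:
        statements.append((APPEND_CQL, (upserts, row_id)))
    if removed:
        statements.append((REMOVE_CQL, (removed, row_id)))
    return statements


if __name__ == "__main__":
    stored = {"a": "1", "b": "2", "c": "3"}
    incoming = {"a": "1", "b": "20", "d": "4"}
    for cql, values in plan_map_changes(7, stored, incoming):
        print(cql, values)
```

Obtaining `old` is the read before write Matthias describes as nasty; when no keys are ever removed, the single append from the reply is enough and no read is needed.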