JSON would be an option, yes. A frozen collection would not work for us, as our updates both overwrite existing values and append new ones (but never remove values). So we end up with 3 options:
1. use clustering columns
2. use JSON
3. save the row without the spark-cassandra-connector's saveToCassandra() method (which does an insert of the whole row and map), and instead write our own save method that uses an update on the map (as Eric proposed).

I think we will go for option 1 or 2, as those are the least costly solutions. Nevertheless, it's a pity that an insert on a row with a map will always create tombstones :-(

2016-06-02 2:02 GMT+02:00 Eric Stevens <migh...@gmail.com>:

> From that perspective, you could also use a frozen collection which takes
> away the ability to append, but for which overwrites shouldn't generate a
> tombstone.
>
> On Wed, Jun 1, 2016, 5:54 PM kurt Greaves <k...@instaclustr.com> wrote:
>
>> Is there anything stopping you from using JSON instead of a collection?
>>
>> On 27 May 2016 at 15:20, Eric Stevens <migh...@gmail.com> wrote:
>>
>>> If you aren't removing elements from the map, you should instead be able
>>> to use an UPDATE statement and append to the map. It will have the same effect
>>> as overwriting it, because all the new keys will take precedence over the
>>> existing keys. But it'll happen without generating a tombstone first.
>>>
>>> If you do have to remove elements from the collection during this
>>> process, you are either facing tombstones or having to surgically figure
>>> out which elements ought to be removed (which also involves tombstones,
>>> though at least not range tombstones, so a bit cheaper).
>>>
>>> On Fri, May 27, 2016, 5:39 AM Matthias Niehoff <
>>> matthias.nieh...@codecentric.de> wrote:
>>>
>>>> We are processing events in Spark and store the resulting entries
>>>> (containing a map) in Cassandra. The results can be new (no entry for this
>>>> key in Cassandra) or an update (there is already an entry with this key in
>>>> Cassandra). We use the spark-cassandra-connector to store the data in
>>>> Cassandra.
>>>>
>>>> The connector will always do an insert of the data and will rely on the
>>>> upsert capabilities of Cassandra. So every time an event is updated, the
>>>> complete map is replaced, with all the problems of tombstones.
>>>> Seems like we have to implement our own persist logic in which we check
>>>> if an element already exists and, if yes, update the map manually. That would
>>>> require a read before write, which would be nasty. Another option would be
>>>> not to use a collection but (clustering) columns. Do you have another idea
>>>> of doing this?
>>>>
>>>> (The conclusion of this whole thing for me would be: use upsert, but do
>>>> specific updates on collections, as an upsert might replace the whole
>>>> collection and generate tombstones.)
>>>>
>>>> 2016-05-25 17:37 GMT+02:00 Tyler Hobbs <ty...@datastax.com>:
>>>>
>>>>> If you replace an entire collection, whether it's a map, set, or list,
>>>>> a range tombstone will be inserted followed by the new collection. If you
>>>>> only update a single element, no tombstones are generated.
>>>>>
>>>>> On Wed, May 25, 2016 at 9:48 AM, Matthias Niehoff <
>>>>> matthias.nieh...@codecentric.de> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> we have a table with a Map field. We do not delete anything in this
>>>>>> table, but do updates on the values including the Map field (most of the
>>>>>> time a new value for an existing key, rarely adding new keys). We now
>>>>>> encounter a huge amount of tombstones for this table.
>>>>>>
>>>>>> We used sstable2json to take a look into the sstables:
>>>>>>
>>>>>> {"key": "Betty_StoreCatalogLines:7",
>>>>>>  "cells": [["276-1-6MPQ0RI-276110031802001001:","",1463820040628001],
>>>>>>  ["276-1-6MPQ0RI-276110031802001001:last_modified","2016-05-21 08:40Z",1463820040628001],
>>>>>>  ["276-1-6MPQ0RI-276110031802001001:last_modified_by_source:_","276-1-6MPQ0RI-276110031802001001:last_modified_by_source:!",1463040069753999,"t",1463040069],
>>>>>>  ["276-1-6MPQ0RI-276110031802001001:last_modified_by_source:_","276-1-6MPQ0RI-276110031802001001:last_modified_by_source:!",1463120708590002,"t",1463120708],
>>>>>>  ["276-1-6MPQ0RI-276110031802001001:last_modified_by_source:_","276-1-6MPQ0RI-276110031802001001:last_modified_by_source:!",1463145700735007,"t",1463145700],
>>>>>>  ["276-1-6MPQ0RI-276110031802001001:last_modified_by_source:_","276-1-6MPQ0RI-276110031802001001:last_modified_by_source:!",1463157430862000,"t",1463157430],
>>>>>>  ["276-1-6MPQ0RI-276110031802001001:last_modified_by_source:_","276-1-6MPQ0RI-276110031802001001:last_modified_by_source:!",1463164595291002,"t",1463164595],
>>>>>>  . . .
>>>>>>  ["276-1-6MPQ0RI-276110031802001001:last_modified_by_source:_","276-1-6MPQ0RI-276110031802001001:last_modified_by_source:!",1463820040628000,"t",1463820040],
>>>>>>  ["276-1-6MPQ0RI-276110031802001001:last_modified_by_source:62657474795f73746f72655f636174616c6f675f6c696e6573","00000154d265c6b0",1463820040628001],
>>>>>>  ["276-1-6MPQ0RI-276110031802001001:payload","{\"payload\":{\"Article Id\":\"276110031802001001\",\"Row Id\":\"1-6MPQ0RI\",\"Article #\":\"31802001001\",\"Quote Item Id\":\"1-6MPWPVC\",\"Country Code\":\"276\"}}",1463820040628001]
>>>>>>
>>>>>> Looking at the SSTables, it seems like every update of a value in a map
>>>>>> breaks down to a delete and an insert in the corresponding SSTable (see all
>>>>>> the tombstone flags "t" in the extract of sstable2json above).
>>>>>>
>>>>>> We are using Cassandra 2.2.5.
>>>>>>
>>>>>> Can you confirm this behavior?
>>>>>>
>>>>>> Thanks!
>>>>>> --
>>>>>> Matthias Niehoff | IT-Consultant | Agile Software Factory | Consulting
>>>>>> codecentric AG | Zeppelinstr 2 | 76185 Karlsruhe | Germany
>>>>>> tel: +49 (0) 721.9595-681 | fax: +49 (0) 721.9595-666 | mobile: +49 (0) 172.1702676
>>>>>> www.codecentric.de | blog.codecentric.de | www.meettheexperts.de | www.more4fi.de
>>>>>>
>>>>>> Registered office: Solingen | HRB 25917 | Amtsgericht Wuppertal
>>>>>> Management board: Michael Hochgürtel . Mirko Novakovic . Rainer Vehns
>>>>>> Supervisory board: Patric Fedlmeier (chairman) . Klaus Jäger . Jürgen Schütz
>>>>>>
>>>>>> This e-mail, including any attached files, contains confidential
>>>>>> and/or legally protected information. If you are not the intended
>>>>>> recipient or have received this e-mail in error, please inform the
>>>>>> sender immediately and delete this e-mail and any attached files at
>>>>>> once. Unauthorized copying, use or opening of any attached files, as
>>>>>> well as unauthorized forwarding of this e-mail, is not permitted.
>>>>>>
>>>>>
>>>>> --
>>>>> Tyler Hobbs
>>>>> DataStax <http://datastax.com/>
>>>>>
>>
>> --
>> Kurt Greaves
>> k...@instaclustr.com
>> www.instaclustr.com
>>

--
Matthias Niehoff | IT-Consultant | Agile Software Factory | Consulting
codecentric AG | Zeppelinstr 2 | 76185 Karlsruhe | Germany
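
P.S. To make option 3 a bit more concrete, here is a rough sketch of the kind of statement Eric's suggestion boils down to: an UPDATE that appends to the map ("m = m + ?") instead of an INSERT that replaces it. It only builds the CQL text and the bind values; it does not talk to Cassandra and it is not part of the spark-cassandra-connector. The helper name build_map_append_update, the table name ks.events and the key column id are made up for illustration; only the column names last_modified_by_source and payload are taken from the sstable2json extract above.

```python
from typing import Any, Dict, List, Optional, Tuple


def build_map_append_update(
    table: str,
    key: Dict[str, Any],
    map_column: str,
    entries: Dict[Any, Any],
    other_columns: Optional[Dict[str, Any]] = None,
) -> Tuple[str, List[Any]]:
    """Build a CQL UPDATE that appends entries to a map column.

    Hypothetical helper: returns the statement text with '?' bind markers
    and the bind values in matching order.
    """
    if not key:
        raise ValueError("at least one primary key column is required")

    other_columns = other_columns or {}

    # Regular (non-collection) columns are simply overwritten.
    assignments = [f"{name} = ?" for name in other_columns]
    params: List[Any] = list(other_columns.values())

    # "m = m + ?" adds new keys and overwrites existing ones in place,
    # rather than replacing the whole collection.
    assignments.append(f"{map_column} = {map_column} + ?")
    params.append(dict(entries))

    # Every primary key column goes into the WHERE clause.
    where = " AND ".join(f"{name} = ?" for name in key)
    params.extend(key.values())

    statement = f"UPDATE {table} SET {', '.join(assignments)} WHERE {where}"
    return statement, params


if __name__ == "__main__":
    cql, values = build_map_append_update(
        table="ks.events",  # hypothetical table name
        key={"id": "276-1-6MPQ0RI"},  # hypothetical key column
        map_column="last_modified_by_source",
        entries={"betty_store_catalog_lines": 1463820040628},
        other_columns={"payload": "{}"},
    )
    print(cql)
    print(values)
```

The statement and values could then be prepared and executed with whatever driver session is at hand. This only covers the case discussed in the thread, where keys are overwritten or added but never removed.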