If it's only overwrites and appends, with no removes, an UPDATE will let you do that on standard (non-frozen) collections. Like INSERT, UPDATE acts as an upsert.
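
For what it's worth, here is a rough sketch of the kind of statement I mean, written as a small Python helper that only builds the CQL text and the bind values (no driver or cluster involved). The table and column names in the usage lines are hypothetical, not taken from your schema. The relevant part is `col = col + ?`, which merges entries into the map instead of replacing the whole collection.

```python
from typing import Any, Dict, List, Sequence


def build_map_append(table: str, map_column: str, key_columns: Sequence[str]) -> str:
    """Return a parameterized CQL UPDATE that merges entries into a map column.

    `col = col + ?` adds new keys and overwrites existing ones. It does not
    replace the whole collection, so no range tombstone is written first.
    """
    if not key_columns:
        raise ValueError("at least one primary key column is required")
    where = " AND ".join(f"{col} = ?" for col in key_columns)
    return f"UPDATE {table} SET {map_column} = {map_column} + ? WHERE {where}"


def bind_values(entries: Dict[str, Any], key_values: Sequence[Any]) -> List[Any]:
    """Order the bind values to match the markers produced above:
    first the map of entries to merge, then the primary key values."""
    return [dict(entries), *key_values]


# Hypothetical table and columns, for illustration only.
statement = build_map_append("ks.catalog_lines", "last_modified_by_source", ["id", "line_id"])
values = bind_values({"betty_store_catalog_lines": "2016-05-21 08:40Z"}, ["Betty:7", "276-1"])
print(statement)
print(values)
```

The statement and values would then go to whatever driver you use as a prepared statement. By contrast, an INSERT (or `SET col = ?`) that supplies the full map replaces the collection, which is the case that writes the range tombstone.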

On Thu, Jun 2, 2016, 12:52 AM Matthias Niehoff <matthias.nieh...@codecentric.de> wrote:

> JSON would be an option, yes. A frozen collection would not work for us,
> as the updates are both overwrites of existing values and appends of new
> values (but never a removal of values).
> So we end up with 3 options:
>
> 1. use clustering columns
> 2. use JSON
> 3. save the row not with the spark-cassandra-connector's saveToCassandra()
>    method (which does an insert of the whole row and map), but by writing
>    our own save method using an update on the map (as Eric proposed).
>
> I think we will go for option 1 or 2, as those are the least costly
> solutions.
>
> Nevertheless, it's a pity that an insert on a row with a map will always
> create tombstones :-(
>
> 2016-06-02 2:02 GMT+02:00 Eric Stevens <migh...@gmail.com>:
>
>> From that perspective, you could also use a frozen collection, which
>> takes away the ability to append, but for which overwrites shouldn't
>> generate a tombstone.
>>
>> On Wed, Jun 1, 2016, 5:54 PM kurt Greaves <k...@instaclustr.com> wrote:
>>
>>> Is there anything stopping you from using JSON instead of a collection?
>>>
>>> On 27 May 2016 at 15:20, Eric Stevens <migh...@gmail.com> wrote:
>>>
>>>> If you aren't removing elements from the map, you should instead be
>>>> able to use an UPDATE statement and append to the map. It will have
>>>> the same effect as overwriting it, because all the new keys will take
>>>> precedence over the existing keys. But it'll happen without generating
>>>> a tombstone first.
>>>>
>>>> If you do have to remove elements from the collection during this
>>>> process, you are either facing tombstones or having to surgically
>>>> figure out which elements ought to be removed (which also involves
>>>> tombstones, though at least not range tombstones, so a bit cheaper).
>>>>
>>>> On Fri, May 27, 2016, 5:39 AM Matthias Niehoff <matthias.nieh...@codecentric.de> wrote:
>>>>
>>>>> We are processing events in Spark and store the resulting entries
>>>>> (containing a map) in Cassandra. The results can be new (no entry for
>>>>> this key in Cassandra) or an update (there is already an entry with
>>>>> this key in Cassandra). We use the spark-cassandra-connector to store
>>>>> the data in Cassandra.
>>>>>
>>>>> The connector will always do an insert of the data and will rely on
>>>>> the upsert capabilities of Cassandra. So every time an event is
>>>>> updated, the complete map is replaced, with all the problems of
>>>>> tombstones.
>>>>> It seems like we have to implement our own persist logic in which we
>>>>> check if an element already exists and, if yes, update the map
>>>>> manually. That would require a read before write, which would be
>>>>> nasty. Another option would be not to use a collection but
>>>>> (clustering) columns. Do you have another idea of doing this?
>>>>>
>>>>> (The conclusion of this whole thing for me would be: use upsert, but
>>>>> do specific updates on collections, as an upsert might replace the
>>>>> whole collection and generate tombstones.)
>>>>>
>>>>> 2016-05-25 17:37 GMT+02:00 Tyler Hobbs <ty...@datastax.com>:
>>>>>
>>>>>> If you replace an entire collection, whether it's a map, set, or
>>>>>> list, a range tombstone will be inserted followed by the new
>>>>>> collection. If you only update a single element, no tombstones are
>>>>>> generated.
>>>>>>
>>>>>> On Wed, May 25, 2016 at 9:48 AM, Matthias Niehoff <matthias.nieh...@codecentric.de> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> we have a table with a Map field. We do not delete anything in this
>>>>>>> table, but do updates on the values, including the Map field (most
>>>>>>> of the time a new value for an existing key, rarely adding new
>>>>>>> keys). We now encounter a huge amount of tombstones for this table.
>>>>>>>
>>>>>>> We used sstable2json to take a look into the sstables:
>>>>>>>
>>>>>>> {"key": "Betty_StoreCatalogLines:7",
>>>>>>> "cells": [["276-1-6MPQ0RI-276110031802001001:","",1463820040628001],
>>>>>>> ["276-1-6MPQ0RI-276110031802001001:last_modified","2016-05-21 08:40Z",1463820040628001],
>>>>>>> ["276-1-6MPQ0RI-276110031802001001:last_modified_by_source:_","276-1-6MPQ0RI-276110031802001001:last_modified_by_source:!",1463040069753999,"t",1463040069],
>>>>>>> ["276-1-6MPQ0RI-276110031802001001:last_modified_by_source:_","276-1-6MPQ0RI-276110031802001001:last_modified_by_source:!",1463120708590002,"t",1463120708],
>>>>>>> ["276-1-6MPQ0RI-276110031802001001:last_modified_by_source:_","276-1-6MPQ0RI-276110031802001001:last_modified_by_source:!",1463145700735007,"t",1463145700],
>>>>>>> ["276-1-6MPQ0RI-276110031802001001:last_modified_by_source:_","276-1-6MPQ0RI-276110031802001001:last_modified_by_source:!",1463157430862000,"t",1463157430],
>>>>>>> ["276-1-6MPQ0RI-276110031802001001:last_modified_by_source:_","276-1-6MPQ0RI-276110031802001001:last_modified_by_source:!",1463164595291002,"t",1463164595],
>>>>>>> . . .
>>>>>>> ["276-1-6MPQ0RI-276110031802001001:last_modified_by_source:_","276-1-6MPQ0RI-276110031802001001:last_modified_by_source:!",1463820040628000,"t",1463820040],
>>>>>>> ["276-1-6MPQ0RI-276110031802001001:last_modified_by_source:62657474795f73746f72655f636174616c6f675f6c696e6573","00000154d265c6b0",1463820040628001],
>>>>>>> ["276-1-6MPQ0RI-276110031802001001:payload","{\"payload\":{\"Article Id\":\"276110031802001001\",\"Row Id\":\"1-6MPQ0RI\",\"Article #\":\"31802001001\",\"Quote Item Id\":\"1-6MPWPVC\",\"Country Code\":\"276\"}}",1463820040628001]
>>>>>>>
>>>>>>> Looking at the SSTables, it seems like every update of a value in a
>>>>>>> Map breaks down to a delete and an insert in the corresponding
>>>>>>> SSTable (see all the tombstone flags "t" in the extract of
>>>>>>> sstable2json above).
>>>>>>>
>>>>>>> We are using Cassandra 2.2.5.
>>>>>>>
>>>>>>> Can you confirm this behavior?
>>>>>>>
>>>>>>> Thanks!
>>>>>>> --
>>>>>>> Matthias Niehoff | IT-Consultant | Agile Software Factory | Consulting
>>>>>>> codecentric AG | Zeppelinstr 2 | 76185 Karlsruhe | Germany
>>>>>>> tel: +49 (0) 721.9595-681 | fax: +49 (0) 721.9595-666 | mobile: +49 (0) 172.1702676
>>>>>>> www.codecentric.de | blog.codecentric.de | www.meettheexperts.de | www.more4fi.de
>>>>>>>
>>>>>>> Registered office: Solingen | HRB 25917 | Amtsgericht Wuppertal
>>>>>>> Management board: Michael Hochgürtel . Mirko Novakovic . Rainer Vehns
>>>>>>> Supervisory board: Patric Fedlmeier (chairman) . Klaus Jäger . Jürgen Schütz
>>>>>>>
>>>>>>> This e-mail, including any attached files, contains confidential
>>>>>>> and/or legally protected information. If you are not the intended
>>>>>>> recipient or have received this e-mail in error, please inform the
>>>>>>> sender immediately and delete this e-mail and any attached files
>>>>>>> without delay. Unauthorized copying, use, or opening of any attached
>>>>>>> files, as well as unauthorized forwarding of this e-mail, is not
>>>>>>> permitted.
>>>>>>>
>>>>>>
>>>>>> --
>>>>>> Tyler Hobbs
>>>>>> DataStax <http://datastax.com/>
>>>>>>
>>>
>>> --
>>> Kurt Greaves
>>> k...@instaclustr.com
>>> www.instaclustr.com