If the workload is only overwrites and appends, with no removes, an UPDATE
will let you do that on standard (non-frozen) collections. Like INSERT,
UPDATE behaves as an upsert.
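
As a rough sketch of the difference, the two statement shapes look something
like this. The table and column names ("ks.events", "id", "attrs") are
placeholders I made up, not the schema from this thread, and the helpers only
build CQL strings with bind markers that you would hand to whatever driver
you use:

```python
# Sketch only: "ks.events", "id" and "attrs" are placeholder names, not the
# schema discussed in this thread. The functions return CQL text with bind
# markers; executing it is left to the Cassandra driver of your choice.

def full_row_insert(table: str, key_col: str, map_col: str) -> str:
    """INSERT that writes the whole map. Replacing an entire collection
    writes a range tombstone followed by the new entries."""
    return f"INSERT INTO {table} ({key_col}, {map_col}) VALUES (?, ?)"


def map_append_update(table: str, key_col: str, map_col: str) -> str:
    """UPDATE that merges the bound map into the stored one: existing keys
    are overwritten and new keys are added, without a tombstone.
    Bind order is (map_entries, key)."""
    return f"UPDATE {table} SET {map_col} = {map_col} + ? WHERE {key_col} = ?"


if __name__ == "__main__":
    print(full_row_insert("ks.events", "id", "attrs"))
    print(map_append_update("ks.events", "id", "attrs"))
```

The second form only fits if you never need to drop keys from the map; a
remove would still need a delete of the individual element.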

On Thu, Jun 2, 2016, 12:52 AM Matthias Niehoff <
matthias.nieh...@codecentric.de> wrote:

> JSON would be an option, yes. A frozen collection would not work for us,
> as the updates are both overwrites of existing values and appends of new
> values (but never a remove of values).
> So we end up with 3 options:
>
> 1. use clustering columns
> 2. use json
> 3. save the row not with the spark-cassandra-connector's saveToCassandra()
> method (which does an insert of the whole row and map), but by writing our
> own save method using update on the map (as Eric proposed).
>
> I think we will go for option 1 or 2 as those are the least costly
> solutions.
>
> Nevertheless, it's a pity that an insert on a row with a map will always
> create tombstones :-(
>
>
>
> 2016-06-02 2:02 GMT+02:00 Eric Stevens <migh...@gmail.com>:
>
>> From that perspective, you could also use a frozen collection which takes
>> away the ability to append, but for which overwrites shouldn't generate a
>> tombstone.
>>
>> On Wed, Jun 1, 2016, 5:54 PM kurt Greaves <k...@instaclustr.com> wrote:
>>
>>> Is there anything stopping you from using JSON instead of a collection?
>>>
>>> On 27 May 2016 at 15:20, Eric Stevens <migh...@gmail.com> wrote:
>>>
>>>> If you aren't removing elements from the map, you should instead be
>>>> able to use an UPDATE statement and append to the map. It will have the same
>>>> effect as overwriting it, because all the new keys will take precedence
>>>> over the existing keys. But it'll happen without generating a tombstone
>>>> first.
>>>>
>>>> If you do have to remove elements from the collection during this
>>>> process, you are either facing tombstones or having to surgically figure
>>>> out which elements ought to be removed (which also involves tombstones,
>>>> though at least not range tombstones, so a bit cheaper).
>>>>
>>>> On Fri, May 27, 2016, 5:39 AM Matthias Niehoff <
>>>> matthias.nieh...@codecentric.de> wrote:
>>>>
>>>>> We are processing events in Spark and storing the resulting entries
>>>>> (containing a map) in Cassandra. The results can be new (no entry for this
>>>>> key in Cassandra) or an update (there is already an entry with this key in
>>>>> Cassandra). We use the spark-cassandra-connector to store the data in
>>>>> Cassandra.
>>>>>
>>>>> The connector will always do an insert of the data and will rely on
>>>>> the upsert capabilities of Cassandra. So every time an event is updated,
>>>>> the complete map is replaced, with all the problems of tombstones.
>>>>> It seems we have to implement our own persist logic in which we
>>>>> check if an element already exists and, if so, update the map manually.
>>>>> That would require a read before write, which would be nasty. Another
>>>>> option would be not to use a collection but (clustering) columns. Do you
>>>>> have another idea for doing this?
>>>>>
>>>>> (the conclusion of this whole thing for me would be: use upsert, but
>>>>> do specific updates on collections, as an upsert might replace the whole
>>>>> collection and generate tombstones)
>>>>>
>>>>> 2016-05-25 17:37 GMT+02:00 Tyler Hobbs <ty...@datastax.com>:
>>>>>
>>>>>> If you replace an entire collection, whether it's a map, set, or
>>>>>> list, a range tombstone will be inserted followed by the new collection.
>>>>>> If you only update a single element, no tombstones are generated.
>>>>>>
>>>>>> On Wed, May 25, 2016 at 9:48 AM, Matthias Niehoff <
>>>>>> matthias.nieh...@codecentric.de> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> we have a table with a map field. We do not delete anything in this
>>>>>>> table, but we do update values, including the map field (most of the
>>>>>>> time a new value for an existing key, rarely adding new keys). We now
>>>>>>> encounter a huge number of tombstones for this table.
>>>>>>>
>>>>>>> We used sstable2json to take a look into the sstables:
>>>>>>>
>>>>>>>
>>>>>>> {"key": "Betty_StoreCatalogLines:7",
>>>>>>>  "cells": [["276-1-6MPQ0RI-276110031802001001:","",1463820040628001],
>>>>>>>            ["276-1-6MPQ0RI-276110031802001001:last_modified","2016-05-21 08:40Z",1463820040628001],
>>>>>>>            ["276-1-6MPQ0RI-276110031802001001:last_modified_by_source:_","276-1-6MPQ0RI-276110031802001001:last_modified_by_source:!",1463040069753999,"t",1463040069],
>>>>>>>            ["276-1-6MPQ0RI-276110031802001001:last_modified_by_source:_","276-1-6MPQ0RI-276110031802001001:last_modified_by_source:!",1463120708590002,"t",1463120708],
>>>>>>>            ["276-1-6MPQ0RI-276110031802001001:last_modified_by_source:_","276-1-6MPQ0RI-276110031802001001:last_modified_by_source:!",1463145700735007,"t",1463145700],
>>>>>>>            ["276-1-6MPQ0RI-276110031802001001:last_modified_by_source:_","276-1-6MPQ0RI-276110031802001001:last_modified_by_source:!",1463157430862000,"t",1463157430],
>>>>>>>            ["276-1-6MPQ0RI-276110031802001001:last_modified_by_source:_","276-1-6MPQ0RI-276110031802001001:last_modified_by_source:!",1463164595291002,"t",1463164595],
>>>>>>>            . . .
>>>>>>>            ["276-1-6MPQ0RI-276110031802001001:last_modified_by_source:_","276-1-6MPQ0RI-276110031802001001:last_modified_by_source:!",1463820040628000,"t",1463820040],
>>>>>>>            ["276-1-6MPQ0RI-276110031802001001:last_modified_by_source:62657474795f73746f72655f636174616c6f675f6c696e6573","00000154d265c6b0",1463820040628001],
>>>>>>>            ["276-1-6MPQ0RI-276110031802001001:payload","{\"payload\":{\"Article Id\":\"276110031802001001\",\"Row Id\":\"1-6MPQ0RI\",\"Article #\":\"31802001001\",\"Quote Item Id\":\"1-6MPWPVC\",\"Country Code\":\"276\"}}",1463820040628001]
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Looking at the SSTables, it seems like every update of a value in a
>>>>>>> map breaks down to a delete and an insert in the corresponding SSTable
>>>>>>> (see all the tombstone flags "t" in the sstable2json extract above).
>>>>>>>
>>>>>>> We are using Cassandra 2.2.5.
>>>>>>>
>>>>>>> Can you confirm this behavior?
>>>>>>>
>>>>>>> Thanks!
>>>>>>> --
>>>>>>> Matthias Niehoff | IT-Consultant | Agile Software Factory  |
>>>>>>> Consulting
>>>>>>> codecentric AG | Zeppelinstr 2 | 76185 Karlsruhe | Deutschland
>>>>>>> tel: +49 (0) 721.9595-681 | fax: +49 (0) 721.9595-666 | mobil: +49
>>>>>>> (0) 172.1702676
>>>>>>> www.codecentric.de | blog.codecentric.de | www.meettheexperts.de |
>>>>>>> www.more4fi.de
>>>>>>>
>>>>>>> Registered office: Solingen | HRB 25917 | Amtsgericht Wuppertal
>>>>>>> Executive board: Michael Hochgürtel . Mirko Novakovic . Rainer Vehns
>>>>>>> Supervisory board: Patric Fedlmeier (chairman) . Klaus Jäger . Jürgen
>>>>>>> Schütz
>>>>>>>
>>>>>>> This e-mail, including any attached files, contains confidential
>>>>>>> and/or legally protected information. If you are not the intended
>>>>>>> recipient or have received this e-mail in error, please notify the
>>>>>>> sender immediately and delete this e-mail and any attached files
>>>>>>> without delay. Unauthorized copying, use, or opening of any attached
>>>>>>> files, as well as unauthorized forwarding of this e-mail, is not
>>>>>>> permitted.
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Tyler Hobbs
>>>>>> DataStax <http://datastax.com/>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>> --
>>> Kurt Greaves
>>> k...@instaclustr.com
>>> www.instaclustr.com
>>>
>>
>
>
>
