Re: Internal Handling of Map Updates

Eric Stevens Thu, 02 Jun 2016 06:32:33 -0700

If the writes are only overwrites and appends, with no removes, an UPDATE
will let you do that to standard (non-frozen) collections. Like INSERT,
UPDATE acts as an UPSERT.
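
For illustration, here is a minimal sketch of the three write shapes discussed
in this thread, as CQL statement templates a custom save method could prepare.
The helper names and the table and column names passed to them are
hypothetical placeholders, not part of the connector or any driver API; the
tombstone behavior noted in the comments is as described elsewhere in this
thread.

```python
# Sketch only: helper names and all table/column names are hypothetical.

def insert_whole_row(table: str, key_col: str, map_col: str) -> str:
    """Full-row INSERT. Replaces the entire map, which (per this thread)
    writes a range tombstone followed by the new map contents."""
    return f"INSERT INTO {table} ({key_col}, {map_col}) VALUES (?, ?)"


def append_to_map(table: str, key_col: str, map_col: str) -> str:
    """UPDATE that merges the bound map into the stored one. Existing keys
    are overwritten and new keys are added; nothing is deleted first."""
    return f"UPDATE {table} SET {map_col} = {map_col} + ? WHERE {key_col} = ?"


def set_map_entry(table: str, key_col: str, map_col: str) -> str:
    """UPDATE of a single map element, addressed by its key."""
    return f"UPDATE {table} SET {map_col}[?] = ? WHERE {key_col} = ?"


if __name__ == "__main__":
    for build in (insert_whole_row, append_to_map, set_map_entry):
        print(build("ks.events", "id", "attrs"))
```

With no removes, the second form should leave the row in the same state as
the first, just without deleting the old map beforehand.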


On Thu, Jun 2, 2016, 12:52 AM Matthias Niehoff <
matthias.nieh...@codecentric.de> wrote:

> JSON would be an option, yes. A frozen collection would not work for us,
> as the updates are both overwrites of existing values and appends of new
> values (but never a remove of values).
> So we end up with 3 options:
>
> 1. use clustering columns
> 2. use JSON
> 3. save the row not with the spark-cassandra-connector's saveToCassandra()
> method (which does an insert of the whole row and map), but by writing our
> own save method that uses an update on the map (as Eric proposed).
>
> I think we will go for option 1 or 2 as those are the least costly
> solutions.
>
> Nevertheless, it's a pity that an insert on a row with a map will always
> create tombstones :-(
>
>
>
> 2016-06-02 2:02 GMT+02:00 Eric Stevens <migh...@gmail.com>:
>
>> From that perspective, you could also use a frozen collection which takes
>> away the ability to append, but for which overwrites shouldn't generate a
>> tombstone.
>>
>> On Wed, Jun 1, 2016, 5:54 PM kurt Greaves <k...@instaclustr.com> wrote:
>>
>>> Is there anything stopping you from using JSON instead of a collection?
>>>
>>> On 27 May 2016 at 15:20, Eric Stevens <migh...@gmail.com> wrote:
>>>
>>>> If you aren't removing elements from the map, you should instead be
>>>> able to use an UPDATE statement and append to the map. It will have the same
>>>> effect as overwriting it, because all the new keys will take precedence
>>>> over the existing keys. But it'll happen without generating a tombstone
>>>> first.
>>>>
>>>> If you do have to remove elements from the collection during this
>>>> process, you are either facing tombstones or having to surgically figure
>>>> out which elements ought to be removed (which also involves tombstones,
>>>> though at least not range tombstones, so a bit cheaper).
>>>>
>>>> On Fri, May 27, 2016, 5:39 AM Matthias Niehoff <
>>>> matthias.nieh...@codecentric.de> wrote:
>>>>
>>>>> We are processing events in Spark and storing the resulting entries
>>>>> (containing a map) in Cassandra. The results can be new (no entry for this
>>>>> key in Cassandra) or an update (there is already an entry with this key in
>>>>> Cassandra). We use the spark-cassandra-connector to store the data in
>>>>> Cassandra.
>>>>>
>>>>> The connector will always do an insert of the data and rely on
>>>>> the upsert capabilities of Cassandra. So every time an event is updated,
>>>>> the complete map is replaced, with all the problems of tombstones.
>>>>> Seems like we have to implement our own persist logic in which we
>>>>> check if an element already exists and, if yes, update the map manually.
>>>>> That would require a read before write, which would be nasty. Another option
>>>>> would be not to use a collection but (clustering) columns. Do you have
>>>>> another idea of doing this?
>>>>>
>>>>> (The conclusion of this whole thing for me would be: use upsert, but
>>>>> do specific updates on collections, as an upsert might replace the whole
>>>>> collection and generate tombstones.)
>>>>>
>>>>> 2016-05-25 17:37 GMT+02:00 Tyler Hobbs <ty...@datastax.com>:
>>>>>
>>>>>> If you replace an entire collection, whether it's a map, set, or
>>>>>> list, a range tombstone will be inserted followed by the new collection.
>>>>>> If you only update a single element, no tombstones are generated.
>>>>>>
>>>>>> On Wed, May 25, 2016 at 9:48 AM, Matthias Niehoff <
>>>>>> matthias.nieh...@codecentric.de> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> we have a table with a map field. We do not delete anything in this
>>>>>>> table, but we do update the values, including the map field (most of the
>>>>>>> time a new value for an existing key, rarely adding new keys). We now
>>>>>>> encounter a huge number of tombstones for this table.
>>>>>>>
>>>>>>> We used sstable2json to take a look into the sstables:
>>>>>>>
>>>>>>>
>>>>>>> {"key": "Betty_StoreCatalogLines:7",
>>>>>>>  "cells": [["276-1-6MPQ0RI-276110031802001001:","",1463820040628001],
>>>>>>>            ["276-1-6MPQ0RI-276110031802001001:last_modified","2016-05-21 08:40Z",1463820040628001],
>>>>>>>            ["276-1-6MPQ0RI-276110031802001001:last_modified_by_source:_","276-1-6MPQ0RI-276110031802001001:last_modified_by_source:!",1463040069753999,"t",1463040069],
>>>>>>>            ["276-1-6MPQ0RI-276110031802001001:last_modified_by_source:_","276-1-6MPQ0RI-276110031802001001:last_modified_by_source:!",1463120708590002,"t",1463120708],
>>>>>>>            ["276-1-6MPQ0RI-276110031802001001:last_modified_by_source:_","276-1-6MPQ0RI-276110031802001001:last_modified_by_source:!",1463145700735007,"t",1463145700],
>>>>>>>            ["276-1-6MPQ0RI-276110031802001001:last_modified_by_source:_","276-1-6MPQ0RI-276110031802001001:last_modified_by_source:!",1463157430862000,"t",1463157430],
>>>>>>>            ["276-1-6MPQ0RI-276110031802001001:last_modified_by_source:_","276-1-6MPQ0RI-276110031802001001:last_modified_by_source:!",1463164595291002,"t",1463164595],
>>>>>>>            . . .
>>>>>>>            ["276-1-6MPQ0RI-276110031802001001:last_modified_by_source:_","276-1-6MPQ0RI-276110031802001001:last_modified_by_source:!",1463820040628000,"t",1463820040],
>>>>>>>            ["276-1-6MPQ0RI-276110031802001001:last_modified_by_source:62657474795f73746f72655f636174616c6f675f6c696e6573","00000154d265c6b0",1463820040628001],
>>>>>>>            ["276-1-6MPQ0RI-276110031802001001:payload","{\"payload\":{\"Article Id\":\"276110031802001001\",\"Row Id\":\"1-6MPQ0RI\",\"Article #\":\"31802001001\",\"Quote Item Id\":\"1-6MPWPVC\",\"Country Code\":\"276\"}}",1463820040628001]
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Looking at the SSTables, it seems like every update of a value in a
>>>>>>> map breaks down to a delete and an insert in the corresponding SSTable (see
>>>>>>> all the tombstone flags "t" in the extract of sstable2json above).
>>>>>>>
>>>>>>> We are using Cassandra 2.2.5.
>>>>>>>
>>>>>>> Can you confirm this behavior?
>>>>>>>
>>>>>>> Thanks!
>>>>>>> --
>>>>>>> Matthias Niehoff | IT-Consultant | Agile Software Factory  |
>>>>>>> Consulting
>>>>>>> codecentric AG | Zeppelinstr 2 | 76185 Karlsruhe | Germany
>>>>>>> tel: +49 (0) 721.9595-681 | fax: +49 (0) 721.9595-666 | mobile: +49
>>>>>>> (0) 172.1702676
>>>>>>> www.codecentric.de | blog.codecentric.de | www.meettheexperts.de |
>>>>>>> www.more4fi.de
>>>>>>>
>>>>>>> Registered office: Solingen | HRB 25917 | Amtsgericht Wuppertal
>>>>>>> Executive board: Michael Hochgürtel . Mirko Novakovic . Rainer Vehns
>>>>>>> Supervisory board: Patric Fedlmeier (chairman) . Klaus Jäger . Jürgen
>>>>>>> Schütz
>>>>>>>
>>>>>>> This e-mail, including any attached files, contains confidential
>>>>>>> and/or legally protected information. If you are not the intended
>>>>>>> recipient or have received this e-mail in error, please inform the
>>>>>>> sender immediately and delete this e-mail and any attached files
>>>>>>> without delay. Unauthorized copying, use, or opening of any attached
>>>>>>> files, as well as unauthorized forwarding of this e-mail, is not
>>>>>>> permitted.
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Tyler Hobbs
>>>>>> DataStax <http://datastax.com/>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>> --
>>> Kurt Greaves
>>> k...@instaclustr.com
>>> www.instaclustr.com
>>>
>>
>
>
>
