Sebastien,

Another thing to keep in mind when writing/updating a map column is that it
is internally (in the memtable) backed by a synchronized data structure. If
the rate of writes/updates is sufficiently high, the resulting CPU load
will cripple the nodes (see CASSANDRA-15464
<https://issues.apache.org/jira/browse/CASSANDRA-15464>; the ticket mentions
sets, but the same is true for maps as well).

Arvydas



On Mon, Dec 18, 2023 at 10:06 AM Bowen Song via user <
user@cassandra.apache.org> wrote:

> Hi Sebastien,
>
> It's a bit more complicated than that.
>
> To begin with, the first-class citizen in Cassandra is the partition, not
> the row. All map fields in the same row are in the same partition, and all
> rows with the same partition key but different clustering keys are also
> in the same partition. During a compaction, Cassandra does its best not
> to split a partition into multiple SSTables, unless it must, e.g. when
> dealing with repaired vs unrepaired data. That means that, regardless of
> whether it's a map field in a row or multiple rows within the same
> partition, they get compacted into the same number of SSTables.
>
> A map type field's data may live in one column, but it is definitely not
> just one blob of data from the server's perspective, unless it's frozen.
> Reading such data is no cheaper than reading multiple columns and rows
> within the same partition, as each component of it, a key or a value,
> needs to be deserialised individually from the on-disk SSTable format,
> and then serialised again for the network protocol (often called the
> native protocol, NTP, or binary protocol) when it is read by a CQL client.
>
> There's no obvious performance benefit for reading key-value pairs from
> a map field in a row vs columns and rows in the same partition. However,
> each row can be read separately and selectively, but key-value pairs in
> a map cannot. All data in a map field must be fetched all at once. So if
> you ever need to selectively read the data, reading multiple columns and
> rows in the same partition filtered by clustering keys will actually
> perform better than reading all key-value pairs from a large map type
> field and then discarding the unwanted data.
>
> If you really want better server-side read performance and always read
> the whole thing, you should consider using a frozen map or frozen UDT
> instead. Of course, there's a cost to freezing them. Frozen data cannot
> be partially modified (e.g. add, remove or update a value in it); it can
> only be deleted or overwritten with new data at once, which means it may
> not be suitable for your use case.
>
> I can see you also mentioned big partitions. Large partitions in
> Cassandra are usually a bad idea, regardless of whether it's a single row
> with a few columns or many rows with many columns. There are some
> exceptions that may work well, but generally you should avoid creating
> large partitions if possible. The problem with large partitions is
> usually the JVM heap and GC pauses, rarely CPU or disk resources.
>
> Regards,
> Bowen
>
>
> On 18/12/2023 17:00, Sébastien Rebecchi wrote:
> > Hello
> >
> > If I have a column of type Map, then with many insertions the map
> > grows, but after compaction, as the full map is 1 column of a table,
> > will it be contained fully in 1 SSTable?
> > I guess yes, because the map is contained in a single row. Am I right?
> > Versus if we use a clustering key + a standard column instead of a
> > map, insertions will create many rows, 1 per clustering key value, so
> > even after compaction the partition could be split across several
> > SSTables.
> > Can you tell me if I understood correctly, please? Because if it is
> > right, then it means the problem of big partitions can be amplified by
> > using a Map, as it will require much more CPU and disk resources to
> > perform compaction (on the other hand, you will have a lower read
> > amplification factor with a map).
> >
> > Thanks,
> >
> > Sébastien
>
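For illustration only, here is a toy Python model of the selective-read
point Bowen makes in the quoted reply. It is not Cassandra code: the names
rows, read_row_slice and read_map_then_filter are made up for this sketch.
It only assumes that rows within a partition are kept sorted by clustering
key, whereas a non-frozen map has to be fetched whole and filtered by the
reader afterwards.

```python
from bisect import bisect_left, bisect_right

# Toy model only: a partition as a list of (clustering_key, value) rows,
# kept sorted by clustering key.
rows = sorted((f"k{i:04d}", f"v{i}") for i in range(1000))
keys = [k for k, _ in rows]


def read_row_slice(lo, hi):
    """Return rows with lo <= clustering key <= hi, touching only that slice."""
    start = bisect_left(keys, lo)
    end = bisect_right(keys, hi)
    return rows[start:end]


def read_map_then_filter(lo, hi):
    """Model a map column: fetch every entry, then discard the unwanted ones."""
    whole_map = dict(rows)  # all entries are materialised first
    wanted = [(k, v) for k, v in whole_map.items() if lo <= k <= hi]
    return wanted, len(whole_map)


slice_rows = read_row_slice("k0010", "k0012")
filtered, fetched = read_map_then_filter("k0010", "k0012")
print(len(slice_rows), len(filtered), fetched)
```

Both calls return the same three entries, but the map variant has to
materialise all 1000 entries to get there, which is the cost described
above for selective reads from a large map.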
