Re: About Map column

Bowen Song via user Mon, 18 Dec 2023 10:06:45 -0800

Hi Sebastien,

It's a bit more complicated than that.

To begin with, the first-class citizen in Cassandra is the partition, not the row. All map fields in the same row are in the same partition, and all rows with the same partition key but different clustering keys are also in the same partition. During a compaction, Cassandra does its best not to split a partition into multiple SSTables unless it must, e.g. when dealing with repaired vs unrepaired data. That means that whether it's a map field in a row or multiple rows within the same partition, the data gets compacted into the same number of SSTables.

A map type field's data may live in one column, but it is definitely not just one blob of data from the server's perspective, unless it's frozen. Reading such data is no cheaper than reading multiple columns and rows within the same partition, as each component of it, a key or a value, needs to be deserialised individually from the on-disk SSTable format, and then serialised again for the network protocol (often called the native protocol, NTP, or binary protocol) when it is read by a CQL client.

There's no obvious performance benefit to reading key-value pairs from a map field in a row vs columns and rows in the same partition. However, each row can be read separately and selectively, but key-value pairs in a map cannot. All data in a map field must be fetched at once. So if you ever need to selectively read the data, reading multiple columns and rows in the same partition filtered by clustering keys will actually perform better than reading all key-value pairs from a large map type field and then discarding the unwanted data.
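
As a rough illustration of the two layouts being compared, here is a minimal sketch. The keyspace `ks` (assumed to exist already) and all table and column names are made up for this example and are not taken from your schema.

```cql
-- Hypothetical layout 1: one row per partition, values held in a map column.
CREATE TABLE ks.readings_map (
    sensor_id text PRIMARY KEY,
    readings  map<text, text>
);

-- Hypothetical layout 2: one row per key, using a clustering column.
-- All rows sharing a sensor_id still live in the same partition.
CREATE TABLE ks.readings_rows (
    sensor_id     text,
    reading_key   text,
    reading_value text,
    PRIMARY KEY ((sensor_id), reading_key)
);

-- Layout 1: selecting the map column returns every key-value pair in it.
SELECT readings FROM ks.readings_map WHERE sensor_id = 's1';

-- Layout 2: a single row can be read by its clustering key...
SELECT reading_value FROM ks.readings_rows
WHERE sensor_id = 's1' AND reading_key = 'a';

-- ...or a slice of rows by a clustering key range.
SELECT reading_key, reading_value FROM ks.readings_rows
WHERE sensor_id = 's1' AND reading_key >= 'a' AND reading_key < 'c';
```

Both layouts keep the data for one `sensor_id` in one partition, so compaction treats them the same way; the difference is only in how selectively the data can be read.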

If you really want better server-side read performance and always read the whole thing, you should consider using a frozen map or frozen UDT instead. Of course, there's a cost to freezing them. Frozen data cannot be partially modified (e.g. adding, removing or updating a value in it); it can only be deleted or overwritten with new data all at once. This means it may not be suitable for your use case.
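
A minimal sketch of what that trade-off looks like in CQL, again with a made-up table in an assumed existing keyspace `ks`:

```cql
-- Hypothetical table: the whole map is stored as a single frozen value.
CREATE TABLE ks.readings_frozen (
    sensor_id text PRIMARY KEY,
    readings  frozen<map<text, text>>
);

-- Overwriting the whole value at once is allowed.
UPDATE ks.readings_frozen
SET readings = {'a': '1', 'b': '2'}
WHERE sensor_id = 's1';

-- Deleting the whole value is allowed.
DELETE readings FROM ks.readings_frozen WHERE sensor_id = 's1';

-- Partial modifications like these are rejected for a frozen map,
-- so they are left commented out here:
-- UPDATE ks.readings_frozen SET readings['a'] = '3' WHERE sensor_id = 's1';
-- UPDATE ks.readings_frozen SET readings = readings + {'c': '3'} WHERE sensor_id = 's1';
```

To change one entry in a frozen map, the client has to read the value, modify it locally, and write the whole map back.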

I can see you also mentioned big partitions. Large partitions in Cassandra are usually a bad idea, regardless of whether it's a single row with a few columns or many rows with many columns. There are some exceptions that may work well, but generally you should avoid creating large partitions if possible. The problem with large partitions is usually the JVM heap and GC pauses, rarely CPU or disk resources.


Regards,
Bowen


On 18/12/2023 17:00, Sébastien Rebecchi wrote:

Hello
If I have a column of type Map, then with many insertions, the map grows, but after compaction, as the full map is 1 column of a table, will it be contained fully in 1 SSTable?
I guess yes because the map is contained in a single row. Am I right?
Versus if we use a clustering key + a standard column instead of a map, insertions will create many rows, 1 per clustering key value, so even after compaction the partition could be split across several SSTables.
Can you tell me if I understood correctly, please? Because if it is right then it means the problem of big partitions can be enhanced using Map as it will induce much more CPU and disk resources to perform compaction (on the other hand you will have a lower read amplification factor with map).
Thanks,

Sébastien
