Re: Data Distribution in Table/Column Family

Alain RODRIGUEZ Thu, 27 Aug 2015 08:06:06 -0700

Hi,

Did you try running the following on all your nodes and comparing the results?


du -sh /*whatever*/cassandra/data/*

Of course, if your snapshot sizes are unequal, exclude them from the above
command (or remove the snapshots themselves).
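
For instance, snapshot directories can be skipped at measurement time. This
is only a minimal sketch: it assumes GNU du (for --exclude) and that the
directories to ignore are named "snapshots" and "backups". The function name
data_usage and its argument are illustrative, not part of any Cassandra tool.

```shell
# Illustrative helper: disk usage in KB for each entry directly under the
# data directory given as $1, skipping anything named "snapshots" or "backups".
# Assumption: GNU du, which provides --exclude.
data_usage() {
    du -sk --exclude=snapshots --exclude=backups "$1"/*
}

# Example: set DATA_DIR to your own Cassandra data directory, then
#   data_usage "$DATA_DIR" | sort -n
```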

This should (roughly) answer your question about whether the data is evenly
distributed (/!\ a deviation of a few MB or GB, depending on your total data
size, can happen without being a real issue; I would say up to 5-15% on a
big enough dataset).
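
To put a number on that deviation, the per-node sizes can be compared with
their mean. The sketch below assumes you have collected one "<node> <size_kb>"
line per node yourself (for example from du -sk); the function name
size_deviation and the node names are hypothetical.

```shell
# Illustrative helper: reads "<node> <size_kb>" lines on stdin and prints each
# node's deviation from the mean size, in percent.
size_deviation() {
    awk '
        { node[NR] = $1; size[NR] = $2; total += $2 }
        END {
            if (NR == 0 || total == 0) exit 1   # nothing to compare
            mean = total / NR
            for (i = 1; i <= NR; i++)
                printf "%s %+.1f%%\n", node[i], (size[i] - mean) * 100 / mean
        }'
}

# Example with made-up numbers:
#   printf 'n1 100\nn2 110\nn3 90\n' | size_deviation
# prints:
#   n1 +0.0%
#   n2 +10.0%
#   n3 -10.0%
```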

Also, "nodetool cfstats" gives you an approximation of the number of rows
and the space used, among other useful information (run it on each node).
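
Since the cfstats output is long, it can help to keep only the lines that
matter for this comparison. A rough sketch, assuming the labels printed by
Cassandra 2.x ("Table:" or "Column Family:", "Space used (live)", "Number of
keys (estimate)"); other versions may word them differently, and the function
name cfstats_summary is illustrative.

```shell
# Illustrative filter: keeps the table name, live space used and estimated key
# count lines from "nodetool cfstats" output read on stdin.
# Assumption: label text as printed by Cassandra 2.x.
cfstats_summary() {
    grep -E 'Table:|Column Family:|Space used \(live\)|Number of keys \(estimate\)'
}

# Example (run on each node):
#   nodetool cfstats | cfstats_summary
```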

But the main thing to do is to double-check your table models to see whether
your workflow could create a hotspot on any of them; you should be able to
guess if one of your tables is badly distributed, imho.

C*heers,

Alain

2015-08-27 15:43 GMT+02:00 Saladi Naidu <naidusp2...@yahoo.com>:

> Is there a way to find out how the data within a column family is
> distributed across nodes? Nodetool shows how data is distributed across
> nodes, but only as the total data per node. We are seeing heavy load on one
> node and I suspect that partitioning is not distributing data equally. But
> to prove that to the development team we need to know the stats for that
> table.
>
> Naidu Saladi
>
