I don't think there are any "reasons behind it." It is simply empirical experience - as reported here.
Cassandra scales in two dimensions - number of rows per node and number of nodes. If some source of information led you to believe otherwise, please point out the source so that we can endeavor to correct it. The exact number of rows per node and tables per node will always have to be evaluated empirically - via a proof-of-concept implementation - since it all depends on the mix of capabilities of your hardware combined with your specific data model, your specific data values, your specific access patterns, and your specific load. It also depends on your own personal tolerance for degradation of latency and throughput - some people might find a given set of performance metrics acceptable while others might not.

-- Jack Krupansky

On Tue, Mar 1, 2016 at 3:54 AM, Fernando Jimenez <fernando.jime...@wealth-port.com> wrote:

> Hi Tommaso
>
> It’s not that I _need_ a large number of tables. This approach maps easily
> to the problem we are trying to solve, but it’s becoming clear it’s not the
> right approach.
>
> At the moment I’m trying to understand the limitations in Cassandra
> regarding the number of tables and the reasons behind them. I’ve come to the
> mailing list as my Google-fu is not giving me what I’m looking for :(
>
> FJ
>
> On 01 Mar 2016, at 09:36, tommaso barbugli <tbarbu...@gmail.com> wrote:
>
> Hi Fernando,
>
> I used to have a cluster with ~300 tables (1 keyspace) on C* 2.0, and it was a
> real pain in terms of operations. Repairs were terribly slow, boot of C*
> slowed down, and in general tracking table metrics becomes a bit more work.
> Why do you need this high number of tables?
>
> Tommaso
>
> On Tue, Mar 1, 2016 at 9:16 AM, Fernando Jimenez
> <fernando.jime...@wealth-port.com> wrote:
>
>> Hi Jack
>>
>> By entry I mean row.
>>
>> Apologies for the “obsolete terminology”. When I first looked at
>> Cassandra it was still on CQL2, and now that I’m looking at it again I’ve
>> defaulted to the terms I already knew.
>> I will bear it in mind and call them
>> tables from now on.
>>
>> Is there any documentation about this limit? For example, I’d be keen to
>> know how much memory is consumed per table, and I’m also curious about the
>> reasons for keeping this in memory. I’m trying to understand the
>> limitations here, rather than challenge them.
>>
>> So far I have found nothing in my search, hence why I had to resort to some
>> “load testing” to see what happens when you push the table count high.
>>
>> Thanks
>> FJ
>>
>> On 01 Mar 2016, at 06:23, Jack Krupansky <jack.krupan...@gmail.com> wrote:
>>
>> 3,000 entries? What's an "entry"? Do you mean row, column, or... what?
>>
>> You are using the obsolete terminology of CQL2 and Thrift - column
>> family. With CQL3 you should be creating "tables". The practical
>> recommendation of an upper limit of a few hundred tables across all
>> keyspaces remains.
>>
>> Technically you can go higher, and technically you can reduce the overhead
>> per table (an undocumented Jira - intentionally undocumented since it is
>> strongly not recommended), but... it is unlikely that you will be happy
>> with the results.
>>
>> What is the nature of the use case?
>>
>> You basically have two choices: an additional clustering column to
>> distinguish categories of table, or a separate cluster for each few
>> hundred tables.
>>
>> -- Jack Krupansky
>>
>> On Mon, Feb 29, 2016 at 12:30 PM, Fernando Jimenez
>> <fernando.jime...@wealth-port.com> wrote:
>>
>>> Hi all
>>>
>>> I have a use case for Cassandra that would require creating a large
>>> number of column families. I have found references to early versions of
>>> Cassandra where each column family would require a fixed amount of memory
>>> on all nodes, effectively imposing an upper limit on the total number of
>>> CFs. I have also seen rumblings that this may have been fixed in later
>>> versions.
>>>
>>> To put the question to rest, I have set up a DSE sandbox and created some
>>> code to generate column families populated with 3,000 entries each.
>>>
>>> Unfortunately I have now hit this issue:
>>> https://issues.apache.org/jira/browse/CASSANDRA-9291
>>>
>>> So I will have to retest against Cassandra 3.0 instead.
>>>
>>> However, I would like to understand the limitations regarding the creation
>>> of column families:
>>>
>>> * Is there a practical upper limit?
>>> * Is this a fixed limit, or does it scale as more nodes are added to
>>> the cluster?
>>> * Is there a difference between one keyspace with thousands of column
>>> families, vs thousands of keyspaces with only a few column families each?
>>>
>>> I haven’t found any hard evidence/documentation to help me here, but if
>>> you can point me in the right direction, I will oblige and RTFM away.
>>>
>>> Many thanks for your help!
>>>
>>> Cheers
>>> FJ
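[Editor's note: Jack's suggested alternative - folding the would-be table name into the partition key of a single table, so that adding a category adds rows rather than tables - can be sketched as below. This is a minimal illustration in Python that only builds CQL strings; the table name `data_by_category`, the columns `category`/`id`/`payload`, and the example categories are hypothetical, not taken from the thread.]

```python
# Sketch of "one table per category" vs. "one table with the category
# in the partition key". All identifiers here are illustrative
# assumptions, not schema from the thread.

CATEGORIES = ("orders", "invoices", "audit")

# Many-tables design: one CREATE TABLE per category. This is the
# pattern that runs into the per-table overhead discussed above.
MANY_TABLES_DDL = [
    f"CREATE TABLE data_{cat} (id uuid PRIMARY KEY, payload text);"
    for cat in CATEGORIES
]

# Single-table design: the former table name becomes the partition
# key, so thousands of categories cost rows, not tables.
SINGLE_TABLE_DDL = (
    "CREATE TABLE data_by_category ("
    " category text, id uuid, payload text,"
    " PRIMARY KEY ((category), id));"
)

def insert_stmt(category: str, payload: str) -> str:
    # Every write targets the same table, regardless of how many
    # categories exist in the data.
    return ("INSERT INTO data_by_category (category, id, payload) "
            f"VALUES ('{category}', uuid(), '{payload}');")

print(SINGLE_TABLE_DDL)
print(insert_stmt("orders", "example"))
```

One caveat with this sketch: all rows for a category land in a single partition, so a real schema would likely also need a bucketing component in the partition key to keep partitions bounded.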