It is the total table count, across all keyspaces. Memory is memory.

-- Jack Krupansky
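For anyone wanting to check that total on a running cluster, a minimal sketch against the schema tables (Cassandra 3.x; on 2.x the same information lives in system.schema_columnfamilies):

    -- List every table across all keyspaces (Cassandra 3.x):
    SELECT keyspace_name, table_name FROM system_schema.tables;

    -- Cassandra 2.x equivalent:
    -- SELECT keyspace_name, columnfamily_name FROM system.schema_columnfamilies;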
On Tue, Mar 1, 2016 at 6:26 PM, Brian Sam-Bodden <bsbod...@integrallis.com> wrote:

Eric,

Is the keyspace as a multitenancy solution as bad as the many-tables pattern? Is the memory overhead of keyspaces as heavy as that of tables?

Cheers,
Brian
http://www.integrallis.com

On Tuesday, March 1, 2016, Eric Stevens <migh...@gmail.com> wrote:

It's definitely not true for every use case involving a large number of tables, but for many uses where you'd be tempted to do that, adding whatever would have driven your table naming as a column in the partition key of a smaller number of tables will meet your needs. This is especially true if you're looking to solve multi-tenancy, unless you let your tenants dynamically drive your schema (which is a separate can of worms).

On Tue, Mar 1, 2016 at 9:08 AM Jack Krupansky <jack.krupan...@gmail.com> wrote:

I don't think Cassandra was "purposefully developed" for some target number of tables - there is no evidence of any such explicit intent. Instead, it would be fair to say that Cassandra was "not purposefully developed" with a goal of supporting large numbers of tables. Sometimes features and capabilities come for free, or as a side effect of the technologies used, but usually specific capabilities (such as large numbers of tables) require explicit intent and explicit effort.

One could indeed endeavor to design a data store (I'm not even sure it would still be considered a database per se) that supported either large numbers of tables or an additional level of storage model in between table and row (call it "group" maybe, or "sub-table"). But obviously Cassandra was not designed with that goal in mind.

Traditionally, a "table" is a defined relation over a set of data. Relation and data are distinct concepts. And a relation name is not simply a Java-style "object". A relation (table) name is supposed to represent an abstraction or entity type, while essentially all of the cases I have heard of for wanting thousands (or even hundreds) of tables are trying to use a table as a container for a group of rows belonging to a specific entity instance rather than a distinct entity type. Granted, Cassandra is not obligated to be limited to the relational model, but Cassandra, and especially CQL, is intentionally modeled reasonably closely on the relational model in terms of data modeling abstractions, even though the storage engine is designed to scale across nodes.

You could file a Jira requesting such a feature improvement. And then we would see if sentiment has shifted over the years.

The key thing is to offer up a use case that warrants support for large numbers of tables. So far, it has usually been the case that the perceived need for separate tables could easily be met using clustering columns of a single table.

Seriously, if you guys can define a legitimate use case that can't easily be handled by a single table, that could get the discussion started.

-- Jack Krupansky
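A minimal CQL sketch of the single-table alternative Eric and Jack describe, for the multi-tenancy case. All names here are illustrative, not from the thread; the point is that the value which would otherwise have driven per-tenant table names becomes part of the partition key:

    -- Instead of one table per tenant (events_acme, events_globex, ...):
    CREATE TABLE app.events (
        tenant_id  text,
        event_time timeuuid,
        payload    text,
        PRIMARY KEY ((tenant_id), event_time)
    );

    -- Each tenant's data lives in its own partitions:
    SELECT * FROM app.events WHERE tenant_id = 'acme';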
On Tue, Mar 1, 2016 at 9:11 AM, Fernando Jimenez <fernando.jime...@wealth-port.com> wrote:

Hi Jack

Being purposefully developed to only handle up to “a few hundred” tables is reason enough. I accept that, and likely a use case with many tables was never really considered. But I would still like to understand the design choices that were made, so that perhaps we can gain some confidence in this upper limit on the number of tables. The best estimate we have so far is “a few hundred”, which is a bit vague.

Regarding scaling, I'm not talking about scaling in terms of data volume, but in terms of how the data is structured. One thousand tables with one row each hold the same data volume as one table with one thousand rows, excluding any data structures required to maintain the extra tables. But whereas the first seems likely to bring a Cassandra cluster to its knees, the second will run happily on a single-node cluster on a low-end machine.

We will design our code to use a single table to avoid having nightmares with this issue. But if there is any authoritative documentation on this characteristic of Cassandra, I would love to know more.

FJ
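A sketch of the equivalence Fernando describes, with hypothetical names: the discriminator that would otherwise be a table name becomes the partition key, so a thousand one-row tables collapse into a thousand rows of one table:

    -- Rather than: CREATE TABLE sandbox.instance_0001 ... (x 1,000)
    CREATE TABLE sandbox.instances (
        instance_id text,   -- what would have been the table name
        value       text,
        PRIMARY KEY (instance_id)
    );

    -- Reads remain a single-partition lookup:
    SELECT value FROM sandbox.instances WHERE instance_id = 'instance_0042';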
On 01 Mar 2016, at 14:23, Jack Krupansky <jack.krupan...@gmail.com> wrote:

I don't think there are any "reasons behind it." It is simply empirical experience - as reported here.

Cassandra scales in two dimensions - number of rows per node and number of nodes. If some source of information led you to believe otherwise, please point out the source so that we can endeavor to correct it.

The exact number of rows per node and tables per node will always have to be evaluated empirically - a proof-of-concept implementation - since it all depends on the capabilities of your hardware combined with your specific data model, your specific data values, your specific access patterns, and your specific load. It also depends on your own tolerance for degradation of latency and throughput - some people might find a given set of performance metrics acceptable while others might not.

-- Jack Krupansky

On Tue, Mar 1, 2016 at 3:54 AM, Fernando Jimenez <fernando.jime...@wealth-port.com> wrote:

Hi Tommaso

It's not that I _need_ a large number of tables. This approach maps easily to the problem we are trying to solve, but it's becoming clear it's not the right approach.

At the moment I'm trying to understand the limitations in Cassandra regarding the number of tables and the reasons behind them. I've come to the mailing list because my Google-fu is not giving me what I'm looking for :(

FJ

On 01 Mar 2016, at 09:36, tommaso barbugli <tbarbu...@gmail.com> wrote:

Hi Fernando,

I used to run a cluster with ~300 tables (1 keyspace) on C* 2.0, and it was a real pain in terms of operations. Repairs were terribly slow, booting C* slowed down, and in general tracking table metrics became a bit more work. Why do you need this high number of tables?

Tommaso

On Tue, Mar 1, 2016 at 9:16 AM, Fernando Jimenez <fernando.jime...@wealth-port.com> wrote:

Hi Jack

By entry I mean row.

Apologies for the “obsolete terminology”. When I first looked at Cassandra it was still on CQL2, and now that I'm looking at it again I've defaulted to the terms I already knew. I will bear it in mind and call them tables from now on.

Is there any documentation about this limit? For example, I'd be keen to know how much memory is consumed per table, and I'm also curious about the reasons for keeping this in memory. I'm trying to understand the limitations here, rather than challenge them.

So far I have found nothing in my search, which is why I had to resort to some “load testing” to see what happens when you push the table count high.

Thanks
FJ

On 01 Mar 2016, at 06:23, Jack Krupansky <jack.krupan...@gmail.com> wrote:

3,000 entries? What's an "entry" - do you mean row, column, or... what?

You are using the obsolete terminology of CQL2 and Thrift - column family. With CQL3 you should be creating "tables". The practical recommendation of an upper limit of a few hundred tables across all keyspaces remains.

Technically you can go higher, and technically you can reduce the overhead per table (an undocumented Jira - intentionally undocumented, since it is strongly not recommended), but it is unlikely that you will be happy with the results.

What is the nature of the use case?

You basically have two choices: an additional clustering column to distinguish categories of table, or a separate cluster for each few hundred tables.

-- Jack Krupansky

On Mon, Feb 29, 2016 at 12:30 PM, Fernando Jimenez <fernando.jime...@wealth-port.com> wrote:

Hi all

I have a use case for Cassandra that would require creating a large number of column families. I have found references to early versions of Cassandra where each column family required a fixed amount of memory on all nodes, effectively imposing an upper limit on the total number of CFs. I have also seen rumblings that this may have been fixed in later versions.

To put the question to rest, I have set up a DSE sandbox and written some code to generate column families populated with 3,000 entries each.

Unfortunately I have now hit this issue:
https://issues.apache.org/jira/browse/CASSANDRA-9291

So I will have to retest against Cassandra 3.0 instead.

However, I would like to understand the limitations regarding the creation of column families:

* Is there a practical upper limit?
* Is this a fixed limit, or does it scale as more nodes are added to the cluster?
* Is there a difference between one keyspace with thousands of column families, and thousands of keyspaces with only a few column families each?

I haven't found any hard evidence/documentation to help me here, but if you can point me in the right direction, I will oblige and RTFM away.

Many thanks for your help!

Cheers
FJ
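Returning to the first of Jack's two choices above - an additional clustering column to distinguish what would otherwise be separate tables - a hedged sketch with hypothetical names:

    CREATE TABLE sandbox.readings (
        entity_id text,
        category  text,      -- would otherwise have driven per-category table names
        ts        timestamp,
        value     double,
        PRIMARY KEY ((entity_id), category, ts)
    );

    -- One category's rows form a contiguous slice of the partition:
    SELECT * FROM sandbox.readings
     WHERE entity_id = 'e-17' AND category = 'temperature';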