Can you provide exact details on where your load balancer is? Like Michael
said, you shouldn't need one between your client and the c* cluster if
you're using a DataStax driver.
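For reference, with the Java driver the client-side load balancing is configured on the Cluster builder. A minimal sketch, assuming driver 2.1.x; the contact-point addresses are placeholders:

```java
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.policies.DCAwareRoundRobinPolicy;
import com.datastax.driver.core.policies.TokenAwarePolicy;

public class ClusterSetup {
    public static Session connect() {
        // The driver discovers every node from the contact points and
        // balances requests itself, so no external load balancer is
        // needed between the client and the C* cluster.
        Cluster cluster = Cluster.builder()
                .addContactPoint("10.0.0.1")   // placeholder addresses
                .addContactPoint("10.0.0.2")
                .withLoadBalancingPolicy(
                        // route each request to a replica that owns the
                        // partition, round-robin within the local DC
                        new TokenAwarePolicy(new DCAwareRoundRobinPolicy()))
                .build();
        return cluster.connect();
    }
}
```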

All the best,



Sebastián Estévez

Solutions Architect | 954 905 8615 | sebastian.este...@datastax.com


DataStax is the fastest, most scalable distributed database technology,
delivering Apache Cassandra to the world’s most innovative enterprises.
DataStax is built to be agile, always-on, and predictably scalable to any
size. With more than 500 customers in 45 countries, DataStax is the
database technology and transactional backbone of choice for the world’s
most innovative companies such as Netflix, Adobe, Intuit, and eBay.

On Wed, Sep 30, 2015 at 12:06 PM, Walsh, Stephen <stephen.wa...@aspect.com>
wrote:

> Many thanks all,
>
>
>
> The load balancers sit only between our own nodes, not as a middle-man
> in front of Cassandra. It’s just so we can push more data into Cassandra.
>
> The only reason we are not using 2.1.9 is time; we haven’t had time to
> test the upgrade.
>
>
>
> I wasn’t able to find any best practices for the number of CFs; where do
> you see this documented?
>
> I see a lot of comments on 1,000 CFs vs. 1,000 keyspaces.
>
>
>
> The errors occur a few times a second, about 10 or so, and they are
> constant.
>
>
>
> Our TTL is 10 seconds on data with gc_grace_seconds set to 0 on each CF.
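For reference, that TTL/gc_grace combination corresponds to a table definition along these lines; the keyspace, table, and column names here are illustrative placeholders, and the sketch assumes an open driver Session:

```java
import com.datastax.driver.core.Session;

public final class SchemaSketch {
    // Illustrative only: "my_ks.my_cf" and the columns are placeholders.
    static void createCf(Session session) {
        session.execute(
            "CREATE TABLE IF NOT EXISTS my_ks.my_cf (" +
            "    id uuid PRIMARY KEY," +
            "    payload text" +
            ") WITH default_time_to_live = 10" +  // rows expire after 10 s
            "  AND gc_grace_seconds = 0");        // expired data purged with no grace window
    }
}
```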
>
> We don’t seem to get any OOM errors.
>
>
>
> We never had these issues in our first run; they only appeared when we
> added another 25% of writes.
>
>
>
> Many thanks for taking the time to reply, Jack
>
>
>
>
>
>
>
> *From:* Jack Krupansky [mailto:jack.krupan...@gmail.com]
> *Sent:* 30 September 2015 16:53
> *To:* user@cassandra.apache.org
> *Subject:* Re: Consistency Issues
>
>
>
> More than "low hundreds" (200 or 300 max, and preferably under 100) of
> tables/column families is not exactly a recommended best practice. You may
> be able to get it to work, but probably only with very heavy tuning (i.e.,
> lots of time and playing with options) on your own part. IOW, no quick and
> easy solution.
>
>
>
> The only immediate issue that pops to mind is that you are hitting a GC
> pause due to the large heap size and high volume.
>
>
>
> How frequently are these errors occurring? Like, how much data can you
> load before the first one pops up, and are they then frequent/constant or
> just occasional/rare?
>
>
>
> Can you test to see if you can see similar timeouts with say only 100 or
> 50 tables? At least that might isolate whether the issue relates at all to
> the number of tables vs. raw data rate or GC pause.
>
>
>
> Sometimes you can reduce/eliminate the GC pause issue by reducing the heap
> so that it is only modestly above the minimum required to avoid OOM.
>
>
>
>
> -- Jack Krupansky
>
>
>
> On Wed, Sep 30, 2015 at 11:22 AM, Walsh, Stephen <stephen.wa...@aspect.com>
> wrote:
>
> More information,
>
>
>
> I’ve just setup a NTP server to rule out any timing issues.
>
> And I also see this in the Cassandra node log files
>
>
>
> MessagingService-Incoming-/172.31.22.4] 2015-09-30 15:19:14,769
> IncomingTcpConnection.java:97 - UnknownColumnFamilyException reading from
> socket; closing
>
> org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find
> cfId=cf411b50-6785-11e5-a435-e7be20c92086
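That cfId error usually means a node received a write for a table whose schema it has not yet learned, which can happen when many CFs are created in quick succession. One way to guard against it is to block on schema agreement after each DDL statement; a sketch assuming the Java driver 2.1.x, where Metadata.checkSchemaAgreement() is available:

```java
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public final class SchemaSync {
    // Execute a schema change, then block until every node reports the
    // same schema version, so subsequent writes don't race the DDL.
    static void executeDdl(Cluster cluster, Session session, String ddl)
            throws InterruptedException {
        session.execute(ddl);
        while (!cluster.getMetadata().checkSchemaAgreement()) {
            Thread.sleep(200);   // poll until the cluster converges
        }
    }
}
```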
>
>
>
> Any idea what this is related to?
>
> All these tests are run with a clean setup of Cassandra nodes followed by
> a nodetool repair, before any data hits them.
>
>
>
>
>
> *From:* Walsh, Stephen [mailto:stephen.wa...@aspect.com]
> *Sent:* 30 September 2015 15:17
> *To:* user@cassandra.apache.org
> *Subject:* Consistency Issues
>
>
>
> Hi there,
>
>
>
> We are having some issues with consistency. I’ll try my best to explain.
>
>
>
> We have an application that was able to
>
> Write ~1000 p/s
>
> Read ~300 p/s
>
> Total CF created: 400
>
> Total Keyspaces created : 80
>
>
>
> On a 4 node Cassandra Cluster with
>
> Version 2.1.6
>
> Replication : 3
>
> Consistency  (Read & Write) : LOCAL_QUORUM
>
> Cores : 4
>
> Ram : 15 GB
>
> Heap Size 8GB
>
>
>
> This was fine and worked, but was pushing our application to the max.
>
>
>
> ---------------------
>
>
>
> Next we added a load balancer (HAProxy) to our application.
>
> So now we have 3 of our nodes talking to 4 Cassandra Nodes with a load of
>
> Write ~1250 p/s
>
> Read 0 p/s
>
> Total CF created: 450
>
> Total Keyspaces created : 100
>
>
>
> On our application we now see
>
> Cassandra timeout during write query at consistency LOCAL_QUORUM (2
> replica were required but only 1 acknowledged the write)
>
> (we are using java Cassandra driver 2.1.6)
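For what it’s worth, the driver surfaces the acknowledgement counts on the timeout exception, which can help correlate how far short of quorum each failed write fell; a sketch assuming Java driver 2.1.x:

```java
import com.datastax.driver.core.Session;
import com.datastax.driver.core.Statement;
import com.datastax.driver.core.exceptions.WriteTimeoutException;

public final class WriteHelper {
    // Log the write type and ack counts when a write times out at
    // LOCAL_QUORUM, to see how far short of quorum the write fell.
    static void tryWrite(Session session, Statement stmt) {
        try {
            session.execute(stmt);
        } catch (WriteTimeoutException e) {
            System.err.printf("write timeout: type=%s required=%d received=%d%n",
                    e.getWriteType(),                  // e.g. SIMPLE, BATCH
                    e.getRequiredAcknowledgements(),
                    e.getReceivedAcknowledgements());
        }
    }
}
```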
>
>
>
> So we increased the number of Cassandra nodes to 5, then 6, and each time
> got the same replication error.
>
>
>
> So then we doubled the spec of every node to
>
> 8 cores
>
> 30GB  RAM
>
> Heap size 15GB
>
>
>
> And we still get this replication error (2 replica were required but only
> 1 acknowledged the write)
>
>
>
> We know that when we introduce the HAProxy load balancer with 3 of our
> nodes, it hits Cassandra three times faster.
>
> But we’ve now increased the Cassandra spec nearly 3-fold, and only for an
> extra 250 writes p/s, and it still doesn’t work.
>
>
>
> We’re having a hard time understanding why replication is an issue at
> this cluster size.
>
>
>
> We tried to get OpsCenter working to monitor the nodes, but due to the
> number of CFs in Cassandra, the datastax-agent takes 90% of the CPU on
> every node.
>
>
>
> Any suggestion / recommendation would be very welcome.
>
>
>
> Regards
>
> Stephen Walsh
>
>
>
>
>
>
>
> This email (including any attachments) is proprietary to Aspect Software,
> Inc. and may contain information that is confidential. If you have received
> this message in error, please do not read, copy or forward this message.
> Please notify the sender immediately, delete it from your system and
> destroy any copies. You may not further disclose or distribute this email
> or its attachments.
>
>
