Honestly, 20ms on spinning disks is really good, so I think you're just dealing with the reality that a certain percentage of your reads come off disk rather than from memory. If you're reading data that lives in older SSTables and you've run out of buffer cache, I'm not sure how else you could improve that.
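To put rough numbers on that, here is a minimal Python sketch of how the share of reads missing the page cache drives average read latency. The per-read costs (`cached_ms`, `disk_ms`) are illustrative assumptions, not measurements from this cluster.

```python
def expected_read_latency_ms(cache_hit_ratio: float,
                             cached_ms: float = 0.5,
                             disk_ms: float = 10.0) -> float:
    """Weighted average of cached and on-disk read cost.

    cached_ms and disk_ms are assumed, illustrative values; substitute
    figures measured on your own hardware.
    """
    if not 0.0 <= cache_hit_ratio <= 1.0:
        raise ValueError("cache_hit_ratio must be between 0 and 1")
    return cache_hit_ratio * cached_ms + (1.0 - cache_hit_ratio) * disk_ms


if __name__ == "__main__":
    # Show how the average climbs as more reads fall out of the page cache.
    for ratio in (1.0, 0.9, 0.5):
        print(f"hit ratio {ratio:.0%}: {expected_read_latency_ms(ratio):.2f} ms")
```

Under these assumed costs, even a modest drop in hit ratio moves the average a long way toward the disk figure.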
Sounds like a physics problem to me.

On Wed, Nov 25, 2015 at 10:05 AM, Antoine Bonavita <anto...@stickyads.tv> wrote:

> Sebastian (and others, help is always appreciated),
>
> After 24h OK, read latencies started to degrade (up to 20ms) and I had to
> ramp down volumes again.
>
> The degradation is clearly linked to the number of read IOPS, which went up
> to 1.65k/s after 24h.
>
> If anybody can give me hints on what I should look at, I'm very happy to
> do so.
>
> A.
>
> On 11/23/2015 12:07 PM, Antoine Bonavita wrote:
>
>> Sebastian,
>>
>> I tried to ramp up volume with this new setting and ran into the same
>> problems.
>>
>> After that I restarted my nodes. This pretty much instantly got read
>> latencies back to normal (< 5ms) on the 32G nodes.
>>
>> I am currently ramping up volumes again and here is what I am seeing on
>> 32G nodes:
>> * Read latencies are OK (< 5ms)
>> * A lot of read IOPS (~400 reads/s)
>> * I enabled logging for the DateTieredCompactionStrategy and I only get
>>   this kind of line:
>>   DEBUG [CompactionExecutor:186] 2015-11-23 12:02:45,915 DateTieredCompactionStrategy.java:137 - Compaction buckets are []
>>   DEBUG [CompactionExecutor:186] 2015-11-23 12:03:16,704 DateTieredCompactionStrategy.java:137 - Compaction buckets are [[BigTableReader(path='/var/lib/cassandra/data/views/views-451e4d8061ef11e5896f091196a360a0/la-6452-big-Data.db')]]
>> * When I run pcstat I still get about 100 *-Data.db files loaded at 15%
>>   (which is what I was seeing with max_sstable_age_days set at 5).
>>
>> I'm really happy with the first item in my list but the other items seem
>> to indicate something is still wrong and it does not look like it's
>> compaction.
>>
>> Any help would be truly appreciated.
>>
>> A.
>>
>> On 11/20/2015 12:58 AM, Antoine Bonavita wrote:
>>
>>> Sebastian,
>>>
>>> I took into account your suggestion and set max_sstable_age_days to 1.
>>>
>>> I left the TTL at 432000 and the gc_grace_seconds at 172800.
>>> So, I expect SSTables older than 7 days to get deleted. Am I right?
>>>
>>> I did not change dclocal_read_repair_chance because I have only one DC
>>> at this point in time. Did you mean that I should set read_repair_chance
>>> to 0?
>>>
>>> Thanks again for your time and help. Really appreciated.
>>>
>>> A.
>>>
>>> On 11/19/2015 02:36 AM, Sebastian Estevez wrote:
>>>
>>>>> When you say drop you mean reduce the value (to 1 day for example),
>>>>> not "don't set the value", right?
>>>>
>>>> Yes.
>>>>
>>>>> If I set max sstable age days to 1, my understanding is that
>>>>> SSTables with expired data (5 days) are not going to be compacted
>>>>> ever. And therefore my disk usage will keep growing forever. Did I
>>>>> miss something here?
>>>>
>>>> We will expire sstables whose highest TTL is beyond gc_grace_seconds as
>>>> of CASSANDRA-5228
>>>> <https://issues.apache.org/jira/browse/CASSANDRA-5228>. This is nice
>>>> because the sstable is just dropped for free, with no need to scan it
>>>> and remove tombstones, which is very expensive, and DTCS will guarantee
>>>> that all the data within an sstable is close together in time.
>>>>
>>>>> So, if I set max sstable age days to 1, I have to run repairs at
>>>>> least once a day, correct?
>>>>>
>>>>> I'm afraid I don't get your point about painful compactions.
>>>>
>>>> I was referring to the problems described here CASSANDRA-9644
>>>> <https://issues.apache.org/jira/browse/CASSANDRA-9644>
>>>>
>>>> All the best,
>>>>
>>>> Sebastián Estévez
>>>> Solutions Architect | 954 905 8615 | sebastian.este...@datastax.com
>>>>
>>>> DataStax is the fastest, most scalable distributed database technology,
>>>> delivering Apache Cassandra to the world’s most innovative enterprises.
>>>> Datastax is built to be agile, always-on, and predictably scalable to
>>>> any size. With more than 500 customers in 45 countries, DataStax is the
>>>> database technology and transactional backbone of choice for the worlds
>>>> most innovative companies such as Netflix, Adobe, Intuit, and eBay.
>>>>
>>>> On Wed, Nov 18, 2015 at 5:53 PM, Antoine Bonavita <anto...@stickyads.tv> wrote:
>>>>
>>>>> Sebastian,
>>>>>
>>>>> Your help is very much appreciated. I re-read the blog post and also
>>>>> https://labs.spotify.com/2014/12/18/date-tiered-compaction/ but some
>>>>> things are still confusing me.
>>>>>
>>>>> Please see my questions inline below.
>>>>>
>>>>> On 11/18/2015 04:21 PM, Sebastian Estevez wrote:
>>>>>
>>>>>> Yep, I think you've mixed up your DTCS levers.
>>>>>> I would read, or re-read Marcus's post
>>>>>> http://www.datastax.com/dev/blog/datetieredcompactionstrategy
>>>>>>
>>>>>> *TL;DR:*
>>>>>>
>>>>>> * *base_time_seconds* is the size of your initial window
>>>>>> * *max_sstable_age_days* is the time after which you stop compacting
>>>>>>   sstables
>>>>>> * *default_time_to_live* is the time after which data expires and
>>>>>>   sstables will start to become available for GC. (432000 is 5 days)
>>>>>>
>>>>>>> Could it be that compaction is putting those in cache constantly?
>>>>>>
>>>>>> Yep, you'll keep compacting sstables until they're 10 days old per
>>>>>> your current settings and when you compact there are reads and then
>>>>>> writes.
>>>>>>
>>>>>> If you aren't doing any updates and most of your reads are within 1
>>>>>> hour, you can probably afford to drop max sstable age days.
>>>>>
>>>>> When you say drop you mean reduce the value (to 1 day for example),
>>>>> not "don't set the value", right?
>>>>>
>>>>> If I set max sstable age days to 1, my understanding is that
>>>>> SSTables with expired data (5 days) are not going to be compacted
>>>>> ever. And therefore my disk usage will keep growing forever. Did I
>>>>> miss something here?
>>>>>
>>>>>> Just make sure you're doing your repairs more often than the max
>>>>>> sstable age days to avoid some painful compactions.
>>>>>
>>>>> So, if I set max sstable age days to 1, I have to run repairs at
>>>>> least once a day, correct?
>>>>> I'm afraid I don't get your point about painful compactions.
>>>>>
>>>>>> Along the same lines, you should probably set
>>>>>> dclocal_read_repair_chance to 0
>>>>>
>>>>> Will try that.
>>>>>
>>>>>>> Regarding the heap configuration, both are very similar
>>>>>>
>>>>>> Probably unrelated but, is there a reason why they're not identical?
>>>>>> Especially the different new gen size could have gc implications.
>>>>>
>>>>> Both are calculated by cassandra-env.sh.
>>>>> If my bash skills are still intact, the NewGen size difference comes
>>>>> from the number of cores: the 64G machine has 12 cores where the 32G
>>>>> machine has 8 cores (I did not even realize this before looking into
>>>>> this, that's why I did not mention it in my previous emails).
>>>>>
>>>>> Thanks a lot for your help.
>>>>>
>>>>> A.
>>>>>
>>>>>> All the best,
>>>>>>
>>>>>> Sebastián Estévez
>>>>>>
>>>>>> On Wed, Nov 18, 2015 at 6:44 AM, Antoine Bonavita <anto...@stickyads.tv> wrote:
>>>>>>
>>>>>>> Sebastian, Robert,
>>>>>>>
>>>>>>> First, a big thank you to both of you for your help.
>>>>>>>
>>>>>>> It looks like you were right.
>>>>>>> I used pcstat (awesome tool, thanks for that as well) and it appears
>>>>>>> some files I would not expect to be in cache actually are. Here is a
>>>>>>> sample of my output (edited for convenience, adding the file
>>>>>>> timestamp from the OS):
>>>>>>>
>>>>>>> * /var/lib/cassandra/data/views/views-451e4d8061ef11e5896f091196a360a0/la-5951-big-Data.db - 000.619 % - Nov 16 12:25
>>>>>>> * /var/lib/cassandra/data/views/views-451e4d8061ef11e5896f091196a360a0/la-5954-big-Data.db - 000.681 % - Nov 16 13:44
>>>>>>> * /var/lib/cassandra/data/views/views-451e4d8061ef11e5896f091196a360a0/la-5955-big-Data.db - 000.610 % - Nov 16 14:11
>>>>>>> * /var/lib/cassandra/data/views/views-451e4d8061ef11e5896f091196a360a0/la-5956-big-Data.db - 015.621 % - Nov 16 14:26
>>>>>>> * /var/lib/cassandra/data/views/views-451e4d8061ef11e5896f091196a360a0/la-5957-big-Data.db - 015.558 % - Nov 16 14:50
>>>>>>>
>>>>>>> The SSTables that come before are all at about 0% and the ones that
>>>>>>> come after are all at about 15%.
>>>>>>>
>>>>>>> As you can see, the first SSTable at 15% dates back 24h. Given my
>>>>>>> application I'm pretty sure those are not from the reads (reads of
>>>>>>> data older than 1h are definitely under 0.1% of reads). Could it be
>>>>>>> that compaction is putting those in cache constantly?
>>>>>>> If so, then I'm probably confused about the meaning/effect of
>>>>>>> max_sstable_age_days (set at 10 in my case) and base_time_seconds
>>>>>>> (not set in my case so the default of 3600 applies). I would not
>>>>>>> expect any compaction to happen beyond the first hour and the 10
>>>>>>> days is here to make sure data still gets expired and SSTables
>>>>>>> removed (thus releasing disk space). I don't see where the 24h comes
>>>>>>> from.
>>>>>>> If you guys can shed some light on this, it would be awesome.
>>>>>>> I'm sure I got something wrong.
>>>>>>>
>>>>>>> Regarding the heap configuration, both are very similar:
>>>>>>> * 32G machine: -Xms8049M -Xmx8049M -Xmn800M
>>>>>>> * 64G machine: -Xms8192M -Xmx8192M -Xmn1200M
>>>>>>> I think we can rule that out.
>>>>>>>
>>>>>>> Thanks again for your help, I truly appreciate it.
>>>>>>>
>>>>>>> A.
>>>>>>>
>>>>>>> On 11/17/2015 08:48 PM, Robert Coli wrote:
>>>>>>>
>>>>>>>> On Tue, Nov 17, 2015 at 11:08 AM, Sebastian Estevez
>>>>>>>> <sebastian.este...@datastax.com> wrote:
>>>>>>>>
>>>>>>>>> Your sstables are probably falling out of page cache on the
>>>>>>>>> smaller nodes and your slow disks are killing your latencies.
>>>>>>>>
>>>>>>>> +1 most likely.
>>>>>>>>
>>>>>>>> Are the heaps the same size on both machines?
>>>>>>>>
>>>>>>>> =Rob
>>>>>>>
>>>>>>> --
>>>>>>> Antoine Bonavita (anto...@stickyads.tv) - CTO StickyADS.tv
>>>>>>> Tel: +33 6 34 33 47 36/+33 9 50 68 21 32
>>>>>>> NEW YORK | LONDON | HAMBURG | PARIS | MONTPELLIER | MILAN | MADRID

--
Thanks,
Ryan Svihla
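
For reference, two figures from the quoted thread can be checked with a little arithmetic. This is a minimal Python sketch, not anything taken from Cassandra itself: the function names are made up for illustration, the "droppable" age is simply TTL plus gc_grace_seconds (it ignores other conditions such as overlapping sstables), and the young-generation rule is my reading of the default calculation in cassandra-env.sh (100MB per core, capped at a quarter of the heap), so treat both as assumptions.

```python
SECONDS_PER_DAY = 86_400


def droppable_after_days(ttl_seconds: int, gc_grace_seconds: int) -> float:
    """Age, in days, at which data is past its TTL plus gc_grace_seconds.

    Simplification: only adds the two settings together.
    """
    return (ttl_seconds + gc_grace_seconds) / SECONDS_PER_DAY


def new_gen_mb(heap_mb: int, cores: int, per_core_mb: int = 100) -> int:
    """Young generation size, assuming min(per_core_mb * cores, heap / 4)."""
    return min(per_core_mb * cores, heap_mb // 4)


if __name__ == "__main__":
    # TTL and gc_grace_seconds as quoted in the thread.
    print(droppable_after_days(432_000, 172_800), "days")
    # Heap sizes and core counts as quoted in the thread.
    print("32G node:", new_gen_mb(8049, 8), "MB")
    print("64G node:", new_gen_mb(8192, 12), "MB")
```

With the thread's settings this gives 7 days, and 800MB and 1200MB for the two machines, which matches the -Xmn values Antoine reported and his explanation that the difference comes from the core count.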