When you say drop you mean reduce the value (to 1 day for example),
not "don't set the value", right?
Yes.
If I set max_sstable_age_days to 1, my understanding is that
SSTables with expired data (5 days) are never going to be compacted,
and therefore my disk usage will keep growing forever. Did I
miss something here?
We will expire sstables whose highest TTL is beyond gc_grace_seconds, as
of CASSANDRA-5228
<https://issues.apache.org/jira/browse/CASSANDRA-5228>. This is nice
because the sstable is just dropped for free: there is no need to scan
it and remove tombstones, which is very expensive, and DTCS will
guarantee that all the data within an sstable is close together in time.
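To make the rule above concrete, here is a rough sketch of the check (not Cassandra's actual code). The function name and arguments are made up for illustration, the gc_grace_seconds value is assumed to be the default of 864000 since the thread does not state it, and the real check also has to consider overlapping sstables, which this ignores.

```python
DAY = 86400


def is_fully_expired(max_expiry_epoch: int, gc_grace_seconds: int, now_epoch: int) -> bool:
    """Hypothetical helper: True when the newest expiry in the sstable
    is already more than gc_grace_seconds in the past."""
    return max_expiry_epoch + gc_grace_seconds < now_epoch


# Data written at t=0 with the 5-day TTL from this thread (432000 s).
ttl = 5 * DAY
# Assumption: gc_grace_seconds left at the default of 10 days.
gc_grace = 10 * DAY

droppable_day_14 = is_fully_expired(ttl, gc_grace, 14 * DAY)  # still inside gc grace
droppable_day_16 = is_fully_expired(ttl, gc_grace, 16 * DAY)  # past TTL + gc grace
```

Under these assumptions the whole sstable only becomes droppable after day 15, that is TTL plus gc_grace_seconds, regardless of max_sstable_age_days.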
So, if I set max_sstable_age_days to 1, I have to run repairs at
least once a day, correct?
I'm afraid I don't get your point about painful compactions.
I was referring to the problems described in CASSANDRA-9644
<https://issues.apache.org/jira/browse/CASSANDRA-9644>.
All the best,
Sebastián Estévez
Solutions Architect | 954 905 8615 | sebastian.este...@datastax.com
DataStax is the fastest, most scalable distributed database technology,
delivering Apache Cassandra to the world’s most innovative enterprises.
DataStax is built to be agile, always-on, and predictably scalable to
any size. With more than 500 customers in 45 countries, DataStax is the
database technology and transactional backbone of choice for the world's
most innovative companies such as Netflix, Adobe, Intuit, and eBay.
On Wed, Nov 18, 2015 at 5:53 PM, Antoine Bonavita <anto...@stickyads.tv> wrote:
Sebastian,
Your help is very much appreciated. I re-read the blog post and also
https://labs.spotify.com/2014/12/18/date-tiered-compaction/ but some
things are still confusing me.
Please see my questions inline below.
On 11/18/2015 04:21 PM, Sebastian Estevez wrote:
Yep, I think you've mixed up your DTCS levers. I would read, or re-read,
Marcus's post:
http://www.datastax.com/dev/blog/datetieredcompactionstrategy
*TL;DR:*
* *base_time_seconds* is the size of your initial window
* *max_sstable_age_days* is the time after which you stop compacting sstables
* *default_time_to_live* is the time after which data expires and sstables will start to become available for GC (432000 is 5 days)
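For reference, the three levers above end up in a single table definition. Below is a small sketch that builds the corresponding CQL statement as a string. The helper name is made up, the keyspace and table names (views.views) are only a guess from the data directory path quoted later in this thread, and the value of 1 for max_sstable_age_days is the one being discussed, not a recommendation.

```python
def dtcs_alter_statement(keyspace: str, table: str, base_time_seconds: int,
                         max_sstable_age_days: int, default_ttl: int) -> str:
    """Hypothetical helper: build an ALTER TABLE statement that sets the
    DTCS options and the table-level default TTL."""
    return (
        f"ALTER TABLE {keyspace}.{table} WITH compaction = {{"
        f"'class': 'DateTieredCompactionStrategy', "
        f"'base_time_seconds': '{base_time_seconds}', "
        f"'max_sstable_age_days': '{max_sstable_age_days}'}} "
        f"AND default_time_to_live = {default_ttl};"
    )


# Values from the thread: 1h initial window, 5-day TTL (432000 s),
# and the proposed max_sstable_age_days of 1.
stmt = dtcs_alter_statement("views", "views", 3600, 1, 432000)
print(stmt)
```

The statement is only printed here; running it against a cluster (for example through cqlsh) is left out on purpose.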
Could it be that compaction is putting those in cache constantly?
Yep, you'll keep compacting sstables until they're 10 days old per your
current settings, and when you compact there are reads and then writes.
If you aren't doing any updates and most of your reads are within 1
hour, you can probably afford to drop max_sstable_age_days.
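A very simplified model of why sstables older than one hour still get compacted, as a sketch only: it assumes that DTCS windows grow by a factor of min_threshold (assumed to be at its default of 4, the thread does not mention it) starting from base_time_seconds, and that nothing older than max_sstable_age_days is compacted. The function names are made up and the real strategy has more moving parts.

```python
DAY = 86400


def dtcs_window_sizes(base_time_seconds: int, max_sstable_age_days: int,
                      min_threshold: int = 4) -> list:
    """Hypothetical helper: window sizes (in seconds) that fit under the max age."""
    max_age = max_sstable_age_days * DAY
    sizes = []
    size = base_time_seconds
    while size <= max_age:
        sizes.append(size)
        size *= min_threshold
    return sizes


def still_compacted(age_days: float, max_sstable_age_days: int) -> bool:
    """Hypothetical helper: in this model an sstable stays a compaction
    candidate until it is older than max_sstable_age_days."""
    return age_days < max_sstable_age_days


current = dtcs_window_sizes(3600, 10)   # settings described in the thread
proposed = dtcs_window_sizes(3600, 1)   # with max_sstable_age_days = 1
```

In this model the current settings give windows of 1h, 4h, 16h and 64h, so a 24h-old sstable is still a candidate; with max_sstable_age_days at 1 it would not be.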
When you say drop you mean reduce the value (to 1 day for example),
not "don't set the value", right?
If I set max_sstable_age_days to 1, my understanding is that
SSTables with expired data (5 days) are never going to be compacted,
and therefore my disk usage will keep growing forever. Did I
miss something here?
Just make sure you're doing your repairs more often than the
max_sstable_age_days to avoid some painful compactions.
So, if I set max_sstable_age_days to 1, I have to run repairs at
least once a day, correct?
I'm afraid I don't get your point about painful compactions.
Along the same lines, you should probably set dclocal_read_repair_chance
to 0.
Will try that.
Regarding the heap configuration, both are very similar
Probably unrelated but, is there a reason why they're not identical?
Especially the different new gen size could have gc implications.
Both are calculated by cassandra-env.sh. If my bash skills are still
intact, the NewGen size difference comes from the number of cores:
the 64G machine has 12 cores whereas the 32G machine has 8 cores (I
did not even realize this before looking into it, which is why I did
not mention it in my previous emails).
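For what it's worth, here is how I understand the new gen calculation in cassandra-env.sh, rewritten as a Python sketch rather than copied from the script: the young generation is the smaller of a quarter of the max heap and 100 MB per core. The function name and the 100 MB constant are my reading of the script of that era, so treat them as assumptions.

```python
def new_gen_mb(max_heap_mb: int, cpu_cores: int, per_core_mb: int = 100) -> int:
    """Hypothetical helper mirroring my understanding of cassandra-env.sh:
    new gen = min(max heap / 4, per_core_mb * number of cores)."""
    return min(max_heap_mb // 4, per_core_mb * cpu_cores)


# Heap sizes and core counts as reported in this thread.
small_node = new_gen_mb(8049, 8)    # 32G machine, 8 cores
large_node = new_gen_mb(8192, 12)   # 64G machine, 12 cores
```

With these inputs the sketch gives 800 and 1200, which matches the -Xmn800M and -Xmn1200M values quoted further down in the thread.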
Thanks a lot for your help.
A.
All the best,
Sebastián Estévez
On Wed, Nov 18, 2015 at 6:44 AM, Antoine Bonavita <anto...@stickyads.tv> wrote:
Sebastian, Robert,
First, a big thank you to both of you for your help.
It looks like you were right. I used pcstat (awesome tool, thanks
for that as well) and it appears some files I would not expect to be
in cache actually are. Here is a sample of my output (edited for
convenience, adding the file timestamp from the OS):
* /var/lib/cassandra/data/views/views-451e4d8061ef11e5896f091196a360a0/la-5951-big-Data.db - 000.619 % - Nov 16 12:25
* /var/lib/cassandra/data/views/views-451e4d8061ef11e5896f091196a360a0/la-5954-big-Data.db - 000.681 % - Nov 16 13:44
* /var/lib/cassandra/data/views/views-451e4d8061ef11e5896f091196a360a0/la-5955-big-Data.db - 000.610 % - Nov 16 14:11
* /var/lib/cassandra/data/views/views-451e4d8061ef11e5896f091196a360a0/la-5956-big-Data.db - 015.621 % - Nov 16 14:26
* /var/lib/cassandra/data/views/views-451e4d8061ef11e5896f091196a360a0/la-5957-big-Data.db - 015.558 % - Nov 16 14:50
The SSTables that come before are all at about 0% and the ones that
come after are all at about 15%.
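In case it helps anyone reproduce this, here is a small sketch for picking the cached files out of a listing in the edited format shown above (path, cached percentage, timestamp, separated by " - "). This is not pcstat's native output format, and the function names and the 5% threshold are arbitrary choices for illustration.

```python
def parse_cache_line(line: str):
    """Hypothetical helper: split 'path - NNN.NNN % - timestamp' into its parts."""
    path, percent, timestamp = [part.strip() for part in line.rsplit(" - ", 2)]
    return path, float(percent.rstrip(" %")), timestamp


def warm_files(lines, threshold: float = 5.0):
    """Return the paths whose cached percentage is above the threshold."""
    return [path for path, pct, _ in map(parse_cache_line, lines) if pct > threshold]


sample = [
    "/var/lib/cassandra/data/views/views-451e4d8061ef11e5896f091196a360a0/la-5955-big-Data.db - 000.610 % - Nov 16 14:11",
    "/var/lib/cassandra/data/views/views-451e4d8061ef11e5896f091196a360a0/la-5956-big-Data.db - 015.621 % - Nov 16 14:26",
]

warm = warm_files(sample)
```

On the two sample lines only la-5956-big-Data.db ends up in the list.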
As you can see, the first SSTable at 15% dates back 24h. Given my
application, I'm pretty sure those are not from reads (reads of
data older than 1h are definitely under 0.1% of reads). Could it be
that compaction is putting those in cache constantly?
If so, then I'm probably confused about the meaning/effect of
max_sstable_age_days (set at 10 in my case) and base_time_seconds
(not set in my case, so the default of 3600 applies). I would not
expect any compaction to happen beyond the first hour, and the 10
days are there to make sure data still gets expired and SSTables
removed (thus releasing disk space). I don't see where the 24h comes
from.
If you guys can shed some light on this, it would be awesome. I'm
sure I got something wrong.
Regarding the heap configuration, both are very similar:
* 32G machine: -Xms8049M -Xmx8049M -Xmn800M
* 64G machine: -Xms8192M -Xmx8192M -Xmn1200M
I think we can rule that out.
Thanks again for your help, I truly appreciate it.
A.
On 11/17/2015 08:48 PM, Robert Coli wrote:
On Tue, Nov 17, 2015 at 11:08 AM, Sebastian Estevez
<sebastian.este...@datastax.com> wrote:
Your sstables are probably falling out of page cache on the
smaller nodes and your slow disks are killing your latencies.
+1 most likely.
Are the heaps the same size on both machines?
=Rob
--
Antoine Bonavita (anto...@stickyads.tv) - CTO StickyADS.tv
Tel: +33 6 34 33 47 36/+33 9 50 68 21 32
NEW YORK | LONDON | HAMBURG | PARIS | MONTPELLIER | MILAN | MADRID