On Tue, Jul 13, 2010 at 10:26 PM, Jonathan Ellis <jbel...@gmail.com> wrote:

>
> I'm totally fine with saying "Here's a JNI library for Linux [or even
> Linux version >= 2.6.X]" since that makes up 99% of our production
> deployments, and leaving the remaining 1% with the status quo.
>

You really need to say "Linux > 2.6 and filesystem xyz".

That probably reduces the percentage a bit, but probably not critically.

It has been quite a while since I wrote code for direct I/O (I really try to
avoid using it these days), but from memory, as long as there is a
reasonably extensible framework that can serve as a basis for new platforms,
a somewhat experienced person should be able to add a new Unix-like platform
in a couple of days.

No idea about Windows. I have never written this kind of code there.


> > O_DIRECT also bypasses the cache completely
>
> Right, that's the idea. :)
>

Hm... I would have thought it was clear that my idea is that you do want to
interact with the cache if you can! :)

Under high load, you might reduce performance by 10-30% by throwing out the
scheduling benefits you get from the OS (yes, that is based on real-life
experience). Of course, that assumes you can somehow avoid the worst-case
scenarios without direct I/O. As always, things will differ from
use case to use case.
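
To make the trade-off concrete, here is a minimal C sketch of what opting out
of the cache looks like: it tries to open a file with O_DIRECT and falls back
to ordinary buffered I/O when the filesystem rejects the flag with EINVAL.
The function names, the 4096-byte block size and the fallback policy are
assumptions for illustration, not anything taken from Cassandra; real code
should query the alignment the device actually requires.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* glibc only exposes O_DIRECT with this set */
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define SKETCH_BLOCK 4096      /* assumed alignment and transfer size */

/* Open `path` for writing. If want_direct is set, try O_DIRECT first and
 * fall back to buffered I/O when the filesystem answers EINVAL.
 * *got_direct reports which mode was actually obtained.
 * Returns a file descriptor, or -1 on error. */
int open_maybe_direct(const char *path, int want_direct, int *got_direct)
{
    int flags = O_WRONLY | O_CREAT | O_TRUNC;

    assert(got_direct != NULL);
    *got_direct = 0;
#ifdef O_DIRECT
    if (want_direct) {
        int fd = open(path, flags | O_DIRECT, 0600);
        if (fd >= 0) {
            *got_direct = 1;
            return fd;
        }
        if (errno != EINVAL)
            return -1;         /* a real error, not "unsupported here" */
    }
#else
    (void)want_direct;         /* platform has no O_DIRECT at all */
#endif
    return open(path, flags, 0600);
}

/* Write one zero-filled block from an aligned buffer, which O_DIRECT needs
 * and buffered I/O tolerates. Returns bytes written, or -1 on error. */
ssize_t write_one_block(int fd)
{
    void *buf = NULL;
    ssize_t n;

    if (posix_memalign(&buf, SKETCH_BLOCK, SKETCH_BLOCK) != 0)
        return -1;
    memset(buf, 0, SKETCH_BLOCK);
    n = write(fd, buf, SKETCH_BLOCK);
    free(buf);
    return n;
}
```

A caller would check got_direct after opening; when it is 0 the writes go
through the page cache and the OS is free to schedule them as usual.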

A well-performing HW RAID card with sufficient writeback cache might also
help reduce the negative impact of direct I/O.

Funnily enough, it is often the systems with a light read load that are
hardest hit. Systems with a heavy read load have more pressure on the cache
from the read side, so writes will not push content out of the cache (or
applications out of physical memory) as easily. To make things more
annoying, OSes (not just Linux) have a tendency to behave differently from
release to release. What is a problem on one Linux release is not
necessarily a problem on another.

I have not seen huge I/O problems when compacting on Cassandra myself, but
I am currently working on HW with loads of memory, so I might not see the
problems others see. I am more concerned with other performance issues at
the moment.

One nifty effect, which may or may not be worth looking into, is what
happens when you flip over to the new compacted SSTable: the last thing you
wrote to the new table will be sitting in cache, ready to be read once you
start using it. It can therefore be worth ordering the compaction so that
the most performance-critical parts are written last, and written without
direct I/O or similar settings, so they will be in cache when needed.

I am not sure to what extent parts of the SSTables have structures of
importance like this for Cassandra. Haven't really thought about it until
now.

It might also be worth looking at I/O scheduler settings in the Linux
kernel. Some of the I/O schedulers also support ionice/I/O priorities.

I have never used it on single threads, but I have read that ioprio_set()
accepts thread IDs (not just process IDs as the man page indicates). In my
experience it is not super efficient at preventing cache flushing of mostly
idle data, but if the compaction I/O occurs in isolated threads so that
ionice can be applied to those threads, it should help.
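
A minimal C sketch of the per-thread variant follows. glibc has no wrapper
for ioprio_set(), so it goes through syscall(); the numeric constants are
copied from the ioprio_set(2) man page, and the SKETCH_ names and both
function names are my own. I have not verified this on a compaction
workload, and the priority only matters with an I/O scheduler that honours
it.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* for syscall() under strict compiler modes */
#endif
#include <assert.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Values as documented in ioprio_set(2). */
enum {
    SKETCH_IOPRIO_WHO_PROCESS = 1,   /* "who" is a process or thread ID */
    SKETCH_IOPRIO_CLASS_BE    = 2,   /* best-effort class, levels 0-7 */
    SKETCH_IOPRIO_CLASS_IDLE  = 3,   /* only served when the disk is idle */
    SKETCH_IOPRIO_CLASS_SHIFT = 13
};

/* Pack a scheduling class and a level (0-7) into one ioprio value. */
int ioprio_value(int io_class, int level)
{
    assert(level >= 0 && level <= 7);
    return (io_class << SKETCH_IOPRIO_CLASS_SHIFT) | level;
}

/* Put the calling thread (not the whole process) in the idle I/O class.
 * Returns 0 on success, -1 on error with errno set. */
int set_current_thread_io_idle(void)
{
    pid_t tid = (pid_t)syscall(SYS_gettid);

    return (int)syscall(SYS_ioprio_set, SKETCH_IOPRIO_WHO_PROCESS, tid,
                        ioprio_value(SKETCH_IOPRIO_CLASS_IDLE, 0));
}
```

A compaction thread would call set_current_thread_io_idle() once at startup,
before it begins reading and writing SSTables.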



> Exactly: the fadvise mode that would actually be useful to us is a
> no-op and not likely to change soon. A bit of history:
>

Interesting, I had not seen that before.
Thanks!

Terje
