Edward,

Thank you so much for this reply - this is great stuff, and I really appreciate it.

You'll be happy to know that I've already pre-ordered your book. I'm looking forward to it! (When is the ship date?)

Best regards,

Les

On Wed, Jun 22, 2011 at 7:03 PM, Edward Capriolo <edlinuxg...@gmail.com> wrote:

> On Wed, Jun 22, 2011 at 8:31 PM, Les Hazlewood <l...@katasoft.com> wrote:
>
>> Hi Thoku,
>>
>> You were able to represent my intentions (and their reasoning) in this
>> thread more concisely than I was able to myself. Thanks!
>>
>> On Wed, Jun 22, 2011 at 5:14 PM, Thoku Hansen <tho...@gmail.com> wrote:
>>
>>> I think that Les's question was reasonable. Why *not* ask the community
>>> for the 'gotchas'?
>>>
>>> Whether the info is already documented or not, it could be an opportunity
>>> to improve the documentation based on users' perception.
>>>
>>> The "you just have to learn" responses are fair too, but they remind me
>>> of the days when running Oracle was a black art, and accumulated wisdom
>>> made DBAs irreplaceable.
>>>
>>
>> Yes, this was my initial concern. I know that Cassandra is still young,
>> and I expect this to be the norm for a while, but I was hoping to make
>> that process a bit easier (for me and anyone else reading this thread in
>> the future).
>>
>>> Some recommendations *are* documented, but they are dispersed, stale,
>>> contradictory, or counter-intuitive.
>>>
>>> Others have not been documented in either the wiki or DataStax's doco,
>>> and are instead learned anecdotally or The Hard Way.
>>>
>>> For example, whether documented or not, some of the 'gotchas' that I
>>> encountered when I first started working with Cassandra were:
>>>
>>> * Don't use OpenJDK. Prefer the Sun JDK. (The wiki says this
>>>   <http://wiki.apache.org/cassandra/GettingStarted>, Jira says that
>>>   <https://issues.apache.org/jira/browse/CASSANDRA-2441>.)
>>> * It's not viable to run without JNA installed.
>>> * Disable swap memory.
>>> * Need to run nodetool repair on a regular basis.
>>>
>>> I'm looking forward to Edward Capriolo's Cassandra book
>>> <https://www.packtpub.com/cassandra-apache-high-performance-cookbook/book>,
>>> which Les will probably find helpful.
>>>
>>
>> Thanks for linking to this. I'm pre-ordering right away.
>>
>> And thanks for the pointers; they are exactly the kind of enumerated
>> things I was looking to elicit. These are the kinds of things that are
>> hard to track down in a single place. I think it'd be nice for the
>> community to contribute this stuff to a single page ('best practices',
>> 'checklist', whatever you want to call it). It would certainly make
>> things easier when getting started.
>>
>> Thanks again,
>>
>> Les
>>
>
> Since I got a plug for the book, I will chip in again to the thread :)
>
> Some things that were mentioned already:
>
> Install JNA, absolutely. Without JNA, the snapshot command has to fork a
> process to hard-link the SSTables; I have seen clients back off from this.
> Also, the performance-focused Cassandra devs always try to squeeze out
> performance by using more native features.
>
> OpenJDK vs. Sun: I agree. Almost always try to do what 'most others' do in
> production; this way you get surprised less.
>
> Other stuff:
>
> RAID. You might want to go RAID 1+0 if you are aiming for uptime. RAID 0
> has better performance, but if you lose a node, your capacity is
> diminished, and rebuilding and rejoining a node involves more manpower,
> more steps, and more chances for human error.
>
> Collect statistics on the normal system items: CPU, disk (size and
> utilization), and memory. Then collect the Cassandra JMX counters and
> understand how they interact. For example, record ReadCount and WriteCount
> per column family, then try to determine how they affect disk utilization.
> You can use this for capacity planning. Then try using a key/row cache.
> Evaluate again. Check the hit rate graph for your new cache. How did this
> affect your disk?
> You want to head off anything that can be a performance killer, like
> traffic patterns changing or data growing significantly.
>
> Do not be short on hardware. I do not want to say "overbuy", but if uptime
> is important, have spare drives and servers and have room to grow.
>
> Balance that ring :)
>
> I have not read the original thread concerning the problem you mentioned.
> One way to avoid OOM is large amounts of RAM :) On a more serious note,
> most OOMs are caused by setting caches or memtables too large. If the OOM
> was caused by a software bug, the Cassandra devs are on the ball and move
> fast. I still suggest not jumping into a release right away. I know it's
> hard to live without counters or CQL, since new things are super cool. But
> if you want all those 9s, you're going to have to stay disciplined. Unless
> a release has a fix for a problem you think you have, stay a minor or
> revision back, or at least wait some time before upgrading to it, and do
> some internal confidence testing before pulling the trigger on an update.
>
> Almost all use cases demand that repair be run regularly due to the nature
> of distributed deletes.
>
> Other good tips: subscribe to all the mailing lists, and hang out in the
> IRC channels cassandra, cassandra-dev, and cassandra-ops. You get an
> osmosis learning effect, and you learn to fix or head off issues you never
> had.
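On "Balance that ring": assuming the RandomPartitioner (the default partitioner in Cassandra 0.7/0.8), tokens live in the range 0 to 2**127, and a balanced ring of N nodes gives node i the token i * 2**127 / N, so each node owns an equal slice of the hash space. The function below is a minimal sketch of that calculation under that assumption; it does not apply to order-preserving partitioners, where balance depends on the actual keys. `balanced_tokens` and `RING_SIZE` are illustrative names, not Cassandra APIs.

```python
from typing import List

# RandomPartitioner token space: 0 .. 2**127.
RING_SIZE = 2 ** 127


def balanced_tokens(node_count: int) -> List[int]:
    """Evenly spaced initial_token values for a RandomPartitioner ring.

    Node i gets i * 2**127 // node_count, so each node owns roughly
    1/node_count of the hash space (integer division rounds down).
    """
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    return [i * RING_SIZE // node_count for i in range(node_count)]


if __name__ == "__main__":
    for i, token in enumerate(balanced_tokens(4)):
        print(f"node {i}: initial_token: {token}")
```

For a four-node ring this yields 0, 2**125, 2**126 and 3 * 2**125. Each value would go in the corresponding node's `initial_token` setting in cassandra.yaml before it first joins, or be applied to an already-running node with `nodetool move`.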