Thanks a lot, that is comforting. We are also small at the moment, so I can definitely relate to the idea of keeping things small and simple, at a level where they just work.
I see the new Apache Cassandra release has a lot of fixes, so I will try upgrading before I look into downgrading. (I have also appended, below the quoted thread, a quick sketch of the settings I plan to double-check.)

On Saturday, October 25, 2014, Laing, Michael <michael.la...@nytimes.com> wrote:

> Since no one else has stepped in...
>
> We have run clusters with ridiculously small nodes - I have a production cluster in AWS with 4GB nodes, each with 1 CPU and disk-based instance storage. It works fine, but you can see those little puppies struggle...
>
> And I ran into problems such as you observe...
>
> Upgrading Java to the latest 1.7 and - most importantly - *reverting to the default configuration, esp. for heap* - seemed to settle things down completely. Also make sure that you are using the 'recommended production settings' from the docs on your boxen.
>
> However, we are running 2.0.x, not 2.1.0, so YMMV.
>
> And we are switching to 15GB nodes with 2 heftier CPUs each and SSD storage - still a 'small' machine, but much more reasonable for C*.
>
> However, I can't say I am an expert, since I deliberately keep things so simple that we do not encounter problems - it just works, so I dig into other stuff.
>
> ml
>
> On Sat, Oct 25, 2014 at 5:22 PM, Maxime <maxim...@gmail.com> wrote:
>
>> Hello, I've been trying to add a new node to my cluster (4 nodes) for a few days now.
>>
>> I started by adding a node similar to my current configuration, 4 GB of RAM + 2 cores on DigitalOcean. However, every time I would end up getting OOM errors after many log entries of the type:
>>
>> INFO [SlabPoolCleaner] 2014-10-25 13:44:57,240 ColumnFamilyStore.java:856 - Enqueuing flush of mycf: 5383 (0%) on-heap, 0 (0%) off-heap
>>
>> leading to:
>>
>> ka-120-Data.db (39291 bytes) for commitlog position ReplayPosition(segmentId=1414243978538, position=23699418)
>> WARN [SharedPool-Worker-13] 2014-10-25 13:48:18,032 AbstractTracingAwareExecutorService.java:167 - Uncaught exception on thread Thread[SharedPool-Worker-13,5,main]: {}
>> java.lang.OutOfMemoryError: Java heap space
>>
>> Thinking it had to do with either compaction or streaming - two activities I've had tremendous issues with in the past - I tried slowing down setstreamthroughput to extremely low values, all the way down to 5. I also tried setting setcompactionthroughput to 0, and then, after reading that in some cases that might be too fast, down to 8. Nothing worked; it merely changed the mean time to OOM slightly, not in a way indicating that either was anywhere near a solution.
>>
>> The nodes were configured with 2 GB of heap initially; I tried to crank it up to 3 GB, stressing the host memory to its limit.
>>
>> After doing some exploration (I am considering writing a Cassandra ops document with lessons learned, since there seems to be little of that in organized fashion), I read that some people had strange issues on lower-end boxes like that, so I bit the bullet and upgraded my new node to an 8GB + 4 core instance, which was anecdotally better.
>>
>> To my complete shock, the exact same issues are present, even after raising the heap to 6 GB. I figure it can't be a "normal" situation anymore, but must be a bug somehow.
>>
>> My cluster is 4 nodes, RF of 2, about 160 GB of data across all nodes, with about 10 CFs of varying sizes. Runtime writes are between 300 and 900 per second. Cassandra 2.1.0, nothing too wild.
>>
>> Has anyone encountered these kinds of issues before?
>> I would really enjoy hearing about the experiences of people trying to run small-sized clusters like mine. From everything I read, Cassandra operations go very well on large (16 GB + 8 core) machines, but I'm sad to report I've had nothing but trouble trying to run on smaller machines; perhaps I can learn from others' experience?
>>
>> Full logs can be provided to anyone interested.
>>
>> Cheers
>>
>
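Following up on Michael's point about reverting to the default configuration, especially for heap: on my install the heap override lives in cassandra-env.sh, so the plan is to comment it out, let the script size the heap itself, and then re-verify the 'recommended production settings' from the docs. A minimal sketch of what I will be checking; the paths and example values below are from my own setup, so treat them as assumptions rather than a recipe:

    # /etc/cassandra/cassandra-env.sh (path may differ per install)
    # Comment out manual overrides so the script computes heap from system RAM:
    # MAX_HEAP_SIZE="3G"
    # HEAP_NEWSIZE="800M"

    # OS-level settings from the 'recommended production settings' page:
    sudo swapoff --all        # swap should be disabled
    ulimit -l                 # memlock limit; the docs want this unlimited
    sysctl vm.max_map_count   # the docs suggest raising this to 131072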
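And for completeness, since throttling came up earlier in the thread: these are the knobs I was adjusting at runtime via nodetool, plus the cassandra.yaml equivalents if anyone wants them to persist. Again, just a record of my own attempts (the numbers are the ones mentioned above), not a recommendation:

    # Runtime throttles (take effect immediately, reset on restart):
    nodetool setstreamthroughput 5        # streaming cap, in megabits per second
    nodetool setcompactionthroughput 8    # compaction cap, in MB per second; 0 means unthrottled

    # cassandra.yaml equivalents (persist across restarts):
    #   stream_throughput_outbound_megabits_per_sec: 5
    #   compaction_throughput_mb_per_sec: 8

As noted above, neither throttle made a real difference for me; I am only including the exact commands so the thread has them in one place.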