OK, I'm spending some time slogging through the cassandra-user archives.
It seems lots of folks have this problem.  I'm starting with a JVM
upgrade, then skimming through JIRA looking for patches.

Ian

On Fri, May 21, 2010 at 12:09 PM, Ian Soboroff <isobor...@gmail.com> wrote:

> So at the moment, I'm not running my loader, and I'm looking at one node
> which is slow to respond to nodetool requests.  At this point, it has a
> pile of hinted handoffs pending that don't seem to be draining out.  The
> system.log shows that it's GCing pretty much constantly.
> Ian
>
>
> $ /usr/local/src/cassandra/bin/nodetool --host node7 tpstats
> Pool Name                    Active   Pending      Completed
> FILEUTILS-DELETE-POOL             0         0            178
> STREAM-STAGE                      0         0              0
> RESPONSE-STAGE                    0         0          21852
> ROW-READ-STAGE                    0         0              0
> LB-OPERATIONS                     0         0              0
> MESSAGE-DESERIALIZER-POOL         0         0        1648536
> GMFD                              0         0         125430
> LB-TARGET                         0         0              0
> CONSISTENCY-MANAGER               0         0              0
> ROW-MUTATION-STAGE                2         2        1886537
> MESSAGE-STREAMING-POOL            0         0              0
> LOAD-BALANCER-STAGE               0         0              0
> FLUSH-SORTER-POOL                 0         0              0
> MEMTABLE-POST-FLUSHER             0         0            206
> FLUSH-WRITER-POOL                 0         0            206
> AE-SERVICE-STAGE                  0         0              0
> HINTED-HANDOFF-POOL               1       158             23
>
>
>
> On Fri, May 21, 2010 at 10:37 AM, Ian Soboroff <isobor...@gmail.com> wrote:
>
>> On the to-do list for today.  Is there a tool to aggregate all the JMX
>> stats from all nodes?  I mean, something a little more complete than
>> Nagios.
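>> Maybe I'll just hack up a little poller that walks the thread-pool MBeans
>> on each node.  A rough sketch (I'm assuming the 0.6 defaults here -- JMX
>> on port 8080 and the org.apache.cassandra.concurrent MBean names -- so
>> adjust as needed):
>>
>>   import java.util.Set;
>>   import javax.management.MBeanServerConnection;
>>   import javax.management.ObjectName;
>>   import javax.management.remote.JMXConnector;
>>   import javax.management.remote.JMXConnectorFactory;
>>   import javax.management.remote.JMXServiceURL;
>>
>>   public class TpStatsPoller {
>>       public static void main(String[] args) throws Exception {
>>           // placeholder host list; in practice read the ring from one node
>>           String[] nodes = {"node1", "node2", "node7"};
>>           for (String host : nodes) {
>>               JMXServiceURL url = new JMXServiceURL(
>>                   "service:jmx:rmi:///jndi/rmi://" + host + ":8080/jmxrmi");
>>               JMXConnector jmxc = JMXConnectorFactory.connect(url);
>>               MBeanServerConnection mbs = jmxc.getMBeanServerConnection();
>>               // one MBean per stage/pool, same names nodetool tpstats prints
>>               Set<ObjectName> pools = mbs.queryNames(
>>                   new ObjectName("org.apache.cassandra.concurrent:*"), null);
>>               for (ObjectName pool : pools) {
>>                   System.out.println(host + "  " + pool.getKeyProperty("type")
>>                       + "  pending=" + mbs.getAttribute(pool, "PendingTasks"));
>>               }
>>               jmxc.close();
>>           }
>>       }
>>   }
>>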
>> Ian
>>
>>
>> On Fri, May 21, 2010 at 10:23 AM, Jonathan Ellis <jbel...@gmail.com> wrote:
>>
>>> You should check the JMX stages I posted about.
>>>
>>> On Fri, May 21, 2010 at 7:05 AM, Ian Soboroff <isobor...@gmail.com>
>>> wrote:
>>> > Just an update.  I rolled the memtable size back to 128MB.  I am
>>> > still seeing that the daemon runs for a while with reasonable heap
>>> > usage, but then the heap climbs up to the max (6GB in this case,
>>> > should be plenty) and it starts GCing, without much getting cleared.
>>> > The client catches lots of exceptions, where I wait 30 seconds and
>>> > try again, with a new client if necessary, but it doesn't clear up.
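>>> >
>>> > The retry is nothing fancy -- roughly this fragment, where connect()
>>> > just opens a new TSocket/TBinaryProtocol client against whichever
>>> > node I rotate to next, and keyspace/batch/consistencyLevel come from
>>> > the surrounding loader code:
>>> >
>>> >     try {
>>> >         client.batch_mutate(keyspace, batch, consistencyLevel);
>>> >     } catch (TException e) {
>>> >         // timed-out, unavailable, and transport errors all land here
>>> >         Thread.sleep(30000);          // wait 30 seconds
>>> >         client = connect(nextHost()); // new client on the next node
>>> >         // ...then resend the same batch
>>> >     }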
>>> >
>>> > Could this be related to memory leak problems I've skimmed past on
>>> > the list here?
>>> >
>>> > It can't be that I'm creating rows a bit at a time... once I stick
>>> > a web page into two CFs, it's over and done with for this
>>> > application.  I'm just trying to get stuff loaded.
>>> >
>>> > Is there a limit to how much on-disk data a Cassandra daemon can
>>> > manage?  Is there runtime overhead associated with stuff on disk?
>>> >
>>> > Ian
>>> >
>>> > On Thu, May 20, 2010 at 9:31 PM, Ian Soboroff <isobor...@gmail.com>
>>> wrote:
>>> >>
>>> >> Excellent leads, thanks.  cassandra.in.sh has a heap of 6GB, but I
>>> >> didn't realize that I was trying to float so many memtables.  I'll
>>> >> poke tomorrow and report if it gets fixed.
>>> >> Ian
>>> >>
>>> >> On Thu, May 20, 2010 at 10:40 AM, Jonathan Ellis <jbel...@gmail.com>
>>> >> wrote:
>>> >>>
>>> >>> Some possibilities:
>>> >>>
>>> >>> - You didn't adjust the Cassandra heap size in cassandra.in.sh
>>> >>>   (1GB is too small)
>>> >>> - You're inserting at CL.ZERO (ROW-MUTATION-STAGE in tpstats will
>>> >>>   show large pending ops -- large = 100s)
>>> >>> - You're creating large rows a bit at a time and Cassandra OOMs
>>> >>>   when it tries to compact (the OOM should usually be in the
>>> >>>   compaction thread)
>>> >>> - You have your 5 disks each with a separate data directory, which
>>> >>>   will allow up to 12 total memtables in-flight internally, and
>>> >>>   12*256MB is too much for the heap size you have
>>> >>>   (FLUSH-WRITER-POOL in tpstats will show large pending ops --
>>> >>>   large = more than 2 or 3)
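>>> >>>
>>> >>> (On the CL.ZERO point: the consistency level is just the last
>>> >>> argument to the write calls, so with the 0.6 Thrift client it's e.g.
>>> >>>
>>> >>>     // ZERO returns before the write is acked anywhere; ONE waits
>>> >>>     // for at least one replica, which gives the loader backpressure
>>> >>>     // instead of letting mutations pile up in ROW-MUTATION-STAGE
>>> >>>     client.batch_mutate(keyspace, mutations, ConsistencyLevel.ONE);
>>> >>>
>>> >>> and tpstats will show whether the pending count stops growing.)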
>>> >>>
>>> >>> On Tue, May 18, 2010 at 6:24 AM, Ian Soboroff <isobor...@gmail.com>
>>> >>> wrote:
>>> >>> > I hope this isn't too much of a newbie question.  I am using
>>> >>> > Cassandra 0.6.1 on a small cluster of Linux boxes - 14 nodes,
>>> >>> > each with 8GB RAM and 5 data drives.  The nodes are running HDFS
>>> >>> > to serve files within the cluster, but at the moment the rest of
>>> >>> > Hadoop is shut down.  I'm trying to load a large set of web pages
>>> >>> > (the ClueWeb collection, but more is coming) and my Cassandra
>>> >>> > daemons keep dying.
>>> >>> >
>>> >>> > I'm loading the pages into a simple column family that lets me
>>> >>> > fetch out pages by an internal ID or by URL.  The biggest thing
>>> >>> > in the row is the page content, maybe 15-20k per page of raw
>>> >>> > HTML.  There aren't a lot of columns.  I tried Thrift, Hector,
>>> >>> > and the BMT interface, and at the moment I'm doing batch
>>> >>> > mutations over Thrift, about 2500 pages per batch, because that
>>> >>> > was fastest for me in testing.
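>>> >>> >
>>> >>> > Per page, the batch entry is basically the following sketch
>>> >>> > ("WebPages", "Content", and the variable names are placeholders
>>> >>> > for the real keyspace/CF/loader names, and error handling is
>>> >>> > trimmed):
>>> >>> >
>>> >>> >     // one row per page, keyed by internal ID; the URL lookup row
>>> >>> >     // goes into a second CF the same way
>>> >>> >     Column col = new Column("content".getBytes("UTF-8"), htmlBytes,
>>> >>> >                             System.currentTimeMillis());
>>> >>> >     ColumnOrSuperColumn cosc = new ColumnOrSuperColumn();
>>> >>> >     cosc.setColumn(col);
>>> >>> >     Mutation m = new Mutation();
>>> >>> >     m.setColumn_or_supercolumn(cosc);
>>> >>> >
>>> >>> >     Map<String, List<Mutation>> perCf =
>>> >>> >         new HashMap<String, List<Mutation>>();
>>> >>> >     perCf.put("Content", Collections.singletonList(m));
>>> >>> >     batch.put(pageId, perCf);  // Map<rowKey, Map<cfName, mutations>>
>>> >>> >
>>> >>> >     // ...and after ~2500 pages:
>>> >>> >     client.batch_mutate("WebPages", batch, consistencyLevel);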
>>> >>> >
>>> >>> > At this point, each Cassandra node has between 500GB and 1.5TB
>>> >>> > according to nodetool ring.  Let's say I start the daemons up,
>>> >>> > and they all go live after a couple minutes of scanning the
>>> >>> > tables.  I then start my importer, which is a single Java process
>>> >>> > reading ClueWeb bundles over HDFS, cutting them up, and sending
>>> >>> > the mutations to Cassandra.  I only talk to one node at a time,
>>> >>> > switching to a new node when I get an exception.  As the job runs
>>> >>> > over a few hours, the Cassandra daemons eventually fall over,
>>> >>> > either with no error in the log or reporting that they are out of
>>> >>> > heap.
>>> >>> >
>>> >>> > Each daemon is getting 6GB of RAM and has scads of disk space to
>>> >>> > play with.  I've set the storage-conf.xml to take 256MB in a
>>> >>> > memtable before flushing (like the BMT case), and to do batch
>>> >>> > commit log flushes, and to not have any caching in the CFs.  I'm
>>> >>> > sure I must be tuning something wrong.  I would eventually like
>>> >>> > this Cassandra setup to serve a light request load but over say
>>> >>> > 50-100 TB of data.  I'd appreciate any help or advice you can
>>> >>> > offer.
>>> >>> >
>>> >>> > Thanks,
>>> >>> > Ian
>>> >>> >
>>> >>>
>>> >>>
>>> >>>
>>> >>> --
>>> >>> Jonathan Ellis
>>> >>> Project Chair, Apache Cassandra
>>> >>> co-founder of Riptano, the source for professional Cassandra support
>>> >>> http://riptano.com
>>> >>
>>> >
>>> >
>>>
>>>
>>>
>>> --
>>> Jonathan Ellis
>>> Project Chair, Apache Cassandra
>>> co-founder of Riptano, the source for professional Cassandra support
>>> http://riptano.com
>>>
>>
>>
>
