Re: Understanding index builds (updated: crashed cluster)

Matt Kennedy Thu, 10 Mar 2011 13:36:56 -0800

Sorry, I wasn't clear on the timeline of events.  I started the index build
and then posted this message to the list. Once I read the links you posted,
I did expect the cluster to crash, but I let it run until it blew up anyway,
since I didn't really know how to stop the index build.


Which is sort of where I'm still stuck. I don't want to corrupt that column
family by issuing an "update column family" that has a smaller set of
indexes while the index build is going on, at least not without some
encouragement from the list that doing so won't wreck the column family. Is
there a safe way to tell an index build to stop after the cluster starts up
from a crash due to the index build?

Thanks,
Matt

On Thu, Mar 10, 2011 at 1:40 PM, Jonathan Ellis <jbel...@gmail.com> wrote:

> If you read the bugs I linked, you would see that this is expected
> behavior with 0.7.3 once you get more data than you can index
> in-memory.
>
> You should wait for the next Hudson build (which will include 2295)
> and use that.  Or, create your indexes before adding the data.
>
> On Thu, Mar 10, 2011 at 12:26 PM, Matt Kennedy <stinkym...@gmail.com>
> wrote:
> > Well it looks like the index creation job crashed the cluster.  All of
> > the nodes were down having dumped out .hprof files.  I brought the
> > cluster back up and when I do "describe keyspace ks" it looks like the
> > index build process has started over again.  Is it safe to attempt to
> > stop that by running an "update column family" command with fewer
> > indexes defined?  Or is there a better way to safely terminate this
> > index creation process that I assume will crash the cluster again
> > eventually?
> >
> > Would creating the indexes one at a time help? Or will the same problem
> > occur once I get to a certain number of indexes on the column family?
> >
> > Thanks,
> > Matt
> >
> > On Wed, Mar 9, 2011 at 8:40 PM, Jonathan Ellis <jbel...@gmail.com>
> > wrote:
> >>
> >> https://issues.apache.org/jira/browse/CASSANDRA-2294
> >> https://issues.apache.org/jira/browse/CASSANDRA-2295
> >>
> >> On Wed, Mar 9, 2011 at 5:47 PM, Matt Kennedy <stinkym...@gmail.com>
> >> wrote:
> >> > I'm trying to gain some insight into what happens with a cluster when
> >> > indexes are being built, or when CFs with indexed columns are being
> >> > written to.
> >> >
> >> > Over the past couple of days we've been doing some loads into a CF
> >> > with 29 indexed columns.  Eventually, the nodes just got overwhelmed
> >> > and the client (Hector) started getting timeouts.  We were using a
> >> > MapReduce job to load an HDFS file into Cassandra, though we had
> >> > limited the load job to one task per node.  My confusion comes from
> >> > how difficult it was to know that the nodes were becoming
> >> > overwhelmed.  The ring consistently reported that all nodes were up
> >> > and it did not appear that there were pending operations under
> >> > tpstats.  I also monitor this cluster with Ganglia, and at no point
> >> > did any of the machine loads appear very high at all, yet our job
> >> > kept failing with Hector reporting timeouts.
> >> >
> >> > Today we decided to leave index creation until the end, and just load
> >> > the data using the same Hector code.  We bumped up the Hadoop
> >> > concurrency to two concurrent tasks per node, and everything went
> >> > fine, as expected: we've done much larger loads than this using
> >> > Hadoop, and as long as you don't shoot for too much concurrency,
> >> > Cassandra can deal with it.  So now we have the data in the column
> >> > family and I updated the column family metadata in the CLI to enable
> >> > the 29 indexes.  As soon as I do that, the ring starts reporting that
> >> > nodes are down intermittently, and HintedHandoffs are starting to
> >> > accumulate under tpstats.  Ganglia is reporting very low overall
> >> > load, so I'm wondering why it's taking so long for CLI and nodetool
> >> > commands to return.
> >> >
> >> > I'm just trying to get a better handle on what kind of actions have a
> >> > serious impact on cluster availability and to know the right places
> >> > to look to try to get ahead of those conditions.
> >> >
> >> > Thanks for any insight you can provide,
> >> > Matt
> >> >
> >>
> >>
> >>
> >> --
> >> Jonathan Ellis
> >> Project Chair, Apache Cassandra
> >> co-founder of DataStax, the source for professional Cassandra support
> >> http://www.datastax.com
> >
> >
>
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com
>
