Re: Understanding index builds (updated: crashed cluster)

Jonathan Ellis Thu, 10 Mar 2011 13:58:08 -0800

Drop the index, then restart once more.  It shouldn't try to rebuild
the index after that.


On Thu, Mar 10, 2011 at 3:36 PM, Matt Kennedy <stinkym...@gmail.com> wrote:
> Sorry, I wasn't clear on the timeline of events.  I started the index build
> and then posted this message to the list. Once I read the links you posted,
> I did expect the cluster to crash, but I let it run until it blew up anyway,
> since I didn't really know how to stop the index build.
>
> Which is sort of where I'm still stuck: I don't want to corrupt that column
> family by issuing an "update column family" that has a smaller set of
> indexes while the index build is going on, at least not without some
> encouragement from the list that doing so won't wreck the column family.
> Is there a safe way to tell an index build to stop after the cluster starts
> up from a crash caused by the index build?
>
> Thanks,
> Matt
>
> On Thu, Mar 10, 2011 at 1:40 PM, Jonathan Ellis <jbel...@gmail.com> wrote:
>>
>> If you read the bugs I linked, you would see that this is expected
>> behavior with 0.7.3 once you get more data than you can index
>> in-memory.
>>
>> You should wait for the next Hudson build (which will include 2295)
>> and use that.  Or, create your indexes before adding the data.
>>
>> On Thu, Mar 10, 2011 at 12:26 PM, Matt Kennedy <stinkym...@gmail.com> wrote:
>> > Well it looks like the index creation job crashed the cluster.  All of the
>> > nodes were down having dumped out .hprof files.  I brought the cluster back
>> > up and when I do "describe keyspace ks" it looks like the index build
>> > process has started over again.  Is it safe to attempt to stop that by
>> > running an "update column family" command with fewer indexes defined? Or is
>> > there a better way to safely terminate this index creation process that I
>> > assume will crash the cluster again eventually?
>> >
>> > Would creating the indexes one at a time help? Or will the same problem
>> > occur once I get to a certain number of indexes on the column family?
>> >
>> > Thanks,
>> > Matt
>> >
>> > On Wed, Mar 9, 2011 at 8:40 PM, Jonathan Ellis <jbel...@gmail.com> wrote:
>> >>
>> >> https://issues.apache.org/jira/browse/CASSANDRA-2294
>> >> https://issues.apache.org/jira/browse/CASSANDRA-2295
>> >>
>> >> On Wed, Mar 9, 2011 at 5:47 PM, Matt Kennedy <stinkym...@gmail.com> wrote:
>> >> > I'm trying to gain some insight into what happens with a cluster when
>> >> > indexes are being built, or when CFs with indexed columns are being
>> >> > written to.
>> >> >
>> >> > Over the past couple of days we've been doing some loads into a CF with
>> >> > 29 indexed columns.  Eventually, the nodes just got overwhelmed and the
>> >> > client (Hector) started getting timeouts.  We were using a MapReduce job
>> >> > to load an HDFS file into Cassandra, though we had limited the load job
>> >> > to one task per node.  My confusion comes from how difficult it was to
>> >> > know that the nodes were becoming overwhelmed.  The ring consistently
>> >> > reported that all nodes were up and it did not appear that there were
>> >> > pending operations under tpstats.  I also monitor this cluster with
>> >> > Ganglia, and at no point did any of the machine loads appear very high
>> >> > at all, yet our job kept failing with Hector reporting timeouts.
>> >> >
>> >> > Today we decided to leave index creation until the end, and just load
>> >> > the data using the same Hector code.  We bumped up the Hadoop
>> >> > concurrency to two concurrent tasks per node, and everything went fine.
>> >> > That was expected: we've done much larger loads than this using Hadoop,
>> >> > and as long as you don't shoot for too much concurrency, Cassandra can
>> >> > deal with it.  So now we have the data in the column family, and I
>> >> > updated the column family metadata in the CLI to enable the 29 indexes.
>> >> > As soon as I did that, the ring started reporting that nodes are down
>> >> > intermittently, and HintedHandoffs started to accumulate under tpstats.
>> >> > Ganglia is reporting very low overall load, so I'm wondering why it's
>> >> > taking so long for CLI and nodetool commands to return.
>> >> >
>> >> > I'm just trying to get a better handle on what kind of actions have a
>> >> > serious impact on cluster availability and to know the right places to
>> >> > look to try to get ahead of those conditions.
>> >> >
>> >> > Thanks for any insight you can provide,
>> >> > Matt
>> >> >
>> >
>> >
>
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
