Drop the index, then restart once more. It shouldn't try to rebuild the index after that.
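
For anyone following along, here is a minimal sketch of what "drop the index" could look like in practice. It assumes the 0.7-era cassandra-cli syntax, where an index is removed by re-issuing `update column family` with `column_metadata` entries that no longer carry `index_type`. The helper `drop_index_statement`, the column family name `Events`, and the column names are hypothetical, for illustration only. The helper only builds the statement text to paste into the CLI; it does not connect to the cluster.

```python
def drop_index_statement(cf_name, columns, keep_indexed=()):
    """Build a cassandra-cli 'update column family' statement.

    columns      -- list of (column_name, validation_class) pairs
    keep_indexed -- names of columns that should stay indexed; every other
                    column is emitted without index_type, which (assuming
                    0.7-era CLI behaviour) drops its index
    """
    entries = []
    for name, validation_class in columns:
        entry = "{column_name: %s, validation_class: %s" % (name, validation_class)
        if name in keep_indexed:
            # Assumption: KEYS is the only index type available in 0.7.
            entry += ", index_type: KEYS"
        entry += "}"
        entries.append(entry)
    return "update column family %s with column_metadata=[%s];" % (
        cf_name, ", ".join(entries))


# Hypothetical column family and columns, with every index dropped.
print(drop_index_statement("Events", [("state", "UTF8Type"), ("ts", "LongType")]))
```

Check the generated statement against `help update column family;` in your own CLI version before running it.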

On Thu, Mar 10, 2011 at 3:36 PM, Matt Kennedy <stinkym...@gmail.com> wrote:
> Sorry, I wasn't clear on the timeline of events. I started the index build
> and then posted this message to the list. Once I read the links you posted,
> I did expect the cluster to crash, but I let it run until it blew up anyway,
> since I didn't really know how to stop the index build.
>
> Which is sort of where I'm still stuck: I don't want to corrupt that column
> family by issuing an "update column family" that has a smaller set of
> indexes while the index build is going on without some encouragement from
> the list that doing that won't wreck the column family. Is there a safe way
> to tell an index build to stop after the cluster starts up from a crash due
> to the index build?
>
> Thanks,
> Matt
>
> On Thu, Mar 10, 2011 at 1:40 PM, Jonathan Ellis <jbel...@gmail.com> wrote:
>>
>> If you read the bugs I linked, you would see that this is expected
>> behavior with 0.7.3 once you get more data than you can index
>> in-memory.
>>
>> You should wait for the next Hudson build (which will include 2295)
>> and use that. Or, create your indexes before adding the data.
>>
>> On Thu, Mar 10, 2011 at 12:26 PM, Matt Kennedy <stinkym...@gmail.com>
>> wrote:
>> > Well it looks like the index creation job crashed the cluster. All of
>> > the nodes were down having dumped out .hprof files. I brought the
>> > cluster back up and when I do "describe keyspace ks" it looks like the
>> > index build process has started over again. Is it safe to attempt to
>> > stop that by running an "update column family" command with fewer
>> > indexes defined? Or is there a better way to safely terminate this
>> > index creation process that I assume will crash the cluster again
>> > eventually?
>> >
>> > Would creating the indexes one at a time help? Or will the same problem
>> > occur once I get to a certain number of indexes on the column family?
>> >
>> > Thanks,
>> > Matt
>> >
>> > On Wed, Mar 9, 2011 at 8:40 PM, Jonathan Ellis <jbel...@gmail.com>
>> > wrote:
>> >>
>> >> https://issues.apache.org/jira/browse/CASSANDRA-2294
>> >> https://issues.apache.org/jira/browse/CASSANDRA-2295
>> >>
>> >> On Wed, Mar 9, 2011 at 5:47 PM, Matt Kennedy <stinkym...@gmail.com>
>> >> wrote:
>> >> > I'm trying to gain some insight into what happens with a cluster
>> >> > when indexes are being built, or when CFs with indexed columns are
>> >> > being written to.
>> >> >
>> >> > Over the past couple of days we've been doing some loads into a CF
>> >> > with 29 indexed columns. Eventually, the nodes just got overwhelmed
>> >> > and the client (Hector) started getting timeouts. We were using a
>> >> > MapReduce job to load an HDFS file into Cassandra, though we had
>> >> > limited the load job to one task per node. My confusion comes from
>> >> > how difficult it was to know that the nodes were becoming
>> >> > overwhelmed. The ring consistently reported that all nodes were up
>> >> > and it did not appear that there were pending operations under
>> >> > tpstats. I also monitor this cluster with Ganglia, and at no point
>> >> > did any of the machine loads appear very high at all, yet our job
>> >> > kept failing with Hector reporting timeouts.
>> >> >
>> >> > Today we decided to leave index creation until the end, and just
>> >> > load the data using the same Hector code. We bumped up the Hadoop
>> >> > concurrency to two concurrent tasks per node, and everything went
>> >> > fine, as expected; we've done much larger loads than this using
>> >> > Hadoop, and as long as you don't shoot for too much concurrency,
>> >> > Cassandra can deal with it. So now we have the data in the column
>> >> > family and I updated the column family metadata in the CLI to
>> >> > enable the 29 indexes. As soon as I do that, the ring starts
>> >> > reporting that nodes are down intermittently, and HintedHandoffs
>> >> > are starting to accumulate under tpstats. Ganglia is reporting very
>> >> > low overall load, so I'm wondering why it's taking so long for CLI
>> >> > and nodetool commands to return.
>> >> >
>> >> > I'm just trying to get a better handle on what kind of actions have
>> >> > a serious impact on cluster availability and to know the right
>> >> > places to look to try to get ahead of those conditions.
>> >> >
>> >> > Thanks for any insight you can provide,
>> >> > Matt

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com