If you read the bugs I linked, you would see that this is expected behavior with 0.7.3 once you get more data than you can index in memory.
You should wait for the next Hudson build (which will include CASSANDRA-2295) and use that. Or, create your indexes before adding the data.

On Thu, Mar 10, 2011 at 12:26 PM, Matt Kennedy <stinkym...@gmail.com> wrote:
> Well, it looks like the index creation job crashed the cluster. All of the
> nodes were down, having dumped out .hprof files. I brought the cluster back
> up and when I do "describe keyspace ks" it looks like the index build
> process has started over again. Is it safe to attempt to stop that by
> running an "update column family" command with fewer indexes defined? Or is
> there a better way to safely terminate this index creation process that I
> assume will crash the cluster again eventually?
>
> Would creating the indexes one at a time help? Or will the same problem
> occur once I get to a certain number of indexes on the column family?
>
> Thanks,
> Matt
>
> On Wed, Mar 9, 2011 at 8:40 PM, Jonathan Ellis <jbel...@gmail.com> wrote:
>>
>> https://issues.apache.org/jira/browse/CASSANDRA-2294
>> https://issues.apache.org/jira/browse/CASSANDRA-2295
>>
>> On Wed, Mar 9, 2011 at 5:47 PM, Matt Kennedy <stinkym...@gmail.com> wrote:
>> > I'm trying to gain some insight into what happens with a cluster when
>> > indexes are being built, or when CFs with indexed columns are being
>> > written to.
>> >
>> > Over the past couple of days we've been doing some loads into a CF with
>> > 29 indexed columns. Eventually, the nodes just got overwhelmed and the
>> > client (Hector) started getting timeouts. We were using a MapReduce job
>> > to load an HDFS file into Cassandra, though we had limited the load job
>> > to one task per node. My confusion comes from how difficult it was to
>> > know that the nodes were becoming overwhelmed. The ring consistently
>> > reported that all nodes were up and it did not appear that there were
>> > pending operations under tpstats. I also monitor this cluster with
>> > Ganglia, and at no point did any of the machine loads appear very high
>> > at all, yet our job kept failing with Hector reporting timeouts.
>> >
>> > Today we decided to leave index creation until the end, and just load
>> > the data using the same Hector code. We bumped up the Hadoop concurrency
>> > to two concurrent tasks per node, and everything went fine, as expected;
>> > we've done much larger loads than this using Hadoop, and as long as you
>> > don't shoot for too much concurrency, Cassandra can deal with it. So now
>> > we have the data in the column family and I updated the column family
>> > metadata in the CLI to enable the 29 indexes. As soon as I do that, the
>> > ring starts reporting that nodes are down intermittently, and
>> > HintedHandoffs are starting to accumulate under tpstats. Ganglia is
>> > reporting very low overall load, so I'm wondering why it's taking so
>> > long for CLI and nodetool commands to return.
>> >
>> > I'm just trying to get a better handle on what kind of actions have a
>> > serious impact on cluster availability and to know the right places to
>> > look to try to get ahead of those conditions.
>> >
>> > Thanks for any insight you can provide,
>> > Matt
>>
>> --
>> Jonathan Ellis
>> Project Chair, Apache Cassandra
>> co-founder of DataStax, the source for professional Cassandra support
>> http://www.datastax.com

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com