Re: Understanding index builds (updated: crashed cluster)

Jonathan Ellis Thu, 10 Mar 2011 10:41:45 -0800

If you read the bugs I linked, you would see that this is expected
behavior with 0.7.3 once you get more data than you can index
in-memory.


You should wait for the next Hudson build (which will include
CASSANDRA-2295) and use that.  Or, create your indexes before adding
the data.

On Thu, Mar 10, 2011 at 12:26 PM, Matt Kennedy <stinkym...@gmail.com> wrote:
> Well it looks like the index creation job crashed the cluster.  All of the
> nodes were down having dumped out .hprof files.  I brought the cluster back
> up and when I do "describe keyspace ks" it looks like the index build
> process has started over again.  Is it safe to attempt to stop that by
> running an "update column family" command with fewer indexes defined?  Or is
> there a better way to safely terminate this index creation process that I
> assume will crash the cluster again eventually?
>
> Would creating the indexes one at a time help? Or will the same problem
> occur once I get to a certain number of indexes on the column family?
>
> Thanks,
> Matt
>
> On Wed, Mar 9, 2011 at 8:40 PM, Jonathan Ellis <jbel...@gmail.com> wrote:
>>
>> https://issues.apache.org/jira/browse/CASSANDRA-2294
>> https://issues.apache.org/jira/browse/CASSANDRA-2295
>>
>> On Wed, Mar 9, 2011 at 5:47 PM, Matt Kennedy <stinkym...@gmail.com> wrote:
>> > I'm trying to gain some insight into what happens with a cluster when
>> > indexes are being built, or when CFs with indexed columns are being
>> > written to.
>> >
>> > Over the past couple of days we've been doing some loads into a CF with
>> > 29 indexed columns.  Eventually, the nodes just got overwhelmed and the
>> > client (Hector) started getting timeouts.  We were using a MapReduce job
>> > to load an HDFS file into Cassandra, though we had limited the load job
>> > to one task per node.  My confusion comes from how difficult it was to
>> > know that the nodes were becoming overwhelmed.  The ring consistently
>> > reported that all nodes were up and it did not appear that there were
>> > pending operations under tpstats.  I also monitor this cluster with
>> > Ganglia, and at no point did any of the machine loads appear very high
>> > at all, yet our job kept failing with Hector reporting timeouts.
>> >
>> > Today we decided to leave index creation until the end, and just load
>> > the data using the same Hector code.  We bumped up the Hadoop concurrency
>> > to two concurrent tasks per node, and everything went fine.  That was
>> > expected: we've done much larger loads than this using Hadoop, and as
>> > long as you don't shoot for too much concurrency, Cassandra can deal with
>> > it.  So now we have the data in the column family and I updated the
>> > column family metadata in the CLI to enable the 29 indexes.  As soon as I
>> > do that, the ring starts reporting that nodes are down intermittently,
>> > and HintedHandoffs are starting to accumulate under tpstats.  Ganglia is
>> > reporting very low overall load, so I'm wondering why it's taking so long
>> > for cli and nodetool commands to return.
>> >
>> > I'm just trying to get a better handle on what kind of actions have a
>> > serious impact on cluster availability and to know the right places to
>> > look to try to get ahead of those conditions.
>> >
>> > Thanks for any insight you can provide,
>> > Matt
>> >
>>
>>
>>
>> --
>> Jonathan Ellis
>> Project Chair, Apache Cassandra
>> co-founder of DataStax, the source for professional Cassandra support
>> http://www.datastax.com
>
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
