This has happened before the save caches files were not compatible between
0.6 and 0.7. I have ran into this a couple other times before. The good
news is the save key cache is just an optimization, you can blow it away
and it is not usually a big deal.




On Fri, Apr 5, 2013 at 2:55 PM, Arya Goudarzi <gouda...@gmail.com> wrote:

> Here is a chunk of bloom filter sstable skip messages from the node I
> enabled DEBUG on:
>
> DEBUG [OptionalTasks:1] 2013-04-04 02:44:01,450 SSTableReader.java (line
> 737) Bloom filter allows skipping sstable 39459
> DEBUG [OptionalTasks:1] 2013-04-04 02:44:01,450 SSTableReader.java (line
> 737) Bloom filter allows skipping sstable 39483
> DEBUG [OptionalTasks:1] 2013-04-04 02:44:01,450 SSTableReader.java (line
> 737) Bloom filter allows skipping sstable 39332
> DEBUG [OptionalTasks:1] 2013-04-04 02:44:01,450 SSTableReader.java (line
> 737) Bloom filter allows skipping sstable 39335
> DEBUG [OptionalTasks:1] 2013-04-04 02:44:01,450 SSTableReader.java (line
> 737) Bloom filter allows skipping sstable 39438
> DEBUG [OptionalTasks:1] 2013-04-04 02:44:01,450 SSTableReader.java (line
> 737) Bloom filter allows skipping sstable 39478
> DEBUG [OptionalTasks:1] 2013-04-04 02:44:01,450 SSTableReader.java (line
> 737) Bloom filter allows skipping sstable 39456
> DEBUG [OptionalTasks:1] 2013-04-04 02:44:01,450 SSTableReader.java (line
> 737) Bloom filter allows skipping sstable 39469
> DEBUG [OptionalTasks:1] 2013-04-04 02:44:01,451 SSTableReader.java (line
> 737) Bloom filter allows skipping sstable 39334
> DEBUG [OptionalTasks:1] 2013-04-04 02:44:01,451 SSTableReader.java (line
> 737) Bloom filter allows skipping sstable 39406
>
> This is the last chunk of log before C* gets stuck, right before I stop
> the process, remove key caches and start again (This is from another node
> that I upgraded 2 days ago):
>
> INFO [SSTableBatchOpen:2] 2013-04-03 01:59:39,769 SSTableReader.java (line
> 166) Opening
> /var/lib/cassandra/data/cardspring_production/UniqueIndexes/cardspring_production-UniqueIndexes-hf-316499
> (5273270 bytes)
>  INFO [SSTableBatchOpen:1] 2013-04-03 01:59:39,858 SSTableReader.java
> (line 166) Opening
> /var/lib/cassandra/data/cardspring_production/UniqueIndexes/cardspring_production-UniqueIndexes-hf-314755
> (5264359 bytes)
>  INFO [SSTableBatchOpen:2] 2013-04-03 01:59:39,894 SSTableReader.java
> (line 166) Opening
> /var/lib/cassandra/data/cardspring_production/UniqueIndexes/cardspring_production-UniqueIndexes-hf-314762
> (5260887 bytes)
>  INFO [SSTableBatchOpen:1] 2013-04-03 01:59:39,980 SSTableReader.java
> (line 166) Opening
> /var/lib/cassandra/data/cardspring_production/UniqueIndexes/cardspring_production-UniqueIndexes-hf-315886
> (5262864 bytes)
>  INFO [OptionalTasks:1] 2013-04-03 01:59:40,298 AutoSavingCache.java (line
> 112) reading saved cache
> /var/lib/cassandra/saved_caches/cardspring_production-UniqueIndexes-KeyCache
>
>
> I finally upgrade all 12 nodes in our test environment yesterday. This
> issue seemed to exists on 7 nodes out of 12 nodes. They didn't alway get
> stuck on the same CF loading its saved KeyCache.
>
>
> On Fri, Apr 5, 2013 at 9:56 AM, aaron morton <aa...@thelastpickle.com>wrote:
>
>> skipping sstable due to bloom filter debug messages
>>
>> What were these messages?
>>
>> Do you have the logs from the start up ?
>>
>> Cheers
>>
>>    -----------------
>> Aaron Morton
>> Freelance Cassandra Consultant
>> New Zealand
>>
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 4/04/2013, at 6:11 AM, Arya Goudarzi <gouda...@gmail.com> wrote:
>>
>> Hi,
>>
>> I have upgraded 2 nodes out of a 12 mode test cluster from 1.1.10 to
>> 1.2.3. During startup while tailing C*'s system.log, I observed a series of
>> SSTable batch load messages and skipping sstable due to bloom filter debug
>> messages which is normal for startup, but when it reached loading saved key
>> caches, it gets stuck forever. The I/O wait stays high in the CPU graph and
>> I/O ops are sent to disk, but C* never passes that step of loading the key
>> cache file successfully. The saved key cache file was about 75MB on one
>> node and 125MB on the other node and they were for different CFs.
>>
>> <image.jpeg>
>>
>> The CPU I/O wait constantly stayed at 40%~ while system.log was stuck at
>> loading one saved key cache file. I have marked that on the graph above.
>> The workaround was to delete the saved cache files and things loaded fine
>> (See marked Normal Startup).
>>
>> These machines are m1.xlarge EC2 instances. And this issue happened on
>> both nodes upgraded. This did not happen during exercise of upgrade from
>> 1.1.6 to 1.2.2 using the same snapshot.
>>
>> Should I raise a JIRA?
>>
>> -Arya
>>
>>
>>
>

Reply via email to