Re: Possibly losing data with corrupted SSTables

Rahul Menon Wed, 29 Jan 2014 09:42:14 -0800

Francisco,

the sstables named *-ib-* were written by an earlier version of
c*. The *-ib-* naming convention started at c* 1.2.1, but I believe later
1.2.x releases (1.2.5 onwards) write *-ic-*. You could try running
'nodetool upgradesstables', which should rewrite the *-ib-* sstables as
*-ic-*.
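
If it helps, here is a rough Python sketch for spotting which files still
carry an older version tag. It assumes the 1.2-style file name layout
<keyspace>-<columnfamily>-<version>-<generation>-<component> seen in your
listing. The function names are made up for illustration, and the two-letter
version tags are simply compared as strings.

```python
import re
from typing import NamedTuple, Optional

# Assumed layout: <keyspace>-<columnfamily>-<version>-<generation>-<component>
SSTABLE_NAME = re.compile(
    r"^(?P<keyspace>[^-]+)-(?P<cf>[^-]+)-(?P<version>[a-z]{2})"
    r"-(?P<generation>\d+)-(?P<component>.+)$"
)


class SSTableFile(NamedTuple):
    keyspace: str
    cf: str
    version: str
    generation: int
    component: str


def parse_sstable_name(filename: str) -> Optional[SSTableFile]:
    """Split a file name into its parts, or return None if it does not match."""
    m = SSTABLE_NAME.match(filename)
    if m is None:
        return None
    return SSTableFile(
        m["keyspace"], m["cf"], m["version"], int(m["generation"]), m["component"]
    )


def older_than(filenames, current_version="ic"):
    """Return sorted (cf, version, generation) for sstables older than current_version."""
    found = set()
    for name in filenames:
        parsed = parse_sstable_name(name)
        # Plain string comparison: "ia" < "ib" < "ic" (an assumption of this sketch).
        if parsed is not None and parsed.version < current_version:
            found.add((parsed.cf, parsed.version, parsed.generation))
    return sorted(found)


if __name__ == "__main__":
    listing = [
        "Sessions-Users-ib-2516-Data.db",
        "Sessions-Users-ib-2516-Index.db",
        "Sessions-Users-ic-2933-Data.db",
    ]
    for cf, version, generation in older_than(listing):
        print(f"{cf}: generation {generation} is still version {version}")
```

Anything it reports would be a candidate for the upgrade step, nothing more.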


Rahul

On Wed, Jan 29, 2014 at 12:55 AM, Francisco Nogueira Calmon Sobral <
fsob...@igcorp.com.br> wrote:

> Dear experts,
>
> We are facing an annoying problem in our cluster.
>
> We have 9 amazon extra large linux nodes, running Cassandra 1.2.11.
>
> The short story is that after moving the data from one cluster to another,
> we've been unable to run 'nodetool repair'. It gets stuck due to a
> CorruptSSTableException in some nodes and CFs. After looking at some
> problematic CFs, we observed that some of their sstables have root
> permissions instead of cassandra permissions. Also, their names are
> different from the 'good' ones, as we can see below:
>
> BAD
> ------
> -rw-r--r-- 8 cassandra cassandra 991M Nov  8 15:11 Sessions-Users-ib-2516-Data.db
> -rw-r--r-- 8 cassandra cassandra 703M Nov  8 15:11 Sessions-Users-ib-2516-Index.db
> -rw-r--r-- 8 cassandra cassandra 5.3M Nov 13 11:42 Sessions-Users-ib-2516-Summary.db
>
> GOOD
> ---------
> -rw-r--r-- 1 cassandra cassandra  22K Jan 15 10:50 Sessions-Users-ic-2933-CompressionInfo.db
> -rw-r--r-- 1 cassandra cassandra 106M Jan 15 10:50 Sessions-Users-ic-2933-Data.db
> -rw-r--r-- 1 cassandra cassandra 2.2M Jan 15 10:50 Sessions-Users-ic-2933-Filter.db
> -rw-r--r-- 1 cassandra cassandra  76M Jan 15 10:50 Sessions-Users-ic-2933-Index.db
> -rw-r--r-- 1 cassandra cassandra 4.3K Jan 15 10:50 Sessions-Users-ic-2933-Statistics.db
> -rw-r--r-- 1 cassandra cassandra 574K Jan 15 10:50 Sessions-Users-ic-2933-Summary.db
> -rw-r--r-- 1 cassandra cassandra   79 Jan 15 10:50 Sessions-Users-ic-2933-TOC.txt
>
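The BAD sstable in the listing has only Data, Index and Summary components,
while the GOOD one has seven. A small sketch for listing what each sstable
lacks relative to the GOOD one follows. The expected set is just what appears
in the GOOD listing, not an authoritative list, and the function name is
hypothetical. As far as I know CompressionInfo.db only exists for compressed
CFs, so its absence alone may mean little.

```python
from collections import defaultdict

# Components seen for the GOOD sstable in the listing; an assumption, not a spec.
COMPONENTS_SEEN_IN_GOOD = frozenset({
    "CompressionInfo.db", "Data.db", "Filter.db", "Index.db",
    "Statistics.db", "Summary.db", "TOC.txt",
})


def missing_components(filenames, expected=COMPONENTS_SEEN_IN_GOOD):
    """Map each sstable prefix to the sorted list of expected components it lacks."""
    present = defaultdict(set)
    for name in filenames:
        # Everything before the last '-' identifies the sstable; the rest is the component.
        prefix, sep, component = name.rpartition("-")
        if sep:
            present[prefix].add(component)
    return {
        prefix: sorted(expected - components)
        for prefix, components in present.items()
        if expected - components
    }


if __name__ == "__main__":
    bad = [
        "Sessions-Users-ib-2516-Data.db",
        "Sessions-Users-ib-2516-Index.db",
        "Sessions-Users-ib-2516-Summary.db",
    ]
    for prefix, missing in missing_components(bad).items():
        print(prefix, "is missing", ", ".join(missing))
```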
>
> We changed the permissions back to 'cassandra' and ran 'nodetool scrub' on
> this problematic CF, but it has been running for at least two weeks (it is
> not frozen) and keeps logging many WARNs while working on the
> above-mentioned SSTable:
>
> WARN [CompactionExecutor:15] 2014-01-28 17:01:22,571 OutputHandler.java (line 57) Non-fatal error reading row (stacktrace follows)
> java.io.IOError: java.io.IOException: Impossible row size 3618452438597849419
>         at org.apache.cassandra.db.compaction.Scrubber.scrub(Scrubber.java:171)
>         at org.apache.cassandra.db.compaction.CompactionManager.scrubOne(CompactionManager.java:526)
>         at org.apache.cassandra.db.compaction.CompactionManager.doScrub(CompactionManager.java:515)
>         at org.apache.cassandra.db.compaction.CompactionManager.access$400(CompactionManager.java:70)
>         at org.apache.cassandra.db.compaction.CompactionManager$3.perform(CompactionManager.java:280)
>         at org.apache.cassandra.db.compaction.CompactionManager$2.call(CompactionManager.java:250)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:744)
> Caused by: java.io.IOException: Impossible row size 3618452438597849419
>         ... 10 more
>
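The "impossible" row size looks less random once it is viewed as the eight
bytes a Java long is read from (big-endian). The sketch below does only that
conversion; the function name is made up. For the value in this warning the
bytes are all printable ASCII and spell 27SROEQK. That would be consistent
with the scrubber reading key or column data where it expects a row length,
though that is a guess rather than a diagnosis.

```python
def row_size_as_bytes(row_size: int) -> bytes:
    """Return the 8 big-endian bytes of a signed 64-bit value."""
    return row_size.to_bytes(8, byteorder="big", signed=True)


if __name__ == "__main__":
    raw = row_size_as_bytes(3618452438597849419)
    # Show both the hex form and the raw bytes for eyeballing.
    print(raw.hex(), raw)
```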
>
> 1) I do not think that deleting all data of one node and running 'nodetool
> rebuild' will work, since we observed that this problem occurs in all
> nodes. So we may not be able to restore all the data. What can be done in
> this case?
>
> 2) Why are some sstables owned by 'root'? Is this problem caused
> by our manual migration of data? (see long story below)
>
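Files copied into the data directory by hand while logged in as root would
normally end up owned by root, so the manual migration is a plausible cause.
To see how widespread it is, something like the sketch below can list every
file under a directory that is not owned by the expected user. The path
/var/lib/cassandra/data and the user name 'cassandra' are assumptions about
your setup, and the function name is hypothetical. It is Unix-only because it
uses the pwd module.

```python
import os
import pwd


def files_not_owned_by(root_dir: str, expected_user: str):
    """Return sorted (path, owner) pairs for files under root_dir with a different owner."""
    results = []
    for dirpath, _dirnames, filenames in os.walk(root_dir):
        for name in filenames:
            path = os.path.join(dirpath, name)
            uid = os.lstat(path).st_uid
            try:
                owner = pwd.getpwuid(uid).pw_name
            except KeyError:
                owner = str(uid)  # uid without a passwd entry
            if owner != expected_user:
                results.append((path, owner))
    return sorted(results)


if __name__ == "__main__":
    # Assumed data directory and user; adjust to the actual installation.
    for path, owner in files_not_owned_by("/var/lib/cassandra/data", "cassandra"):
        print(f"{owner}\t{path}")
```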
>
> How did we run into this?
>
> The long story is that we've tried to move our cluster with sstableloader,
> but it was unable to load all the data correctly. Our solution was to put
> ALL cluster data into EACH new node and run 'nodetool refresh'. I performed
> this task for each node and each column family sequentially. Sometimes I
> had to rename some sstables, because they came from different nodes with
> the same name. I don't remember if I ran 'nodetool repair' or even
> 'nodetool cleanup' in each node. Apparently, the process was successful,
> and (almost) all the data was moved.
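
When sstables from different nodes share a name, every component of one
sstable has to move to the same new generation number, otherwise Data, Index
and the rest no longer belong together. The sketch below only plans such
renames for one column family and touches nothing on disk. It assumes the
<keyspace>-<cf>-<version>-<generation>-<component> layout from the listing,
and the (source, filename) input shape and function name are invented for
illustration.

```python
import re

# Assumed layout: <keyspace>-<cf>-<version>-<generation>-<component>
NAME = re.compile(
    r"^(?P<prefix>.+-[a-z]{2})-(?P<generation>\d+)-(?P<component>[^-]+)$"
)


def plan_renames(incoming, taken_generations=()):
    """Plan collision-free names for sstable files of ONE column family.

    incoming: iterable of (source, filename) pairs, e.g. source = node name.
    taken_generations: generation numbers already used in the target directory.
    Returns {(source, filename): new_filename}.
    """
    taken = set(taken_generations)
    assigned = {}  # (source, old generation) -> new generation
    plan = {}
    next_free = 1
    for source, name in sorted(incoming):
        m = NAME.match(name)
        if m is None:
            continue  # not an sstable component; leave it alone
        key = (source, int(m["generation"]))
        if key not in assigned:
            generation = key[1]
            if generation in taken:
                # Pick the smallest generation nobody is using yet.
                while next_free in taken:
                    next_free += 1
                generation = next_free
            taken.add(generation)
            assigned[key] = generation
        plan[(source, name)] = f"{m['prefix']}-{assigned[key]}-{m['component']}"
    return plan


if __name__ == "__main__":
    files = [
        ("node1", "Sessions-Users-ib-2516-Data.db"),
        ("node1", "Sessions-Users-ib-2516-Index.db"),
        ("node2", "Sessions-Users-ib-2516-Data.db"),
    ]
    for (source, old), new in sorted(plan_renames(files, {1, 2}).items()):
        print(f"{source}: {old} -> {new}")
```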
>
> Unfortunately, 3 months after the move, I am unable to perform read
> operations on some keys of some CFs. I think that some of these keys belong
> to the above-mentioned sstables.
>
> Any insights are welcome.
>
> Best regards,
> Francisco Sobral
