Re: Corrupt SSTABLE over and over

Alaa Zubaidi (PDF) Fri, 12 Aug 2016 17:20:40 -0700

Hi Bryan,

Changing disk_failure_policy to best_effort and running nodetool scrub
did not work; it generated another error:
java.nio.file.AccessDeniedException


I also tried removing all files (data, commitlog, saved_caches) and
restarting the node fresh, and I am still getting corruption.

And there is still nothing that indicates a HW issue.
All other nodes are fine.

Regards,
Alaa


On Fri, Aug 12, 2016 at 12:00 PM, Bryan Cheng <[email protected]> wrote:

> Should also add that if the scope of corruption is _very_ large, and you
> have a good, aggressive repair policy (read: you are confident in the
> consistency of the data elsewhere in the cluster), you may just want to
> decommission and rebuild that node.
>
> On Fri, Aug 12, 2016 at 11:55 AM, Bryan Cheng <[email protected]> wrote:
>
>> Looks like you're doing the offline scrub; have you tried the online one?
>>
>> Here's my typical process for corrupt SSTables.
>>
>> With disk_failure_policy set to stop, examine the failing sstables. If
>> they are very small (in the range of kbs), it is unlikely that there is any
>> salvageable data there. Just delete them, start the machine, and schedule a
>> repair ASAP.
>>
>> If they are large, then it may be worth salvaging. If the scope of
>> corruption is reasonable (limited to a few sstables scattered among
>> different keyspaces), set disk_failure_policy to best_effort, start the
>> machine up, and run nodetool scrub. This is the online scrub, which is faster
>> than the offline scrub (at least as of 2.1.12, the last time I had to do this).
>>
>> Only if all else fails, attempt the very painful offline sstablescrub.
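
A minimal sketch of the size-based triage step described above, assuming Python 3 and that you pass in the paths of the `*-Data.db` files named in the corruption errors. The 64 KiB cut-off (`SMALL_SSTABLE_BYTES`) is a hypothetical stand-in for "in the range of kbs"; it is not a Cassandra setting. The script only reports and deletes nothing. Keep in mind that an SSTable is several component files sharing one prefix (for example `la-4886-big-`), so removing one means removing all of its components while the node is stopped.

```python
import os
import sys

# Hypothetical cut-off standing in for "in the range of kbs"; not a Cassandra setting.
SMALL_SSTABLE_BYTES = 64 * 1024


def triage(sizes, small_threshold=SMALL_SSTABLE_BYTES):
    """Split {path: size_in_bytes} into (delete_then_repair, try_online_scrub)."""
    delete_then_repair = sorted(p for p, s in sizes.items() if s < small_threshold)
    try_online_scrub = sorted(p for p, s in sizes.items() if s >= small_threshold)
    return delete_then_repair, try_online_scrub


def main(paths):
    # paths: the *-Data.db files reported as corrupt in the Cassandra log
    sizes = {path: os.path.getsize(path) for path in paths}
    small, large = triage(sizes)
    for path in small:
        # Small: little worth salvaging; remove every component of this SSTable
        # (same prefix) with the node stopped, start it, then schedule a repair.
        print(f"delete + repair  {sizes[path]:>12} B  {path}")
    for path in large:
        # Large: candidate for disk_failure_policy: best_effort + nodetool scrub.
        print(f"online scrub     {sizes[path]:>12} B  {path}")


if __name__ == "__main__":
    main(sys.argv[1:])
```

Usage would look like `python triage_sstables.py <path-to>\la-4886-big-Data.db` (the script name is hypothetical). Reporting rather than deleting is deliberate: the decision to drop data should stay with the operator.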
>>
>> Is the VMware guest running Windows? (Trying to make sure it's not just the
>> host.) YMMV, but in the past Windows was somewhat of a neglected platform
>> for Cassandra. I think you'd have a much easier time getting help if running
>> Linux is an option here.
>>
>>
>>
>> On Fri, Aug 12, 2016 at 9:16 AM, Alaa Zubaidi (PDF) <[email protected]> wrote:
>>
>>> Hi Jason,
>>>
>>> Thanks for your input...
>>> That's what I am afraid of.
>>> Did you find any HW errors in the VMware and HW logs, or any indication that
>>> the HW is the cause? I need to be sure that this is the reason before
>>> asking the customer to spend more money.
>>>
>>> Thanks,
>>> Alaa
>>>
>>> On Thu, Aug 11, 2016 at 11:02 PM, Jason Wee <[email protected]> wrote:
>>>
>>>> Is Cassandra running on a virtual server (VMware)?
>>>>
>>>> > I tried sstablescrub but it crashed with hs-err-pid-...
>>>> Maybe try allocating a larger heap to sstablescrub.
>>>>
>>>> I ran into this SSTable corruption as well (on Cassandra 1.2). First I
>>>> tried nodetool scrub; it still persisted. Then I tried the offline
>>>> sstablescrub; it still persisted. I wiped the node and it happened again.
>>>> Then I changed the hardware (disk and memory), and things went fine.
>>>>
>>>> hth
>>>>
>>>> jason
>>>>
>>>>
>>>> On Fri, Aug 12, 2016 at 9:20 AM, Alaa Zubaidi (PDF) <[email protected]> wrote:
>>>> > Hi,
>>>> >
>>>> > I have a 16-node cluster, Cassandra 2.2.1 on Windows, local installation
>>>> > (NOT in the cloud),
>>>> >
>>>> > and I am getting:
>>>> > ERROR [CompactionExecutor:2] 2016-08-12 06:51:52,983 CassandraDaemon.java:183 - Exception in thread Thread[CompactionExecutor:2,1,main]
>>>> > org.apache.cassandra.io.FSReadError: org.apache.cassandra.io.sstable.CorruptSSTableException: org.apache.cassandra.io.compress.CorruptBlockException: (E:\........\la-4886-big-Data.db): corruption detected, chunk at 4969092 of length 10208.
>>>> >     at org.apache.cassandra.io.util.RandomAccessReader.readBytes(RandomAccessReader.java:357) ~[apache-cassandra-2.2.1.jar:2.2.1]
>>>> > ....
>>>> > ....
>>>> > ERROR [CompactionExecutor:2] ....... FileUtils.java:463 - Exiting forcefully due to file system exception on startup, disk failure policy "stop"
>>>> >
>>>> > I tried sstablescrub, but it crashed with hs-err-pid-...
>>>> > I removed the corrupted file and started the node again; after one day
>>>> > the corruption came back. I removed the files and restarted Cassandra,
>>>> > and it worked for a few days. Then I ran "nodetool repair"; after it
>>>> > finished, Cassandra failed again, this time with commitlog corruption.
>>>> > After I removed the commitlog files, it failed again with another
>>>> > SSTable corruption.
>>>> >
>>>> > I was also checking the HW, file system, and memory. The VMware logs
>>>> > showed no HW errors, and the HW management logs showed NO problems or
>>>> > issues. I also checked the Windows logs (Application and System); the
>>>> > only thing I found, in the System log, is "Cassandra Service terminated
>>>> > with service-specific error Cannot create another system semaphore."
>>>> >
>>>> > I could not find anything regarding that error; all comments point to
>>>> > the application log.
>>>> >
>>>> > Any help is appreciated.
>>>> >
>>>> > --
>>>> >
>>>> > Alaa Zubaidi
>>>> >
>>>> >
>>>> > This message may contain confidential and privileged information. If it
>>>> > has been sent to you in error, please reply to advise the sender of the
>>>> > error and then immediately permanently delete it and all attachments to
>>>> > it from your systems. If you are not the intended recipient, do not read,
>>>> > copy, disclose or otherwise use this message or any attachments to it.
>>>> > The sender disclaims any liability for such unauthorized use. PLEASE NOTE
>>>> > that all incoming e-mails sent to PDF e-mail accounts will be archived
>>>> > and may be scanned by us and/or by external service providers to detect
>>>> > and prevent threats to our systems, investigate illegal or inappropriate
>>>> > behavior, and/or eliminate unsolicited promotional e-mails (“spam”). If
>>>> > you have any concerns about this process, please contact us at
>>>> > [email protected].
>>>>
>>>
>>>
>>>
>>> --
>>>
>>> Alaa Zubaidi
>>> PDF Solutions, Inc.
>>> 333 West San Carlos Street, Suite 1000
>>> San Jose, CA 95110  USA
>>> Tel: 408-283-5639
>>> fax: 408-938-6479
>>> email: [email protected]
>>>
>>>
>>>
>>
>>
>

