Hi Bryan,

Changing disk_failure_policy to best_effort and running nodetool scrub did not work; it generated another error: java.nio.file.AccessDeniedException.
I also tried removing all files (data, commitlog, saved_caches) and restarting the node fresh, and I am still getting corruption. There is still nothing that indicates a HW issue. All other nodes are fine.

Regards,
Alaa

On Fri, Aug 12, 2016 at 12:00 PM, Bryan Cheng <br...@blockcypher.com> wrote:

> Should also add that if the scope of corruption is _very_ large, and you
> have a good, aggressive repair policy (read: you are confident in the
> consistency of the data elsewhere in the cluster), you may just want to
> decommission and rebuild that node.
>
> On Fri, Aug 12, 2016 at 11:55 AM, Bryan Cheng <br...@blockcypher.com>
> wrote:
>
>> Looks like you're doing the offline scrub; have you tried online?
>>
>> Here's my typical process for corrupt SSTables.
>>
>> With disk_failure_policy set to stop, examine the failing sstables. If
>> they are very small (in the range of kbs), it is unlikely that there is any
>> salvageable data there. Just delete them, start the machine, and schedule a
>> repair ASAP.
>>
>> If they are large, then it may be worth salvaging. If the scope of
>> corruption is reasonable (limited to a few sstables scattered among
>> different keyspaces), set disk_failure_policy to best_effort, start the
>> machine up, and run nodetool scrub. This is the online scrub, which is faster
>> than offline scrub (at least as of 2.1.12, the last time I had to do this).
>>
>> Only if all else fails, attempt the very painful offline sstablescrub.
>>
>> Is the VMware client Windows? (Trying to make sure it's not just the
>> host.) YMMV, but in the past Windows was somewhat of a neglected platform
>> wrt Cassandra. I think you'd have a much easier time getting help if running
>> Linux is an option here.
>>
>>
>>
>> On Fri, Aug 12, 2016 at 9:16 AM, Alaa Zubaidi (PDF) <alaa.zuba...@pdf.com>
>> wrote:
>>
>>> Hi Jason,
>>>
>>> Thanks for your input...
>>> That's what I am afraid of.
>>> Did you find any HW errors in the VMware and HW logs? Any indication that
>>> the HW is the reason?
>>> I need to make sure that this is the reason before
>>> asking the customer to spend more money.
>>>
>>> Thanks,
>>> Alaa
>>>
>>> On Thu, Aug 11, 2016 at 11:02 PM, Jason Wee <peich...@gmail.com> wrote:
>>>
>>>> Is Cassandra running on a virtual server (VMware)?
>>>>
>>>> > I tried sstablescrub but it crashed with hs-err-pid-...
>>>> Maybe try with a larger heap allocated to sstablescrub.
>>>>
>>>> I ran into this sstable corruption as well (on Cassandra 1.2). First I
>>>> tried nodetool scrub, and it persisted; then offline sstablescrub, and it
>>>> still persisted; I wiped the node and it happened again. Then I changed
>>>> the hardware (disk and memory), and things went well.
>>>>
>>>> hth
>>>>
>>>> jason
>>>>
>>>>
>>>> On Fri, Aug 12, 2016 at 9:20 AM, Alaa Zubaidi (PDF)
>>>> <alaa.zuba...@pdf.com> wrote:
>>>> > Hi,
>>>> >
>>>> > I have a 16-node cluster, Cassandra 2.2.1 on Windows, local installation
>>>> > (NOT on the cloud),
>>>> >
>>>> > and I am getting
>>>> > ERROR [CompactionExecutor:2] 2016-08-12 06:51:52,983
>>>> > CassandraDaemon.java:183 - Exception in thread
>>>> > Thread[CompactionExecutor:2,1,main]
>>>> > org.apache.cassandra.io.FSReadError:
>>>> > org.apache.cassandra.io.sstable.CorruptSSTableException:
>>>> > org.apache.cassandra.io.compress.CorruptBlockException:
>>>> > (E:\........\la-4886-big-Data.db): corruption detected, chunk at 4969092 of
>>>> > length 10208.
>>>> > at
>>>> > org.apache.cassandra.io.util.RandomAccessReader.readBytes(RandomAccessReader.java:357)
>>>> > ~[apache-cassandra-2.2.1.jar:2.2.1]
>>>> > ....
>>>> > ....
>>>> > ERROR [CompactionExecutor:2] ....... FileUtils.java:463 - Exiting
>>>> > forcefully due to file system exception on startup, disk failure policy
>>>> > "stop"
>>>> >
>>>> > I tried sstablescrub but it crashed with hs-err-pid-...
>>>> > I removed the corrupted file and started the node again. After one day the
>>>> > corruption came back, so I removed the files and restarted Cassandra. It
>>>> > worked for a few days, then I ran "nodetool repair". After it finished,
>>>> > Cassandra failed again, but with commitlog corruption; after removing the
>>>> > commitlog files, it failed again with another sstable corruption.
>>>> >
>>>> > I also checked the HW, file system, and memory. The VMware logs showed
>>>> > no HW errors, and the HW management logs showed NO problems or issues.
>>>> > I also checked the Windows logs (Application and System); the only thing I
>>>> > found is in the System log: "Cassandra Service terminated with
>>>> > service-specific error Cannot create another system semaphore."
>>>> >
>>>> > I could not find anything regarding that error; all comments point to
>>>> > the application log.
>>>> >
>>>> > Any help is appreciated.
>>>> >
>>>> > --
>>>> >
>>>> > Alaa Zubaidi
>>>> >
>>>> >
>>>> > This message may contain confidential and privileged information. If it has
>>>> > been sent to you in error, please reply to advise the sender of the error
>>>> > and then immediately permanently delete it and all attachments to it from
>>>> > your systems. If you are not the intended recipient, do not read, copy,
>>>> > disclose or otherwise use this message or any attachments to it. The sender
>>>> > disclaims any liability for such unauthorized use. PLEASE NOTE that all
>>>> > incoming e-mails sent to PDF e-mail accounts will be archived and may be
>>>> > scanned by us and/or by external service providers to detect and prevent
>>>> > threats to our systems, investigate illegal or inappropriate behavior,
>>>> > and/or eliminate unsolicited promotional e-mails (“spam”). If you have any
>>>> > concerns about this process, please contact us at legal.departm...@pdf.com.

--
Alaa Zubaidi
PDF Solutions, Inc.
333 West San Carlos Street, Suite 1000
San Jose, CA 95110 USA
Tel: 408-283-5639
fax: 408-938-6479
email: alaa.zuba...@pdf.com
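
For reference, the triage order Bryan describes in the quoted reply above can be sketched as a small decision helper. This is only an illustration under stated assumptions, not a tool from the thread or from Cassandra: the names CorruptSSTable, triage and trust_replicas are hypothetical, and the two thresholds are guesses, since the thread says only "in the range of kbs" and "a few sstables". The fallback to offline sstablescrub when corruption is widespread and the replicas are not trusted is one reading of "only if all else fails". The function returns reminder strings; it does not run nodetool or touch any files.

```python
from dataclasses import dataclass
from typing import List

# Assumption: "in the range of kbs" is read as under 1 MiB; the thread gives no cutoff.
SMALL_SSTABLE_BYTES = 1024 * 1024
# Assumption: "a few sstables" is read as at most 5; also not from the thread.
MAX_SCATTERED_SSTABLES = 5


@dataclass
class CorruptSSTable:
    """Hypothetical record for one sstable reported as corrupt."""
    path: str
    size_bytes: int


def triage(corrupt: List[CorruptSSTable], trust_replicas: bool) -> List[str]:
    """Return suggested manual steps, in order, for a node with corrupt sstables.

    trust_replicas stands for "you are confident in the consistency of the
    data elsewhere in the cluster".
    """
    if not corrupt:
        return []

    widespread = len(corrupt) > MAX_SCATTERED_SSTABLES
    if widespread and trust_replicas:
        # Very large scope plus a good repair policy: rebuild instead of salvaging.
        return ["decommission and rebuild the node"]

    small = [s for s in corrupt if s.size_bytes < SMALL_SSTABLE_BYTES]
    large = [s for s in corrupt if s.size_bytes >= SMALL_SSTABLE_BYTES]

    # Tiny sstables are unlikely to hold salvageable data, so they are deleted.
    actions = [f"delete {s.path}" for s in small]

    if large and not widespread:
        # Limited scope: try the online scrub first, offline scrub last.
        actions += [
            "set disk_failure_policy to best_effort",
            "start the node",
            "run nodetool scrub",
            "if that fails, run offline sstablescrub",
        ]
    elif large:
        # Widespread corruption without trusted replicas: last resort only.
        actions.append("run offline sstablescrub")
    else:
        actions.append("start the node")

    if small:
        # Deleted data has to be restored from the other replicas.
        actions.append("schedule a repair")
    return actions


if __name__ == "__main__":
    # Hypothetical file name and size, for illustration only.
    for step in triage([CorruptSSTable("example-Data.db", 200 * 1024 * 1024)], False):
        print(step)
```

Note that none of these steps addresses corruption that returns after a full wipe, as reported at the top of this thread; in Jason's case that pattern only went away after the disk and memory were replaced.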