I should also add that if the scope of corruption is _very_ large and you have a good, aggressive repair policy (read: you are confident in the consistency of the data elsewhere in the cluster), you may just want to decommission and rebuild that node.
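
For what it's worth, here is a rough Python sketch of how I'd order these choices, combining the note above with the process quoted below. It is only an illustration: the names `CorruptSSTable`, `Action`, and `triage` and the two thresholds are placeholders I made up, since the thread only says "in the range of kbs" and "a few sstables". Sending the untrusted large-scope case straight to offline scrub is also my assumption. It returns the first thing to try, not a full recovery plan.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List

# Assumed thresholds, not from the thread: pick values that fit your cluster.
SMALL_SSTABLE_BYTES = 100 * 1024   # "very small (in the range of kbs)"
MAX_SCATTERED_SSTABLES = 5         # "limited to a few sstables"


class Action(Enum):
    NOTHING = "no corrupt sstables found"
    DELETE_AND_REPAIR = "delete the files, start the node, schedule a repair ASAP"
    ONLINE_SCRUB = ("set disk_failure_policy to best_effort, start the node, "
                    "run nodetool scrub")
    REBUILD_NODE = "decommission and rebuild the node"
    OFFLINE_SCRUB = "last resort: offline sstablescrub"


@dataclass
class CorruptSSTable:
    path: str        # data file reported as corrupt
    size_bytes: int  # size on disk


def triage(corrupt: List[CorruptSSTable], trust_other_replicas: bool) -> Action:
    """Suggest the first step to try for a node that stopped on corrupt sstables."""
    if not corrupt:
        return Action.NOTHING
    # Tiny files are unlikely to hold salvageable data: drop them and repair.
    if all(t.size_bytes < SMALL_SSTABLE_BYTES for t in corrupt):
        return Action.DELETE_AND_REPAIR
    # Limited scope: try the online scrub first.
    if len(corrupt) <= MAX_SCATTERED_SSTABLES:
        return Action.ONLINE_SCRUB
    # Very large scope: rebuild only if the other replicas can be trusted.
    if trust_other_replicas:
        return Action.REBUILD_NODE
    return Action.OFFLINE_SCRUB


if __name__ == "__main__":
    # Hypothetical file name and size, for illustration only.
    tables = [CorruptSSTable("example-Data.db", 4 * 1024)]
    print(triage(tables, trust_other_replicas=False).value)
```

If the online scrub does not clear things up, the next step in the quoted process is still the offline sstablescrub; the sketch does not model retries.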

On Fri, Aug 12, 2016 at 11:55 AM, Bryan Cheng <br...@blockcypher.com> wrote:

> Looks like you're doing the offline scrub- have you tried online?
>
> Here's my typical process for corrupt SSTables.
>
> With disk_failure_policy set to stop, examine the failing sstables. If
> they are very small (in the range of kbs), it is unlikely that there is any
> salvageable data there. Just delete them, start the machine, and schedule a
> repair ASAP.
>
> If they are large, then it may be worth salvaging. If the scope of
> corruption is reasonable (limited to a few sstables scattered among
> different keyspaces), set disk_failure_policy to best_effort, start the
> machine up, and run the nodetool scrub. This is online scrub, faster than
> offline scrub (at least of 2.1.12, the last time I had to do this).
>
> Only if all else fails, attempt the very painful offline sstablescrub.
>
> Is the VMWare client Windows? (Trying to make sure its not just the host).
> YMMV but in the past Windows was somewhat of a neglected platform wrt
> Cassandra. I think you'd have a lot easier time getting help if running
> Linux is an option here.
>
> On Fri, Aug 12, 2016 at 9:16 AM, Alaa Zubaidi (PDF) <alaa.zuba...@pdf.com>
> wrote:
>
>> Hi Jason,
>>
>> Thanks for your input...
>> Thats what I am afraid of?
>> Did you find any HW error in the VMware and HW logs? any indication that
>> the HW is the reason? I need to make sure that this is the reason before
>> asking the customer to spend more money?
>>
>> Thanks,
>> Alaa
>>
>> On Thu, Aug 11, 2016 at 11:02 PM, Jason Wee <peich...@gmail.com> wrote:
>>
>>> cassandra run on virtual server (vmware)?
>>>
>>> > I tried sstablescrub but it crashed with hs-err-pid-...
>>> maybe try with larger heap allocated to sstablescrub
>>>
>>> this sstable corrupt i ran into it as well (on cassandra 1.2), first i
>>> try nodetool scrub, still persist, then offline sstablescrub still
>>> persist, wipe the node and it happen again, then i change the hardware
>>> (disk and mem). things went good.
>>>
>>> hth
>>>
>>> jason
>>>
>>>
>>> On Fri, Aug 12, 2016 at 9:20 AM, Alaa Zubaidi (PDF)
>>> <alaa.zuba...@pdf.com> wrote:
>>> > Hi,
>>> >
>>> > I have a 16 Node cluster, Cassandra 2.2.1 on Windows, local installation
>>> > (NOT on the cloud)
>>> >
>>> > and I am getting
>>> > Error [CompactionExecutor:2] 2016-08-12 06:51:52, 983 Cassandra
>>> > Daemon.java:183 - Execption in thread Thread[CompactionExecutor:2,1main]
>>> > org.apache.cassandra.io.FSReaderError:
>>> > org.apache.cassandra.io.sstable.CorruptSSTableExecption:
>>> > org.apache.cassandra.io.compress.CurrptBlockException:
>>> > (E:\........\la-4886-big-Data.db): corruption detected, chunk at 4969092 of
>>> > length 10208.
>>> > at
>>> > org.apache.cassandra.io.util.RandomAccessReader.readBytes(RandomAccessReader.java:357)
>>> > ~[apache-cassandra-2.2.1.jar:2.2.1]
>>> > ....
>>> > ....
>>> > ERROR [CompactionExecutor:2] ....... FileUtils.java:463 - Existing
>>> > forcefully due to file system exception on startup, disk failure policy
>>> > "stop"
>>> >
>>> > I tried sstablescrub but it crashed with hs-err-pid-...
>>> > I removed the corrupted file and started the Node again, after one day the
>>> > corruption came back again, I removed the files, and restarted Cassandra, it
>>> > worked for few days, then I ran "nodetool repair" after it finished,
>>> > Cassandra failed again but with commitlog corruption, after removing the
>>> > commitlog files, it failed again with another sstable corruption.
>>> >
>>> > I was also checking the HW, file system, and memory, the VMware logs showed
>>> > no HW error, also the HW management logs showed NO problems or issues.
>>> > Also checked the Windows Logs (Application and System) the only thing I
>>> > found is on the system logs "Cassandra Service terminated with
>>> > service-specific error Cannot create another system semaphore.
>>> >
>>> > I could not find any thing regarding that error, all comments point to
>>> > application log.
>>> >
>>> > Any help is appreciated..
>>> >
>>> > --
>>> >
>>> > Alaa Zubaidi
>>> >
>>> >
>>> > This message may contain confidential and privileged information. If it has
>>> > been sent to you in error, please reply to advise the sender of the error
>>> > and then immediately permanently delete it and all attachments to it from
>>> > your systems. If you are not the intended recipient, do not read, copy,
>>> > disclose or otherwise use this message or any attachments to it. The sender
>>> > disclaims any liability for such unauthorized use. PLEASE NOTE that all
>>> > incoming e-mails sent to PDF e-mail accounts will be archived and may be
>>> > scanned by us and/or by external service providers to detect and prevent
>>> > threats to our systems, investigate illegal or inappropriate behavior,
>>> > and/or eliminate unsolicited promotional e-mails (“spam”). If you have any
>>> > concerns about this process, please contact us at legal.departm...@pdf.com.
>>>
>>
>> --
>>
>> Alaa Zubaidi
>> PDF Solutions, Inc.
>> 333 West San Carlos Street, Suite 1000
>> San Jose, CA 95110 USA
>> Tel: 408-283-5639
>> fax: 408-938-6479
>> email: alaa.zuba...@pdf.com