Yeah, that's not ideal and could lead to problems. I think corruption is only likely if compactions occur, but data loss looks possible too, not to mention all sorts of other nasties that could occur when running two C* instances at once. Seems to me that CASSANDRA-11540 should have gone into 2.1 in the first place and just got missed. It's a very simple patch, so I think a backport should be accepted.
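Until a backport is available, a pre-start guard in whatever wrapper launches the node might reduce the risk. Below is a minimal sketch, not anything that ships with Cassandra: it only checks whether something already accepts TCP connections on the JMX port (7199) and the storage port (7000) seen in the logs in this thread. The names `CASSANDRA_PORTS`, `is_listening` and `busy_ports` are hypothetical, and the sketch assumes the ports are reachable on the address you pass in (JMX is often bound to localhost only, the storage port to `listen_address`).

```python
import socket

# Ports taken from the log excerpts in this thread. Assumption: the node uses
# these values; adjust them to match cassandra.yaml and the JMX settings.
CASSANDRA_PORTS = {"jmx": 7199, "storage": 7000}


def is_listening(host, port, timeout=1.0):
    """Return True if something accepts TCP connections on host:port.

    host is expected to be an IPv4 address such as "127.0.0.1".
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an errno value otherwise.
        return sock.connect_ex((host, port)) == 0


def busy_ports(host, ports):
    """Return the subset of ports (name -> port) that are already in use."""
    return {name: port for name, port in ports.items() if is_listening(host, port)}
```

A start script could call `busy_ports("127.0.0.1", CASSANDRA_PORTS)` and refuse to launch the JVM if the result is non-empty. This is only a best-effort check: it is racy, and it does nothing about a second process that has already opened the data directories.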
On 7 August 2018 at 15:57, Steinmaurer, Thomas <thomas.steinmau...@dynatrace.com> wrote:

> Hello,
>
> with 2.1, in case a second Cassandra process/instance is started on a host (by accident), may this result in some sort of corruption, although Cassandra will exit at some point in time due to not being able to bind TCP ports already in use?
>
> What we have seen in this scenario is something like that:
>
> ERROR [main] 2018-08-05 21:10:24,046 CassandraDaemon.java:120 - Error starting local jmx server:
> java.rmi.server.ExportException: Port already in use: 7199; nested exception is:
>     java.net.BindException: Address already in use (Bind failed)
> …
>
> But then continuing with stuff like opening system and even user tables:
>
> INFO [main] 2018-08-05 21:10:24,060 CacheService.java:110 - Initializing key cache with capacity of 100 MBs.
> INFO [main] 2018-08-05 21:10:24,067 CacheService.java:132 - Initializing row cache with capacity of 0 MBs
> INFO [main] 2018-08-05 21:10:24,073 CacheService.java:149 - Initializing counter cache with capacity of 50 MBs
> INFO [main] 2018-08-05 21:10:24,074 CacheService.java:160 - Scheduling counter cache save to every 7200 seconds (going to save all keys).
> INFO [main] 2018-08-05 21:10:24,161 ColumnFamilyStore.java:365 - Initializing system.sstable_activity
> INFO [SSTableBatchOpen:2] 2018-08-05 21:10:24,692 SSTableReader.java:475 - Opening /var/opt/xxx-managed/cassandra/system/sstable_activity-5a1ff267ace03f128563cfae6103c65e/system-sstable_activity-ka-165 (2023 bytes)
> INFO [SSTableBatchOpen:3] 2018-08-05 21:10:24,692 SSTableReader.java:475 - Opening /var/opt/xxx-managed/cassandra/system/sstable_activity-5a1ff267ace03f128563cfae6103c65e/system-sstable_activity-ka-167 (2336 bytes)
> INFO [SSTableBatchOpen:1] 2018-08-05 21:10:24,692 SSTableReader.java:475 - Opening /var/opt/xxx-managed/cassandra/system/sstable_activity-5a1ff267ace03f128563cfae6103c65e/system-sstable_activity-ka-166 (2686 bytes)
> INFO [main] 2018-08-05 21:10:24,755 ColumnFamilyStore.java:365 - Initializing system.hints
> INFO [SSTableBatchOpen:1] 2018-08-05 21:10:24,758 SSTableReader.java:475 - Opening /var/opt/xxx-managed/cassandra/system/hints-2666e20573ef38b390fefecf96e8f0c7/system-hints-ka-377 (46210621 bytes)
> INFO [main] 2018-08-05 21:10:24,766 ColumnFamilyStore.java:365 - Initializing system.compaction_history
> INFO [SSTableBatchOpen:1] 2018-08-05 21:10:24,768 SSTableReader.java:475 - Opening /var/opt/xxx-managed/cassandra/system/compaction_history-b4dbb7b4dc493fb5b3bfce6e434832ca/system-compaction_history-ka-129 (91269 bytes)
> …
>
> Replaying commit logs:
>
> …
> INFO [main] 2018-08-05 21:10:25,896 CommitLogReplayer.java:267 - Replaying /var/opt/dynatrace-managed/cassandra/commitlog/CommitLog-4-1533133668366.log
> INFO [main] 2018-08-05 21:10:25,896 CommitLogReplayer.java:270 - Replaying /var/opt/dynatrace-managed/cassandra/commitlog/CommitLog-4-1533133668366.log (CL version 4, messaging version 8)
> …
>
> Even writing memtables already (below just pasted system tables, but also user tables):
>
> …
> INFO [MemtableFlushWriter:4] 2018-08-05 21:11:52,524 Memtable.java:347 - Writing Memtable-size_estimates@1941663179(2.655MiB serialized bytes, 325710 ops, 2%/0% of on/off-heap limit)
> INFO [MemtableFlushWriter:3] 2018-08-05 21:11:52,552 Memtable.java:347 - Writing Memtable-peer_events@1474667699(0.199KiB serialized bytes, 4 ops, 0%/0% of on/off-heap limit)
> …
>
> Until it comes to a point where it can’t bind ports like the storage port 7000:
>
> ERROR [main] 2018-08-05 21:11:54,350 CassandraDaemon.java:395 - Fatal configuration error
> org.apache.cassandra.exceptions.ConfigurationException: /XXX:7000 is in use by another process. Change listen_address:storage_port in cassandra.yaml to values that do not conflict with other services
>     at org.apache.cassandra.net.MessagingService.getServerSockets(MessagingService.java:495) ~[apache-cassandra-2.1.18.jar:2.1.18]
> …
>
> Until Cassandra stops:
>
> …
> INFO [StorageServiceShutdownHook] 2018-08-05 21:11:54,361 Gossiper.java:1454 - Announcing shutdown
> …
>
> So, we have around 2 minutes where Cassandra is mangling with existing data, although it shouldn’t.
>
> Sounds like a potential candidate for data corruption, right? E.g. later on we then see things like (still while being in progress to shutdown?):
>
> WARN [SharedPool-Worker-1] 2018-08-05 21:11:58,181 AbstractTracingAwareExecutorService.java:169 - Uncaught exception on thread Thread[SharedPool-Worker-1,5,main]: {}
> java.lang.RuntimeException: java.io.FileNotFoundException: /var/opt/xxx-managed/cassandra/xxx/xxx-fdc68b70950611e8ad7179f2d5bfa3cf/xxx-xxx-ka-15-Data.db (No such file or directory)
>     at org.apache.cassandra.io.compress.CompressedRandomAccessReader.open(CompressedRandomAccessReader.java:52) ~[apache-cassandra-2.1.18.jar:2.1.18]
>     at org.apache.cassandra.io.util.CompressedPoolingSegmentedFile.createPooledReader(CompressedPoolingSegmentedFile.java:95) ~[apache-cassandra-2.1.18.jar:2.1.18]
>     at org.apache.cassandra.io.util.PoolingSegmentedFile.getSegment(PoolingSegmentedFile.java:62) ~[apache-cassandra-2.1.18.jar:2.1.18]
> …
>
> I found this one here: https://issues.apache.org/jira/browse/CASSANDRA-11540
>
> So, if this all leads to corruption, might this be a candidate for a backport for a 2.1 bugfix release?
>
> Thanks a lot!
>
> Thomas
>
> The contents of this e-mail are intended for the named addressee only. It contains information that may be confidential. Unless you are the named addressee or an authorized designee, you may not copy or use it, or disclose it to anyone else. If you received it in error please notify us immediately and then destroy it. Dynatrace Austria GmbH (registration number FN 91482h) is a company registered in Linz whose registered office is at 4040 Linz, Austria, Freistädterstraße 313