Yeah that's not ideal and could lead to problems. I think corruption is
only likely if compactions occur, but seems like data loss is a potential
not to mention all sorts of other possible nasties that could occur running
two C*'s at once. Seems to me that 11540 should have gone to 2.1 in the
first place, but it just got missed. Very simple patch so I think a
backport should be accepted.

On 7 August 2018 at 15:57, Steinmaurer, Thomas <
thomas.steinmau...@dynatrace.com> wrote:

> Hello,
>
>
>
> with 2.1, in case a second Cassandra process/instance is started on a host
> (by accident), may this result in some sort of corruption, although
> Cassandra will exit at some point in time due to not being able to bind TCP
> ports already in use?
>
>
>
> What we have seen in this scenario is something like that:
>
>
>
> ERROR [main] 2018-08-05 21:10:24,046 CassandraDaemon.java:120 - Error
> starting local jmx server:
>
> java.rmi.server.ExportException: Port already in use: 7199; nested
> exception is:
>
>                 java.net.BindException: Address already in use (Bind
> failed)
>
> …
>
>
>
> But then continuing with stuff like opening system and even user tables:
>
>
>
> INFO  [main] 2018-08-05 21:10:24,060 CacheService.java:110 - Initializing
> key cache with capacity of 100 MBs.
>
> INFO  [main] 2018-08-05 21:10:24,067 CacheService.java:132 - Initializing
> row cache with capacity of 0 MBs
>
> INFO  [main] 2018-08-05 21:10:24,073 CacheService.java:149 - Initializing
> counter cache with capacity of 50 MBs
>
> INFO  [main] 2018-08-05 21:10:24,074 CacheService.java:160 - Scheduling
> counter cache save to every 7200 seconds (going to save all keys).
>
> INFO  [main] 2018-08-05 21:10:24,161 ColumnFamilyStore.java:365 -
> Initializing system.sstable_activity
>
> INFO  [SSTableBatchOpen:2] 2018-08-05 21:10:24,692 SSTableReader.java:475
> - Opening /var/opt/xxx-managed/cassandra/system/sstable_activity-
> 5a1ff267ace03f128563cfae6103c65e/system-sstable_activity-ka-165 (2023
> bytes)
>
> INFO  [SSTableBatchOpen:3] 2018-08-05 21:10:24,692 SSTableReader.java:475
> - Opening /var/opt/xxx-managed/cassandra/system/sstable_activity-
> 5a1ff267ace03f128563cfae6103c65e/system-sstable_activity-ka-167 (2336
> bytes)
>
> INFO  [SSTableBatchOpen:1] 2018-08-05 21:10:24,692 SSTableReader.java:475
> - Opening /var/opt/xxx-managed/cassandra/system/sstable_activity-
> 5a1ff267ace03f128563cfae6103c65e/system-sstable_activity-ka-166 (2686
> bytes)
>
> INFO  [main] 2018-08-05 21:10:24,755 ColumnFamilyStore.java:365 -
> Initializing system.hints
>
> INFO  [SSTableBatchOpen:1] 2018-08-05 21:10:24,758 SSTableReader.java:475
> - Opening /var/opt/xxx-managed/cassandra/system/hints-
> 2666e20573ef38b390fefecf96e8f0c7/system-hints-ka-377 (46210621 bytes)
>
> INFO  [main] 2018-08-05 21:10:24,766 ColumnFamilyStore.java:365 -
> Initializing system.compaction_history
>
> INFO  [SSTableBatchOpen:1] 2018-08-05 21:10:24,768 SSTableReader.java:475
> - Opening /var/opt/xxx-managed/cassandra/system/compaction_history-
> b4dbb7b4dc493fb5b3bfce6e434832ca/system-compaction_history-ka-129 (91269
> bytes)
>
> …
>
>
>
> Replaying commit logs:
>
>
>
> …
>
> INFO  [main] 2018-08-05 21:10:25,896 CommitLogReplayer.java:267 -
> Replaying /var/opt/dynatrace-managed/cassandra/commitlog/CommitLog-
> 4-1533133668366.log
>
> INFO  [main] 2018-08-05 21:10:25,896 CommitLogReplayer.java:270 -
> Replaying 
> /var/opt/dynatrace-managed/cassandra/commitlog/CommitLog-4-1533133668366.log
> (CL version 4, messaging version 8)
>
> …
>
>
>
> Even writing memtables already (below just pasted system tables, but also
> user tables):
>
>
>
> …
>
> INFO  [MemtableFlushWriter:4] 2018-08-05 21:11:52,524 Memtable.java:347 -
> Writing Memtable-size_estimates@1941663179(2.655MiB serialized bytes,
> 325710 ops, 2%/0% of on/off-heap limit)
>
> INFO  [MemtableFlushWriter:3] 2018-08-05 21:11:52,552 Memtable.java:347 -
> Writing Memtable-peer_events@1474667699(0.199KiB serialized bytes, 4 ops,
> 0%/0% of on/off-heap limit)
>
> …
>
>
>
> Until it comes to a point where it can’t bind ports like the storage port
> 7000:
>
>
>
> ERROR [main] 2018-08-05 21:11:54,350 CassandraDaemon.java:395 - Fatal
> configuration error
>
> org.apache.cassandra.exceptions.ConfigurationException: /XXX:7000 is in
> use by another process.  Change listen_address:storage_port in
> cassandra.yaml to values that do not conflict with other services
>
>                 at org.apache.cassandra.net.MessagingService.
> getServerSockets(MessagingService.java:495) ~[apache-cassandra-2.1.18.jar:
> 2.1.18]
>
> …
>
>
>
> Until Cassandra stops:
>
>
>
> …
>
> INFO  [StorageServiceShutdownHook] 2018-08-05 21:11:54,361
> Gossiper.java:1454 - Announcing shutdown
>
> …
>
>
>
>
>
> So, we have around 2 minutes where Cassandra is mangling with existing
> data, although it shouldn’t.
>
>
>
> Sounds like a potential candidate for data corruption, right? E.g. later
> on we then see things like (still while being in progress to shutdown?):
>
>
>
> WARN  [SharedPool-Worker-1] 2018-08-05 21:11:58,181
> AbstractTracingAwareExecutorService.java:169 - Uncaught exception on
> thread Thread[SharedPool-Worker-1,5,main]: {}
>
> java.lang.RuntimeException: java.io.FileNotFoundException:
> /var/opt/xxx-managed/cassandra/xxx/xxx-fdc68b70950611e8ad7179f2d5bfa3cf/xxx-xxx-ka-15-Data.db
> (No such file or directory)
>
>                 at org.apache.cassandra.io.compress.
> CompressedRandomAccessReader.open(CompressedRandomAccessReader.java:52)
> ~[apache-cassandra-2.1.18.jar:2.1.18]
>
>                 at org.apache.cassandra.io.util.
> CompressedPoolingSegmentedFile.createPooledReader(
> CompressedPoolingSegmentedFile.java:95) ~[apache-cassandra-2.1.18.jar:
> 2.1.18]
>
>                 at org.apache.cassandra.io.util.PoolingSegmentedFile.
> getSegment(PoolingSegmentedFile.java:62) ~[apache-cassandra-2.1.18.jar:
> 2.1.18]
>
> …
>
>
>
>
>
> I found this one here: https://issues.apache.org/
> jira/browse/CASSANDRA-11540
>
>
>
> So, if this all leads to corruption, might this be a candidate for a
> backport for a 2.1 bugfix release?
>
>
>
> Thanks a lot!
>
>
>
> Thomas
>
>
> The contents of this e-mail are intended for the named addressee only. It
> contains information that may be confidential. Unless you are the named
> addressee or an authorized designee, you may not copy or use it, or
> disclose it to anyone else. If you received it in error please notify us
> immediately and then destroy it. Dynatrace Austria GmbH (registration
> number FN 91482h) is a company registered in Linz whose registered office
> is at 4040 Linz, Austria, Freistädterstraße 313
>

Reply via email to