[ 
https://issues.apache.org/jira/browse/CASSANDRA-19661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18059187#comment-18059187
 ] 

Dmitry Konstantinov commented on CASSANDRA-19661:
-------------------------------------------------

{quote}replicated the issue again and captured thread dumps.
{quote}
Regarding the pending mutations issue mentioned before, assuming we are 
talking about the case where the IllegalStateException is thrown:

The unexpected IllegalStateException breaks reclamation of the memory used by 
the flushed memtable, so we eventually consume all memory available for 
memtables and writers block waiting for it to be freed.

[https://github.com/apache/cassandra/blob/cassandra-5.0.6/src/java/org/apache/cassandra/db/ColumnFamilyStore.java#L1377]
 - the place where the reclaim logic is registered to run at the end of the 
flush. We never reach this invocation because the IllegalStateException is 
thrown earlier, at the writer commit step:

[https://github.com/apache/cassandra/blob/cassandra-5.0.6/src/java/org/apache/cassandra/db/ColumnFamilyStore.java#L1354]
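
To make the ordering problem concrete, here is a minimal, self-contained sketch. It is not the Cassandra code: {{MemoryPool}}, {{flush}} and the sizes are hypothetical stand-ins, and a semaphore with a bounded wait replaces the real allocator so the demo terminates. It only illustrates that when the commit step throws, the reclaim step after it never runs and later allocations cannot succeed.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

public class FlushReclaimSketch {
    /** Hypothetical stand-in for the memtable memory pool: a fixed number of permits. */
    public static final class MemoryPool {
        private final Semaphore permits;

        public MemoryPool(int size) {
            permits = new Semaphore(size);
        }

        /** Writers wait here when the pool is exhausted (bounded wait so the demo ends). */
        public boolean allocate(int size, long timeoutMillis) {
            try {
                return permits.tryAcquire(size, timeoutMillis, TimeUnit.MILLISECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }

        public void release(int size) {
            permits.release(size);
        }

        public int available() {
            return permits.availablePermits();
        }
    }

    /** Flush: commit the writer first, and only afterwards reclaim the memtable's memory. */
    public static void flush(MemoryPool pool, int memtableSize, Runnable writerCommit) {
        writerCommit.run();          // may throw, e.g. IllegalStateException
        pool.release(memtableSize);  // never reached if the commit step throws
    }

    public static void main(String[] args) {
        MemoryPool pool = new MemoryPool(100);
        pool.allocate(100, 0); // the memtable fills the whole pool

        try {
            flush(pool, 100, () -> { throw new IllegalStateException(); });
        } catch (IllegalStateException e) {
            System.out.println("flush failed, available = " + pool.available());
        }

        // The next write cannot get memory; in the real system it would park indefinitely.
        System.out.println("next write allocated: " + pool.allocate(1, 100));
    }
}
```

In this sketch the failed flush leaves the pool at zero permits, which corresponds to the mutation threads parked in the allocator in the thread dump.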
 

Here is an example of a mutation thread trying to allocate memory for 
memtables:

"MutationStage-1" #131 daemon prio=5 os_prio=0 cpu=949990.56ms elapsed=26753.90s tid=0x00007267aea3f2d0 nid=0xcc734 waiting on condition [0x0000725ce31bc000]
   java.lang.Thread.State: WAITING (parking)
    at jdk.internal.misc.Unsafe.park([email protected]/Native Method)
    at java.util.concurrent.locks.LockSupport.park([email protected]/LockSupport.java:341)
    at org.apache.cassandra.utils.concurrent.WaitQueue$Standard$AbstractSignal.await(WaitQueue.java:321)
    at org.apache.cassandra.utils.concurrent.WaitQueue$Standard$AbstractSignal.await(WaitQueue.java:299)
    at org.apache.cassandra.utils.concurrent.Awaitable$Defaults.awaitThrowUncheckedOnInterrupt(Awaitable.java:131)
    *at org.apache.cassandra.utils.concurrent.Awaitable$AbstractAwaitable.awaitThrowUncheckedOnInterrupt(Awaitable.java:235)*
    *at org.apache.cassandra.utils.memory.MemtableAllocator$SubAllocator.allocate(MemtableAllocator.java:195)*
    at org.apache.cassandra.db.memtable.AbstractAllocatorMemtable.markExtraOnHeapUsed(AbstractAllocatorMemtable.java:196)
    at org.apache.cassandra.index.sai.StorageAttachedIndex$UpdateIndexer.adjustMemtableSize(StorageAttachedIndex.java:998)
    at org.apache.cassandra.index.sai.StorageAttachedIndex$UpdateIndexer.updateRow(StorageAttachedIndex.java:990)
    at org.apache.cassandra.index.sai.StorageAttachedIndexGroup$1.updateRow(StorageAttachedIndexGroup.java:191)
    at org.apache.cassandra.index.SecondaryIndexManager$WriteTimeTransaction.onUpdated(SecondaryIndexManager.java:1570)
    at org.apache.cassandra.db.partitions.BTreePartitionUpdater.merge(BTreePartitionUpdater.java:139)
    at org.apache.cassandra.db.partitions.BTreePartitionUpdater.merge(BTreePartitionUpdater.java:39)
    at org.apache.cassandra.utils.btree.BTree.updateLeaves(BTree.java:430)
    at org.apache.cassandra.utils.btree.BTree.update(BTree.java:372)
    at org.apache.cassandra.db.partitions.BTreePartitionUpdater.makeMergedPartition(BTreePartitionUpdater.java:89)
    at org.apache.cassandra.db.partitions.BTreePartitionUpdater.mergePartitions(BTreePartitionUpdater.java:71)
    at org.apache.cassandra.db.memtable.TrieMemtable$MemtableShard$$Lambda$1179/0x00000008011e2d58.apply(Unknown Source)
    at org.apache.cassandra.db.tries.InMemoryTrie.applyContent(InMemoryTrie.java:930)
    at org.apache.cassandra.db.tries.InMemoryTrie.putRecursive(InMemoryTrie.java:906)
    at org.apache.cassandra.db.tries.InMemoryTrie.putRecursive(InMemoryTrie.java:910)
    ...
    at org.apache.cassandra.db.tries.InMemoryTrie.putRecursive(InMemoryTrie.java:897)
    at org.apache.cassandra.db.tries.InMemoryTrie.putSingleton(InMemoryTrie.java:878)
    at org.apache.cassandra.db.memtable.TrieMemtable$MemtableShard.put(TrieMemtable.java:480)
    at org.apache.cassandra.db.memtable.TrieMemtable.put(TrieMemtable.java:190)
    at org.apache.cassandra.db.ColumnFamilyStore.apply(ColumnFamilyStore.java:1474)
    at org.apache.cassandra.db.CassandraTableWriteHandler.write(CassandraTableWriteHandler.java:38)
    at org.apache.cassandra.db.Keyspace.applyInternal(Keyspace.java:653)
    at org.apache.cassandra.db.Keyspace.applyFuture(Keyspace.java:474)
    at org.apache.cassandra.db.Mutation.applyFuture(Mutation.java:244)
    at org.apache.cassandra.hints.Hint.applyFuture(Hint.java:109)
    at org.apache.cassandra.hints.HintVerbHandler.doVerb(HintVerbHandler.java:116)
    at org.apache.cassandra.net.InboundSink.lambda$new$0(InboundSink.java:78)
    at org.apache.cassandra.net.InboundSink$$Lambda$783/0x00000008010bf960.accept(Unknown Source)
    at org.apache.cassandra.net.InboundSink.accept(InboundSink.java:97)
    at org.apache.cassandra.net.InboundSink.accept(InboundSink.java:45)
    at org.apache.cassandra.net.InboundMessageHandler$ProcessMessage.run(InboundMessageHandler.java:430)
    at org.apache.cassandra.concurrent.ExecutionFailure$1.run(ExecutionFailure.java:133)
    at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:143)
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.lang.Thread.run([email protected]/Thread.java:833)

 

The pending mutation issue should disappear once we fix the 
{{IllegalStateException}}. We could try to improve the error handling to 
cope better with such unexpected exceptions, but there may be no way to handle 
this completely correctly: we might have to stop processing altogether or 
risk data loss (at least for indexes). So I'm not sure that investing 
significant effort into this would really pay off.
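
For illustration only, one possible shape of the "stop processing" option is sketched below. All names ({{Table}}, {{flush}}, {{write}}) are hypothetical and not Cassandra APIs. The sketch assumes we simply remember the first unexpected flush failure and reject later writes immediately, instead of letting them park on memory that will never be freed. It does not address recovery or index consistency.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

public class FailFastWriteSketch {
    /** Hypothetical table-level state; not a Cassandra class. */
    public static final class Table {
        private final AtomicReference<Throwable> flushFailure = new AtomicReference<>();
        private final AtomicInteger appliedWrites = new AtomicInteger();

        /** Runs the flush and records the first unexpected failure instead of losing it. */
        public void flush(Runnable flushTask) {
            try {
                flushTask.run();
            } catch (RuntimeException e) {
                flushFailure.compareAndSet(null, e);
            }
        }

        /** Rejects writes once a flush has failed, rather than blocking them. */
        public void write(String row) {
            Throwable failure = flushFailure.get();
            if (failure != null) {
                throw new IllegalStateException("writes stopped after failed flush", failure);
            }
            appliedWrites.incrementAndGet(); // placeholder for applying the mutation
        }

        public int appliedWrites() {
            return appliedWrites.get();
        }
    }

    public static void main(String[] args) {
        Table table = new Table();
        table.write("row-1");
        table.flush(() -> { throw new IllegalStateException(); });
        try {
            table.write("row-2");
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
        System.out.println("applied writes: " + table.appliedWrites());
    }
}
```

This trades availability for a clear failure signal, which is the trade-off described above: the node stops accepting writes rather than hanging or risking inconsistent index data.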

> Cannot restart Cassandra 5 after creating a vector table and index
> ------------------------------------------------------------------
>
>                 Key: CASSANDRA-19661
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19661
>             Project: Apache Cassandra
>          Issue Type: Bug
>          Components: Feature/SAI, Feature/Vector Search, Local/Startup and 
> Shutdown
>            Reporter: Sergio Rua
>            Assignee: Michael Marshall
>            Priority: Normal
>             Fix For: 5.0.x, 6.x
>
>         Attachments: 10.103.220.89_thread_dump.tgz, 
> 5.0.2_fail_memtableflush_vector_full.txt, logs.tar.gz, screenshot-1.png, 
> upload_content.py
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> I'm using llama-index and llama3 to train a model. I'm using some very simple 
> code that reads *.txt files from a local directory, uploads them to Cassandra, 
> and then creates the index:
>  
> {code:python}
> # Create the index from documents
> index = VectorStoreIndex.from_documents(
>     documents,
>     service_context=vector_store.service_context,
>     storage_context=storage_context,
>     show_progress=True,
>     ) {code}
> This works well and I'm able to use a Chat app to get responses from the 
> Cassandra data. However, right afterwards I cannot restart Cassandra. It 
> fails with the following error:
>  
> {code:java}
> INFO  [PerDiskMemtableFlushWriter_0:7] 2024-05-23 08:23:20,102 Flushing.java:179 - Completed flushing /data/cassandra/data/gpt/docs_20240523-10c8eaa018d811ef8dadf75182f3e2b4/da-6-bti-Data.db (124.236MiB) for commitlog position CommitLogPosition(segmentId=1716452305636, position=15336)
> [...]
> WARN  [MemtableFlushWriter:1] 2024-05-23 08:28:29,575 MemtableIndexWriter.java:92 - [gpt.docs.idx_vector_docs] Aborting index memtable flush for /data/cassandra/data/gpt/docs-aea77a80184b11ef8dadf75182f3e2b4/da-3-bti...{code}
> {code:java}
> java.lang.IllegalStateException: null
>         at com.google.common.base.Preconditions.checkState(Preconditions.java:496)
>         at org.apache.cassandra.index.sai.disk.v1.vector.VectorPostings.computeRowIds(VectorPostings.java:76)
>         at org.apache.cassandra.index.sai.disk.v1.vector.OnHeapGraph.writeData(OnHeapGraph.java:313)
>         at org.apache.cassandra.index.sai.memory.VectorMemoryIndex.writeDirect(VectorMemoryIndex.java:272)
>         at org.apache.cassandra.index.sai.memory.MemtableIndex.writeDirect(MemtableIndex.java:110)
>         at org.apache.cassandra.index.sai.disk.v1.MemtableIndexWriter.flushVectorIndex(MemtableIndexWriter.java:192)
>         at org.apache.cassandra.index.sai.disk.v1.MemtableIndexWriter.complete(MemtableIndexWriter.java:117)
>         at org.apache.cassandra.index.sai.disk.StorageAttachedIndexWriter.complete(StorageAttachedIndexWriter.java:185)
>         at java.base/java.util.ArrayList.forEach(ArrayList.java:1541)
>         at java.base/java.util.Collections$UnmodifiableCollection.forEach(Collections.java:1085)
>         at org.apache.cassandra.io.sstable.format.SSTableWriter.commit(SSTableWriter.java:289)
>         at org.apache.cassandra.db.compaction.unified.ShardedMultiWriter.commit(ShardedMultiWriter.java:219)
>         at org.apache.cassandra.db.ColumnFamilyStore$Flush.flushMemtable(ColumnFamilyStore.java:1323)
>         at org.apache.cassandra.db.ColumnFamilyStore$Flush.run(ColumnFamilyStore.java:1222)
>         at org.apache.cassandra.concurrent.ExecutionFailure$1.run(ExecutionFailure.java:133)
>         at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>         at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>         at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>         at java.base/java.lang.Thread.run(Thread.java:829) {code}
> The table created by the script is as follows:
>  
> {noformat}
> CREATE TABLE gpt.docs (
>     partition_id text,
>     row_id text,
>     attributes_blob text,
>     body_blob text,
>     vector vector<float, 1024>,
>     metadata_s map<text, text>,
>     PRIMARY KEY (partition_id, row_id)
> ) WITH CLUSTERING ORDER BY (row_id ASC)
>     AND additional_write_policy = '99p'
>     AND allow_auto_snapshot = true
>     AND bloom_filter_fp_chance = 0.01
>     AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
>     AND cdc = false
>     AND comment = ''
>     AND compaction = {'class': 'org.apache.cassandra.db.compaction.UnifiedCompactionStrategy', 'scaling_parameters': 'T4', 'target_sstable_size': '1GiB'}
>     AND compression = {'chunk_length_in_kb': '16', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
>     AND memtable = 'default'
>     AND crc_check_chance = 1.0
>     AND default_time_to_live = 0
>     AND extensions = {}
>     AND gc_grace_seconds = 864000
>     AND incremental_backups = true
>     AND max_index_interval = 2048
>     AND memtable_flush_period_in_ms = 0
>     AND min_index_interval = 128
>     AND read_repair = 'BLOCKING'
>     AND speculative_retry = '99p';
> CREATE CUSTOM INDEX eidx_metadata_s_docs ON gpt.docs (entries(metadata_s)) USING 'org.apache.cassandra.index.sai.StorageAttachedIndex';
> CREATE CUSTOM INDEX idx_vector_docs ON gpt.docs (vector) USING 'org.apache.cassandra.index.sai.StorageAttachedIndex';{noformat}
> Thank you
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
