[ https://issues.apache.org/jira/browse/CASSANDRA-18728 ]
Ke Han deleted comment on CASSANDRA-18728:
------------------------------------

    was (Author: JIRAUSER289562):
(Update) Thanks! I have found the way. The data can be constructed using cassandra-cli; cqlsh does not seem able to rebuild it.

[~wadey] -Hi Wade, is there any chance you could tell me how to create the data to trigger- CASSANDRA-14468? I am trying to fix a similar transient error, but I cannot create a table with comparator = BytesType.

I have tried the following commands to create the same data dir you created (2.2.10 version), but they always generate comparator = UTF8.
{code:java}
CREATE KEYSPACE test3 WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 } AND durable_writes = true;
CREATE TABLE test3.alpha (key TEXT, foo TEXT, PRIMARY KEY (key)) WITH COMPACT STORAGE;
CREATE TABLE test3.foos (key TEXT, "666f6f" TEXT, PRIMARY KEY (key)) WITH COMPACT STORAGE;
CREATE INDEX idx_foo ON test3.foos ("666f6f");{code}
Any help would be appreciated!


> [Transient Bug] Incorrect ByteBuffer representation of ColumnIdentifiers when
> 3.11.16 loading legacy data from 2.x
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-18728
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-18728
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Consistency/Coordination
>            Reporter: Ke Han
>            Assignee: Ke Han
>            Priority: Normal
>             Fix For: 3.11.x
>
>         Attachments: data.tar.gz, system.log
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> h1. Bug Description
> When using Cassandra 3.11.16 to load legacy data from 2.2.10, I noticed that
> the byte representation of the column identifier is incorrect.
> The legacy data contains two tables, with the following schemas.
> {code:java}
> cqlsh> desc test.alpha ;
> CREATE TABLE test.alpha (
>     key text PRIMARY KEY,
>     foo text
> ) WITH COMPACT STORAGE
>     AND bloom_filter_fp_chance = 0.01
>     AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
>     AND comment = ''
>     AND compaction = {'min_threshold': '4', 'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32'}
>     AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
>     AND dclocal_read_repair_chance = 0.1
>     AND default_time_to_live = 0
>     AND gc_grace_seconds = 864000
>     AND max_index_interval = 2048
>     AND memtable_flush_period_in_ms = 0
>     AND min_index_interval = 128
>     AND read_repair_chance = 0.0
>     AND speculative_retry = 'NONE';
> cqlsh> DESC test.foos ;
> CREATE TABLE test.foos (
>     key text PRIMARY KEY,
>     "666f6f" text
> ) WITH COMPACT STORAGE
>     AND bloom_filter_fp_chance = 0.01
>     AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
>     AND comment = ''
>     AND compaction = {'min_threshold': '4', 'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32'}
>     AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
>     AND dclocal_read_repair_chance = 0.1
>     AND default_time_to_live = 0
>     AND gc_grace_seconds = 864000
>     AND max_index_interval = 2048
>     AND memtable_flush_period_in_ms = 0
>     AND min_index_interval = 128
>     AND read_repair_chance = 0.0
>     AND speculative_retry = 'NONE';
> CREATE INDEX idx_foo ON test.foos ("666f6f"); {code}
> The table test.foos has a column with {*}name = "666f6f"{*}. Its byte
> representation should be the encoding of that text, i.e. Hex("666f6f") ==
> {*}363636663666{*}. However, when 3.11.15 loads the data and creates the
> column, the ByteBuffer still holds the raw bytes 666f6f (which print as
> "foo").
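To make the two representations concrete, here is a minimal standalone sketch using only the JDK (no Cassandra classes). The class and method names are hypothetical and exist only for illustration. It prints the UTF-8 bytes of the text "666f6f" as hex, and shows how the three bytes 0x66 0x6f 0x6f look when they are decoded as UTF-8, which is what the injected log statement does.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of Cassandra.
class ColumnNameBytesDemo
{
    // Render a byte array as lowercase hex, two digits per byte.
    static String toHex(byte[] bytes)
    {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes)
            sb.append(String.format("%02x", b & 0xff));
        return sb.toString();
    }

    // Decode stored bytes as UTF-8 text, as the injected log statement does.
    static String asUtf8(byte[] stored)
    {
        return new String(stored, StandardCharsets.UTF_8);
    }

    public static void main(String[] args)
    {
        String columnName = "666f6f";

        // Expected: the column name treated as text and encoded as UTF-8.
        byte[] expected = columnName.getBytes(StandardCharsets.UTF_8);
        System.out.println(toHex(expected)); // prints 363636663666

        // Observed: the column name treated as hex digits, giving three bytes.
        byte[] observed = new byte[] { 0x66, 0x6f, 0x6f };
        System.out.println(asUtf8(observed)); // prints foo
    }
}
```

The second line of output matches the `column_name_bytes = foo` value reported in the Logs section.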
> {code:java}
> // src/java/org/apache/cassandra/schema/SchemaKeyspace.java
> public static ColumnDefinition createColumnFromRow(UntypedResultSet.Row row, Types types)
> {
>     String keyspace = row.getString("keyspace_name");
>     String table = row.getString("table_name");
>     ColumnDefinition.Kind kind = ColumnDefinition.Kind.valueOf(row.getString("kind").toUpperCase());
>     int position = row.getInt("position");
>     ClusteringOrder order = ClusteringOrder.valueOf(row.getString("clustering_order").toUpperCase());
>     AbstractType<?> type = parse(keyspace, row.getString("type"), types);
>     if (order == ClusteringOrder.DESC)
>         type = ReversedType.getInstance(type);
>     // Injected log to check byteBuffer value
>     logger.info(String.format("column_name = %s, column_name_bytes = %s",
>                               row.getString("column_name"),
>                               new String(row.getBytes("column_name_bytes").array(), StandardCharsets.UTF_8)));
>     ColumnIdentifier name = new ColumnIdentifier(row.getBytes("column_name_bytes"), row.getString("column_name"));
>     return new ColumnDefinition(keyspace, table, name, type, position, kind);
> }{code}
> h2. Logs
> INFO [main] 2023-08-07 02:21:53,762 SchemaKeyspace.java:1136 - *{color:#de350b}column_name = 666f6f, column_name_bytes = foo{color}*
> It should be: +column_name_bytes = {color:#172b4d}666f6f{color}+
> {code:java}
> INFO [main] 2023-08-07 02:21:53,722 StorageService.java:773 - Populating token metadata from system tables
> INFO [main] 2023-08-07 02:21:53,736 StorageService.java:780 - Token metadata: Normal Tokens:
> localhost/127.0.0.1:[95610762103941981519101009083045058398]
> INFO [main] 2023-08-07 02:21:53,756 SchemaKeyspace.java:1136 - column_name = column1, column_name_bytes = column1
> INFO [main] 2023-08-07 02:21:53,756 SchemaKeyspace.java:1136 - column_name = foo, column_name_bytes = foo
> INFO [main] 2023-08-07 02:21:53,756 SchemaKeyspace.java:1136 - column_name = key, column_name_bytes = key
> INFO [main] 2023-08-07 02:21:53,756 SchemaKeyspace.java:1136 - column_name = value, column_name_bytes = value
> INFO [main] 2023-08-07 02:21:53,762 SchemaKeyspace.java:1136 - column_name = 666f6f, column_name_bytes = foo // Incorrect!
> INFO [main] 2023-08-07 02:21:53,762 SchemaKeyspace.java:1136 - column_name = column1, column_name_bytes = column1{code}
> h1. Reproduce Method
> h2. Method1: load attached data file
> I have attached the data tar file. If you start Cassandra 3.11.16 with it and
> inject the log statement that prints the buffer value, the log shows the
> incorrect value.
> h2. Method2: Generate data from the old version (2.1.19)
> Start Cassandra 2.1.19 and use bin/cassandra-cli to construct the following
> data:
> {code:java}
> create keyspace test with strategy_options = {replication_factor:1} and placement_strategy = 'org.apache.cassandra.locator.SimpleStrategy';
> use test;
> create column family alpha
>   with column_type = 'Standard'
>   and comparator = 'UTF8Type'
>   and key_validation_class = 'UTF8Type'
>   and column_metadata = [{column_name: 'foo', validation_class: 'UTF8Type'}];
> create column family foos
>   with column_type = 'Standard'
>   and comparator = 'BytesType'
>   and key_validation_class = 'UTF8Type'
>   and column_metadata = [{column_name: '666f6f', validation_class: 'UTF8Type'}];
> {code}
> Then load the data using 3.0.16 with the log statements injected, and you
> will see the log output shown above.
> h1. Thoughts
> This is a transient bug that does not produce exceptions or error logs, but
> the incorrect byte representation might still cause problems.
> This bug is triggered in the same way as CASSANDRA-14468, and I believe it
> also shares the same root cause. In CASSANDRA-14468, the incorrect byte
> representation could lead to an upgrade exception. That was partially fixed
> by avoiding the intern of ColumnIdentifier (which makes this bug transient),
> but the real root cause remains and can still cause other problems.
> h1. Root Cause
> == TL;DR ==
> The new version (3.11.x) uses the *comparator* of the table to create the
> ColumnIdentifier. If the old version's table comparator is {*}"BytesType"{*},
> the new version assumes that the old regular column name is already in bytes
> format and thus [it directly puts the string in the
> ByteBuffer|https://github.com/apache/cassandra/blob/058621a446d1b128c429bc5a40b67c5158524146/src/java/org/apache/cassandra/schema/LegacySchemaMigrator.java#L744].
> This generates a ColumnIdentifier whose text and bytes are inconsistent:
> * ColumnIdentifier: {text = "666f6f", bytes = {*}"666f6f"{*}}.
> * The correct ColumnIdentifier should be {text = "666f6f", bytes = {*}"363636663666"{*}}.
> == Full Version ==
> In more detail, this is how it happens:
> 1. In
> [code|https://github.com/apache/cassandra/blob/058621a446d1b128c429bc5a40b67c5158524146/src/java/org/apache/cassandra/schema/LegacySchemaMigrator.java#L744],
> it tries to intern the ColumnIdentifier using the comparator. The comparator
> is BytesType and the column name is "666f6f".
> {code:java}
> ColumnIdentifier.getInterned(comparator.fromString(row.getString("column_name")), comparator);
> {code}
> 2. comparator.fromString(row.getString("column_name")) directly returns a
> ByteBuffer containing {*}"666f6f"{*}. The code below assumes that the source
> is already in hex bytes format.
> {code:java}
> // BytesType.java
> public ByteBuffer fromString(String source)
> {
>     try
>     {
>         return ByteBuffer.wrap(Hex.hexToBytes(source));
>     }
>     catch (NumberFormatException e)
>     {
>         logger.info("running into MarshalException");
>         throw new MarshalException(String.format("cannot parse '%s' as hex bytes", source), e);
>     }
> } {code}
> 3. ColumnIdentifier.getInterned uses the returned ByteBuffer to create a new
> ColumnIdentifier object.
> {code:java}
> text = "666f6f"
> bytes = "666f6f"{code}
> h1. Fix
> This can be fixed in a simple way. If the comparator type is BytesType, we
> shouldn't use the comparator to get the ByteBuffer. Instead, treat the column
> name as a String and use ByteBufferUtil.bytes directly to get the bytes.
> Then the generated column identifier will be \{text = "666f6f", bytes =
> "363636663666"}
> The patch is [here|https://github.com/apache/cassandra/pull/2731].
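The idea behind the fix can be illustrated with a small standalone sketch. This is not the actual patch and uses no Cassandra classes: `ComparatorKind`, `hexToBuffer`, `utf8ToBuffer`, `legacyNameBytes` and `fixedNameBytes` are hypothetical names. `hexToBuffer` stands in for `BytesType.fromString`, and `utf8ToBuffer` stands in for `ByteBufferUtil.bytes`, assuming the latter encodes the string as UTF-8.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch; not the code from the linked pull request.
class LegacyColumnNameSketch
{
    // Stand-in for the legacy table comparator (not Cassandra's AbstractType).
    enum ComparatorKind { BYTES, UTF8 }

    // Stand-in for BytesType.fromString: interpret the text as hex digits.
    static ByteBuffer hexToBuffer(String source)
    {
        if (source.length() % 2 != 0)
            throw new IllegalArgumentException("odd-length hex string: " + source);
        byte[] out = new byte[source.length() / 2];
        for (int i = 0; i < out.length; i++)
            out[i] = (byte) Integer.parseInt(source.substring(2 * i, 2 * i + 2), 16);
        return ByteBuffer.wrap(out);
    }

    // Stand-in for ByteBufferUtil.bytes(String): encode the text as UTF-8.
    static ByteBuffer utf8ToBuffer(String source)
    {
        return ByteBuffer.wrap(source.getBytes(StandardCharsets.UTF_8));
    }

    // Behaviour described in the root cause: always defer to the comparator.
    static ByteBuffer legacyNameBytes(String columnName, ComparatorKind kind)
    {
        return kind == ComparatorKind.BYTES ? hexToBuffer(columnName) : utf8ToBuffer(columnName);
    }

    // Proposed behaviour: with a BytesType comparator, treat the name as text.
    static ByteBuffer fixedNameBytes(String columnName, ComparatorKind kind)
    {
        if (kind == ComparatorKind.BYTES)
            return utf8ToBuffer(columnName);
        return legacyNameBytes(columnName, kind);
    }

    // Decode a buffer as UTF-8 without disturbing its position.
    static String asText(ByteBuffer buffer)
    {
        return StandardCharsets.UTF_8.decode(buffer.duplicate()).toString();
    }

    public static void main(String[] args)
    {
        String name = "666f6f";
        System.out.println(asText(legacyNameBytes(name, ComparatorKind.BYTES))); // prints foo
        System.out.println(asText(fixedNameBytes(name, ComparatorKind.BYTES)));  // prints 666f6f
    }
}
```

With the special case in place, the bytes decode back to the same text as the column name, so the text and bytes of the identifier agree.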