[ https://issues.apache.org/jira/browse/CASSANDRA-20190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17938608#comment-17938608 ]
Dmitry Konstantinov commented on CASSANDRA-20190:
-------------------------------------------------

A possible way could be to add a heuristic check into org.apache.cassandra.io.sstable.indexsummary.IndexSummary.IndexSummarySerializer#deserialize: once we have loaded the offsets and entries into memory, we can read offsets.getInt(0) (and/or entries.getLong(..)) and check whether it is suspiciously big, for example via offsets.getInt(0) > Integer.reverseBytes(offsets.getInt(0)) or in a similar way, and throw an IOException if it is (a concrete sketch of such a check follows at the end of this message).

If I read the code correctly, we load summaries using the following call tree:
* org.apache.cassandra.io.sstable.format.big.BigSSTableReaderLoadingBuilder#openComponents <-- it should rebuild the summary
* org.apache.cassandra.io.sstable.format.big.BigSSTableReaderLoadingBuilder#loadSummary <-- it will delete the summary file
* org.apache.cassandra.io.sstable.format.big.IndexSummaryComponent#loadOrDeleteCorrupted <-- it will catch the IOException
* org.apache.cassandra.io.sstable.format.big.IndexSummaryComponent#load
* org.apache.cassandra.io.sstable.indexsummary.IndexSummary.IndexSummarySerializer#deserialize

So it looks like this would be the cheapest way to do such detection...

> MemoryUtil.setInt/getInt and similar use the wrong endianness
> -------------------------------------------------------------
>
>                 Key: CASSANDRA-20190
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-20190
>             Project: Apache Cassandra
>          Issue Type: Bug
>          Components: Local/Other
>            Reporter: Branimir Lambov
>            Assignee: Dmitry Konstantinov
>            Priority: Normal
>             Fix For: 5.x
>
>          Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> `NativeCell`, `NativeClustering` and `NativeDecoratedKey` use the above
> methods from `MemoryUtil` to write and read data from native memory. As far
> as I can see, they are meant to write data in big endian. They do not (they
> always convert to little endian).
> Moreover, they disagree with their `ByByte` versions on big-endian machines
> (which is likely only an issue on architectures that require aligned access;
> x86 and arm should be fine).
> The same is true for the methods in `Memory`, used by compression metadata as
> well as index summaries.
> We need to verify that this does not cause any problems, to change the
> methods to behave as expected, and to document the behaviour by explicitly
> using `ByteOrder.LITTLE_ENDIAN` for any data that may have been persisted on
> disk with the wrong endianness.
>
> The current MemoryUtil behaviour (before the fix):
> ||Native order||MemoryUtil.setX||MemoryUtil.setXByByte||MemoryUtil.getX||MemoryUtil.getXByByte||
> |BE|LE|BE|LE|BE|
> |LE|LE|LE|LE|LE|
> In short: MemoryUtil.setX/getX is LE, MemoryUtil.setXByByte/getXByByte is native.
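A minimal, self-contained sketch of the heuristic proposed in the comment above (a hypothetical helper, not actual Cassandra code; it uses Integer.compareUnsigned instead of the plain > from the comment, to avoid sign confusion when the high byte of the swapped value is set):

{code:java}
import java.io.IOException;

public final class EndiannessSanityCheck
{
    /**
     * A value persisted with the wrong endianness reads back byte-swapped,
     * which for realistically small offsets yields a suspiciously large
     * number. If the raw value is (unsigned-)greater than its own
     * byte-reversed form, treat the summary as corrupted; in the call tree
     * above, IndexSummaryComponent#loadOrDeleteCorrupted would catch the
     * IOException, delete the summary file and let it be rebuilt.
     */
    public static void checkFirstOffset(int firstOffset) throws IOException
    {
        if (Integer.compareUnsigned(firstOffset, Integer.reverseBytes(firstOffset)) > 0)
            throw new IOException("Index summary looks serialized with the wrong endianness: first offset = " + firstOffset);
    }

    public static void main(String[] args)
    {
        try
        {
            checkFirstOffset(128);                       // plausible small offset: passes
            System.out.println("128 passes");
            checkFirstOffset(Integer.reverseBytes(128)); // byte-swapped 128: throws
        }
        catch (IOException e)
        {
            System.out.println("detected: " + e.getMessage());
        }
    }
}
{code}

Note this is a heuristic only: values whose byte patterns are palindromic (e.g. 0) compare equal to their swapped form and pass either way.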
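The endianness table in the quoted issue can also be illustrated with plain JDK ByteBuffers; this sketch assumes only what the table states (setX is always little-endian, setXByByte follows native order) and does not touch the real MemoryUtil:

{code:java}
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class MemoryUtilEndiannessDemo
{
    public static void main(String[] args)
    {
        int value = 0x0A0B0C0D;

        // Mimics MemoryUtil.setX per the table: always little-endian,
        // regardless of the platform's native byte order.
        ByteBuffer setX = ByteBuffer.allocate(4).order(ByteOrder.LITTLE_ENDIAN);
        setX.putInt(0, value);

        // Mimics MemoryUtil.setXByByte per the table: native byte order, so
        // the two layouts agree only on little-endian hardware.
        ByteBuffer setXByByte = ByteBuffer.allocate(4).order(ByteOrder.nativeOrder());
        setXByByte.putInt(0, value);

        System.out.println("native order: " + ByteOrder.nativeOrder());
        System.out.printf("setX-style bytes      : %02x %02x %02x %02x%n",
                          setX.get(0) & 0xff, setX.get(1) & 0xff,
                          setX.get(2) & 0xff, setX.get(3) & 0xff);
        System.out.printf("setXByByte-style bytes: %02x %02x %02x %02x%n",
                          setXByByte.get(0) & 0xff, setXByByte.get(1) & 0xff,
                          setXByByte.get(2) & 0xff, setXByByte.get(3) & 0xff);
    }
}
{code}

On little-endian hardware (x86, arm) both lines print 0d 0c 0b 0a; on a big-endian machine the ByByte-style line would print 0a 0b 0c 0d instead, which is exactly the disagreement the issue describes.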