tmoschou opened a new pull request, #4685: URL: https://github.com/apache/cassandra/pull/4685
## Motivation

Users frequently store fixed-size binary identifiers in blob columns (e.g. SHA-256 hashes, proprietary binary formats) and choose blobs over string representations to reduce disk space. Previously, creating a SAI index on a blob column was rejected with `Unsupported type: blob`, forcing workarounds like Base64 encoding.

Blob support was originally excluded due to concerns about indexing arbitrarily large blobs (e.g. serialized objects, pagination cursors, or media payloads that could be many kilobytes). This patch introduces a dedicated `sai_blob_term_size_warn_threshold` (default 1KiB) / `sai_blob_term_size_fail_threshold` (default 8KiB) guardrail, following the existing pattern of `sai_string_term_size_*`, `sai_frozen_term_size_*`, and `sai_vector_term_size_*`. This allows operators to configure blob term size limits independently.

## Summary

Add support for the `blob` CQL type in Storage Attached Index (SAI) as an equality-only (EQ) indexed literal type.

### Before

```sql
CREATE TABLE mytable (id uuid, blob blob, PRIMARY KEY (id));
CREATE INDEX blob_idx ON mytable (blob) USING 'sai';
-- InvalidQueryException: Unsupported type: blob
```

### After

```sql
CREATE TABLE mytable (id uuid, blob blob, PRIMARY KEY (id));
CREATE INDEX blob_idx ON mytable (blob) USING 'sai';
-- OK
INSERT INTO mytable (id, blob) VALUES (uuid(), 0xdeadbeef);
SELECT * FROM mytable WHERE blob = 0xdeadbeef;
```

- Add `CQL3Type.Native.BLOB` to `StorageAttachedIndex.SUPPORTED_TYPES`
- Add `BytesType` to `EQ_ONLY_TYPES` in `IndexTermType` and introduce a new `BYTES` capability so blob columns are classified as literal (trie-indexed) types
- Add dedicated `sai_blob_term_size_warn_threshold` / `sai_blob_term_size_fail_threshold` guardrails (defaults: 1KiB / 8KiB)
- Add `BlobDataSet` and parameterized CQL-level tests for the blob type, both standalone and within all collection variants (list, set, map keys/values/entries, frozen collections)
- Add `GuardrailSaiBlobTermSizeTest` for the new guardrail
- Update SAI documentation (`sai-concepts.adoc`, `sai-faq.adoc`) and `cassandra.yaml` / `cass_yaml_file.adoc` to reflect blob support

## Performance

Each new parameterized test class adds ~7 seconds of runtime. With 11 new blob test classes, this adds roughly 75–80 seconds to the SAI type test suite. This follows the existing pattern used by all other SAI type tests (e.g. `BooleanTest`, `InetTest`, etc.).

## Test plan

- [x] New unit tests: `BlobTest`, `ListBlobTest`, `FrozenListBlobTest`, `SetBlobTest`, `FrozenSetBlobTest`, `MapBlobTest`, `FrozenMapBlobTest`, `MapKeysBlobTest`, `MapValuesBlobTest`, `MapEntriesBlobTest`, `MultiMapBlobTest`
- [x] Updated `IndexTermTypeTest` to assert blob is a literal type
- [ ] CI (CircleCI)

[CASSANDRA-20012](https://issues.apache.org/jira/browse/CASSANDRA-20012)
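
For reference, the blob term size guardrails described in the Summary could appear in `cassandra.yaml` roughly as follows. This is a sketch only: the key names and default values are taken from this description, while the exact placement and surrounding comments in the shipped `cassandra.yaml` are an assumption.

```yaml
# Sketch of the new SAI blob guardrails (names and defaults as stated in this PR description).
# Blob terms larger than the warn threshold are expected to produce a client warning.
sai_blob_term_size_warn_threshold: 1KiB
# Blob terms larger than the fail threshold are expected to be rejected.
sai_blob_term_size_fail_threshold: 8KiB
```

Operators indexing larger fixed-size identifiers can raise these two values without touching the `sai_string_term_size_*`, `sai_frozen_term_size_*`, or `sai_vector_term_size_*` settings.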

