tmoschou opened a new pull request, #4685:
URL: https://github.com/apache/cassandra/pull/4685

   ## Motivation
   Users frequently store fixed-size binary identifiers in blob columns (e.g. 
SHA-256 hashes, proprietary binary formats) and choose blobs over string 
representations to reduce disk space. Previously, creating a SAI index on a 
blob column was rejected with `Unsupported type: blob`, forcing workarounds 
like Base64 encoding.
   
   Blob support was originally excluded due to concerns about indexing 
arbitrarily large blobs (e.g. serialized objects, pagination cursors, or media 
payloads that could be many kilobytes). This patch introduces a dedicated pair 
of guardrails, `sai_blob_term_size_warn_threshold` (default 1KiB) and 
`sai_blob_term_size_fail_threshold` (default 8KiB), following the existing 
pattern of `sai_string_term_size_*`, `sai_frozen_term_size_*`, and 
`sai_vector_term_size_*`. Operators can therefore configure blob term size 
limits independently of the other term types.
   
   ## Summary
   Add support for the `blob` CQL type in Storage-Attached Indexing (SAI) as 
an equality-only (EQ) indexed literal type.
   
   ### Before
   ```sql
   CREATE TABLE mytable (id uuid, blob blob, PRIMARY KEY (id));
   CREATE INDEX blob_idx ON mytable (blob) USING 'sai';
   -- InvalidQueryException: Unsupported type: blob
   ```
   
   ### After
   ```sql
   CREATE TABLE mytable (id uuid, blob blob, PRIMARY KEY (id));
   CREATE INDEX blob_idx ON mytable (blob) USING 'sai';
   -- OK
   
   INSERT INTO mytable (id, blob) VALUES (uuid(), 0xdeadbeef);
   SELECT * FROM mytable WHERE blob = 0xdeadbeef;
   ```
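   
   Because the index is equality-only, a range predicate on the indexed column 
is presumably not served by it. The statement below is only a sketch of that 
expectation, reusing the table above; this description does not state the exact 
error or fallback behavior, so none is shown.
   
   ```sql
   -- Assumption: range operators are not supported by the EQ-only blob index,
   -- so this query is expected not to use blob_idx.
   SELECT * FROM mytable WHERE blob > 0x00;
   ```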
   
   - Add `CQL3Type.Native.BLOB` to `StorageAttachedIndex.SUPPORTED_TYPES`
   - Add `BytesType` to `EQ_ONLY_TYPES` in `IndexTermType` and introduce a new 
`BYTES` capability so blob columns are classified as literal (trie-indexed) 
types
   - Add dedicated `sai_blob_term_size_warn_threshold` / 
`sai_blob_term_size_fail_threshold` guardrails (defaults: 1KiB / 8KiB)
   - Add `BlobDataSet` and parameterized CQL-level tests for the blob type, 
both standalone and within all collection variants (list, set, map 
keys/values/entries, frozen collections)
   - Add `GuardrailSaiBlobTermSizeTest` for the new guardrail
   - Update SAI documentation (`sai-concepts.adoc`, `sai-faq.adoc`) and 
`cassandra.yaml` / `cass_yaml_file.adoc` to reflect blob support
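   
   The collection variants covered by the new tests could be exercised roughly 
as follows. This is a minimal sketch using standard CQL collection-index 
syntax; the table, column, and index names are hypothetical and do not come 
from the patch.
   
   ```sql
   -- Hypothetical table with blob elements inside collections.
   CREATE TABLE blob_collections (
       id uuid PRIMARY KEY,
       digests set<blob>,
       parts map<text, blob>
   );
   
   -- Index the set elements and the map values.
   CREATE INDEX digests_idx ON blob_collections (digests) USING 'sai';
   CREATE INDEX parts_values_idx ON blob_collections (VALUES(parts)) USING 'sai';
   
   -- Equality-style membership lookups on blob elements.
   SELECT * FROM blob_collections WHERE digests CONTAINS 0xdeadbeef;
   SELECT * FROM blob_collections WHERE parts CONTAINS 0xcafebabe;
   ```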
   
   ## Performance
   Each new parameterized test class adds ~7 seconds of runtime. With 11 new 
blob test classes, this adds roughly 75–80 seconds to the SAI type test suite. 
The new classes follow the existing pattern used by the other SAI type tests 
(e.g. `BooleanTest`, `InetTest`).
   
   ## Test plan
   - [x] New unit tests: `BlobTest`, `ListBlobTest`, `FrozenListBlobTest`, 
`SetBlobTest`, `FrozenSetBlobTest`, `MapBlobTest`, `FrozenMapBlobTest`, 
`MapKeysBlobTest`, `MapValuesBlobTest`, `MapEntriesBlobTest`, `MultiMapBlobTest`
   - [x] Updated `IndexTermTypeTest` to assert blob is a literal type
   - [ ] CI (CircleCI)
   
   [CASSANDRA-20012](https://issues.apache.org/jira/browse/CASSANDRA-20012)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

