If there's already a solution in place for heterogeneous data types, it would be a good idea to extend it to CRS, since that will be easier to maintain. From a practical perspective, when we work with vector data (points, lines, and polygons), a single dataset is unlikely to contain a mix of CRSes, so I think this approach will let us use the index while still ensuring correct results.
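
To make the idea concrete, here is a rough sketch in Python (not AsterixDB code) of a key that treats the SRID like a type tag and orders points within each SRID along a space-filling curve. Everything in it is an assumption for illustration: `index_key`, `EXTENTS`, and the 16-bit quantization are hypothetical names and choices, it handles points only, and the Z-order (Morton) curve is just one possible curve, not necessarily the one used in the earlier work Ian mentions.

```python
from typing import Dict, Tuple

# Hypothetical per-SRID coordinate extents (min_x, min_y, max_x, max_y).
# Illustrative values only; a real system would take these from CRS metadata.
EXTENTS: Dict[int, Tuple[float, float, float, float]] = {
    4326: (-180.0, -90.0, 180.0, 90.0),  # degrees
    3857: (-20037508.34, -20037508.34, 20037508.34, 20037508.34),  # meters
}


def quantize(value: float, lo: float, hi: float, bits: int = 16) -> int:
    """Clamp value to [lo, hi] and map it to a grid cell in [0, 2**bits - 1]."""
    cells = (1 << bits) - 1
    clamped = min(max(value, lo), hi)
    return int((clamped - lo) / (hi - lo) * cells)


def interleave_bits(x: int, y: int, bits: int = 16) -> int:
    """Z-order (Morton) code: x bits go to even positions, y bits to odd ones."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)
        code |= ((y >> i) & 1) << (2 * i + 1)
    return code


def index_key(srid: int, x: float, y: float, bits: int = 16) -> Tuple[int, int]:
    """Composite key: the SRID acts as the tag, the Morton code orders within it."""
    min_x, min_y, max_x, max_y = EXTENTS[srid]
    cell_x = quantize(x, min_x, max_x, bits)
    cell_y = quantize(y, min_y, max_y, bits)
    return (srid, interleave_bits(cell_x, cell_y, bits))


# Points from two CRSs; sorting by the key groups them by SRID first.
points = [
    (4326, -117.33, 33.97),
    (3857, -13061000.0, 4023000.0),
    (4326, 2.35, 48.86),
    (3857, 261600.0, 6250000.0),
]
keys = sorted(index_key(*p) for p in points)
```

Because the SRID is the leading component of the key, entries from different CRSs never interleave, so a query in one CRS only has to scan that CRS's key range.
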
On Mon, Sep 8, 2025 at 11:00 AM Ian Maxon <[email protected]> wrote:
> Perhaps the heterogeneous index could simply be extended to work on
> geometries, including heterogeneous SRID? I'm imagining basically that
> for geometries, we could treat the SRID as we do a type tag. Then,
> within each SRID, the geometries are ordered using a space filling
> curve, like what Young-Seok did a few years back (if I remember
> right). This seems more straightforward than trying to extend the
> technique used in the heterogeneous index to RTrees.
>
> On Sun, Sep 7, 2025 at 8:03 AM Mike Carey <[email protected]> wrote:
> >
> > Sounds like a valuable project! We should look carefully at the
> > detailed syntax for providing the CRS info when the time comes, but I
> > like the direction. Also, we should figure out if/how this may interact
> > with our new heterogeneous indexes (as the goal for them is to mostly
> > replace the older form of indexes that require type knowledge).
> >
> > On 9/4/25 12:11 AM, Suryaa Charan Shivakumar wrote:
> > > Hello all,
> > >
> > > I’d like to start a discussion on how we might support CRS aware indexing
> > > for *geometry types* in AsterixDB as part of our geospatial work.
> > >
> > > *Current Situation*
> > >
> > > - Today we only have the geometry type (AGEOMETRY), with no CRS
> > >   constraint.
> > > - This makes it flexible, but correctness and performance issues arise
> > >   when mixed-CRS data is indexed and eventually queried.
> > >
> > > One idea is to introduce CRS-constrained geometry types, so a field can
> > > explicitly declare a CRS,
> > >
> > > ```
> > > CREATE TYPE LocationType AS {
> > >   coordinates: geometry(EPSG:4326), -- CRS-constrained geometry
> > >   mixed_coordinates: geometry, -- Unconstrained
> > >   address: string
> > > };
> > >
> > > CREATE DATASET Locations(LocationType) PRIMARY KEY id;
> > >
> > > CREATE INDEX geo_idx ON Locations(coordinates) TYPE RTREE; -- CRS auto-inferred
> > > ```
> > >
> > > This separation allows users to decide:
> > >
> > > - *Constrained geometry*: safe for indexing and cross-column operations.
> > > - *Unconstrained geometry*: flexible for heterogeneous CRS use cases
> > >
> > > Benefits
> > >
> > > - *Clarity*: Schema explicitly encodes CRS.
> > > - *Optionality*: Mixed CRS still possible where needed.
> > > - *Efficiency*: Query planner can reason about CRS at compile time (no
> > >   runtime lookups).
> > > - *Safety*: Cross-column CRS compatibility can be validated early.
> > > - *Distributed-friendly*: CRS info can travel with type metadata;
> > >   workers validate independently.
> > >
> > > Flow diagram:
> > > https://urldefense.com/v3/__https://www.mermaidchart.com/play?utm_source=mermaid_live_editor&utm_medium=toggle*pako:eNp9Uttu2kAQ_ZWRpUjpAz_Qh1YBQ5OUWwCpqkwetusx3sa7a-2uRSjk3zsenMWoVfxmzxmf2xwTaXNMPidFZfeyFC7AJt0aoOfumIogYC3RCKfs17etOQ9ubmDohJEl-g4Jg8EXGGbbZI57yGnLY4ABjFZrkNb44IQymEM41LhNnvtLj7Q0flU-KLPrbVa4E_JwWYjELQHL-mHdS6v5PPHNr50TddnO23H2D65jBRgy74h4UyxIFpPAXoUSdmg1BneAW1L-KSoFGPFOmj0Yj5SQdVBZkbPeiEkZMz6mFj071yLIEnKUlXBknj61EZ7B4xZ8-on-BJPsTkqsAwiTgw_WYdTxfA2f2xN8I90r_I2SVVCwxhfWabi1dVDWiKove8KS7rORQxEQfC2CEhUok-NrBN0z6IG80VealehUODsonNXgqWd9sfnA8O_Zgvi0-oMOGt8ZVqawIAJVrmtVUbBKY7eHJu-VGAv_qMkpnwCX-X98lPTIkqbZqET5AqrgXkB5EgSWCiZtETtl7Ox82o1RHN5VNbOY9bzNujGw3sAmBh0slFbbHRoy38t6dml0kS2dlUid81W9s1xdy5x1LN5fF_y6_LipJYOeuqaE942m5HsmIvKJkats0oSG7knx2XrQjQ_dWfKfeztUUPL2F7cRRQM__;Iw!!CzAuKJ42GuquVTTmVmPViYEvSg!JHDwJO7PkUiM8_s87Jmvr9jMGAznGuVDDMXkeJIu8a9hL_NBu4K5qP8FAwtLXcK9B4meV06R2KZhkQ$
> > >
> > > But it doesn't follow the loose typing preferred in the world of
> > > semi-structured data. It however ensures performance, valid results for
> > > queries and complete index. Other ideas include the below,
> > >
> > > - Users should be warned about *lossy transformations* if we enforce
> > >   converting everything to a single projection (e.g., 4326).
> > > - Another option might be to support *multiple indexes* on the same
> > >   dataset, each tied to a different CRS (with a fixed practical limit, say
> > >   10), extending R-Tree physically to handle multiple CRS domains.
> > > - Treat CRS almost like a *type domain* similar to heterogeneous indexing,
> > >   rather than a hard constraint at the type level.
> > >
> > > *Questions for the community:*
> > >
> > > - Should we enforce CRS constraints strictly at the type level, or
> > >   consider index-level CRS flexibility (multiple CRSs per index)?
> > > - How should we handle schema evolution and legacy datasets without CRS
> > >   metadata?
> > > - Are there better approaches to balance correctness, performance, and
> > >   flexibility?
> > >
> > > Looking forward to everyone’s thoughts.
> > >
> > > Additional Context below -
> > >
> > > *What is a CRS?*
> > >
> > > A *Coordinate Reference System (CRS)* defines how numbers in a geometry
> > > (like (x,y) pairs) map to real-world locations.
> > >
> > > - For example, in EPSG:4326 (WGS84), coordinates are in
> > >   *longitude/latitude degrees*.
> > > - In EPSG:3857 (Web Mercator), the same numbers represent
> > >   *meters on a projected plane*.
> > >
> > > Without CRS information, two geometries may use different measurement
> > > systems, and a “distance” or “intersection” operation between them would be
> > > meaningless.
> > >
> > > *How AsterixDB Handles CRS Today (as per current patchset in review and
> > > APE)*
> > >
> > > - We have a single type geometry with *no schema-level CRS constraint*.
> > > - Each geometry object in WKB (Well-Known Binary) carries only a
> > >   *reference ID* (e.g., 4326), not the full definition.
> > > - CRS definitions themselves (EPSG codes, PROJ strings, etc.) are loaded
> > >   into memory (Apache Derby) with an API call.
> > > - When a user calls ST_Transform, we look up the definition using the
> > >   stored ID and perform transformations with *Apache SIS*.
> > > - This design means CRS enforcement is *runtime-only*: validation or
> > >   transformation happens at query execution, not at schema declaration.
> > >
> > > *Why Indexes Must Be in the Same CRS*
> > >
> > > Spatial indexes (R-Trees) compare bounding boxes of geometries. If
> > > geometries use different CRSs:
> > >
> > > - One geometry might be in *degrees*, another in *meters*, another in
> > >   *feet*.
> > > - Mixing them breaks the index, since it assumes all values are in a
> > >   single coordinate system.
> > >
> > > *Analogy:* It’s like cataloging items in a warehouse. If half the
> > > measurements are in inches and half in centimeters, the index will be
> > > inconsistent, a “short” object in inches could look bigger than a “long”
> > > object in centimeters.
> > >
> > > Thank you,
> > > Suryaa
> > >
