It might be helpful to turn this into an APE proposal, with the next level of detail, that folks can read and comment on.  My take is that I'd recommend trying regular collections for these at first, to see if that's fast enough.  It would be nice if so.  This doesn't really feel like something that belongs in the system catalogs, as those are special and handled quite differently.  Will the additional spatial functions be UDFs or built-ins?  If they were UDFs, there is a mechanism (Ian can say more) for initializing some cross-function (read-only) state; reference tables were a motivating use case, so maybe that would help.  (This could all be in a Java UDF library, perhaps?)  @Ian Maxon?

We should also think again about moving from the current ESRI-based Java library to one that's more widely used and supported, perhaps?

Cheers,

Mike

On 10/3/23 11:03 AM, Ahmed Eldawy wrote:
Hi dev team,

I am working with Riyafa on adding support for coordinate reference systems
(CRS) to AsterixDB geometries. This additional information describes how
data is projected from the physical space to geometry coordinates and is
essential for many data science and GIS analytics projects. The way we plan
to implement this is by having a central CRS dataset that we use. Each CRS
will be identified by an integer Spatial Reference Identifier (SRID) that
we will add to each geometry. This reduces the storage overhead and speeds
up the retrieval of a geometry CRS. My question is about the best way of
storing the CRS information in that central table. Here are our constraints
and requirements.

1. This information should be highly available. It will be accessed
frequently by worker nodes during data processing.
2. The CRS table should be consistent across all machines so that SRID->CRS
mapping is also consistent.
3. We might need to update the table occasionally, e.g., while loading data
from an external source. This ensures that we parse the external CRS and
use it appropriately.
4. The table is not expected to be super large. The standard CRS database
contains fewer than 32,000 records and we might extend it occasionally from
external sources.
5. The CRS table is durable and will need to be loaded back upon system
restart.
6. There is a serialized form for CRS, but it will be way more efficient if
we can keep the CRSes as Java objects to reduce the parsing overhead.
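
To make requirements 3 and 6 concrete, here is a minimal sketch of a per-node
cache that keeps the serialized CRS text keyed by SRID and parses it into a
Java object only on first use. Everything in it is an assumption for
illustration: the class name CrsCache, the generic type C standing in for
whatever CRS class the geometry library provides, and the parser function are
all hypothetical, and the in-memory map of serialized definitions merely
stands in for whatever durable dataset ends up holding the table.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/**
 * Hypothetical sketch, not an AsterixDB API: maps an SRID to a parsed CRS
 * object, parsing the serialized form lazily and at most once per definition.
 * C is a placeholder for the CRS class of whichever geometry library is used.
 */
class CrsCache<C> {
    // Serialized definitions; in a real system these would be read from the
    // durable CRS dataset rather than held only in memory.
    private final Map<Integer, String> serialized = new ConcurrentHashMap<>();
    // Parsed objects, filled in on first lookup of each SRID.
    private final Map<Integer, C> parsed = new ConcurrentHashMap<>();
    // Assumed parser from the serialized form to a CRS object.
    private final Function<String, C> parser;

    CrsCache(Function<String, C> parser) {
        this.parser = parser;
    }

    /** Adds or updates a definition and drops any stale parsed object. */
    void put(int srid, String definition) {
        serialized.put(srid, definition);
        parsed.remove(srid);
    }

    /** Returns the parsed CRS for an SRID, parsing it on first access. */
    C get(int srid) {
        return parsed.computeIfAbsent(srid, id -> {
            String definition = serialized.get(id);
            if (definition == null) {
                throw new IllegalArgumentException("Unknown SRID: " + id);
            }
            return parser.apply(definition);
        });
    }
}
```

With a cache like this, a worker would pay the parsing cost once per SRID
rather than once per geometry, and an occasional update only invalidates the
one entry it touches. Keeping the table consistent across machines
(requirement 2) is outside what this sketch covers.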

Do you have any recommendations on the best way of storing such a table? Is
the catalog the right place to keep this information?


Ahmed Eldawy<http://www.cs.ucr.edu/~eldawy>
<https://star.cs.ucr.edu>
Tel: +1 (951) 827-5654
