Hello Everyone,
I recently merged a longstanding branch and wanted to share the underlying idea.
Some time ago, I made a few changes to Apache Calcite to improve support for
spatial data. Apache Calcite comes with a robust SQL parser, a query optimizer,
and good abstractions for plugging in arbitrary data sources.
My initial idea was to provide a faster solution than PostGIS for preparing map
data. A memory-mapped file adapter plugged into Calcite sounded like a great
solution, but this goal was probably a bit too ambitious. As an intermediate
step, I created adapters for all the file data sources currently implemented in
Baremaps (Shapefile, FlatGeobuf, GeoPackage, PostGIS, etc.), and it is now
possible to use SQL as an abstraction to move data around (e.g., from files to
PostGIS).
For instance, a Shapefile can now be imported into PostGIS as follows. In this
case, the PostgresDdlExecutor uses PostgreSQL’s COPY API to ensure good
performance, and most ST_ functions (reprojection, buffering, etc.) should
work, as they are available in Calcite [1]. Pretty cool, right? ;-)
// Set up the Calcite connection properties
Properties info = new Properties();
info.setProperty("lex", "MYSQL");
info.setProperty("caseSensitive", "false");
info.setProperty("unquotedCasing", "TO_LOWER");
info.setProperty("quotedCasing", "TO_LOWER");
info.setProperty("parserFactory",
    PostgresDdlExecutor.class.getName() + "#PARSER_FACTORY");

try (Connection connection =
    DriverManager.getConnection("jdbc:calcite:", info)) {
  CalciteConnection calciteConnection =
      connection.unwrap(CalciteConnection.class);
  SchemaPlus rootSchema = calciteConnection.getRootSchema();

  // Create a ShapefileTable instance
  ShapefileTable shapefileTable = new ShapefileTable(SAMPLE_SHAPEFILE);

  // Register the shapefile table in the Calcite schema
  rootSchema.add(SHAPEFILE_TABLE_NAME, shapefileTable);

  // Create a table in PostgreSQL by selecting from the shapefile table
  String createTableSql = "CREATE TABLE " + IMPORTED_TABLE_NAME + " AS "
      + "SELECT * FROM " + SHAPEFILE_TABLE_NAME;

  // Execute the DDL statement to create the table
  try (Statement statement = connection.createStatement()) {
    statement.execute(createTableSql);
  }
}
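To give an idea of what transformations during import could look like, here is
a hedged sketch of a CREATE TABLE statement that buffers geometries on the fly.
The table and column names are hypothetical; ST_Buffer is one of the spatial
functions documented in Calcite [1], assuming they are enabled on the
connection.

```sql
-- Hypothetical sketch: buffer geometries by 10 units while importing
-- (table and column names are placeholders, not from the codebase)
CREATE TABLE buffered_table AS
SELECT id, ST_Buffer(geom, 10) AS geom
FROM shapefile_table
```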
All the import tasks of the workflows have been adapted to this new approach,
and a few integration tests have been added to the codebase. That said, these
changes may introduce bugs or performance regressions in the main branch; they
will be addressed before the next release.
Feel free to share your feedback or let me know if you have any cool use cases
for this in mind. As mentioned, one of my upcoming goals is to leverage the
memory-mapped data structures [2] implemented in the baremaps-data module to
speed up workflow execution on large spatial datasets.
Wish you all a good weekend,
Bertil
[1] https://calcite.apache.org/docs/reference.html#geometry-conversion-functions-2d
[2] https://github.com/apache/incubator-baremaps/tree/main/baremaps-data/src/main/java/org/apache/baremaps/data/collection
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]