Cc += geospatial@. I think allowing WKB and WKT is sufficient.
Perhaps Geometry could be a composite type (WKT, SRID) or (WKB, SRID). SRID (spatial reference identifier) is almost always needed to qualify a geometry value. It is analogous to how TimeZone is needed (implicitly or explicitly) to qualify a DateTime value.

For geospatial queries to perform well, some kind of indexing (and/or clever data organization) is required. Geospatial indexing is very complex, and there is no “one size fits all” approach. So I recommend that Arrow stays out of the indexing business, and leaves indexing to the engine.

Julian

> On Jun 25, 2021, at 10:17 AM, Mauricio Vargas <mavarga...@uc.cl.INVALID> wrote:
>
> Dear Jon
>
> Thanks for sending this. Based on previous projects, WKB works well with
> SQLite, DuckDB and others, at the expense of producing larger columns
> than PostGIS.
>
> For experimenting, it could be interesting to use the CENSO 2017
> shape files: https://github.com/ropensci/censo2017-cartografias;
> https://github.com/ropensci/censo2017-cartografias/releases/download/v0.4/cartografias-censo2017.zip
> These include rivers, streets, etc.
>
> Provided that Arrow is installed in a very straightforward way (for
> Windows, at least), creating something based on PostGIS is probably not a
> bad idea, but WKB works OK, and it integrates with zero problems with the SF
> package. I clearly see a great compression advantage here if we decide to
> use WKB, as LZ4 should make it very lightweight compared to, say, a CSV.
>
> Best,
>
> On Fri, Jun 25, 2021 at 1:05 PM Jonathan Keane <jke...@gmail.com> wrote:
>
>> Hello,
>>
>> There is an emerging spec[1] for how to store geospatial data in Arrow
>> + pass through parquet files in the geopandas world. There is even a
>> new R package that implements a wrapper to do the same in R[2]. These
>> both define a serialization[3] for storing geospatial data as an Arrow
>> table (and thus also when saving to parquet with Arrow).
>>
>> I could see a number of ways that we might interact with standards
>> like these, and for any of these that we pursue it would be good to
>> clarify that in our docs:
>>
>> 1. Point to the standard: we could mention that this standard exists
>> and that if someone is building a geospatial data aware application,
>> they _could_ refer to this standard if they want to.
>> 2. Adopt a/this standard: this could range from stating that we've
>> adopted it as the way that spatial data _ought_ to be stored to asking
>> the creators if maintaining it within the Arrow project itself would
>> be better (either by adopting it or creating a fork; of course
>> communication with the folks working on it now would be critical!)
>> 3. Create extension type(s) for geospatial data: this would require
>> adopting a standard like the one linked, but on top of that providing
>> an extension type within Arrow itself that the various clients could
>> implement as they saw fit.
>> 4. Create new, fully separate type(s) for geospatial data: again,
>> this would require adopting a standard of some sort, but we would
>> implement it as a specific type and presumably support it in all of
>> the clients as we could.
>>
>> There are of course pros and cons to all of these. This type of data
>> *is* somewhat specialized and I don't think we want to have a huge
>> profusion of types for all of the possible specialized data types out
>> there. But, at a minimum we should acknowledge (or adopt) a standard
>> if it exists and encourage implementations that use Arrow to follow
>> that standard (like sfarrow does to be compatible with geopandas) so
>> that some level of interoperability is there + people aren't needing
>> to reinvent the wheel each time they store spatial data.
>>
>> Thoughts? Are there other projects out there that already do something
>> like this with Arrow that we should consider?
>>
>> [1] https://github.com/geopandas/geo-arrow-spec/pull/2
>> [2] https://github.com/wcjochem/sfarrow
>> [3] for now they create a binary WKB column + attach a bit of metadata
>> to the schema that that's what happened, though there are other ways
>> one could encode this and the spec might include other way(s) to store
>> this data in the future.
>>
>> -Jon
>>
>
> --
> *Mauricio 'Pachá' Vargas Sepúlveda*
> Site: pacha.dev
> Blog: pacha.dev/blog
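
For illustration only, here is a minimal Python sketch of the (WKB, SRID) composite suggested at the top of this thread. It assumes nothing about how Arrow or the geo-arrow-spec would actually represent such a value: the `Geometry` class and the `point` / `point_coords` helpers are hypothetical names, and only 2D points are handled. It uses just the standard library to show why the SRID has to travel with the bytes.

```python
import struct
from dataclasses import dataclass
from typing import Tuple

WKB_LITTLE_ENDIAN = 1  # byte-order flag used by the WKB format
WKB_POINT = 1          # WKB geometry type code for a 2D Point


@dataclass(frozen=True)
class Geometry:
    """Hypothetical composite value: a WKB payload plus the SRID that qualifies it."""
    wkb: bytes
    srid: int


def point(x: float, y: float, srid: int) -> Geometry:
    # WKB Point layout: 1-byte order flag, uint32 type code, two float64 coordinates.
    payload = struct.pack("<BIdd", WKB_LITTLE_ENDIAN, WKB_POINT, x, y)
    return Geometry(payload, srid)


def point_coords(geom: Geometry) -> Tuple[float, float]:
    # Read the order flag first, then decode the rest with the matching endianness.
    (order,) = struct.unpack_from("B", geom.wkb, 0)
    prefix = "<" if order == WKB_LITTLE_ENDIAN else ">"
    gtype, x, y = struct.unpack_from(prefix + "Idd", geom.wkb, 1)
    if gtype != WKB_POINT:
        raise ValueError("only 2D points are handled in this sketch")
    return x, y


# Same bytes, different SRID: the two values are not the same geometry.
lonlat = point(-70.65, -33.45, srid=4326)    # WGS 84 longitude/latitude
mercator = point(-70.65, -33.45, srid=3857)  # Web Mercator metres
```

Here `lonlat.wkb == mercator.wkb`, yet `lonlat != mercator`, in the same way that two identical DateTime values in different time zones are different instants.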