bellipsis opened a new issue, #2372: URL: https://github.com/apache/sedona/issues/2372
### Summary

Some data is missing when loading the raw OSM planet file compared to Geofabrik extracts. I know Geofabrik removes user-identifiable information from its PBF extracts; however, I am unsure whether the missing data is related to that.

### Details

I'm running Sedona 1.8.0, Spark 3.5, and Java 17.

I was loading the [OSM PBF planet file](https://planet.openstreetmap.org/pbf/planet-latest.osm.pbf) and discovered that a number of ways in Paris were missing, e.g. 64955027 and 727994377:

```python
df = sc.read.format("osmpbf").load("/opt/spark/work-dir/data/tmp/planet-latest.osm.pbf")
df.where("id = 64955027").show()
```

```
+---+----+--------+----+----+---------+---------+--------+
|id |kind|location|tags|refs|ref_roles|ref_types|ref_size|
+---+----+--------+----+----+---------+---------+--------+
+---+----+--------+----+----+---------+---------+--------+
```

To troubleshoot, I downloaded the [Geofabrik extract of Île-de-France](https://download.geofabrik.de/europe/france/ile-de-france-latest.osm.pbf), and the ways were present when I ran the same code on the extract file.
```python
df = sc.read.format("osmpbf").load("/opt/spark/work-dir/data/tmp/ile-de-france-latest.osm.pbf")
df.where("id = 64955027").show()
```

```
+--------+----+--------+--------------------+--------------------+---------+---------+
|      id|kind|location|                tags|                refs|ref_roles|ref_types|
+--------+----+--------+--------------------+--------------------+---------+---------+
|64955027| way|    NULL|{note -> invalide...|[8812362566, 7958...|     NULL|     NULL|
+--------+----+--------+--------------------+--------------------+---------+---------+
```

I confirmed that the ways are present in the planet file using osmium: `osmium getid planet-latest.osm.pbf w64955027 -f osm`

```xml
<?xml version='1.0' encoding='UTF-8'?>
<osm version="0.6" generator="osmium/1.14.0">
  <bounds minlat="-90" minlon="-180" maxlat="90" maxlon="180"/>
  <way id="64955027" version="24" timestamp="2025-07-02T12:53:12Z" uid="16734285" user="Rémi_" changeset="168397364">
    <nd ref="8812362566"/>
    <nd ref="795806574"/>
    <nd ref="795806854"/>
    <nd ref="795806938"/>
    <nd ref="795806767"/>
    <nd ref="795806964"/>
    <nd ref="795806951"/>
    <nd ref="795806722"/>
    <nd ref="795806703"/>
    <nd ref="795806646"/>
    <nd ref="795806501"/>
    <nd ref="795806981"/>
    <nd ref="795806878"/>
    <nd ref="795806931"/>
    <nd ref="795806557"/>
    <nd ref="795806492"/>
    <nd ref="795806556"/>
    <nd ref="795806824"/>
    <nd ref="795806894"/>
    <nd ref="795806980"/>
    <nd ref="795806634"/>
    <nd ref="795806843"/>
    <nd ref="795806542"/>
    <nd ref="795806921"/>
    <nd ref="795806637"/>
    <nd ref="795806625"/>
    <nd ref="795806698"/>
    <nd ref="795806963"/>
    <nd ref="795806699"/>
    <nd ref="795806695"/>
    <nd ref="795806578"/>
    <nd ref="1278079896"/>
    <nd ref="2522145299"/>
    <nd ref="6216389485"/>
    <nd ref="2522145300"/>
    <nd ref="1278079911"/>
    <nd ref="795806792"/>
    <nd ref="795806839"/>
    <nd ref="795806885"/>
    <nd ref="795806654"/>
    <nd ref="795806587"/>
    <nd ref="795806680"/>
    <nd ref="795806797"/>
    <nd ref="795806988"/>
    <nd ref="795806732"/>
    <nd ref="795806528"/>
    <nd ref="795806509"/>
    <nd ref="795806746"/>
    <nd ref="795806575"/>
    <nd ref="795806616"/>
    <nd ref="795806553"/>
    <nd ref="1278079893"/>
    <nd ref="795806864"/>
    <nd ref="1278079909"/>
    <nd ref="1278079895"/>
    <nd ref="2522145293"/>
    <nd ref="1278079897"/>
    <nd ref="1278079914"/>
    <nd ref="2522145283"/>
    <nd ref="795806745"/>
    <nd ref="2522145280"/>
    <nd ref="795806640"/>
    <nd ref="795806947"/>
    <nd ref="795806526"/>
    <nd ref="795806569"/>
    <nd ref="795806506"/>
    <nd ref="795806816"/>
    <nd ref="795806779"/>
    <nd ref="795806877"/>
    <nd ref="795806613"/>
    <nd ref="795806841"/>
    <nd ref="6216629430"/>
    <nd ref="8812362566"/>
    <tag k="alt_name" v="Église des soldats"/>
    <tag k="alt_name:de" v="Soldatenkirche"/>
    <tag k="amenity" v="place_of_worship"/>
    <tag k="building" v="cathedral"/>
    <tag k="building:levels" v="5"/>
    <tag k="check_date" v="2023-09-16"/>
    <tag k="denomination" v="catholic"/>
    <tag k="height" v="33"/>
    <tag k="name" v="Cathédrale Saint-Louis des Invalides"/>
    <tag k="name:en" v="Saint Louis of Les Invalids Cathedral"/>
    <tag k="name:eo" v="Katedralo Saint-Louis-des-Invalides"/>
    <tag k="name:fr" v="Cathédrale Saint-Louis des Invalides"/>
    <tag k="note" v="invalides.org/pages/historique.html"/>
    <tag k="ref:FR:CEF" v="75107_03"/>
    <tag k="religion" v="christian"/>
    <tag k="source" v="cadastre-dgi-fr source : Direction Générale des Impôts - Cadastre. Mise à jour : 2010"/>
    <tag k="tourism" v="attraction"/>
    <tag k="wikidata" v="Q4992220"/>
    <tag k="wikipedia" v="fr:Cathédrale Saint-Louis-des-Invalides"/>
  </way>
</osm>
```
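To check more than one ID at a time, a cross-check along these lines might help narrow down which ways the reader drops. This is a minimal sketch, not part of Sedona: the helper names `osmium_ways`, `sedona_way_ids`, and `missing_ids` are hypothetical. It assumes the DataFrame has the `id` and `kind` columns shown in the output above, with ways labelled `"way"`. Filtering on `kind` matters because OSM IDs are only unique per element type, so `id = 64955027` could also match a node or relation. PySpark is imported lazily so the parsing helpers run without it.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def osmium_ways(xml_text: str) -> dict[int, dict]:
    """Parse XML printed by `osmium getid ... -f osm` into
    {way_id: {"version": int, "refs": [node ids], "tags": {k: v}}}.
    Nodes and relations in the output are ignored."""
    root = ET.fromstring(xml_text)
    ways = {}
    for way in root.iter("way"):
        ways[int(way.get("id"))] = {
            "version": int(way.get("version")),
            "refs": [int(nd.get("ref")) for nd in way.findall("nd")],
            "tags": {tag.get("k"): tag.get("v") for tag in way.findall("tag")},
        }
    return ways


def sedona_way_ids(df, ids) -> set[int]:
    """Return the subset of `ids` present as ways in a DataFrame loaded with
    format("osmpbf"). Assumes `id` and `kind` columns, kind == "way" for ways
    (hypothetical helper; requires PySpark at call time)."""
    from pyspark.sql.functions import col

    rows = (
        df.where((col("kind") == "way") & col("id").isin(list(ids)))
        .select("id")
        .collect()
    )
    return {row["id"] for row in rows}


def missing_ids(expected: set[int], found: set[int]) -> list[int]:
    """IDs that osmium reports but Sedona did not return, sorted for stable output."""
    return sorted(expected - found)
```

With `df` loaded from the planet file as above and `xml_text` holding the osmium output for the suspect ways, `missing_ids(set(osmium_ways(xml_text)), sedona_way_ids(df, osmium_ways(xml_text)))` would list the way IDs that osmium sees but the `osmpbf` reader does not return.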
