bellipsis opened a new issue, #2372:
URL: https://github.com/apache/sedona/issues/2372

   ### Summary
   Some data is missing when loading the raw OSM planet file compared with loading Geofabrik extracts. I know Geofabrik removes user-identifiable information from its PBF extracts; however, I am unsure whether the missing data is related to that.
   
   ### Details
   I'm running Sedona 1.8.0, Spark 3.5, and Java 17
   
   I was loading the [OSM PBF planet file](https://planet.openstreetmap.org/pbf/planet-latest.osm.pbf) and discovered that a number of ways in Paris were missing.
   Ex: 64955027, 727994377
   
   ```
   df = sc.read.format("osmpbf").load("/opt/spark/work-dir/data/tmp/planet-latest.osm.pbf")
   df.where("id = 64955027").show()
   ```
   
   +---+----+--------+----+----+---------+---------+--------+
   |id |kind|location|tags|refs|ref_roles|ref_types|ref_size|
   +---+----+--------+----+----+---------+---------+--------+
   +---+----+--------+----+----+---------+---------+--------+
   
   
   In an attempt to troubleshoot, I downloaded the [Geofabrik extract of ile-de-france](https://download.geofabrik.de/europe/france/ile-de-france-latest.osm.pbf), and the ways were present when I ran the same code on the extract file.
   
   ```
   df = sc.read.format("osmpbf").load("/opt/spark/work-dir/data/tmp/ile-de-france-latest.osm.pbf")
   df.where("id = 64955027").show()
   ```
   
   
   +--------+----+--------+--------------------+--------------------+---------+---------+
   |      id|kind|location|                tags|                refs|ref_roles|ref_types|
   +--------+----+--------+--------------------+--------------------+---------+---------+
   |64955027| way|    NULL|{note -> invalide...|[8812362566, 7958...|     NULL|     NULL|
   +--------+----+--------+--------------------+--------------------+---------+---------+
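
The same lookup can be extended to cover both example ways and to restrict matches to ways, since OSM nodes, ways, and relations use separate ID spaces. The following is only a sketch: `way_predicate` and `missing_ids` are hypothetical helpers written for illustration, not part of Sedona, and the `kind = 'way'` filter assumes the `kind` column holds the value shown in the output above. The Spark calls are left as comments because they need the `sc` session and the files from this report.

```python
def way_predicate(way_ids):
    """Build a Spark SQL filter string matching the given OSM way IDs."""
    ids = sorted({int(i) for i in way_ids})
    if not ids:
        raise ValueError("at least one way ID is required")
    id_list = ", ".join(str(i) for i in ids)
    # Restrict to ways so that a node or relation with the same ID is not counted.
    return f"kind = 'way' AND id IN ({id_list})"


def missing_ids(expected, found):
    """Return the expected IDs that are absent from `found`, sorted."""
    return sorted(set(expected) - set(found))


# The two example ways from this report.
WAY_IDS = [64955027, 727994377]
predicate = way_predicate(WAY_IDS)

# Intended use with the reader from this report (not executed here):
# df = sc.read.format("osmpbf").load("/opt/spark/work-dir/data/tmp/planet-latest.osm.pbf")
# found = [row.id for row in df.where(predicate).select("id").collect()]
# print(missing_ids(WAY_IDS, found))
```

Running the commented part once per input file would list which of the example ways each file fails to return.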
   
   
   I confirmed the ways are present in the planet file using the osmium tool:
   
   `osmium getid planet-latest.osm.pbf w64955027 -f osm`
   
   ```
   <?xml version='1.0' encoding='UTF-8'?>
   <osm version="0.6" generator="osmium/1.14.0">
     <bounds minlat="-90" minlon="-180" maxlat="90" maxlon="180"/>
     <way id="64955027" version="24" timestamp="2025-07-02T12:53:12Z" uid="16734285" user="Rémi_" changeset="168397364">
       <nd ref="8812362566"/>
       <nd ref="795806574"/>
       <nd ref="795806854"/>
       <nd ref="795806938"/>
       <nd ref="795806767"/>
       <nd ref="795806964"/>
       <nd ref="795806951"/>
       <nd ref="795806722"/>
       <nd ref="795806703"/>
       <nd ref="795806646"/>
       <nd ref="795806501"/>
       <nd ref="795806981"/>
       <nd ref="795806878"/>
       <nd ref="795806931"/>
       <nd ref="795806557"/>
       <nd ref="795806492"/>
       <nd ref="795806556"/>
       <nd ref="795806824"/>
       <nd ref="795806894"/>
       <nd ref="795806980"/>
       <nd ref="795806634"/>
       <nd ref="795806843"/>
       <nd ref="795806542"/>
       <nd ref="795806921"/>
       <nd ref="795806637"/>
       <nd ref="795806625"/>
       <nd ref="795806698"/>
       <nd ref="795806963"/>
       <nd ref="795806699"/>
       <nd ref="795806695"/>
       <nd ref="795806578"/>
       <nd ref="1278079896"/>
       <nd ref="2522145299"/>
       <nd ref="6216389485"/>
       <nd ref="2522145300"/>
       <nd ref="1278079911"/>
       <nd ref="795806792"/>
       <nd ref="795806839"/>
       <nd ref="795806885"/>
       <nd ref="795806654"/>
       <nd ref="795806587"/>
       <nd ref="795806680"/>
       <nd ref="795806797"/>
       <nd ref="795806988"/>
       <nd ref="795806732"/>
       <nd ref="795806528"/>
       <nd ref="795806509"/>
       <nd ref="795806746"/>
       <nd ref="795806575"/>
       <nd ref="795806616"/>
       <nd ref="795806553"/>
       <nd ref="1278079893"/>
       <nd ref="795806864"/>
       <nd ref="1278079909"/>
       <nd ref="1278079895"/>
       <nd ref="2522145293"/>
       <nd ref="1278079897"/>
       <nd ref="1278079914"/>
       <nd ref="2522145283"/>
       <nd ref="795806745"/>
       <nd ref="2522145280"/>
       <nd ref="795806640"/>
       <nd ref="795806947"/>
       <nd ref="795806526"/>
       <nd ref="795806569"/>
       <nd ref="795806506"/>
       <nd ref="795806816"/>
       <nd ref="795806779"/>
       <nd ref="795806877"/>
       <nd ref="795806613"/>
       <nd ref="795806841"/>
       <nd ref="6216629430"/>
       <nd ref="8812362566"/>
       <tag k="alt_name" v="Église des soldats"/>
       <tag k="alt_name:de" v="Soldatenkirche"/>
       <tag k="amenity" v="place_of_worship"/>
       <tag k="building" v="cathedral"/>
       <tag k="building:levels" v="5"/>
       <tag k="check_date" v="2023-09-16"/>
       <tag k="denomination" v="catholic"/>
       <tag k="height" v="33"/>
       <tag k="name" v="Cathédrale Saint-Louis des Invalides"/>
       <tag k="name:en" v="Saint Louis of Les Invalids Cathedral"/>
       <tag k="name:eo" v="Katedralo Saint-Louis-des-Invalides"/>
       <tag k="name:fr" v="Cathédrale Saint-Louis des Invalides"/>
       <tag k="note" v="invalides.org/pages/historique.html"/>
       <tag k="ref:FR:CEF" v="75107_03"/>
       <tag k="religion" v="christian"/>
       <tag k="source" v="cadastre-dgi-fr source : Direction Générale des Impôts - Cadastre. Mise à jour : 2010"/>
       <tag k="tourism" v="attraction"/>
       <tag k="wikidata" v="Q4992220"/>
       <tag k="wikipedia" v="fr:Cathédrale Saint-Louis-des-Invalides"/>
     </way>
   </osm>
   ```

