petern48 commented on issue #380:
URL: https://github.com/apache/sedona-db/issues/380#issuecomment-3587777500

   @paleolimbot Something else is wrong. The same query except using `SELECT *` 
instead of `SUM()` executes instantly (`0.6205670833587646 seconds`).
   
   ```
   sd.sql(f"""
   SELECT * FROM population_areas WHERE ST_Intersects(wkb_geometry, 
ST_SetSRID(ST_GeomFromWKT('{wkt}'), 4326))
   """).show()
   ```
   
   Seeing that, I knew the issue was in the sum. Then I looked more carefully. 
I noticed the population field is a utf8 instead of a number.
   
   ```
   
┌────────────┬────────────────────┬──────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
   │ population ┆       GISJOIN      ┆                                          
       wkb_geometry                                                 │
   │    utf8    ┆        utf8        ┆                                          
         geometry                                                   │
   ```
   
   I tried various casts, and I never got it to complete. Either the cast isn't 
going smoothly or the whole query is just taking so long because of the casting.
   
   Oddly enough, taking any sort of `count(*)` instead of `*` gets stuck 
similarly to the sum an cast.
   
   ```
   SELECT count(*) FROM population_areas WHERE ST_Intersects(wkb_geometry, 
ST_SetSRID(ST_GeomFromWKT('{wkt}'), 4326))
   SELECT count(*) FROM population_areas
   ```
   
   This part confuses me. I was happy to discover the utf8 issue. This 
`count()` one is concerning, though.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to