Re: [PR] Run queries in python benchmarks using only one thread [sedona-db]

via GitHub Sun, 07 Sep 2025 05:57:04 -0700


petern48 commented on PR #24:
URL: https://github.com/apache/sedona-db/pull/24#issuecomment-3259069837


   > - We don't benchmark predicates with realistic input (they are ST_Contains 
with two identical inputs). The array/scalar case is probably best to focus on 
(more likely to affect the perceived speed of our engine's first release).
   
   Yeah, I knowingly made `geom1` and `geom2` identical 
[here](https://github.com/apache/sedona-db/blob/8bddfa3ca5916fd42bec1968441cf516ba7fb08b/benchmarks/test_bench_base.py#L73-L79).
 I was primarily focused on the non-predicate functions, so I just put 
something together quickly for binary functions. I meant to circle back to it 
later, but ran into other things.
   
   > - They benchmark on a "table" and not a Parquet scan. We have the edge on 
a Parquet scan, PostGIS and DuckDB have an edge with their native table format. 
The Parquet scan probably is more realistic.
   
   I did this knowingly too, actually. When I started, this was just supposed 
to be a scuffed development tool rather than something we'd use for presenting 
results publicly. I created these benchmarks with the intention of comparing 
purely the function implementations (ignoring Parquet reading optimizations), 
since I've been focused on function implementations.
   
   I would still argue that using a "table" is probably more useful for 
developers optimizing functions, since it gives a raw 1-to-1 comparison. We 
could make it configurable (potentially later). For now, it's totally fine if 
we want to change it to use Parquet scans.
   
   Sorry for all of these issues 😬 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
