petern48 commented on PR #24: URL: https://github.com/apache/sedona-db/pull/24#issuecomment-3259069837
> - We don't benchmark predicates with realistic input (they are ST_Contains with two identical inputs). The array/scalar case is probably best to focus on (more likely to affect the perceived speed of our engine's first release).

Yeah, I knowingly made `geom1` and `geom2` identical [here](https://github.com/apache/sedona-db/blob/8bddfa3ca5916fd42bec1968441cf516ba7fb08b/benchmarks/test_bench_base.py#L73-L79). I was primarily focused on the non-predicate functions, so I just put something together quickly for binary functions. I meant to circle back to it later, but ran into other things.

> - They benchmark on a "table" and not a Parquet scan. We have the edge on a Parquet scan; PostGIS and DuckDB have an edge with their native table format. The Parquet scan probably is more realistic.

I knowingly did this too, actually. When I started, this was just supposed to be a scuffed development tool rather than something we'd use for presenting results publicly. I created these benchmarks with the intention of comparing purely the function implementations (ignoring Parquet reading optimizations), since I've been focused on function implementations. I would still argue that using "table" is probably more useful to a developer optimizing functions, since it gives a raw 1-to-1 comparison. We could make it configurable (potentially later). For now, it's totally fine if we want to change it to use Parquet scans.

Sorry for all of these issues 😬

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
