e-kotov opened a new issue, #474:
URL: https://github.com/apache/sedona-db/issues/474

Hi @paleolimbot,

I'm exploring the possibility of building a dplyr-compatible interface on top of sedonadb for R, similar to what [duckplyr](https://github.com/tidyverse/duckplyr) provides for DuckDB.

Looking at the current R bindings, I noticed that sedonadb exposes a limited subset of DataFrame operations:

- `select_indices()` for column selection
- `limit()` for row limiting
- `collect()` / `to_view()` for materialization

In contrast, [duckdb-r](https://github.com/duckdb/duckdb-r/blob/main/R/relational.R) exposes a full relational algebra API that duckplyr uses:

- Expression builders: `expr_reference()`, `expr_constant()`, `expr_function()`, `expr_comparison()`
- Relation operations: `rel_filter()`, `rel_project()`, `rel_aggregate()`, `rel_order()`, `rel_join()`

This allows duckplyr to translate dplyr verbs directly into relational operations without going through SQL string generation.

**Questions:**

1. Are there any plans to expose more of DataFusion's DataFrame API (like `filter()`, `aggregate()`, `sort()`) through the R bindings?
2. Would there be interest in accepting contributions that add an expression/relational API similar to duckdb-r?

For now, I'm working on a SQL-based approach using `sd_sql()`, which works but requires R-to-SQL expression translation. A native relational API would be more elegant and potentially more performant (avoiding SQL parsing overhead).

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
