e-kotov opened a new issue, #474:
URL: https://github.com/apache/sedona-db/issues/474

   
Hi @paleolimbot,
   
I'm exploring the possibility of building a dplyr-compatible interface on top of sedonadb for R, similar to what [duckplyr](https://github.com/tidyverse/duckplyr) provides for DuckDB.
   
Looking at the current R bindings, I noticed that sedonadb exposes only a limited subset of DataFrame operations:
- `select_indices()` for column selection
- `limit()` for row limiting
- `collect()` / `to_view()` for materialization
   
In contrast, [duckdb-r](https://github.com/duckdb/duckdb-r/blob/main/R/relational.R) exposes a full relational algebra API that duckplyr builds on:
- Expression builders: `expr_reference()`, `expr_constant()`, `expr_function()`, `expr_comparison()`
- Relation operations: `rel_filter()`, `rel_project()`, `rel_aggregate()`, `rel_order()`, `rel_join()`
   
This allows duckplyr to translate dplyr verbs directly into relational operations without going through SQL string generation.
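To make the contrast concrete, here is a minimal, language-neutral sketch (written in Python rather than R, and not the duckdb-r or sedonadb API) of what translating verbs directly into relational operations means: each dplyr-style verb wraps the previous relation in a new plan node built from expression objects, and no SQL text is ever produced. The node classes (`Ref`, `Const`, `Compare`, `Scan`, `Filter`, `Project`, `Order`), the verb helpers (`filter_`, `select`, `arrange`), and the toy in-memory `execute()` are all hypothetical; in a real binding the plan would presumably be handed to the query engine instead.

```python
import operator
from dataclasses import dataclass
from typing import Any

# --- Expression nodes (hypothetical; loosely analogous to reference/constant/comparison builders) ---
@dataclass
class Ref:
    name: str          # column reference

@dataclass
class Const:
    value: Any         # literal value

@dataclass
class Compare:
    op: str            # one of the keys in COMPARISONS
    left: Any
    right: Any

# --- Relation nodes (hypothetical) ---
@dataclass
class Scan:
    rows: list         # in-memory rows, each a dict of column -> value

@dataclass
class Filter:
    child: Any
    predicate: Any

@dataclass
class Project:
    child: Any
    columns: list

@dataclass
class Order:
    child: Any
    key: str
    descending: bool = False

COMPARISONS = {"==": operator.eq, "!=": operator.ne, ">": operator.gt,
               ">=": operator.ge, "<": operator.lt, "<=": operator.le}

# --- dplyr-style verbs: each one wraps the input relation in a new plan node ---
def filter_(rel, predicate):
    return Filter(rel, predicate)

def select(rel, *columns):
    return Project(rel, list(columns))

def arrange(rel, key, desc=False):
    return Order(rel, key, desc)

# --- Toy executor, standing in for the real query engine ---
def eval_expr(expr, row):
    if isinstance(expr, Ref):
        return row[expr.name]
    if isinstance(expr, Const):
        return expr.value
    if isinstance(expr, Compare):
        return COMPARISONS[expr.op](eval_expr(expr.left, row), eval_expr(expr.right, row))
    raise TypeError(f"unknown expression node: {expr!r}")

def execute(rel):
    if isinstance(rel, Scan):
        return list(rel.rows)
    if isinstance(rel, Filter):
        return [r for r in execute(rel.child) if eval_expr(rel.predicate, r)]
    if isinstance(rel, Project):
        return [{c: r[c] for c in rel.columns} for r in execute(rel.child)]
    if isinstance(rel, Order):
        return sorted(execute(rel.child), key=lambda r: r[rel.key], reverse=rel.descending)
    raise TypeError(f"unknown relation node: {rel!r}")

# Roughly: t |> filter(grp == "a") |> select(id, x) |> arrange(desc(x))
t = Scan([
    {"id": 1, "x": 5, "grp": "a"},
    {"id": 2, "x": 3, "grp": "b"},
    {"id": 3, "x": 9, "grp": "a"},
])
plan = filter_(t, Compare("==", Ref("grp"), Const("a")))
plan = select(plan, "id", "x")
plan = arrange(plan, "x", desc=True)
result = execute(plan)
```

Because the plan is a tree of objects rather than a string, the sketch has no quoting or escaping step, and nothing needs to be parsed to recover the query's structure.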
   
**Questions:**

1. Are there any plans to expose more of DataFusion's DataFrame API (such as `filter()`, `aggregate()`, and `sort()`) through the R bindings?

2. Would you be open to contributions that add an expression/relational API similar to duckdb-r's?
   
For now, I'm working on a SQL-based approach using `sd_sql()`. It works, but it requires translating R expressions into SQL. A native relational API would be more elegant and potentially faster, since it avoids the SQL parsing overhead.
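For comparison, the SQL-based route needs a translator from R expressions to SQL text. Below is a minimal sketch, again in Python, assuming the expressions have already been captured as nested tuples. That tuple representation, the operator table `R_TO_SQL_OPS`, and the helpers `quote_ident`, `quote_literal`, `expr_to_sql`, and `build_query` are hypothetical illustrations, not how R or any existing package stores or translates expressions. The resulting string would then be passed to `sd_sql()` and parsed by the engine.

```python
# Hypothetical mapping from R operators to SQL operators
R_TO_SQL_OPS = {"==": "=", "!=": "<>", ">": ">", ">=": ">=",
                "<": "<", "<=": "<=", "&": "AND", "|": "OR"}

def quote_ident(name: str) -> str:
    # Double-quote identifiers and escape embedded double quotes by doubling them
    return '"' + name.replace('"', '""') + '"'

def quote_literal(value) -> str:
    if value is None:
        return "NULL"
    # bool must be checked before int, because bool is a subclass of int in Python
    if isinstance(value, bool):
        return "TRUE" if value else "FALSE"
    if isinstance(value, (int, float)):
        return str(value)
    if isinstance(value, str):
        # Single-quote strings and escape embedded single quotes by doubling them
        return "'" + value.replace("'", "''") + "'"
    raise TypeError(f"unsupported literal: {value!r}")

def expr_to_sql(expr) -> str:
    # expr is ("col", name) | ("lit", value) | ("op", r_operator, left, right)
    kind = expr[0]
    if kind == "col":
        return quote_ident(expr[1])
    if kind == "lit":
        return quote_literal(expr[1])
    if kind == "op":
        _, op, left, right = expr
        # Parenthesize every binary operation so the result never relies on SQL precedence rules
        return f"({expr_to_sql(left)} {R_TO_SQL_OPS[op]} {expr_to_sql(right)})"
    raise ValueError(f"unknown expression kind: {kind!r}")

def build_query(table, columns=None, where=None, order_by=None, desc=False) -> str:
    cols = ", ".join(quote_ident(c) for c in columns) if columns else "*"
    sql = f"SELECT {cols} FROM {quote_ident(table)}"
    if where is not None:
        sql += f" WHERE {expr_to_sql(where)}"
    if order_by is not None:
        sql += f" ORDER BY {quote_ident(order_by)}" + (" DESC" if desc else "")
    return sql

# Roughly: t |> filter(grp == "a" & x > 4) |> select(id, x) |> arrange(desc(x))
pred = ("op", "&",
        ("op", "==", ("col", "grp"), ("lit", "a")),
        ("op", ">", ("col", "x"), ("lit", 4)))
query = build_query("t", columns=["id", "x"], where=pred, order_by="x", desc=True)
```

Every piece of this (operator mapping, identifier and literal quoting, parenthesization) has to be correct for each supported R function, and the engine then re-parses the string to rebuild the structure the translator started from.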
   
   

