sidshehria commented on issue #754:
URL: 
https://github.com/apache/datafusion-python/issues/754#issuecomment-2673992155

   **Problem Statement:**
   
   - When using DataFusion in Python, column names often include a default 
table qualifier such as `?table?`.
   
   - This makes the column names less user-friendly when working with 
methods such as `.select()`, `.filter()`, or `.aggregate()`.
   
   - Example: a column is named `"AVG(?table?.has_parking)"` instead of just 
`"AVG(has_parking)"`.
   
   **Proposed Solution:**
   
   - The goal is to automate aliasing internally, so users do not have to 
manually call `.alias()` every time.
   
   **Alternatives Considered:**
   
   - Manually calling `.alias()`, but this is tedious and not user-friendly.
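
For reference, the manual workaround looks roughly like the sketch below. It assumes a Parquet file that contains a `has_parking` column; the function name `mean_parking` and the alias `avg_has_parking` are illustrative only, and the import sits inside the function just so the snippet loads where DataFusion is not installed.

```python
def mean_parking(path):
    """Average `has_parking`, naming the output column by hand."""
    # Lazy import: keeps this snippet importable without datafusion present.
    from datafusion import SessionContext, col, functions as f

    ctx = SessionContext()
    df = ctx.read_parquet(path)
    # Without .alias(), the output column keeps the generated name,
    # e.g. "AVG(?table?.has_parking)" as described in this issue.
    return df.aggregate(
        [],  # no group-by columns
        [f.avg(col("has_parking")).alias("avg_has_parking")],
    )
```

Every aggregate needs its own `.alias()` call in this style, which is the tedium the proposal aims to remove.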
   
   **Solution Implementation:**
   
   The best way to solve this issue is to **modify the DataFusion API 
internally** to:
   
   - Remove `?table?` from column names automatically.
   
   - Ensure operators like `==` and `>=`, and methods like `.cast()`, work 
naturally with Python literals.
   
   - Create an alias for `read_parquet()` so that users don’t need to manually 
handle `SessionContext`.
   
   
   After going through the `datafusion-python` codebase, I think we should 
focus on the following files:
   
   1. `datafusion/dataframe.py`:
   
   - **Purpose**: This file defines the DataFrame class and its associated 
methods, including `.select()`, `.filter()`, and `.aggregate()`.
   
   - **Modification**: Within these methods, implement logic to automatically 
remove or replace the default `?table?` prefix in column names, thereby 
generating cleaner and more user-friendly aliases.
   
   2. `python/src/datafusion/dataframe.rs`:
   
   - **Purpose**: This Rust source file defines the `DataFrame` struct and its 
associated methods, including `.select()`, `.filter()`, and `.aggregate()`.
   
   - **Modification**: Within these methods, implement logic to automatically 
remove or replace the default `?table?` prefix in column names, thereby 
generating cleaner and more user-friendly aliases.
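
The renaming logic itself could be as small as the pure-Python sketch below. The names `DEFAULT_QUALIFIER`, `strip_default_qualifier`, and `clean_names` are hypothetical, and the sketch assumes the prefix always appears literally as `?table?.` inside the generated name.

```python
DEFAULT_QUALIFIER = "?table?."  # prefix as reported in this issue


def strip_default_qualifier(name: str) -> str:
    """Remove every occurrence of the default qualifier from a column name."""
    return name.replace(DEFAULT_QUALIFIER, "")


def clean_names(names):
    """Apply the stripping to a list of column names, preserving order."""
    return [strip_default_qualifier(n) for n in names]


print(clean_names(["AVG(?table?.has_parking)", "has_parking"]))
```

One thing to watch with this approach: names that differ only by qualifier would collide after stripping, so a real implementation would likely need to skip the rename when the cleaned names are not unique.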
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

