sidshehria commented on issue #754: URL: https://github.com/apache/datafusion-python/issues/754#issuecomment-2673992155
**Problem Statement:**
- When using DataFusion in Python, column names often include a default alias like `?table?`.
- This makes the column names less user-friendly when working with methods such as `.select()`, `.filter()`, or `.aggregate()`.
- Example: a column is named `"AVG(?table?.has_parking)"` instead of just `"AVG(has_parking)"`.

**Proposed Solution:**
- The goal is to automate aliasing internally, so users do not have to call `.alias()` manually every time.

**Alternatives Considered:**
- Manually using `.alias()`, but this is tedious and not user-friendly.

**Solution Implementation**

The best way to solve this issue is to **modify the DataFusion API internally** to:
- Remove `?table?` from column names automatically.
- Ensure operators like `==`, `>=`, and `.cast()` work naturally with Python literals.
- Create an alias for `read_parquet()` so that users don’t need to manually handle `SessionContext`.

After going through the `datafusion-python` codebase, I think we should focus on the following files:

1. `datafusion/dataframe.py`:
   - **Purpose**: This file defines the `DataFrame` class and its associated methods, including `.select()`, `.filter()`, and `.aggregate()`.
   - **Modification**: Within these methods, implement logic to automatically remove or replace the default `?table?` prefix in column names, producing cleaner, more user-friendly aliases.
2. `python/src/datafusion/dataframe.rs`:
   - **Purpose**: This Rust source file defines the `DataFrame` struct and its associated methods, including `.select()`, `.filter()`, and `.aggregate()`.
   - **Modification**: Within these methods, implement logic to automatically remove or replace the default `?table?` prefix in column names, producing cleaner, more user-friendly aliases.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
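A minimal sketch of the prefix-stripping idea described in the comment above, written as plain Python string handling. It does not touch DataFusion itself: the names `DEFAULT_QUALIFIER`, `strip_default_qualifier`, and `clean_column_names` are hypothetical helpers introduced for illustration, and the sketch assumes the unwanted text always appears literally as `?table?.` inside the generated column name, as in the example from the comment.

```python
# Hypothetical helpers; not part of the datafusion-python API.
# Assumption: the default qualifier appears literally as "?table?." in names.
DEFAULT_QUALIFIER = "?table?."


def strip_default_qualifier(name: str) -> str:
    """Remove every occurrence of the default qualifier from a column name."""
    return name.replace(DEFAULT_QUALIFIER, "")


def clean_column_names(names: list[str]) -> dict[str, str]:
    """Map each affected column name to its cleaned form.

    Names that do not contain the qualifier are left out of the mapping,
    so the caller only renames columns that actually change.
    """
    return {
        name: strip_default_qualifier(name)
        for name in names
        if DEFAULT_QUALIFIER in name
    }
```

The cleaned name could then be what an internal change passes to `.alias()` on the user's behalf. Whether that belongs on the Python side or the Rust side is the open question the comment raises.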
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org