logan-keede commented on issue #14118: URL: https://github.com/apache/datafusion/issues/14118#issuecomment-2599032845
I thought so. I looked at some other avenues; here are some things that might help anyone trying to solve this in the future.

Problem:- `exclude_using_columns` returns the columns from the lexicographically largest table, so if the join is between two tables `t2` and `t`, it returns the common columns from `t2`. If we also give the sub-query the alias `t2`, it tries to exclude `a` from that table too.

Possible Solution:- Currently, `excluded_columns` from the sub-query leak into the super query. A normal statement without a join should not need `exclude_using_columns`, and even if it does, it should not need to look at the sub-queries within it, since they have already excluded their redundant columns.

Possible implementation Strategies:-
1. After the sub-query is calculated, the expression should become the logical equivalent of `select * from t2`.
2. Make `exclude_using_columns` shallower/non-recursive, i.e. do not let it search for redundant columns in sub-queries.

@jonahgao, please let me know if you think this approach might work. Regardless, I would like to be unassigned from this issue, as I believe it is beyond my current capabilities. I plan to look for more beginner-friendly issues and spend some time familiarizing myself with the codebase first.

PS: I might have circled back to the original issue/problem; regardless, I hope the process contributed something. ^_^
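
To make strategy 2 concrete, here is a minimal toy sketch of the leak and of a shallower search. It is not DataFusion code: `Plan`, `Col`, `using_sets`, `excluded_columns`, and `example_plan` are hypothetical names invented for illustration, and the query in the comment is an assumed reproduction rather than the exact one from the issue. The only behavior borrowed from the description above is that each USING set is sorted and everything after the first column is excluded.

```rust
use std::collections::BTreeSet;

/// A qualified column such as `t2.a`. The derived ordering compares
/// `table` first, then `name`, so `t.a` sorts before `t2.a`.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
struct Col {
    table: String,
    name: String,
}

/// Toy logical plan (hypothetical, not DataFusion's `LogicalPlan`).
#[allow(dead_code)]
enum Plan {
    Scan,
    JoinUsing { left: Box<Plan>, right: Box<Plan>, using: Vec<Col> },
    SubqueryAlias { input: Box<Plan>, alias: String },
}

fn col(table: &str, name: &str) -> Col {
    Col { table: table.to_string(), name: name.to_string() }
}

/// Collect the USING column sets reachable from `plan`.
/// With `stop_at_alias = false` the search descends into sub-queries (the leak);
/// with `stop_at_alias = true` it treats an aliased sub-query as opaque.
fn using_sets(plan: &Plan, stop_at_alias: bool) -> Vec<Vec<Col>> {
    match plan {
        Plan::Scan => Vec::new(),
        Plan::JoinUsing { left, right, using } => {
            let mut sets = vec![using.clone()];
            sets.extend(using_sets(left, stop_at_alias));
            sets.extend(using_sets(right, stop_at_alias));
            sets
        }
        Plan::SubqueryAlias { input, .. } => {
            if stop_at_alias {
                Vec::new()
            } else {
                using_sets(input, stop_at_alias)
            }
        }
    }
}

/// For each USING set, keep the smallest qualified column and exclude the rest.
fn excluded_columns(plan: &Plan, stop_at_alias: bool) -> BTreeSet<Col> {
    let mut excluded = BTreeSet::new();
    for mut set in using_sets(plan, stop_at_alias) {
        set.sort();
        excluded.extend(set.into_iter().skip(1));
    }
    excluded
}

/// Assumed shape: SELECT * FROM (SELECT * FROM t JOIN t2 USING (a)) AS t2
fn example_plan() -> Plan {
    Plan::SubqueryAlias {
        input: Box::new(Plan::JoinUsing {
            left: Box::new(Plan::Scan),
            right: Box::new(Plan::Scan),
            using: vec![col("t", "a"), col("t2", "a")],
        }),
        alias: "t2".to_string(),
    }
}
```

In this toy model, `excluded_columns(&example_plan(), false)` still reports `t2.a` at the outer level, which collides with the column the alias `t2` exposes, while `excluded_columns(&example_plan(), true)` reports nothing. Whether stopping at the alias is the right boundary in the real planner is an open question.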