chenkovsky commented on PR #14057: URL: https://github.com/apache/datafusion/pull/14057#issuecomment-2643492241
> > for stopping system column propagation, have you tested other logical plans e.g. union intersect? > > I have not tried constructing logical plans directly. My thought was that even if you do some sort of union you'll still have a projection. For example: > > ```sql > select *, _rowid > from t1 > union all > select *, _rowid > from t2 > ``` > > In this case `_rowid` would come out as a "regular" column. If you did: > > ```python > select * > from t1 > union all > select * > from t2 > ``` > > You get no `_rowid` column in the output at all. > > I'm sure you can create funky stuff by constructing a plan without a projection, but I don't think that happens in the real world even with the optimizer: there is always a projection in the first pass and only after that is evaluated would it get pushed down to a tablescan. > > ```python > > SET datafusion.optimizer.max_passes = 0; > 0 row(s) fetched. > Elapsed 0.001 seconds. > > > create table t (x int, y int); > 0 row(s) fetched. > Elapsed 0.002 seconds. > > > explain > select * > from t; > +---------------+-----------------------------------------------+ > | plan_type | plan | > +---------------+-----------------------------------------------+ > | logical_plan | Projection: t.x, t.y | > | | TableScan: t | > | physical_plan | MemoryExec: partitions=1, partition_sizes=[0] | > | | | > +---------------+-----------------------------------------------+ > 2 row(s) fetched. > Elapsed 0.006 seconds. > > > explain > select x > from t; > +---------------+-----------------------------------------------+ > | plan_type | plan | > +---------------+-----------------------------------------------+ > | logical_plan | Projection: t.x | > | | TableScan: t | > | physical_plan | MemoryExec: partitions=1, partition_sizes=[0] | > | | | > +---------------+-----------------------------------------------+ > ``` > > So I don't think we have to worry about a plan without a projection as long as a plan with a projection correctly evaluates wildcards. as I previously asked, in your implementation "a system column stops being a system column once it's projected" ? If this is correct, then as you said there's no need to add more UTs. I have to call out it seems that this behavior is incompatible with Spark. I know whether follow Spark's standard is another problem. but community should be aware of this. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org