chenkovsky commented on PR #14057: URL: https://github.com/apache/datafusion/pull/14057#issuecomment-2644417122
> > > > for stopping system column propagation, have you tested other logical plans e.g. union intersect? > > > > > > > > > I have not tried constructing logical plans directly. My thought was that even if you do some sort of union you'll still have a projection. For example: > > > ```sql > > > select *, _rowid > > > from t1 > > > union all > > > select *, _rowid > > > from t2 > > > ``` > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > In this case `_rowid` would come out as a "regular" column. If you did: > > > ```python > > > select * > > > from t1 > > > union all > > > select * > > > from t2 > > > ``` > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > You get no `_rowid` column in the output at all. > > > I'm sure you can create funky stuff by constructing a plan without a projection, but I don't think that happens in the real world even with the optimizer: there is always a projection in the first pass and only after that is evaluated would it get pushed down to a tablescan. > > > ```python > > > > SET datafusion.optimizer.max_passes = 0; > > > 0 row(s) fetched. > > > Elapsed 0.001 seconds. > > > > > > > create table t (x int, y int); > > > 0 row(s) fetched. > > > Elapsed 0.002 seconds. > > > > > > > explain > > > select * > > > from t; > > > +---------------+-----------------------------------------------+ > > > | plan_type | plan | > > > +---------------+-----------------------------------------------+ > > > | logical_plan | Projection: t.x, t.y | > > > | | TableScan: t | > > > | physical_plan | MemoryExec: partitions=1, partition_sizes=[0] | > > > | | | > > > +---------------+-----------------------------------------------+ > > > 2 row(s) fetched. > > > Elapsed 0.006 seconds. > > > > > > > explain > > > select x > > > from t; > > > +---------------+-----------------------------------------------+ > > > | plan_type | plan | > > > +---------------+-----------------------------------------------+ > > > | logical_plan | Projection: t.x | > > > | | TableScan: t | > > > | physical_plan | MemoryExec: partitions=1, partition_sizes=[0] | > > > | | | > > > +---------------+-----------------------------------------------+ > > > ``` > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > So I don't think we have to worry about a plan without a projection as long as a plan with a projection correctly evaluates wildcards. > > > > > > as I previously asked, in your implementation "a system column stops being a system column once it's projected" ? > > If this is correct, then as you said there's no need to add more UTs. > > I have to call out it seems that this behavior is incompatible with Spark. I know whether follow Spark's standard is another problem. but community should be aware of this. > > Do you know the rationale for spark that why it doesn't consider projected columns as normal column? I dont know why it doesn't consider metadata/system column as normal column. from my opnion, normal column can be propagated automatically. but metadata/system column should have another propagate strategy, so distinguish them is a correct way. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org