chenkovsky commented on PR #14057:
URL: https://github.com/apache/datafusion/pull/14057#issuecomment-2643492241

   > > for stopping system column propagation, have you tested other logical 
plans e.g. union intersect?
   > 
   > I have not tried constructing logical plans directly. My thought was that 
even if you do some sort of union you'll still have a projection. For example:
   > 
   > ```sql
   > select *, _rowid
   > from t1
   > union all
   > select *, _rowid
   > from t2
   > ```
   > 
   > In this case `_rowid` would come out as a "regular" column. If you did:
   > 
   > ```python
   > select *
   > from t1
   > union all
   > select *
   > from t2
   > ```
   > 
   > You get no `_rowid` column in the output at all.
   > 
   > I'm sure you can create funky stuff by constructing a plan without a 
projection, but I don't think that happens in the real world even with the 
optimizer: there is always a projection in the first pass and only after that 
is evaluated would it get pushed down to a tablescan.
   > 
   > ```python
   > > SET datafusion.optimizer.max_passes = 0;
   > 0 row(s) fetched.
   > Elapsed 0.001 seconds.
   > 
   > > create table t (x int, y int);
   > 0 row(s) fetched.
   > Elapsed 0.002 seconds.
   > 
   > > explain
   > select *
   > from t;
   > +---------------+-----------------------------------------------+
   > | plan_type     | plan                                          |
   > +---------------+-----------------------------------------------+
   > | logical_plan  | Projection: t.x, t.y                          |
   > |               |   TableScan: t                                |
   > | physical_plan | MemoryExec: partitions=1, partition_sizes=[0] |
   > |               |                                               |
   > +---------------+-----------------------------------------------+
   > 2 row(s) fetched.
   > Elapsed 0.006 seconds.
   > 
   > > explain
   > select x
   > from t;
   > +---------------+-----------------------------------------------+
   > | plan_type     | plan                                          |
   > +---------------+-----------------------------------------------+
   > | logical_plan  | Projection: t.x                               |
   > |               |   TableScan: t                                |
   > | physical_plan | MemoryExec: partitions=1, partition_sizes=[0] |
   > |               |                                               |
   > +---------------+-----------------------------------------------+
   > ```
   > 
   > So I don't think we have to worry about a plan without a projection as 
long as a plan with a projection correctly evaluates wildcards.
   
   as I previously asked, in your implementation "a system column stops being a 
system column once it's projected" ? 
   If this is correct, then as you said there's no need to add more UTs.
   
   I have to call out it seems that this behavior is incompatible with Spark. I 
know whether follow Spark's standard is another problem. but community should 
be aware of this.
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to