chenkovsky commented on PR #14057:
URL: https://github.com/apache/datafusion/pull/14057#issuecomment-2644417122

   > > > > for stopping system column propagation, have you tested other 
logical plans e.g. union intersect?
   > > > 
   > > > 
   > > > I have not tried constructing logical plans directly. My thought was 
that even if you do some sort of union you'll still have a projection. For 
example:
   > > > ```sql
   > > > select *, _rowid
   > > > from t1
   > > > union all
   > > > select *, _rowid
   > > > from t2
   > > > ```
   > > > 
   > > > 
   > > >     
   > > >       
   > > >     
   > > > 
   > > >       
   > > >     
   > > > 
   > > >     
   > > >   
   > > > In this case `_rowid` would come out as a "regular" column. If you did:
   > > > ```python
   > > > select *
   > > > from t1
   > > > union all
   > > > select *
   > > > from t2
   > > > ```
   > > > 
   > > > 
   > > >     
   > > >       
   > > >     
   > > > 
   > > >       
   > > >     
   > > > 
   > > >     
   > > >   
   > > > You get no `_rowid` column in the output at all.
   > > > I'm sure you can create funky stuff by constructing a plan without a 
projection, but I don't think that happens in the real world even with the 
optimizer: there is always a projection in the first pass and only after that 
is evaluated would it get pushed down to a tablescan.
   > > > ```python
   > > > > SET datafusion.optimizer.max_passes = 0;
   > > > 0 row(s) fetched.
   > > > Elapsed 0.001 seconds.
   > > > 
   > > > > create table t (x int, y int);
   > > > 0 row(s) fetched.
   > > > Elapsed 0.002 seconds.
   > > > 
   > > > > explain
   > > > select *
   > > > from t;
   > > > +---------------+-----------------------------------------------+
   > > > | plan_type     | plan                                          |
   > > > +---------------+-----------------------------------------------+
   > > > | logical_plan  | Projection: t.x, t.y                          |
   > > > |               |   TableScan: t                                |
   > > > | physical_plan | MemoryExec: partitions=1, partition_sizes=[0] |
   > > > |               |                                               |
   > > > +---------------+-----------------------------------------------+
   > > > 2 row(s) fetched.
   > > > Elapsed 0.006 seconds.
   > > > 
   > > > > explain
   > > > select x
   > > > from t;
   > > > +---------------+-----------------------------------------------+
   > > > | plan_type     | plan                                          |
   > > > +---------------+-----------------------------------------------+
   > > > | logical_plan  | Projection: t.x                               |
   > > > |               |   TableScan: t                                |
   > > > | physical_plan | MemoryExec: partitions=1, partition_sizes=[0] |
   > > > |               |                                               |
   > > > +---------------+-----------------------------------------------+
   > > > ```
   > > > 
   > > > 
   > > >     
   > > >       
   > > >     
   > > > 
   > > >       
   > > >     
   > > > 
   > > >     
   > > >   
   > > > So I don't think we have to worry about a plan without a projection as 
long as a plan with a projection correctly evaluates wildcards.
   > > 
   > > 
   > > as I previously asked, in your implementation "a system column stops 
being a system column once it's projected" ?
   > > If this is correct, then as you said there's no need to add more UTs.
   > > I have to call out it seems that this behavior is incompatible with 
Spark. I know whether follow Spark's standard is another problem. but community 
should be aware of this.
   > 
   > Do you know the rationale for spark that why it doesn't consider projected 
columns as normal column?
   
   I dont know why  it doesn't consider metadata/system column as normal 
column. 
   
   from my opnion, normal column can be propagated automatically. but 
metadata/system column should have another propagate strategy, so distinguish 
them is a correct way.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to