Re: [PR] feat: metadata columns [datafusion]

via GitHub Fri, 07 Feb 2025 02:41:01 -0800


chenkovsky commented on PR #14057:
URL: https://github.com/apache/datafusion/pull/14057#issuecomment-2642557135


   > @chenkovsky You mentioned that the issues of #14362 are 1) duplicated
   > field issues 2) HashMap
   >
   > I think overall datafusion only care about the system columns that are
   > generated by datafusion, other system columns from other engine should be
   > considered normal columns, but since this is just based on my guess not from
   > any practical experience, is there any concern of this assumption?
   >
   > For HashMap, I don't think it has performance issue since we only check
   > boolean from it and we don't need to access it frequently given the field
   > should be fixed once created.
   
   @jayzhan211 
   
   DataFusion supports loading files. Without that feature we would of course
   not need to worry about this, but with it we do.
   
   Could you please also look at the discussion in #14362? We have further
   differences of opinion there.
   
   For example, there is the system/metadata column propagation problem.
   
   There is also the _rowid save/load problem. Currently, with #14362, data
   engineers have to write a WITH clause. When using the DataFrame API, they
   also have to take extra care with the metadata dict in #14362. I haven't
   seen such behavior in other systems. It adds a lot of burden for data
   engineers.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


