chenkovsky commented on PR #14057: URL: https://github.com/apache/datafusion/pull/14057#issuecomment-2642557135
> @chenkovsky You mentioned that the issues of #14362 are 1) duplicated field issues 2) HashMap
>
> I think overall DataFusion only cares about the system columns that are generated by DataFusion; system columns from other engines should be considered normal columns. But since this is just my guess and not based on any practical experience, is there any concern with this assumption?
>
> For HashMap, I don't think it has a performance issue, since we only check a boolean from it and we don't need to access it frequently, given that the field should be fixed once created.

@jayzhan211 DataFusion supports loading files. Without that feature we would of course not need to handle this, but with it we have to.

Could you please also look at the discussion in #14362? We disagree on more points there, for example the system/metadata column propagation problem and the `_rowid` save/load problem. Currently, with #14362, data engineers have to write a with clause. When using the DataFrame API, they also have to take extra care with the metadata dict in #14362. I haven't seen such behavior in other systems. It puts a lot of burden on data engineers.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at: us...@infra.apache.org