szehon-ho commented on PR #54488:
URL: https://github.com/apache/spark/pull/54488#issuecomment-3993146715
> meant to reply earlier to this:
No worries. I took another look, and the test coverage looks good in the
latest PR.
> non partitioned tables: for REPLACE WHERE, the only reason I'm using
partitioned tables is because the InMemoryCatalog doesn't support partial
overwrite on non-partitioned tables. It didn't seem necessary to implement
this, since schema evolution doesn't apply to partition columns (you can't add
new partition columns, and partition columns can't be nested types, so struct
evolution can't apply).
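To make the point above concrete, here is a hypothetical, simplified sketch (not Spark's actual implementation) of what struct evolution does: it adds fields that exist only in the source schema, including inside nested structs. Partition columns must be top-level atomic columns, so there is nothing for struct evolution to add there. Schema representation and function names are illustrative only.

```python
def merge_schema(target, source):
    """Return target extended with fields that exist only in source.

    Schemas are modeled as dicts mapping column name to either an atomic
    type string or a nested dict (a struct).
    """
    merged = dict(target)
    for name, typ in source.items():
        if name not in merged:
            merged[name] = typ  # brand-new column added by evolution
        elif isinstance(typ, dict) and isinstance(merged[name], dict):
            # recurse into structs: this is where struct evolution applies
            merged[name] = merge_schema(merged[name], typ)
    return merged

# "p" stands in for a partition column: top-level and atomic, so untouched.
target = {"id": "int", "info": {"a": "string"}, "p": "date"}
source = {"id": "int", "info": {"a": "string", "b": "int"}}
print(merge_schema(target, source))
# → {'id': 'int', 'info': {'a': 'string', 'b': 'int'}, 'p': 'date'}
```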
Yeah, makes sense; I think it can be a follow-up.
> Constraints: not sure exactly what you had in mind there
Sorry, maybe we can ignore that. I was thinking of the other case; it was more
applicable when the source has fewer columns (checking whether filling with
NULLs violates the constraints). But you are right, it doesn't apply here.
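For clarity, the case being set aside can be sketched in plain Python (a hypothetical illustration, not Spark code): when the source has fewer columns than the target, the missing columns are padded with NULL, and a NOT NULL constraint on one of those columns would then be violated.

```python
def fill_missing(row, target_columns):
    """Pad a source row (a dict) with None for target columns it lacks."""
    return {c: row.get(c) for c in target_columns}

def violates_not_null(row, not_null_columns):
    """True if any NOT NULL column ended up as None after padding."""
    return any(row[c] is None for c in not_null_columns)

target_columns = ["id", "name", "created_at"]
row = fill_missing({"id": 1, "name": "x"}, target_columns)  # source lacks created_at
print(violates_not_null(row, {"created_at"}))
# → True  (the padded NULL violates the NOT NULL constraint)
```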
> For the dataframe API: Spark doesn't actually provide a way to enable
schema evolution via the dataframe API, so I've left that out for now. Adding
it would require more discussion: Delta (and Iceberg) do it via a writer
option mergeSchema, but Spark doesn't really say anything about that.
Yes, I also realized after I typed it that there is no mergeSchema option for
normal inserts. It'd be nice at some point, but definitely a follow-up.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]