To follow up on the idea of multiple physical types for a shredded column, we had a discussion internally about this and I think it's pretty reasonable to add that later if we end up needing it. I agree that there's no pressing need to add that complication to the spec.
On Wed, Aug 14, 2024 at 2:41 PM David Cashman <david.cash...@databricks.com> wrote:
> Hi Selcuk, that's a good point. I don't think the spec discusses
> how a null in an array should be interpreted. I had assumed that it
> would be an invalid state (and probably should have said so
> explicitly), but I agree that we could specify that it should be
> interpreted as a JSON null.
>
> Thanks,
>
> David
>
> On Wed, Aug 14, 2024 at 5:11 PM Selcuk Aya
> <selcuk....@snowflake.com.invalid> wrote:
> >
> > Hi David, just to clarify, I think we can shred arrays with JSON nulls
> > without having to use the untyped_value column, is this correct?
> >
> > Selcuk
> >
> > On Wed, Aug 14, 2024 at 11:31 PM David Cashman <
> > david.cash...@databricks.com.invalid> wrote:
> >>
> >> Hi Nick,
> >>
> >> Your understanding is correct. The null in the Variant spec is meant
> >> to encode a JSON null. A row-level value can be SQL null, as in any
> >> nullable column, but within a Variant value there is only the
> >> Variant-encoded null (i.e. JSON null). Some of the Spark expressions
> >> (e.g. cast to a non-Variant type) implicitly convert Variant null to
> >> SQL null.
> >>
> >> In the current version of the shredding spec, the intent is for null
> >> in a shredded column to represent a missing field. Variant null would
> >> need to be encoded in the "untyped_value" column. If JSON null is
> >> expected to be common, it might be reasonable to specify a mechanism
> >> to shred it (e.g. as a boolean column with a unique name).
> >>
> >> Thanks,
> >>
> >> David
> >>
> >> On Wed, Aug 14, 2024 at 11:38 AM Nick Riasanovsky <n...@bodo.ai> wrote:
> >> >
> >> > Hello everyone,
> >> >
> >> > As it seems the Variant spec decisions are nearly finalized, I would
> >> > like to ask a clarifying question regarding the difference between
> >> > SQL Null (missing) and JSON Null.
> >> > Reading through the Spark specification, source code, and also
> >> > experimenting with Spark locally, it seems that the Variant type
> >> > handles the difference between SQL Null and JSON Null at a row
> >> > level and can successfully maintain this difference. However, it
> >> > seems to me that it's never possible for the contents of a Variant
> >> > value to contain a SQL Null value (only a JSON Null), such as
> >> > array(1, missing, 2). Since a Variant value is recursive, there
> >> > doesn't appear to be any way to encode a SQL NULL in the actual
> >> > Variant value.
> >> >
> >> > If anyone has any insights that can confirm or reject my
> >> > understanding, I'd greatly appreciate it. I'm trying to become more
> >> > familiar with the Variant encoding, and this seemed like it could
> >> > be a potential "gotcha" once column shredding is supported.
> >> >
> >> > Thanks,
> >> > Nick Riasanovsky

-- 
Ryan Blue
Databricks
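For readers following the thread, the three-way distinction David describes (missing field vs. JSON null vs. a typed value in a shredded column) can be sketched as follows. This is a minimal conceptual sketch in Python, not the actual Parquet shredding layout or binary encoding; the JSON_NULL sentinel, the placeholder "variant-null-bytes" string, and the helper function names are all invented for illustration.

```python
# Conceptual sketch of the shredding behavior discussed above (assumed
# semantics, not the real Parquet format): a SQL-null/missing field leaves
# both shredded columns null, while a Variant-encoded (JSON) null must be
# carried in the "untyped_value" column.

JSON_NULL = object()  # sentinel standing in for a Variant-encoded JSON null


def shred_field(value):
    """Shred one field into (typed_value, untyped_value) columns.

    - None (missing field / SQL null)  -> both columns null
    - JSON_NULL (Variant null)         -> encoded in untyped_value
    - any other value                  -> stored in typed_value
    """
    if value is None:
        return {"typed_value": None, "untyped_value": None}
    if value is JSON_NULL:
        # Placeholder for the Variant binary encoding of a JSON null.
        return {"typed_value": None, "untyped_value": "variant-null-bytes"}
    return {"typed_value": value, "untyped_value": None}


def unshred_field(cols):
    """Reverse the mapping: missing and JSON null stay distinguishable."""
    if cols["untyped_value"] is not None:
        return JSON_NULL
    if cols["typed_value"] is None:
        return None  # field was missing
    return cols["typed_value"]
```

The point of the sketch is that the round trip is lossless: a missing field and a JSON null come back as different values, which is exactly the property Nick's question probes.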