Hi Selcuk, that's a good point. I don't think the spec doesn't discuss
how a null in an array should be interpreted. I had assumed that it
would be an invalid state (and probably should have said so
explicitly), but I agree that we could specify that it should be
interpreted as a JSON null.

Thanks,

David

On Wed, Aug 14, 2024 at 5:11 PM Selcuk Aya
<selcuk....@snowflake.com.invalid> wrote:
>
> Hi David, just to clarify, I think we can shred arrays with json nulls 
> without having to use untyped_value column, is this correct?
>
> Selcuk
>
> On Wed, Aug 14, 2024 at 11:31 PM David Cashman 
> <david.cash...@databricks.com.invalid> wrote:
>>
>> Hi Nick,
>>
>> Your understanding is correct. The null in the Variant spec is meant
>> to encode a JSON null. A row-level value can be SQL null as in any
>> nullable column, but within a Variant value, there is only the
>> Variant-encoded null (i.e. JSON null). Some of the Spark expressions
>> (e.g. cast to a non-Variant type) implicitly convert Variant null to
>> SQL null.
>>
>> In the current version of the shredding spec, the intent is for null
>> in a shredded column to represent a missing field. Variant null would
>> need to be encoded in the "untyped_value" column. If JSON null is
>> expected to be common, it might be reasonable to specify a mechanism
>> to shred it (e.g. as a boolean column with a unique name).
>>
>> Thanks,
>>
>> David
>>
>> On Wed, Aug 14, 2024 at 11:38 AM Nick Riasanovsky <n...@bodo.ai> wrote:
>> >
>> > Hello everyone,
>> >
>> > As it seems the Variant spec decisions are nearly finalized, I would like 
>> > to ask a clarifying question regarding the difference between SQL Null 
>> > (missing) and JSON Null. Reading through the Spark specification, source 
>> > code, and also experimenting with Spark locally, it seems that the Variant 
>> > type handles the difference between SQL Null and JSON Null at a row level 
>> > and can successfully maintain this difference. However, it seems to me 
>> > that it's never possible for contents of a variant value to contain a SQL 
>> > Null value (only a JSON NULL), such as array(1, missing, 2). Since a 
>> > variant value is recursive, there doesn't appear to be any way to encode a 
>> > SQL NULL in the actual Variant value.
>> >
>> > If anyone has any insights that can confirm or reject my understanding, 
>> > I'd greatly appreciate it. I'm trying to become more familiar with the 
>> > Variant encoded and this seemed like it could be a potential "gotcha" once 
>> > column shredding is supported.
>> >
>> > Thanks,
>> > Nick Riasanovsky

Reply via email to