Thank you, Oscar, for the review.

[1] byte <-> string promotion

The spec is unclear about the charset encoding, but it’s probably assumed to be 
UTF-8, which is what `ResolvingDecoder` does.

If writer’s schema is `bytes`, the value is encoded as a JSON string with 
ISO_8859_1. In my case, I do not know writer’s schema; all I have is a JSON 
string. If the reader’s schema is `string`, that JSON string is returned as is. 
That is, writer’s bytes are converted to reader’s string in ISO_8859_1.

If reader’s schema is `bytes`, I do not know whether it’s `bytes` or `string` 
on writer’s end, so I have no way to switch charset encoding based on writer’s 
schema. ISO_8859_1 is chosen to accommodate the more likely case of 
`bytes->bytes`.
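
To illustrate the trade-off described above, here is a minimal sketch in plain Java. The class `Latin1Bytes` and its method names are hypothetical and are not taken from Avro or avro-json; the sketch only shows why ISO_8859_1 round-trips arbitrary bytes while UTF-8 does not, and what a `bytes` reader gets if the writer actually had a `string`.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper for illustration only; not part of Avro or avro-json.
final class Latin1Bytes {
    // Maps each byte 0x00..0xFF to the char with the same code point.
    static String toJsonText(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    // Inverse mapping; exact as long as every char is <= U+00FF.
    static byte[] fromJsonText(String text) {
        return text.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] original = {(byte) 0x00, (byte) 0x7F, (byte) 0x80, (byte) 0xFF};

        // bytes -> string -> bytes is lossless with ISO_8859_1.
        byte[] viaLatin1 = fromJsonText(toJsonText(original));
        System.out.println(Arrays.equals(original, viaLatin1)); // true

        // With UTF-8, 0x80 and 0xFF are malformed input and get replaced,
        // so the round trip does not give the original bytes back.
        byte[] viaUtf8 = new String(original, StandardCharsets.UTF_8)
                .getBytes(StandardCharsets.UTF_8);
        System.out.println(Arrays.equals(original, viaUtf8)); // false

        // The cost: if the writer actually had a string such as "é", a bytes
        // reader gets the single byte 0xE9, not the two-byte UTF-8 form.
        System.out.println(fromJsonText("\u00e9").length); // 1
    }
}
```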

[2] unions

Indeed, union is a lot of trouble. On one hand, ordering in union may indicate 
user’s preference of types. On the other hand, a lot of users may intuitively 
think of union as unordered, and it would make more sense to choose the most 
specific type to resolve to.

In practice, most people probably only use unions for nullable types, i.e. 
[null, T]. If they have a union of types that aren’t mutually exclusive, they 
are just asking for confusion.

Cheers,
Zhong


From: Oscar Westra van Holthe - Kind <opw...@gmail.com>
Date: Thursday, May 29, 2025 at 8:57 AM
To: dev@avro.apache.org <dev@avro.apache.org>, Zhong Yu <z...@godaddy.com>
Subject: Re: Json decoding & resolving without writer's schema

Hi,

The approach used in https://github.com/zyu-godaddy/avro-json looks ok to me, 
but incomplete.

Interpreting strings as ISO_8859_1 is odd (it's been more than 2 decades since 
UTF-8), but not a problem if it becomes configurable.

The incompleteness comes from union resolution that may not match expectations:
12. (union, j) // for any j
    find the first T in union that rule (T,j) succeeds.

This is actually a tricky case, for several reasons:

  *   Which number is intended in a [long, int, double] union when parsing the 
      number 42? The first fit is long, but int is arguably the better fit.
  *   What if we parse a [record1, record2] union, and record1 fits by applying 
      default values, but record2 actually contains the properties in the JSON 
      object?

In both cases, the "find the first [...] that succeeds" perfectly solves any 
conflict. However, it does not always match expectations; especially in 
combination with record types with default field values.
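
The first-match behaviour for the number case can be sketched as follows. This is a simplified model in plain Java, not the avro-json implementation: the class `FirstMatchUnion`, the enum `NumType`, and the use of Java's number parsers as the "rule (T, j) succeeds" test are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical model of "find the first T in union that succeeds",
// restricted to number types; not the avro-json implementation.
final class FirstMatchUnion {
    enum NumType {
        INT, LONG, DOUBLE;

        // Assumption: a type "succeeds" if Java's parser for it accepts the text.
        boolean accepts(String jsonNumber) {
            try {
                if (this == INT) {
                    Integer.parseInt(jsonNumber);
                } else if (this == LONG) {
                    Long.parseLong(jsonNumber);
                } else {
                    Double.parseDouble(jsonNumber);
                }
                return true;
            } catch (NumberFormatException e) {
                return false;
            }
        }
    }

    // Returns the first union branch that accepts the value, in declared order.
    static Optional<NumType> resolve(List<NumType> union, String jsonNumber) {
        for (NumType t : union) {
            if (t.accepts(jsonNumber)) {
                return Optional.of(t);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<NumType> union = Arrays.asList(NumType.LONG, NumType.INT, NumType.DOUBLE);
        // 42 fits all three branches; declaration order decides, so LONG wins.
        System.out.println(resolve(union, "42").get()); // LONG
    }
}
```

Reordering the union to [int, long, double] makes the same input resolve to int, which shows that the result depends on declaration order rather than on the best fit.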

IMHO, the best way to solve this dilemma is by disallowing any union that can 
cause such ambiguities in expectations. The simplest option is to disallow any 
union other than a union with null (i.e., a union to make a field optional). A 
more general approach is to disallow any union with multiple number types, with 
both string and bytes, or with multiple record and/or map types.
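
Both options could be checked mechanically. The sketch below is one possible reading of the two rules in plain Java; the class `UnionAmbiguity`, the enum `Kind`, and the method names are hypothetical, and the union is modelled as a list of type kinds rather than real Avro `Schema` objects.

```java
import java.util.Arrays;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

// Hypothetical validator for illustration only; a union is modelled as a
// list of type kinds rather than Avro Schema objects.
final class UnionAmbiguity {
    enum Kind { NULL, BOOLEAN, INT, LONG, FLOAT, DOUBLE, BYTES, STRING,
                RECORD, ENUM, ARRAY, MAP, FIXED }

    private static final Set<Kind> NUMBERS =
            EnumSet.of(Kind.INT, Kind.LONG, Kind.FLOAT, Kind.DOUBLE);
    private static final Set<Kind> OBJECTS = EnumSet.of(Kind.RECORD, Kind.MAP);

    private static long count(List<Kind> union, Set<Kind> group) {
        return union.stream().filter(group::contains).count();
    }

    // General rule: multiple number types, both string and bytes,
    // or multiple record and/or map types.
    static boolean isAmbiguous(List<Kind> union) {
        return count(union, NUMBERS) > 1
                || (union.contains(Kind.STRING) && union.contains(Kind.BYTES))
                || count(union, OBJECTS) > 1;
    }

    // Simplest rule: only a two-branch union of null and one other type.
    static boolean isNullableOnly(List<Kind> union) {
        return union.size() == 2
                && union.contains(Kind.NULL)
                && union.get(0) != union.get(1);
    }

    public static void main(String[] args) {
        System.out.println(isAmbiguous(Arrays.asList(Kind.LONG, Kind.INT, Kind.DOUBLE))); // true
        System.out.println(isAmbiguous(Arrays.asList(Kind.NULL, Kind.STRING)));           // false
        System.out.println(isNullableOnly(Arrays.asList(Kind.NULL, Kind.LONG)));          // true
    }
}
```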


Kind regards,
Oscar


On Mon, 26 May 2025 at 02:25, z...@godaddy.com.INVALID 
<z...@godaddy.com.invalid> wrote:
Avro’s JSON encoding retains enough information about the writer's schema that 
it is possible to decode & resolve without the writer's schema; only the 
reader's schema is needed.

See https://github.com/zyu-godaddy/avro-json

I know that we do not want to encourage such a practice. Nevertheless, it is an 
interesting observation that this is possible. I would appreciate it if people 
could double-check the logic.

Zhong Yu
z...@godaddy.com


--

✉️ Oscar Westra van Holthe - Kind <opw...@apache.org>

🌐 https://github.com/opwvhk/
