Thank you, Oscar, for the review.

[1] byte <-> string promotion

The spec is unclear about the charset encoding, but it is probably assumed to be UTF-8, which is what `ResolvingDecoder` does. If the writer’s schema is `bytes`, the value is encoded as a JSON string using ISO_8859_1. In my case, I do not know the writer’s schema; all I have is a JSON string. If the reader’s schema is `string`, that JSON string is returned as is. That is, the writer’s bytes are converted to the reader’s string using ISO_8859_1. If the reader’s schema is `bytes`, I do not know whether the value was `bytes` or `string` on the writer’s end, so I have no way to switch the charset encoding based on the writer’s schema. ISO_8859_1 is chosen to accommodate the more likely case of `bytes->bytes`.

[2] unions

Indeed, unions are a lot of trouble. On one hand, the ordering in a union may indicate the user’s preference of types. On the other hand, many users may intuitively think of a union as unordered, in which case it would make more sense to resolve to the most specific type. In practice, most people probably only use unions for nullable types, i.e. [null, T]. If they have a union of types that aren’t mutually exclusive, they are just asking for confusion.

Cheers,
Zhong

From: Oscar Westra van Holthe - Kind <opw...@gmail.com>
Date: Thursday, May 29, 2025 at 8:57 AM
To: dev@avro.apache.org <dev@avro.apache.org>, Zhong Yu <z...@godaddy.com>
Subject: Re: Json decoding & resolving without writer's schema
Hi,

The approach used in https://github.com/zyu-godaddy/avro-json looks ok to me, but incomplete. Interpreting strings as ISO_8859_1 is odd (it's been more than 2 decades since UTF-8), but not a problem if it becomes configurable.

The incompleteness comes from union resolution that may not match expectations:

12. (union, j) // for any j find the first T in union that rule (T,j) succeeds.

This is actually a tricky case, for several reasons:

* Which number is intended in a [long, int, double] union when parsing the number 42? The first fit is long, but int is arguably the better fit.
* What if we parse a [record1, record2] union, and record1 fits by applying default values, but record2 actually contains the properties in the JSON object?

In both cases, the "find the first [...] that succeeds" rule perfectly resolves any conflict. However, it does not always match expectations, especially in combination with record types with default field values.

IMHO, the best way to solve this dilemma is by disallowing any union that can cause such ambiguities in expectations. The simplest option is to disallow any union other than a union with null (i.e., a union to make a field optional). A more general approach is to disallow any union with multiple number types, with both string and bytes, or with multiple record and/or map types.

Kind regards,
Oscar

On Mon, 26 May 2025 at 02:25, z...@godaddy.com.INVALID <z...@godaddy.com.invalid> wrote:

Avro’s Json encoding retains enough information of writer's schema such that it is possible to decode & resolve without writer's schema; only reader's schema is needed.
See https://github.com/zyu-godaddy/avro-json

I know that we do not want to encourage such a practice. Nevertheless, it is an interesting observation that this is possible. I would appreciate it if people could double-check the logic.

Zhong Yu
z...@godaddy.com

--
✉️ Oscar Westra van Holthe - Kind <opw...@apache.org>
🌐 https://github.com/opwvhk/
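
As a footnote to point [1] at the top of this thread, here is a small Python sketch of the charset behaviour being discussed. It is only an illustration under stated assumptions: the function names are made up for this example and are not part of Avro or avro-json, and Python's `iso-8859-1` and `utf-8` codecs stand in for the Java charsets named in the thread.

```python
# Illustration only. All function names below are hypothetical and do not
# come from Avro or from the avro-json repository.


def encode_bytes_as_json_string(raw: bytes) -> str:
    """Writer side, schema `bytes`: each byte becomes the code point with
    the same value (U+0000..U+00FF), which is an ISO-8859-1 decode."""
    return raw.decode("iso-8859-1")


def read_as_string(json_string: str) -> str:
    """Reader schema `string`: the JSON string is returned as is."""
    return json_string


def read_as_bytes(json_string: str) -> bytes:
    """Reader schema `bytes`: assume the writer also wrote `bytes` and
    invert the ISO-8859-1 mapping."""
    return json_string.encode("iso-8859-1")


def promote_bytes_utf8(raw: bytes) -> str:
    """For comparison: a bytes -> string promotion that assumes UTF-8."""
    return raw.decode("utf-8")


# Writer holds the UTF-8 encoding of U+00E9 (two bytes: 0xC3 0xA9).
writer_bytes = "\u00e9".encode("utf-8")
json_string = encode_bytes_as_json_string(writer_bytes)

# Reader schema `string`: two Latin-1 characters, not the original one.
print(ascii(read_as_string(json_string)))          # '\xc3\xa9'

# Reader schema `bytes`: the original bytes are recovered exactly.
print(read_as_bytes(json_string) == writer_bytes)  # True

# A UTF-8 promotion of the same bytes yields the single character U+00E9.
print(ascii(promote_bytes_utf8(writer_bytes)))     # '\xe9'
```

With the reader's schema `bytes`, the ISO_8859_1 mapping round-trips every byte value, which is the `bytes->bytes` case the choice is meant to accommodate. With the reader's schema `string`, the same writer bytes come out as two Latin-1 characters rather than the single character a UTF-8 promotion would produce.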