Re: Ambiguity in IS JSON description and logic

2025-11-18 Thread r314tive
Hi,

Based on this discussion, I’ve prepared a small documentation patch that
tries to clarify the behavior of the IS JSON ... UNIQUE KEYS clause.

The patch explains that:
- the WITH/WITHOUT UNIQUE KEYS clause controls an additional test on
  duplicate object keys,
- WITH UNIQUE KEYS requires that no object contained in the expression
  (recursively) have duplicate keys, and
- WITHOUT UNIQUE KEYS, which is also the default, just disables this
  additional test and does not require duplicates to be present.

Patch is attached.

Regards,
Ilmar Yunusov

вт, 18 нояб. 2025 г. в 16:50, David G. Johnston :

> On Thu, Mar 20, 2025 at 7:22 AM Kirk Parker  wrote:
>
>> On Thu, Mar 20, 2025 at 7:08 AM Kirk Parker  wrote:
>>
>>>
>>> On Thu, Mar 20, 2025 at 2:46 AM PG Doc comments form <
>>> [email protected]> wrote:
>>>
 The following documentation comment has been logged on the website:

 Page: https://www.postgresql.org/docs/17/functions-json.html
 Description:

 On the manual page
 https://www.postgresql.org/docs/current/functions-json.html, in the
 Table
 9.48. "SQL/JSON Testing Functions" there is a description of IS JSON. It
 includes the next sentence: "If WITH UNIQUE KEYS is specified, then any
 object in the expression is also tested to see if it has duplicate
 keys."
 And such text is ambiguous, because the term "object" has certain
 meaning
 regarding json format. In reality the option WITH UNIQUE KEYS allows to
 check for duplicated keys any array element not object. For objects,
 both
 WITH UNIQUE KEYS and WITHOUT UNIQUE KEYS return false, and both IS JSON
 ARRAY WITH UNIQUE KEY and IS JSON ARRAY WITHOUT UNIQUE KEY return true
 (it
 is at the same time with and without unique values, how it is
 possible?),
 i.e. it works the same as just IS JSON ARRAY. The example code that
 confirms
 my reasoning:
 SELECT
 js.vl AS  "tested str",


 js.vl IS JSON OBJECT WITH UNIQUE KEYS AS  ".. object w. UQ
 keys",

 js.vl IS JSON OBJECT WITHOUT UNIQUE KEYS AS   ".. object w/o UQ
 keys",
 js.vl IS JSON ARRAY WITH UNIQUE KEYS AS   ".. array w. UQ
 keys",

 js.vl IS JSON ARRAY WITHOUT UNIQUE KEYS AS".. array w/o UQ
 keys",
 js.vl IS JSON ARRAY ".. array"
 FROM (VALUES ('{{"a": "a1"}, {"a": "a2"}}'), ('[{"a": "a1"}, {"a":
 "a2"}]'),
 ('["a", "a"]')) AS js(vl);

 I'm not sure what should be the right logic for this option, for me it
 looks
 now the same as simple IS JSON ARRAY without any UNIQUE KEY option, but
 if
 we use an option it should be either true for WITH UNIQUE KEYS or
 WITHOUT
 UNIQUE KEYS but not for both at the same time. But anyway the sentence I
 showed above should contain "array" instead of "object" because for
 objects
 it returns false independently of applied option.  I tested it on
 "PostgreSQL 17.0 on x86_64-windows, compiled by msvc-19.41.34120,
 64-bit".

>>>
>>> First, WITHOUT UNIQUE KEYS does not mean "confirm that there are
>>> duplicate keys", it's just a way of stating the default explicitly. In
>>> other words it means "w/o testing for duplicate keys". Thus IS JSON OBJECT
>>> and IS JSON OBJECT WITHOUT UNIQUE KEYS will both always return identical
>>> results on the same JSON expression.
>>>
>>> Secondly, the UNIQUE test is recursive; for objects maybe the meaning is
>>> intuitive, but for JSON arrays -- which don't have any concept of keys;
>>> JSON arrays are just ordered lists -- it means "does this array contain any
>>> embedded objects with duplicate keys".
>>>
>>> See:
>>>
>>>  SELECT js,
>>>   js IS JSON "json?",
>>>   js IS JSON OBJECT "object?",
>>>   js IS JSON OBJECT WITH UNIQUE KEYS "object w. UK?",
>>>   js IS JSON OBJECT WITHOUT UNIQUE KEYS "object w/o UK?",
>>>   js IS JSON ARRAY "array?",
>>>   js IS JSON ARRAY WITH UNIQUE KEYS "array w. UK?",
>>>   js IS JSON ARRAY WITHOUT UNIQUE KEYS "array w/o UK?"
>>> FROM (VALUES
>>>   ('[{"a":1},{"b":2,"b":3}]'),   -- expect t for array, array w/o UK
>>>   ('[{"a":1},{"b":2,"c":3}]'),   -- expect t for ALL array tests
>>>   ('{"b":2,"b":3}'), -- expect t for object, object w/o
>>> UK
>>>   ('{"c":2,"d":3}'), -- expect t for ALL object tests
>>>   ('{"c":2,"d":{ "e": 0, "e": 1}}'), -- WITH UNIQUE is recursive for
>>> nested objects
>>>   ('{"c":2,"d":{ "e": 0, "f": {"g":1,"g":2}}}'), -- no matter how deep
>>>   ('[{"a":1},{"b":2,"c":{"d":1, "d":2}}]') -- and also tests arrays
>>> recursively for embedded objecs
>>> ) foo(js);
>>>
>>>
>>> A couple of side notes:
>>>
>>> 1. Your first data example is not JSON at all.  It's helpful for this
>>> kind of test to include a plain IS JSON column, since any of the IS JSON X
>>> tests can fail for two reasons: (a) it's not JSON, or (b) it is JSON but
>>> it's n

Re: Ambiguity in IS JSON description and logic

2025-11-18 Thread Laurenz Albe
On Tue, 2025-11-18 at 16:55 +0500, r314tive wrote:
> Based on this discussion, I’ve prepared a small documentation patch that
> tries to clarify the behavior of the IS JSON ... UNIQUE KEYS clause.
> 
> The patch explains that:
> - the WITH/WITHOUT UNIQUE KEYS clause controls an additional test on
>   duplicate object keys,
> - WITH UNIQUE KEYS requires that no object contained in the expression
>   (recursively) have duplicate keys, and
> - WITHOUT UNIQUE KEYS, which is also the default, just disables this
>   additional test and does not require duplicates to be present.
> 
> Patch is attached.

I don't think that the patch is really an improvement.  To me, the original
wording contains all the important information.  The patch it somewhat
repetetive and contains unnecessary detail.  For example, I can see from
the syntax diagram that I can use WITHOUT UNIQUE KEYS, and I have no
trouble guessing what that means.

Perhaps it would be enough to add something like "*recursively* checks for
duplicate keys" or "checks for duplicate keys *on any nesting level*".

Yours,
Laurenz Albe