Re: Towards a JSON API for the JDK

Lars Bruun-Hansen Fri, 16 May 2025 07:29:25 -0700

Ahh, as expected some great thoughts have gone into this. :-)

Allowing comments to be ignored when parsing is not "inventing a new
format", IMO. More so because the RFC allows it (section 9) and because
it would be strictly opt-in. So the parser can still be said to be
strictly RFC-8259 while optionally supporting such a feature. I'm
speculating the authors of the RFC thought of it as a form of
pre-processing when they used the word "extension", and that is indeed a
good way to think about it.


In any case my goal was purely comments in the JSON, not allowing
other forms of leniency, like trailing commas, flexible quotes, etc. I
feel those are in another category and not something that should (ever)
be supported. I can see why someone might ask for them if comments were
allowed to be ignored by the lib. The slippery-slope argument can
be made. I just fear that newbies will try to build their own
pre-processor, using regexp or whatever, and get it wrong.
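
To make the "get it wrong" concern concrete: a pattern such as `//.*`
also matches the `//` inside a string value like "http://x". Below is a
minimal sketch of a pre-processor that tracks whether it is inside a
string and blanks out comments with spaces, so offsets in later error
messages stay valid. The class and method names are hypothetical and not
part of any proposed API.

```java
/** Hypothetical pre-processor: blanks out line and block comments outside JSON strings. */
final class JsonCommentStripper {

    static String strip(String text) {
        StringBuilder out = new StringBuilder(text.length());
        boolean inString = false;
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            if (inString) {
                out.append(c);
                if (c == '\\' && i + 1 < text.length()) {
                    // Copy the escaped character too, so that \" does not end the string.
                    out.append(text.charAt(i + 1));
                    i += 2;
                    continue;
                }
                if (c == '"') {
                    inString = false;
                }
                i++;
            } else if (c == '"') {
                inString = true;
                out.append(c);
                i++;
            } else if (c == '/' && i + 1 < text.length() && text.charAt(i + 1) == '/') {
                // Line comment: replace everything up to the line terminator with spaces.
                while (i < text.length() && text.charAt(i) != '\n' && text.charAt(i) != '\r') {
                    out.append(' ');
                    i++;
                }
            } else if (c == '/' && i + 1 < text.length() && text.charAt(i + 1) == '*') {
                // Block comment: replace with spaces but keep line terminators.
                int end = text.indexOf("*/", i + 2);
                if (end < 0) {
                    throw new IllegalArgumentException("Unterminated comment at offset " + i);
                }
                for (int j = i; j < end + 2; j++) {
                    char k = text.charAt(j);
                    out.append(k == '\n' || k == '\r' ? k : ' ');
                }
                i = end + 2;
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }
}
```

The result has the same length as the input and can be handed to a
strict RFC 8259 parser, which is what "treating comments as whitespace"
amounts to.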

I agree with you that JSON5 sneaks in too many odd things.

Keep up the good work.

/Lars


On 16/05/2025 15.17, Brian Goetz wrote:
> At first, we were hopeful that we could jump right to JSON5, which
> appears at first glance to be a strictly lexical, more permissive
> grammar for JSON (supporting comments, trailing commas, more flexible
> quoting, white space, etc.)  If that were actually true, this would
> have been a slam dunk, since all these lexical niceties don't have an
> impact on the parsed results.  And JSON5 has been gaining some
> traction, so it probably could have been a justifiable move to jump
> right to that. 
>
> But then we discovered that JSON5 also sneaks in some semantics, by
> also supporting the exotic numeric values (NaN, infinities, signed
> zero), which now has consequences for "what is a number", the numeric
> representation, the API for unpacking numeric values, etc.  (Having
> multiple parsers is one thing; having multiple parsers that produce
> different semantics is entirely another.)  And inventing a new "JSON5
> but not quite" subset would be doing no one any favors. 
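
The semantic point can be seen with plain JDK numeric types, without any
JSON library: the exotic values exist only in the IEEE 754 types, so an
API that hands out BigDecimal cannot represent them. A small sketch (the
class name is made up for illustration):

```java
import java.math.BigDecimal;

final class ExoticNumbers {

    /** Returns true if the double can be converted to a BigDecimal. */
    static boolean fitsBigDecimal(double d) {
        try {
            new BigDecimal(d);
            return true;
        } catch (NumberFormatException e) {
            // BigDecimal(double) rejects NaN and the infinities.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(fitsBigDecimal(1.5));                      // true
        System.out.println(fitsBigDecimal(Double.NaN));               // false
        System.out.println(fitsBigDecimal(Double.POSITIVE_INFINITY)); // false
        // Signed zero is distinguishable as a Double but not as a BigDecimal.
        System.out.println(Double.valueOf(-0.0).equals(0.0));                 // false
        System.out.println(new BigDecimal("-0").equals(new BigDecimal("0"))); // true
    }
}
```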
>
> Jsonc seems to be entirely a MS-ecosystem thing; it does not have
> broad enough traction to be the "one grammar" we accept.  So pure
> JSON, as specification-challenged as it is, is the logical, though
> sad, conclusion (for now.)
>
>
>
> On 5/16/2025 9:02 AM, Lars Bruun-Hansen wrote:
>>
>>
>> Great work.
>>
>>
>> I feel the elephant in the room needs to be addressed: JSON comments. I
>> haven't tested the proposed lib but I cannot see it mentioned so I'm
>> assuming that comments are not supported.
>>
>> For better or worse, the use of jsonc (JSON with comments) is
>> everywhere in some ecosystems. Unsurprisingly this happens often when
>> JSON is used as a config file format. Looking at you, Microsoft.
>>
>> It would be nice if the JDK's built-in JSON parser at least
>> recognized this.
>>
>> I'm well aware that comments are frowned upon in JSON and part of
>> neither the spec at www.json.org nor RFC 8259.
>>
>> Yet, I advocate the JDK JSON library should optionally allow comments
>> to be ignored when PARSING. This should be an opt-in feature that
>> would technically treat comments as whitespace during the parsing
>> process.
>>
>> This would also be in line with what many other parsers do. For
>> example, Jackson has the "ALLOW_COMMENTS" feature [1]. Also, by
>> comparison, the built-in parser in the .NET world, known as
>> System.Text.Json, also supports this [2].
>>
>>
>>
>> The "discoverer" of JSON, Douglas Crockford, had this to say [3] on
>> the topic:
>>
>>
>> [QUOTE]
>>
>> I removed comments from JSON because I saw people were using them to
>> hold parsing directives, a practice which would have destroyed
>> interoperability. I know that the lack of comments makes some people
>> sad, but it shouldn't.
>>
>> Suppose you are using JSON to keep configuration files, which you
>> would like to annotate. Go ahead and insert all the comments you
>> like. Then pipe it through JSMin before handing it to your JSON parser.
>>
>>  [/QUOTE]
>>
>>
>> By not having the ability to ignore comments when parsing we would
>> effectively force users to use another parser first or a minifier. I
>> doubt beginners would appreciate that.
>>
>>
>> BTW: The test suite already has tests for comments.
>>
>>
>> /Lars
>>
>>
>> [1]:
>> https://www.javadoc.io/static/com.fasterxml.jackson.core/jackson-core/2.19.0/com/fasterxml/jackson/core/JsonParser.Feature.html#ALLOW_COMMENTS
>>
>> [2]:
>> https://learn.microsoft.com/en-us/dotnet/api/system.text.json.jsonreaderoptions?view=net-9.0#properties
>>
>> [3]: https://plus.google.com/118095276221607585885/posts/RK8qyGVaGSr
>>
>>
>>
>>
>> On 16/05/2025 01.44, Ethan McCue wrote:
>>> I present for your consideration the library I made when spiraling
>>> about this problem space a few years ago
>>>
>>> https://github.com/bowbahdoe/json
>>>
>>> https://javadoc.io/doc/dev.mccue/json/latest/dev.mccue.json/dev/mccue/json/package-summary.html
>>>
>>> Notably missing during the design process here were patterns, hence
>>> the JsonDecoder design. I haven't been able to evaluate how patterns
>>> affect that on account of them not being out.
>>>
>>> I will more thoroughly peruse the draft of java.util.json at a later
>>> date, but my initial observations/comments:
>>>
>>> * I am not sure having JsonValue be distinct from Json has value.
>>> * toUntyped feels a little strange to me - the only type information
>>> presumably lost is the sealed-ness of the hierarchy. The interplay
>>> between that and toNumber is also a little unnerving.
>>> * One notion that I found helpful was that a class could be "json
>>> encodable," meaning there is a method to call to obtain a canonical
>>> json representation.
>>>
>>> record Person(String name) implements JsonEncodable {
>>>     @Override
>>>     public Json toJson() {
>>>         return Json.objectBuilder()
>>>             .put("name", name)
>>>             .build();
>>>     }
>>> }
>>>
>>> Which helper methods like Json#of(List<? extends
>>> JsonEncodable>) could make use of. Json itself (JsonValue in your
>>> prototype) could then have a vacuous implementation.
>>>
>>> * Terminology wise - I went with reading/writing for the actual
>>> parsing/generation of json and encoding/decoding for the mapping of
>>> those representations to/from specific classes. The merits are not
>>> top of mind, just noting the difference. read/write vs
>>> parse/toString+toDisplayString
>>> * One thing I did was make the helper methods in Json null tolerant
>>> and the ones in the specific subtypes like JsonString not. This was
>>> because from what I saw of usages of javax.json/jakarta.json that
>>> nullability was a footgun and correcting for it required changes to
>>> code structure (breaking up builder chains with if (x != null) checks)
>>> * The functionality you want from JsonNumber could be achieved by
>>> making it just extend Number
>>> (https://github.com/bowbahdoe/json/blob/main/src/main/java/dev/mccue/json/JsonNumber.java)
>>> instead of a bespoke toNumber. You need the extra methods to go to
>>> big decimal and co, but it's just an extension to the behavior of
>>> Number at that point.
>>> * JsonObject and JsonArray could implement Map<String, Json> and
>>> List<Json> respectively. This lowers the need for toUntyped() -
>>> since presumably one of the use cases for that is turning the json
>>> tree into something that more generic map/list traversal code can
>>> handle. It also complicates any lazy loading somewhat.
>>> * Assuming patterns can be placed on interfaces, you might want to
>>> consider something similar to JsonDecoder, but with a pattern
>>> instead of a method that throws an exception.
>>>
>>> // Here, fromJson would box up the logic for testing and
>>> // extracting from each element in the array.
>>> List<Person> people = array(json, Person::fromJson);
>>>
>>> * I don't think there is sufficient cause for anything to be
>>> non-sealed at this point.
>>> * JsonBoolean and JsonNull do not have reasonable alternative
>>> implementations - as far as I can imagine, maybe I'm wrong - so
>>> maybe those can just be final classes?
>>> * If you seal up the whole hierarchy then it's pretty trivial to make
>>> it serializable
>>> (https://github.com/bowbahdoe/json/blob/main/src/main/java/dev/mccue/json/serialization/JsonSerializationProxy.java)
>>>
>>>
>>>
>>>
>>> On Thu, May 15, 2025 at 11:29 PM Remi Forax <fo...@univ-mlv.fr> wrote:
>>>
>>>     Hi Paul,
>>>     yes, not having a simple JSON API in Java is an issue for beginners.
>>>
>>>     It's not clear to me why JsonArray (for example) has to be an
>>>     interface instead of a record ?
>>>
>>>     I understand why Json.parse() only works on String and char[]
>>>     but the API makes it too easy to run into performance issues.
>>>     I think you need versions using a Reader and a Path.
>>>     Bonus point, if there is a method walk() that also returns a
>>>     JsonValue but the List/Map inside JsonArray/JsonObject are
>>>     populated lazily.
>>>
>>>     Minor point: Json.toDisplayString() should take a second
>>>     parameter indicating the number of spaces used for the
>>>     indentation (like JSON.stringify in JS).
>>>
>>>     regards,
>>>     Rémi
>>>
>>>     ----- Original Message -----
>>>     > From: "Paul Sandoz" <paul.san...@oracle.com>
>>>     > To: "core-libs-dev" <core-libs-dev@openjdk.org>
>>>     > Sent: Thursday, May 15, 2025 10:30:42 PM
>>>     > Subject: Towards a JSON API for the JDK
>>>
>>>     > Hi,
>>>     >
>>>     > We would like to share with you our thoughts and plans towards
>>>     a JSON API for
>>>     > the JDK.
>>>     > Please see the document below.
>>>     >
>>>     > -
>>>     >
>>>     > We have had the pleasure of using a clone of this API in some
>>>     experiments we are
>>>     > conducting with
>>>     > ONNX and code reflection [1]. Using the API we were able to
>>>     quickly write code
>>>     > to ingest and convert
>>>     > a JSON document representing ONNX operation schema into
>>>     instances of records
>>>     > modeling the schema
>>>     > (see here [2]).
>>>     >
>>>     > The overall out-of-box experience with such a minimal "batteries
>>>     > included" API has so far been positive.
>>>     >
>>>     > Thanks,
>>>     > Paul.
>>>     >
>>>     > [1] https://openjdk.org/projects/babylon/
>>>     > [2]
>>>     >
>>>     
>>> https://github.com/openjdk/babylon/blob/code-reflection/cr-examples/onnx/opgen/src/main/java/oracle/code/onnx/opgen/OpSchemaParser.java#L87
>>>     >
>>>     > # Towards a JSON API for the JDK
>>>     >
>>>     > One of the most common requests for the JDK is an API for
>>>     parsing and generating
>>>     > JSON. While JSON originated as a text-based serialization format for
>>>     > JavaScript objects ("JSON" stands for "JavaScript Object Notation"),
>>>     because of its simple
>>>     > and flexible syntax, it eventually found use outside the
>>>     JavaScript ecosystem as
>>>     > a general data interchange format, such as framework
>>>     configuration files and web
>>>     > service requests/response formats.
>>>     >
>>>     > While the JDK cannot, and should not, provide libraries for
>>>     every conceivable
>>>     > file format or protocol, the JDK philosophy is one of
>>>     "batteries included",
>>>     > which is to say we should be able to write basic programs that
>>>     use common
>>>     > protocols such as HTTP, without having to appeal to third
>>>     party libraries.
>>>     > The Java ecosystem already has plenty of JSON libraries, so
>>>     inclusion in
>>>     > the JDK is largely meant to be a convenience, rather than
>>>     needing to be the "one
>>>     > true" JSON library to meet the needs of all users. Users with
>>>     specific needs
>>>     > are always free to select one of the existing third-party
>>>     libraries.
>>>     >
>>>     > ## Goals and requirements
>>>     >
>>>     > Our primary goal is that the library be simple to use for
>>>     parsing, traversing,
>>>     > and generating conformant JSON documents. Advanced features,
>>>     such as data
>>>     > binding or path-based traversal should be possible to
>>>     implement as layered
>>>     > features, but for simplicity are not included in the core API.
>>>     We adopt a goal
>>>     > that the performance should be "good enough", but where
>>>     performance
>>>     > considerations conflict with simplicity and usability, we will
>>>     choose in favor
>>>     > of the latter.
>>>     >
>>>     > ## API design approach
>>>     >
>>>     > The description of JSON at `https://json.org` describes a JSON
>>>     > document using the familiar "railroad diagram":
>>>     > ![image](https://www.json.org/img/value.png)
>>>     >
>>>     > This diagram describes an algebraic data type (a sum of
>>>     products), which we
>>>     > model directly with a set of Java interfaces:
>>>     >
>>>     > ```
>>>     > interface JsonValue { }
>>>     > interface JsonArray extends JsonValue { List<JsonValue>
>>>     values(); }
>>>     > interface JsonObject extends JsonValue { Map<String,
>>>     JsonValue> members(); }
>>>     > interface JsonNumber extends JsonValue { Number toNumber(); }
>>>     > interface JsonString extends JsonValue { String value(); }
>>>     > interface JsonBoolean extends JsonValue  { boolean value(); }
>>>     > interface JsonNull extends JsonValue { }
>>>     > ```
>>>     >
>>>     > These interfaces have (hidden) companion implementation
>>>     classes that admit
>>>     > greater flexibility of implementation than modeling them
>>>     directly with records
>>>     > would permit.
>>>     > Further, these interfaces are unsealed. We compromise on the
>>>     sealed sum of
>>>     > products to enable
>>>     > alternative implementations, for example to support
>>>     alternative formats that
>>>     > encode the same information in a JSON document but in a more
>>>     efficient form than
>>>     > text.
>>>     >
>>>     > The API has static methods for parsing strings into a
>>>     `JsonValue`, conversion to
>>>     > and from purely untyped representations (lists and maps), and
>>>     factory methods
>>>     > for building JSON documents. We apply composition
>>>     consistently, e.g, a
>>>     > JsonString has a string, a JsonObject has a map of string to
>>>     JsonValue, as
>>>     > opposed to extension for structural JSON values.
>>>     >
>>>     > It turns out that this simple API is almost all we need for
>>>     traversal. It gives
>>>     > us an immutable representation of a document, and we can use
>>>     pattern matching to
>>>     > answer the myriad questions that will come up (Does this
>>>     object have key X? Does
>>>     > it map to a number? Is that number representable as an
>>>     integer?) when going
>>>     > from an untyped format like JSON to a more strongly typed
>>>     domain model.
>>>     > Given a simple document like:
>>>     >
>>>     > ```
>>>     >    {
>>>     >        "name": "John",
>>>     >        "age": 30
>>>     >    }
>>>     > ```
>>>     >
>>>     > we can parse and traverse the document as follows:
>>>     >
>>>     > ```
>>>     > JsonValue doc = Json.parse(inputString);
>>>     > if (doc instanceof JsonObject o
>>>     >    && o.members().get("name") instanceof JsonString s
>>>     >    && s.value() instanceof String name
>>>     >    && o.members().get("age") instanceof JsonNumber n
>>>     >    && n.toNumber() instanceof Long l && l instanceof int age) {
>>>     >            // use "name" and "age"
>>>     >        }
>>>     > ```
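
The traversal above can be tried without the prototype by using stand-in
types. The sketch below assumes hypothetical records that mirror three of
the interfaces (they are not the prototype's classes), builds the
document by hand instead of parsing it, and swaps the primitive pattern
`l instanceof int age` for a range check, since primitive patterns are
still a preview language feature.

```java
import java.util.Map;

final class TraversalSketch {

    // Hypothetical stand-ins for the proposed interfaces, for illustration only.
    sealed interface JsonValue permits JsonObject, JsonNumber, JsonString {}
    record JsonObject(Map<String, JsonValue> members) implements JsonValue {}
    record JsonNumber(Number toNumber) implements JsonValue {}
    record JsonString(String value) implements JsonValue {}

    record Person(String name, int age) {}

    /** Returns a Person if the document has the expected shape, otherwise null. */
    static Person toPerson(JsonValue doc) {
        if (doc instanceof JsonObject o
                && o.members().get("name") instanceof JsonString s
                && o.members().get("age") instanceof JsonNumber n
                && n.toNumber() instanceof Long l
                && l >= Integer.MIN_VALUE && l <= Integer.MAX_VALUE) {
            return new Person(s.value(), l.intValue());
        }
        return null;
    }

    public static void main(String[] args) {
        JsonValue doc = new JsonObject(Map.of(
                "name", new JsonString("John"),
                "age", new JsonNumber(30L)));
        System.out.println(toPerson(doc));
    }
}
```

Each `instanceof` in the chain both tests and binds, so a missing key, a
wrong value type, or an out-of-range number all fall through to the same
"no match" path.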
>>>     >
>>>     > Later, when the language acquires the ability to expose
>>>     deconstruction patterns
>>>     > for arbitrary interfaces (similar to today's record patterns, see
>>>     >
>>>     
>>> https://openjdk.org/projects/amber/design-notes/patterns/towards-member-patterns),
>>>     > this will be simplifiable to:
>>>     >
>>>     > ```
>>>     > JsonValue doc = Json.parse(inputString);
>>>     > if (doc instanceof JsonObject(var members)
>>>     >    && members.get("name") instanceof JsonString(String name)
>>>     >    && members.get("age") instanceof JsonNumber(int age)) {
>>>     >            // use "name" and "age"
>>>     >        }
>>>     > ```
>>>     >
>>>     > So, over time, as more pattern matching features are introduced, we
>>>     > anticipate improved use of the API. This is a primary reason why the
>>>     > API is so minimal.
>>>     > Convenience methods we add today, such as a method that
>>>     accesses a JSON
>>>     > object component as say a JSON string or throws an exception,
>>>     will become
>>>     > redundant in the future.
>>>     >
>>>     > ## JSON numbers
>>>     >
>>>     > The specification of JSON number makes no explicit distinction
>>>     between integral
>>>     > and decimal numbers, nor specifies limits on the size of those
>>>     numbers.
>>>     > This is a common source of interoperability issues when consuming JSON
>>>     > documents. Users often assume, though they cannot always rely on it,
>>>     > that JSON numbers are parsable, without loss of precision, to IEEE
>>>     > double-precision floating point numbers or 32-bit signed integers.
>>>     >
>>>     > In this respect the API provides three means to operate on the
>>>     JSON number,
>>>     > giving the user full control:
>>>     >
>>>     > 1. Underlying string representation can be obtained, if
>>>     preserving syntactic
>>>     >   details such as leading or trailing zeros is important.
>>>     > 2. The string representation can be parsed to an instance of
>>>     `BigDecimal`, using
>>>     >   `toBigDecimal` if preserving decimal numbers is important.
>>>     > 3. The string representation can be parsed into an instance of
>>>     `Long`, `Double`,
>>>     >   `BigInteger`, or `BigDecimal`, using `toNumber`. The result
>>>     of this method
>>>     >   depends on how the representation can be parsed, possibly
>>>     losing precision,
>>>     >   choosing a suitably convenient numeric type that can then be
>>>     pattern
>>>     >   matched on.
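
One way to picture the third option is a fallback chain over the number's
text. The sketch below is only an assumption about how such a choice
could be made; it is not the prototype's actual rule set, and
`NumberChoice.toNumber` is a hypothetical name. It expects text that has
already been validated as a JSON number.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

final class NumberChoice {

    /** Picks Long, BigInteger, Double or BigDecimal for an already validated JSON number. */
    static Number toNumber(String text) {
        boolean integral = text.indexOf('.') < 0
                && text.indexOf('e') < 0
                && text.indexOf('E') < 0;
        if (integral) {
            try {
                return Long.valueOf(text);
            } catch (NumberFormatException e) {
                // Too large for 64 bits: fall back to arbitrary precision.
                return new BigInteger(text);
            }
        }
        BigDecimal exact = new BigDecimal(text);
        double d = Double.parseDouble(text);
        // Assumed rule: keep the double only when it is finite and its shortest
        // decimal form (as produced by Double.toString) equals the written value.
        if (Double.isFinite(d) && BigDecimal.valueOf(d).compareTo(exact) == 0) {
            return Double.valueOf(d);
        }
        return exact;
    }
}
```

Under this assumed rule "0.1" comes back as a Double, while
"0.1000000000000000055" and "1e400" come back as BigDecimal, and the
caller can pattern match on the resulting type.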
>>>     >
>>>     > Primitive pattern matching will help as will further pattern
>>>     matching features
>>>     > enabling the user to partially match.
>>>     >
>>>     > ## Prototype implementation
>>>     >
>>>     > The prototype implementation is currently located in the JDK sandbox
>>>     > repository under the `json` branch, see here
>>>     >
>>>     
>>> https://github.com/openjdk/jdk-sandbox/tree/json/src/java.base/share/classes/java/util/json
>>>     > The prototype API javadoc generated from the repository is
>>>     also available at
>>>     >
>>>     
>>> https://cr.openjdk.org/~naoto/json/javadoc/api/java.base/java/util/json/package-summary.html
>>>     >
>>>     > ### Testing and conformance
>>>     >
>>>     > The prototype implementation passes all conformance test cases
>>>     but two,
>>>     > available
>>>     > on https://github.com/nst/JSONTestSuite. The two exceptions
>>>     are the ones which
>>>     > the
>>>     > prototype specifically prohibits, i.e, duplicated names in
>>>     JSON objects
>>>     >
>>>     
>>> (https://cr.openjdk.org/~naoto/json/conformance/results/parsing.html#35).
>>>     >
>>>     > ### Performance
>>>     >
>>>     > Our main focus so far has been on the API design and a functional
>>>     > implementation.
>>>     > Hence, there has been less focus on performance even though we
>>>     know there are a
>>>     > number of performance enhancements we can make eventually.
>>>     > We are reasonably happy with the current performance. The
>>>     > implementation performs well when compared to other JSON
>>>     implementations
>>>     > parsing from string instances and traversing documents.
>>>     >
>>>     > An example of where we may choose simplicity and usability over
>>>     > performance is the rejection of JSON documents containing objects that
>>>     > in turn contain members with duplicate names. That may increase the
>>>     > cost of parsing, but simplifies the user experience for the majority
>>>     > of cases: if we reasonably assume JsonObjects are map-like, what
>>>     > should the user do with such members? Pick one, say the last one?
>>>     > Merge the values? Or reject?
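
The ambiguity is easy to show with a plain map. The sketch below assumes
the members of one object arrive in document order as a list of
name/value entries, with values kept as raw text; this is a stand-in for
what a parser would see, and both method names are hypothetical. It
contrasts "last one wins" with rejection.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class DuplicateNames {

    /** "Last one wins": later members silently overwrite earlier ones. */
    static Map<String, String> lastWins(List<Map.Entry<String, String>> members) {
        Map<String, String> map = new LinkedHashMap<>();
        for (Map.Entry<String, String> m : members) {
            map.put(m.getKey(), m.getValue());
        }
        return map;
    }

    /** Rejection: the extra containsKey lookup per member is the added parsing cost. */
    static Map<String, String> reject(List<Map.Entry<String, String>> members) {
        Map<String, String> map = new LinkedHashMap<>();
        for (Map.Entry<String, String> m : members) {
            if (map.containsKey(m.getKey())) {
                throw new IllegalArgumentException("Duplicate member name: " + m.getKey());
            }
            map.put(m.getKey(), m.getValue());
        }
        return map;
    }
}
```

With "last one wins" the first value disappears without a trace, which is
the kind of surprise that rejecting the document avoids.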
>>>     >
>>>     > ## A JSON JEP?
>>>     >
>>>     > We plan to draft a JEP when we are ready. Attentive readers will
>>>     > observe that a JEP already exists, JEP 198: Light-Weight JSON API
>>>     > (https://openjdk.org/jeps/198). We will either update this JEP, or
>>>     > withdraw it and draft a new one.
>>>
>
