As an update to my character arc, I documented and wrote up an explanation for the prototype library I was working on.[1]
And I've gotten a good deal of feedback on reddit[2] and in private. I think it's relevant to the conversation here in the sense that:

- There are more of rzwitserloot's objections to read on the general concept of JSON as a built-in.[3]
- There are a lot of well-reasoned objections to the manner in which I am interpreting a JSON tree, as well as objections to the usage of a tree as the core. JEP 198's current writeup (which I know is subject to a rewrite/retraction) presumes that an immutable tree would be the core data structure.
- The peanut gallery might be interested in a "base" on which to implement whatever their take on an API should be.

For that last category, I have a method-handle proxy written up for those who want to try the "push parser into a pull parser" transformation I alluded to in my first email of this thread.

[1]: https://mccue.dev/pages/2-26-23-json
[2]: https://www.reddit.com/r/java/comments/11cyoh1/please_try_my_json_library/
[3]: Including one that reddit took down, but which can be seen through reveddit: https://www.reveddit.com/y/rzwitserloot/?after=t1_jacpsj6&limit=1&sort=new&show=t1_jaa3x0q&removal_status=all

On Fri, Dec 16, 2022 at 6:23 PM Ethan McCue <et...@mccue.dev> wrote:

> Sidenote about "Project Galahad" - I know Graal uses json for a few things
> including a reflection-config.json. Food for thought.
>
> > the java.util.log experiment shows that trying to ‘core-librarize’ needs
> > that the community at large already fulfills with third party deps isn’t a
> > good move,
>
> I, personally, do not have much historical context for java.util.log. What
> feels distinct about providing a JSON api is that logging is an implicitly
> global thing. If a JSON api doesn't fill all ecosystem niches, multiple can
> be used alongside each other.
>
> > The root issue with JSON is that you just can’t tell how to interpret
> > any given JSON token
>
> The point where this could be an issue is numbers. Once something is
> identified as a number we can:
>
> 1.
> Parse it immediately, using a long and falling back to a BigInteger.
> For decimals it's harder to know whether to use a double or BigDecimal
> internally. In the library I've been copy-pasting from to build a
> prototype, that last one is an explicit option and it defaults to
> doubles for the whole parse.
> 2. Store the string and parse it upon request. We can still model it as
> a Json.Number, but the work of interpreting is deferred.
>
> But in general, making a tree of json values doesn't particularly affect
> our ability to interpret it in a certain way. That interpretation is
> just positional. That's just as true when making assertions in the form
> of class structure and field types as it is when making assertions in
> the form of code.[1]
>
> record Thing(Instant a) {}
>
> // vs.
>
> Decoder.field(json, "a", a -> Instant.ofEpochSecond(Decoder.long_(a)))
>
> If anything, using a named type as a lookup key for a deserialization
> function is the less obvious way to do this.
>
> > I’m not sure how to square this circle
>
> > I don’t like the idea of shipping a non-data-binding JSON API in the
> > core libs.
>
> I think the way to cube this rhombus is to find ways to like the idea of
> a non-data-binding JSON API. ¯\_(ツ)_/¯
>
> My personal journey with that is reaching its terminus here, I think.
>
> Look on the bright side though - there are legit upsides to explicit
> tree plucking!
>
> Yeah, the friction per field is slightly higher, but the relative
> friction of custom types, or multiple construction methods for a
> particular type, or maintaining compatibility with legacy
> representations, or even just handling a top-level list of things - it's
> much lower.
>
> And all that complexity - that an instant is made by looking for a long
> or that it is parsed from a string in a particular format - lives in
> Java code you can see, touch, feel and taste.
>
> I know "nobody does this"[2] but it's not that bad, actually.
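[Editor's note: to make option 2 above concrete, here is a minimal sketch of a number node that keeps the raw literal and defers interpretation. The class name and accessors are my own invention, not the prototype's API.]

```java
import java.math.BigDecimal;
import java.math.BigInteger;

// Sketch: store the number's literal text, interpret only on request.
// All names here are hypothetical, not from any existing library.
final class JsonNumber {
    private final String raw;

    JsonNumber(String raw) {
        this.raw = raw;
    }

    long asLong() {
        return Long.parseLong(raw);
    }

    BigInteger asBigInteger() {
        return new BigInteger(raw);
    }

    double asDouble() {
        return Double.parseDouble(raw);
    }

    BigDecimal asBigDecimal() {
        return new BigDecimal(raw);
    }

    public static void main(String[] args) {
        // 2^53 + 1 cannot be represented exactly as a double,
        // but survives if interpretation is deferred.
        JsonNumber n = new JsonNumber("9007199254740993");
        System.out.println(n.asBigInteger()); // prints 9007199254740993
    }
}
```

The upside of deferring is that the document's exact numeral is never lost; the cost is re-parsing on each access unless the result is cached.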
> [1]: I do apologize for the code sketches consistently being "what I
> think an interaction with a tree api should look like." That is what I
> have been thinking about for a while so it's hard to resist.
> [2]: https://youtu.be/dOgfWXw9VrI?t=1225
>
> On Thu, Dec 15, 2022 at 6:34 PM Ethan McCue <et...@mccue.dev> wrote:
>
>> > are pure JSON parsers really the go-to for most people?
>>
>> Depends on what you mean by JSON parsers and it depends on what you
>> mean by people.
>>
>> To the best of my knowledge, both python and Javascript do not include
>> streaming, databinding, or path navigation capabilities in their json
>> parsers.
>>
>> On Thu, Dec 15, 2022 at 6:26 PM Ethan McCue <et...@mccue.dev> wrote:
>>
>>> > The 95%+ use case for working with JSON for your average java coder
>>> > is best done with data binding.
>>>
>>> To be brave yet controversial: I'm not sure this is necessarily true.
>>>
>>> I will elaborate and respond to the other points after a hot cocoa,
>>> but the last point is part of why I think that tree-crawling needs
>>> _something_ better as an API to fit the bill.
>>>
>>> With my sketch that set of requirements would be represented as
>>>
>>> record Thing(
>>>     List<Long> xs
>>> ) {
>>>     static Thing fromJson(Json json) {
>>>         var defaultList = List.of(0L);
>>>         return new Thing(Decoder.optionalNullableField(
>>>             json,
>>>             "xs",
>>>             Decoder.oneOf(
>>>                 Decoder.array(Decoder.oneOf(
>>>                     x -> Long.parseLong(Decoder.string(x)),
>>>                     Decoder::long_
>>>                 )),
>>>                 Decoder.null_(defaultList),
>>>                 x -> List.of(Decoder.long_(x))
>>>             ),
>>>             defaultList
>>>         ));
>>>     }
>>> }
>>>
>>> Which isn't amazing at first glance, but also
>>>
>>> {}
>>> {"xs": null}
>>> {"xs": 5}
>>> {"xs": [5]}
>>> {"xs": ["5"]}
>>> {"xs": [1, "2", "3"]}
>>>
>>> these are some wildly varied structures. You could make a solid
>>> argument that something which silently treats these all the same is a
>>> bad API for all the reasons you would consider it a good one.
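[Editor's note: the `oneOf` in that sketch can be boiled down to a tiny standalone combinator - try each alternative in order, keep the first success. This is my own minimal reconstruction, not the prototype's actual signatures.]

```java
import java.util.function.Function;

// Minimal reconstruction of a "oneOf" decoder combinator.
// Decoder here is hypothetical: a function from a parsed JSON value
// (as a plain Object) to a T, throwing on mismatch.
final class Decoders {
    interface Decoder<T> extends Function<Object, T> {}

    @SafeVarargs
    static <T> Decoder<T> oneOf(Decoder<? extends T>... alternatives) {
        return json -> {
            RuntimeException last = null;
            for (Decoder<? extends T> alternative : alternatives) {
                try {
                    return alternative.apply(json);
                } catch (RuntimeException e) {
                    last = e; // remember the failure, try the next one
                }
            }
            throw last != null ? last : new IllegalArgumentException("oneOf given no alternatives");
        };
    }

    public static void main(String[] args) {
        // Accept a number either as a Long or as a numeric string.
        Decoder<Long> lenientLong = oneOf(
                json -> (Long) json,
                json -> Long.parseLong((String) json)
        );
        System.out.println(lenientLong.apply(7L));   // prints 7
        System.out.println(lenientLong.apply("5"));  // prints 5
    }
}
```

The ordering of alternatives matters: the first decoder that does not throw wins, which is exactly why "silently treats these all the same" is both the feature and the hazard discussed above.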
>>>
>>> On Thu, Dec 15, 2022 at 6:18 PM Johannes Lichtenberger <
>>> lichtenberger.johan...@gmail.com> wrote:
>>>
>>>> I'll have to read the whole thing, but are pure JSON parsers really
>>>> the go-to for most people? I'm a big advocate of also providing
>>>> something similar to XPath/XQuery, and that's IMHO JSONiq (90%
>>>> XQuery). I might be biased, of course, as I'm working on Brackit[1]
>>>> in my spare time (which is also a query compiler and intended to be
>>>> used with proven optimizations by document stores / JSON stores),
>>>> but it can also be used as an in-memory query engine.
>>>>
>>>> kind regards
>>>> Johannes
>>>>
>>>> [1]: https://github.com/sirixdb/brackit
>>>>
>>>> On Thu, Dec 15, 2022 at 11:03 PM Reinier Zwitserloot <
>>>> rein...@zwitserloot.com> wrote:
>>>>
>>>>> A recent Advent-of-Code puzzle also made me double-check the
>>>>> support for JSON in the java core libs, and it is indeed a curious
>>>>> situation that the java core libs don’t cater to it particularly
>>>>> well.
>>>>>
>>>>> However, I’m not seeing an easy way forward to try to close this
>>>>> hole in the core library offerings.
>>>>>
>>>>> If you need to stream huge swaths of JSON, generally there’s a
>>>>> clear unit size that you can just databind.
>>>>> Something like:
>>>>>
>>>>> String jsonStr = """
>>>>>     {
>>>>>         "version": 5,
>>>>>         "data": [
>>>>>             -- 1 million relatively small records in this list --
>>>>>         ]
>>>>>     }
>>>>>     """;
>>>>>
>>>>> The usual swath of JSON parsers tend to support this (giving you a
>>>>> stream of java instances created by databinding those small records
>>>>> one by one), or if not, the best move forward is presumably to file
>>>>> a pull request with those projects; the java.util.log experiment
>>>>> shows that trying to ‘core-librarize’ needs that the community at
>>>>> large already fulfills with third party deps isn’t a good move,
>>>>> especially if the core library variant tries to oversimplify to
>>>>> avoid the trap of being too opinionated (which core libs shouldn’t
>>>>> be). In other words, the need for ’stream this JSON for me’ style
>>>>> APIs is even more exotic than Ethan is suggesting.
>>>>>
>>>>> I see a fundamental problem here:
>>>>>
>>>>> - The 95%+ use case for working with JSON for your average java
>>>>>   coder is best done with data binding.
>>>>> - core libs doesn’t want to provide it, partly because it’s got a
>>>>>   large design space, partly because the field’s already covered
>>>>>   by GSON and Jackson-json; java.util.log proves this doesn’t
>>>>>   work. At least, I gather that’s what Ethan thinks and I agree
>>>>>   with this assessment.
>>>>> - A language that claims to be “batteries included” that doesn’t
>>>>>   ship with a JSON parser in this era is dubious, to say the least.
>>>>>
>>>>> I’m not sure how to square this circle.
>>>>> Hence it feels like core-libs needs to hold some more fundamental
>>>>> debates first:
>>>>>
>>>>> - Maybe it’s time to state in a more or less official decree that
>>>>>   well-established, large design space jobs will remain the
>>>>>   purview of dependencies no matter how popular they get, unless
>>>>>   being part of the core-libs adds something more fundamental that
>>>>>   third party deps cannot bring to the table (such as language
>>>>>   integration), or the community standardizes on a single library
>>>>>   (JSR310’s story, more or less). JSON parsing would qualify as
>>>>>   ‘well-established’ (GSON and Jackson) and ‘large design space’
>>>>>   as Ethan pointed out.
>>>>> - Given that 99% of java projects, even really simple ones, start
>>>>>   with maven/gradle and a list of deps, is that really a problem?
>>>>>
>>>>> I’m honestly not sure what the right answer is. On one hand, the
>>>>> npm ecosystem seems to be doing very well even though their
>>>>> ‘batteries included’ situation is an utter shambles. Then again,
>>>>> the notion that your average nodejs project includes 10x+ more
>>>>> dependencies than other languages is likely a significant part of
>>>>> the security clown fiesta going on over there as far as 3rd party
>>>>> deps are concerned, so by no means should java just blindly
>>>>> emulate their solutions.
>>>>>
>>>>> I don’t like the idea of shipping a non-data-binding JSON API in
>>>>> the core libs. The root issue with JSON is that you just can’t
>>>>> tell how to interpret any given JSON token, because that’s not how
>>>>> JSON is used in practice. What does 5 mean? Could be that I’m to
>>>>> take that as an int, or as a double, or perhaps even as a
>>>>> j.t.Instant (epoch-millis), and defaulting behaviour (similar to
>>>>> j.u.Map’s .getOrDefault) is *very* convenient to parse most JSON
>>>>> out there in the real world - omitting k/v pairs whose value is
>>>>> still on default is very common.
>>>>> That’s what makes those databind libraries so enticing: Instead of
>>>>> trying to pattern match my way into this behaviour:
>>>>>
>>>>> - If the element isn’t there at all or null, give me a
>>>>>   list-of-longs with a single 0 in it.
>>>>> - If the element is a number, make me a list-of-longs with 1 value
>>>>>   in it, that is that number, as long.
>>>>> - If the element is a string, parse it into a long, then get me a
>>>>>   list with this one long value (because IEEE double rules mean
>>>>>   sometimes you have to put these things in string form or they
>>>>>   get mangled by javascript-eval style parsers).
>>>>>
>>>>> And yet the above is quite common, and can easily be done by a
>>>>> databinder, which sees you want a List<Long> for a field whose
>>>>> default value is List.of(1L), and, armed with that knowledge, can
>>>>> transit the JSON into java in that way.
>>>>>
>>>>> You don’t *need* databinding to cater to this idea: You could for
>>>>> example have a jsonNode.asLong(123) method that would parse a
>>>>> string if need be, even. But this has nothing to do with pattern
>>>>> matching either.
>>>>>
>>>>> --Reinier Zwitserloot
>>>>>
>>>>> On 15 Dec 2022 at 21:30:17, Ethan McCue <et...@mccue.dev> wrote:
>>>>>
>>>>>> I'm writing this to drive some forward motion and to nerd-snipe
>>>>>> those who know better than I do into putting their thoughts into
>>>>>> words.
>>>>>>
>>>>>> There are three ways to process JSON[1]:
>>>>>>
>>>>>> - Streaming (Push or Pull)
>>>>>> - Traversing a Tree (Realized or Lazy)
>>>>>> - Declarative Databind (N ways)
>>>>>>
>>>>>> Of these, JEP-198 explicitly ruled out providing "JAXB style type
>>>>>> safe data binding."
>>>>>>
>>>>>> No justification is given, but if I had to insert my own: mapping
>>>>>> the Json model to/from the Java/JVM object model is a cursed
>>>>>> combo of
>>>>>>
>>>>>> - Huge possible design space
>>>>>> - Unpalatably large surface for backwards compatibility
>>>>>> - Serialization!
Boo![2]
>>>>>>
>>>>>> So for an artifact like the JDK, it probably doesn't make sense
>>>>>> to include. That tracks. It won't make everyone happy, people
>>>>>> like databind APIs, but it tracks.
>>>>>>
>>>>>> So for the "read flow" these are the things to figure out.
>>>>>>
>>>>>>                 | Should Provide? | Intended User(s) |
>>>>>> ----------------+-----------------+------------------+
>>>>>> Streaming Push  |                 |                  |
>>>>>> ----------------+-----------------+------------------+
>>>>>> Streaming Pull  |                 |                  |
>>>>>> ----------------+-----------------+------------------+
>>>>>> Realized Tree   |                 |                  |
>>>>>> ----------------+-----------------+------------------+
>>>>>> Lazy Tree       |                 |                  |
>>>>>> ----------------+-----------------+------------------+
>>>>>>
>>>>>> At which point, we should talk about what "meets needs of Java
>>>>>> developers using JSON" implies.
>>>>>>
>>>>>> JSON is ubiquitous. Most kinds of software us schmucks write
>>>>>> could have a reason to interact with it. The full set of "user
>>>>>> personas" therefore isn't practical for me to talk about.[3]
>>>>>>
>>>>>> JSON documents, however, are not so varied.
>>>>>>
>>>>>> - There are small ones (1-10kb)
>>>>>> - There are medium ones (10-1000kb)
>>>>>> - There are big ones (1000kb-???)
>>>>>>
>>>>>> - There are shallow ones
>>>>>> - There are deep ones
>>>>>>
>>>>>> So that feels like an easier direction to talk about it from.
>>>>>>
>>>>>> This repo[4] has some convenient toy examples of how some of
>>>>>> those APIs look in libraries in the ecosystem. Specifically the
>>>>>> Streaming Pull and Realized Tree models.
>>>>>>
>>>>>> User r = new User();
>>>>>> while (true) {
>>>>>>     JsonToken token = reader.peek();
>>>>>>     switch (token) {
>>>>>>         case BEGIN_OBJECT:
>>>>>>             reader.beginObject();
>>>>>>             break;
>>>>>>         case END_OBJECT:
>>>>>>             reader.endObject();
>>>>>>             return r;
>>>>>>         case NAME:
>>>>>>             String fieldname = reader.nextName();
>>>>>>             switch (fieldname) {
>>>>>>                 case "id":
>>>>>>                     r.setId(reader.nextString());
>>>>>>                     break;
>>>>>>                 case "index":
>>>>>>                     r.setIndex(reader.nextInt());
>>>>>>                     break;
>>>>>>                 ...
>>>>>>                 case "friends":
>>>>>>                     r.setFriends(new ArrayList<>());
>>>>>>                     Friend f = null;
>>>>>>                     carryOn = true;
>>>>>>                     while (carryOn) {
>>>>>>                         token = reader.peek();
>>>>>>                         switch (token) {
>>>>>>                             case BEGIN_ARRAY:
>>>>>>                                 reader.beginArray();
>>>>>>                                 break;
>>>>>>                             case END_ARRAY:
>>>>>>                                 reader.endArray();
>>>>>>                                 carryOn = false;
>>>>>>                                 break;
>>>>>>                             case BEGIN_OBJECT:
>>>>>>                                 reader.beginObject();
>>>>>>                                 f = new Friend();
>>>>>>                                 break;
>>>>>>                             case END_OBJECT:
>>>>>>                                 reader.endObject();
>>>>>>                                 r.getFriends().add(f);
>>>>>>                                 break;
>>>>>>                             case NAME:
>>>>>>                                 String fn = reader.nextName();
>>>>>>                                 switch (fn) {
>>>>>>                                     case "id":
>>>>>>                                         f.setId(reader.nextString());
>>>>>>                                         break;
>>>>>>                                     case "name":
>>>>>>                                         f.setName(reader.nextString());
>>>>>>                                         break;
>>>>>>                                 }
>>>>>>                                 break;
>>>>>>                         }
>>>>>>                     }
>>>>>>                     break;
>>>>>>             }
>>>>>>     }
>>>>>> }
>>>>>>
>>>>>> I think it's not hard to argue that the streaming apis are
>>>>>> brutalist. The above is Gson, but Jackson, moshi, etc. seem at
>>>>>> least morally equivalent.
>>>>>>
>>>>>> It's hard to write, hard to write *correctly*, and there is a
>>>>>> curious propensity toward pairing it with anemic, mutable models.
>>>>>>
>>>>>> That being said, it handles big documents and deep documents
>>>>>> really well. It also performs pretty darn well and is good enough
>>>>>> as a "fallback" when the intended user experience is through
>>>>>> something like databind.
>>>>>>
>>>>>> So what could we do meaningfully better with the language we have
>>>>>> today/will have tomorrow?
>>>>>>
>>>>>> - Sealed interfaces + Pattern matching could give a nicer model
>>>>>>   for tokens
>>>>>>
>>>>>> sealed interface JsonToken {
>>>>>>     record Field(String name) implements JsonToken {}
>>>>>>     record BeginArray() implements JsonToken {}
>>>>>>     record EndArray() implements JsonToken {}
>>>>>>     record BeginObject() implements JsonToken {}
>>>>>>     record EndObject() implements JsonToken {}
>>>>>>     // ...
>>>>>> }
>>>>>>
>>>>>> // ...
>>>>>>
>>>>>> User r = new User();
>>>>>> while (true) {
>>>>>>     JsonToken token = reader.peek();
>>>>>>     switch (token) {
>>>>>>         case BeginObject __:
>>>>>>             reader.beginObject();
>>>>>>             break;
>>>>>>         case EndObject __:
>>>>>>             reader.endObject();
>>>>>>             return r;
>>>>>>         case Field("id"):
>>>>>>             r.setId(reader.nextString());
>>>>>>             break;
>>>>>>         case Field("index"):
>>>>>>             r.setIndex(reader.nextInt());
>>>>>>             break;
>>>>>>
>>>>>>         // ...
>>>>>>
>>>>>>         case Field("friends"):
>>>>>>             r.setFriends(new ArrayList<>());
>>>>>>             Friend f = null;
>>>>>>             carryOn = true;
>>>>>>             while (carryOn) {
>>>>>>                 token = reader.peek();
>>>>>>                 switch (token) {
>>>>>>                     // ...
>>>>>>
>>>>>> - Value classes can make it all more efficient
>>>>>>
>>>>>> sealed interface JsonToken {
>>>>>>     value record Field(String name) implements JsonToken {}
>>>>>>     value record BeginArray() implements JsonToken {}
>>>>>>     value record EndArray() implements JsonToken {}
>>>>>>     value record BeginObject() implements JsonToken {}
>>>>>>     value record EndObject() implements JsonToken {}
>>>>>>     // ...
>>>>>> }
>>>>>>
>>>>>> - (Fun One) We can transform a simpler-to-write push parser into
>>>>>>   a pull parser with Coroutines
>>>>>>
>>>>>> This is just a toy we could play with while making something in
>>>>>> the JDK.
I'm pretty sure we could make a parser which feeds into something like
>>>>>>
>>>>>> interface Listener {
>>>>>>     void onObjectStart();
>>>>>>     void onObjectEnd();
>>>>>>     void onArrayStart();
>>>>>>     void onArrayEnd();
>>>>>>     void onField(String name);
>>>>>>     // ...
>>>>>> }
>>>>>>
>>>>>> and invert a loop like
>>>>>>
>>>>>> while (true) {
>>>>>>     char c = next();
>>>>>>     switch (c) {
>>>>>>         case '{':
>>>>>>             listener.onObjectStart();
>>>>>>             // ...
>>>>>>         // ...
>>>>>>     }
>>>>>> }
>>>>>>
>>>>>> by putting a Coroutine.yield in the callback.
>>>>>>
>>>>>> That might be a meaningful simplification in code structure, I
>>>>>> don't know enough to say.
>>>>>>
>>>>>> But, I think there are some hard questions like
>>>>>>
>>>>>> - Is the intent[5] to make the backing parser for ecosystem
>>>>>>   databind apis?
>>>>>> - Is the intent that users who want to handle big/deep documents
>>>>>>   fall back to this?
>>>>>> - Are those new language features / conveniences enough to offset
>>>>>>   the cost of committing to a new api?
>>>>>> - To whom exactly does a low level api provide value?
>>>>>> - What benefit is standardization in the JDK?
>>>>>>
>>>>>> and just generally - who would be the consumer(s) of this?
>>>>>>
>>>>>> The other kind of API still on the table is a Tree. There are two
>>>>>> ways to handle this
>>>>>>
>>>>>> 1. Load it into `Object`. Use a bunch of instanceof checks/casts
>>>>>> to confirm what it actually is.
>>>>>>
>>>>>> Object v;
>>>>>> User u = new User();
>>>>>>
>>>>>> if ((v = jso.get("id")) != null) {
>>>>>>     u.setId((String) v);
>>>>>> }
>>>>>> if ((v = jso.get("index")) != null) {
>>>>>>     u.setIndex(((Long) v).intValue());
>>>>>> }
>>>>>> if ((v = jso.get("guid")) != null) {
>>>>>>     u.setGuid((String) v);
>>>>>> }
>>>>>> if ((v = jso.get("isActive")) != null) {
>>>>>>     u.setIsActive(((Boolean) v));
>>>>>> }
>>>>>> if ((v = jso.get("balance")) != null) {
>>>>>>     u.setBalance((String) v);
>>>>>> }
>>>>>> // ...
>>>>>> if ((v = jso.get("latitude")) != null) {
>>>>>>     u.setLatitude(v instanceof BigDecimal ? ((BigDecimal) v).doubleValue() : (Double) v);
>>>>>> }
>>>>>> if ((v = jso.get("longitude")) != null) {
>>>>>>     u.setLongitude(v instanceof BigDecimal ? ((BigDecimal) v).doubleValue() : (Double) v);
>>>>>> }
>>>>>> if ((v = jso.get("greeting")) != null) {
>>>>>>     u.setGreeting((String) v);
>>>>>> }
>>>>>> if ((v = jso.get("favoriteFruit")) != null) {
>>>>>>     u.setFavoriteFruit((String) v);
>>>>>> }
>>>>>> if ((v = jso.get("tags")) != null) {
>>>>>>     List<Object> jsonarr = (List<Object>) v;
>>>>>>     u.setTags(new ArrayList<>());
>>>>>>     for (Object vi : jsonarr) {
>>>>>>         u.getTags().add((String) vi);
>>>>>>     }
>>>>>> }
>>>>>> if ((v = jso.get("friends")) != null) {
>>>>>>     List<Object> jsonarr = (List<Object>) v;
>>>>>>     u.setFriends(new ArrayList<>());
>>>>>>     for (Object vi : jsonarr) {
>>>>>>         Map<String, Object> jso0 = (Map<String, Object>) vi;
>>>>>>         Friend f = new Friend();
>>>>>>         f.setId((String) jso0.get("id"));
>>>>>>         f.setName((String) jso0.get("name"));
>>>>>>         u.getFriends().add(f);
>>>>>>     }
>>>>>> }
>>>>>>
>>>>>> 2.
Have an explicit model for Json, and helper methods that do said
>>>>>> casts[6]
>>>>>>
>>>>>> this.setSiteSetting(readFromJson(jsonObject.getJsonObject("site")));
>>>>>> JsonArray groups = jsonObject.getJsonArray("group");
>>>>>> if (groups != null)
>>>>>> {
>>>>>>     int len = groups.size();
>>>>>>     for (int i = 0; i < len; i++)
>>>>>>     {
>>>>>>         JsonObject grp = groups.getJsonObject(i);
>>>>>>         SNMPSetting grpSetting = readFromJson(grp);
>>>>>>         String grpName = grp.getString("dbgroup", null);
>>>>>>         if (grpName != null && grpSetting != null)
>>>>>>             this.groupSettings.put(grpName, grpSetting);
>>>>>>     }
>>>>>> }
>>>>>> JsonArray hosts = jsonObject.getJsonArray("host");
>>>>>> if (hosts != null)
>>>>>> {
>>>>>>     int len = hosts.size();
>>>>>>     for (int i = 0; i < len; i++)
>>>>>>     {
>>>>>>         JsonObject host = hosts.getJsonObject(i);
>>>>>>         SNMPSetting hostSetting = readFromJson(host);
>>>>>>         String hostName = host.getString("dbhost", null);
>>>>>>         if (hostName != null && hostSetting != null)
>>>>>>             this.hostSettings.put(hostName, hostSetting);
>>>>>>     }
>>>>>> }
>>>>>>
>>>>>> I think what has become easier to represent in the language
>>>>>> nowadays is that explicit model for Json. It's the 101 lesson of
>>>>>> sealed interfaces.[7] It feels nice and clean.
>>>>>>
>>>>>> sealed interface Json {
>>>>>>     final class Null implements Json {}
>>>>>>     final class True implements Json {}
>>>>>>     final class False implements Json {}
>>>>>>     final class Array implements Json {}
>>>>>>     final class Object implements Json {}
>>>>>>     final class String implements Json {}
>>>>>>     final class Number implements Json {}
>>>>>> }
>>>>>>
>>>>>> And the cast-and-check approach is now more viable on account of
>>>>>> pattern matching.
>>>>>>
>>>>>> if (jso.get("id") instanceof String v) {
>>>>>>     u.setId(v);
>>>>>> }
>>>>>> if (jso.get("index") instanceof Long v) {
>>>>>>     u.setIndex(v.intValue());
>>>>>> }
>>>>>> if (jso.get("guid") instanceof String v) {
>>>>>>     u.setGuid(v);
>>>>>> }
>>>>>>
>>>>>> // or
>>>>>>
>>>>>> if (jso.get("id") instanceof String id &&
>>>>>>         jso.get("index") instanceof Long index &&
>>>>>>         jso.get("guid") instanceof String guid) {
>>>>>>     return new User(id, index, guid, ...); // look ma, no setters!
>>>>>> }
>>>>>>
>>>>>> And on the horizon, again, is value types.
>>>>>>
>>>>>> But there are problems with this approach beyond the performance
>>>>>> implications of loading into a tree.
>>>>>>
>>>>>> For one, all the code samples above have different behaviors
>>>>>> around null keys and missing keys that are not obvious at first
>>>>>> glance.
>>>>>>
>>>>>> This won't accept any null or missing fields
>>>>>>
>>>>>> if (jso.get("id") instanceof String id &&
>>>>>>         jso.get("index") instanceof Long index &&
>>>>>>         jso.get("guid") instanceof String guid) {
>>>>>>     return new User(id, index, guid, ...);
>>>>>> }
>>>>>>
>>>>>> This will accept individual null or missing fields, but will also
>>>>>> silently ignore fields with incorrect types
>>>>>>
>>>>>> if (jso.get("id") instanceof String v) {
>>>>>>     u.setId(v);
>>>>>> }
>>>>>> if (jso.get("index") instanceof Long v) {
>>>>>>     u.setIndex(v.intValue());
>>>>>> }
>>>>>> if (jso.get("guid") instanceof String v) {
>>>>>>     u.setGuid(v);
>>>>>> }
>>>>>>
>>>>>> And, compared to databind, where there is information about the
>>>>>> expected structure of the document and it's the job of the
>>>>>> framework to assert it, I posit that the errors encountered when
>>>>>> writing code against this would be more like
>>>>>>
>>>>>> "something wrong with user"
>>>>>>
>>>>>> than
>>>>>>
>>>>>> "problem at users[5].name, expected string or null. got 5"
>>>>>>
>>>>>> Which feels unideal.
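[Editor's note: the sealed-model-plus-pattern-matching combination mentioned earlier can be shown end to end in a few lines. This uses Java 21 switch patterns; the record names and payload shapes are my own guesses, not the prototype's.]

```java
import java.math.BigDecimal;
import java.util.List;
import java.util.Map;

// A fleshed-out version of the sealed Json model sketch, with payloads.
// Names are illustrative only.
sealed interface Json {
    record Null() implements Json {}
    record Bool(boolean value) implements Json {}
    record Str(String value) implements Json {}
    record Num(BigDecimal value) implements Json {}
    record Arr(List<Json> values) implements Json {}
    record Obj(Map<String, Json> values) implements Json {}

    // The compiler checks this switch for exhaustiveness - no default needed.
    static String kind(Json json) {
        return switch (json) {
            case Null ignored -> "null";
            case Bool b -> "boolean (" + b.value() + ")";
            case Str s -> "string (" + s.value() + ")";
            case Num n -> "number (" + n.value() + ")";
            case Arr a -> "array of " + a.values().size();
            case Obj o -> "object with " + o.values().size() + " fields";
        };
    }

    static void main(String[] args) {
        Json doc = new Obj(Map.of(
                "id", new Str("abc"),
                "index", new Num(new BigDecimal("5"))));
        System.out.println(kind(doc)); // prints: object with 2 fields
    }
}
```

Because the hierarchy is sealed, adding a new case (say, a lazy node) turns every non-exhaustive switch into a compile error, which is part of why this shape of model is attractive for an incubator API.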
>>>>>>
>>>>>> One approach I find promising is something close to what Elm does
>>>>>> with its decoders[8]. Not just combining assertion and binding
>>>>>> like what pattern matching with records allows, but including a
>>>>>> scheme for bubbling/nesting errors.
>>>>>>
>>>>>> static String string(Json json) throws JsonDecodingException {
>>>>>>     if (!(json instanceof Json.String jsonString)) {
>>>>>>         throw JsonDecodingException.of(
>>>>>>             "expected a string",
>>>>>>             json
>>>>>>         );
>>>>>>     } else {
>>>>>>         return jsonString.value();
>>>>>>     }
>>>>>> }
>>>>>>
>>>>>> static <T> T field(Json json, String fieldName, Decoder<? extends T> valueDecoder) throws JsonDecodingException {
>>>>>>     var jsonObject = object(json);
>>>>>>     var value = jsonObject.get(fieldName);
>>>>>>     if (value == null) {
>>>>>>         throw JsonDecodingException.atField(
>>>>>>             fieldName,
>>>>>>             JsonDecodingException.of(
>>>>>>                 "no value for field",
>>>>>>                 json
>>>>>>             )
>>>>>>         );
>>>>>>     } else {
>>>>>>         try {
>>>>>>             return valueDecoder.decode(value);
>>>>>>         } catch (JsonDecodingException e) {
>>>>>>             throw JsonDecodingException.atField(
>>>>>>                 fieldName,
>>>>>>                 e
>>>>>>             );
>>>>>>         } catch (Exception e) {
>>>>>>             throw JsonDecodingException.atField(fieldName,
>>>>>>                 JsonDecodingException.of(e, value));
>>>>>>         }
>>>>>>     }
>>>>>> }
>>>>>>
>>>>>> Which I think has some benefits over the ways I've seen of
>>>>>> working with trees.
>>>>>>
>>>>>> - It is declarative enough that folks who prefer databind might
>>>>>>   be happy enough.
>>>>>>
>>>>>> static User fromJson(Json json) {
>>>>>>     return new User(
>>>>>>         Decoder.field(json, "id", Decoder::string),
>>>>>>         Decoder.field(json, "index", Decoder::long_),
>>>>>>         Decoder.field(json, "guid", Decoder::string)
>>>>>>     );
>>>>>> }
>>>>>>
>>>>>> // ...
>>>>>>
>>>>>> List<User> users = Decoders.array(json, User::fromJson);
>>>>>>
>>>>>> - Handling null and optional fields could be less easily
>>>>>>   conflated
>>>>>>
>>>>>> Decoder.field(json, "id", Decoder::string);
>>>>>>
>>>>>> Decoder.nullableField(json, "id", Decoder::string);
>>>>>>
>>>>>> Decoder.optionalField(json, "id", Decoder::string);
>>>>>>
>>>>>> Decoder.optionalNullableField(json, "id", Decoder::string);
>>>>>>
>>>>>> - It composes well with user defined classes
>>>>>>
>>>>>> record Guid(String value) {
>>>>>>     Guid {
>>>>>>         // some assertions on the structure of value
>>>>>>     }
>>>>>> }
>>>>>>
>>>>>> Decoder.field(json, "guid", guid -> new Guid(Decoder.string(guid)));
>>>>>>
>>>>>> // or even
>>>>>>
>>>>>> record Guid(String value) {
>>>>>>     Guid {
>>>>>>         // some assertions on the structure of value
>>>>>>     }
>>>>>>
>>>>>>     static Guid fromJson(Json json) {
>>>>>>         return new Guid(Decoder.string(json));
>>>>>>     }
>>>>>> }
>>>>>>
>>>>>> Decoder.field(json, "guid", Guid::fromJson);
>>>>>>
>>>>>> - When something goes wrong, the API can handle the fiddlyness of
>>>>>>   capturing information for feedback.
>>>>>>
>>>>>>   In the code I've sketched out it's just what field/index things
>>>>>>   went wrong at. Potentially capturing metadata like row/col
>>>>>>   numbers of the source would be sensible too.
>>>>>>
>>>>>>   It's just not reasonable to expect devs to do extra work to get
>>>>>>   that, and it's really nice to give it.
>>>>>>
>>>>>> There are also some downsides like
>>>>>>
>>>>>> - I do not know how compatible it would be with lazy trees.
>>>>>>
>>>>>>   Lazy trees being the only way that a tree api could handle big
>>>>>>   or deep documents. The general concept as applied in libraries
>>>>>>   like json-tree[9] is to navigate without doing any work, and
>>>>>>   that clashes with wanting to instanceof check the info at the
>>>>>>   current path.
>>>>>> - It *almost* gives enough information to be a general schema
>>>>>>   approach
>>>>>>
>>>>>>   If one field fails, the model as written throws an exception
>>>>>>   immediately. If an API should return "errors": [...], that is
>>>>>>   inconvenient to construct.
>>>>>>
>>>>>> - None of the existing popular libraries are doing this
>>>>>>
>>>>>>   The only mechanics that are strictly required to give this sort
>>>>>>   of API is lambdas. Those have been out for a decade. Yes,
>>>>>>   sealed interfaces make the data model prettier, but in concept
>>>>>>   you can build the same thing on top of anything.
>>>>>>
>>>>>>   I could argue that this is because of "cultural momentum" of
>>>>>>   databind or some other reason, but the fact remains that it
>>>>>>   isn't a proven-out approach.
>>>>>>
>>>>>>   Writing Json libraries is a todo list[10]. There are a lot of
>>>>>>   bad ideas and this might be one of them.
>>>>>>
>>>>>> - Performance impact of so many instanceof checks
>>>>>>
>>>>>>   I've gotten a 4.2% slowdown compared to the "regular" tree code
>>>>>>   without the repeated casts.
>>>>>>
>>>>>>   But that was with a parser that is 5x slower than Jackson's
>>>>>>   (using the same benchmark project as for the snippets). I think
>>>>>>   there could be reason to believe that the JIT does well enough
>>>>>>   with repeated instanceof checks to consider it.
>>>>>>
>>>>>> My current thinking is that - despite not solving for large or
>>>>>> deep documents - starting with a really "dumb" realized tree api
>>>>>> might be the right place to start for the read side of a
>>>>>> potential incubator module.
>>>>>>
>>>>>> But regardless - this feels like a good time to start more
>>>>>> concrete conversations.
I feel I should cap this email since I've reached the point of
>>>>>> decoherence and haven't even mentioned the write side of things.
>>>>>>
>>>>>> [1]: http://www.cowtowncoder.com/blog/archives/2009/01/entry_131.html
>>>>>> [2]: https://security.snyk.io/vuln/maven?search=jackson-databind
>>>>>> [3]: I only know like 8 people
>>>>>> [4]: https://github.com/fabienrenaud/java-json-benchmark/blob/master/src/main/java/com/github/fabienrenaud/jjb/stream/UsersStreamDeserializer.java
>>>>>> [5]: When I say "intent", I do so knowing full well no one has
>>>>>> been actively thinking of this for an entire Game of Thrones
>>>>>> [6]: https://github.com/yahoo/mysql_perf_analyzer/blob/master/myperf/src/main/java/com/yahoo/dba/perf/myperf/common/SNMPSettings.java
>>>>>> [7]: https://www.infoq.com/articles/data-oriented-programming-java/
>>>>>> [8]: https://package.elm-lang.org/packages/elm/json/latest/Json-Decode
>>>>>> [9]: https://github.com/jbee/json-tree
>>>>>> [10]: https://stackoverflow.com/a/14442630/2948173
>>>>>> [11]: In 30 days JEP-198 will be recognizably PI days old for the
>>>>>> 2nd time in its history.
>>>>>> [12]: To me, the fact that it is still an open JEP is more a
>>>>>> social convenience than anything. I could just as easily be
>>>>>> writing this exact same email about TOML.
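[Editor's note: to close the loop on the "problem at users[5].name" style of error discussed earlier, here is a small sketch, hypothetical and simplified relative to the snippets in the thread, of how a decoding exception can accumulate its path as it bubbles up through nested fields and indices.]

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Sketch: as a decode failure bubbles up, each frame prepends its
// segment, so the final message reads like
// "users[5].name: expected string or null, got 5". Hypothetical names.
final class JsonDecodingException extends RuntimeException {
    private final Deque<String> path = new ArrayDeque<>();

    JsonDecodingException(String message) {
        super(message);
    }

    static JsonDecodingException atField(String field, JsonDecodingException cause) {
        cause.path.addFirst(field);
        return cause;
    }

    static JsonDecodingException atIndex(int index, JsonDecodingException cause) {
        cause.path.addFirst("[" + index + "]");
        return cause;
    }

    String pathString() {
        // join with "." but attach index segments directly: a.[5].b -> a[5].b
        return String.join(".", path).replace(".[", "[");
    }

    public static void main(String[] args) {
        JsonDecodingException e =
                atField("users",
                        atIndex(5,
                                atField("name",
                                        new JsonDecodingException("expected string or null, got 5"))));
        System.out.println(e.pathString() + ": " + e.getMessage());
        // prints: users[5].name: expected string or null, got 5
    }
}
```

The point of the design is that individual decoders stay tiny: each one only knows its own field name or index, and the full path falls out of rethrowing on the way up.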