On Sat, Dec 1, 2018 at 9:16 PM Marko Rauhamaa <ma...@pacujo.net> wrote:
>
> Paul Rubin <no.email@nospam.invalid>:
>
> > Marko Rauhamaa <ma...@pacujo.net> writes:
> >> Having rejected different options (<URL:
> >> https://en.wikipedia.org/wiki/JSON_streaming>), I settled with
> >> terminating each JSON value with an ASCII NUL character, which is
> >> illegal in JSON proper.
> >
> > Thanks, that Wikipedia article is helpful. I'd prefer to not use stuff
> > like NUL or RS because I like keeping the file human readable. I might
> > use netstring format (http://cr.yp.to/proto/netstrings.txt) but I'm even
> > more convinced now that adding a streaming feature to the existing json
> > module is the right way to do it.
>
> We all have our preferences.
>
> In my case, I need an explicit terminator marker to know when a JSON
> value is complete. For example, if I should read from a socket:
>
> 123
>
> I can't yet parse it because there might be another digit coming. On the
> other hand, the peer might not see any reason to send any further bytes
> because "123" is all they wanted to send at the moment.
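A minimal sketch of the NUL-terminated framing described above, assuming UTF-8 JSON over a byte stream such as a socket. The names `encode_frames` and `NulFrameDecoder` are hypothetical helpers, not part of the `json` module. Because `json.dumps` always escapes control characters inside strings, a raw 0x00 byte can only ever be the terminator, so the decoder simply waits for it before parsing. That is what resolves the "123" case: `b"12"` yields nothing until the terminator arrives.

```python
import json

# ASCII NUL terminator. json.dumps escapes control characters inside strings,
# so this byte never appears within an encoded frame.
NUL = b"\0"


def encode_frames(values):
    """Serialize each value as UTF-8 JSON followed by a NUL terminator."""
    return b"".join(json.dumps(v).encode("utf-8") + NUL for v in values)


class NulFrameDecoder:
    """Incrementally split a received byte stream into NUL-terminated JSON values."""

    def __init__(self):
        self._buf = bytearray()

    def feed(self, data):
        """Buffer newly received bytes and return every value whose terminator has arrived."""
        self._buf.extend(data)
        values = []
        while True:
            end = self._buf.find(NUL)
            if end < 0:
                break  # no terminator yet: the pending value may still be growing
            frame = bytes(self._buf[:end])
            del self._buf[: end + 1]  # drop the frame together with its terminator
            values.append(json.loads(frame.decode("utf-8")))
        return values
```

For example, `dec.feed(b"12")` returns `[]`, and a later `dec.feed(b"3\0")` returns `[123]`. Empty frames and malformed JSON are not handled in this sketch.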
This is actually the only special case. Every other JSON value has a
clear end. So the only thing you need to say is that, if the sender
wishes to transmit a bare number, it must append a space.

Seriously, how often do you ACTUALLY send a bare number? I've sometimes
sent a string on its own, but even that is incredibly rare. Having to
send a simple space after a bare number is unlikely to cause much
trouble.

> As for NUL, a control character that is illegal in all JSON contexts is
> practical so the JSON chunks don't need to be escaped. An ASCII-esque
> solution would be to pick ETX (= end of text). Unfortunately, a human
> operator typing ETX (= ctrl-C) to terminate a JSON value will cause a
> KeyboardInterrupt in many modern command-line interfaces.
>
> It happens NUL (= ctrl-SPC = ctrl-@) is pretty easy to generate and
> manipulate in editors and the command line.

I have no idea which editors YOU use, but if you poll across platforms
and systems, I'm pretty sure you'll find that not everyone can type it.
Furthermore, many tools use the presence of an 0x00 byte as evidence
that a file is binary, not text. (For instance, git does this.) That
might be a good choice for your personal use-case, but not the general
case, whereas the much simpler options listed on the Wikipedia page are
far more general, and actually wouldn't be THAT hard for you to use.

> The need for the format to be "typable" (and editable) is essential for
> ad-hoc manual testing of components. That precludes all framing formats
> that would necessitate a length prefix. HTTP would be horrible to have
> to type even without the content-length problem, but BEEP (RFC 3080)
> would suffer from the content-length (and CRLF!) issue as well.

I dunno, I type HTTP manually often enough that it can't be all *that*
horrible.

> Finally, couldn't any whitespace character work as a terminator? Yes, it
> could, but it would force you to use a special JSON parser that is
> prepared to handle the self-delineation. A NUL gives you many more
> degrees of freedom in choosing your JSON tools.

Either non-delimited or newline-delimited JSON is supported in a lot of
tools. I'm quite at a loss here as to how an unprintable character gives
you more freedom.

I get it: you have a bizarre set of tools and the normal solutions don't
work for you. But you can't complain about the tools not supporting your
use-cases. Just code up your own styles of doing things that are unique
to you.

ChrisA
--
https://mail.python.org/mailman/listinfo/python-list
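The "non-delimited JSON plus a space after bare numbers" approach can be sketched with the standard library's `json.JSONDecoder.raw_decode`, which parses one value starting at a given index and reports where it stopped. The function name `split_concatenated` and its hold-back rule are assumptions for illustration, not an API of the `json` module. It works on already-decoded text and makes no attempt to recover from genuinely malformed input. The only complete-looking value it has to hold back is a number that touches the end of the buffer, or one followed by a character that could extend it. That is exactly the "123" case.

```python
import json

_decoder = json.JSONDecoder()
_WHITESPACE = " \t\r\n"            # the four JSON whitespace characters
_NUMBER_CHARS = "0123456789+-.eE"  # characters that could still extend a number


def split_concatenated(text):
    """Parse back-to-back JSON values from text (hypothetical helper).

    Returns (values, rest): every complete value found, plus the unparsed
    tail to keep until more input arrives.
    """
    values = []
    pos = 0
    while True:
        # raw_decode rejects leading whitespace, so skip it between values.
        while pos < len(text) and text[pos] in _WHITESPACE:
            pos += 1
        if pos == len(text):
            return values, ""
        try:
            value, end = _decoder.raw_decode(text, pos)
        except json.JSONDecodeError:
            return values, text[pos:]  # incomplete value: wait for more input
        # bool is a subclass of int, but true/false cannot grow, so exclude it.
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if is_number and (end == len(text) or text[end] in _NUMBER_CHARS):
            return values, text[pos:]  # "12" might still become "123"
        values.append(value)
        pos = end
```

With this rule, a sender that follows each bare number with a space (as suggested above) never leaves the receiver waiting; objects, arrays, strings and literals are complete as soon as their last character arrives.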