On Monday, August 1, 2022 at 5:43:54 PM UTC Etienne Posthumus wrote:
> Thanks for the excellent spelunking Graham.

Happy to help, thanks for the kind words.

> Is it common practice nowadays for most serializers to just do UTF-8 and
> not do \-escape sequences anymore? I guess if this has been the behaviour
> in rdflib for years now and no-one complains too much, we can just assume
> it is OK and keep on doing it.

I don't know about "common practice" but I treat Jena's behaviour as a
useful ad hoc yardstick: if it passes muster with Andy Seaborn then it's
probably the right way to go.

> Maybe it is a good idea for us to add a line in the docs that the rdflib
> serializer intentionally deviates from the spec.

Yes, either document the difference or, given that known-working code
still exists, perhaps just enabling strictness by setting an *args flag
might be a viable solution ... something along the lines of:

diff --git a/rdflib/plugins/serializers/nt.py b/rdflib/plugins/serializers/nt.py
index 913dbedf..b73f223f 100644
--- a/rdflib/plugins/serializers/nt.py
+++ b/rdflib/plugins/serializers/nt.py
@@ -38,7 +38,11 @@ class NTSerializer(Serializer):
             )
 
         for triple in self.store:
-            stream.write(_nt_row(triple).encode())
+            stream.write(
+                _nt_row(triple).encode("ascii", "_rdflib_nt_escape")
+                if "w3c" in args
+                else _nt_row(triple).encode()
+            )
 
 
 class NT11Serializer(NTSerializer):

On casual testing this behaves as desired, producing
“<urn:aap> <urn:noot> "mi\u00EBs" .” with the flag set and
“<urn:aap> <urn:noot> "miës" .” when not set.

What does the team think?

Cheers,
Graham
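
For anyone unfamiliar with the mechanism the diff relies on, here is a minimal standalone sketch of a Python codec error handler that turns non-ASCII characters into \uXXXX / \UXXXXXXXX escapes when encoding to ASCII. It is an illustration only: the handler name "demo_nt_escape" and the function nt_escape_handler are made up for this example and are not rdflib's own "_rdflib_nt_escape" implementation, which may differ in detail.

```python
import codecs


def nt_escape_handler(err):
    """Hypothetical handler: replace unencodable characters with
    N-Triples style \\uXXXX or \\UXXXXXXXX escape sequences."""
    if not isinstance(err, UnicodeEncodeError):
        raise err
    # Slice of the input string that the ASCII codec could not encode.
    bad = err.object[err.start:err.end]
    escaped = "".join(
        f"\\u{ord(ch):04X}" if ord(ch) <= 0xFFFF else f"\\U{ord(ch):08X}"
        for ch in bad
    )
    # Return the replacement text and the position at which to resume.
    return escaped, err.end


# "demo_nt_escape" is an assumed name used only in this sketch.
codecs.register_error("demo_nt_escape", nt_escape_handler)

row = '<urn:aap> <urn:noot> "miës" .\n'

# Flag set: pure-ASCII output with escapes.
strict = row.encode("ascii", "demo_nt_escape")

# Flag not set: plain UTF-8 output, as the serializer does today.
lenient = row.encode()

print(strict)
print(lenient.decode("utf-8"))
```

The point of going through an error handler rather than pre-processing the string is that only the characters ASCII cannot represent get touched; everything else passes straight through the codec.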