Hi Graham, All,
I suspect that the N-Triples spec has changed over the years,
specifically when Turtle became a W3C standard.
For example, the RDF 1.1 N-Triples spec says [1]:
Encoding considerations:
The syntax of N-Triples is expressed over code points in Unicode
[UNICODE]. The encoding is always UTF-8 [UTF-8].
Unicode code points may also be expressed using an \uXXXX (U+0 to
U+FFFF) or \UXXXXXXXX syntax (for U+10000 onwards) where X is a
hexadecimal digit [0-9A-F]
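To make the two escape forms concrete, here is a minimal Python sketch of
reading them back into code points. It is not taken from the spec or from
rdflib; the names `_UCHAR` and `unescape_uchar` are made up for illustration,
and accepting lowercase hex digits is a leniency of this sketch.

```python
import re

# Matches \uXXXX (4 hex digits) or \UXXXXXXXX (8 hex digits).
# Assumption: lowercase hex digits are accepted as well as [0-9A-F].
_UCHAR = re.compile(r"\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{8})")


def unescape_uchar(text):
    # Replace each numeric escape with the code point it names.
    # Simplification: an escaped backslash in front of "u" is not
    # treated specially, so this is not a full N-Triples unescaper.
    def _sub(match):
        hex_digits = match.group(1) or match.group(2)
        # chr() raises ValueError for values above U+10FFFF.
        return chr(int(hex_digits, 16))

    return _UCHAR.sub(_sub, text)


print(unescape_uchar('"mi\\u00EBs"'))
```

The four-digit form covers U+0 to U+FFFF and the eight-digit form covers
everything from U+10000 upwards, which is why the pattern has two branches.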
And also the note here:
6.1 Other Media Types
N-Triples has been historically provided with other media types.
N-Triples may also be provided as text/plain. When used in this way
N-Triples MUST use the escaped form of any character outside US-ASCII.
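One way to read that requirement: output served as text/plain should contain
only US-ASCII bytes. A small check along those lines could look like the
sketch below; `is_text_plain_safe` is a hypothetical helper, not an rdflib
function, and it tests only the byte range, not N-Triples validity.

```python
def is_text_plain_safe(data: bytes) -> bool:
    # True when every byte is in the US-ASCII range, i.e. any character
    # outside US-ASCII must already have been written in escaped form.
    return data.isascii()


escaped = b'<urn:aap> <urn:noot> "mi\\u00EBs" .\n'
raw_utf8 = '<urn:aap> <urn:noot> "mi\u00ebs" .\n'.encode("utf-8")

print(is_text_plain_safe(escaped), is_text_plain_safe(raw_utf8))
```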
Hope that helps in deciding whether it is a bug or not.
Regards,
Jerven
[1] https://www.w3.org/TR/2014/REC-n-triples-20140225/
On 01/08/2022 21:13, Graham Higgins wrote:
On Monday, August 1, 2022 at 5:43:54 PM UTC Etienne Posthumus wrote:
Thanks for the excellent spelunking, Graham.
Happy to help, thanks for the kind words.
Is it common practice nowadays for most serializers to just do UTF-8
and not do \-escape sequences anymore? I guess if this has been the
behaviour in rdflib for years now and no-one complains too much, we
can just assume it is OK and keep on doing it.
I don't know about "common practice", but I treat Jena's behaviour as a
useful ad hoc yardstick: if it passes muster with Andy Seaborn then it's
probably the right way to go.
Maybe it is a good idea for us to add a line in the docs that the
rdflib serializer intentionally deviates from the spec.
Yes, either document the difference or, given that known-working code
still exists, perhaps just enabling strictness by setting an *args flag
might be a viable solution ... something along the lines of:
diff --git a/rdflib/plugins/serializers/nt.py b/rdflib/plugins/serializers/nt.py
index 913dbedf..b73f223f 100644
--- a/rdflib/plugins/serializers/nt.py
+++ b/rdflib/plugins/serializers/nt.py
@@ -38,7 +38,11 @@ class NTSerializer(Serializer):
             )
 
         for triple in self.store:
-            stream.write(_nt_row(triple).encode())
+            stream.write(
+                _nt_row(triple).encode("ascii", "_rdflib_nt_escape")
+                if "w3c" in args
+                else _nt_row(triple).encode()
+            )
 
 
 class NT11Serializer(NTSerializer):
On casual testing, this behaves as desired, producing “<urn:aap>
<urn:noot> "mi\u00EBs" .” with the flag set and “<urn:aap> <urn:noot>
"miës" .” when not set.
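For anyone who wants to try the mechanism outside rdflib, the sketch below
shows how a codec error handler can turn non-ASCII characters into N-Triples
escapes during `.encode("ascii", ...)`. It is a standalone illustration, not
rdflib's own handler: the handler name "nt_escape_sketch" and the function
`_escape_non_ascii` are made up here.

```python
import codecs


def _escape_non_ascii(err):
    # Called by the codec machinery whenever the ASCII encoder hits a
    # character it cannot encode.
    if not isinstance(err, UnicodeEncodeError):
        raise err
    chunk = err.object[err.start:err.end]
    escaped = "".join(
        # \uXXXX up to U+FFFF, \UXXXXXXXX from U+10000 onwards.
        f"\\u{ord(ch):04X}" if ord(ch) <= 0xFFFF else f"\\U{ord(ch):08X}"
        for ch in chunk
    )
    # Return the replacement text and the position to resume encoding at.
    return escaped, err.end


# Hypothetical handler name, chosen so it cannot clash with rdflib's.
codecs.register_error("nt_escape_sketch", _escape_non_ascii)

line = '<urn:aap> <urn:noot> "mi\u00ebs" .\n'
strict = line.encode("ascii", "nt_escape_sketch")  # escaped, ASCII only
lenient = line.encode("utf-8")                     # raw UTF-8 bytes

print(strict)
print(lenient)
```

The strict form is longer but safe for any ASCII-only channel, while the
UTF-8 form is what the 1.1 spec's "always UTF-8" wording permits.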
What does the team think?
Cheers,
Graham
--
http://github.com/RDFLib
---
You received this message because you are subscribed to the Google
Groups "rdflib-dev" group.
To unsubscribe from this group and stop receiving emails from it, send
an email to rdflib-dev+unsubscr...@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/rdflib-dev/1b1503b0-dc7a-40a5-963e-0875a6f4b843n%40googlegroups.com.
--
*Jerven Tjalling Bolleman*
Principal Software Developer
*SIB | Swiss Institute of Bioinformatics*
1, rue Michel Servet - CH 1211 Geneva 4 - Switzerland
t +41 22 379 58 85
Jerven.Bolleman@sib.swiss - www.sib.swiss