Hi,

> On 10 Sep 2018, at 12:53, PBKResearch <pe...@pbkresearch.co.uk> wrote:
>
> Hi Petr
>
> I have used #urlEncoded in the past, with success, to deal with German
> umlauts. The secret is to urlEncode just the part containing the diacritics.
> If you encode the whole url, the slashes are encoded, and this confuses Zinc,
> which segments the url before decoding.
>
> So I would expect you to be able to read your file with:
>
> ZnEasy get: 'http://domain.com/', 'ěščýž.html' urlEncoded.
>
> However, this also fails with ‘ASCII character expected’, and I can’t
> understand why. The debug trace has too many levels for me to understand.
> Zinc is evidently getting in a mess trying to decode the urlEncoded string,
> but if we try:
>
> 'ěščýž.html' urlEncoded urlDecoded
>
> as a separate operation, it works OK.
>
> I think only Sven can explain this for you.

The external representation of a URL with special characters is not the same as what an address bar or browser search field accepts. The latter is quite intelligent and accepts much broader input. ZnUrl parses the official external representation according to the spec, in which non-ASCII characters must be percent-encoded. Internally, ZnUrl represents all components as resolved (decoded) strings.

The solution is to construct difficult/special URLs by hand.

Here is an example: let's say we want to access the English Wikipedia page of the Czech Republic (the country) using its native name 'Česká republika' (which is not only non-ASCII, but non-Latin1 as well, so it needs a WideString and UTF-8 encoding).

Here is one way to construct such a URL.

  ZnUrl new
    scheme: #http;
    host: 'en.wikipedia.org';
    addPathSegment: 'wiki';
    addPathSegment: 'Česká republika';
    yourself.

Which gives a URL with the following external representation:

  http://en.wikipedia.org/wiki/%C4%8Cesk%C3%A1%20republika

This can be parsed without problems.

  'http://en.wikipedia.org/wiki/%C4%8Cesk%C3%A1%20republika' asUrl.

You can send #retrieveContents to a URL to actually fetch it.

  ZnUrl new
    scheme: #http;
    host: 'en.wikipedia.org';
    addPathSegment: 'wiki';
    addPathSegment: 'Česká republika';
    retrieveContents.

Or you could use the URL in a ZnClient object.

BTW, there are many ways to construct URLs. I would maybe do the following.

  'https://en.wikipedia.org/wiki' asUrl
    addPathSegment: 'Česká republika';
    yourself.

Or something like

  ZnClient new
    url: 'https://en.wikipedia.org/wiki';
    addPathSegment: 'Česká republika';
    get.

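
Applied to the URL from the original question, the same approach might look like the sketch below. It assumes a Pharo image with Zinc HTTP Components loaded; domain.com and the file name are the placeholders from the question, and the variable name url is only for illustration.

```smalltalk
"Sketch: build the URL from its parts instead of parsing the raw string.
 Assumes Zinc HTTP Components; domain.com is the placeholder host from the question."
| url |
url := 'http://domain.com' asUrl
	addPathSegment: 'ěščýž.html';
	yourself.

"Printing gives the percent-encoded external representation,
 which should parse back into an equal ZnUrl."
url printString asUrl = url.
```

The resulting ZnUrl can then be handed to ZnEasy get: or to a ZnClient in place of the raw string.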
HTH,

Sven

> HTH
>
> Peter Kenny
>
>
> From: Pharo-users <pharo-users-boun...@lists.pharo.org> On Behalf Of Petr
> Fischer via Pharo-users
> Sent: 10 September 2018 10:07
> To: pharo-users@lists.pharo.org
> Cc: Petr Fischer <petr.fisc...@me.com>
> Subject: [Pharo-users] ZnURL and parsing URL with diacritics
>
> Hello,
>
> when I try to parse this URL asUrl, error "ZnCharacterEncodingError: ASCII
> character expected" occurs:
>
> 'http://domain.com/ěščýž.html' asUrl.
>
> this also does not work:
>
> ZnEasy get: 'http://domain.com/ěščýž.html'
>
> How to solve this? In the web browser, URL with diacritics is OK.
>
> I tried also this:
>
> ZnEasy get: 'http://domain.com/ěščýž.html' urlEncoded.
>
> but this cripples the whole URL.
>
> Thanks! Petr Fischer