Re: [Pharo-users] ZnURL and parsing URL with diacritics

Petr Fischer via Pharo-users Mon, 10 Sep 2018 05:18:14 -0700

OK, thanks for the examples. But in my case, the bad URL (with diacritics) comes
directly from the Zomato.com REST API (they probably do not read the specs), so
I'll end up with a few string "hacks".
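A minimal sketch of such a string hack, as a Pharo workspace snippet. It assumes the raw URL has an ASCII `scheme://host` prefix, no query string or fragment, and no segments that are already percent-encoded (those would be encoded twice). It follows the hand-construction approach Sven describes below: parse the ASCII prefix with #asUrl, then add the raw path segments with #addPathSegment: and let ZnUrl do the encoding. The variable names are illustrative only.

```smalltalk
"Sketch of a string hack for a raw URL whose path contains
 unencoded non-ASCII characters. Assumes an ASCII scheme://host
 prefix, no query or fragment, and no already percent-encoded parts."
| raw hostEnd url |
raw := 'http://domain.com/ěščýž.html'.
"Index of the first slash after 'scheme://' (0 if there is no path)"
hostEnd := raw indexOf: $/ startingAt: (raw indexOf: $:) + 3.
hostEnd = 0
	ifTrue: [ url := raw asUrl ]
	ifFalse: [
		"The ASCII prefix parses fine on its own"
		url := (raw copyFrom: 1 to: hostEnd - 1) asUrl.
		"Add each raw segment as-is; ZnUrl keeps it decoded internally
		 and percent-encodes it (as UTF-8) in its external form.
		 Note: #substrings: drops empty segments and a trailing slash."
		((raw copyFrom: hostEnd + 1 to: raw size) substrings: '/')
			do: [ :segment | url addPathSegment: segment ] ].
url
```

The resulting ZnUrl should then be usable wherever a URL is expected, e.g. with #retrieveContents or a ZnClient, as in Sven's examples.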


pf


> Hi,
> 
> > On 10 Sep 2018, at 12:53, PBKResearch <pe...@pbkresearch.co.uk> wrote:
> > 
> > Hi Petr
> >  
> > I have used #urlEncoded in the past, with success, to deal with German 
> > umlauts. The secret is to urlEncode just the part containing the 
> > diacritics. If you encode the whole url, the slashes are encoded, and this 
> > confuses Zinc, which segments the url before decoding.
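The encode-only-the-segment idea can be sketched as follows (a Pharo workspace snippet; variable names are illustrative). Because unary messages bind tighter than binary ones, `'ěščýž.html' urlEncoded` is evaluated before the `,` concatenation, so only the file name gets encoded. Encoding the whole string, by contrast, also escapes the `:` and `/` separators. This sketch only checks the parsing step; it does not fetch anything.

```smalltalk
"Encode only the segment containing diacritics, then concatenate
 it with the ASCII prefix and parse the result."
| segmentEncoded url wholeEncoded |
segmentEncoded := 'ěščýž.html' urlEncoded.
"Each non-ASCII character is now a run of %XX escapes (UTF-8 bytes)"
url := ('http://domain.com/' , segmentEncoded) asUrl.
"ZnUrl decodes the segment again when parsing"
url pathSegments.
"Encoding the whole URL also escapes ':' and '/', so Zinc can no
 longer find the scheme, host, and path boundaries"
wholeEncoded := 'http://domain.com/ěščýž.html' urlEncoded.
```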
> >  
> > So I would expect you to be able to read your file with:
> >  
> > ZnEasy get: 'http://domain.com/', 'ěščýž.html' urlEncoded.
> >  
> > However, this also fails with ‘ASCII character expected’, and I can’t
> > understand why. The debug trace has too many levels for me to follow.
> > Zinc is evidently getting in a mess trying to decode the urlEncoded string,
> > but if we try:
> >  
> > 'ěščýž.html' urlEncoded urlDecoded
> >  
> > as a separate operation, it works OK.
> >  
> > I think only Sven can explain this for you.
> 
> The external representation of a URL with special characters is not the same 
> as what an address bar or browser search field accepts. The latter is quite 
> intelligent and accepts much broader input.
> 
> ZnUrl parses the official external representation according to the spec.
> 
> Internally, ZnUrl represents all components as resolved (percent-decoded)
> strings. The solution is to construct difficult/special URLs by hand.
> 
> Here is an example: let's say we want to access the English Wikipedia page of 
> the Czech Republic (the country) using its native name 'Česká republika' 
> (which is not only non-ASCII, but non-Latin1 as well, so it needs a 
> WideString and UTF-8 encoding).
> 
> Here is one way to construct such a URL.
> 
> ZnUrl new 
>   scheme: #http; 
>   host: 'en.wikipedia.org'; 
>   addPathSegment: 'wiki'; 
>   addPathSegment: 'Česká republika';
>   yourself.
> 
> Which gives a URL with the following external representation:
> 
>   http://en.wikipedia.org/wiki/%C4%8Cesk%C3%A1%20republika
> 
> This can be parsed without problems.
> 
>   'http://en.wikipedia.org/wiki/%C4%8Cesk%C3%A1%20republika' asUrl.
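To show what "resolved strings" means in practice, here is a small sketch: parse the encoded form, look at the decoded components, and compare it with the same URL built by hand as in the example above. Variable names are illustrative; the expected external representation is the one given in this message.

```smalltalk
"Parse the external (percent-encoded) representation;
 ZnUrl stores the components decoded."
| parsed handMade |
parsed := 'http://en.wikipedia.org/wiki/%C4%8Cesk%C3%A1%20republika' asUrl.
parsed host.          "'en.wikipedia.org'"
parsed pathSegments.  "the decoded segments 'wiki' and 'Česká republika'"
"The same URL built by hand, component by component"
handMade := ZnUrl new
	scheme: #http;
	host: 'en.wikipedia.org';
	addPathSegment: 'wiki';
	addPathSegment: 'Česká republika';
	yourself.
"Both print as the same encoded external representation"
parsed printString = handMade printString
```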
> 
> You can send #retrieveContents to a URL to actually fetch it.
> 
> ZnUrl new 
>   scheme: #http; 
>   host: 'en.wikipedia.org'; 
>   addPathSegment: 'wiki'; 
>   addPathSegment: 'Česká republika'; 
>   retrieveContents.
> 
> Or you could use the url in a ZnClient object.
> 
> BTW, there are many ways to construct URLs; I would probably do the following.
> 
>   'https://en.wikipedia.org/wiki' asUrl
>     addPathSegment: 'Česká republika';
>     yourself.
> 
> Or something like
> 
> ZnClient new
>   url: 'https://en.wikipedia.org/wiki';
>   addPathSegment: 'Česká republika';
>   get.
> 
> HTH,
> 
> Sven
> 
> > HTH
> >  
> > Peter Kenny
> >  
> >  
> > From: Pharo-users <pharo-users-boun...@lists.pharo.org> On Behalf Of Petr 
> > Fischer via Pharo-users
> > Sent: 10 September 2018 10:07
> > To: pharo-users@lists.pharo.org
> > Cc: Petr Fischer <petr.fisc...@me.com>
> > Subject: [Pharo-users] ZnURL and parsing URL with diacritics
> >  
> > Hello, 
> >  
> > when I try to parse this URL with #asUrl, the error "ZnCharacterEncodingError:
> > ASCII character expected" occurs:
> >  
> > 'http://domain.com/ěščýž.html' asUrl.
> >  
> > this also does not work:
> >  
> > ZnEasy get: 'http://domain.com/ěščýž.html'
> >  
> > How can I solve this? In a web browser, the URL with diacritics works fine.
> >  
> > I also tried this:
> >  
> > ZnEasy get: 'http://domain.com/ěščýž.html' urlEncoded.
> >  
> > but this cripples the whole URL.
> >  
> > Thanks! Petr Fischer
> 
>
