It would probably help if you gave a real example: a REST call that returns 
something (presumably JSON or XML) that contains a problematic URL.

FWIW, the following also work:

('https://en.wikipedia.org/wiki/' , 'Česká republika' urlEncoded) asUrl.
('https://en.wikipedia.org/wiki/' , 'Česká republika' urlEncoded) asUrl retrieveContents.
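
If the URL arrives as one string and the diacritics are only in the last path segment, the string "hack" can stay small. The following is a minimal sketch under that assumption (no query or fragment; the variable names and the example URL are made up for illustration): split off the last segment and let ZnUrl encode it through #addPathSegment:.

```smalltalk
| raw base lastSegment url |
"Hypothetical input: the raw, unencoded URL string as returned by the API."
raw := 'http://domain.com/ěščýž.html'.

"Everything before the last slash is plain ASCII and parses fine."
base := raw copyUpToLast: $/.

"The last segment holds the diacritics and must not go through the parser."
lastSegment := raw copyAfterLast: $/.

"ZnUrl stores the segment as a resolved string and percent-encodes it on output."
url := base asUrl addPathSegment: lastSegment; yourself.

"The resulting URL can then be fetched as usual."
url retrieveContents.
```

If several segments or a query string can contain non-ASCII characters, each part would need the same treatment, which is why a real example from the API would still be useful.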

> On 10 Sep 2018, at 14:16, Petr Fischer via Pharo-users 
> <pharo-users@lists.pharo.org> wrote:
> 
> 
> From: Petr Fischer <petr.fisc...@me.com>
> Subject: Re: [Pharo-users] ZnURL and parsing URL with diacritics
> Date: 10 September 2018 at 14:16:53 GMT+2
> To: Any question about pharo is welcome <pharo-users@lists.pharo.org>
> 
> 
> OK. Thanks for examples. But in my case, the bad URL (with diacritics) comes 
> directly from the Zomato.com REST API (they probably do not read specs), so 
> I'll end up with a few "hacks" with strings.
> 
> pf
> 
> 
>> Hi,
>> 
>>> On 10 Sep 2018, at 12:53, PBKResearch <pe...@pbkresearch.co.uk> wrote:
>>> 
>>> Hi Petr
>>> 
>>> I have used #urlEncoded in the past, with success, to deal with German 
>>> umlauts. The secret is to urlEncode just the part containing the 
>>> diacritics. If you encode the whole url, the slashes are encoded, and this 
>>> confuses Zinc, which segments the url before decoding.
>>> 
>>> So I would expect you to be able to read your file with:
>>> 
>>> ZnEasy get: 'http://domain.com/' , 'ěščýž.html' urlEncoded.
>>> 
>>> However, this also fails with ‘ASCII character expected’, and I can’t 
>>> understand why. The debug trace has too many levels for me to understand. 
>>> Zinc is evidently getting in a mess trying to decode the urlEncoded string, 
>>> but if we try:
>>> 
>>> 'ěščýž.html' urlEncoded urlDecoded
>>> 
>>> as a separate operation, it works OK.
>>> 
>>> I think only Sven can explain this for you.
>> 
>> The external representation of a URL with special characters is not the same 
>> as what an address bar or browser search field accepts. The latter is quite 
>> intelligent and accepts much broader input.
>> 
>> ZnUrl parses the official external representation according to the spec.
>> 
>> Internally, ZnUrl represents all components as resolved strings. The 
>> solution is to construct difficult/special URLs by hand.
>> 
>> Here is an example: let's say we want to access the English Wikipedia page 
>> of the Czech Republic (the country) using its native name 'Česká republika' 
>> (which is not only non-ASCII, but non-Latin1 as well, so it needs a 
>> WideString and UTF-8 encoding).
>> 
>> Here is one way to construct such a URL.
>> 
>> ZnUrl new 
>>  scheme: #http; 
>>  host: 'en.wikipedia.org'; 
>>  addPathSegment: 'wiki'; 
>>  addPathSegment: 'Česká republika';
>>  yourself.
>> 
>> Which gives a URL with the following external representation:
>> 
>>  http://en.wikipedia.org/wiki/%C4%8Cesk%C3%A1%20republika
>> 
>> This can be parsed without problems.
>> 
>>  'http://en.wikipedia.org/wiki/%C4%8Cesk%C3%A1%20republika' asUrl.
>> 
>> You can send #retrieveContents to a URL to actually fetch it.
>> 
>> ZnUrl new 
>>  scheme: #http; 
>>  host: 'en.wikipedia.org'; 
>>  addPathSegment: 'wiki'; 
>>  addPathSegment: 'Česká republika'; 
>>  retrieveContents.
>> 
>> Or you could use the url in a ZnClient object.
>> 
>> BTW, there are many ways to construct URLs, I would maybe do the following.
>> 
>>  'https://en.wikipedia.org/wiki' asUrl addPathSegment: 'Česká republika'; yourself.
>> 
>> Or something like
>> 
>> ZnClient new
>>  url: 'https://en.wikipedia.org/wiki';
>>  addPathSegment: 'Česká republika';
>>  get.
>> 
>> HTH,
>> 
>> Sven
>> 
>>> HTH
>>> 
>>> Peter Kenny
>>> 
>>> 
>>> From: Pharo-users <pharo-users-boun...@lists.pharo.org> On Behalf Of Petr 
>>> Fischer via Pharo-users
>>> Sent: 10 September 2018 10:07
>>> To: pharo-users@lists.pharo.org
>>> Cc: Petr Fischer <petr.fisc...@me.com>
>>> Subject: [Pharo-users] ZnURL and parsing URL with diacritics
>>> 
>>> Hello, 
>>> 
>>> when I try to parse this URL asUrl, error "ZnCharacterEncodingError: ASCII 
>>> character expected" occurs:
>>> 
>>> 'http://domain.com/ěščýž.html' asUrl.
>>> 
>>> this also does not work:
>>> 
>>> ZnEasy get: 'http://domain.com/ěščýž.html'
>>> 
>>> How to solve this? In the web browser, URL with diacritics is OK. 
>>> 
>>> I tried also this:
>>> 
>>> ZnEasy get: 'http://domain.com/ěščýž.html' urlEncoded.
>>> 
>>> but this cripples the whole URL.
>>> 
>>> Thanks! Petr Fischer
>> 
>> 
> 
> 
> 

