That's for general URIs (Uniform Resource Identifiers).
I believe RFC 1738 covers URLs (Uniform Resource
Locators) specifically.  It narrows the scope of
escaping considerably, since only a few characters,
combinations, and formats are valid for WWW addresses.

  -dZ.


>------- Original Message -------
>From    : Arno Garrels[mailto:[EMAIL PROTECTED]
>Sent    : 4/20/2007 10:41:53 AM
>To      : twsocket@elists.org
>Cc      : 
>Subject : RE: Re: [twsocket] httpcli v6 "bad request"
>
>[EMAIL PROTECTED] wrote:
>> ------- Original Message -------
>> From    : Francois PIETTE[mailto:[EMAIL PROTECTED]
>> Sent    : 4/19/2007 2:46:48 PM
>> To      : twsocket@elists.org
>> Cc      :
>> Subject : RE: Re: [twsocket] httpcli v6 "bad request"
>> 
>>> Agreed, so we need a FAST routine. URLEncode currently
>>> isn't smart enough to encode a complete URL, and it is
>>> slow (result := result + ..)
>> 
>> Next question is how smart should such a routine act?
>> Should it check for a valid URL in general or shall it
>> just check for valid encoding?
>> Should it auto-complete incomplete as well as auto-correct
>> invalid URLs like IE? When you start thinking about this stuff
>> the routine in mind becomes slower and slower :(
> 
>> Making URLEncode faster is probably enough for
>> the component. Checking valid
>> URL and autocomplete is another thing.
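
On the "slow (result := result + ..)" point: the usual fix is to write into one growable buffer instead of building a new string for every character. Below is a minimal sketch in Python rather than Delphi, so it is not the ICS URLEncode. The name url_encode, the UTF-8 step, and the choice of the RFC 2396 "unreserved" set as the pass-through characters are all assumptions made for illustration.

```python
# Bytes passed through unchanged (assumption: RFC 2396 "unreserved" set,
# i.e. alphanumerics plus the marks - _ . ! ~ * ' ( ) ).
UNRESERVED = frozenset(
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    b"abcdefghijklmnopqrstuvwxyz"
    b"0123456789-_.!~*'()"
)
HEX = b"0123456789ABCDEF"


def url_encode(text: str) -> str:
    """Percent-encode every byte outside UNRESERVED in a single pass."""
    raw = text.encode("utf-8")  # assumption: UTF-8, not an ANSI code page
    out = bytearray()           # one buffer that grows in place
    for byte in raw:
        if byte in UNRESERVED:
            out.append(byte)
        else:
            out.append(0x25)              # '%'
            out.append(HEX[byte >> 4])    # high nibble
            out.append(HEX[byte & 0x0F])  # low nibble
    return out.decode("ascii")
```

This only encodes a single component value; it would also escape "/" and "?" and so cannot be applied to a complete URL.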
> 
> I don't know if this is what Arno had in mind with
> the validation, but as I mentioned before, there is
> one more catch:  what if the application encoded the
> URL to begin with?  Then all percent symbols will be
> re-encoded and the URL mangled.  For this reason you
> either need to unencode-reencode (slow!), or check
> for encoding and only encode if necessary.
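
The "check for encoding and only encode if necessary" option could look roughly like the sketch below. It is Python, not ICS code, and the names needs_encoding and encode_if_needed, the UTF-8 step, and the set of characters treated as safe are assumptions. It is also only a heuristic: a "%" followed by two hex digits is taken to be an existing escape, so a literal "%20" that the application meant as text cannot be told apart from an encoded space.

```python
import re

# A '%' that is not followed by two hex digits cannot be a valid escape.
_BARE_PERCENT = re.compile(r"%(?![0-9A-Fa-f]{2})")
# Anything outside this set is escaped (assumption: RFC 2396 unreserved
# and reserved characters, plus '#' and '%', are left alone).
_UNSAFE = re.compile(r"[^A-Za-z0-9\-_.!~*'();/?:@&=+$,#%]")


def needs_encoding(url: str) -> bool:
    """True if the URL has a bare '%' or a character that must be escaped."""
    return bool(_BARE_PERCENT.search(url) or _UNSAFE.search(url))


def _escape(match: "re.Match[str]") -> str:
    # Escape each UTF-8 byte of the matched character as %XX.
    return "".join("%%%02X" % b for b in match.group().encode("utf-8"))


def encode_if_needed(url: str) -> str:
    """Encode only what is not already encoded; leave valid %XX untouched."""
    if not needs_encoding(url):
        return url  # looks encoded already, do not touch it
    url = _BARE_PERCENT.sub("%25", url)  # bare '%' first
    return _UNSAFE.sub(_escape, url)     # then everything else
```

With these assumptions an already encoded URL passes through unchanged, which avoids both the unencode-reencode round trip and the double encoding of percent signs.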

RFC2396 sounds rather complicated: 

" 2.4.2. When to Escape and Unescape

   A URI is always in an "escaped" form, since escaping or unescaping a
   completed URI might change its semantics.  Normally, the only time
   escape encodings can safely be made is when the URI is being created
   from its component parts; each component may have its own set of
   characters that are reserved, so only the mechanism responsible for
   generating or interpreting that component can determine whether or
   not escaping a character will change its semantics. Likewise, a URI
   must be separated into its components before the escaped characters
   within those components can be safely decoded. "

(  http://www.faqs.org/rfcs/rfc2396.html  )

We probably have to bind encoding to function
ParseUrl() somehow.
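
To make the idea concrete, here is a rough sketch of "split first, then encode each component with its own reserved set". It uses Python's urllib.parse as a stand-in for ParseUrl(), so it says nothing about how ParseUrl() itself works. The function name encode_url, the example.com host, and the per-component safe sets are assumptions, and the input is assumed to be raw (not yet encoded).

```python
from urllib.parse import quote, urlsplit, urlunsplit


def encode_url(url: str) -> str:
    """Split a raw URL into components and escape each one separately."""
    parts = urlsplit(url)
    # Each component keeps its own delimiters (assumed safe sets):
    path = quote(parts.path, safe="/")      # '/' separates path segments
    query = quote(parts.query, safe="=&")   # '=' and '&' separate pairs
    fragment = quote(parts.fragment, safe="")
    # Scheme and host are passed through unchanged in this sketch.
    return urlunsplit((parts.scheme, parts.netloc, path, query, fragment))


print(encode_url("http://example.com/a b/c?q=x y&r=1"))
```

Because the input is assumed raw, a "%" is always escaped here; handling input that may already be encoded would need an extra check before this step.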

--
Arno Garrels [TeamICS]
 http://www.overbyte.be/eng/overbyte/teamics.html  


 
>     -dZ.
-- 
To unsubscribe or change your settings for TWSocket mailing list
please goto http://www.elists.org/mailman/listinfo/twsocket
Visit our website at http://www.overbyte.be

