That's for general URIs (Uniform Resource Identifiers). I believe RFC 1738 covers URLs (Uniform Resource Locators) specifically. It narrows the scope of escaping considerably, since only a limited set of characters, combinations, and formats is valid in WWW addresses.
-dZ.

>------- Original Message -------
>From : Arno Garrels[mailto:[EMAIL PROTECTED]
>Sent : 4/20/2007 10:41:53 AM
>To : twsocket@elists.org
>Cc :
>Subject : RE: Re: [twsocket] httpcli v6 "bad request"
>
>[EMAIL PROTECTED] wrote:
>> ------- Original Message -------
>>> From : Francois PIETTE[ mailto:[EMAIL PROTECTED]
>> Sent : 4/19/2007 2:46:48 PM
>> To : twsocket@elists.org
>> Cc :
>> Subject : RE: Re: [twsocket] httpcli v6 "bad request"
>>
>
>>> Agreed, so we need a FAST routine. URLEncode currently
>>> isn't smart enough to encode a complete URL, and it is
>>> slow (result := result + ..)
>>
>> Next question is how smart should such a routine act?
>> Should it check for a valid URL in general or shall it
>> just check for valid encoding?
>> Should it auto-complete incomplete as well as auto-correct
>> invalid URLs like IE? When you start thinking about this stuff
>> the routine in mind becomes slower and slower :(
>
>> Making URLEncode faster is probably enough for
>> the component. Checking valid
>> URL and autocomplete is another thing.
>
> I don't know if this is what Arno had in mind with
> the validation, but as I mentioned before, there is
> one more catch: what if the application encoded the
> URL to begin with?
> Then all percent symbols will be
> re-encoded and the URL mangled. For this reason you
> either need to unencode-reencode (slow!), or check
> for encoding and only encode if necessary.

RFC2396 sounds rather complicated:

"
2.4.2. When to Escape and Unescape

A URI is always in an "escaped" form, since escaping or unescaping a
completed URI might change its semantics. Normally, the only time
escape encodings can safely be made is when the URI is being created
from its component parts; each component may have its own set of
characters that are reserved, so only the mechanism responsible for
generating or interpreting that component can determine whether or
not escaping a character will change its semantics.
Likewise, a URI must be separated into its components before the
escaped characters within those components can be safely decoded.
"
( http://www.faqs.org/rfcs/rfc2396.html )

We probably have to bind encoding to function ParseUrl() somehow.

--
Arno Garrels [TeamICS]
http://www.overbyte.be/eng/overbyte/teamics.html

> -dZ.

--
To unsubscribe or change your settings for TWSocket mailing list
please goto http://www.elists.org/mailman/listinfo/twsocket
Visit our website at http://www.overbyte.be
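
The points raised in this thread (encode per component as RFC 2396 suggests, avoid re-encoding percent signs that already belong to an escape, and avoid building the result by repeated string concatenation) can be sketched roughly as follows. This is only an illustration in Python, not the ICS Delphi code: the names encode_component and encode_url, the per-component "safe" sets, and the example.com URL are all assumptions made for the sketch. Treating any "%XX" sequence as an existing escape is a heuristic; it cannot tell whether the caller meant a literal percent sign followed by two hex digits.

```python
import re
from urllib.parse import quote, urlsplit, urlunsplit

# Matches a well-formed escape such as %20 or %C3 (heuristic, see note above).
_ESCAPE = re.compile(r"%[0-9A-Fa-f]{2}")


def encode_component(text, safe):
    """Percent-encode text, leaving existing %XX escapes untouched.

    Pieces are collected in a list and joined once at the end,
    instead of growing the result string piece by piece.
    """
    pieces = []
    pos = 0
    for match in _ESCAPE.finditer(text):
        # Encode the raw text before the escape, then copy the escape as-is.
        pieces.append(quote(text[pos:match.start()], safe=safe))
        pieces.append(match.group(0))
        pos = match.end()
    pieces.append(quote(text[pos:], safe=safe))
    return "".join(pieces)


def encode_url(url):
    """Split the URL first, then encode each component with its own safe set."""
    parts = urlsplit(url)
    path = encode_component(parts.path, safe="/")       # keep path separators
    query = encode_component(parts.query, safe="=&")    # keep key=value&... syntax
    fragment = encode_component(parts.fragment, safe="")
    return urlunsplit((parts.scheme, parts.netloc, path, query, fragment))


if __name__ == "__main__":
    print(encode_url("http://example.com/a b/c%20d?q=x y&r=100%"))
```

Because existing escapes are copied through unchanged, running encode_url on its own output should return the same string, which is the property the "what if the application encoded the URL to begin with?" concern asks for. A stray "%" that is not followed by two hex digits is encoded as "%25".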