[web2py:12667] Re: non-ascii chars URL

achipa Sun, 30 Nov 2008 03:03:48 -0800

The idea of unicode is not to care about encodings: a ж should be a ж
no matter what the underlying encoding is, utf8, cp1251 or anything
else capable of representing that character. This is where people
start mixing up the terms unicode and, say, utf-8. IS_URL should
certainly take unicode as its parameter. However, here lies the
dilemma: if the value passes validation (as an escaped unicode object),
IMHO you should insert *that* escaped, RFC-compliant string into
the database, and not the 'pretty' utf8 one. This is what I mean
when I say helpers should be smart enough to support human-friendly
use and display while remaining standards compliant in the process. I
hope I make sense, as it's a pretty convoluted topic; encodings are
known to cause serious headaches on exposure :)
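
To make the store-escaped / display-pretty split concrete, here is a
minimal sketch. It assumes Python 3 (where str is already unicode) and
uses only the standard library. unicode_to_latin() is the name proposed
in the quoted message below; latin_to_unicode() is a hypothetical
counterpart for display. Neither is an existing web2py function, and
the sketch ignores userinfo and IPv6 hosts.

```python
from urllib.parse import urlsplit, urlunsplit, quote, unquote


def unicode_to_latin(url):
    """Return an ASCII-only form of url, suitable for validation and storage.

    Hypothetical helper: the host is converted with the stdlib "idna"
    codec (xn-- labels), everything else is percent-encoded as UTF-8.
    A URL that is already ASCII comes back unchanged.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""
    if host:
        host = host.encode("idna").decode("ascii")
    netloc = host
    if parts.port is not None:
        netloc += ":%d" % parts.port
    # "%" is kept in `safe` so existing escapes are not escaped twice.
    path = quote(parts.path, safe="/%")
    query = quote(parts.query, safe="=&+%")
    fragment = quote(parts.fragment, safe="%")
    return urlunsplit((parts.scheme, netloc, path, query, fragment))


def latin_to_unicode(url):
    """Return a human-readable form of an escaped URL, for display only.

    Hypothetical helper: unquoting can turn an escaped "&" or "/" into a
    literal one, so the result should not be stored or re-parsed.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""
    if host:
        host = host.encode("ascii").decode("idna")
    netloc = host
    if parts.port is not None:
        netloc += ":%d" % parts.port
    return urlunsplit((parts.scheme, netloc, unquote(parts.path),
                       unquote(parts.query), unquote(parts.fragment)))


stored = unicode_to_latin("http://пример.example/путь?q=ж")  # goes to the db
shown = latin_to_unicode(stored)                             # goes to the page
```

The escaped value is what would be handed to the validator and written
to the database; the readable form is rebuilt only at display time.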


On Nov 30, 5:15 am, Jonathan Benn <[EMAIL PROTECTED]> wrote:
> Hi Achipa,
>
> On Nov 29, 9:20 pm, achipa <[EMAIL PROTECTED]> wrote:
>
> > Storing and validating URIs should be pretty straightforward with the
> > RFCs; the trouble comes when you have to decide how you want the URI
> > to *appear* to the user/developer. Obviously looking at a bunch of
> > %XY-s or xn--es is not a pretty sight, and in case the language uses
> > a non-latin alphabet, it becomes completely unreadable. The real
> > question is therefore that of presentation and ease of use, and the
> > RFC can't help much there as that's not its subject.
>
> Right, I agree with you completely.  That's why I was suggesting a
> helper function as one possible solution.  This function could be
> called, for example, unicode_to_latin(). You'd have to design your
> application taking into account the idea that end-users might type in
> non-latin characters in the URL. When your app receives a URL, it
> passes the unicode string to a helper function, which returns a latin-
> character equivalent string in non-unicode. You can then pass the
> result to IS_URL to evaluate its correctness. unicode_to_latin() would
> also accept a non-unicode string, in which case it returns the string
> unchanged.
>
> The actual use would look something like this:
>
> IS_URL(unicode_to_latin(unicode_string))
>
> Alternatively, IS_URL could make use of this helper function
> internally. Since unicode_to_latin() does not change a regular string,
> IS_URL will still deal with all latin strings correctly, and will now
> gain the ability to handle unicode strings as well. The actual use
> would now look like this:
>
> IS_URL(unicode_string)
>
> I hope that helps,
>
> --Jonathan