The idea of unicode is not to care about encodings: a ж should be a ж no matter what the underlying encoding is, be it utf8, cp1251 or anything else capable of actually representing that character. This is where people start mixing up the terms unicode and, say, utf-8. IS_URL should certainly take unicode as its parameter. However, here lies the dilemma: if the value passes validation (as an escaped unicode object), IMHO you should insert *that* escaped, RFC-compliant string into the database, and not the 'pretty' utf8 one. This is what I mean when I say the helpers should be smart enough to support human-friendly input and display, while remaining standards compliant in the process. I hope I make sense, as it's a pretty convoluted topic; encodings are known to cause serious headaches on exposure :)
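
To make the storage-versus-display split a bit more concrete, here is a minimal Python sketch, not a proposal for the actual implementation. The names unicode_to_latin (borrowed from Jonathan's suggestion quoted below) and latin_to_unicode are hypothetical helpers and not part of web2py. The sketch assumes Python 3, uses the built-in idna codec (which follows IDNA 2003) together with urllib.parse, and ignores userinfo and IPv6 hosts for brevity.

```python
from urllib.parse import quote, unquote, urlsplit, urlunsplit


def unicode_to_latin(url):
    """Hypothetical helper: return an ASCII-only form of a Unicode URL."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    # Host labels become Punycode ("xn--...") via the built-in idna codec.
    netloc = host.encode("idna").decode("ascii") if host else ""
    if parts.port is not None:
        netloc += ":%d" % parts.port
    # Everything else is percent-encoded as UTF-8; "%" is kept safe so
    # already-escaped input passes through unchanged.
    path = quote(parts.path, safe="/%")
    query = quote(parts.query, safe="=&%")
    fragment = quote(parts.fragment, safe="%")
    return urlunsplit((parts.scheme, netloc, path, query, fragment))


def latin_to_unicode(url):
    """Hypothetical inverse, for display only: decode host and path."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    netloc = host.encode("ascii").decode("idna") if host else ""
    if parts.port is not None:
        netloc += ":%d" % parts.port
    return urlunsplit((parts.scheme, netloc, unquote(parts.path),
                       parts.query, parts.fragment))


stored = unicode_to_latin("http://пример.рф/ж")  # validate and store this
shown = latin_to_unicode(stored)                  # show this to the user
```

The point of the split is that the database only ever sees the escaped form, and the readable form is produced at display time. A plain ASCII URL passes through unicode_to_latin unchanged.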

On Nov 30, 5:15 am, Jonathan Benn <[EMAIL PROTECTED]> wrote:
> Hi Achipa,
>
> On Nov 29, 9:20 pm, achipa <[EMAIL PROTECTED]> wrote:
>
> > Storing and validating URIs should be pretty straightforward with the
> > RFCs; the trouble comes when you have to decide how you want the URI
> > to *appear* to the user/developer. Obviously looking at a bunch of
> > %XY-s or xn--es is not a pretty sight, and in case the language uses a
> > non-latin alphabet, it becomes completely unreadable. The real
> > question is therefore that of presentation and ease of use, and the
> > RFC can't help much there as that's not its subject.
>
> Right, I agree with you completely. That's why I was suggesting a
> helper function as one possible solution. This function could be
> called, for example, unicode_to_latin(). You'd have to design your
> application taking into account the idea that end-users might type in
> non-latin characters in the URL. When your app receives a URL, it
> passes the unicode string to a helper function, which returns a
> latin-character equivalent string in non-unicode. You can then pass the
> result to IS_URL to evaluate its correctness. unicode_to_latin() would
> also accept a non-unicode string, in which case it returns the string
> unchanged.
>
> The actual use would look something like this:
>
> IS_URL(unicode_to_latin(unicode_string))
>
> Alternatively, IS_URL could make use of this helper function
> internally. Since unicode_to_latin() does not change a regular string,
> IS_URL will still deal with all latin strings correctly, and will now
> gain the ability to handle unicode strings as well. The actual use
> would now look like this:
>
> IS_URL(unicode_string)
>
> I hope that helps,
>
> --Jonathan

You received this message because you are subscribed to the Google Groups "web2py Web Framework" group.