Rudolph wrote:

Batiste wrote:
But it is possible to improve the quality of the generated slug
for some European characters such as (é, è, à, â, È, É, À, Â, ö, ä ...)

é -> e
è -> e
à -> a
À -> a

I did this once in PHP, it worked really well (yes, I know it's PHP,
that's why I switched to Django):
$slug = strtolower(htmlentities($title, ENT_NOQUOTES, 'UTF-8'));
$slug_no_accents =
    preg_replace("/&(.)(acute|cedil|circ|grave|ring|tilde|uml);/", "$1", $slug);

One should be able to port this to Django in no time.


isn't this something that Unicode itself should be able to do?

try this:

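import unicodedata

def strip(text):
    # NFD splits each accented character into a base letter plus combining marks
    decomposed_form = unicodedata.normalize('NFD', text)
    # keep only letters (category 'L*'); note this also drops digits and spaces
    simplechars = [c for c in decomposed_form if unicodedata.category(c)[0] == 'L']
    return ''.join(simplechars)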

first it asks python's unicodedata module to decompose the string's accented characters into separate base-letter and accent-mark characters.

then it goes through the decomposed string and keeps only the characters whose unicode category is a letter.
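
since keeping only letters also throws away digits and spaces, a variant that drops just the accent marks may suit slugs better. this is only a rough sketch in Python 3 syntax: the name strip_accents is made up for illustration and is not part of Django, and it assumes that removing every character in a combining-mark category ('M*') is acceptable for your input.

```python
import unicodedata

def strip_accents(text):
    """Remove accent marks but keep letters, digits, spaces and punctuation."""
    # NFD splits 'é' into 'e' followed by a combining acute accent.
    decomposed = unicodedata.normalize('NFD', text)
    # Combining marks have a Unicode category starting with 'M' (Mn, Mc, Me).
    return ''.join(c for c in decomposed
                   if not unicodedata.category(c).startswith('M'))

print(strip_accents('é è à À ö ä'))      # e e a A o a
print(strip_accents('Crème brûlée 2'))   # Creme brulee 2
```

letters that have no decomposition (ø or ß, for example) pass through unchanged, so they would still need a separate mapping, and lowercasing is left to the caller.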

please note that it's 1:38AM here, so my code can be very wrong :) (but it works :)...and it's clearly not optimized for speed :)

gabor
