Re: Matching of german umlauts with LIKE

Markus Fischer Thu, 23 Mar 2006 21:15:11 -0800

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi,


I realized this also. Martin Schwarz off list suggested to me to use
REGEXP for that purpose. That means that I'm rewriting the user input
"übersee" behind the scenes to "^(ü|ue)bersee.*" which works quote well.

The only gotcha is that the user may enter any special regexp characters
also, so I had to make a list of them and escape them prior to modifying
the string.

thanks,
- - Markus

sheeri kritzer wrote:
> MySQL doesn't have anything like that.  You can use the wildcard
> characters instead of the umlauts if you want, such as
> 
> SELECT * from person where name like "%bersee"
> which would get
> "übersee" and "uebersee"
> but also a whole lot more.
> 
> But doing something like
> SELECT * from person where name like "_bersee" or name like "__bersee"
> might work -- the underscore means "1 of any character", so here the
> only noise you'd get are other folks whose names are
> _  _  bersee
> So there's still a margin for error.
> 
> Unfortunately, there's no special case for "hey, when you're looking
> at LIKE, I want to define that x=y" -- particularly when x and y have
> differing #'s of characters.
> 
> -Sheeri
> On 3/22/06, Markus Fischer <[EMAIL PROTECTED]> wrote:
> Hi,
> 
> what is the best way to match german umlauts like 'ä' also their
> alternative writing 'ae'?
> 
> For example I'm searching for "übersee" and I also want to find the word
> "uebersee" in the database. The "words" are actually names of persons.
> 
> One possibility  is to dynamically expand the SQL statement if such
> special characters are found. So the search term "übersee" will be
> expanded to "SELECT * FROM person WHERE name LIKE 'übersee%' AND name
> LIKE 'uebersee%'" but this is getting dirty and very very long if
> multiple umlauts are used to cover all cases ...
> 
> So the other idea is to have the name twice in the database for every
> person and the second "version" of the name is a normalized for where
> all special characters are replaced with their alternative writing. E.g.
> I store the field name "übersee" and also name2 "uebersee" and when
> matching I match against name2. If the field would container more
> special characters it still would work without much more work, e.g. name
> is "überseemöbel" then name2 would be "ueberseemoebel" and when the term
> "überseemö" is entered it's also normalized to "ueberseemoe" and the
> LIKE statement will still match. Basically this is some kind of
> primitive stemming like lucene does it.
> 
> Is there maybe some built-in support from MySQL for such special cases?
> 
> thanks for any pointers,
> - Markus
>>
- --
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]
>>
>>

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (MingW32)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFEI4BC1nS0RcInK9ARAnMrAJ9jZ5LAxS1S2bjqrPvIUBSiTGsxxQCgv+5l
xyxQhd7B9HTnc8sTa7Tsekk=
=uycm
-----END PGP SIGNATURE-----

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]

Re: Matching of german umlauts with LIKE

Reply via email to