Howdy, I don't know of any modules, but I have a few suggestions on how to implement it. For speed, I'd recommend caching a simplified 'eyedex' version of each person's username, either as a new column in their main record, or in a secondary list that is cross-indexed by the user_id.
These are just recommendations, but here is what I'd try. For each group of similar-looking characters, make an array, for example "1, I, l, !, |" for all the 'bar-like' characters, and use these arrays to convert normal strings into cached 'eyedex' strings. When someone types a username of 'EIIa' into the new-username field to join a msgboard, you can use a regex to build the cached version: search for any character that matches elements [1] through [4] (not [0]) and change it to the character at [0]. This turns 'Ella', 'E||a', 'E!!a', or 'EIIa' into just 'E11a' ('E11a' itself is already in that form and is left unchanged). Users will never see this string, as it's not the real username; it's stored for utility purposes only.

Then, if a previous user had chosen 'Ella' and someone types 'EIIa' or any similar version, you can match the new username's eyedex of 'E11a' against the stored eyedex usernames and find the match as quickly as you would match any normal ASCII strings, because at that point that is all they are.

I'd put the lists of similar characters into text files and load them at runtime, so you can swap out and tweak different lists as needed if the similarities turn out too confining or too open. As long as the primary 'convert-to' values (element [0] in this case) remain the same, the cached eyedex values for the existing users will not break. You may like to code a 'by font-name-group' list, so that if you publish this module, other people with obscure fonts can build their own lists for some pages and use the standard lists for others.

When running the code, the only overhead is in generating the eyedex string when a new user enters a new username, and in the batch update you run over all the existing usernames to cache their eyedex values. After that, as long as the eyedex values are stored in a manner as easily accessible as the usernames, checking against them when you validate a name on a 'create new user' page is as fast as checking against the usernames themselves.
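To make the above a bit more concrete, here is a rough sketch of the idea. It is written in Python purely for illustration, and it uses a character translation table where I suggested a regex above (the effect should be the same; in Perl the tr/// operator or a regex would do the job). All the names here (SIMILAR_GROUPS, build_table, eyedex, is_taken, find_collisions) are made up for the example, and the group list is hard-coded only to keep it short; in practice it would be loaded from a text file as described.

```python
# Hypothetical group list. The first character of each string is the
# 'convert-to' value (element [0]); the rest are its look-alikes.
SIMILAR_GROUPS = [
    "1Il!|",  # the 'bar-like' characters from the example above
]


def build_table(groups):
    """Map every look-alike character to element [0] of its group."""
    table = {}
    for group in groups:
        canonical = group[0]
        for ch in group[1:]:
            table[ord(ch)] = canonical
    return table


EYEDEX_TABLE = build_table(SIMILAR_GROUPS)


def eyedex(username):
    """Return the cached, never-displayed form of a username."""
    return username.translate(EYEDEX_TABLE)


def is_taken(new_username, eyedex_index):
    """Check a new username against stored eyedex values.

    eyedex_index is assumed to be a dict (or set) keyed by eyedex string,
    so the lookup costs the same as an ordinary exact-match lookup.
    """
    return eyedex(new_username) in eyedex_index


def find_collisions(usernames):
    """Batch pass: group existing usernames that share an eyedex value."""
    by_key = {}
    for name in usernames:
        by_key.setdefault(eyedex(name), []).append(name)
    return {key: names for key, names in by_key.items() if len(names) > 1}


index = {eyedex("Ella"): "Ella"}   # an existing user
print(eyedex("EIIa"))              # E11a
print(is_taken("E!!a", index))     # True
```

Because each group collapses to one character, adding or removing look-alikes in a group does not disturb values already cached, as long as element [0] stays put.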
The only place I can see this getting tricky is if you need a 7 to look like a 9 and a 7 to look like a / but a 9 NOT to look like a /, since converting a whole group to one character makes everything in that group match everything else in it.

Keep in mind, when you run a batch for the first time you'll need to handle pre-existing similar names from before the new validation rule went into practice. If you check each username's eyedex against the existing list as you first build it, you can build a report of the duplicates and deal with them afterwards. Sometimes someone will innocently lose their password and create a new username that is visually similar without being guilty of a typo attack.

Anyway, I hope this helps.

Cheers

-----Original Message-----
From: David Garamond [mailto:davegaramond@;icqmail.com]
Sent: Thursday, October 24, 2002 7:40 AM
To: [EMAIL PROTECTED]
Subject: detecting "visually similar" strings

hi,

i want to develop a perl module to determine whether two strings are too "visually similar". for example, "ella" (double lowercase ell) and "e11a" (double 1 digit). so this is kind of Soundex but for the eyes.

this checking is usable for avoiding "typo attacks" (like in the case of an attacker registering an email account/username on a forum and sending mails/postings to make people think he's someone else).

i'm also wondering whether there's a more scalable algorithm to check against thousands of existing strings.

is there already a perl module on CPAN to do this? from a couple of quick searching, it doesn't seem to be. there's only String::Similarity, which is not exactly what i'm looking for.

PS: i'm sorry if [EMAIL PROTECTED] is not the appropriate place for asking these kinds of questions, and in that case could someone please direct me to the right one, but so far this list has been the best, healthiest environment for perl support questions.

--
dave

--
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]