Howdy,

I don't know of any modules, but I have a few suggestions on how to
implement it.
For speed, I'd recommend caching a simplified 'eyedex' version of each
person's username, either as a new column in their main record, or in a
secondary list that is cross-indexed by the user_id.

Just recommendations, but I'd try:

Make arrays to convert normal strings to cached 'eyedex' strings.  For each
group of similar-looking characters, make an array such as "1, I, l, !, |"
for all the 'bar-like' characters.

When someone types a username such as 'EIIa' into the new-username field to
join a msgboard, you can use a regex to build the cached version: search for
characters that match elements [1] through [4] (not [0]) and change them to
the character at [0].

This produces a storable string that turns 'Ella', 'E||a', 'E!!a', or 'EIIa'
into just 'E11a' ('E11a' itself is already in that form, so it is left
unchanged).

Users will never see this string, as it's not the real username; it's
stored only for utility purposes.
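
A minimal sketch of that conversion, assuming the lists are kept as Perl
arrays with the 'convert-to' character at [0].  The sub name eyedex() and
the second (0/O/o) list are made up for illustration only:

```perl
use strict;
use warnings;

# Hypothetical similarity lists; element [0] is the character that every
# other member of the list is converted to.
my @groups = (
    [ '1', 'I', 'l', '!', '|' ],    # the 'bar-like' characters
    [ '0', 'O', 'o' ],              # assumed second list, illustration only
);

# Build one lookup table: look-alike character => 'convert-to' character.
my %canon;
for my $group (@groups) {
    my ( $first, @rest ) = @$group;
    $canon{$_} = $first for @rest;
}

# Character class matching every look-alike (elements [1] onward).
my $class     = join '', map { quotemeta($_) } keys %canon;
my $lookalike = qr/[$class]/;

# Return the eyedex form of a username; the original is left untouched.
sub eyedex {
    my ($name) = @_;
    ( my $key = $name ) =~ s/($lookalike)/$canon{$1}/g;
    return $key;
}

# Each of these prints E11a.
print eyedex($_), "\n" for ( 'Ella', 'E||a', 'E!!a', 'EIIa', 'E11a' );
```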

Then, if a previous user had chosen 'Ella' and someone types 'EIIa' or any
similar version, you can match the new username's 'eyedex' of 'E11a' against
the stored 'eyedex' usernames and find the match as quickly as with any
ordinary ASCII string comparison, because at that point that is all it is.
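
As a rough sketch of that lookup, here the cached values live in a plain
hash; in a real system they would be the extra column or secondary list
mentioned earlier.  The names %taken and conflicting_user() are made up,
and eyedex() is cut down to the bar-like list only:

```perl
use strict;
use warnings;

# Simplified stand-in for the real conversion: fold the 'bar-like'
# characters I, l, ! and | onto 1.
sub eyedex {
    my ($name) = @_;
    ( my $key = $name ) =~ tr/Il!|/1/;
    return $key;
}

# Hypothetical in-memory index: eyedex value => real username.
my %taken;
$taken{ eyedex($_) } = $_ for ( 'Ella', 'dave' );

# Return the existing username a candidate collides with, or undef.
sub conflicting_user {
    my ($candidate) = @_;
    return $taken{ eyedex($candidate) };
}

for my $candidate ( 'EIIa', 'E!!a', 'Elsa' ) {
    my $clash = conflicting_user($candidate);
    if ( defined $clash ) {
        print $candidate, ' is too similar to ', $clash, "\n";
    }
    else {
        print $candidate, " is available\n";
    }
}
```

The check is a single exact-match hash lookup, which is why it costs no
more than checking the real username for an exact duplicate.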

I'd put the lists of similar characters into text files and load them at
runtime, so you can swap out and tweak different lists as needed if the
similarity rules turn out too strict or too loose.  As long as the primary
'convert-to' values remain the same (that is, element [0] in this case),
the cached Eyedex values for existing users will not break, although names
containing a character you add to or remove from a list will need their
Eyedex regenerated.  You may like to code a 'by font-name-group' list, so
that if you publish this module, other people with obscure fonts can build
their own lists for some pages and use the standard lists for others.
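
One way the loading could look.  The file format is purely an assumption
here: one list per line, members separated by whitespace, the 'convert-to'
character first, and lines starting with # treated as comments.  The sub
name load_groups() is also made up:

```perl
use strict;
use warnings;

# Read similarity lists from a filehandle and return a hash reference
# mapping each look-alike character to its 'convert-to' character.
sub load_groups {
    my ($fh) = @_;
    my %canon;
    while ( my $line = <$fh> ) {
        chomp $line;
        next if $line =~ /^\s*(?:#.*)?$/;    # skip blank and comment lines
        my ( $first, @rest ) = split ' ', $line;
        $canon{$_} = $first for @rest;
    }
    return \%canon;
}

# An in-memory file stands in for a real text file on disk.
my $text = "# bar-like characters\n1 I l ! |\n\n0 O\n";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
my $canon = load_groups($fh);
close $fh;

print $_, ' => ', $canon->{$_}, "\n" for sort keys %$canon;
```

A 'by font-name-group' setup could then be one such file per font group,
with the caller choosing which file to open.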

When running the code, the overhead is only in generating the new Eyedex
string when a new user enters a new username, and in the one-time batch
update that computes and caches the Eyedex values of all the existing
usernames.

After that, as long as the Eyedex values are stored as accessibly as the
usernames, you can check against them, at the moment you validate the new
username on a 'create new user' page, as fast as you would check against
the other username values.


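The only place I can see this getting tricky is if you need a 7 to look like
a 9 and a 7 to look like a /, but a 9 NOT to look like a /.  A single
'convert-to' character per list cannot express that, because everything in
one list ends up matching everything else in it.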
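
If you really do need that, one possible workaround (a different technique
from the cached string, and only a sketch) is to compare names pairwise
against a table of look-alike pairs.  The table %looks_like and the sub
visually_similar() are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical look-alike pairs.  Note that 9 and / are deliberately
# NOT paired with each other.
my %looks_like;
for my $pair ( [ '7', '9' ], [ '7', '/' ] ) {
    my ( $x, $y ) = @$pair;
    $looks_like{$x}{$y} = $looks_like{$y}{$x} = 1;
}

# Two strings are similar if they have the same length and every position
# holds either the same character or a listed look-alike pair.
sub visually_similar {
    my ( $s, $t ) = @_;
    return 0 unless length($s) == length($t);
    for my $i ( 0 .. length($s) - 1 ) {
        my $c = substr( $s, $i, 1 );
        my $d = substr( $t, $i, 1 );
        next if $c eq $d;
        return 0 unless $looks_like{$c}{$d};
    }
    return 1;
}

print visually_similar( 'user7', 'user9' ) ? "similar\n" : "different\n";
print visually_similar( 'user9', 'user/' ) ? "similar\n" : "different\n";
```

The cost is that you lose the single cached-string lookup: a new name has
to be compared against each existing name in turn.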

Keep in mind, when you run the batch for the first time you'll need to handle
pre-existing similar names from before the new validation rule went into
effect.  If you check each username's eyedex against the existing list as
you first build it, you can build a report of the duplicates and deal with
them afterwards.
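
A sketch of such a first-run batch report.  The @existing list is made-up
sample data standing in for the real user table, and eyedex() is again cut
down to the bar-like list only:

```perl
use strict;
use warnings;

# Simplified stand-in for the real conversion (bar-like characters only).
sub eyedex {
    my ($name) = @_;
    ( my $key = $name ) =~ tr/Il!|/1/;
    return $key;
}

# Hypothetical existing usernames, in the order they were registered.
my @existing = ( 'Ella', 'dave', 'EIIa', 'bob', 'E!!a' );

# Group the real usernames by their eyedex value.
my %by_eyedex;
push @{ $by_eyedex{ eyedex($_) } }, $_ for @existing;

# Any eyedex shared by more than one username goes into the report.
my @report;
for my $key ( sort keys %by_eyedex ) {
    my @names = @{ $by_eyedex{$key} };
    push @report, $key . ': ' . join( ', ', @names ) if @names > 1;
}

print $_, "\n" for @report;
```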

Sometimes someone will innocently lose their password and create a new
username that is visually similar, without being guilty of a typo attack.

Anyway, I hope this helps.

Cheers






-----Original Message-----
From: David Garamond [mailto:davegaramond@;icqmail.com]
Sent: Thursday, October 24, 2002 7:40 AM
To: [EMAIL PROTECTED]
Subject: detecting "visually similar" strings


hi,

i want to develop a perl module to determine whether two strings are too
"visually similar". for example, "ella" (double lowercase ell) and
"e11a" (double 1 digit). so this is kind of Soundex but for the eyes.
this checking is usable for avoiding "typo attacks" (like in the case of
an attacker registering an email account/username on a forum and sending
mails/postings to make people think he's someone else). i'm also
wondering whether there's a more scalable algorithm to check against
thousands of existing strings.

is there already a perl module on CPAN to do this? from a couple of
quick searches, there doesn't seem to be one. there's only
String::Similarity, which is not exactly what i'm looking for.

PS: i'm sorry if [EMAIL PROTECTED] is not the appropriate place for
asking these kinds of questions, and in that case could someone please
direct me to the right one, but so far this list has been the best,
healthiest environment for perl support questions.

--
dave


--
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]




