ion, you mentioned the data you need to run comparisons on
is stored in a database. Is this string comparison a one-time
processing kind of thing to clean up the data, or are you going to have
to continually do fuzzy string comparison on the data in the database?
There are some papers out ther
At Wednesday 27/12/2006 18:59, John Machin wrote:
> Thanks, all. Yes, Levenshtein seems to be the magic word I was looking
> for. (It's blazingly fast, too.)
In case you need something more, this article is a good starting point:
Record Linkage: A Machine Learning Approach, A Toolbox, and A D
Steve Bergman wrote:
> Thanks, all. Yes, Levenshtein seems to be the magic word I was looking
> for. (It's blazingly fast, too.)
>
> I suspect that if I strip out all the punctuation, etc. from both the
> itemnumber and description columns, as suggested, and concatenate them,
> pairing the record
Thanks, all. Yes, Levenshtein seems to be the magic word I was looking
for. (It's blazingly fast, too.)
I suspect that if I strip out all the punctuation, etc. from both the
itemnumber and description columns, as suggested, and concatenate them,
pairing the record with its closest match in the ot
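
A minimal sketch of the approach described above: strip punctuation from the item number and description, concatenate them, and pair each record with the candidate at the smallest Levenshtein distance. The function names (normalize, levenshtein, closest_match), the (itemnumber, description) tuple layout, and the sample records are assumptions made for illustration. The distance function is a plain-Python implementation of the textbook algorithm, not the C extension module mentioned in the thread.

```python
def normalize(text):
    """Uppercase the text and keep only letters and digits."""
    return "".join(ch for ch in text.upper() if ch.isalnum())


def levenshtein(a, b):
    """Edit distance by dynamic programming, keeping one row at a time."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def closest_match(record, candidates):
    """Return (distance, candidate) for the candidate nearest to record.

    Records are hypothetical (itemnumber, description) tuples.
    """
    key = normalize(record[0] + record[1])
    scored = ((levenshtein(key, normalize(c[0] + c[1])), c) for c in candidates)
    return min(scored, key=lambda pair: pair[0])


# Hypothetical sample data, not taken from the thread.
other_file = [("AB0012", "Widget blue"), ("XZ-9", "Gasket")]
distance, match = closest_match(("AB-0012", "Widget, blue"), other_file)
print(distance, match)
```

Comparing every record against every candidate is quadratic in the number of records, so for thousands of rows it may be worth matching the exact normalized keys first and running the distance search only on what is left over.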
On Wed, 27 Dec 2006 02:52:42 -0800, John Machin wrote:
>
> Duncan Booth wrote:
>> "John Machin" <[EMAIL PROTECTED]> wrote:
>>
>> > To compare two strings, take copies, and:
>>
>> Taking a copy of a string seems kind of superfluous in Python.
>
> You are right, I really meant don't do:
> orig
Duncan Booth wrote:
> "John Machin" <[EMAIL PROTECTED]> wrote:
>
> > To compare two strings, take copies, and:
>
> Taking a copy of a string seems kind of superfluous in Python.
You are right, I really meant don't do:
original = original.strip().replace().replace()
(a strange way of d
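
To illustrate the point being made here: string methods return new strings, so no copy is needed, and the cleaned value can be bound to a separate name so the original stays available for reporting. The sample value and the characters being replaced are assumptions for illustration only.

```python
original = "  AB-0012 / Widget, blue "

# str is immutable: strip() and replace() each return a new string,
# so `original` is left untouched as long as it is not rebound.
cleaned = original.strip().replace("-", "").replace(",", "")

print(repr(original))
print(repr(cleaned))
```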
"Steve Bergman" <[EMAIL PROTECTED]> writes:
> I'm looking for a module to do fuzzy comparison of strings. I have 2
> item master files which are supposed to be identical, but they have
> thousands of records where the item numbers don't match in various
> ways. One might include a '-' or have le
"John Machin" <[EMAIL PROTECTED]> wrote:
> To compare two strings, take copies, and:
Taking a copy of a string seems kind of superfluous in Python.
--
http://mail.python.org/mailman/listinfo/python-list
Carsten Haese wrote:
> On Tue, 2006-12-26 at 13:08 -0800, John Machin wrote:
> > Wojciech Mula wrote:
> > > Steve Bergman wrote:
> > > > I'm looking for a module to do fuzzy comparison of strings. [...]
> > >
> > > Check module difflib, it returns difference between two sequences.
> >
> > and it's
At Tuesday 26/12/2006 18:08, John Machin wrote:
Wojciech Mula wrote:
> Steve Bergman wrote:
> > I'm looking for a module to do fuzzy comparison of strings. [...]
>
> Check module difflib, it returns difference between two sequences.
and it's intended for comparing text files, and is relatively
On Tue, 2006-12-26 at 13:08 -0800, John Machin wrote:
> Wojciech Mula wrote:
> > Steve Bergman wrote:
> > > I'm looking for a module to do fuzzy comparison of strings. [...]
> >
> > Check module difflib, it returns difference between two sequences.
>
> and it's intended for comparing text files, a
Wojciech Mula wrote:
> Steve Bergman wrote:
> > I'm looking for a module to do fuzzy comparison of strings. [...]
>
> Check module difflib, it returns difference between two sequences.
and it's intended for comparing text files, and is relatively slow.
Google "python levenshtein". You'll probably
Steve Bergman wrote:
> I'm looking for a module to do fuzzy comparison of strings. [...]
Check module difflib, it returns difference between two sequences.
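
A short sketch of what the difflib suggestion could look like for item numbers, using only standard-library calls (difflib.SequenceMatcher and difflib.get_close_matches). The sample item numbers and the cutoff value are assumptions chosen for illustration.

```python
import difflib

# Similarity ratio between two item numbers, in the range 0.0 to 1.0.
ratio = difflib.SequenceMatcher(None, "AB-0012", "AB0012").ratio()

# Best candidates for one item number out of a list of possibilities.
# n limits the number of results; cutoff discards weak matches.
candidates = ["AB-0012", "XZ-9000", "AB0021"]
matches = difflib.get_close_matches("AB0012", candidates, n=2, cutoff=0.6)

print(round(ratio, 3))
print(matches)
```

Note that get_close_matches returns its results ordered from most to least similar, so the first element is the best guess.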
I'm looking for a module to do fuzzy comparison of strings. I have 2
item master files which are supposed to be identical, but they have
thousands of records where the item numbers don't match in various
ways. One might include a '-' or have leading zeros, or have a single
character missing, or a
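
One way to handle the '-' and leading-zero differences described above is to normalize each item number into a key and match on exact keys first, leaving only the remainder for fuzzy comparison. This is a sketch under assumptions: item_key, the sample lists, and the decision that leading zeros and case carry no meaning are all hypothetical and would need checking against the real data.

```python
def item_key(item_number):
    """Normalize an item number: drop '-', uppercase, strip leading zeros."""
    key = item_number.replace("-", "").strip().upper()
    return key.lstrip("0") or "0"  # keep a lone zero rather than an empty key


# Hypothetical item numbers from the two master files.
file_a = ["00123-A", "4567", "89-10"]
file_b = ["123A", "04567", "8911"]

# Index the second file by normalized key, then look up the first file.
index_b = {item_key(n): n for n in file_b}
exact = {n: index_b[item_key(n)] for n in file_a if item_key(n) in index_b}
unmatched = [n for n in file_a if item_key(n) not in index_b]

print(exact)
print(unmatched)
```

Records that end up in unmatched (here, one with a differing character) are the ones that would still need a fuzzy comparison.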