2010/8/28 Alexander Korotkov :
> Now, the test for levenshtein_less_equal performance.
Nice results. I'll try to find time to look at this.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
SELECT SUM(levenshtein(a, 'foo')) from words;
SELECT SUM(levenshtein(a, 'Urbański')) FROM words;
SELECT SUM(levenshtein(a, 'ańs')) FROM words;
SELECT SUM(levenshtein(a, 'foo')) from words2;
SELECT SUM(levenshtein(a, 'дом')) FROM words2;
SELECT SUM(levenshtein(a, 'компьютер')) FROM words2;
Before t
Here is the patch which adds the levenshtein_less_equal function. I'm going to
add it to the current commitfest.
With best regards,
Alexander Korotkov.
On Tue, Aug 3, 2010 at 3:23 AM, Robert Haas wrote:
> On Mon, Aug 2, 2010 at 5:07 PM, Alexander Korotkov
> wrote:
> > Now I think the patch is as good as can be. :)
On Aug 28, 2010, at 8:34 AM, Alexander Korotkov wrote:
> Here is the patch which adds the levenshtein_less_equal function. I'm going to
> add it to the current commitfest.
Cool. Please submit some performance results comparing levenshtein in HEAD vs.
levenshtein with this patch vs. levenshtein_less_equal.
Now I think the patch is as good as can be. :)
I'm going to prepare the less-or-equal function in the same manner as this patch.
With best regards,
Alexander Korotkov.
On Mon, Aug 2, 2010 at 5:20 AM, Robert Haas wrote:
> I reviewed this code in a fair amount of detail today and ended up
> rewriting it. In general terms, it's best to avoid changing things
> that are not relevant to the central purpose of the patch. This patch
> randomly adds a whole bunch of w
On Mon, Aug 2, 2010 at 5:07 PM, Alexander Korotkov wrote:
> Now I think the patch is as good as can be. :)
OK, committed.
> I'm going to prepare the less-or-equal function in the same manner as this patch.
Sounds good. Since we're now more than half-way through this
CommitFest and this patch has undergone
2010/8/2 Alexander Korotkov :
> The dump of the table with the Russian dictionary is in the attachment.
>
> I use the following tests:
> SELECT SUM(levenshtein(a, 'foo')) from words;
> SELECT SUM(levenshtein(a, 'Urbański')) FROM words;
> SELECT SUM(levenshtein(a, 'ańs')) FROM words;
> SELECT SUM(levenshtein(a, 'foo')) from words2;
2010/8/2 Alexander Korotkov :
> On Mon, Aug 2, 2010 at 5:20 AM, Robert Haas wrote:
>> I reviewed this code in a fair amount of detail today and ended up
>> rewriting it. In general terms, it's best to avoid changing things
>> that are not relevant to the central purpose of the patch. This patch
On Fri, Jul 30, 2010 at 1:14 PM, Alexander Korotkov
wrote:
> Ok, here is the patch for multi-byte characters.
> I changed the arguments of the levenshtein_internal function from text * to
> const char * and int. I think that it makes levenshtein_internal more reusable.
> For example, this function can be
On Wed, Jul 21, 2010 at 5:59 PM, Robert Haas wrote:
> On Wed, Jul 21, 2010 at 2:47 PM, Alexander Korotkov
> wrote:
>> On Wed, Jul 21, 2010 at 10:25 PM, Robert Haas wrote:
>>>
>>> *scratches head* Aren't you just moving the same call to a different
>>> place?
>>
>> So, where can you find this different place? :)
I forgot the attribution in the levenshtein.c file.
With best regards,
Alexander Korotkov.
fuzzystrmatch-0.5.1.tar.gz
Here is a new version of my patch. There are the following changes:
1) I've merged the single-byte and multibyte versions of levenshtein_internal and
levenshtein_less_equal_internal using macros and includes.
2) I found that levenshtein takes reasonable time even for long strings.
There is an example with st
Excerpts from Alexander Korotkov's message of Thu Jul 22 03:21:57 -0400 2010:
> On Thu, Jul 22, 2010 at 1:59 AM, Robert Haas wrote:
>
> > Ah, I see. That's pretty compelling, I guess. Although it still
> > seems like a lot of code...
> >
> I think there is a way to merge single-byte and multi-byte versions of
> functions without loss in performance using macros and includes
On Thu, Jul 22, 2010 at 1:59 AM, Robert Haas wrote:
> Ah, I see. That's pretty compelling, I guess. Although it still
> seems like a lot of code...
>
I think there is a way to merge single-byte and multi-byte versions of
functions without loss in performance using macros and includes (like in
'
Such a version with macros and includes can look like this:
#ifdef MULTIBYTE
#define NEXT_X (x+= char_lens[i-1])
#define NEXT_Y (y+= y_char_len)
#define CMP (char_cmp(x, char_lens[i-1], y, y_char_len))
#else
#define NEXT_X (x++)
#define NEXT_Y (y++)
#define CMP (*x == *y)
#endif
static int
levensht
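To make the idea concrete, here is a minimal, self-contained sketch of the macro-and-include merge, assuming a file that includes itself and a deliberately simplified body standing in for the real levenshtein_internal; the file name, the function names lev_sketch_sb and lev_sketch_mb, and the utf8_char_len helper are illustrative, not taken from the patch:

/*
 * macro_merge_sketch.c -- an illustration only, not code from the patch.
 *
 * The shared function body at the bottom is compiled twice: once with
 * single-byte macros (producing lev_sketch_sb) and once with MULTIBYTE
 * defined (producing lev_sketch_mb).  Save under this exact file name so
 * that the self-include resolves, then build with:
 *     cc -o macro_merge_sketch macro_merge_sketch.c
 */
#ifndef LEV_SKETCH_BODY

#include <stdio.h>
#include <string.h>

/* Byte length of the UTF-8 character starting at c (valid UTF-8 assumed). */
static int
utf8_char_len(const char *c)
{
    unsigned char b = (unsigned char) *c;

    if (b < 0x80)
        return 1;
    if ((b & 0xE0) == 0xC0)
        return 2;
    if ((b & 0xF0) == 0xE0)
        return 3;
    return 4;
}

/* Compare two characters given as (pointer, byte length) pairs. */
static int
char_cmp(const char *a, int alen, const char *b, int blen)
{
    return alen == blen && memcmp(a, b, alen) == 0;
}

/* First expansion: single-byte variant. */
#define LEV_SKETCH_BODY
#define FUNCNAME lev_sketch_sb
#define NEXT_X (x++)
#define NEXT_Y (y++)
#define CMP (*x == *y)
#include "macro_merge_sketch.c"
#undef FUNCNAME
#undef NEXT_X
#undef NEXT_Y
#undef CMP

/* Second expansion: multibyte variant. */
#define MULTIBYTE
#define FUNCNAME lev_sketch_mb
#define NEXT_X (x += x_char_len)
#define NEXT_Y (y += y_char_len)
#define CMP (char_cmp(x, x_char_len, y, y_char_len))
#include "macro_merge_sketch.c"

int
main(void)
{
    /* Byte-wise vs. character-wise comparison of two UTF-8 strings. */
    printf("single-byte: %d\n", lev_sketch_sb("дом", "том"));
    printf("multibyte:   %d\n", lev_sketch_mb("дом", "том"));
    return 0;
}

#else   /* LEV_SKETCH_BODY: the shared body, expanded twice */

/*
 * Count positions (bytes or characters, depending on the macros in force)
 * at which the two strings differ -- a stand-in for the much longer body
 * of the real levenshtein_internal.
 */
static int
FUNCNAME(const char *x, const char *y)
{
    int     diff = 0;

    while (*x && *y)
    {
#ifdef MULTIBYTE
        int     x_char_len = utf8_char_len(x);
        int     y_char_len = utf8_char_len(y);
#endif

        if (!CMP)
            diff++;
        NEXT_X;
        NEXT_Y;
    }
    return diff;
}

#endif  /* LEV_SKETCH_BODY */

For 'дом' vs. 'том' the single-byte variant reports 2 differing positions and the multibyte variant reports 1, the same kind of byte-versus-character discrepancy the patch addresses in levenshtein itself.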
On Thu, Jul 22, 2010 at 3:21 AM, Alexander Korotkov
wrote:
> On Thu, Jul 22, 2010 at 1:59 AM, Robert Haas wrote:
>>
>> Ah, I see. That's pretty compelling, I guess. Although it still
>> seems like a lot of code...
>
> I think there is a way to merge single-byte and multi-byte versions of
> functions without loss in performance using macros and includes
On Wed, Jul 21, 2010 at 2:47 PM, Alexander Korotkov
wrote:
> On Wed, Jul 21, 2010 at 10:25 PM, Robert Haas wrote:
>>
>> *scratches head* Aren't you just moving the same call to a different
>> place?
>
> So, where can you find this different place? :) In this patch
> null-terminated strings are not used at all.
Excerpts from Robert Haas's message of Wed Jul 21 14:25:47 -0400 2010:
> On Wed, Jul 21, 2010 at 7:40 AM, Alexander Korotkov
> wrote:
> > On Wed, Jul 21, 2010 at 5:54 AM, Robert Haas wrote:
> > The same benefit can be achieved by replacing char * with
> > char * and length.
> > I changed !m to m == 0
On Wed, Jul 21, 2010 at 10:25 PM, Robert Haas wrote:
> *scratches head* Aren't you just moving the same call to a different
> place?
>
So, where can you find this different place? :) In this patch
null-terminated strings are not used at all.
> Yeah, we usually try to avoid changing that sort o
On Wed, Jul 21, 2010 at 7:40 AM, Alexander Korotkov
wrote:
> On Wed, Jul 21, 2010 at 5:54 AM, Robert Haas wrote:
>> This patch still needs some work. It includes a bunch of stylistic
>> changes that aren't relevant to the purpose of the patch. There's no
>> reason that I can see to change the e
On Wed, Jul 21, 2010 at 5:54 AM, Robert Haas wrote:
> This patch still needs some work. It includes a bunch of stylistic
> changes that aren't relevant to the purpose of the patch. There's no
> reason that I can see to change the existing levenshtein_internal
> function to take text arguments i
On Tue, Jul 20, 2010 at 3:37 AM, Itagaki Takahiro
wrote:
> 2010/7/13 Alexander Korotkov :
>> Anyway I think that the overhead is not ignorable. That's why I have split
>> levenshtein_internal into levenshtein_internal and levenshtein_internal_mb,
>> and levenshtein_less_equal_internal into levenshte
2010/7/13 Alexander Korotkov :
> Anyway I think that the overhead is not ignorable. That's why I have split
> levenshtein_internal into levenshtein_internal and levenshtein_internal_mb,
> and levenshtein_less_equal_internal into levenshtein_less_equal_internal and
> levenshtein_less_equal_internal_mb
Hi!
> * levenshtein_internal() and levenshtein_less_equal_internal() are very
> similar. Can you merge the code? We can always use less_equal_internal()
> if the overhead is ignorable. Did you compare them?
>
With a big value of max_d the overhead is significant. Here is an example on the
american-english dict
Hi, I'm reviewing the "Multibyte character set in levenshtein function" patch.
https://commitfest.postgresql.org/action/patch_view?id=304
The main logic seems to be good, but I have some comments about
the coding style and refactoring.
* levenshtein_internal() and levenshtein_less_equal_internal() are
Hello Hackers!
I have extended my patch by introducing a levenshtein_less_equal function.
This function has an additional argument max_d and stops calculating when the
distance exceeds max_d. With low values of max_d the function works much faster
than the original one.
The example of original levenshtein functi
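A rough sketch of the early-exit idea described above, assuming a conventional two-row dynamic-programming loop over single-byte strings (the name levenshtein_less_equal_sketch and the max_d + 1 return convention are illustrative, not the patch's actual code):

#include <stdlib.h>
#include <string.h>

/*
 * Illustrative bounded edit distance for single-byte strings: returns the
 * exact Levenshtein distance if it is <= max_d, and max_d + 1 otherwise.
 * (Error handling and multibyte support omitted for brevity.)
 */
static int
levenshtein_less_equal_sketch(const char *s, const char *t, int max_d)
{
    int     m = (int) strlen(s);
    int     n = (int) strlen(t);
    int    *prev = malloc((n + 1) * sizeof(int));
    int    *curr = malloc((n + 1) * sizeof(int));
    int     i, j, result;

    /* Row 0: turning the empty prefix of s into prefixes of t. */
    for (j = 0; j <= n; j++)
        prev[j] = j;

    for (i = 1; i <= m; i++)
    {
        int     row_min;
        int    *tmp;

        curr[0] = i;
        row_min = curr[0];
        for (j = 1; j <= n; j++)
        {
            int     del = prev[j] + 1;
            int     ins = curr[j - 1] + 1;
            int     sub = prev[j - 1] + (s[i - 1] == t[j - 1] ? 0 : 1);
            int     cell = del < ins ? del : ins;

            if (sub < cell)
                cell = sub;
            curr[j] = cell;
            if (cell < row_min)
                row_min = cell;
        }

        /*
         * Early exit: row i contains the value i in column 0, so row_min >
         * max_d implies i > max_d.  Every cell of the next row is computed
         * from this row's cells and from earlier cells of the next row
         * (whose chain starts at i + 1), so its minimum is at least
         * min(row_min, i + 1) > max_d.  The final distance therefore cannot
         * be <= max_d, and we can stop here.
         */
        if (row_min > max_d)
        {
            free(prev);
            free(curr);
            return max_d + 1;
        }

        /* Swap the two rows. */
        tmp = prev;
        prev = curr;
        curr = tmp;
    }

    result = prev[n];
    free(prev);
    free(curr);
    return result > max_d ? max_d + 1 : result;
}

For example, levenshtein_less_equal_sketch("foo", "food", 1) returns 1, while with a small max_d and two long, very different strings the row check stops the loop long before the full matrix is filled.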
On Wed, May 12, 2010 at 11:04 PM, Alvaro Herrera wrote:
> On a quick look, I didn't like the way you separated the
> "pg_database_encoding_max_length() > 1" cases. There seem to be too
> much common code. Can that be refactored a bit better?
>
I did a little refactoring in order to avoid some si
On Thu, May 13, 2010 at 6:03 AM, Alvaro Herrera wrote:
> Well, since it's only used in one place, why are you defining a macro at
> all?
>
In order to structure the code better. My question was about something else: is the memcmp
function a good choice for comparing very short sequences of bytes (from 1 to 4
bytes)?
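For reference, the hand-coded alternative that this question implicitly weighs against memcmp might look something like the following sketch (the char_same name and the switch-per-length structure are illustrative, not code from the patch):

#include <stdbool.h>
#include <string.h>

/*
 * Compare two characters given as (pointer, byte length) pairs, where a
 * character occupies 1 to 4 bytes.  The explicit switch avoids memcmp()'s
 * call and generality overhead for such tiny lengths; whether that is
 * measurable in practice is exactly the question raised above.
 */
static inline bool
char_same(const char *a, int alen, const char *b, int blen)
{
    if (alen != blen)
        return false;

    switch (alen)
    {
        case 1:
            return a[0] == b[0];
        case 2:
            return a[0] == b[0] && a[1] == b[1];
        case 3:
            return a[0] == b[0] && a[1] == b[1] && a[2] == b[2];
        default:
            return memcmp(a, b, alen) == 0;
    }
}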
Alexander Korotkov wrote:
> On Wed, May 12, 2010 at 11:04 PM, Alvaro Herrera
> wrote:
>
> > On a quick look, I didn't like the way you separated the
> > "pg_database_encoding_max_length() > 1" cases. There seem to be too
> > much common code. Can that be refactored a bit better?
> >
> I did
Excerpts from Alexander Korotkov's message of Mon May 10 11:35:02 -0400 2010:
> Hackers,
>
> The current version of the levenshtein function in the fuzzystrmatch contrib
> module doesn't work properly with multibyte character sets.
> My patch makes this function work properly with multibyte character sets
Hackers,
The current version of the levenshtein function in the fuzzystrmatch contrib
module doesn't work properly with multibyte character sets.
test=# select levenshtein('фыва','аыва');
 levenshtein
-------------
           2
(1 row)
(With a UTF-8 database, 'ф' and 'а' each occupy two bytes and both bytes differ,
so the byte-oriented code counts two substitutions; the correct character-level
distance is 1.)
My patch makes this function work properly with multibyte character sets