Re: [HACKERS] PATCH: Allow empty targets in unaccent dictionary

2014-07-01 Thread Abhijit Menon-Sen
At 2014-06-30 22:06:30 -0400, t...@sss.pgh.pa.us wrote:
>
> I went ahead and committed this patch, and also some further work to
> fix the multicharacter-source problem. I took it on myself to make
> the code issue warnings about misformatted lines, too.
Thanks, looks good. I found the multichara

Re: [HACKERS] PATCH: Allow empty targets in unaccent dictionary

2014-06-30 Thread Tom Lane
Abhijit Menon-Sen writes:
> At 2014-06-30 15:19:17 -0400, t...@sss.pgh.pa.us wrote:
>> It's not unlikely that we want this patch *and* an improvement that
>> allows multi-character src strings
> I think it's enough to apply just this patch, but I wouldn't object to
> doing both if it were easy.
I

Re: [HACKERS] PATCH: Allow empty targets in unaccent dictionary

2014-06-30 Thread Tom Lane
Abhijit Menon-Sen writes:
> At 2014-06-30 15:19:17 -0400, t...@sss.pgh.pa.us wrote:
>> Anyway, this raises the question of whether the current patch is
>> actually a desirable way to do things, or whether it would be better
>> if the unaccenting rules were like "base-char accent-char" ->
>> "base-

Re: [HACKERS] PATCH: Allow empty targets in unaccent dictionary

2014-06-30 Thread Abhijit Menon-Sen
At 2014-06-30 15:19:17 -0400, t...@sss.pgh.pa.us wrote:
>
> Anyway, this raises the question of whether the current patch is
> actually a desirable way to do things, or whether it would be better
> if the unaccenting rules were like "base-char accent-char" ->
> "base-char".
It might be useful to b

Re: [HACKERS] PATCH: Allow empty targets in unaccent dictionary

2014-06-30 Thread Tom Lane
Abhijit Menon-Sen writes:
> I've attached a patch to contrib/unaccent as outlined in my review the
> other day.
I went to commit this, and while testing I realized that the current implementation of unaccent_lexize is only capable of coping with "src" strings that are single characters in the cur

Re: [HACKERS] PATCH: Allow empty targets in unaccent dictionary

2014-06-29 Thread Mohammad Alhashash
Hi, Thanks a lot for the review and comments. Here is an updated patch. On 6/25/2014 8:20 AM, Abhijit Menon-Sen wrote: Your patch should definitely add a test case or two to sql/unaccent.sql and expected/unaccent.out showing the behaviour that didn't work before the change. That would require

Re: [HACKERS] PATCH: Allow empty targets in unaccent dictionary

2014-06-29 Thread Abhijit Menon-Sen
Hi. I've attached a patch to contrib/unaccent as outlined in my review the other day. I'm familiar with multiple languages in which modifiers are separate characters (but not Arabic), so I decided to try a quick test because I was curious. I added a line containing only U+0940 (DEVANAGARI VOWEL S

Re: [HACKERS] PATCH: Allow empty targets in unaccent dictionary

2014-06-24 Thread Abhijit Menon-Sen
Hi.
At 2014-04-20 01:06:43 +0200, alhash...@alhashash.net wrote:
>
> To use unaccent dictionary for these languages, we need to allow empty
> targets to remove diacritics instead of replacing them.
Your patch should definitely add a test case or two to sql/unaccent.sql and expected/unaccent.out s

Re: [HACKERS] PATCH: Allow empty targets in unaccent dictionary

2014-04-20 Thread David Fetter
Please add this to the next commitfest.
https://commitfest.postgresql.org/action/commitfest_view?id=22
Cheers, David.
On Sun, Apr 20, 2014 at 01:06:43AM +0200, Mohammad Alhashash wrote:
> Hi,
>
> Currently, unaccent extension only allows replacing one source
> character with one or more target c

[HACKERS] PATCH: Allow empty targets in unaccent dictionary

2014-04-19 Thread Mohammad Alhashash
Hi, Currently, unaccent extension only allows replacing one source character with one or more target characters. In Arabic, Hebrew and possibly other languages, diacritics are standalone characters that are being added to normal letters. To use unaccent dictionary for these languages, we need
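
For readers skimming the thread, here is a rough sketch of what "empty targets" could mean in practice. It is not the contrib/unaccent C code and not the patch itself. It assumes a rules format of whitespace-separated "source target" pairs, one per line, and treats a line holding only a source as "delete this character". The names parse_rules and unaccent are hypothetical, and the sketch handles only single-character sources, as described in the original message.

```python
def parse_rules(text):
    """Parse unaccent-style rules (assumed format: 'src trg' per line).

    A line containing only 'src' maps the source to the empty string,
    i.e. the character is removed rather than replaced.
    """
    rules = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) > 2:
            continue  # this sketch silently ignores malformed lines
        src = fields[0]
        trg = fields[1] if len(fields) == 2 else ""  # empty target
        rules[src] = trg
    return rules


def unaccent(s, rules):
    """Apply single-character rules; unmapped characters pass through."""
    return "".join(rules.get(ch, ch) for ch in s)


# One ordinary rule (e-acute -> e) and one empty-target rule
# (U+064E ARABIC FATHA, a standalone diacritic, is removed).
rules = parse_rules("\u00e9 e\n\u064e\n")
plain = unaccent("caf\u00e9", rules)
stripped = unaccent("\u0643\u064e\u062a\u064e\u0628\u064e", rules)
```

With the two rules above, the accented Latin letter is replaced while the Arabic diacritic is simply dropped, leaving the base letters untouched.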