Sorry, I don't know why this thread was moved to another thread.
I duplicate the patch here.
On 28.01.2016 14:19, Alvaro Herrera wrote:
Artur Zakirov wrote:
I undo the changes and the error will be raised. I will update the patch
soon.
I don't think you ever did this. I'm closing it now, but it sounds
useful stuff so please do resubmit for 2016-03.
I'm working on the patch. I wanted to send this changes after all changes.
This version of the patch has a top-level comment. Another comments I will
provides soon.
Also this patch has some changes with ternary operators.
I don't think you ever did this. I'm closing it now, but it sounds
useful stuff so please do resubmit for 2016-03.
Moved to next CF.
--
Artur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
*** a/doc/src/sgml/textsearch.sgml
--- b/doc/src/sgml/textsearch.sgml
***************
*** 2615,2632 **** SELECT plainto_tsquery('supernova star');
</para>
<para>
! To create an <application>Ispell</> dictionary, use the built-in
! <literal>ispell</literal> template and specify several parameters:
</para>
!
<programlisting>
! CREATE TEXT SEARCH DICTIONARY english_ispell (
TEMPLATE = ispell,
! DictFile = english,
! AffFile = english,
! StopWords = english
! );
</programlisting>
<para>
Here, <literal>DictFile</>, <literal>AffFile</>, and <literal>StopWords</>
--- 2615,2655 ----
</para>
<para>
! To create an <application>Ispell</> dictionary perform these steps:
</para>
! <itemizedlist spacing="compact" mark="bullet">
! <listitem>
! <para>
! download dictionary configuration files. <productname>OpenOffice</>
! extension files have the <filename>.oxt</> extension. It is necessary
! to extract <filename>.aff</> and <filename>.dic</> files, change extensions
! to <filename>.affix</> and <filename>.dict</>. For some dictionary
! files it is also needed to convert characters to the UTF-8 encoding
! with commands (for example, for norwegian language dictionary):
<programlisting>
! iconv -f ISO_8859-1 -t UTF-8 -o nn_no.affix nn_NO.aff
! iconv -f ISO_8859-1 -t UTF-8 -o nn_no.dict nn_NO.dic
! </programlisting>
! </para>
! </listitem>
! <listitem>
! <para>
! copy files to the <filename>$SHAREDIR/tsearch_data</> directory
! </para>
! </listitem>
! <listitem>
! <para>
! load files into PostgreSQL with the following command:
! <programlisting>
! CREATE TEXT SEARCH DICTIONARY english_hunspell (
TEMPLATE = ispell,
! DictFile = en_us,
! AffFile = en_us,
! Stopwords = english);
</programlisting>
+ </para>
+ </listitem>
+ </itemizedlist>
<para>
Here, <literal>DictFile</>, <literal>AffFile</>, and <literal>StopWords</>
***************
*** 2643,2648 **** CREATE TEXT SEARCH DICTIONARY english_ispell (
--- 2666,2720 ----
</para>
<para>
+ The <filename>.affix</> file of <application>Ispell</> has the following structure:
+ <programlisting>
+ prefixes
+ flag *A:
+ . > RE # As in enter > reenter
+ suffixes
+ flag T:
+ E > ST # As in late > latest
+ [^AEIOU]Y > -Y,IEST # As in dirty > dirtiest
+ [AEIOU]Y > EST # As in gray > grayest
+ [^EY] > EST # As in small > smallest
+ </programlisting>
+ </para>
+ <para>
+ And the <filename>.dict</> file has the following structure:
+ <programlisting>
+ lapse/ADGRS
+ lard/DGRS
+ large/PRTY
+ lark/MRS
+ </programlisting>
+ </para>
+
+ <para>
+ Format of the <filename>.dict</> file is:
+ <programlisting>
+ basic_form/affix_class_name
+ </programlisting>
+ </para>
+
+ <para>
+ In the <filename>.affix</> file every affix flag is described in the
+ following format:
+ <programlisting>
+ condition > [-stripping_letters,] adding_affix
+ </programlisting>
+ </para>
+
+ <para>
+ Here, condition has a format similar to the format of regular expressions.
+ It can use groupings <literal>[...]</> and <literal>[^...]</>.
+ For example, <literal>[AEIOU]Y</> means that the last letter of the word
+ is <literal>"y"</> and the penultimate letter is <literal>"a"</>,
+ <literal>"e"</>, <literal>"i"</>, <literal>"o"</> or <literal>"u"</>.
+ <literal>[^EY]</> means that the last letter is neither <literal>"e"</>
+ nor <literal>"y"</>.
+ </para>
+
+ <para>
Ispell dictionaries support splitting compound words;
a useful feature.
Notice that the affix file should specify a special flag using the
***************
*** 2663,2668 **** SELECT ts_lexize('norwegian_ispell', 'sjokoladefabrikk');
--- 2735,2796 ----
</programlisting>
</para>
+ <para>
+ <application>MySpell</> is very similar to <application>Hunspell</>.
+ The <filename>.affix</> file of <application>Hunspell</> has the following structure:
+ <programlisting>
+ PFX A Y 1
+ PFX A 0 re .
+ SFX T N 4
+ SFX T 0 st e
+ SFX T y iest [^aeiou]y
+ SFX T 0 est [aeiou]y
+ SFX T 0 est [^ey]
+ </programlisting>
+ </para>
+
+ <para>
+ The first line of an affix class is the header. Fields of an affix rules are listed after the header:
+ </para>
+ <itemizedlist spacing="compact" mark="bullet">
+ <listitem>
+ <para>
+ parameter name (PFX or SFX)
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ flag (name of the affix class)
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ stripping characters from beginning (at prefix) or end (at suffix) of the word
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ adding affix
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ condition that has a format similar to the format of regular expressions.
+ </para>
+ </listitem>
+ </itemizedlist>
+
+ <para>
+ The <filename>.dict</> file looks like the <filename>.dict</> file of
+ <application>Ispell</>:
+ <programlisting>
+ larder/M
+ lardy/RT
+ large/RSPMYT
+ largehearted
+ </programlisting>
+ </para>
+
<note>
<para>
<application>MySpell</> does not support compound words.
*** a/src/backend/tsearch/spell.c
--- b/src/backend/tsearch/spell.c
***************
*** 5,10 ****
--- 5,54 ----
*
* Portions Copyright (c) 1996-2016, PostgreSQL Global Development Group
*
+ * Ispell dictionary
+ * --------------------------------
+ *
+ * Rules of dictionaries are defined in two files with .affix and .dict
+ * extensions. They are used by spell checker programs Ispell and Hunspell.
+ *
+ * An .affix file declares morphological rules to get a basic form of words.
+ * The format of an .affix file has different structure for Ispell and Hunspell
+ * dictionaries. The Hunspell format is more complicated. But when an .affix
+ * file is imported and compiled, it is stored in the same structure AffixNode.
+ *
+ * A .dict file stores a list of basic forms of words with references to
+ * affix rules. The format of a .dict file has the same structure for Ispell
+ * and Hunspell dictionaries.
+ *
+ * Compilation of a dictionary
+ * ---------------------------
+ *
+ * A compiled dictionary is stored in the IspellDict structure. Compilation of
+ * a dictionary is divided into the several steps:
+ * - NIImportDictionary() - stores each word of a .dict file in the
+ * temporary Spell field.
+ * - NIImportAffixes() - stores affix rules of an .affix file in the
+ * Affix field (not temporary) if an .affix file has the Ispell format.
+ * -> NIImportOOAffixes() - stores affix rules if an .affix file has the
+ * Hunspell format. The AffixData field is initialized if AF parameter
+ * is defined.
+ * - NISortDictionary() - builds a prefix tree (Trie) from the words list
+ * and stores it in the Dictionary field. The AffixData field is initialized
+ * if AF parameter is not defined.
+ * - NISortAffixes():
+ * - builds a list of compond affixes and stores it in the CompoundAffix.
+ * - builds prefix trees (Trie) from the affix list for prefixes and suffixes
+ * and stores them in Suffix and Prefix fields.
+ *
+ * Memory management
+ * -----------------
+ *
+ * The IspellDict structure has the Spell field which is used only in compile
+ * time. The Spell field stores a words list. It can take a lot of memory.
+ * Therefore when a dictionary is compiled this field is cleared by NIFinishBuild.
+ *
+ * All resources which should cleared by NIFinishBuild is initialized using
+ * tmpalloc() and tmpalloc0().
*
* IDENTIFICATION
* src/backend/tsearch/spell.c
***************
*** 153,159 **** cmpspell(const void *s1, const void *s2)
static int
cmpspellaffix(const void *s1, const void *s2)
{
! return (strncmp((*(SPELL *const *) s1)->p.flag, (*(SPELL *const *) s2)->p.flag, MAXFLAGLEN));
}
static char *
--- 197,203 ----
static int
cmpspellaffix(const void *s1, const void *s2)
{
! return (strncmp((*(SPELL *const *) s1)->flag, (*(SPELL *const *) s2)->flag, MAXFLAGLEN));
}
static char *
***************
*** 237,242 **** cmpaffix(const void *s1, const void *s2)
--- 281,353 ----
(const unsigned char *) a2->repl);
}
+ static unsigned short
+ decodeFlag(IspellDict *Conf, char *sflag, char **sflagnext)
+ {
+ unsigned short s;
+ char *next;
+
+ switch (Conf->flagMode)
+ {
+ case FM_LONG:
+ s = (int)sflag[0] << 8 | (int)sflag[1];
+ if (sflagnext)
+ *sflagnext = sflag + 2;
+ break;
+ case FM_NUM:
+ s = (unsigned short) strtol(sflag, &next, 10);
+ if (sflagnext)
+ {
+ if (next)
+ {
+ *sflagnext = next;
+ while (**sflagnext)
+ {
+ if (**sflagnext == ',')
+ {
+ *sflagnext = *sflagnext + 1;
+ break;
+ }
+ *sflagnext = *sflagnext + 1;
+ }
+ }
+ else
+ *sflagnext = 0;
+ }
+ break;
+ default:
+ s = (unsigned short) *((unsigned char *)sflag);
+ if (sflagnext)
+ *sflagnext = sflag + 1;
+ }
+
+ return s;
+ }
+
+ static bool
+ isAffixFlagInUse(IspellDict *Conf, int affix, unsigned short affixflag)
+ {
+ char *flagcur;
+ char *flagnext = 0;
+
+ if (affixflag == 0)
+ return true;
+
+ flagcur = Conf->AffixData[affix];
+
+ while (*flagcur)
+ {
+ if (decodeFlag(Conf, flagcur, &flagnext) == affixflag)
+ return true;
+ if (flagnext)
+ flagcur = flagnext;
+ else
+ break;
+ }
+
+ return false;
+ }
+
static void
NIAddSpell(IspellDict *Conf, const char *word, const char *flag)
{
***************
*** 255,261 **** NIAddSpell(IspellDict *Conf, const char *word, const char *flag)
}
Conf->Spell[Conf->nspell] = (SPELL *) tmpalloc(SPELLHDRSZ + strlen(word) + 1);
strcpy(Conf->Spell[Conf->nspell]->word, word);
! strlcpy(Conf->Spell[Conf->nspell]->p.flag, flag, MAXFLAGLEN);
Conf->nspell++;
}
--- 366,372 ----
}
Conf->Spell[Conf->nspell] = (SPELL *) tmpalloc(SPELLHDRSZ + strlen(word) + 1);
strcpy(Conf->Spell[Conf->nspell]->word, word);
! Conf->Spell[Conf->nspell]->flag = (*flag != '\0') ? cpstrdup(Conf, flag) : VoidString;
Conf->nspell++;
}
***************
*** 355,361 **** FindWord(IspellDict *Conf, const char *word, int affixflag, int flag)
else if ((flag & StopMiddle->compoundflag) == 0)
return 0;
! if ((affixflag == 0) || (strchr(Conf->AffixData[StopMiddle->affix], affixflag) != NULL))
return 1;
}
node = StopMiddle->node;
--- 466,472 ----
else if ((flag & StopMiddle->compoundflag) == 0)
return 0;
! if (isAffixFlagInUse(Conf, StopMiddle->affix, affixflag))
return 1;
}
node = StopMiddle->node;
***************
*** 394,400 **** NIAddAffix(IspellDict *Conf, int flag, char flagflags, const char *mask, const c
Affix = Conf->Affix + Conf->naffixes;
! if (strcmp(mask, ".") == 0)
{
Affix->issimple = 1;
Affix->isregis = 0;
--- 505,511 ----
Affix = Conf->Affix + Conf->naffixes;
! if (strcmp(mask, ".") == 0 || *mask == '\0')
{
Affix->issimple = 1;
Affix->isregis = 0;
***************
*** 403,409 **** NIAddAffix(IspellDict *Conf, int flag, char flagflags, const char *mask, const c
{
Affix->issimple = 0;
Affix->isregis = 1;
! RS_compile(&(Affix->reg.regis), (type == FF_SUFFIX) ? true : false,
*mask ? mask : VoidString);
}
else
--- 514,520 ----
{
Affix->issimple = 0;
Affix->isregis = 1;
! RS_compile(&(Affix->reg.regis), (type == FF_SUFFIX),
*mask ? mask : VoidString);
}
else
***************
*** 576,582 **** parse_affentry(char *str, char *mask, char *find, char *repl)
*pmask = *pfind = *prepl = '\0';
! return (*mask && (*find || *repl)) ? true : false;
}
static void
--- 687,693 ----
*pmask = *pfind = *prepl = '\0';
! return (*mask && (*find || *repl));
}
static void
***************
*** 595,604 **** addFlagValue(IspellDict *Conf, char *s, uint32 val)
(errcode(ERRCODE_CONFIG_FILE_ERROR),
errmsg("multibyte flag character is not allowed")));
! Conf->flagval[*(unsigned char *) s] = (unsigned char) val;
Conf->usecompound = true;
}
/*
* Import an affix file that follows MySpell or Hunspell format
*/
--- 706,763 ----
(errcode(ERRCODE_CONFIG_FILE_ERROR),
errmsg("multibyte flag character is not allowed")));
! Conf->flagval[decodeFlag(Conf, s, (char **)NULL)] = (unsigned char) val;
Conf->usecompound = true;
}
+ static int
+ getFlagValues(IspellDict *Conf, char *s)
+ {
+ uint32 flag = 0;
+ char *flagcur;
+ char *flagnext = 0;
+
+ flagcur = s;
+ while (*flagcur)
+ {
+ flag |= Conf->flagval[decodeFlag(Conf, flagcur, &flagnext)];
+ if (flagnext)
+ flagcur = flagnext;
+ else
+ break;
+ }
+
+ return flag;
+ }
+
+ /*
+ * Get flag set from "s".
+ *
+ * Returns flag set from AffixData array if AF parameter used (useFlagAliases is true).
+ * In this case "s" is alias for flag set.
+ *
+ * Otherwise returns "s".
+ */
+ static char *
+ getFlags(IspellDict *Conf, char *s)
+ {
+ int curaffix;
+ if (Conf->useFlagAliases)
+ {
+ curaffix = strtol(s, (char **)NULL, 10);
+ if (curaffix && curaffix <= Conf->nAffixData)
+ /*
+ * Do not substract 1 from curaffix
+ * because empty string was added in NIImportOOAffixes
+ */
+ return Conf->AffixData[curaffix];
+ else
+ return VoidString;
+ }
+ else
+ return s;
+ }
+
/*
* Import an affix file that follows MySpell or Hunspell format
*/
***************
*** 615,621 **** NIImportOOAffixes(IspellDict *Conf, const char *filename)
char repl[BUFSIZ],
*prepl;
bool isSuffix = false;
! int flag = 0;
char flagflags = 0;
tsearch_readline_state trst;
int scanread = 0;
--- 774,784 ----
char repl[BUFSIZ],
*prepl;
bool isSuffix = false;
! int naffix = 0,
! curaffix = 0;
! int flag = 0,
! flagprev = 0,
! sflaglen = 0;
char flagflags = 0;
tsearch_readline_state trst;
int scanread = 0;
***************
*** 625,630 **** NIImportOOAffixes(IspellDict *Conf, const char *filename)
--- 788,795 ----
/* read file to find any flag */
memset(Conf->flagval, 0, sizeof(Conf->flagval));
Conf->usecompound = false;
+ Conf->useFlagAliases = false;
+ Conf->flagMode = FM_CHAR;
if (!tsearch_readline_begin(&trst, filename))
ereport(ERROR,
***************
*** 672,681 **** NIImportOOAffixes(IspellDict *Conf, const char *filename)
while (*s && t_isspace(s))
s += pg_mblen(s);
! if (*s && STRNCMP(s, "default") != 0)
! ereport(ERROR,
(errcode(ERRCODE_CONFIG_FILE_ERROR),
! errmsg("Ispell dictionary supports only default flag value")));
}
pfree(recoded);
--- 837,853 ----
while (*s && t_isspace(s))
s += pg_mblen(s);
! if (*s)
! {
! if (STRNCMP(s, "long") == 0)
! Conf->flagMode = FM_LONG;
! else if (STRNCMP(s, "num") == 0)
! Conf->flagMode = FM_NUM;
! else if (STRNCMP(s, "default") != 0)
! ereport(ERROR,
(errcode(ERRCODE_CONFIG_FILE_ERROR),
! errmsg("Ispell dictionary supports only default, long and num flag value")));
! }
}
pfree(recoded);
***************
*** 695,725 **** NIImportOOAffixes(IspellDict *Conf, const char *filename)
if (*recoded == '\0' || t_isspace(recoded) || t_iseq(recoded, '#'))
goto nextline;
scanread = sscanf(recoded, scanbuf, type, sflag, find, repl, mask);
if (ptype)
pfree(ptype);
ptype = lowerstr_ctx(Conf, type);
if (scanread < 4 || (STRNCMP(ptype, "sfx") && STRNCMP(ptype, "pfx")))
goto nextline;
! if (scanread == 4)
{
! if (strlen(sflag) != 1)
! goto nextline;
! flag = *sflag;
! isSuffix = (STRNCMP(ptype, "sfx") == 0) ? true : false;
if (t_iseq(find, 'y') || t_iseq(find, 'Y'))
flagflags = FF_CROSSPRODUCT;
else
flagflags = 0;
}
else
{
char *ptr;
int aflg = 0;
! if (strlen(sflag) != 1 || flag != *sflag || flag == 0)
goto nextline;
prepl = lowerstr_ctx(Conf, repl);
/* affix flag */
--- 867,941 ----
if (*recoded == '\0' || t_isspace(recoded) || t_iseq(recoded, '#'))
goto nextline;
+ *find = *repl = *mask = '\0';
scanread = sscanf(recoded, scanbuf, type, sflag, find, repl, mask);
if (ptype)
pfree(ptype);
ptype = lowerstr_ctx(Conf, type);
+
+ /* First try to parse AF parameter (alias compression) */
+ if (STRNCMP(ptype, "af") == 0)
+ {
+ /* First line is the number of aliases */
+ if (!Conf->useFlagAliases)
+ {
+ Conf->useFlagAliases = true;
+ naffix = atoi(sflag);
+ if (naffix == 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_CONFIG_FILE_ERROR),
+ errmsg("invalid number of flag vector aliases")));
+
+ /* Also reserve place for empty flag set */
+ naffix++;
+
+ Conf->AffixData = (char **) palloc0(naffix * sizeof(char *));
+ Conf->lenAffixData = Conf->nAffixData = naffix;
+
+ /* Add empty flag set into AffixData */
+ Conf->AffixData[curaffix] = VoidString;
+ curaffix++;
+ }
+ /* Other lines is aliases */
+ else
+ {
+ if (curaffix < naffix)
+ {
+ Conf->AffixData[curaffix] = cpstrdup(Conf, sflag);
+ curaffix++;
+ }
+ }
+ goto nextline;
+ }
+ /* Else try to parse prefixes and suffixes */
if (scanread < 4 || (STRNCMP(ptype, "sfx") && STRNCMP(ptype, "pfx")))
goto nextline;
! sflaglen = strlen(sflag);
! if (sflaglen == 0
! || (sflaglen > 1 && Conf->flagMode == FM_CHAR)
! || (sflaglen > 2 && Conf->flagMode == FM_LONG))
! goto nextline;
! flag = decodeFlag(Conf, sflag, (char **)NULL);
!
! /* Affix header */
! if (flag != flagprev)
{
! flagprev = flag;
! isSuffix = (STRNCMP(ptype, "sfx") == 0);
if (t_iseq(find, 'y') || t_iseq(find, 'Y'))
flagflags = FF_CROSSPRODUCT;
else
flagflags = 0;
}
+ /* Affix fields */
else
{
char *ptr;
int aflg = 0;
! if (flag == 0)
goto nextline;
prepl = lowerstr_ctx(Conf, repl);
/* affix flag */
***************
*** 727,737 **** NIImportOOAffixes(IspellDict *Conf, const char *filename)
{
*ptr = '\0';
ptr = repl + (ptr - prepl) + 1;
! while (*ptr)
! {
! aflg |= Conf->flagval[*(unsigned char *) ptr];
! ptr++;
! }
}
pfind = lowerstr_ctx(Conf, find);
pmask = lowerstr_ctx(Conf, mask);
--- 943,949 ----
{
*ptr = '\0';
ptr = repl + (ptr - prepl) + 1;
! aflg |= getFlagValues(Conf, getFlags(Conf, ptr));
}
pfind = lowerstr_ctx(Conf, find);
pmask = lowerstr_ctx(Conf, mask);
***************
*** 789,794 **** NIImportAffixes(IspellDict *Conf, const char *filename)
--- 1001,1008 ----
memset(Conf->flagval, 0, sizeof(Conf->flagval));
Conf->usecompound = false;
+ Conf->useFlagAliases = false;
+ Conf->flagMode = FM_CHAR;
while ((recoded = tsearch_readline(&trst)) != NULL)
{
***************
*** 931,946 **** MergeAffix(IspellDict *Conf, int a1, int a2)
static uint32
makeCompoundFlags(IspellDict *Conf, int affix)
{
! uint32 flag = 0;
! char *str = Conf->AffixData[affix];
!
! while (str && *str)
! {
! flag |= Conf->flagval[*(unsigned char *) str];
! str++;
! }
!
! return (flag & FF_DICTFLAGMASK);
}
static SPNode *
--- 1145,1152 ----
static uint32
makeCompoundFlags(IspellDict *Conf, int affix)
{
! char *str = Conf->AffixData[affix];
! return (getFlagValues(Conf, str) & FF_DICTFLAGMASK);
}
static SPNode *
***************
*** 954,960 **** mkSPNode(IspellDict *Conf, int low, int high, int level)
int lownew = low;
for (i = low; i < high; i++)
! if (Conf->Spell[i]->p.d.len > level && lastchar != Conf->Spell[i]->word[level])
{
nchar++;
lastchar = Conf->Spell[i]->word[level];
--- 1160,1166 ----
int lownew = low;
for (i = low; i < high; i++)
! if (Conf->Spell[i]->d.len > level && lastchar != Conf->Spell[i]->word[level])
{
nchar++;
lastchar = Conf->Spell[i]->word[level];
***************
*** 969,975 **** mkSPNode(IspellDict *Conf, int low, int high, int level)
lastchar = '\0';
for (i = low; i < high; i++)
! if (Conf->Spell[i]->p.d.len > level)
{
if (lastchar != Conf->Spell[i]->word[level])
{
--- 1175,1181 ----
lastchar = '\0';
for (i = low; i < high; i++)
! if (Conf->Spell[i]->d.len > level)
{
if (lastchar != Conf->Spell[i]->word[level])
{
***************
*** 982,992 **** mkSPNode(IspellDict *Conf, int low, int high, int level)
lastchar = Conf->Spell[i]->word[level];
}
data->val = ((uint8 *) (Conf->Spell[i]->word))[level];
! if (Conf->Spell[i]->p.d.len == level + 1)
{
bool clearCompoundOnly = false;
! if (data->isword && data->affix != Conf->Spell[i]->p.d.affix)
{
/*
* MergeAffix called a few times. If one of word is
--- 1188,1198 ----
lastchar = Conf->Spell[i]->word[level];
}
data->val = ((uint8 *) (Conf->Spell[i]->word))[level];
! if (Conf->Spell[i]->d.len == level + 1)
{
bool clearCompoundOnly = false;
! if (data->isword && data->affix != Conf->Spell[i]->d.affix)
{
/*
* MergeAffix called a few times. If one of word is
***************
*** 995,1006 **** mkSPNode(IspellDict *Conf, int low, int high, int level)
*/
clearCompoundOnly = (FF_COMPOUNDONLY & data->compoundflag
! & makeCompoundFlags(Conf, Conf->Spell[i]->p.d.affix))
? false : true;
! data->affix = MergeAffix(Conf, data->affix, Conf->Spell[i]->p.d.affix);
}
else
! data->affix = Conf->Spell[i]->p.d.affix;
data->isword = 1;
data->compoundflag = makeCompoundFlags(Conf, data->affix);
--- 1201,1212 ----
*/
clearCompoundOnly = (FF_COMPOUNDONLY & data->compoundflag
! & makeCompoundFlags(Conf, Conf->Spell[i]->d.affix))
? false : true;
! data->affix = MergeAffix(Conf, data->affix, Conf->Spell[i]->d.affix);
}
else
! data->affix = Conf->Spell[i]->d.affix;
data->isword = 1;
data->compoundflag = makeCompoundFlags(Conf, data->affix);
***************
*** 1032,1070 **** NISortDictionary(IspellDict *Conf)
/* compress affixes */
! /* Count the number of different flags used in the dictionary */
!
! qsort((void *) Conf->Spell, Conf->nspell, sizeof(SPELL *), cmpspellaffix);
!
! naffix = 0;
! for (i = 0; i < Conf->nspell; i++)
! {
! if (i == 0 || strncmp(Conf->Spell[i]->p.flag, Conf->Spell[i - 1]->p.flag, MAXFLAGLEN))
! naffix++;
! }
!
! /*
! * Fill in Conf->AffixData with the affixes that were used in the
! * dictionary. Replace textual flag-field of Conf->Spell entries with
! * indexes into Conf->AffixData array.
*/
! Conf->AffixData = (char **) palloc0(naffix * sizeof(char *));
!
! curaffix = -1;
! for (i = 0; i < Conf->nspell; i++)
{
! if (i == 0 || strncmp(Conf->Spell[i]->p.flag, Conf->AffixData[curaffix], MAXFLAGLEN))
{
! curaffix++;
! Assert(curaffix < naffix);
! Conf->AffixData[curaffix] = cpstrdup(Conf, Conf->Spell[i]->p.flag);
}
-
- Conf->Spell[i]->p.d.affix = curaffix;
- Conf->Spell[i]->p.d.len = strlen(Conf->Spell[i]->word);
}
! Conf->lenAffixData = Conf->nAffixData = naffix;
qsort((void *) Conf->Spell, Conf->nspell, sizeof(SPELL *), cmpspell);
Conf->Dictionary = mkSPNode(Conf, 0, Conf->nspell, 0);
--- 1238,1294 ----
/* compress affixes */
! /* If we use flag aliases then we need to use Conf->AffixData filled in NIImportOOAffixes.
! * If Conf->Spell[i]->flag is empty, then get empty value of Conf->AffixData (0 index)
*/
! if (Conf->useFlagAliases)
{
! for (i = 0; i < Conf->nspell; i++)
{
! curaffix = strtol(Conf->Spell[i]->flag, (char **)NULL, 10);
! if (curaffix && curaffix <= Conf->nAffixData)
! Conf->Spell[i]->d.affix = curaffix;
! else
! Conf->Spell[i]->d.affix = 0;
! Conf->Spell[i]->d.len = strlen(Conf->Spell[i]->word);
}
}
+ /* Otherwise fill Conf->AffixData here */
+ else
+ {
+ /* Count the number of different flags used in the dictionary */
+ qsort((void *) Conf->Spell, Conf->nspell, sizeof(SPELL *), cmpspellaffix);
+
+ naffix = 0;
+ for (i = 0; i < Conf->nspell; i++)
+ {
+ if (i == 0 || strcmp(Conf->Spell[i]->flag, Conf->Spell[i - 1]->flag))
+ naffix++;
+ }
! /*
! * Fill in Conf->AffixData with the affixes that were used in the
! * dictionary. Replace textual flag-field of Conf->Spell entries with
! * indexes into Conf->AffixData array.
! */
! Conf->AffixData = (char **) palloc0(naffix * sizeof(char *));
!
! curaffix = -1;
! for (i = 0; i < Conf->nspell; i++)
! {
! if (i == 0 || strcmp(Conf->Spell[i]->flag, Conf->AffixData[curaffix]))
! {
! curaffix++;
! Assert(curaffix < naffix);
! Conf->AffixData[curaffix] = cpstrdup(Conf, Conf->Spell[i]->flag);
! }
!
! Conf->Spell[i]->d.affix = curaffix;
! Conf->Spell[i]->d.len = strlen(Conf->Spell[i]->word);
! }
!
! Conf->lenAffixData = Conf->nAffixData = naffix;
! }
qsort((void *) Conf->Spell, Conf->nspell, sizeof(SPELL *), cmpspell);
Conf->Dictionary = mkSPNode(Conf, 0, Conf->nspell, 0);
***************
*** 1185,1196 **** mkVoidAffix(IspellDict *Conf, bool issuffix, int startsuffix)
}
static bool
! isAffixInUse(IspellDict *Conf, char flag)
{
int i;
for (i = 0; i < Conf->nAffixData; i++)
! if (strchr(Conf->AffixData[i], flag) != NULL)
return true;
return false;
--- 1409,1420 ----
}
static bool
! isAffixInUse(IspellDict *Conf, int flag)
{
int i;
for (i = 0; i < Conf->nAffixData; i++)
! if (isAffixFlagInUse(Conf, i, flag))
return true;
return false;
***************
*** 1219,1225 **** NISortAffixes(IspellDict *Conf)
firstsuffix = i;
if ((Affix->flagflags & FF_COMPOUNDFLAG) && Affix->replen > 0 &&
! isAffixInUse(Conf, (char) Affix->flag))
{
if (ptr == Conf->CompoundAffix ||
ptr->issuffix != (ptr - 1)->issuffix ||
--- 1443,1449 ----
firstsuffix = i;
if ((Affix->flagflags & FF_COMPOUNDFLAG) && Affix->replen > 0 &&
! isAffixInUse(Conf, Affix->flag))
{
if (ptr == Conf->CompoundAffix ||
ptr->issuffix != (ptr - 1)->issuffix ||
***************
*** 1230,1236 **** NISortAffixes(IspellDict *Conf)
/* leave only unique and minimals suffixes */
ptr->affix = Affix->repl;
ptr->len = Affix->replen;
! ptr->issuffix = (Affix->type == FF_SUFFIX) ? true : false;
ptr++;
}
}
--- 1454,1460 ----
/* leave only unique and minimals suffixes */
ptr->affix = Affix->repl;
ptr->len = Affix->replen;
! ptr->issuffix = (Affix->type == FF_SUFFIX);
ptr++;
}
}
***************
*** 1685,1691 **** SplitToVariants(IspellDict *Conf, SPNode *snode, SplitVar *orig, char *word, int
if (StopLow < StopHigh)
{
! if (level == FF_COMPOUNDBEGIN)
compoundflag = FF_COMPOUNDBEGIN;
else if (level == wordlen - 1)
compoundflag = FF_COMPOUNDLAST;
--- 1909,1915 ----
if (StopLow < StopHigh)
{
! if (startpos == 0)
compoundflag = FF_COMPOUNDBEGIN;
else if (level == wordlen - 1)
compoundflag = FF_COMPOUNDLAST;
*** a/src/include/tsearch/dicts/spell.h
--- b/src/include/tsearch/dicts/spell.h
***************
*** 57,75 **** typedef struct SPNode
typedef struct spell_struct
{
! union
{
! /*
! * flag is filled in by NIImportDictionary. After NISortDictionary, d
! * is valid and flag is invalid.
! */
! char flag[MAXFLAGLEN];
! struct
! {
! int affix;
! int len;
! } d;
! } p;
char word[FLEXIBLE_ARRAY_MEMBER];
} SPELL;
--- 57,72 ----
typedef struct spell_struct
{
! struct
{
! int affix;
! int len;
! } d;
! /*
! * flag is filled in by NIImportDictionary. After NISortDictionary, d
! * is used instead of flag.
! */
! char *flag;
char word[FLEXIBLE_ARRAY_MEMBER];
} SPELL;
***************
*** 77,83 **** typedef struct spell_struct
typedef struct aff_struct
{
! uint32 flag:8,
type:1,
flagflags:7,
issimple:1,
--- 74,80 ----
typedef struct aff_struct
{
! uint32 flag:16,
type:1,
flagflags:7,
issimple:1,
***************
*** 132,137 **** typedef struct
--- 129,141 ----
bool issuffix;
} CMPDAffix;
+ typedef enum
+ {
+ FM_CHAR,
+ FM_LONG,
+ FM_NUM
+ } FlagMode;
+
typedef struct
{
int maffixes;
***************
*** 145,155 **** typedef struct
char **AffixData;
int lenAffixData;
int nAffixData;
CMPDAffix *CompoundAffix;
! unsigned char flagval[256];
bool usecompound;
/*
* Remaining fields are only used during dictionary construction; they are
--- 149,161 ----
char **AffixData;
int lenAffixData;
int nAffixData;
+ bool useFlagAliases;
CMPDAffix *CompoundAffix;
! unsigned char flagval[65000];
bool usecompound;
+ FlagMode flagMode;
/*
* Remaining fields are only used during dictionary construction; they are
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers