Hello Tomas,

On 2019/10/13 10:26, Tomas Vondra wrote:
over in pgsql-bugs [1] we got a report about CREATE TEXT SEARCH
DICTIONARY causing segfaults on 12.0. Simply running

    CREATE TEXT SEARCH DICTIONARY hunspell_num (Template=ispell,
    DictFile=hunspell_sample_num, AffFile=hunspell_sample_long);

does trigger a crash, 100% of the time. The crash was reported on 12.0,
but it's in fact present since 9.6.

On 9.5 the example does not work, because that version does not (a)
include the hunspell dictionaries used in the example, and (b) it does
not support long flags. So even after copying the dictionaries and
tweaking them a bit it still passes without a crash.

This crash is not because of long flags, but because of aliases (more thoughts below).

Looking at the commit history of spell.c, there seems to be a bunch of
commits in 2016 (e.g. f4ceed6ceba3) touching exactly this part of the
code (hunspell), and it also correlates quite nicely with the affected
branches (9.6+). So my best guess is it's a bug in those changes.

Yeah, there was a lot changes.

So it simply grabs whatever it finds in the dict file, parses it and
then (later) we use it as index to access the AffixData array, even if
the value is way out of bounds.

Yes, we enter this code if an affix file defines aliases (AF parameter). AffixData array is used to store those aliases.

More about hunspell format you can find here:
https://linux.die.net/man/4/hunspell

In the example we have the following aliases:
AF 11
AF cZ           #1
AF cL           #2
...
AF sB           #11

And in the dictionary file we should use their indexes (from 1 to 11). These aliases defines set of flags and in the dict file we can use only single index:
book/3
book/11

but not:
book/3,4
book/2,11

I added checking of this last case in the attached patch. PostgreSQL will raise an error if it sees non-numeric and non-whitespace character after the index.

Aliases can be used with all flag types: 'default' (i.e. FM_CHAR), 'long', and if I'm not mistaken 'num'.

So I think we need some sort of cross-check here. We certainly need to
make NISortDictionary() check the affix value is within AffixData
bounds, and error out when the index is non-sensical (maybe negative
and/or exceeding nAffixData).

I agree, I attached the patch which do this. I also added couple asserts, tests and fixed condition in getAffixFlagSet():

-               if (curaffix > 0 && curaffix <= Conf->nAffixData)
+               if (curaffix > 0 && curaffix < Conf->nAffixData)

I think it could be a bug, because curaffix can't be equal to Conf->nAffixData.

Maybe there's a simple way to check if the affix/dict files match.

I'm not sure how to properly fix this either. The only thing we could check is commas in affix flags in a dict file:

book/302,301,202,303

FM_CHAR and FM_LONG dictionaries can't have commas. They should have the following affix flags:

book/sGsJpUsS   # 4 affixes for FM_LONG
book/GJUS       # 4 affixes for FM_CHAR

But I guess they could have numbers in flags (as help says "Set flag type. Default type is the extended ASCII (8-bit) character.") and other non alphanumeric characters (as some language dictionaries have):

book/s1s2s3s4   # 4 affixes for FM_LONG

--
Artur
diff --git a/src/backend/tsearch/spell.c b/src/backend/tsearch/spell.c
index c715f06b8e..1b8766659c 100644
--- a/src/backend/tsearch/spell.c
+++ b/src/backend/tsearch/spell.c
@@ -458,6 +458,8 @@ IsAffixFlagInUse(IspellDict *Conf, int affix, const char 
*affixflag)
        if (*affixflag == 0)
                return true;
 
+       Assert(affix < Conf->nAffixData);
+
        flagcur = Conf->AffixData[affix];
 
        while (*flagcur)
@@ -1160,13 +1162,17 @@ getAffixFlagSet(IspellDict *Conf, char *s)
                                        (errcode(ERRCODE_CONFIG_FILE_ERROR),
                                         errmsg("invalid affix alias \"%s\"", 
s)));
 
-               if (curaffix > 0 && curaffix <= Conf->nAffixData)
+               if (curaffix > 0 && curaffix < Conf->nAffixData)
 
                        /*
                         * Do not subtract 1 from curaffix because empty string 
was added
                         * in NIImportOOAffixes
                         */
                        return Conf->AffixData[curaffix];
+               else if (curaffix > Conf->nAffixData)
+                       ereport(ERROR,
+                                       (errcode(ERRCODE_CONFIG_FILE_ERROR),
+                                        errmsg("invalid affix alias \"%s\"", 
s)));
                else
                        return VoidString;
        }
@@ -1561,6 +1567,8 @@ MergeAffix(IspellDict *Conf, int a1, int a2)
 {
        char      **ptr;
 
+       Assert(a1 < Conf->nAffixData && a2 < Conf->nAffixData);
+
        /* Do not merge affix flags if one of affix flags is empty */
        if (*Conf->AffixData[a1] == '\0')
                return a2;
@@ -1603,9 +1611,10 @@ MergeAffix(IspellDict *Conf, int a1, int a2)
 static uint32
 makeCompoundFlags(IspellDict *Conf, int affix)
 {
-       char       *str = Conf->AffixData[affix];
+       Assert(affix < Conf->nAffixData);
 
-       return (getCompoundAffixFlagValue(Conf, str) & FF_COMPOUNDFLAGMASK);
+       return (getCompoundAffixFlagValue(Conf, Conf->AffixData[affix]) &
+                       FF_COMPOUNDFLAGMASK);
 }
 
 /*
@@ -1725,6 +1734,16 @@ NISortDictionary(IspellDict *Conf)
                                                        
(errcode(ERRCODE_CONFIG_FILE_ERROR),
                                                         errmsg("invalid affix 
alias \"%s\"",
                                                                        
Conf->Spell[i]->p.flag)));
+                               if (curaffix < 0 || curaffix >= 
Conf->nAffixData)
+                                       ereport(ERROR,
+                                                       
(errcode(ERRCODE_CONFIG_FILE_ERROR),
+                                                        errmsg("invalid affix 
alias \"%s\"",
+                                                                       
Conf->Spell[i]->p.flag)));
+                               if (*end != '\0' && !t_isdigit(end) && 
!t_isspace(end))
+                                       ereport(ERROR,
+                                                       
(errcode(ERRCODE_CONFIG_FILE_ERROR),
+                                                        errmsg("invalid affix 
alias \"%s\"",
+                                                                       
Conf->Spell[i]->p.flag)));
                        }
                        else
                        {
diff --git a/src/test/regress/expected/tsdicts.out 
b/src/test/regress/expected/tsdicts.out
index 2524ec2768..5a927be948 100644
--- a/src/test/regress/expected/tsdicts.out
+++ b/src/test/regress/expected/tsdicts.out
@@ -413,6 +413,40 @@ SELECT ts_lexize('hunspell_num', 'footballyklubber');
  {foot,ball,klubber}
 (1 row)
 
+-- Test suitability of affix and dict files
+CREATE TEXT SEARCH DICTIONARY hunspell_err (
+                                               Template=ispell,
+                                               DictFile=ispell_sample,
+                                               AffFile=hunspell_sample_long
+);
+ERROR:  invalid affix alias "GJUS"
+CREATE TEXT SEARCH DICTIONARY hunspell_err (
+                                               Template=ispell,
+                                               DictFile=ispell_sample,
+                                               AffFile=hunspell_sample_num
+);
+ERROR:  invalid affix flag "SZ\"
+CREATE TEXT SEARCH DICTIONARY hunspell_invalid_1 (
+                                               Template=ispell,
+                                               DictFile=hunspell_sample_long,
+                                               AffFile=ispell_sample
+);
+CREATE TEXT SEARCH DICTIONARY hunspell_invalid_2 (
+                                               Template=ispell,
+                                               DictFile=hunspell_sample_long,
+                                               AffFile=hunspell_sample_num
+);
+CREATE TEXT SEARCH DICTIONARY hunspell_invalid_3 (
+                                               Template=ispell,
+                                               DictFile=hunspell_sample_num,
+                                               AffFile=ispell_sample
+);
+CREATE TEXT SEARCH DICTIONARY hunspell_err (
+                                               Template=ispell,
+                                               DictFile=hunspell_sample_num,
+                                               AffFile=hunspell_sample_long
+);
+ERROR:  invalid affix alias "302,301,202,303"
 -- Synonym dictionary
 CREATE TEXT SEARCH DICTIONARY synonym (
                                                Template=synonym,
diff --git a/src/test/regress/sql/tsdicts.sql b/src/test/regress/sql/tsdicts.sql
index 60906f6549..908e675501 100644
--- a/src/test/regress/sql/tsdicts.sql
+++ b/src/test/regress/sql/tsdicts.sql
@@ -101,6 +101,43 @@ SELECT ts_lexize('hunspell_num', 'footballklubber');
 SELECT ts_lexize('hunspell_num', 'ballyklubber');
 SELECT ts_lexize('hunspell_num', 'footballyklubber');
 
+-- Test suitability of affix and dict files
+CREATE TEXT SEARCH DICTIONARY hunspell_err (
+                                               Template=ispell,
+                                               DictFile=ispell_sample,
+                                               AffFile=hunspell_sample_long
+);
+
+CREATE TEXT SEARCH DICTIONARY hunspell_err (
+                                               Template=ispell,
+                                               DictFile=ispell_sample,
+                                               AffFile=hunspell_sample_num
+);
+
+CREATE TEXT SEARCH DICTIONARY hunspell_invalid_1 (
+                                               Template=ispell,
+                                               DictFile=hunspell_sample_long,
+                                               AffFile=ispell_sample
+);
+
+CREATE TEXT SEARCH DICTIONARY hunspell_invalid_2 (
+                                               Template=ispell,
+                                               DictFile=hunspell_sample_long,
+                                               AffFile=hunspell_sample_num
+);
+
+CREATE TEXT SEARCH DICTIONARY hunspell_invalid_3 (
+                                               Template=ispell,
+                                               DictFile=hunspell_sample_num,
+                                               AffFile=ispell_sample
+);
+
+CREATE TEXT SEARCH DICTIONARY hunspell_err (
+                                               Template=ispell,
+                                               DictFile=hunspell_sample_num,
+                                               AffFile=hunspell_sample_long
+);
+
 -- Synonym dictionary
 CREATE TEXT SEARCH DICTIONARY synonym (
                                                Template=synonym,

Reply via email to