Hello, I recently got a server crash (bug #17583 [1]) caused by a stack
overflow.
Tom Lane and Richard Guo, in a discussion of this bug, suggested that there
could be more such places.
Therefore, Alexander Lakhin and I decided to deal with this issue and Alexander
developed a methodology. We processed src/backend/*/*.c with "clang -emit-llvm
... | opt -analyze -print-calgraph" to find all the functions that call
themselves directly. I checked each of them for features that protect against
stack overflows.
We analyzed 4 catalogs: regex, tsearch, snowball and adt.
Firstly, we decided to test the regex catalog functions and found 6 of them
that lack the check_stach_depth() call.
zaptreesubs
markst
next
nfatree
numst
repeat
We have tried to exploit the recursion in the function zaptreesubs():
select regexp_matches('a' || repeat(' a', 11000), '(.)(' || repeat(' \1',
11000) || ')?');
ERROR: invalid regular expression: regular expression is too complex
repeat():
select regexp_match('abc01234xyz',repeat('a{0,2}',100001));
ERROR: invalid regular expression: regular expression is too complex
numst():
select regexp_match('abc01234xyz',repeat('(.)\1e',100001));
ERROR: invalid regular expression: regular expression is too complex
markst():
markst is called in the code after v->tree = parse(...);
it is necessary that the tree be successfully parsed, but with a nesting level
of about 100,000 this will not work - stack protection will work during parsing
and v->ntree = numst(...); is also there.
next():
we were able to crash the server with the following query:
(printf "SELECT regexp_match('abc', 'a"; for ((i=1;i<1000000;i++)); do printf
"(?#)"; done; printf "b')" ) | psql
Secondly, we have tried to exploit the recursion in the adt catalog functions
and Alexander was able to crash the server with the following query:
regex_selectivity_sub():
SELECT * FROM pg_proc WHERE proname ~ ('(a' || repeat('|', 200000) || 'b)');
And this query:
(n=100000;
printf "SELECT polygon '((0,0),(0,1000000))' <@ polygon '((-200000,1000000),";
for ((i=1;i<$n;i++)); do printf "(100000,$(( 300000 + $i))),(-100000,$((800000
+ $i))),"; done;
printf "(200000,900000),(200000,0))';"
) | psql
Thirdly, the snowball catalog, Alexander has tried to exploit the recursion in
the r_stem_suffix_chain_before_ki function and crashed a server using this
query:
r_stem_suffix_chain_before_ki():
SELECT ts_lexize('turkish_stem', repeat('lerdeki', 1000000));
The last one is the tsearch catalog. We have found 4 functions that didn't have
check_stach_depth() function:
SplitToVariants
mkANode
mkSPNode
LexizeExec
We have tried to exploit the recursion in the SplitToVariants function and
Alexander crashed a server using this:
SplitToVariants():
CREATE TEXT SEARCH DICTIONARY ispell (Template=ispell,
DictFile=ispell_sample,AffFile=ispell_sample);
SELECT ts_lexize('ispell', repeat('bally', 10000));
After trying to exploit the recursion in the LexizeExec function Alexander made
this conlusion:
LexizeExec has two branches "ld->curDictId == InvalidOid" (usual mode) and
"ld->curDictId != InvalidOid" (multiword mode) - we start with the first one,
then make recursive call to switch to the multiword mode, but then we return to
the usual mode again.
mkANode and mkSPNode deal with the dictionary structs, not with user-supplied
data, so we believe these functions are not vulnerable.
[1]
https://www.postgresql.org/message-id/flat/CAMbWs499ytQiH4mLMhRxRWP-iEUz3-DSinpAD-cUCtVo_23Wtg%40mail.gmail.com#03ad703cf4bc8d28ccba69913e1e8106