Hello all,

as recently mentioned on pg-general@, I am currently working on making installed myspell/hunspell dictionary packages (which install themselves in /usr/share/myspell/dicts, mostly LATIN encoded) available to PostgreSQL's tsearch/word stemming in Debian/Ubuntu.
So far I wrote the postgresql-common infrastructure to mangle these dictionary/affix files to become palatable for PostgreSQL (recoding to UTF-8, renaming to lowercase, changing the file suffix) and install them into /var/cache/postgresql/dicts/ whenever a {hun,my}spell-* package is installed or updated.

The remaining bit is teaching PostgreSQL to actually look into /var/cache/postgresql/dicts/ if it does not find a matching dictionary/affix file in ${sharepath}/tsearch_data/.

The reasons why I'm not using ${sharepath}/tsearch_data/ in the first place are that

 - it's autogenerated data, as opposed to files statically shipped in a package
 - I do not want to conflict with or overwrite files which the admin manually put there.

I created an initial demo patch which provides this fallback. It works great, it passes my test cases (which set up tsearch full text search and stemming handling), and it is pretty simple, too.

However, the path is hardcoded so far, which is of course bad for upstream inclusion. So this should either become a ./configure option --with-tsearch-dict-fallback=path (or similar), or even a new optional configuration parameter for postgresql.conf.

However, before I work on that, I'd like to collect some opinions about the general idea, and whether you prefer an autoconf option or a postgresql.conf parameter, or whether you wouldn't accept it at all.

Thanks a lot in advance!

Martin

-- 
Martin Pitt                        | http://www.piware.de
Ubuntu Developer (www.ubuntu.com)  | Debian Developer (www.debian.org)
# Description: If a tsearch/stem dictionary is not found in /usr/share/postgresql/VERSION/tsearch_data/, fall back to /var/cache/postgresql/dicts/, where postgresql-common creates them from system directories.
# Ubuntu: https://launchpad.net/bugs/301770
--- ./src/backend/tsearch/ts_utils.c.orig	2008-12-01 06:47:28.000000000 -0800
+++ ./src/backend/tsearch/ts_utils.c	2008-12-01 07:05:14.000000000 -0800
@@ -15,6 +15,7 @@
 #include "postgres.h"
 
 #include <ctype.h>
+#include <unistd.h>
 
 #include "miscadmin.h"
 #include "tsearch/ts_locale.h"
@@ -36,7 +37,7 @@
 						 const char *extension)
 {
 	char		sharepath[MAXPGPATH];
-	char	   *result;
+	char	   *result, *system_result;
 
 	/*
 	 * We limit the basename to contain a-z, 0-9, and underscores.  This may
@@ -58,6 +59,21 @@
 	snprintf(result, MAXPGPATH, "%s/tsearch_data/%s.%s",
 			 sharepath, basename, extension);
 
+	/* fall back to /var/cache/postgresql/dicts/ */
+	if (access(result, R_OK) != 0)
+	{
+		system_result = palloc(MAXPGPATH);
+		snprintf(system_result, MAXPGPATH, "/var/cache/postgresql/dicts/%s.%s",
+			 basename, extension);
+		if (access(system_result, R_OK) == 0)
+		{
+			pfree(result);
+			result = system_result;
+		}
+		else
+			pfree(system_result);
+	}
+
 	return result;
 }
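
To make the non-hardcoded variant easier to discuss, here is a rough standalone sketch of the same lookup with the fallback directory passed in as a parameter, roughly what a ./configure option or a postgresql.conf parameter would end up feeding in. This is not PostgreSQL code: resolve_dict_path(), build_dict_path() and DICT_PATH_MAX are hypothetical names, malloc/free stand in for palloc/pfree so that it compiles on its own, and it assumes the basename has already been validated by the caller.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define DICT_PATH_MAX 1024		/* hypothetical stand-in for MAXPGPATH */

/*
 * Write "<dir>/<basename>.<extension>" into buf.
 * Returns 0 on success, -1 if the path would not fit.
 */
static int
build_dict_path(char *buf, size_t buflen, const char *dir,
				const char *basename, const char *extension)
{
	int			n = snprintf(buf, buflen, "%s/%s.%s", dir, basename, extension);

	return (n < 0 || (size_t) n >= buflen) ? -1 : 0;
}

/*
 * Look for the file in primary_dir first.  If it is not readable there and
 * fallback_dir is set and holds a readable copy, return the fallback path.
 * Otherwise return the primary path, so that a later "file not found" error
 * still names the usual location.
 *
 * The result is malloc'd and must be freed by the caller; NULL means the
 * path was too long or memory ran out.
 */
char *
resolve_dict_path(const char *primary_dir, const char *fallback_dir,
				  const char *basename, const char *extension)
{
	char		fallback[DICT_PATH_MAX];
	char	   *result;

	assert(primary_dir != NULL && basename != NULL && extension != NULL);

	result = malloc(DICT_PATH_MAX);
	if (result == NULL)
		return NULL;

	if (build_dict_path(result, DICT_PATH_MAX, primary_dir,
						basename, extension) != 0)
	{
		free(result);
		return NULL;
	}

	/* Only consult the fallback when the primary file is missing/unreadable */
	if (access(result, R_OK) != 0 &&
		fallback_dir != NULL && fallback_dir[0] != '\0' &&
		build_dict_path(fallback, sizeof(fallback), fallback_dir,
						basename, extension) == 0 &&
		access(fallback, R_OK) == 0)
	{
		/* Both buffers are DICT_PATH_MAX bytes, so this copy cannot overflow */
		strcpy(result, fallback);
	}

	return result;
}
```

With this shape, passing NULL or an empty string as fallback_dir would give exactly the current upstream behaviour, so the fallback could stay disabled unless a distribution or admin sets it.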