Hello all,

as recently mentioned on pg-general@, I am currently working on making
installed myspell/unspell dictionary packages (which install
themselves in /usr/share/myspell/dicts, mostly LATIN encoded)
available to PostgreSQL's tsearch/word stemming in Debian/Ubuntu.

So far I wrote the postgresql-common infrastructure to mangle these
dictionary/affix files to become palatable for PostgreSQL (recoding to
UTF-8, renaming to lowercase, changing file suffix) and install them
into /var/cache/postgresql/dicts/ whenever a {hun,my}spell-* package
is installed or updated.

The remaining bit is teaching postgresql to actually look into
/var/cache/postgresql/dicts/ if it does not find a matching
dictionary/affix file in ${sharepath}/tsearch_data/.

The reasons why I'm not using ${sharepath}/tsearch_data/ in the first
place are that

 - it's autogenerated data, as opposed to files statically shipped in
   a package

 - I do not want to conflict to/overwrite files which the admin
   manually put there.

I created an initial demo patch which provides this fallback. It works
great, it passes my test cases (which set up tsearch full text search
and stemming handling) and is pretty simple, too. 

However, the path is hardcoded so far, which is of course bad for
upstream inclusion. So this should either become a ./configure option
--with-tsearch-dict-fallback=path (or similar), or even a new optional
configuration parameter for postgresql.conf.

However, before I work on that, I'd like to collect some opinions
about the general idea, and whether you prefer autoconf option or
postgresql.conf, or whether you wouldn't accept it at all?

Thanks a lot in advance!

Martin

-- 
Martin Pitt                        | http://www.piware.de
Ubuntu Developer (www.ubuntu.com)  | Debian Developer  (www.debian.org)
# Description: If a tsearch/stem dictionary is not found in /usr/share/postgresql/VERSION/tsearch_data/, fall back to /var/cache/postgresql/dicts/, where postgresql-common creates them from system directories.
# Ubuntu: https://launchpad.net/bugs/301770
--- ./src/backend/tsearch/ts_utils.c.orig	2008-12-01 06:47:28.000000000 -0800
+++ ./src/backend/tsearch/ts_utils.c	2008-12-01 07:05:14.000000000 -0800
@@ -15,6 +15,7 @@
 #include "postgres.h"
 
 #include <ctype.h>
+#include <unistd.h>
 
 #include "miscadmin.h"
 #include "tsearch/ts_locale.h"
@@ -36,7 +37,7 @@
 							const char *extension)
 {
 	char		sharepath[MAXPGPATH];
-	char	   *result;
+	char	   *result, *system_result;
 
 	/*
 	 * We limit the basename to contain a-z, 0-9, and underscores.	This may
@@ -58,6 +59,21 @@
 	snprintf(result, MAXPGPATH, "%s/tsearch_data/%s.%s",
 			 sharepath, basename, extension);
 
+	/* fall back to /var/cache/postgresql/dicts/ */
+	if (access(result, R_OK) != 0)
+	{
+		system_result = palloc(MAXPGPATH);
+		snprintf(system_result, MAXPGPATH, "/var/cache/postgresql/dicts/%s.%s",
+			 basename, extension);
+		if (access(system_result, R_OK) == 0)
+		{
+			pfree(result);
+			result = system_result;
+		} 
+		else
+			pfree(system_result);
+	}
+
 	return result;
 }
 

Attachment: signature.asc
Description: Digital signature

Reply via email to