Hello,

I found a page, http://www.genesys-e.org/jwalter//mix4win.htm, which has a section >>Emulation of mmap/munmap<<. Could this be a solution?
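To make the idea concrete, here is a minimal sketch of what a thin portability layer over anonymous mappings might look like. It is only an illustration under stated assumptions: the names region_alloc() and region_free() are hypothetical and come neither from the patch nor from the linked page, and the Windows branch (a pagefile-backed mapping via CreateFileMapping/MapViewOfFile) has not been tested. It covers allocation only. On Unix the MAP_SHARED region is inherited by children after fork(), while on Windows sharing between backends would additionally need something like a named mapping or an inherited handle.

```c
/* Ask glibc to expose MAP_ANONYMOUS even under strict -std= modes. */
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>

#ifdef _WIN32
#include <windows.h>

/* Hypothetical helper: map `size` bytes of pagefile-backed memory. */
static void *region_alloc(size_t size)
{
    ULONGLONG sz = (ULONGLONG) size;
    HANDLE    h;
    void     *p;

    /* INVALID_HANDLE_VALUE means "no file": the mapping is backed by the pagefile. */
    h = CreateFileMapping(INVALID_HANDLE_VALUE, NULL, PAGE_READWRITE,
                          (DWORD) (sz >> 32), (DWORD) (sz & 0xFFFFFFFF), NULL);
    if (h == NULL)
        return NULL;

    p = MapViewOfFile(h, FILE_MAP_ALL_ACCESS, 0, 0, size);
    CloseHandle(h);             /* the view keeps the mapping object alive */
    return p;                   /* NULL if the view could not be mapped */
}

/* Hypothetical helper: release a region obtained from region_alloc(). */
static int region_free(void *p, size_t size)
{
    (void) size;                /* Windows unmaps the whole view */
    assert(p != NULL);
    return UnmapViewOfFile(p) ? 0 : -1;
}

#else
#include <sys/mman.h>

#ifndef MAP_ANONYMOUS
#define MAP_ANONYMOUS MAP_ANON  /* older BSD-style name */
#endif

/* Hypothetical helper: map `size` bytes of anonymous shared memory. */
static void *region_alloc(size_t size)
{
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);

    return (p == MAP_FAILED) ? NULL : p;
}

/* Hypothetical helper: release a region obtained from region_alloc(). */
static int region_free(void *p, size_t size)
{
    assert(p != NULL);
    return munmap(p, size);     /* 0 on success, -1 on failure */
}
#endif
```

A caller would use the two helpers the same way on both platforms, for example region_alloc(1024 * 1024) at preload time and region_free() with the same size when the region is no longer needed.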
Regards
Pavel Stehule

2010/7/8 Pavel Stehule <pavel.steh...@gmail.com>:
> Hello
>
> 2010/7/8 Takahiro Itagaki <itagaki.takah...@oss.ntt.co.jp>:
>>
>> Pavel Stehule <pavel.steh...@gmail.com> wrote:
>>
>>> this version has enhanced AllocSet allocator - it can use a mmap API.
>>
>> I reviewed your patch and will report some comments. However, I don't have
>> test cases for the patch because there are no large dictionaries in the
>> default postgres installation. I'd like to ask you to supply test data
>> for the patch.
>
> you can use a Czech dictionary - please download it from
> http://www.pgsql.cz/data/czech.tar.gz
>
> CREATE TEXT SEARCH DICTIONARY cspell
>   (template=ispell, dictfile = czech, afffile=czech, stopwords=czech);
> CREATE TEXT SEARCH CONFIGURATION cs (copy=english);
> ALTER TEXT SEARCH CONFIGURATION cs
>   ALTER MAPPING FOR word, asciiword WITH cspell, simple;
>
> postgres=# select * from ts_debug('cs','Příliš žluťoučký kůň se napil žluté vody');
>    alias   |    description    |   token   |  dictionaries   | dictionary |   lexemes
> -----------+-------------------+-----------+-----------------+------------+-------------
>  word      | Word, all letters | Příliš    | {cspell,simple} | cspell     | {příliš}
>  blank     | Space symbols     |           | {}              |            |
>  word      | Word, all letters | žluťoučký | {cspell,simple} | cspell     | {žluťoučký}
>  blank     | Space symbols     |           | {}              |            |
>  word      | Word, all letters | kůň       | {cspell,simple} | cspell     | {kůň}
>  blank     | Space symbols     |           | {}              |            |
>  asciiword | Word, all ASCII   | se        | {cspell,simple} | cspell     | {}
>  blank     | Space symbols     |           | {}              |            |
>  asciiword | Word, all ASCII   | napil     | {cspell,simple} | cspell     | {napít}
>  blank     | Space symbols     |           | {}              |            |
>  word      | Word, all letters | žluté     | {cspell,simple} | cspell     | {žlutý}
>  blank     | Space symbols     |           | {}              |            |
>  asciiword | Word, all ASCII   | vody      | {cspell,simple} | cspell     | {voda}
>
>
>>
>> This patch allocates memory with non-file-based mmap() to preload text search
>> dictionary files at the server start.
>> Note that dict files are not mmap'ed
>> directly in the patch; mmap() is used for reallocatable shared memory.
>>
>> The dictionary loader is also modified a bit to use simple_alloc() instead
>> of palloc() for the long-lived cache. It can reduce calls of AllocSetAlloc(),
>> which has some overhead to support pfree(). Since the cache is never
>> released, simple_alloc() seems to have better performance than palloc().
>> Note that the optimization will also work for non-preloaded dicts.
>
> It gives slightly better speed, but mainly a significant memory
> reduction - palloc allocation is expensive because it adds 4 bytes (8
> bytes) to every allocation. That is a problem for the thousands of small
> blocks that the TSearch ispell dictionary uses. On 64-bit the overhead is
> horrible.
>
>>
>> === Questions ===
>> - How do backends share the dict cache? You might expect postmaster's
>>   catalog is inherited to backends with fork(), but we don't use fork()
>>   on Windows.
>>
>
> I thought about some variants:
> a) using shared memory - but it needs a larger shared memory
> reservation, maybe some GUC - and this variant was rejected in
> discussion.
> b) using mmap on Unix and the CreateFileMapping API on Windows - but that
> is a bit of a problem for me. I don't have development tools for MS
> Windows, and I don't understand the MS Windows platform :(
>
> Magnus, can you give me some tips?
>
> Without MS Windows we don't need to solve shared memory and can use
> only fork. If we consider MS Windows too, then we have to rely
> only on some shared-memory-based solution. But that has more
> possibilities - a shared dictionary can be loaded at runtime too.
>
>> - Why are SQL functions dpreloaddict_init() and dpreloaddict_lexize()
>>   defined but not used?
>
> They are used, if I remember well. It uses the ispell dictionary API. The
> usage is simplified - you parametrize the preload dictionary, and then
> you use a preloaded dictionary, not some specific dictionary.
> This has advantages and a disadvantage:
> + very simple configuration,
> + no shared dictionary manager is necessary,
> - only one preloaded dictionary can be used.
>
>
>>
>> === Design ===
>> - You added 3 custom parameters (dict_preload.dictfile/afffile/stopwords),
>>   but I think text search configuration names are better than file names.
>>   However, that requires system catalog access, and we cannot access any
>>   catalog at the moment of preloading. If a config-name-based setting is
>>   difficult, we need to write docs about where we can get the dict names
>>   to be preloaded instead. (from \dFd+ ?)
>>
>
> Yes, that is a valid argument - it is not possible to access this
> data at preload time. I would like to support preloading (and possibly
> sharing of session-loaded dictionaries), because it ensures a
> constant time for TSearch queries every time. Yes, some documentation and
> some enhancement of the dictionary list info can be a solution.
>
>> - Do we need to support multiple preloaded dicts? I think dict_preload.*
>>   should accept a list of items to be loaded. GUC_LIST_INPUT will be a help.
>>
>
> Maybe yes. Personally I would not like to complicate the design and usage. And
> I don't know of any request for multiple preloaded dicts now. The
> preloaded dictionaries interface is only a server-side matter, so it
> can be changed/enhanced later without problems. I have an idea about
> enhancing the GUC parser to allow something like
>
> preload_dictionary.patch = ...
> preload_dictionary.czech = (template=ispell, dictfile = czech,
> afffile=czech, stopwords=czech)
> preload_dictionary.japan = (template=.....
>
>
>> - Server doesn't start when I added dict_preload to
>>   shared_preload_libraries and didn't add any custom parameters.
>>     FATAL: missing AffFile parameter
>>   But server should start with no effects or print WARNING messages
>>   for "no dicts are preloaded" in such case.
>>
>> - We could replace simple_alloc() with a new MemoryContextMethods that
>>   doesn't support pfree() but has better performance. It doesn't look
>>   ideal to me to implement simple_alloc() on top of palloc().
>>
>
> I don't agree. The palloc API is designed to be general - so I implemented
> a new memory context type via MMapAllocSetContextCreate and then I use
> the palloc function. There is no reason to design a new API.
>
>> === Implementation ===
>> I'm sure that your patch is WIP, but I'll note some issues just in case.
>>
>> - We need Makefile for contrib/dict_preload.
>
> sure, sorry
>
>>
>> - mmap() is not always portable. We should check the availability
>>   in configure, and also have an alternative implementation for Win32.
>
> Yes, that has to be the first step. I need an established API for simple
> allocation. Maybe divide this patch into two independent patches and
> solve memory allocation first? Dictionary preloading isn't a complex
> or large feature, so it can be handled in any commitfest. Memory
> management is more important and can be handled first.
>
>>
>>
>> Regards,
>> ---
>> Takahiro Itagaki
>> NTT Open Source Software Center
>>
>
> Thank you very much for the review
>
> Pavel Stehule
>
>>
>>
>
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers