Re: [GENERAL] jsonb value retrieval performance
> ... and I am trying to get a value via jsonb->parentKey->childKey, and it seems to be very slow. Would it actually be faster to use only a top-level key and parse the value on the client side?

I suspect most of the time is spent decompressing the huge value, not on the actual search inside the jsonb. If so, we would need to implement a search method that decompresses only some chunks of the jsonb. Could you send me an example of that jsonb?

--
Teodor Sigaev    E-mail: teo...@sigaev.ru    WWW: http://www.sigaev.ru/
Re: [GENERAL] jsonb value retrieval performance
> I suspect most of the time is spent decompressing the huge value, not on the actual search inside the jsonb. If so, we would need to implement a search method that decompresses only some chunks of the jsonb.

On an artificial example, the profile shows:

%SAMP  IMAGE     FUNCTION         CALLERS
 92.9  postgres  pglz_decompress  toast_decompress_datum

--
Teodor Sigaev    E-mail: teo...@sigaev.ru    WWW: http://www.sigaev.ru/
Re: [GENERAL] jsonb value retrieval performance
> Does it read the whole jsonb tree structure into memory and then get to v1, or does it have some optimization so that it gets only v1 instead of reading in the whole structure?

It reads, detoasts, and decompresses the whole value and then executes the search. An idea for fixing that is to read the jsonb value only in the chunks that are actually needed.

--
Teodor Sigaev    E-mail: teo...@sigaev.ru    WWW: http://www.sigaev.ru/
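Until such chunk-wise reading exists, one workaround sometimes used (an assumption about the workload, not something proposed in the thread above) is to store the column uncompressed, so lookups skip pglz decompression entirely at the cost of disk space. A sketch with hypothetical table and column names:

-- EXTERNAL keeps large values out of line but uncompressed,
-- so pglz_decompress never runs for this column.
ALTER TABLE docs ALTER COLUMN data SET STORAGE EXTERNAL;
-- Only newly written rows are affected; rewrite the table
-- (e.g. VACUUM FULL or dump/reload) to convert existing rows.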
Re: [GENERAL] MongoDB 3.2 beating Postgres 9.5.1?
> CREATE INDEX json_tables_idx ON json_tables USING GIN (data jsonb_path_ops);
>
> Bitmap Heap Scan on json_tables  (cost=113.50..37914.64 rows=1 width=1261) (actual time=2157.118..1259550.327 rows=909091 loops=1)
>   Recheck Cond: (data @> '{"name": "AC3 Case Red"}'::jsonb)
>   Rows Removed by Index Recheck: 4360296
>   Heap Blocks: exact=37031 lossy=872059

Hmm, it looks like work_mem is too small, because the lossy heap block count is very large.

--
Teodor Sigaev    E-mail: teo...@sigaev.ru    WWW: http://www.sigaev.ru/
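For reference, a quick way to test the work_mem theory for this query (a sketch; the value shown is only illustrative):

SET work_mem = '256MB';   -- session-local; pick a value your RAM allows
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM json_tables WHERE data @> '{"name": "AC3 Case Red"}';
-- With enough work_mem the bitmap stays exact (lossy=0 in "Heap Blocks"),
-- so far fewer heap tuples need to be rechecked against the @> condition.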
Re: [GENERAL] Test CMake build
I tried it on FreeBSD 64-bit, 16GB RAM, SSD, Core i7:

( ./configure && gmake all; )  168,99s user 15,46s system 97% cpu 3:09,61 total
( cmake . && gmake all; )       75,11s user 11,34s system 100% cpu 1:26,30 total

CMake is two times faster, which is good, but I don't understand why. Which optimization level does the cmake build use by default? Which compiler does it pick? It's not obvious, because the cmake build hides the actual compiler command line. Yury, please bring back the check target...

--
Teodor Sigaev    E-mail: teo...@sigaev.ru    WWW: http://www.sigaev.ru/
Re: [GENERAL] Test CMake build
Teodor Sigaev wrote:
> I tried it on FreeBSD 64-bit, 16GB RAM, SSD, Core i7:
>
> ( ./configure && gmake all; )  168,99s user 15,46s system 97% cpu 3:09,61 total
> ( cmake . && gmake all; )       75,11s user 11,34s system 100% cpu 1:26,30 total

( CFLAGS='-O2' cmake . && gmake all; )  141,87s user 12,18s system 97% cpu 2:37,40 total

Oops, the cmake default target is compiled with -O0. With -O2 the cmake build is still faster, but not by nearly as much.

> CMake is two times faster, which is good, but I don't understand why. Which optimization level does the cmake build use by default? Which compiler does it pick? It's not obvious, because the cmake build hides the actual compiler command line. Yury, please bring back the check target...

--
Teodor Sigaev    E-mail: teo...@sigaev.ru    WWW: http://www.sigaev.ru/
Re: [GENERAL] Test CMake build
> Hm, I don't think having the compile/link lines hidden is acceptable. Many times we need to debug some compile problem, and that output is mandatory.

+1. Although it can be worked around with:

VERBOSE=1 make

--
Teodor Sigaev    E-mail: teo...@sigaev.ru    WWW: http://www.sigaev.ru/
Re: [GENERAL] crash creating tsearch2 index
Could you provide a test case?

John DeSoi wrote:
> Hi, I'm trying to dump and restore a copy of a database in the same cluster. pg_restore would abort when creating a tsearch2 GiST index. So I dumped to text, removed the CREATE INDEX commands, and tried to run them at the end, after the rest of the database was loaded. I still have the same problem:
>
> CREATE INDEX song_tsx_title_idx ON song USING gist (tsx_title public.gist_tsvector_ops);
>
> server closed the connection unexpectedly
>         This probably means the server terminated abnormally before or while processing the request.
> The connection to the server was lost. Attempting reset: Succeeded.
>
> This is pg 8.0.8 in a shared hosting environment, so I don't have a lot of options for tweaking. Is there a known work-around for this?
>
> Thanks, John DeSoi, Ph.D.  http://pgedit.com/  Power Tools for PostgreSQL

--
Teodor Sigaev    E-mail: [EMAIL PROTECTED]    WWW: http://www.sigaev.ru/
Re: [GENERAL] opclass for real[]
> ERROR:  data type real[] has no default operator class for access method "gist"
> HINT:  You must specify an operator class for the index or define a default operator class for the data type.

There is an operator class for GIN for real[]:
http://www.postgresql.org/docs/8.2/static/xindex.html#XINDEX-GIN-ARRAY-STRAT-TABLE

> Is there an opclass defined in 8.2, or do I have to create one? In either case, can you please give a link to information on opclasses.
> Thanks, Abhang

--
Teodor Sigaev    E-mail: [EMAIL PROTECTED]    WWW: http://www.sigaev.ru/
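For reference, a minimal sketch of using the built-in GIN support for arrays (hypothetical table and data; the GiST error above simply means there is no comparable default GiST opclass for real[]):

CREATE TABLE measurements (id serial PRIMARY KEY, vals real[]);

-- GIN has a default operator class for real[] (and other array types),
-- so no explicit opclass name is needed:
CREATE INDEX measurements_vals_idx ON measurements USING gin (vals);

-- The index supports the array operators, e.g. overlap and containment:
SELECT * FROM measurements WHERE vals && ARRAY[1.5, 2.5]::real[];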
Re: [GENERAL] TSEARCH2: disable stemming in indexes and triggers
> I found out that using 'simple' instead of 'default' with to_tsvector() does exactly that, but I don't know how to change my triggers and indexes to keep doing the same (using 'simple').

Suppose your database is initialized with the C locale. Then just mark the simple configuration as the default:

# update pg_ts_cfg set locale=null where ts_name='default';
# update pg_ts_cfg set locale='C' where ts_name='simple';

If your locale setting is not C, then mark the needed configuration with your locale instead.
Re: [GENERAL] warm standby server stops doing checkpoints after awhile
> <2007-06-01 13:11:29.365 CEST:%> DEBUG: 0: Ressource manager (13) has partial state information
>
> To me, this points clearly to there being an improperly completed action in resource manager 13 (GIN). In summary, it appears that there may be an issue with the GIN code for WAL recovery, and this is affecting the warm standby.

Hmm. I found that gin_xlog_cleanup() doesn't reset the incomplete_splits list. Could that be the reason for the bug? The attached patch fixes it.

--
Teodor Sigaev    E-mail: [EMAIL PROTECTED]    WWW: http://www.sigaev.ru/

*** ./src/backend/access/gin/ginxlog.c.orig	Fri Jun  1 16:47:47 2007
--- ./src/backend/access/gin/ginxlog.c	Fri Jun  1 16:53:47 2007
***************
*** 594,599 ****
--- 594,600 ----
  	MemoryContextSwitchTo(topCtx);
  	MemoryContextDelete(opCtx);
+ 	incomplete_splits = NIL;
  }
  
  bool
Re: [GENERAL] warm standby server stops doing checkpoints after awhile
> <2007-06-01 16:28:51.708 CEST:%> LOG: GIN incomplete split root:8 l:45303 r:111740 at redo CA/C8243C28
> ...
> <2007-06-01 16:38:23.133 CEST:%> LOG: GIN incomplete split root:8 l:45303 r:111740 at redo CA/C8243C28

Looks like a bug in GIN. I'll play with it. Can you provide more details about your test?

--
Teodor Sigaev    E-mail: [EMAIL PROTECTED]    WWW: http://www.sigaev.ru/
Re: [GENERAL] warm standby server stops doing checkpoints after awhile
> I'd suggest we throw an error, as shown in the enclosed patch. Frank, can you give that a whirl to provide Teodor with something more to work with? Thanks.

I have already made a test case that reproduces the problem; it leaves incomplete splits behind. But I discovered one more problem: a deadlock on a buffer lock. So right now I am investigating that.

> Neither GiST nor B-tree seems to throw an error in the corresponding locations either, so the potential for not being able to track this is high. I'd want to throw errors in those locations also.

Agreed, I'll add more checks.

--
Teodor Sigaev    E-mail: [EMAIL PROTECTED]    WWW: http://www.sigaev.ru/
Re: [GENERAL] warm standby server stops doing checkpoints after awhile
Found the reason: if the parent page is fully backed up (a full-page image) after the child's split, then forgetIncompleteSplit() isn't called at all. I hope the attached patch fixes that; please test it.

PS: I'm going away for the weekend, so I won't be online until Monday.

--
Teodor Sigaev    E-mail: [EMAIL PROTECTED]    WWW: http://www.sigaev.ru/

patch_wal_gin.gz  Description: Unix tar archive
Re: [GENERAL] warm standby server stops doing checkpoints after awhile
> <2007-06-01 23:00:00.001 CEST:%> LOG: GIN incomplete splits=8

Just to be sure: the patch fixes the *creation* of the WAL records, not their replay. So the primary database should be patched too.

During the weekend I found a possible deadlock in the GIN locking protocol between concurrent UPDATE and VACUUM queries involving the same GIN index. Strangely, I didn't see it in 8.2, and even now I can't reproduce it there; it's easy to reproduce only on HEAD, with the recently added ReadBufferWithStrategy() call instead of ReadBuffer(). The ReadBufferWithStrategy() call was added to implement a limited-size "ring" of buffers for VACUUM. Nevertheless, it's a possible scenario in 8.2. The attached patch fixes that deadlock bug too. Also, the previous version of my patch had a mistake which is observable on a CREATE INDEX ... USING GIN query.

--
Teodor Sigaev    E-mail: [EMAIL PROTECTED]    WWW: http://www.sigaev.ru/

patch_wal_gin.v6.gz  Description: Unix tar archive
Re: [GENERAL] warm standby server stops doing checkpoints after awhile
> 1. After a certain point, consecutive GIN index splits cause a problem. The new RHS block numbers are consecutive from 111780+

That is the newly created page. The split page might have any block number.

> 2. The incomplete splits stay around indefinitely after creation and we aren't trying to remove the wrong split at any point. We're either never creating an xlog record, or we are ignoring it in recovery, or we are somehow making multiple entries then not removing all of them.

Agreed.

> 3. The root seems to move, which isn't what I personally was expecting to see. It seems root refers to the highest parent involved in the split.

"root" in this context means the parent of the split page. Actually, there is a lot of B-tree in GIN; see http://www.sigaev.ru/gin/GinStructure.pdf

> 4. We're writing lots of redo in between failed page splits. So *almost* everything is working correctly.
> 5. This starts to happen when we have very large indexes. This may be coincidental but the first relation file is fairly full (900+ MB).

Yes. It seems to me that the conditions for the error are very rare, and the B-tree over ItemPointers (the second level of GIN) has a big capacity, 1000+ items per page, so splits occur rather rarely.

--
Teodor Sigaev    E-mail: [EMAIL PROTECTED]    WWW: http://www.sigaev.ru/
Re: [GENERAL] warm standby server stops doing checkpoints after awhile
Oops, the patch doesn't apply cleanly. Here is a new version.

> The attached patch fixes that deadlock bug too. Also, the previous version of my patch had a mistake which is observable on a CREATE INDEX ... USING GIN query.

--
Teodor Sigaev    E-mail: [EMAIL PROTECTED]    WWW: http://www.sigaev.ru/

patch_wal_gin.v7.gz  Description: Unix tar archive
Re: [GENERAL] warm standby server stops doing checkpoints after awhile
> After some observation of massive reindexing of a few hundred thousand data sets, it seems to me that the slave doesn't skip checkpoints anymore (apart from those skipped because of the CheckpointTimeout thing). I'll keep an eye on it and report back any news on the issue.

Nice; committed. Thanks for your report and testing.

--
Teodor Sigaev    E-mail: [EMAIL PROTECTED]    WWW: http://www.sigaev.ru/
Re: [GENERAL] Regression - Query requires full scan, GIN doesn't support it
> Is this a permanent limitation of GIN, or is a fix possible?

Permanent, for now. You can check user input with the querytree() function: if it returns the string 'T', then a full scan would be needed. If your tsquery is produced by a plainto_tsquery() call, then it will not find any result anyway, so you can simply show the user an empty page.

> Is a fix being worked on? If a fix is forthcoming, will it be available in the 8.2 series or only 8.3+?

Possibly a full fix in 8.4, but I won't promise it. 8.3 will have protection from queries which don't match anything.

--
Teodor Sigaev    E-mail: [EMAIL PROTECTED]    WWW: http://www.sigaev.ru/
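For reference, a hedged sketch of the querytree() check described above (the example queries are illustrative only):

-- A query that is only a negation would require a full index scan;
-- querytree() signals this by returning 'T':
SELECT querytree(to_tsquery('english', '!cat'));

-- For an ordinary query, querytree() returns the indexable part
-- (here just the positive term), so no full scan is needed:
SELECT querytree(to_tsquery('english', 'cat & !dog'));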
Re: [GENERAL] PickSplit method of 2 columns ... error
The page split algorithm was rewritten in 8.2 for multicolumn indexes, and the API for user-defined pickSplit functions was extended to produce better results during index creation. But GiST can still interact with old functions, and this message just says so. It doesn't mean there is any real problem or error; the index will simply be of the same quality as in 8.1, not better.

Kevin Neufeld wrote:
> Has anyone come across this error before?
>
> LOG: PickSplit method of 2 columns of index 'asset_position_lines_asset_cubespacetime_idx' doesn't support secondary split
>
> This is a multi-column GiST index on an integer and a cube (a data type from the postgres cube extension module). I traced the error to the gistUserPicksplit function in gistsplit.c ... I surmise that this method is called whenever a page split is necessary. So I know when this error occurs, but I don't know why. Thoughts, anyone?
> Cheers, Kevin

--
Teodor Sigaev    E-mail: [EMAIL PROTECTED]    WWW: http://www.sigaev.ru/
Re: [GENERAL] tsearch2 anomoly?
Ordinary text doesn't have strict syntax rules, so the parser tries to recognize the most probable token. Something containing '.', '-' and alphanumeric characters is often a filename, but a filename very rarely starts or ends with a dot.

RC Gobeille wrote:
> Thanks, and I didn't know about ts_debug, so thanks for that also. For the record, I see how to use my own processing function (e.g. dropatsymbol) to get what I need:
> http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/docs/tsearch-V2-intro.html
>
> However, can you explain the logic behind the parsing difference if I just add a ".s" to a string:
>
> ossdb=# select ts_debug('gallery2-httpd-2.1-conf.');
>  (default,hword,"Hyphenated word",gallery2-httpd-2,{simple},"'2' 'httpd' 'gallery2' 'gallery2-httpd-2'")
>  (default,part_hword,"Part of hyphenated word",gallery2,{simple},'gallery2')
>  (default,lpart_hword,"Latin part of hyphenated word",httpd,{en_stem},'httpd')
>  (default,float,"Decimal notation",2.1,{simple},'2.1')
>  (default,lpart_hword,"Latin part of hyphenated word",conf,{en_stem},'conf')
> (5 rows)
>
> ossdb=# select ts_debug('gallery2-httpd-2.1-conf.s');
>  (default,host,Host,gallery2-httpd-2.1-conf.s,{simple},'gallery2-httpd-2.1-conf.s')
> (1 row)
>
> Thanks again, Bob
>
> On 9/6/07 11:19 AM, "Oleg Bartunov" <[EMAIL PROTECTED]> wrote:
> > This is how the default parser works. See the output from
> >   select * from ts_debug('gallery2-httpd-conf');
> > and
> >   select * from ts_debug('httpd-2.2.3-5.src.rpm');
> > All token types: select * from token_type();
> >
> > On Thu, 6 Sep 2007, RC Gobeille wrote:
> > > I'm having trouble understanding to_tsvector (PostgreSQL 8.1.9 contrib). In this first case, converting 'gallery2-httpd-conf' makes sense to me and is exactly what I want. It looks like the entire string is indexed, plus the substrings broken by '-' are indexed.
> > >
> > > ossdb=# select to_tsvector('gallery2-httpd-conf');
> > >  'conf':4 'httpd':3 'gallery2':2 'gallery2-httpd-conf':1
> > >
> > > However, I'd expect the same to happen in the httpd example, but it does not appear to.
> > >
> > > ossdb=# select to_tsvector('httpd-2.2.3-5.src.rpm');
> > >  'httpd-2.2.3-5.src.rpm':1
> > >
> > > Why don't I get: 'httpd', 'src', 'rpm', 'httpd-2.2.3-5.src.rpm' ? Is this a bug or design? Thank you! Bob
> >
> > Regards, Oleg
> > _________
> > Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), Sternberg Astronomical Institute, Moscow University, Russia
> > Internet: [EMAIL PROTECTED], http://www.sai.msu.su/~megera/
> > phone: +007(495)939-16-83, +007(495)939-23-83

--
Teodor Sigaev    E-mail: [EMAIL PROTECTED]    WWW: http://www.sigaev.ru/
Re: [GENERAL] Tsearch2 - spanish
> prueba=# select to_tsvector('espanol','melón');
> ERROR: Affix parse error at 506 line
>
> and
>
> prueba=# select lexize('sp','melón');
>  lexize
> --------
>  {melon}
> (1 row)

Looks very strange. Can you provide the list of dictionaries and the configuration map?

> I tried many dictionaries with the same results. Also I changed the codeset of the .aff and .dict files (from latin1 to utf8, and from utf8 to iso88591) and got the same error. Where can I investigate to resolve this problem? My dictionary at line 506 had:

Where did you get this file? And what are the encoding/locale settings of your database?

--
Teodor Sigaev    E-mail: [EMAIL PROTECTED]    WWW: http://www.sigaev.ru/
Re: [GENERAL] Tsearch2 - spanish
> prueba1=# select to_tsvector('espanol','melón perro mordelón');
> server closed the connection unexpectedly
>         This probably means the server terminated abnormally before or while processing the request.
> The connection to the server was lost. Attempting reset: Failed.
> !>

Hmm, can you provide a backtrace?

--
Teodor Sigaev    E-mail: [EMAIL PROTECTED]    WWW: http://www.sigaev.ru/
Re: [GENERAL] Segmentation fault with 8.3 FTS ISpell
I tried to reproduce the bug, but without success. Could you provide a dump of the text column?

Hannes Dorbath wrote:
> Crash happens about 7 minutes after issuing the UPDATE statement with current CVS HEAD. The table has around 5 million rows. It's always reproducible.

--
Teodor Sigaev    E-mail: [EMAIL PROTECTED]    WWW: http://www.sigaev.ru/
Re: [GENERAL] Segmentation fault with 8.3 FTS ISpell
Fixes are committed to CVS; I hope they will help you.

--
Teodor Sigaev    E-mail: [EMAIL PROTECTED]    WWW: http://www.sigaev.ru/
Re: [GENERAL] postgres 8.3 rc-1 ispell installation problem
> flag *J:   # isimo
>   E > -E, 'ISIMO    # grand'isimo   <-- here 432
>   E > -E, 'ISIMOS   # grande grand'isimos
>   E > -E, 'ISIMA    # grande grand'isima
>   E > -E, 'ISIMAS   # grande grand'isimas
>   O > -O, 'ISIMO    # tonto tont'isimo
>   O > -O, 'ISIMA    # tonto tont'isima

The current implementation doesn't accept any characters in the ending except alphabetic ones.

> I think the 'I.. word is not correct for ispell; this should be the single letter Í.

That's right, but you should also convert the dictionary and affix file to UTF8 encoding.

--
Teodor Sigaev    E-mail: [EMAIL PROTECTED]    WWW: http://www.sigaev.ru/
Re: [GENERAL] full text index and most frequently used words
> What I'd like to know is if there is an easy way to use the full text index to generate a list of the most common words. I could write this code manually, but I'm hoping there's a better (simpler) way.

For 8.3, see ts_stat():
http://www.postgresql.org/docs/8.3/static/textsearch-features.html#TEXTSEARCH-STATISTICS

For versions before 8.3, just use the stat() function instead of ts_stat().

--
Teodor Sigaev    E-mail: [EMAIL PROTECTED]    WWW: http://www.sigaev.ru/
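A minimal sketch of the 8.3 approach, assuming a hypothetical table docs with a tsvector column tsv:

-- Ten most frequent lexemes across all documents:
SELECT word, ndoc, nentry
  FROM ts_stat('SELECT tsv FROM docs')
 ORDER BY nentry DESC, ndoc DESC
 LIMIT 10;
-- word = lexeme, ndoc = number of documents it appears in,
-- nentry = total number of occurrences.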
Re: [GENERAL] Fragments in tsearch2 headline
> Teodor, Oleg, do we want this?
> http://archives.postgresql.org/pgsql-general/2007-11/msg00508.php

I suppose we want it. But there are some questions/issues:

- Is it necessary to introduce a new function? Maybe it would be better to add an option to the existing headline function. I'd like to keep the current layout: ts_headline provides a common interface to headline generation, finding and marking fragments is the job of the parser's headline method, and generating the exact pieces of text is done by ts_headline.

- Covers may overlap, so overlapping fragments will look odd.

In any case, the patch was developed for the contrib version of tsearch.

--
Teodor Sigaev    E-mail: [EMAIL PROTECTED]    WWW: http://www.sigaev.ru/
Re: [GENERAL] Fragments in tsearch2 headline
> The patch takes into account the corner case of overlap. Here is the code for that:
>
>   // start check
>   if (!startHL && *currentpos >= startpos)
>       startHL = 1;
>
> The headline generation will not start until currentpos has gone past startpos.

OK.

> You can also check how this headline function is working at my website indiankanoon.com. Some example queries are murder, freedom of speech, freedom of press, etc.

Looks good.

> Should I develop the patch for the current cvs head of postgres?

I'd like to commit your patch, but it should be:
- against current HEAD
- an extension of the existing ts_headline.

--
Teodor Sigaev    E-mail: [EMAIL PROTECTED]    WWW: http://www.sigaev.ru/
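For reference (not part of this exchange): the fragment mode that eventually shipped in later PostgreSQL releases (8.4 and up) is exposed through ts_headline options rather than a separate function, with the option named MaxFragments instead of the NumFragments proposal discussed here. A sketch:

SELECT ts_headline('english',
                   'The quick brown fox jumps over the lazy dog. The dog was not amused by the quick fox.',
                   to_tsquery('english', 'fox'),
                   'MaxFragments=2, MaxWords=6, MinWords=3');
-- Returns up to two short fragments around the matches instead of one
-- contiguous headline.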
Re: [GENERAL] Direct access to GIST structure
> ... encode spatial proximity. Is there an API (backend C-level is fine) to access a GiST index?

The best way would be to extend the existing GiST interface to support KNN search. But you can see how to get access to the index structure from a module in the gevel module (http://www.sigaev.ru/cvsweb/cvsweb.cgi/gevel/). The GiST-related functions in that module were invented to help developers, not for production use, so they acquire an exclusive lock on the index.

--
Teodor Sigaev    E-mail: [EMAIL PROTECTED]    WWW: http://www.sigaev.ru/
Re: [GENERAL] Direct access to GIST structure
> I just stumbled on http://www.cs.purdue.edu/spgist/ which seems like exactly what I need.

It doesn't work with 8.2 and up, because since 8.2 an index access method must take care of concurrent access itself, and that implementation doesn't do it.

--
Teodor Sigaev    E-mail: [EMAIL PROTECTED]    WWW: http://www.sigaev.ru/
Re: [GENERAL] tsearch2 on-demand dictionary loading & using functions in tsearch2
> * Considering the dictionary is loaded separately for each session, does this also imply that each running backend has a separate copy of the dictionary stored in memory?

Yes.

> As for downsides, I only really see two:
> * Tracking updates of dictionaries - but it's reasonable to believe that new connections get opened more often than the dictionary gets updated. Also, this might be easily solved by stat()-ing the dictionary file before starting up a session, and only having the server reload it if there's a notified change.
> * Possibly complicated to implement?

Keeping the dictionary up to date is the most difficult part here. Configuration of a dictionary might be changed by an ALTER command, so the parent process (and all currently running backends) would need to get that information in order to reload the dictionary.

> As for my second question, is it possible to use functions in tsearch2? For example, writing my own stemmer in PL/pgSQL or in C as a postgres function.

Yes, of course; you can develop your own dictionary (or dictionaries) and parser. But only in C, because they are critical for performance.

--
Teodor Sigaev    E-mail: [EMAIL PROTECTED]    WWW: http://www.sigaev.ru/
Re: [GENERAL] tsearch2 on-demand dictionary loading & using functions in tsearch2
> Hmm, good point; I presume "accept the fact that settings changes won't propagate to other backends until reconnect" would not be acceptable behavior, even if documented along with the relevant configuration option?

I suppose so. That was one of the reasons to move tsearch into core, and it would be too regrettable to lose that feature again.

> > As for my second question, is it possible to use functions in tsearch2? For example, writing my own stemmer in PL/pgSQL or in C as a postgres function.
>
> I had something different in mind. Considering there are already facilities to use functions, be it PL/pgSQL, PL/Python or C, why not just use those, with the condition that the function must accept some arguments and return some result? Or would using this, even with C as the language for the actual parser, slow things down too much?

The API for dictionaries and parsers intentionally uses complex (and nested) C structures to reduce overhead. While parsing text, postgres makes two calls to the parser per token (one call in which the parser returns a word, and a second for the word delimiter; a space is a lexeme too, although it's not subject to indexing) and one call to the dictionary per word. So, if your language can work with C structures, then you can use that language with tsearch, with a greater or smaller performance penalty. PL/pgSQL doesn't have this capability.

--
Teodor Sigaev    E-mail: [EMAIL PROTECTED]    WWW: http://www.sigaev.ru/
Re: [GENERAL] Fragments in tsearch2 headline
[Moved to -hackers, because the discussion is about implementation details.]

> I've ported the patch of Sushant Sinha for fragmented headlines to pg8.3.1 (http://archives.postgresql.org/pgsql-general/2007-11/msg00508.php)

Thank you.

1) "> diff -Nrub postgresql-8.3.1-orig/contrib/tsearch2/tsearch2.c": contrib/tsearch2 is now a compatibility layer for old applications; they don't know about new features. So this part isn't needed.

2) A solution that compiles a function (ts_headline_with_fragments) into core but uses it only from a contrib module looks very odd. The new feature could then be used only through the compatibility layer for the old release :)

3) headline_with_fragments() is hardcoded to use the default parser, but what will happen when a configuration uses another parser? For example, for the Japanese language.

4) I would prefer the signature ts_headline( [regconfig,] text, tsquery [, text] ), and the function should accept 'NumFragments=>N' for the default parser. Other parsers may use other options.

5) It just doesn't work correctly, because the new code doesn't take care of parser-specific lexeme types:

contrib_regression=# select headline_with_fragments('english', 'wow asd-wow wow', 'asd', '');
      headline_with_fragments
-----------------------------------
 ...wow asd-wowasd-wow wow
(1 row)

So, I'm inclined to use the existing framework/infrastructure, although it may be subject to change. Some description:

1) ts_headline determines the correct parser to use;
2) it calls hlparsetext to split the text into a structure suitable for both goals: finding the best fragment(s) and concatenating those fragment(s) back into a text representation;
3) it calls the parser-specific headline method (prsheadline), which works with the pre-parsed text (parsing was done in hlparsetext). The method should mark the needed words/parts/lexemes, etc.;
4) ts_headline glues the fragments into text and returns it.

We need a parser's headline method because only the parser knows everything about its lexemes.

--
Teodor Sigaev    E-mail: [EMAIL PROTECTED]    WWW: http://www.sigaev.ru/
Re: [GENERAL] FILLFACTOR for GIN indexes in 8.3.7
> It seems that I should reduce the fillfactor of some FTS indexes, but what is the default?
>
> > The other index methods use fillfactor in different but roughly analogous ways; the default fillfactor varies between methods.

Actually, GIN doesn't use fillfactor at all.

--
Teodor Sigaev    E-mail: teo...@sigaev.ru    WWW: http://www.sigaev.ru/
Re: [GENERAL] Vacuumdb Fails: Huge Tuple
> APseudoUtopia writes:
> > Here's what happened:
> > $ vacuumdb --all --full --analyze --no-password
> > vacuumdb: vacuuming database "postgres"
> > vacuumdb: vacuuming database "web_main"
> > vacuumdb: vacuuming of database "web_main" failed: ERROR:  huge tuple
> >
> > PostgreSQL 8.4.0 on i386-portbld-freebsd7.2, compiled by GCC cc (GCC) 4.2.1 20070719 [FreeBSD], 32-bit

Please apply the attached patch. The patch increases the maximum size from approximately 500 bytes up to 2700 bytes, so vacuum will be able to finish.

> This is evidently coming out of ginHeapTupleFastCollect because it's formed a GIN tuple that is too large (either too long a word, or too many postings, or both). I'd say that this represents a serious degradation in usability from pre-8.4 releases: before, you would have gotten the error upon attempting to insert the table row that triggers the problem. Now, with the "fast insert" stuff, you don't find out until VACUUM fails, and you have no idea where the bad data is. Not cool. Oleg, Teodor, what can we do about this? Can we split an oversize tuple into multiple entries? Can we apply suitable size checks before instead of after the fast-insert queue?

Both ginHeapTupleFastCollect and ginEntryInsert checked the tuple's size against TOAST_INDEX_TARGET, but ginHeapTupleFastCollect checked it without one ItemPointer, unlike ginEntryInsert. So ginHeapTupleFastCollect could produce a tuple 6 bytes larger than ginEntryInsert allows, and ginEntryInsert is called during pending-list cleanup. The patch removes the check against TOAST_INDEX_TARGET and checks only against GinMaxItemSize, which is greater than TOAST_INDEX_TARGET. All size checks are now in GinFormTuple.

--
Teodor Sigaev    E-mail: teo...@sigaev.ru    WWW: http://www.sigaev.ru/

patch.gz  Description: Unix tar archive
Re: [GENERAL] Vacuumdb Fails: Huge Tuple
> Looks reasonable, although since the error is potentially user-facing I think we should put a bit more effort into the error message (use ereport and make it mention the index name, at least --- is there any other useful information we could give?)

Only the sizes, as is done in B-tree, I suppose.

> Will you apply this, or do you want me to?

I'm not able to provide a good error message in good English :(

--
Teodor Sigaev    E-mail: teo...@sigaev.ru    WWW: http://www.sigaev.ru/
Re: [GENERAL] Prefix search on all hstore values
Hi! Full-text search has this feature:

# select to_tsvector('en_name=>yes, fr_name=>oui'::hstore::text) @@ 'en:*';
 ?column?
----------
 t

or (indexing only the keys):

# select to_tsvector(akeys('en_name=>yes, fr_name=>oui'::hstore)::text) @@ 'en:*';
 ?column?
----------
 t

To speed up these queries you can use functional (expression) indexes.

Albert Chern wrote:
> Hi, I have an hstore column that stores a string in several arbitrary languages, so something like this:
>
> "en" => "string in english", "zh" => "string in chinese", "fr" => "string in french"
>
> Is it possible to construct an index that can be used to determine if a query string is a prefix of ANY of the values in the hstore? From reading the documentation, the closest I've gotten is a GIN index after converting the values to an array, but that doesn't seem to work with prefix searching. Any pointers would be much appreciated!
> Thanks, Albert

--
Teodor Sigaev    E-mail: teo...@sigaev.ru    WWW: http://www.sigaev.ru/
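A hedged sketch of the expression-index idea mentioned above (hypothetical table and column names; the expression uses the two-argument form of to_tsvector so it is immutable, which an index requires):

CREATE TABLE items (id serial PRIMARY KEY, names hstore);

-- Index the hstore rendered as text, with an explicit configuration.
CREATE INDEX items_names_fts_idx
    ON items USING gin (to_tsvector('simple', names::text));

-- Prefix search against any key or value; the query must use the same
-- expression for the planner to consider the index.
SELECT *
  FROM items
 WHERE to_tsvector('simple', names::text) @@ to_tsquery('simple', 'en:*');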
Re: [GENERAL] Prefix search on all hstore values
> My requirements can be relaxed to full text search, but the problem I had with that approach is that I have strings in Chinese, and postgres doesn't seem to support it. Calling to_tsvector() on Chinese characters always returns an empty vector.

Hm, check your locale settings. AFAIK, some people do use FTS with the Chinese language.

--
Teodor Sigaev    E-mail: teo...@sigaev.ru    WWW: http://www.sigaev.ru/
Re: [GENERAL] multi-word expression full-text searching
SELECT id FROM document
WHERE to_tsvector('english', text) @@ plainto_tsquery('english', 'despite this');

--
Teodor Sigaev    E-mail: [EMAIL PROTECTED]    WWW: http://www.sigaev.ru/
Re: [GENERAL] multi-word expression full-text searching
> If I understand the plainto_tsquery behaviour well, this query matches both:
> Despite this, the president went out.
> Despite the event, this question arises.

Right; you mean phrase search. Read the thread:
http://archives.postgresql.org/pgsql-hackers/2008-05/msg0.php

The suggested patch should be made as a module, I think.

--
Teodor Sigaev    E-mail: [EMAIL PROTECTED]    WWW: http://www.sigaev.ru/
Re: [GENERAL] changing text search treatment of puncutation
> In general there seem to be a lot of ways that people wish they could tweak the text search parser, and telling them to write their own parser isn't a very helpful response for most folk. I don't have an idea about how to improve the situation, but it seems like something that should be thought about.

We (with Oleg) have thought hard about it, but we haven't found a solution yet. A configurable parser should be:
- fast
- flexible
- not error-prone
- comfortable for a non-programmer to use (or at least for a non-C programmer)

It might be a table-driven state machine (just put TParserStateAction into a table or tables, with some caching for the first step), but that is complex to operate, and the correctness of changes to the states would need to be proven before they are put into use.

--
Teodor Sigaev    E-mail: [EMAIL PROTECTED]    WWW: http://www.sigaev.ru/
Re: [GENERAL] Using ISpell dictionary - headaches...
> It *may* be because I'm using psql 8.0.3 and not the latest version (but I'm stuck with that version); I'm just hoping that one of you has met ...

Upgrade to 8.0.17; there were several fixes in the ISpell code.

--
Teodor Sigaev    E-mail: [EMAIL PROTECTED]    WWW: http://www.sigaev.ru/
Re: [GENERAL] index scan leads to result that is different from sec scan after upgrading to 8.3.4
> Hmm. So the problem seems to be statable as "a full-index scan on a GIST index might fail to return all the rows, if the index has been modified since creation". Teodor, can you think of anything you changed recently in that area?

Only the fix for possible duplicates during an index scan. I will look into it.

--
Teodor Sigaev    E-mail: [EMAIL PROTECTED]    WWW: http://www.sigaev.ru/
Re: [GENERAL] index scan leads to result that is different from sec scan after upgrading to 8.3.4
> Hmm. So the problem seems to be statable as "a full-index scan on a GIST index might fail to return all the rows, if the index has been modified since creation". Teodor, can you think of anything you changed recently in that area?

I still can't reproduce the bug, but I found a useless recheck condition in the bitmap scan:

drop table if exists qq;
select 1 as st , 1::int4 as t into qq from generate_series(1,1) as t;
create index qqidx on qq using gist (st) where t =1;
INSERT INTO qq (SELECT (4 * random())::int4, (4 * random())::int4 from generate_series(1,1));

# explain select t, count(1) from qq where t =1 group by t;
                                 QUERY PLAN
-----------------------------------------------------------------------------
 GroupAggregate  (cost=19.62..633.49 rows=1 width=2)
   ->  Bitmap Heap Scan on qq  (cost=19.62..630.28 rows=640 width=2)
         Recheck Cond: (t = 1)
         ->  Bitmap Index Scan on qqidx  (cost=0.00..19.46 rows=640 width=0)

--
Teodor Sigaev    E-mail: [EMAIL PROTECTED]    WWW: http://www.sigaev.ru/
Re: [GENERAL] index scan leads to result that is different from sec scan after upgrading to 8.3.4
Thank you, I have reproduced the bug and will fix it.

Sergey Konoplev wrote:
> Ok, I've done the test case (see attachment). 8.3.3 has passed it; 8.3.4 hasn't passed it in ~99% of the times I've run it.
>
> Steps to reproduce:
> 1. install pg 8.3.4, do initdb, start pg
> 2. correct the PSQL parameter in pg-8.3.4_index_update_test.sh
> 3. run pg-8.3.4_index_update_test.sh a few times
>
> And you will see something like this:
> ...
> -- 2nd time obtaining "seq-scan" count and plan... --
> SELECT table1_flag, count(*) FROM table1 GROUP BY table1_flag;
>  table1_flag | count
> -------------+-------
>            1 |   100
> (1 row)
>
> EXPLAIN ANALYZE SELECT table1_flag, count(*) FROM table1 GROUP BY table1_flag;
>  HashAggregate  (cost=15.00..15.01 rows=1 width=2) (actual time=0.140..0.140 rows=1 loops=1)
>    ->  Seq Scan on table1  (cost=0.00..12.00 rows=600 width=2) (actual time=0.004..0.059 rows=100 loops=1)
>  Total runtime: 0.172 ms
> (3 rows)
>
> -- 2nd time obtaining "index-scan" count and plan... --
> SELECT count(*) FROM table1 WHERE table1_flag = 1;
>  count
> -------
>     98
> (1 row)
>
> EXPLAIN ANALYZE SELECT count(*) FROM table1 WHERE table1_flag = 1;
>  Aggregate  (cost=8.27..8.28 rows=1 width=0) (actual time=0.451..0.451 rows=1 loops=1)
>    ->  Index Scan using i_table1__table1_point on table1  (cost=0.00..8.27 rows=1 width=0) (actual time=0.011..0.408 rows=98 loops=1)
>  Total runtime: 0.477 ms
> (3 rows)
>
> --
> Regards, Sergey Konoplev
> PostgreSQL articles in english & russian: http://gray-hemp.blogspot.com/search/label/postgresql/

--
Teodor Sigaev    E-mail: [EMAIL PROTECTED]    WWW: http://www.sigaev.ru/
Re: [GENERAL] index scan leads to result that is different from sec scan after upgrading to 8.3.4
Fixed, patch attached.

--
Teodor Sigaev    E-mail: [EMAIL PROTECTED]    WWW: http://www.sigaev.ru/

diff -c -r src.orig/backend/access/gist/gistget.c src/backend/access/gist/gistget.c
*** src.orig/backend/access/gist/gistget.c	2008-10-22 12:07:39.0 +0400
--- src/backend/access/gist/gistget.c	2008-10-22 15:13:23.0 +0400
***************
*** 49,55 ****
  	for (offset = FirstOffsetNumber; offset <= maxoff; offset = OffsetNumberNext(offset))
  	{
! 		IndexTuple	ituple = (IndexTuple) PageGetItem(p, PageGetItemId(p, offset));
  
  		if (ItemPointerEquals(&(ituple->t_tid), iptr))
  		{
--- 49,55 ----
  	for (offset = FirstOffsetNumber; offset <= maxoff; offset = OffsetNumberNext(offset))
  	{
! 		IndexTuple	ituple = (IndexTuple) PageGetItem(p, PageGetItemId(p, offset));
  
  		if (ItemPointerEquals(&(ituple->t_tid), iptr))
  		{
***************
*** 157,163 ****
  	{
  		while( ntids < maxtids && so->curPageData < so->nPageData )
  		{
! 			tids[ ntids ] = scan->xs_ctup.t_self = so->pageData[ so->curPageData ];
  			so->curPageData ++;
  			ntids++;
--- 157,167 ----
  	{
  		while( ntids < maxtids && so->curPageData < so->nPageData )
  		{
! 			tids[ ntids ] = scan->xs_ctup.t_self = so->pageData[ so->curPageData ].heapPtr;
! 			ItemPointerSet(&(so->curpos),
! 						   BufferGetBlockNumber(so->curbuf),
! 						   so->pageData[ so->curPageData ].pageOffset);
! 
  			so->curPageData ++;
  			ntids++;
***************
*** 251,258 ****
  	{
  		while( ntids < maxtids && so->curPageData < so->nPageData )
  		{
! 			tids[ ntids ] = scan->xs_ctup.t_self = so->pageData[ so->curPageData ];
  			so->curPageData ++;
  			ntids++;
  		}
--- 255,267 ----
  	{
  		while( ntids < maxtids && so->curPageData < so->nPageData )
  		{
! 			tids[ ntids ] = scan->xs_ctup.t_self =
! 						so->pageData[ so->curPageData ].heapPtr;
  
+ 			ItemPointerSet(&(so->curpos),
+ 						   BufferGetBlockNumber(so->curbuf),
+ 						   so->pageData[ so->curPageData ].pageOffset);
+ 
  			so->curPageData ++;
  			ntids++;
  		}
***************
*** 297,309 ****
  			 * we can efficiently resume the index scan later.
  			 */
  
- 			ItemPointerSet(&(so->curpos),
- 						   BufferGetBlockNumber(so->curbuf), n);
- 
  			if (!(ignore_killed_tuples && ItemIdIsDead(PageGetItemId(p, n))))
  			{
  				it = (IndexTuple) PageGetItem(p, PageGetItemId(p, n));
! 				so->pageData[ so->nPageData ] = it->t_tid;
  				so->nPageData ++;
  			}
  		}
--- 306,316 ----
  			 * we can efficiently resume the index scan later.
  			 */
  
  			if (!(ignore_killed_tuples && ItemIdIsDead(PageGetItemId(p, n))))
  			{
  				it = (IndexTuple) PageGetItem(p, PageGetItemId(p, n));
! 				so->pageData[ so->nPageData ].heapPtr = it->t_tid;
! 				so->pageData[ so->nPageData ].pageOffset = n;
  				so->nPageData ++;
  			}
  		}
Re: [GENERAL] index scan leads to result that is different from sec scan after upgrading to 8.3.4
> 20 hours to find the fix. Teodor, kudos!

Nothing to be proud of :( it was my bug.

> Due to the importance of the fix, will we see an 8.3.5 very soon?

I don't know; see the discussion. I think that would make sense.

--
Teodor Sigaev    E-mail: [EMAIL PROTECTED]    WWW: http://www.sigaev.ru/
Re: [GENERAL] Weird problem concerning tsearch functions built into postgres 8.3, assistance requested
> One of the tables we're using in the 8.1.3 setups currently running includes phone numbers as a searchable field (fti_phone), with the results of a select on the field generally looking like this: 'MMM':2 '':3 'MMM-':1. MMM is the first three digits, is the fourth-seventh.
>
> The weird part is this: on the old systems running 8.1.3, I can look up a record by fti_phone using any of the three above items: the first three, the last four, or the entire number including the dash. On the new system running 8.3.1, I can do a lookup by the first three or the last four and get the results I'm after, but any attempt to do a lookup by the entire MMM- version returns no records.

The parser was changed:

postgres=# select * from ts_debug('123-4567');
 alias |   description    | token | dictionaries | dictionary | lexemes
-------+------------------+-------+--------------+------------+---------
 uint  | Unsigned integer | 123   | {simple}     | simple     | {123}
 int   | Signed integer   | -4567 | {simple}     | simple     | {-4567}
(2 rows)

postgres=# select * from ts_debug('abc-defj');
      alias      |           description           |  token   |  dictionaries  |  dictionary  |  lexemes
-----------------+---------------------------------+----------+----------------+--------------+------------
 asciihword      | Hyphenated word, all ASCII      | abc-defj | {english_stem} | english_stem | {abc-defj}
 hword_asciipart | Hyphenated word part, all ASCII | abc      | {english_stem} | english_stem | {abc}
 blank           | Space symbols                   | -        | {}             |              |
 hword_asciipart | Hyphenated word part, all ASCII | defj     | {english_stem} | english_stem | {defj}

The parser in 8.1 treats any [alnum]+-[alnum]+ as a hyphenated word, but 8.3 treats [digit]+-[digit]+ as two separate numbers. So you can either pre-process the texts before indexing, or have a look at the regex dictionary (http://vo.astronet.ru/arxiv/dict_regex.html).

--
Teodor Sigaev    E-mail: [EMAIL PROTECTED]    WWW: http://www.sigaev.ru/
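One possible pre-processing sketch (an assumption, not part of the reply above; hypothetical table and column names, and the same normalization has to be applied to the search input):

-- Index a digits-only form alongside the original text so the whole phone
-- number survives as a single numeric token:
SELECT to_tsvector('simple', phone || ' ' || replace(phone, '-', ''))
  FROM customers;   -- hypothetical table and column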
Re: [GENERAL] still gin index creation takes forever
> > GIN's build algorithm can use bulk insert of ItemPointers if and only if they are to be inserted on the rightmost page (the exact piece of code is dataPlaceToPage() in gindatapage.c, lines 407-427).
>
> I'm not following. Rightmost page of what --- it can't be the whole index, can it, or the case would hardly ever apply?

A GIN index contains a btree over keys (the entry tree), and for each key it contains either a list of ItemPointers (a posting list) or a btree over ItemPointers (a posting tree, or data tree), depending on their quantity. The bulk insertion process collects keys and sorted arrays of ItemPointers into memory, and then, for each key, it tries to insert every ItemPointer from the array into the corresponding data tree one by one. But if the smallest ItemPointer in the array is greater than the biggest stored one, then the algorithm inserts the whole array onto the rightmost page of the data tree. So in that case the process can insert about 1000 ItemPointers per data-tree lookup; in the opposite case it does 1000 lookups in the data tree.
Re: [GENERAL] still gin index creation takes forever
> > Any suggestion about how to track down the problem?
>
> What you are describing sounds rather like a use-of-uninitialized-memory problem, wherein the behavior depends on what happened to be in that memory previously. If so, using a debug/cassert-enabled build of Postgres might help to make the behavior more reproducible.

It seems to me a possible reason for that behavior could be the order of the table scan. GIN's build algorithm prefers a scan from the beginning to the end of the table, but in 8.3 that is not always what happens: the scan may begin from the middle or the end of the table, depending on the synchronized sequential scan's history. GIN's build algorithm can use bulk insert of ItemPointers if and only if they are to be inserted on the rightmost page (the exact piece of code is dataPlaceToPage() in gindatapage.c, lines 407-427).

Is there any way to force the table scan to start from the beginning?

--
Teodor Sigaev    E-mail: [EMAIL PROTECTED]    WWW: http://www.sigaev.ru/
Re: [GENERAL] still gin index creation takes forever
> Yeah, I'm not convinced either. Still, Teodor's theory should be easily testable: set synchronize_seqscans to FALSE and see if the problem goes away.

A test case to reproduce the problem:

DROP TABLE IF EXISTS foo;
DROP TABLE IF EXISTS footmp;
CREATE OR REPLACE FUNCTION gen_array() RETURNS _int4 AS $$
    SELECT ARRAY(
        SELECT (random()*1000)::int
        FROM generate_series(1, 10+(random()*90)::int)
    )
$$ LANGUAGE SQL VOLATILE;

SELECT gen_array() AS v INTO foo FROM generate_series(1,10);
VACUUM ANALYZE foo;
CREATE INDEX fooidx ON foo USING gin (v);
DROP INDEX fooidx;
SELECT * INTO footmp FROM foo LIMIT 9;
CREATE INDEX fooidx ON foo USING gin (v);
DROP INDEX fooidx;

On my notebook, with HEAD and the default postgresql.conf, it produces (showing only the interesting part):

postgres=# CREATE INDEX fooidx ON foo USING gin (v);
Time: 14961,409 ms
postgres=# SELECT * INTO footmp FROM foo LIMIT 9;
postgres=# CREATE INDEX fooidx ON foo USING gin (v);
LOG:  checkpoints are occurring too frequently (12 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (8 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (7 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (10 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (8 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
CREATE INDEX
Time: 56286,507 ms

So the creation time is four times longer after the SELECT.

Without "SELECT * INTO footmp FROM foo LIMIT 9;":

postgres=# CREATE INDEX fooidx ON foo USING gin (v);
CREATE INDEX
Time: 13894,050 ms
postgres=# CREATE INDEX fooidx ON foo USING gin (v);
LOG:  checkpoints are occurring too frequently (14 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
CREATE INDEX
Time: 15087,348 ms

Nearly the same time.

With synchronize_seqscans = off and the SELECT:

postgres=# CREATE INDEX fooidx ON foo USING gin (v);
CREATE INDEX
Time: 14452,024 ms
postgres=# SELECT * INTO footmp FROM foo LIMIT 9;
postgres=# CREATE INDEX fooidx ON foo USING gin (v);
LOG:  checkpoints are occurring too frequently (16 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
CREATE INDEX
Time: 14557,750 ms

Again, nearly the same time.

--
Teodor Sigaev    E-mail: [EMAIL PROTECTED]    WWW: http://www.sigaev.ru/
Re: [GENERAL] still gin index creation takes forever
> We could extend IndexBuildHeapScan's API to support that, but I'm not quite convinced that this is the issue.

That extension might be useful for a bitmap index too, to simplify the index creation process.

--
Teodor Sigaev    E-mail: [EMAIL PROTECTED]    WWW: http://www.sigaev.ru/
Re: [GENERAL] still gin index creation takes forever
> ... changing it; I've applied a patch for that. I'm still not quite convinced that Ivan isn't seeing some other issue though.

Thank you.

> In the meantime, I noticed something odd while experimenting with your test case: when running with the default maintenance_work_mem = 16MB, there is a slowdown of 3x or 4x for the un-ordered case, just as you say. But at maintenance_work_mem = 200MB I see very little difference. This doesn't make sense to me --- it seems like a larger workspace should result in more difference because of a greater chance to dump a lot of tuples into the index at once. Do you know why that's happening?

I suppose that if maintenance_work_mem is big enough, then all of the index data accumulates in memory and is written to disk at once. With that test's options, the size of the index is about 40MB.

--
Teodor Sigaev    E-mail: [EMAIL PROTECTED]    WWW: http://www.sigaev.ru/
Re: [GENERAL] How to reduce impact of a query.
> The machine in question is a 1GB RAM, AMD64 box with RAID 1 SATA disks. Non-standard parts of my postgresql.conf are as follows:
>
> max_connections = 100
> shared_buffers = 128MB
> work_mem = 4MB
> maintenance_work_mem = 256MB
> max_fsm_pages = 204800
> max_fsm_relations = 1500
>
> Any tips appreciated.

Please show:
1) effective_cache_size
2) the query
3) the output of EXPLAIN ANALYZE for the query

--
Teodor Sigaev    E-mail: [EMAIL PROTECTED]    WWW: http://www.sigaev.ru/
Re: [GENERAL] [TextSearch] syntax error while parsing affix file
> iconv -f windows-1251 -t utf-8 bulgarian.dic >bulgarian_utf8.dict
> iconv -f windows-1251 -t utf-8 bulgarian.aff >bulgarian_utf8.affix
>
> The locale of the database is fr_FR, and its encoding is UTF8.

I believe that the characters 'И', 'А' (non-ASCII) and other Cyrillic ones are not acceptable for a French locale :(

--
Teodor Sigaev    E-mail: [EMAIL PROTECTED]    WWW: http://www.sigaev.ru/
Re: [GENERAL] [TextSearch] syntax error while parsing affix file
> I am using Postgres 8.3.5, and I am trying to install a Bulgarian ISpell dictionary (the OpenOffice one) for the text search features.
>
> flag *A:
>   . > А        (this is line 24)
>   . > АТА
>   . > И
>   . > ИТЕ

OpenOffice or ISpell? Please provide:
- a link to download the dictionary
- the locale and encoding settings of your database

--
Teodor Sigaev    E-mail: [EMAIL PROTECTED]    WWW: http://www.sigaev.ru/
Re: [GENERAL] Incorrect FTS query results with GIN index
Thank you for the report; I will look at it this weekend.

Vyacheslav Kalinin wrote:
> Hello, basically I started testing prefix matching in FTS and got into trouble. A self-contained example follows:

--
Teodor Sigaev    E-mail: teo...@sigaev.ru    WWW: http://www.sigaev.ru/
Re: [GENERAL] Incorrect FTS query results with GIN index
> Basically, I started testing prefix matching in FTS and got into trouble. A self-contained example follows:

Thank you; fixed. The reason was an incorrect optimization of the GIN scan: GIN reuses the scan result for equal keys, but the comparison of keys didn't take into account a difference in the scan's strategy.

--
Teodor Sigaev    E-mail: teo...@sigaev.ru    WWW: http://www.sigaev.ru/
Re: [GENERAL] Incorrect FTS query results with GIN index
> Great, thank you! I assume this one goes into 8.4.3, right?

Yes, or apply the patch:
http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/backend/access/gin/ginscan.c?r1=1.25&r2=1.26

--
Teodor Sigaev    E-mail: teo...@sigaev.ru    WWW: http://www.sigaev.ru/
Re: [GENERAL] very long update gin index troubles back?
Will you reply to him with something? He explains it so murkily that I can't understand a thing.

Ivan Sergio Borgonovo wrote:
> I have a table that contains a tsvector that is indexed (GIN), and triggers to update the tsvector, which should then update the index. This GIN index has always been problematic: recreation and updates were very slow. Now I had to update 1M rows of that table, but only for columns that don't involve the tsvector, so I dropped the trigger that updates the tsvector so that when rows get updated the trigger won't be called and things should be faster... but it is still taking forever.
>
> begin;
> set constraints all deferred;
> select * from FT1IDX_trigger_drop();
> update catalog_items set APrice=p.PrezzoA, BPrice=p.PrezzoB from import.catalog_prices p where catalog_items.ItemID=p.id;
> select * from FT1IDX_trigger_create();
> commit;
>
> Functions are used since I actually have 2 triggers that I drop and create. Is there anything wrong in the above to make this update so slow on a 2x Xeon 3.2GHz, 4GB RAM and a RAID1 [sic]? I know it is slow on writes.

--
Teodor Sigaev    E-mail: teo...@sigaev.ru    WWW: http://www.sigaev.ru/
Re: [GENERAL] very long update gin index troubles back?
A GIN index is slow to update by its construction. When you update rows, whether or not the GIN-indexed columns are involved, postgres will (in most cases) insert new row versions, so index insertions will occur. So, for large updates it's much cheaper to drop and recreate the index. That was one of the reasons to develop the fast_insert_gin patch, which is now in the review process. Ivan Sergio Borgonovo wrote: I've a table that contain a tsvector that is indexed (gin) and triggers to update the tsvector that should then update the index. -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
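For illustration, the drop-and-recreate pattern around a bulk update looks roughly like this; the index name and the idxfti tsvector column are placeholders, only the UPDATE itself is taken from the original post:
BEGIN;
DROP INDEX catalog_items_fti_idx;                  -- hypothetical name of the GIN index
UPDATE catalog_items
   SET APrice = p.PrezzoA, BPrice = p.PrezzoB
  FROM import.catalog_prices p
 WHERE catalog_items.ItemID = p.id;
CREATE INDEX catalog_items_fti_idx ON catalog_items USING gin (idxfti);
COMMIT;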
Re: [GENERAL] very long update gin index troubles back?
No matter if I drop the trigger that update agg content and the fact that I'm just updating d, postgresql will update the index? Yes, due to MVCC. An update of a row produces a new version (tuple), and the new version has to be indexed just as the old one was. -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Text search segmentation fault
Could you provide a backtrace? Are you using an unchanged norwegian.stop file? I'm not able to reproduce the bug - postgres just works. Tommy Gildseth wrote: While trying to create a new dictionary for use with PostgreSQL text search, I get a segfault. My Postgres version is 8.3.5 -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Text search segmentation fault
How do I make a backtrace? - If you have a coredump, just execute gdb /PATH1/postgres /PATH2/core and type bt. Linux doesn't produce core files by default, so you have to allow it with ulimit -c unlimited for the postgresql user. - Or connect to the db and attach gdb to the backend process: gdb /PATH1/postgres BACKEND_PID then type continue in gdb; next, execute the CREATE DICTIONARY in your session and type bt in gdb when it crashes. Teodor Sigaev wrote: Could you provide a backtrace? Do you use unchanged norwegian.stop file? I'm not able to reproduce the bug - postgres just works. Tommy Gildseth wrote: While trying to create a new dictionary for use with PostgreSQL text search, I get a segfault. My Postgres version is 8.3.5 -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
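Put together as a rough transcript (the paths, the PID and the failing statement are placeholders):
# core file case: run as the user postgres runs as, before reproducing the crash
ulimit -c unlimited
gdb /usr/local/pgsql/bin/postgres /path/to/core
(gdb) bt

# live backend case: find the PID with SELECT pg_backend_pid(); in your session
gdb /usr/local/pgsql/bin/postgres 12345
(gdb) continue
# now run the statement that crashes in that session, then:
(gdb) bt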
Re: [GENERAL] Text search segmentation fault
I reproduced the bug with the help of Grzegorz's point on a 64-bit box. So, a patch is attached and I'm going to commit it -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/
*** src/backend/tsearch/spell.c.orig	2009-01-29 18:18:03.0 +0300
--- src/backend/tsearch/spell.c	2009-01-29 18:20:09.0 +0300
***************
*** 521,527 ****
  		(errcode(ERRCODE_CONFIG_FILE_ERROR),
  		 errmsg("multibyte flag character is not allowed")));
! 	Conf->flagval[(unsigned int) *s] = (unsigned char) val;
  	Conf->usecompound = true;
  }
--- 521,527 ----
  		(errcode(ERRCODE_CONFIG_FILE_ERROR),
  		 errmsg("multibyte flag character is not allowed")));
! 	Conf->flagval[*(unsigned char*) s] = (unsigned char) val;
  	Conf->usecompound = true;
  }
***************
*** 654,660 ****
  	ptr = repl + (ptr - prepl) + 1;
  	while (*ptr)
  	{
! 		aflg |= Conf->flagval[(unsigned int) *ptr];
  		ptr++;
  	}
  }
--- 654,660 ----
  	ptr = repl + (ptr - prepl) + 1;
  	while (*ptr)
  	{
! 		aflg |= Conf->flagval[*(unsigned char*) ptr];
  		ptr++;
  	}
  }
***************
*** 735,741 ****
  	if (*s && pg_mblen(s) == 1)
  	{
! 		Conf->flagval[(unsigned int) *s] = FF_COMPOUNDFLAG;
  		Conf->usecompound = true;
  	}
  	oldformat = true;
--- 735,741 ----
  	if (*s && pg_mblen(s) == 1)
  	{
! 		Conf->flagval[*(unsigned char*) s] = FF_COMPOUNDFLAG;
  		Conf->usecompound = true;
  	}
  	oldformat = true;
***************
*** 791,797 ****
  		(errcode(ERRCODE_CONFIG_FILE_ERROR),
  		 errmsg("multibyte flag character is not allowed")));
! 	flag = (unsigned char) *s;
  	goto nextline;
  }
  if (STRNCMP(recoded, "COMPOUNDFLAG") == 0 || STRNCMP(recoded, "COMPOUNDMIN") == 0 ||
--- 791,797 ----
  		(errcode(ERRCODE_CONFIG_FILE_ERROR),
  		 errmsg("multibyte flag character is not allowed")));
! 	flag = *(unsigned char*) s;
  	goto nextline;
  }
  if (STRNCMP(recoded, "COMPOUNDFLAG") == 0 || STRNCMP(recoded, "COMPOUNDMIN") == 0 ||
***************
*** 851,857 ****
  	while (str && *str)
  	{
! 		flag |= Conf->flagval[(unsigned int) *str];
  		str++;
  	}
--- 851,857 ----
  	while (str && *str)
  	{
! 		flag |= Conf->flagval[*(unsigned char*) str];
  		str++;
  	}
-- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Text search segmentation fault
Then I have quite a few notes about that function:
- affix is not checked on entry, and should be unsigned,
Could be Assert( affix>=0 && affix < Conf->nAffixData )
- for the sake of safety uint32_t should be used instead of unsigned int in the cast
see patch
- there should be some safety limit for the length of str,
It's a C-string
-- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Text search segmentation fault
Tom Lane wrote: Teodor Sigaev writes: I reproduced the bug with a help of Grzegorz's point for 64-bit box. Hmm, seems it's not so much a "64 bit" error as a "signed vs unsigned char" issue? Yes, but I don't understand why it worked on a 32-bit box. Does this affect the old contrib/tsearch2 code? Will check. Please try to make the commits in the next eight hours, as we have release wraps scheduled for tonight. Minor versions or the 8.4 beta? If the latter, I'd like to commit btree_gin and fast_update_gin. For both patches all the issues raised have been resolved, and Jeff, it seems, has no objections. -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Text search segmentation fault
To be honest, looking through that file, I am quite worried about few points. I don't know too much about insights of ispell, but I see few suspicious things in mkSPNode too. I generally don't want to get involve in reviewing code for stuff I don't know, But if Teodor (and Oleg) don't mind, I can raise my points, and see if anything useful comes out of it. If you see a bug/mistake/suspicious point, please don't be quiet! Also, about that patch - it doesn't seem to apply cleanly to 8.4, perhaps that file has changed too much (I based my 'review' above on 8.4's code) Will tweak. -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Text search segmentation fault
char" issue? Does this affect the old contrib/tsearch2 code? Checked - No, that was improvement for 8.3 :). -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Full text index not being used
I tried to create an index including all of the fields I query on to see if that would work, but I get an error that the index row is too large: => create index master_index on source_listings(geo_lat, geo_lon, price, bedrooms, region, city, listing_type, to_tsvector('english', full_listing), post_time); That's not a full text index - btree doesn't support the @@ operator. Read carefully: http://www.postgresql.org/docs/8.3/static/textsearch.html , and about full text indexes: http://www.postgresql.org/docs/8.3/static/textsearch-tables.html , http://www.postgresql.org/docs/8.3/static/textsearch-indexes.html -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
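For example, a real full text index on that column would look something like this in 8.3 (the index name and search word are placeholders); note that the query has to use the same to_tsvector('english', ...) expression as the index definition:
CREATE INDEX source_listings_fts_idx
    ON source_listings
 USING gin (to_tsvector('english', full_listing));

SELECT *
  FROM source_listings
 WHERE to_tsvector('english', full_listing) @@ to_tsquery('english', 'loft');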
Re: [GENERAL] multicolumn GIST index question
Did anyone get multi-column GIST indexes working using both the gist_btree and postgis modules? It must work.
fl=# -- sessionid is a text; the_geom is a GEOMETRY
fl=# create index testidx2 on user_point_features using gist (sessionid,the_geom);
server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. The connection to the server was lost. Attempting reset: Succeeded.
fl=# create table test3(a text, b text, c text);
CREATE TABLE
fl=# select addgeometrycolumn ('','test3','the_geom','-1','POINT',2);
addgeometrycolumn
--
public.test3.the_geom SRID:-1 TYPE:POINT DIMS:2 geometry_column fixed:0
(1 row)
fl=# create index idx_text_text on test3 using gist (a,b);
CREATE INDEX
fl=# create index idx_text_geom on test3 using gist (a,the_geom);
CREATE INDEX
Is sessionid in the user_point_features table unique? Please try to compile the database with --enable-cassert --enable-debug and send the gdb output. -- Teodor Sigaev E-mail: [EMAIL PROTECTED] WWW: http://www.sigaev.ru/ ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [GENERAL] Tsearch2 doesn't work
You should run the configure script in the root of the pgsql source tree. Supiah Mustaffa wrote: Hi, I want to install Tsearch2 and follow this command : 1.cd contrib/tsearch2 2. make; make install It doesn't work and display this messages : Makefile:5: ../../src/Makefile.global: No such file or directory Makefile:47: /contrib/contrib-global.mk: No such file or directory make: *** No rule to make target `/contrib/contrib-global.mk'. Stop. Makefile:5: ../../src/Makefile.global: No such file or directory Makefile:47: /contrib/contrib-global.mk: No such file or directory make: *** No rule to make target `/contrib/contrib-global.mk'. Stop. What it's means? What should I do? Please help me. Thanks. ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match -- Teodor Sigaev E-mail: [EMAIL PROTECTED] WWW: http://www.sigaev.ru/ ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [GENERAL] If an index is based on 3 columns will a query using
Greg Stark suggests here: http://archives.postgresql.org/pgsql-hackers/2005-05/msg00966.php that GiST could also be fixed to work with any subset of the index columns, but it hasn't been done yet, unless Teodor and Oleg snuck something in during that last round of GiST work. GiST can work with any subset of the index columns too. Even in the existing code I don't see any problem except for a NULL in the first column. GiST doesn't store tuples with a leading NULL value (gist.c lines 174, 326), so the index doesn't contain them. After our work on WAL-izing GiST it can work with "invalid" tuples (which may occur after crash recovery), so it isn't a big deal to add support for NULL in the first column. But the freeze date has passed... Should I add it now or leave it for 8.2? -- Teodor Sigaev E-mail: [EMAIL PROTECTED] WWW: http://www.sigaev.ru/ ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [GENERAL] TSearch2 snowball version error
Snowball changed its interfaces; I'll update the tsearch2 sources today or tomorrow. William Leite Araújo wrote: Hi, I'm trying compile a new brazilian portuguese dictionary to TSearch2 contrib, but found the errors: portuguese_stem.c: In function `r_prelude': portuguese_stem.c:481: error: void value not ignored as it ought to be portuguese_stem.c:487: error: void value not ignored as it ought to be portuguese_stem.c: In function `r_postlude': portuguese_stem.c:610: error: void value not ignored as it ought to be portuguese_stem.c:616: error: void value not ignored as it ought to be portuguese_stem.c: In function `r_standard_suffix': portuguese_stem.c:662: error: void value not ignored as it ought to be portuguese_stem.c:672: error: void value not ignored as it ought to be portuguese_stem.c:682: error: void value not ignored as it ought to be portuguese_stem.c:692: error: void value not ignored as it ought to be portuguese_stem.c:702: error: void value not ignored as it ought to be portuguese_stem.c:715: error: void value not ignored as it ought to be portuguese_stem.c:729: error: void value not ignored as it ought to be portuguese_stem.c:744: error: void value not ignored as it ought to be portuguese_stem.c:760: error: void value not ignored as it ought to be portuguese_stem.c:775: error: void value not ignored as it ought to be portuguese_stem.c:791: error: void value not ignored as it ought to be portuguese_stem.c:806: error: void value not ignored as it ought to be portuguese_stem.c:818: error: void value not ignored as it ought to be portuguese_stem.c:832: error: void value not ignored as it ought to be portuguese_stem.c: In function `r_verb_suffix': portuguese_stem.c:856: error: void value not ignored as it ought to be portuguese_stem.c: In function `r_residual_suffix': portuguese_stem.c:880: error: void value not ignored as it ought to be portuguese_stem.c: In function `r_residual_form': portuguese_stem.c:902: error: void value not ignored as it ought to be portuguese_stem.c:929: error: void value not ignored as it ought to be portuguese_stem.c:935: error: void value not ignored as it ought to be portuguese_stem.c: In function `portuguese_ISO_8859_1_stem': portuguese_stem.c:993: error: void value not ignored as it ought to be make: ** [portuguese_stem.o] Erro 1 This after change the included file "header.h", the old "../runtime/header.h" is invalid. I think that is a version trouble, because the return of functions "slice_from_s" and "slice_del" are "int" in current snowball portuguese files, but on include files of version 8.0.2 and 8.0.3 of Portgresql the return type is "void". Help-me, please. -- William Leite Araújo -- Teodor Sigaev E-mail: [EMAIL PROTECTED] WWW: http://www.sigaev.ru/ ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [GENERAL] TSearch2 snowball version error
Fixed and committed in CVS. Patches for already existing versions: http://www.sigaev.ru/gist/patch_snowball-7.4.gz http://www.sigaev.ru/gist/patch_snowball-8.0.gz I'm trying compile a new brazilian portuguese dictionary to TSearch2 contrib, but found the errors: -- Teodor Sigaev E-mail: [EMAIL PROTECTED] WWW: http://www.sigaev.ru/ ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [GENERAL] re : tsearch2 problem
because of its way of "Inverse Order OF Idexing",and "The GiST Tree is well for larting lan but doesn't for chinese lan(also ineffifent for Japanese lan)"。 About japanese look at http://www.oss.ecl.ntt.co.jp/tsearch2j/ (it's on Japanese, but you can contact with developer) I only know that GiST is also "Inverse Order OF Idexing",but of course it is a tree NOT as lucene does Tsearch2 doesn't use inverted index (yet), and so it easy to update (your index is always full - online indexing), but it hasn't good performance on large data set. The limit is about 10^5 - 10^6 rows in practice. -- Teodor Sigaev E-mail: [EMAIL PROTECTED] WWW: http://www.sigaev.ru/ ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [GENERAL] really thanks,Teodor Sigaev. HOW ts2 implment that
I know the LUCENE " inverted index " is just like follows: keyword doc_id[frequence] location guangzhou 1[2] 3,6 he 2[1] 1 i 1[1] 4 live 1[2],2[1] 2,5,2 shanghai 2[1] 3 Yes so ,before I thought that A node In GiST tree is very like the above : In the tree's node , it has several value for storing its's turple_id[frequence] location ,and that is why I thoutht the " BUT now , I'm lost again ,how does Gist tree store lexem and how to find his correspoding tuples in database( including frequence location)? Tsearch2 use "direct" index: each leaf tuple in tree represent document, internal tuples represents union of documents. -- Teodor Sigaev E-mail: [EMAIL PROTECTED] WWW: http://www.sigaev.ru/ ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [GENERAL] Fuzzy text search
jennyw wrote: We've heard that PostgreSQL can do fuzzy search, but haven't had much luck. I'm brand new to PostgreSQL, so this might be completely obvious for an experienced user. Are there any how-tos on fuzzy text searching? Someone said to try using tsearch2, but it seems that it does full-text searching, but not fuzzy search. Ideally, we'd like to use that w/ a fuzzy search option. In particular, if someone searches for "imat" we want to return results including "immaterial" and "imaterial" (so misspellings plus partial match). Look at contrib/pg_trgm. If you want full text search with misspelling correction then read "Tsearch2 Integration" in contrib/pg_trgm/README. -- Teodor Sigaev E-mail: [EMAIL PROTECTED] WWW: http://www.sigaev.ru/ ---(end of broadcast)--- TIP 6: explain analyze is your friend
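A minimal pg_trgm sketch of that, assuming a words(word text) table filled with the distinct words of the documents (all names here are illustrative only):
CREATE INDEX words_trgm_idx ON words USING gist (word gist_trgm_ops);

SELECT word, similarity(word, 'imat') AS sml
  FROM words
 WHERE word % 'imat'
 ORDER BY sml DESC, word;
-- if nothing matches, lower the similarity threshold, e.g. SELECT set_limit(0.2);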
Re: [GENERAL] tsearch2: more than one index per table?
ERROR: could not find tsearch config by locale UPDATE t SET idxA=to_tsvector('default', a); Does select to_tsvector('foo bar') work? I suppose not. In that case tsearch can't find a configuration for the current database locale; update pg_ts_cfg.locale in the desired row to the correct value. -- Teodor Sigaev E-mail: [EMAIL PROTECTED] WWW: http://www.sigaev.ru/ ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
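Concretely, with the old tsearch2 catalog that would be something along these lines; the locale string is only an example and should match what SHOW lc_ctype reports on your server:
SELECT * FROM pg_ts_cfg;
UPDATE pg_ts_cfg SET locale = 'en_US.UTF-8' WHERE ts_name = 'default';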
Re: [GENERAL] TSearch2 / German compound words / UTF-8
Tsearch/ispell is not able to break this word into parts, because of the "s" in "Produktion/s/intervall". Misspelling the word as "Produktionintervall" fixes it: There should be affixes marked as 'affix in the middle of a compound word'. The flag is '~'; for an example look in the norsk dictionary: flag ~\\: [^S] >S #~ advarsel > advarsels- BTW, we developed and debugged compound word support on the norsk (Norwegian) dictionary, so look for examples there. But we don't know Norwegian - Norwegians helped us :) The second thing is with UTF-8: I know there is no, or no full support yet, but I need to get it as good as it's possible /now/. Is there anything in CVS that I might be able to backport to my version or other tips? My setup works, as for the dict and the stop word files, but I fear the stemming and mapping of umlauts and other special chars does not as it should. I tried recoding the german.aff to UTF-8 as well, but that breaks it with a regex error sometimes: What is in CVS now is a deep alpha version; only the text parser is UTF-compliant so far, and we are continuing development... fts=# SELECT ts_debug('dass'); ERROR: Regex error in '[^sß]$': brackets [] not balanced CONTEXT: SQL function "ts_debug" statement 1 This seems while it tries to map ss to ß, but anyway, I fear, I didn't anything good with that. As suggested in the "Tsearch2 and Unicode/UTF-8" article I have a second snowball dict. The first lines of the stem.h I used start with: extern struct SN_env * german_ISO_8859_1_create_env(void); Can you use ISO-8859-1? So I guess this will not work exactly well with UTF-8 ;p Is there any other stem.h I could use? Google hasn't returned much for me :/ http://snowball.tartarus.org/ Snowball can generate a UTF parser: http://snowball.tartarus.org/runtime/use.html: F1 [-o[utput] F2] [-s[yntax]] [-w[idechars]] [-u[tf8]] <-- that's it! [-j[ava]] [-n[ame] C] [-ep[refix] S1] [-vp[refix] S2] [-i[nclude] D] [-r[untime] P] At least for Russian there are 2 parsers, for KOI8 and UTF ( http://snowball.tartarus.org/algorithms/russian/stem.sbl http://snowball.tartarus.org/algorithms/russian/stem-Unicode.sbl ); diff shows that they differ only in the stringdef section. So you can make a UTF parser for German. BUT, I'm afraid that Snowball uses widechars, while postgres uses multibyte for UTF internally. -- Teodor Sigaev E-mail: [EMAIL PROTECTED] WWW: http://www.sigaev.ru/ ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [GENERAL] cyclical redundancy checksum algorithm(s)?
You sure that's actually what he said? A change in CRC proves the data changed, but lack of a change does not prove it didn't. "To quickly determine if rows have changed, we rely on a cyclic redundancy checksum (CRC) algorithm. If the CRC is identical for the > "summary" functions, such as an MD5 hash. I wouldn't trust it at all with a 32-bit CRC, and not much with a 64-bit CRC. Too much risk of collision. A small example of collisions for crc32:
0x38ee5531   Hundemarke      92294
0x59471e4f   raciner         tranchefiler
0x947bb6c0   Betriebsteile   4245
I did a lot of work when choosing a hash function for tsearch2. Also, I found that popular hash algorithms produce more collisions for non-ASCII languages... CRC32 is more "smooth": on a dictionary with 332296 unique words CRC32 produces 11 collisions, perl's hash function 35, and pgsql's hash_any 12. -- Teodor Sigaev E-mail: [EMAIL PROTECTED] WWW: http://www.sigaev.ru/ ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [GENERAL] Full Text fuzzy search
Play around with contrib/tsearch2 and contrib/pg_trgm Michael Vodep wrote: Hi How can i do a full text fuzzy search with PgSQL? For example: i got a text domain. I search for Postgresql He should return a result for 'Postgresql' if i enter following values: PostgreSQL, Postgresql, postgresql but also bostgresql, posdgresql, bosdgresql Is this possible? Regrads michael ---(end of broadcast)--- TIP 6: explain analyze is your friend -- Teodor Sigaev E-mail: [EMAIL PROTECTED] WWW: http://www.sigaev.ru/ ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [GENERAL] Full Text fuzzy search
For the others (bostgresql, posdgresql, bosdgresql) i think u should write a function, that's what we did in our application. contrib/pg_trgm solves it -- Teodor Sigaev E-mail: [EMAIL PROTECTED] WWW: http://www.sigaev.ru/ ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [GENERAL] GiST and Gin technical doc
GiST: http://gist.cs.berkeley.edu/ http://www.sai.msu.su/~megera/postgres/gist/papers/concurrency/ GIN: GIN is an ordinary inverted index, which is described in many docs. Look for some implementation details in my presentation from the conference: http://www.sigaev.ru/gin/Gin.pdf Yannick Warnier wrote: Hi there, I'm having trouble finding technical documentation about GiST, Gin and TSearch2. I am particularly interested in the internal data structure of a GiST-ed or Gin-ed index, and the availability of searches by proximity. Does anyone know of a good place to find such doc, outside from the source code (I'm not too much into C/C++)? Thank you, Yannick ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org -- Teodor Sigaev E-mail: [EMAIL PROTECTED] WWW: http://www.sigaev.ru/ ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [GENERAL] NOTICE: word is too long INSERT 0 3014
I am running into this limitation ALOT with Tsearch2. What are my options to get around it. Do I have to compile PostgreSQL with a different block size? If yes, what are the downsides to doing so (outside of not being able to do straight upgrades)? If you really need that, you should play with the WordEntry definition (tsvector.h). Sorry, right now I don't have the possibility to look closer - only tomorrow. The limit for a word is 2KB - I supposed that is long enough to store any meaningful word. ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org/
Re: [GENERAL] NOTICE: word is too long INSERT 0 3014
For example, redefine it in the following way:
typedef struct
{
    uint32  haspos:1,
            len:31;
    uint32  pos;
} WordEntry;    /* <= 1Gb */

#define MAXSTRLEN ( 1<<30 )
#define MAXSTRPOS ( 1<<30 )
Teodor Sigaev wrote: I am running into this limitation ALOT with Tsearch2. What are my options to get around it. Do I have to compile PostgreSQL with a different block size? If yes, what are the downsides to doing so (outside of not being able to do straight upgrades)? If you really need that, your should play around WordEntry definition (tsvector.h). Sorry, right now I haven't possibility to look closer, just tomorrow. Limit for word is equal 2KB - I supposed that is long enough to store any meaningful words. ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org/ -- Teodor Sigaev E-mail: [EMAIL PROTECTED] WWW: http://www.sigaev.ru/ ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [GENERAL] Index greater than 8k
The problem I am after is the 8k index size issue. It is very easy to get a GIST index (especially when using tsearch2) that is larger than that. Hmm, the tsearch2 GiST index is specially designed to support huge index entries: first, every lexeme in the tsvector is transformed to a hash value (with the help of crc32); second, all position info is stripped; third, if the size of the array is greater than TOAST_INDEX_TARGET then tsearch2 makes a bit signature of the tsvector. The signature's length is fixed and equals 252 bytes by default (+ 8 bytes for the datum header). All values on internal pages are represented as signatures as well. So, tsearch2 guarantees that an index entry will be small enough. If that's not true, then there is a bug - please make a test suite demonstrating the problem. Is recompiling the block size the option there? What are the downsides, except for the custom build? Can you send the exact error message? -- Teodor Sigaev E-mail: [EMAIL PROTECTED] WWW: http://www.sigaev.ru/ ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org/
Re: [HACKERS] [GENERAL] Index greater than 8k
The problem as I remember it is pg_trgm, not tsearch2 directly; I've sent a self contained test case directly to Teodor which shows the error. 'ERROR: index row requires 8792 bytes, maximum size is 8191' Uh, I see. But I'm really surprised - why do you use pg_trgm on big text? pg_trgm is designed to find similar words and uses a technique known as trigrams. This works well on small pieces of text such as words or set expressions. But all big texts (in the same language) will be similar :(. So, I didn't take care to guarantee the index tuple size limitation. In principle, it's possible to modify pg_trgm to have such a guarantee, but the index becomes lossy - all tuples fetched from the index would have to be rechecked against the table's tuples. If you want to search for similar documents I can recommend having a look at the fingerprint technique (http://webglimpse.net/pubs/TR93-33.pdf). It's pretty close to trigrams and the metric of similarity is the same, but it uses different signature calculations. And there are some tips and tricks: removing HTML markup, removing punctuation, lowercasing text and so on - it's an interesting and complex task. -- Teodor Sigaev E-mail: [EMAIL PROTECTED] WWW: http://www.sigaev.ru/ ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] [GENERAL] Index greater than 8k
We are trying to get something faster than ~ '%foo%'; Which Tsearch2 does not give us :) Hmm, why not with GIN? :) Although GIN doesn't support prefix search right now, it's still possible to use it. Brainstorming: develop a dictionary which returns all substrings of a lexeme, for example for the word foobar it will be 'foobar fooba foob foo fo oobar ooba oob oo obar oba ob bar ba ar'. And make a GIN functional index over your column (to save disk space). So, your query will look like select ... where to_tsvector(text_column) @@ 'foo'; Notices: search time in GIN depends only weakly on the number of words (as opposed to tsearch2/GiST), but insertion of a row may be rather slow. -- Teodor Sigaev E-mail: [EMAIL PROTECTED] WWW: http://www.sigaev.ru/ ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
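A sketch of that idea; 'substr_cfg' is assumed to be a tsearch2 configuration wired to such a substring-producing dictionary, and docs/body are placeholder names:
CREATE INDEX docs_substr_idx ON docs USING gin (to_tsvector('substr_cfg', body));

SELECT * FROM docs
 WHERE to_tsvector('substr_cfg', body) @@ to_tsquery('substr_cfg', 'foo');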
Re: [GENERAL] tsearch2() with data from other table
CREATE FUNCTION euits(int) RETURNS text AS 'select username || \' \' || firstname || \' \' || lastname from site_user where objectid = $1;' LANGUAGE SQL; CREATE TRIGGER site_item_fts BEFORE UPDATE OR INSERT ON site_item FOR EACH ROW EXECUTE PROCEDURE tsearch2(idxfti, name, keywords, keywords_cb, location_country, location_city, media_source, description, euits, site_user); site_user is a table, isn't it? The tsearch2 trigger accepts only column names and functions with the prototype TEXT func(TEXT). To clarify, please show your table definitions. -- Teodor Sigaev E-mail: [EMAIL PROTECTED] WWW: http://www.sigaev.ru/ ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org/
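For reference, what the trigger does accept is a TEXT func(TEXT) filter applied to one of the row's own columns (it cannot pull data from another table), roughly as in the tsearch2 documentation's dropatsymbol example:
CREATE FUNCTION dropatsymbol(text) RETURNS text AS
  'SELECT replace($1, ''@'', '' '');'
LANGUAGE sql;

CREATE TRIGGER site_item_fts
  BEFORE UPDATE OR INSERT ON site_item
  FOR EACH ROW
  EXECUTE PROCEDURE tsearch2(idxfti, dropatsymbol, description);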
Re: [GENERAL] tsearch2() with data from other table
CREATE FUNCTION euits(int) RETURNS text AS 'select username || \' \' || firstname || \' \' || lastname from site_user where objectid = $1;' LANGUAGE SQL; CREATE TRIGGER site_item_fts BEFORE UPDATE OR INSERT ON site_item FOR EACH ROW EXECUTE PROCEDURE tsearch2(idxfti, name, keywords, keywords_cb, location_country, location_city, media_source, description, euits, site_user); So, when updating site_item I want to fetch the names from site_user and give this data to tsearch2() along with other stuff from site_item. Sorry, current interface of tsvector aggregate doesn't support it. -- Teodor Sigaev E-mail: [EMAIL PROTECTED] WWW: http://www.sigaev.ru/ ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [GENERAL] Using GIN indexes on 8.2
Alexander Staubo wrote: Two questions about GIN on 8.2. There's not much documentation about GIN, but this should be possible: create table foo (values text[]); create index foo_values_index on foo using gin (text); However, this then fails saying the operator "@" does not exist: select * from foo where values @ '{hello, world}' Use the @> and <@ operators instead of @ and ~. Look for discussions in -hackers for the reasons behind the name change. Do I need to reference a specific opclass when creating the index? From the documentation I got the impression that GIN bundled operators for most built-in types. If there is a default opclass for your datatype, you don't have to specify one. Secondly, are GIN indexes immutable and (unlike Tsearch2) non-lossy and therefore useful with functional indexes? I would like to do this: create table bar (value text); create index bar_value_index on bar using gin (analyze(value)); where analyze() is a function of my own that tokenizes, stems and filters the text into a text[] array. Be careful:
select pg_opclass.opcname, pg_operator.oprname, pg_amop.amopreqcheck from pg_opclass, pg_operator, pg_amop, pg_am where pg_operator.oid = pg_amop.amopopr and pg_opclass.oid = pg_amop.amopclaid and pg_opclass.opcamid = pg_am.oid and pg_am.amname='gin' and pg_opclass.opcname='_text_ops';
  opcname  | oprname | amopreqcheck
-----------+---------+--------------
 _text_ops | &&      | f
 _text_ops | @>      | f
 _text_ops | <@      | t
 _text_ops | =       | t
(4 rows)
So, the <@ and = operations will recheck the result against the table's row. Please, why don't you use tsearch2 with GIN? -- Teodor Sigaev E-mail: [EMAIL PROTECTED] WWW: http://www.sigaev.ru/ ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org/
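A working variant of that first example, as a sketch (the column is renamed to tags purely for illustration):
CREATE TABLE foo (tags text[]);
CREATE INDEX foo_tags_idx ON foo USING gin (tags);

-- rows whose array contains both elements
SELECT * FROM foo WHERE tags @> '{hello,world}';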
Re: [GENERAL] Using GIN indexes on 8.2
Use @>, <@ operations instead of @ and ~ Look for discussions in -hackers for reasons of changing names Ah, many thanks. How about updating those web pages? :) Now they are in core: http://developer.postgresql.org/pgdocs/postgres/functions-array.html Pls, why don't you use tsearch2 with GIN? Perhaps -- is there any documentation on this? Nothing special, just use a GIN index instead of GiST. -- Teodor Sigaev E-mail: [EMAIL PROTECTED] WWW: http://www.sigaev.ru/ ---(end of broadcast)--- TIP 6: explain analyze is your friend
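That is, with a tsearch2 tsvector column (idxfti is just the conventional example name), the only change is the access method:
-- instead of: CREATE INDEX t_idxfti_gist ON t USING gist (idxfti);
CREATE INDEX t_idxfti_gin ON t USING gin (idxfti);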
Re: [GENERAL] PG_MODULE_MAGIC check in 8.2
#ifdef PG_MODULE_MAGIC
PG_MODULE_MAGIC;
#endif
solves your problem -- Teodor Sigaev E-mail: [EMAIL PROTECTED] WWW: http://www.sigaev.ru/ ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [GENERAL] Functional Index
Use the varchar_pattern_ops operator class; LIKE cannot use varchar_ops for non-C locales. Alexander Presber wrote: Hello everybody, I am trying to speed up a query on an integer column by defining an index as follows > CREATE INDEX idx_main_subject ON pdb.main (lower(main_subject::text) using varchar_ops); on column "main_subject". I had hoped to get speedups for right-fuzzy LIKE-searches, but EXPLAIN ANALYZE yields that the index is not used: > EXPLAIN ANALYZE SELECT COUNT(*) FROM pdb.main WHERE lower(main_subject::text) LIKE lower('10%'::text);
QUERY PLAN
Aggregate (cost=137759.92..137759.93 rows=1 width=0) (actual time=3421.696..3421.697 rows=1 loops=1)
  -> Seq Scan on main (cost=0.00..137727.17 rows=13096 width=0) (actual time=0.036..3300.961 rows=77577 loops=1)
       Filter: (lower((main_subject)::text) ~~ '10%'::text)
Total runtime: 3421.751 ms
(4 Zeilen)
Am I misunderstanding the concept of functional indexes? Is there another way to achieve Any help is greatly appreciated. Yours, Alexander Presber ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq -- Teodor Sigaev E-mail: [EMAIL PROTECTED] WWW: http://www.sigaev.ru/ ---(end of broadcast)--- TIP 6: explain analyze is your friend
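A sketch of the suggested fix; since lower(main_subject::text) yields text, the text flavour of the pattern opclass is used here (varchar_pattern_ops is the counterpart for plain varchar columns), and the index name is arbitrary:
CREATE INDEX idx_main_subject_pattern
    ON pdb.main (lower(main_subject::text) text_pattern_ops);

EXPLAIN ANALYZE
SELECT count(*) FROM pdb.main
 WHERE lower(main_subject::text) LIKE '10%';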
Re: [GENERAL] tsearch to spellcheck
You can use the lexize() function from tsearch2 (with a properly configured ispell dictionary) - if it returns a non-NULL value the word is OK. Also have a look at the pg_trgm contrib module. SunWuKung wrote: Is it possible to use tsearch2 to check spelling? Something like a function that takes a single word (and a language id) and returns true if spelled correctly false otherwise. Thanks. Balazs ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq -- Teodor Sigaev E-mail: [EMAIL PROTECTED] WWW: http://www.sigaev.ru/ ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
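For example (the dictionary name is whatever was registered in pg_ts_dict; 'ispell_en' is made up here, and the outputs are only indicative):
SELECT lexize('ispell_en', 'books');    -- e.g. {book} when the dictionary knows the word
SELECT lexize('ispell_en', 'boocks');   -- NULL when it does not, i.e. probably misspelled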
Re: [GENERAL] postgresql 8.2 rc1 - crash
While I'm downloading your file, please do the following: gdb /usr/local/pgsql/bin/postgres your_core_file If needed, change the path to the postgres binary. In gdb, type bt and send the output. hubert depesz lubaczewski wrote: hi, i have been testing 8.2 rc1, while i got this problem. base data: linux, 32bit, kernel: 2.6.18.3; debian postgresql version: PostgreSQL 8.2rc1 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.3.5 (Debian 1:3.3.5-13) problematic table is over 2gigabytes in size, and has several indices - one of them is gin-index. problem: when i issue vacuum full verbose analyze it works, but then crashes with signal 11. always in the same situation. i was not able to check what is the reason. i did: recompile with debug, set ulimit -c unlimited, and rerun the query. it crashed. i bundled: 1. logs 2. core file 3. config of postgresql 4. saved output of vacuum all of this can be fetched from: http://depesz.com/various/crash.data.tar.bz2 unfortunatelly i'm not c programmer, so i dont know gdb, but i hope you will be able to make any sense out of it. the bz2 file is > 20mb in size. any help? is it hardware problem? or a missed bug in code? if i can provide you with more information - please tell me what i should tell you. depesz -- http://www.depesz.com/ - nowy, lepszy depesz -- Teodor Sigaev E-mail: [EMAIL PROTECTED] WWW: http://www.sigaev.ru/ ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [GENERAL] postgresql 8.2 rc1 - crash
#1 0x080bc224 in PageDeletePostingItem (page=0xb28039a0 "\020", offset=53719) at gindatapage.c:291
#2 0x080bf558 in ginDeletePage (gvs=0xbfc2ab80, deleteBlkno=29194, leftBlkno=29059, parentBlkno=70274, myoff=351, isParentRoot=0 '\0') at ginvacuum.c:268
Are you sure about your hardware? myoff in ginDeletePage() and offset in PageDeletePostingItem() are the same variable... Please send me the postgres binary itself - just the core isn't very useful for debugging. -- Teodor Sigaev E-mail: [EMAIL PROTECTED] WWW: http://www.sigaev.ru/ ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [GENERAL] postgresql 8.2 rc1 - crash
I reproduce a problem with small script: print <5000 and i<40; VACUUM FULL ANALYZE qq; EOT So, I'm digging now... -- Teodor Sigaev E-mail: [EMAIL PROTECTED] WWW: http://www.sigaev.ru/ ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster