Re: [GENERAL] jsonb value retrieval performance

2015-09-08 Thread Teodor Sigaev

and I am trying to get value via  jsonb->parentKey->childKey
it seems it is very slow.
Would it be actually faster to use top level key only and parse it at client 
side?


Suppose, most time is spent for decompressing huge value, not for actual search 
inside jsonb. If so, we need to implement some search method which decompress 
some chunks of jsonb.



Could you send to me an example of that jsonb?



--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] jsonb value retrieval performance

2015-09-08 Thread Teodor Sigaev

Suppose, most time is spent for decompressing huge value, not for actual search
inside jsonb. If so, we need to implement some search method which decompress
some chunks of jsonb.

On artificial example:
%SAMP IMAGE  FUNCTION CALLERS
 92.9 postgres   pglz_decompress  toast_decompress_datum


--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] jsonb value retrieval performance

2015-09-09 Thread Teodor Sigaev

  does it read in the whole jsonb tree structure in memory
and get to v1  or it has some optimization so only get v1 instead
of reading in the whole structure.


it reads, untoasts and uncompresses whole value and then executes search. An 
idea to fix that is a reading jsonb value by only needed chunks.



--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] MongoDB 3.2 beating Postgres 9.5.1?

2016-07-19 Thread Teodor Sigaev

CREATE INDEX json_tables_idx ON json_tables USING GIN (data jsonb_path_ops);
Bitmap Heap Scan on json_tables  (cost=113.50..37914.64 rows=1 width=1261)
(actual time=2157.118..1259550.327 rows=909091 loops=1)
Recheck Cond: (data @> '{"name": "AC3 Case Red"}'::jsonb)
Rows Removed by Index Recheck: 4360296
Heap Blocks: exact=37031 lossy=872059

Hmm, looks like too small work_mem because lossy heap block count  is too big.


--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Test CMake build

2016-02-12 Thread Teodor Sigaev

I tried it on FreeBSD 64-bit, 16Gb, SSD, Core i7

( ./configure && gmake all; )  168,99s user 15,46s system 97% cpu 3:09,61 total
( cmake . && gmake all; )  75,11s user 11,34s system 100% cpu 1:26,30 total

Cmake 2 times faster, that is good, but I don't understand why. Which 
optimization level does cmake buld use by default? Which compiler does it take? 
It's not obvious, because cmake build hides actual compiler command line.


Yury, pls, return back check target...
--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Test CMake build

2016-02-12 Thread Teodor Sigaev



Teodor Sigaev wrote:

I tried it on FreeBSD 64-bit, 16Gb, SSD, Core i7

( ./configure && gmake all; )  168,99s user 15,46s system 97% cpu 3:09,61 total
( cmake . && gmake all; )  75,11s user 11,34s system 100% cpu 1:26,30 total
( CFLAGS='-O2' cmake . && gmake all; )  141,87s user 12,18s system 97% cpu 
2:37,40 total


Oops, cmake default target is compiled with -O0. With -O2 cmake is still faster 
but not so much.




Cmake 2 times faster, that is good, but I don't understand why. Which
optimization level does cmake buld use by default? Which compiler does it take?
It's not obvious, because cmake build hides actual compiler command line.

Yury, pls, return back check target...


--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Test CMake build

2016-02-12 Thread Teodor Sigaev

Hm, I don't think having the compile/link lines be hidden up is
acceptable.  Many times we need to debug some compile problem, and the
output is mandatory.


+1

Although it could be fixed by
VERBOSE=1 make

--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] crash creating tsearch2 index

2007-05-28 Thread Teodor Sigaev

Could you provide a test suite?

John DeSoi wrote:

Hi,

I'm trying to dump and restore a copy of a database in the same cluster. 
pg_restore would abort when creating a tsearch2 gist index. So I dumped 
to text removed the CREATE INDEX commands and tried to do that at the 
end after the rest of the database was loaded. I still have the same 
problem:


CREATE INDEX song_tsx_title_idx ON song USING gist (tsx_title 
public.gist_tsvector_ops);

server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.

This is pg 8.0.8 in a shared hosting environment, so I don't have a lot 
of options for tweaking. Is there a known work-around for this?


Thanks,




John DeSoi, Ph.D.
http://pgedit.com/
Power Tools for PostgreSQL


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [GENERAL] opclass for real[]

2007-05-30 Thread Teodor Sigaev
ERROR:  data type real[] has no default operator class for access method 
"gist"
HINT:  You must specify an operator class for the index or define a 
default operator class for the data type.

There is operator class for GIN for real[].
http://www.postgresql.org/docs/8.2/static/xindex.html#XINDEX-GIN-ARRAY-STRAT-TABLE



Is there a opclass defined in 8.2 or I have to create one. In either 
case can you please give a link for information on opclasses.


Thanks
Abhang


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [GENERAL] TSEARCH2: disable stemming in indexes and triggers

2007-05-31 Thread Teodor Sigaev
I found out that using 'simple' instead of 'default' when using 
to_tsvector() does excactly that, but I don't know how to change my 
triggers and indexes to keep doing the same (using 'simple'). 


Suppose, your database is initialized with C locale. So, just mark 
simple configuration as default:


# update pg_ts_cfg set locale=null where ts_name='default';
# update pg_ts_cfg set locale='C' where ts_name='simple';

If your locale setting is not C then mark needed configuration with your 
 locale.


---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org/


Re: [GENERAL] warm standby server stops doing checkpoints afterawhile

2007-06-01 Thread Teodor Sigaev

<2007-06-01 13:11:29.365 CEST:%> DEBUG:  0: Ressource manager (13)
has partial state information

To me, this points clearly to there being an improperly completed action
in resource manager 13. (GIN) In summary, it appears that there may be
an issue with the GIN code for WAL recovery and this is effecting the
Warm Standby.


Hmm. I found that gin_xlog_cleanup doesn't reset incomplete_splits list. Is it 
possible reason of bug?


Attached patch fixes it.

--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/
*** ./src/backend/access/gin/ginxlog.c.orig Fri Jun  1 16:47:47 2007
--- ./src/backend/access/gin/ginxlog.c  Fri Jun  1 16:53:47 2007
***
*** 594,599 
--- 594,600 
  
MemoryContextSwitchTo(topCtx);
MemoryContextDelete(opCtx);
+   incomplete_splits = NIL;
  }
  
  bool

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org/


Re: [GENERAL] warm standby server stops doing checkpointsafterawhile

2007-06-01 Thread Teodor Sigaev

<2007-06-01 16:28:51.708 CEST:%> LOG:  GIN incomplete split root:8
l:45303 r:111740 at redo CA/C8243C28

...

<2007-06-01 16:38:23.133 CEST:%> LOG:  GIN incomplete split root:8
l:45303 r:111740 at redo CA/C8243C28


Looks like a bug in GIN. I'll play with it. Can you provide more details about 
your test?



--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [GENERAL] warm standby server stops doingcheckpointsafterawhile

2007-06-01 Thread Teodor Sigaev



I'd suggest we throw an error, as shown in the enclosed patch. Frank,
can you give that a whirl to provide Teodor with something more to work
with? Thanks.


I already makes a test suite which reproduce the problem - it leaves incompleted 
splits. But I discover one more problem: deadlock on buffer's lock. So, right 
now I investigate the problem.




Neither GIST nor B-tree seems to throw an error in corresponding
locations also, so the potential for not being able to track this is
high. I'd want to throw errors in those locations also.


Agreed, I'll add more check
--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [GENERAL] warm standby server stops doingcheckpointsafterawhile

2007-06-01 Thread Teodor Sigaev


Found a reason: if parent page is fully backuped after child's split then 
forgetIncompleteSplit() isn't called at all.


Hope, attached patch fix that. Pls, test it.

PS I'm going away for weekend, so I'll not be online until Monday.


--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/


patch_wal_gin.gz
Description: Unix tar archive

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] warm standby server stops doingcheckpointsafterawhile

2007-06-04 Thread Teodor Sigaev

<2007-06-01 23:00:00.001 CEST:%> LOG:  GIN incomplete splits=8


Just to be sure: patch fixes *creating* of WAL log, not replaying. So, primary 
db should be patched too.


During weekend I found possible deadlock in locking protocol in GIN between 
concurrent UPDATE and VACUUM queries with the same GIN index involved. Strange, 
but I didn't see it in 8.2 and even now I can't reproduce it. It's easy to 
reproduce оnly on HEAD with recently added ReadBufferWithStrategy() call instead 
of ReadBuffer(). ReadBufferWithStrategy() call was added to implement 
limited-size "ring" of buffers for VACUUM. Nevertheless, it's a possible 
scenario in 8.2.


Attached patch fixes that deadlock bug too. And, previous version of my patch 
has a mistake which is observable on CREATE INDEX .. USING GIN query.


--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/


patch_wal_gin.v6.gz
Description: Unix tar archive

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [GENERAL] warm standby server stops doingcheckpointsafterawhile

2007-06-04 Thread Teodor Sigaev

1. After a certain point, consecutive GIN index splits cause a problem.
The new RHS block numbers are consecutive from 111780+

That's newly created page. Splitted page might have any number



2. The incomplete splits stay around indefinitely after creation and we
aren't trying to remove the wrong split at any point. We're either never
creating an xlog record, or we are ignoring it in recovery, or we are
somehow making multiple entries then not removing all of them.

Agreed



3. The root seems to move, which isn't what I personally was expecting
to see. It seems root refers to the highest parent involved in the
split.
root in this context means parent of splitted page. Actually, there is a lot of 
B-tree in GIN, see http://www.sigaev.ru/gin/GinStructure.pdf


4. We're writing lots of redo in between failed page splits. So *almost*
everything is working correctly.

5. This starts to happen when we have very large indexes. This may be
coincidental but the first relation file is fairly full (900+ MB).


Yes. It seems to me that conditions of error are very rare and B-tree over 
ItemPointers (second level of GIN) has a big capacity, 1000+ items per page. So, 
splits occur rather rare.




--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [GENERAL] warm standby server stops doingcheckpointsafterawhile

2007-06-04 Thread Teodor Sigaev

Ooops. Patch doesn't apply cleanly. New version.


Attached patch fixes that deadlock bug too. And, previous version of my 
patch has a mistake which is observable on CREATE INDEX .. USING GIN query.


--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/


patch_wal_gin.v7.gz
Description: Unix tar archive

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org/


Re: [GENERAL] warm standby server stops doingcheckpointsafterawhile

2007-06-04 Thread Teodor Sigaev

After some observation of massive reindexing of some hundred thousand
data sets it seems to me that the slave doesn't skip checkpoints
anymore. (Apart from those skipped because of the CheckpointTimeout thing)
I'll keep an eye on it and report back any news on the issue.


Nice, committed. Thank for your report and testing.
--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [GENERAL] Regression - Query requires full scan, GIN doesn't support it

2007-06-22 Thread Teodor Sigaev

Is this a permanent limitation of GIN, or is a fix possible?
Permanent. You could check user input by querytree() function --- if it returns 
'T' string then fullscan will be needed. If your tsquery is produced by 
plainto_tsquery() call then it will not find any result, so you can show to user 
void page.



Is a fix being worked on?
If a fix is forthcoming, will it be available in the 8.2 series or only 8.3+?


Possibly, full fix in 8.4. But I will not promise.
8.3 will have protection from queries which doesn't match anything.

--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [GENERAL] PickSplit method of 2 columns ... error

2007-08-28 Thread Teodor Sigaev
Split page algorithm was rewrited for 8.2 for multicolumn indexes and API for 
user-defined pickSplit function was extended to has better results with index 
creation. But GiST can interact with old functions - and it says about this. 
That isn't mean some real problem or error - index will be the same as in 8.1, 
not better.




Kevin Neufeld wrote:

Has anyone come across this error before?

LOG:  PickSplit method of 2 columns of index 
'asset_position_lines_asset_cubespacetime_idx' doesn't support secondary 
split


This is a multi-column GiST index on an integer and a cube (a data type 
from the postgres cube extension module).


I traced the error to the gistUserPicksplit 
 function in the 
gistsplit.c ... I surmise that this method is called whenever a page 
split is necessary.


So, I know when this error occurs, but I don't know why.

Thoughts anyone?
Cheers,
Kevin



--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [GENERAL] tsearch2 anomoly?

2007-09-07 Thread Teodor Sigaev
Usual text hasn't strict syntax rules, so parser tries to recognize most 
probable token.  Something with '.', '-' and alnum characters is often a 
filename, but filename is very rare finished or started by dot.


RC Gobeille wrote:

Thanks and I didn't know about ts_debug, so thanks for that also.

For the record, I see how to use my own processing function (e.g.
dropatsymbol) to get what I need:
http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/docs/tsearch-V2-intro
.html

However, can you explain the logic behind the parsing difference if I just
add a ".s" to a string:


ossdb=# select ts_debug('gallery2-httpd-2.1-conf.');
ts_debug
---
 (default,hword,"Hyphenated word",gallery2-httpd-2,{simple},"'2' 'httpd'
'gallery2' 'gallery2-httpd-2'")
 (default,part_hword,"Part of hyphenated word",gallery2,{simple},'gallery2')
 (default,lpart_hword,"Latin part of hyphenated
word",httpd,{en_stem},'httpd')
 (default,float,"Decimal notation",2.1,{simple},'2.1')
 (default,lpart_hword,"Latin part of hyphenated word",conf,{en_stem},'conf')
(5 rows)

ossdb=# select ts_debug('gallery2-httpd-2.1-conf.s');
  ts_debug
-
 (default,host,Host,gallery2-httpd-2.1-conf.s,{simple},'gallery2-httpd-2.1-c
onf.s')
(1 row)

Thanks again,
Bob


On 9/6/07 11:19 AM, "Oleg Bartunov" <[EMAIL PROTECTED]> wrote:


This is how default parser works.  See output from
select * from ts_debug('gallery2-httpd-conf');
and
select * from ts_debug('httpd-2.2.3-5.src.rpm');

All token type:

select * from token_type();


On Thu, 6 Sep 2007, RC Gobeille wrote:


I'm having trouble understanding to_tsvector.  (PostreSQL 8.1.9 contrib)

In this first case converting 'gallery2-httpd-conf' makes sense to me and is
exactly what I want.  It looks like the entire string is indexed plus the
substrings broken by '-' are indexed.


ossdb=# select to_tsvector('gallery2-httpd-conf');
 to_tsvector
-
'conf':4 'httpd':3 'gallery2':2 'gallery2-httpd-conf':1


However, I'd expect the same to happen in the httpd example - but it does not
appear to.

ossdb=# select to_tsvector('httpd-2.2.3-5.src.rpm');
  to_tsvector
---
'httpd-2.2.3-5.src.rpm':1

Why don't I get: 'httpd', 'src', 'rpm', 'httpd-2.2.3-5.src.rpm' ?

Is this a bug or design?


Thank you!
Bob

Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: [EMAIL PROTECTED], http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83





--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [GENERAL] Tsearch2 - spanish

2007-09-18 Thread Teodor Sigaev

prueba=# select to_tsvector('espanol','melón');
ERROR:  Affix parse error at 506 line

and

prueba=# select lexize('sp','melón');
 lexize  
-

 {melon}
(1 row)


Looks very strange, can you provide list of dictionaries and configuration map?


I tried many dictionaries with the same results. Also I change the
codeset of files :aff and dict (from "latin1 to utf8" and "utf8 to
iso88591") and got the same error

where  can I investigate for resolve about this problem?

My dictionary at 506 line had:

Where do you take this file? And what is encdoing/locale setting of your db?

--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [GENERAL] Tsearch2 - spanish

2007-09-19 Thread Teodor Sigaev

prueba1=# select to_tsvector('espanol','melón  perro mordelón');
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!> 



Hmm, can you provide backtrace?

--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [GENERAL] Segmentation fault with 8.3 FTS ISpell

2008-01-15 Thread Teodor Sigaev

I tryed to reproduce the bug but without success.
Could you provide a dump of text column?


Hannes Dorbath wrote:
Crash happens about 7 minutes after issuing the UPDATE statement with 
current CVS HEAD. The table has around 5 million rows. It's always 
reproducible.

--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [GENERAL] Segmentation fault with 8.3 FTS ISpell

2008-01-16 Thread Teodor Sigaev

Fixes are committed to CVS, hope, they will help you.
--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [GENERAL] postgres 8.3 rc-1 ispell installation problem

2008-01-20 Thread Teodor Sigaev

flag *J:# isimo
   E> -E, 'ISIMO  #  grand'isimo  <-- here  432
E   > -E, 'ISIMOS   # grande grand'isimos
E   > -E, 'ISIMA# grande grand'isima
E   > -E, 'ISIMAS   # grande grand'isimas
O   > -O, 'ISIMO# tonto tont'isimo
O   > -O, 'ISIMA# tonto tont'isima


Current implementation doesn't accept any character in ending except alpha ones.

i think 'I..  word is not correct for ispell, 
this should be one Í letter 

That's right, but you should convert dictionary and affix file in UTF8 encoding.


--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [GENERAL] full text index and most frequently used words

2008-02-08 Thread Teodor Sigaev

What I'd like to know is if there is an easy to way to use the full
text index to generate a list of the most common words.  I could write
this code manually, but I'm hoping there's a better (simpler) way.


For 8.3
http://www.postgresql.org/docs/8.3/static/textsearch-features.html#TEXTSEARCH-STATISTICS

For versions before 8.3 just use stat() function instead of ts_stat().
--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [GENERAL] Fragments in tsearch2 headline

2008-03-17 Thread Teodor Sigaev



Teodor, Oleg, do we want this?
http://archives.postgresql.org/pgsql-general/2007-11/msg00508.php


I suppose, we want it. But there are a questions/issues:
- Is it needed to introduce new function? may be it will be better to add option 
to existing headline function. I'd like to keep current layout: ts_headline 
provides some common interface to headline generation. Finding and marking 
fragments is deal of parser's headline method and generation of exact pieces of 
text is made by ts_headline.

- Covers may be overlapped. So, overlapped fragments will be looked odd.


In any case, the patch was developed for contrib version of tsearch.
--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Fragments in tsearch2 headline

2008-03-31 Thread Teodor Sigaev

The patch takes into account the corner case of overlap. Here is the
code for that
// start check
if (!startHL && *currentpos >= startpos)
   startHL = 1;

The headline generation will not start until currentpos has gone past
startpos. 

Ok



You can also check how this headline function is working at my website
indiankanoon.com. Some example queries are murder, freedom of speech,
freedom of press etc.

Looks good.


Should I develop the patch for the current cvs head of postgres?


I'd like to commit your patch, but if it should be:
 - for current HEAD
 - as extension of existing ts_headline.

--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Direct access to GIST structure

2008-04-04 Thread Teodor Sigaev



encode spatial proximity. Is there an API (backend C-level is fine) to
access a GIST index?


The best way is to extend existing interface to GiST to support KNN-search. But 
you can see how to get access to index structure from module in gevel module 
(http://www.sigaev.ru/cvsweb/cvsweb.cgi/gevel/). GiST-related functions in this 
module is invented to help to developers, not for production use, so they 
acquire exclusive lock on index.


--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Direct access to GIST structure

2008-04-04 Thread Teodor Sigaev

I just stumbled on http://www.cs.purdue.edu/spgist/ which seems like
exactly what I need.


It doesn't work with 8.2 and up, because since 8.2 index should take care about 
concurrent access itself and that implementation doesn't do it.


--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] tsearch2 on-demand dictionary loading & using functions in tsearch2

2008-05-18 Thread Teodor Sigaev
 * Considering the database is loaded separately for each session, does 
this also imply that each running backend has a separate dictionary 
stored in memory?


Yes.


As for downsides, I only really see two:
 * Tracking updates of dictionaries - but it's reasonable to believe 
that new connections get open more often than the dictionary gets 
updated. Also, this might be easily solved by stat()-ing the dictionary 
file before starting up session, and only have the server reload it if 
there's a notified change.

 * Possibly complicated to implement?


Keeping dictionary up to date - it's a most difficult part here. Configuration 
of dictionary might be done by ALTER command - so, parent process (and all 
currently running backends) should get that information to reload dictionary.



As for my second question, is it possible to use functions in tsearch2? 
For example, writing my own stemmer in PL/pgSQL or in C as a postgres 
function.


Yes, of course, you can develop your dictionary (-ies) and parser. Dut only in 
C, because they are critical for performance.

--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] tsearch2 on-demand dictionary loading & using functions in tsearch2

2008-05-18 Thread Teodor Sigaev



Hmm, good point; I presume "accept the fact that settings change won't 
propagate to other backends until reconnect" would not be acceptable 
behavior, even if documented along with the relevant configuration option?


I suppose so. That was one of the reasons to move tsearch into core and it will 
be too regrettable to lost that feature again.


As for my second question, is it possible to use functions in 
tsearch2? For example, writing my own stemmer in PL/pgSQL or in C as 
a postgres function.
I've had something different in mind. Considering there are already 
facilities to use functions, be it PL/pgSQL, PL/Python or C, why not 
just use those with the condition that the function must accept 
some-arguments and return some-result? Or would using this, even while 
using C as the language used for the actual parser, slow things down too?


API to dictionary and parser intentionally utilizes complex (and nested) 
C-structures to decrease overheads. During parse of text postgres makes two call 
of parser (one call - parser returns word, second - word delimiter. Space is a 
lexeme too! Although it's not a subject to index) and one call of dictionary per 
word. So, if your language can work with C-structures then you can use that 
language with tsearch with more or less performance pay. PL/pgSQL hasn't this 
capability.


--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Fragments in tsearch2 headline

2008-05-23 Thread Teodor Sigaev

[moved to -hackers, because talk is about implementation details]


I've ported the patch of Sushant Sinha for fragmented headlines to pg8.3.1
(http://archives.postgresql.org/pgsql-general/2007-11/msg00508.php)

Thank you.

1 > diff -Nrub postgresql-8.3.1-orig/contrib/tsearch2/tsearch2.c
now contrib/tsearch2 is compatibility layer for old applications - they don't
know about new features. So, this part isn't needed.

2 solution to compile function (ts_headline_with_fragments)  into core, but
using it only from contrib module looks very odd. So, new feature can be used
only with compatibility layer for old release :)

3 headline_with_fragments() is hardcoded to use default parser, but what will be
in case when configuration uses another parser? For example, for japanese 
language.

4 I would prefer the signature ts_headline( [regconfig,] text, tsquery [,text] )
and function should accept 'NumFragments=>N' for default parser. Another parsers
may use another options.

5 it just doesn't work correctly, because new code doesn't care of parser
specific type of lexemes.
contrib_regression=# select headline_with_fragments('english', 'wow asd-wow
wow', 'asd', '');
 headline_with_fragments
--
 ...wow asd-wowasd-wow wow
(1 row)


So, I incline to use existing framework/infrastructure although it may be a
subject to change.

Some description:
1 ts_headline defines a correct parser to use
2 it calls hlparsetext to split text into structure suitable for both goals:
find the best fragment(s) and concatenate that fragment(s) back to the text
representation
3 it calls parser specific method   prsheadline which works with preparsed text
(parse was done in hlparsetext). Method should mark a needed
words/parts/lexemes etc.
4 ts_headline glues fragments into text and returns that.

We need a parser's headline method because only parser knows all about its 
lexemes.


--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] FILLFACTOR for GIN indexes in 8.3.7

2009-08-14 Thread Teodor Sigaev
it seems that I should reduce the Fill Factor of some FTS indexes, but 
what is the default ?
 > The other index methods use fillfactor in different but roughly 
analogous ways;

 > the default fillfactor varies between methods


Actually, GIN doesn't use it.

--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Vacuumdb Fails: Huge Tuple

2009-10-02 Thread Teodor Sigaev

APseudoUtopia  writes:

Here's what happened:

$ vacuumdb --all --full --analyze --no-password
vacuumdb: vacuuming database "postgres"
vacuumdb: vacuuming database "web_main"
vacuumdb: vacuuming of database "web_main" failed: ERROR: б═huge tuple



PostgreSQL 8.4.0 on i386-portbld-freebsd7.2, compiled by GCC cc (GCC)
4.2.1 20070719  [FreeBSD], 32-bit
Pls, apply attached patch. Patch increases max size from approximately 500 bytes 
up to 2700 bytes, so vacuum will be able to finish.




This is evidently coming out of ginHeapTupleFastCollect because it's
formed a GIN tuple that is too large (either too long a word, or too
many postings, or both).  I'd say that this represents a serious
degradation in usability from pre-8.4 releases: before, you would have
gotten the error upon attempting to insert the table row that triggers
the problem.  Now, with the "fast insert" stuff, you don't find out
until VACUUM fails, and you have no idea where the bad data is.  Not cool.

Oleg, Teodor, what can we do about this?  Can we split an oversize
tuple into multiple entries?  Can we apply suitable size checks
before instead of after the fast-insert queue?
ginHeapTupleFastCollect and ginEntryInsert checked tuple's size for 
TOAST_INDEX_TARGET, but ginHeapTupleFastCollect checks without one ItemPointer, 
as ginEntryInsert does it. So ginHeapTupleFastCollect could produce a tuple 
which 6-bytes larger than allowed by ginEntryInsert. ginEntryInsert is called 
during pending list cleanup.


Patch removes checking of TOAST_INDEX_TARGET and use checking only by 
GinMaxItemSize which is greater than TOAST_INDEX_TARGET. All size's check is now 
in GinFormTuple.





--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


patch.gz
Description: Unix tar archive

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Vacuumdb Fails: Huge Tuple

2009-10-02 Thread Teodor Sigaev

Looks reasonable, although since the error is potentially user-facing
I think we should put a bit more effort into the error message
(use ereport and make it mention the index name, at least --- is there
any other useful information we could give?)

Only sizes as it's done in BTree, I suppose.


Will you apply this, or do you want me to?


I'm not able to provide a good error message in good English :(


--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Prefix search on all hstore values

2013-11-28 Thread Teodor Sigaev

Hi!

Full-text search has this feature.

# select to_tsvector('en_name=>yes, fr_name=>oui'::hstore::text) @@ 'en:*';
 ?column?
--
 t


or (index only keys)

select to_tsvector(akeys('en_name=>yes, fr_name=>oui'::hstore)::text) @@ 'en:*';
 ?column?
--
 t

To speed up this queries you use functional indexes.


Albert Chern wrote:

Hi,

I have an hstore column that stores a string in several arbitrary languages, so
something like this:

"en" => "string in english", "zh" => "string in chinese", "fr" => "string in 
french"

Is it possible to construct an index that can be used to determine if a query
string is a prefix of ANY of the values in the hstore?  From reading the
documentation the closest I've gotten is a gin index after converting the values
to an array, but that doesn't seem to work with prefix searching.  Any pointers
would be much appreciated!

Thanks,
Albert


--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Prefix search on all hstore values

2013-11-28 Thread Teodor Sigaev

My requirements can be relaxed to full text search, but the problem I had with
that approach is I have strings in Chinese, and postgres doesn't seem to support
it.  Calling to_tsvector() on Chinese characters always returns an empty vector.



Hm, check your locale settings. AFAIK, somebody uses FTS with Chinese language.


--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] multi-word expression full-text searching

2008-06-30 Thread Teodor Sigaev


SELECT id FROM document WHERE to_tsvector('english',text) @@
plainto_tsquery('english','despite this');
--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] multi-word expression full-text searching

2008-07-01 Thread Teodor Sigaev

If I understand well the plainto_tsquery behaviour, this query match with:
Despite this, the president went out.
Despite the event, this question arise.


Right, you mean phrase search. Read the thread: 
http://archives.postgresql.org/pgsql-hackers/2008-05/msg0.php


Suggested patch should be made as module, I think.



--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] changing text search treatment of puncutation

2008-07-03 Thread Teodor Sigaev



In general there seem to be a lot of ways that people wish they
could tweak the text search parser, and telling them to write
their own parser isn't a very helpful response for most folk.
I don't have an idea about how to improve the situation, but
it seems like something that should be thought about.


We (with Oleg) thought hard about it and we don't find a solution yet.
Configurable parser should be:
- fast
- flexible
- not error-prone
- comfortable to use by non-programmer (at least for non-C programmer)

It might be a table-driven state machine (just put TParserStateAction into 
table(s) with some caching for first step) , but it's complex to operate and 
it's needed to prove correctness of changes in states before its become in use.


--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Using ISpell dictionary - headaches...

2008-07-22 Thread Teodor Sigaev


It *may* be because I'm using psql 8.0.3 and not the latest version (but 
I'm stucked with that version), i'm just hoping that one of you have met 


Upgrade to 8.0.17 - there was a several fixes in ISpell code.
--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] index scan leads to result that is different from sec scan after upgrading to 8.3.4

2008-10-20 Thread Teodor Sigaev

Hmm.  So the problem seems to be statable as "a full-index scan on a
GIST index might fail to return all the rows, if the index has been
modified since creation".  Teodor, can you think of anything you
changed recently in that area?


Only fixing possible duplicates during index scan. Will see.

--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] index scan leads to result that is different from sec scan after upgrading to 8.3.4

2008-10-20 Thread Teodor Sigaev

Hmm.  So the problem seems to be statable as "a full-index scan on a
GIST index might fail to return all the rows, if the index has been
modified since creation".  Teodor, can you think of anything you
changed recently in that area?


I still can't reproduce the bug, but found useless recheck condition with bitmap 
check:


drop table if exists qq;
select 1 as st , 1::int4 as t into qq from generate_series(1,1) as t;
create index qqidx on qq using gist (st) where t =1;
INSERT INTO qq (SELECT (4 * random())::int4, (4 * random())::int4 from 
generate_series(1,1));


# explain select t, count(1) from qq where t =1 group by t;
 QUERY PLAN
-
 GroupAggregate  (cost=19.62..633.49 rows=1 width=2)
   ->  Bitmap Heap Scan on qq  (cost=19.62..630.28 rows=640 width=2)
 Recheck Cond: (t = 1)
 ->  Bitmap Index Scan on qqidx  (cost=0.00..19.46 rows=640 width=0)

--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] index scan leads to result that is different from sec scan after upgrading to 8.3.4

2008-10-21 Thread Teodor Sigaev

Thank you, I reproduce the bug, will fix.

Sergey Konoplev wrote:

Ok, I've done the test case (see attachment).

8.3.3 has passed it.
8.3.4 hasn't passed in ~99% times I run it.

Steps to reproduce:
1. install pg 8.3.4, do initdb, start pg
2. correct PSQL parameter in pg-8.3.4_index_update_test.sh
3. run pg-8.3.4_index_update_test.sh few times

And you will see something like this:

...

--
2nd time obtaining "seq-scan" count and plan...
--
SELECT table1_flag, count(*) FROM table1
GROUP BY table1_flag;
 table1_flag | count
-+---
  1 |   100
(1 row)

EXPLAIN ANALYZE SELECT table1_flag, count(*) FROM table1
GROUP BY table1_flag;
   QUERY PLAN
---
 HashAggregate  (cost=15.00..15.01 rows=1 width=2) (actual
time=0.140..0.140 rows=1 loops=1)
  ->  Seq Scan on table1  (cost=0.00..12.00 rows=600 width=2) (actual
time=0.004..0.059 rows=100 loops=1)
 Total runtime: 0.172 ms
(3 rows)

--
2nd time obtaining "index-scan" count and plan...
--
SELECT count(*) FROM table1
WHERE table1_flag = 1;
 count
---
   98
(1 row)

EXPLAIN ANALYZE SELECT count(*) FROM table1
WHERE table1_flag = 1;
 QUERY
PLAN
--
 Aggregate  (cost=8.27..8.28 rows=1 width=0) (actual time=0.451..0.451
rows=1 loops=1)
  ->  Index Scan using i_table1__table1_point on table1
(cost=0.00..8.27 rows=1 width=0) (actual time=0.011..0.408 rows=98
loops=1)
 Total runtime: 0.477 ms
(3 rows)

--
Regards,
Sergey Konoplev
--
PostgreSQL articles in english & russian
http://gray-hemp.blogspot.com/search/label/postgresql/


--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] index scan leads to result that is different from sec scan after upgrading to 8.3.4

2008-10-22 Thread Teodor Sigaev

Fixed, patch attached.


--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/
diff -c -r src.orig/backend/access/gist/gistget.c 
src/backend/access/gist/gistget.c
*** src.orig/backend/access/gist/gistget.c  2008-10-22 12:07:39.0 
+0400
--- src/backend/access/gist/gistget.c   2008-10-22 15:13:23.0 +0400
***
*** 49,55 
  
for (offset = FirstOffsetNumber; offset <= maxoff; offset = 
OffsetNumberNext(offset))
{
!   IndexTuple  ituple = (IndexTuple) 
PageGetItem(p, PageGetItemId(p, offset));
  
if (ItemPointerEquals(&(ituple->t_tid), iptr))
{
--- 49,55 
  
for (offset = FirstOffsetNumber; offset <= maxoff; offset = 
OffsetNumberNext(offset))
{
!   IndexTuple  ituple = (IndexTuple) PageGetItem(p, 
PageGetItemId(p, offset));
  
if (ItemPointerEquals(&(ituple->t_tid), iptr))
{
***
*** 157,163 
{
while( ntids < maxtids && so->curPageData < so->nPageData )
{
!   tids[ ntids ] = scan->xs_ctup.t_self = so->pageData[ 
so->curPageData ];

so->curPageData ++;
ntids++;
--- 157,167 
{
while( ntids < maxtids && so->curPageData < so->nPageData )
{
!   tids[ ntids ] = scan->xs_ctup.t_self = so->pageData[ 
so->curPageData ].heapPtr;
!   ItemPointerSet(&(so->curpos),
!  
BufferGetBlockNumber(so->curbuf), 
!  so->pageData[ 
so->curPageData ].pageOffset);
! 

so->curPageData ++;
ntids++;
***
*** 251,258 
{
while( ntids < maxtids && so->curPageData < 
so->nPageData )
{
!   tids[ ntids ] = scan->xs_ctup.t_self = 
so->pageData[ so->curPageData ];

so->curPageData ++;
ntids++;
}
--- 255,267 
{
while( ntids < maxtids && so->curPageData < 
so->nPageData )
{
!   tids[ ntids ] = scan->xs_ctup.t_self = 
!   so->pageData[ so->curPageData 
].heapPtr;

+   ItemPointerSet(&(so->curpos),
+  
BufferGetBlockNumber(so->curbuf), 
+  
so->pageData[ so->curPageData ].pageOffset);
+ 
so->curPageData ++;
ntids++;
}
***
*** 297,309 
 * we can efficiently resume the index scan 
later.
 */
  
-   ItemPointerSet(&(so->curpos),
-  
BufferGetBlockNumber(so->curbuf), n);
- 
if (!(ignore_killed_tuples && 
ItemIdIsDead(PageGetItemId(p, n
{
it = (IndexTuple) PageGetItem(p, 
PageGetItemId(p, n));
!   so->pageData[ so->nPageData ] = 
it->t_tid;
so->nPageData ++;
}
}
--- 306,316 
 * we can efficiently resume the index scan 
later.
 */
  
if (!(ignore_killed_tuples && 
ItemIdIsDead(PageGetItemId(p, n
{
it = (IndexTuple) PageGetItem(p, 
PageGetItemId(p, n));
!   so->pageData[ so->nPageData ].heapPtr = 
it->t_tid;
!   so->pageData[ so->nPageData 
].pageOffset = n;
so->nPageData ++;
}
   

Re: [GENERAL] index scan leads to result that is different from sec scan after upgrading to 8.3.4

2008-10-22 Thread Teodor Sigaev

20 hours to find the fix Teodor, Kudos !

Nothing for the pride :(, my bug.


Due to the importance of the fix, will we see very soon a 8.3.5 ?

Don't known, see discussion. I think, that will make sense.

--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Weird problem concerning tsearch functions built into postgres 8.3, assistance requested

2008-10-30 Thread Teodor Sigaev
One of the tables we're using in the 8.1.3 setups currently running 
includes phone numbers as a searchable field (fti_phone), with the 
results of a select on the field generally looking like this: 'MMM':2 
'':3 'MMM-':1.  MMM is the first three digits,  is the 
fourth-seventh.


The weird part is this: On the old systems running 8.1.3, I can look up 
a record by
fti_phone using any of the three above items; first three, last four, or 
entire number including dash.  On the new system running 8.3.1, I can do 
a lookup by the first three or the last four and get the results I'm 
after, but any attempt to do a lookup by the entire MMM- version 
returns no records.


Parser was changed:
postgres=# select  * from ts_debug('123-4567');
 alias |   description| token | dictionaries | dictionary | lexemes
---+--+---+--++-
 uint  | Unsigned integer | 123   | {simple} | simple | {123}
 int   | Signed integer   | -4567 | {simple} | simple | {-4567}
(2 rows)
postgres=# select  * from ts_debug('abc-defj');
  alias  |   description   |  token   |  dictionaries 
|  dictionary  |  lexemes

-+-+--++--+
 asciihword  | Hyphenated word, all ASCII  | abc-defj | {english_stem} 
| english_stem | {abc-defj}
 hword_asciipart | Hyphenated word part, all ASCII | abc  | {english_stem} 
| english_stem | {abc}
 blank   | Space symbols   | -| {} 
|  |
 hword_asciipart | Hyphenated word part, all ASCII | defj | {english_stem} 
| english_stem | {defj}


Parser in 8.1 threats any [alnum]+-[alnum]+ as a hyphenated word, but 8.3 treats 
[digit]+-[digit]+ as two separated numbers.


So, you can play around pre-process texts before indexing or have a look on
regex dictionary (http://vo.astronet.ru/arxiv/dict_regex.html)
--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] still gin index creation takes forever

2008-11-12 Thread Teodor Sigaev
GIN's build algorithm could use bulk insert of ItemPointers if and only if they 
should be inserted on rightmost page (exact piece of code - dataPlaceToPage() in 
gindatapage.c, lines 407-427)


I'm not following.  Rightmost page of what --- it can't be the whole
index, can it, or the case would hardly ever apply?


GIN's index contains btree over keys (entry tree) and for each key it 
contains list of ItemPointers (posting list) or btree over ItemPointers 
(posting tree or data tree) depending on its quantity. Bulk insertion 
process collects into memory keys and sorted arrays of ItemPointers, and 
then for each keys, it tries to insert every ItemPointer from array into 
corresponding data tree one by one. But if the smallest ItemPointer in 
array is greater than the biggest stored one then algorithm will insert 
the whole array on rightmost page in data tree.


So, in that case process can insert about 1000 ItemPointers per one data 
tree lookup, in opposite case it does 1000 lookups in data tree.


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] still gin index creation takes forever

2008-11-12 Thread Teodor Sigaev

Any suggestion about how to track down the problem?


What you are describing sounds rather like a use-of-uninitialized-memory
problem, wherein the behavior depends on what happened to be in that
memory previously.  If so, using a debug/cassert-enabled build of
Postgres might help to make the behavior more reproducible.


It seems to me, possible reason of that behavior could be an order of table's 
scanning. GIN's build algorithm prefers scan from begin to the end of table, but 
in 8.3 it's not always true - scan may begin from the middle or end of table 
depending on sequence scan's history.


GIN's build algorithm could use bulk insert of ItemPointers if and only if they 
should be inserted on rightmost page (exact piece of code - dataPlaceToPage() in 
gindatapage.c, lines 407-427)


Is any way to force table's scan from the beginning?
--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] still gin index creation takes forever

2008-11-13 Thread Teodor Sigaev

Yeah, I'm not convinced either.  Still, Teodor's theory should be easily
testable: set synchronize_seqscans to FALSE and see if the problem goes
away.


Test suit to reproduce the problem:
DROP TABLE IF EXISTS foo;
DROP TABLE IF EXISTS footmp;

CREATE OR REPLACE FUNCTION gen_array()
RETURNS _int4 AS
$$
SELECT ARRAY(
SELECT (random()*1000)::int
FROM generate_series(1,10+(random()*90)::int)
)
$$
LANGUAGE SQL VOLATILE;

SELECT gen_array() AS v INTO foo FROM generate_series(1,10);

VACUUM ANALYZE foo;

CREATE INDEX fooidx ON foo USING gin (v);
DROP INDEX fooidx;

SELECT * INTO footmp FROM foo LIMIT 9;

CREATE INDEX fooidx ON foo USING gin (v);
DROP INDEX fooidx;

On my notebook with HEAD and default postgresql.conf it produce (show only 
interesting part):


   postgres=# CREATE INDEX fooidx ON foo USING gin (v);
   Time: 14961,409 ms
   postgres=# SELECT * INTO footmp FROM foo LIMIT 9;
   postgres=# CREATE INDEX fooidx ON foo USING gin (v);
   LOG:  checkpoints are occurring too frequently (12 seconds apart)
   HINT:  Consider increasing the configuration parameter "checkpoint_segments".
   LOG:  checkpoints are occurring too frequently (8 seconds apart)
   HINT:  Consider increasing the configuration parameter "checkpoint_segments".
   LOG:  checkpoints are occurring too frequently (7 seconds apart)
   HINT:  Consider increasing the configuration parameter "checkpoint_segments".
   LOG:  checkpoints are occurring too frequently (10 seconds apart)
   HINT:  Consider increasing the configuration parameter "checkpoint_segments".
   LOG:  checkpoints are occurring too frequently (8 seconds apart)
   HINT:  Consider increasing the configuration parameter "checkpoint_segments".
   CREATE INDEX
   Time: 56286,507 ms

So, time for creation is 4-time bigger after select.
Without "SELECT * INTO footmp FROM foo LIMIT 9;":
   postgres=# CREATE INDEX fooidx ON foo USING gin (v);
   CREATE INDEX
   Time: 13894,050 ms
   postgres=# CREATE INDEX fooidx ON foo USING gin (v);
   LOG:  checkpoints are occurring too frequently (14 seconds apart)
   HINT:  Consider increasing the configuration parameter "checkpoint_segments".
   CREATE INDEX
   Time: 15087,348 ms

Near to the same time.

With   synchronize_seqscans = off and SELECT:
   postgres=# CREATE INDEX fooidx ON foo USING gin (v);
   CREATE INDEX
   Time: 14452,024 ms
   postgres=# SELECT * INTO footmp FROM foo LIMIT 9;
   postgres=# CREATE INDEX fooidx ON foo USING gin (v);
   LOG:  checkpoints are occurring too frequently (16 seconds apart)
   HINT:  Consider increasing the configuration parameter "checkpoint_segments".
   CREATE INDEX
   Time: 14557,750 ms

Again, near to the same time.


--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] still gin index creation takes forever

2008-11-13 Thread Teodor Sigaev

We could extend IndexBuildHeapScan's API to support that, but I'm
not quite convinced that this is the issue.


That extension might be useful for bitmap index too to simplify index creation 
process.

--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] still gin index creation takes forever

2008-11-13 Thread Teodor Sigaev

changing it; I've applied a patch for that.  I'm still not quite
convinced that Ivan isn't seeing some other issue though.


Thank you


In the meantime, I noticed something odd while experimenting with your
test case: when running with default maintenance_work_mem = 16MB,
there is a slowdown of 3x or 4x for the un-ordered case, just as you
say.  But at maintenance_work_mem = 200MB I see very little difference.
This doesn't make sense to me --- it seems like a larger workspace
should result in more difference because of greater chance to dump a
lot of tuples into the index at once.  Do you know why that's happening?


I suppose, if maintenance_work_mem is rather big then all data of index 
accumulates in memory and so it writes at disk at once. With that test's options 
size of index is equal to 40Mb.

--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] How to reduce impact of a query.

2008-11-17 Thread Teodor Sigaev
The machine in question is a 1GB Ram, AMD 64 with Raid 1 Sata disks. Non 
standard parts of my postgresql.conf are as follows:

max_connections=100
shared_buffers=128MB
work_mem=4MB
maintenance_work_mem=256MB
max_fsm_pages=204800
max_fsm_relations=1500

Any tips appreciated.


Pls, show
1) effective_cache_size
2) The query
3) Output of EXPLAIN ANALYZE of query

--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] [TextSearch] syntax error while parsing affix file

2008-12-02 Thread Teodor Sigaev

iconv -f windows-1251 -t utf-8 bulgarian.dic >bulgarian_utf8.dict
iconv -f windows-1251 -t utf-8 bulgarian.aff >bulgarian_utf8.affix

The locale of the database is fr_FR, and its encoding is UTF8.
 I believe that characters 'И', 'А' (non-ascii) and other cyrillic ones are not 
acceptable for french locale  :(



--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] [TextSearch] syntax error while parsing affix file

2008-12-02 Thread Teodor Sigaev
I am using Postrges 8.3.5, and I am trying to install a bulgarian ISpell 
dictionary (the OpenOffice one) for Textsearch features.



flag *A:
. > А (this is line 24)
. > АТА
. > И
. > ИТЕ

OpenOffice or ISpell? Pls, provide:
- link to download of dictionary
- Locale and encoding setting of your db

--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Incorrect FTS query results with GIN index

2010-01-15 Thread Teodor Sigaev

Thank you for the report, will see on this weekend

Vyacheslav Kalinin wrote:

Hello,

Basically, I started testing prefix matching in FTS and got into
troubles. Self-contained example follows:

--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Incorrect FTS query results with GIN index

2010-01-18 Thread Teodor Sigaev

Basically, I started testing prefix matching in FTS and got into
troubles. Self-contained example follows:


Thank you, fixed. The reason was in incorrect optimization of GIN scan: GIN 
reuses scan result for equals key, but comparison of key didn't take into 
account a difference of scan's strategy.



--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Incorrect FTS query results with GIN index

2010-01-18 Thread Teodor Sigaev

Great, thank you!
I assume this one goes into 8.4.3, right?
Yeah, or apply patch 
http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/backend/access/gin/ginscan.c?r1=1.25&r2=1.26


--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] very long update gin index troubles back?

2009-01-24 Thread Teodor Sigaev

Ответишь ему что-нибудь? Он так мутно излагает, что я ни хрена не понял.

Ivan Sergio Borgonovo wrote:

I've a table that contain a tsvector that is indexed (gin) and
triggers to update the tsvector that should then update the index.

This gin index has always been problematic. Recreation and updates
were very slow.

Now I had to update 1M rows of that table but for columns
that doesn't involve the tsvector
I dropped the trigger to update the tsvector so that when rows get
updated the trigger won't be called so things should be faster...
but still it is taking forever.

begin;
set constraints all deferred;

select * from FT1IDX_trigger_drop();
update catalog_items set
APrice=p.PrezzoA,
BPrice=p.PrezzoB
from import.catalog_prices p where
catalog_items.ItemID=p.id;
select * from FT1IDX_trigger_create();
commit;

function are used since I've 2 triggers actually that I drop and
create.

Is there anything wrong in the above to make this update so slow on
a 2x Xeon 3.2GHz 4GbRAM and a RAID1 [sic] I know it is slow on write.




--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] very long update gin index troubles back?

2009-01-24 Thread Teodor Sigaev
GIN index is slow for update by its construction. When you update the rows with 
or without columns indexed by GIN, postgres (in most cases) will insert new 
records, so index insertion will occur. So, for large updates it's much cheaper 
to drop and create index.


That was a one of reasons to develop fast_insert_gin patch which now in review 
process.


Ivan Sergio Borgonovo wrote:

I've a table that contain a tsvector that is indexed (gin) and
triggers to update the tsvector that should then update the index.



--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] very long update gin index troubles back?

2009-01-27 Thread Teodor Sigaev

No matter if I drop the trigger that update agg content and the fact
that I'm just updating d, postgresql will update the index?
Yes, due to MVCC. Update of row could produce new version (tuple) and new 
version should be index as old one.

--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Text search segmentation fault

2009-01-29 Thread Teodor Sigaev

Could you provide a backtrace? Do you use unchanged norwegian.stop file?
I'm not able to reproduce the bug - postgres just works.

Tommy Gildseth wrote:
While trying to create a new dictionary for use with PostgreSQL text 
search, I get a segfault. My Postgres version is 8.3.5



--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Text search segmentation fault

2009-01-29 Thread Teodor Sigaev

How do I make a backtrace?


- if you have coredump, just execute gdb /PATH1/postgres gdb /PATH2/core and 
type bt. Linux doesn't make core by default, so you allow to do it by ulimit -c 
unlimited for postgresql user
- connect to db, and attach gdb to backend process: gdb /PATH1/postgres 
BACKEND_PID and type run in gdb, next, execute CREATE DICTIONARY and type bt in gdb




Teodor Sigaev wrote:

Could you provide a backtrace? Do you use unchanged norwegian.stop file?
I'm not able to reproduce the bug - postgres just works.

Tommy Gildseth wrote:
While trying to create a new dictionary for use with PostgreSQL text 
search, I get a segfault. My Postgres version is 8.3.5








--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Text search segmentation fault

2009-01-29 Thread Teodor Sigaev
I reproduced the bug with a help of Grzegorz's point for 64-bit box. So, patch 
is attached and I'm going to commit it


--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/
*** src/backend/tsearch/spell.c.orig2009-01-29 18:18:03.0 +0300
--- src/backend/tsearch/spell.c 2009-01-29 18:20:09.0 +0300
***
*** 521,527 
(errcode(ERRCODE_CONFIG_FILE_ERROR),
 errmsg("multibyte flag character is not 
allowed")));
  
!   Conf->flagval[(unsigned int) *s] = (unsigned char) val;
Conf->usecompound = true;
  }
  
--- 521,527 
(errcode(ERRCODE_CONFIG_FILE_ERROR),
 errmsg("multibyte flag character is not 
allowed")));
  
!   Conf->flagval[*(unsigned char*) s] = (unsigned char) val;
Conf->usecompound = true;
  }
  
***
*** 654,660 
ptr = repl + (ptr - prepl) + 1;
while (*ptr)
{
!   aflg |= Conf->flagval[(unsigned int) 
*ptr];
ptr++;
}
}
--- 654,660 
ptr = repl + (ptr - prepl) + 1;
while (*ptr)
{
!   aflg |= Conf->flagval[*(unsigned char*) 
ptr];
ptr++;
}
}
***
*** 735,741 
  
if (*s && pg_mblen(s) == 1)
{
!   Conf->flagval[(unsigned int) *s] = 
FF_COMPOUNDFLAG;
Conf->usecompound = true;
}
oldformat = true;
--- 735,741 
  
if (*s && pg_mblen(s) == 1)
{
!   Conf->flagval[*(unsigned char*) s] = 
FF_COMPOUNDFLAG;
Conf->usecompound = true;
}
oldformat = true;
***
*** 791,797 

(errcode(ERRCODE_CONFIG_FILE_ERROR),
 errmsg("multibyte flag 
character is not allowed")));
  
!   flag = (unsigned char) *s;
goto nextline;
}
if (STRNCMP(recoded, "COMPOUNDFLAG") == 0 || STRNCMP(recoded, 
"COMPOUNDMIN") == 0 ||
--- 791,797 

(errcode(ERRCODE_CONFIG_FILE_ERROR),
 errmsg("multibyte flag 
character is not allowed")));
  
!   flag = *(unsigned char*) s;
goto nextline;
}
if (STRNCMP(recoded, "COMPOUNDFLAG") == 0 || STRNCMP(recoded, 
"COMPOUNDMIN") == 0 ||
***
*** 851,857 
  
while (str && *str)
{
!   flag |= Conf->flagval[(unsigned int) *str];
str++;
}
  
--- 851,857 
  
while (str && *str)
{
!   flag |= Conf->flagval[*(unsigned char*) str];
str++;
}
  

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Text search segmentation fault

2009-01-29 Thread Teodor Sigaev




Than I have quite few notes about that function:
- affix is not checked on entry, and should be unsigned,


Could be Assert( affix>=0 && affix < Conf->nAffixData )


- for sake of safety uint32_t should be used instead of unsigned int,
in the cast

see patch

- there should be some safety limit for lenght of str,

It's a C-string

--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Text search segmentation fault

2009-01-29 Thread Teodor Sigaev



Tom Lane wrote:

Teodor Sigaev  writes:

I reproduced the bug with a help of Grzegorz's point for 64-bit box.


Hmm, seems it's not so much a "64 bit" error as a "signed vs unsigned
char" issue?  


Yes, but I don't understand why it worked in 32-bit box.


Does this affect the old contrib/tsearch2 code?


Will check.


Please try to make the commits in the next eight hours, as we have
release wraps scheduled for tonight.


Minor versions or beta of 8.4? if last, I'd like to commit btree_gin and 
fast_update_gin.  For both patches all pointed issues was resolved and Jeff, 
seems, haven't objections.


--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Text search segmentation fault

2009-01-29 Thread Teodor Sigaev

To be honest, looking through that file, I am quite worried about few
points. I don't know too much about insights of ispell, but I see few
suspicious things in mkSPNode too.
I generally don't want to get involve in reviewing code for stuff I
don't know, But if Teodor (and Oleg) don't mind, I can raise my
points, and see if anything useful comes out of it.

If you see bug/mistake/suspicious point, please, don't be quiet



Also, about that patch - it doesn't seem to apply cleanly to 8.4,
perhaps that file has changed too much (I based my 'review' above on
8.4's code)

will tweak
--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Text search segmentation fault

2009-01-29 Thread Teodor Sigaev

char" issue?  Does this affect the old contrib/tsearch2 code?


Checked - No, that was improvement for 8.3 :).

--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Full text index not being used

2009-02-01 Thread Teodor Sigaev




I tried to create an index including all of the fields I query on to
see if that would work, but I get an error the the index row is too
large:

=> create index master_index on source_listings(geo_lat, geo_lon,
price, bedrooms, region, city, listing_type, to_tsvector('english',
full_listing), post_time);
It's not a fulltext index - btree doesn't support @@ operation. Read 
carefully: http://www.postgresql.org/docs/8.3/static/textsearch.html , 
and about full text indexes: 
http://www.postgresql.org/docs/8.3/static/textsearch-tables.html , 
http://www.postgresql.org/docs/8.3/static/textsearch-indexes.html


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] multicolumn GIST index question

2005-03-01 Thread Teodor Sigaev
Did anyone get multi-column GIST indexes working using both
the gist_btree and postgis modules?
It must.
fl=# -- sessionid is a text;  the_geom is a GEOMETRY
fl=# create index testidx2 on user_point_features using gist 
(sessionid,the_geom);
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.
fl=#
fl=#
fl=#
fl=# create table test3(a text, b text, c text);
CREATE TABLE
fl=# select addgeometrycolumn ('','test3','the_geom','-1','POINT',2);
addgeometrycolumn
--
 public.test3.the_geom SRID:-1 TYPE:POINT DIMS:2
 geometry_column fixed:0
(1 row)
fl=# create index idx_text_text on test3 using gist (a,b);
CREATE INDEX
fl=# create index idx_text_geom on test3 using gist (a,the_geom);
CREATE INDEX
fl=#

Is a sessionid from user_point_features table unique?
Pls, try to compile database with --enable-cassert --enable-debug and send gdb 
output.


--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/
---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [GENERAL] Tsearch2 doesn't work

2005-08-04 Thread Teodor Sigaev

You shoud run configure script in a root pgsql's source tree.

Supiah Mustaffa wrote:

Hi,

I want to install Tsearch2 and follow  this command :
1.cd contrib/tsearch2
2. make; make install

It doesn't work and display this messages :
Makefile:5: ../../src/Makefile.global: No such file or directory
Makefile:47: /contrib/contrib-global.mk: No such file or directory
make: *** No rule to make target `/contrib/contrib-global.mk'.  Stop.
Makefile:5: ../../src/Makefile.global: No such file or directory
Makefile:47: /contrib/contrib-global.mk: No such file or directory
make: *** No rule to make target `/contrib/contrib-global.mk'.  Stop.

What it's means? What should I do? Please help me.

Thanks.


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [GENERAL] If an index is based on 3 columns will a query using

2005-09-12 Thread Teodor Sigaev

Greg Stark suggests here:
http://archives.postgresql.org/pgsql-hackers/2005-05/msg00966.php
that GiST could also be fixed to work with any subset of the index
columns, but it hasn't been done yet, unless Teodor and Oleg snuck
something in during that last round of GiST work.


GiST may work with any subset of index columns too. Even in existing code I 
don't see any problem except NULL in a first column. GiST doesn't store tuples 
with leading NULL value (gist.c lines 174, 326), so index doesn't contained them.


After our work about WAL-lization GiST, it may work with "invalid" tuples 
(possibly occured after crash recovery), so itsn't a big deal to add support 
NULL in a first column. But freeze date is outdated... Should I add or leave it 
to 8.2?




--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [GENERAL] TSearch2 snowball version error

2005-09-15 Thread Teodor Sigaev

Snowball changes interfaces, I'll update tsearh2 sources today or tommorow.


William Leite Araújo wrote:

Hi,

I'm trying compile a new brazilian portuguese dictionary to TSearch2 
contrib, but found the errors:


portuguese_stem.c: In function `r_prelude':
portuguese_stem.c:481: error: void value not ignored as it ought to be
portuguese_stem.c:487: error: void value not ignored as it ought to be
portuguese_stem.c: In function `r_postlude':
portuguese_stem.c:610: error: void value not ignored as it ought to be
portuguese_stem.c:616: error: void value not ignored as it ought to be
portuguese_stem.c: In function `r_standard_suffix':
portuguese_stem.c:662: error: void value not ignored as it ought to be
portuguese_stem.c:672: error: void value not ignored as it ought to be
portuguese_stem.c:682: error: void value not ignored as it ought to be
portuguese_stem.c:692: error: void value not ignored as it ought to be
portuguese_stem.c:702: error: void value not ignored as it ought to be
portuguese_stem.c:715: error: void value not ignored as it ought to be
portuguese_stem.c:729: error: void value not ignored as it ought to be
portuguese_stem.c:744: error: void value not ignored as it ought to be
portuguese_stem.c:760: error: void value not ignored as it ought to be
portuguese_stem.c:775: error: void value not ignored as it ought to be
portuguese_stem.c:791: error: void value not ignored as it ought to be
portuguese_stem.c:806: error: void value not ignored as it ought to be
portuguese_stem.c:818: error: void value not ignored as it ought to be
portuguese_stem.c:832: error: void value not ignored as it ought to be
portuguese_stem.c: In function `r_verb_suffix':
portuguese_stem.c:856: error: void value not ignored as it ought to be
portuguese_stem.c: In function `r_residual_suffix':
portuguese_stem.c:880: error: void value not ignored as it ought to be
portuguese_stem.c: In function `r_residual_form':
portuguese_stem.c:902: error: void value not ignored as it ought to be
portuguese_stem.c:929: error: void value not ignored as it ought to be
portuguese_stem.c:935: error: void value not ignored as it ought to be
portuguese_stem.c: In function `portuguese_ISO_8859_1_stem':
portuguese_stem.c:993: error: void value not ignored as it ought to be
make: ** [portuguese_stem.o] Erro 1

 This after change the included file "header.h", the old 
"../runtime/header.h" is invalid.
 I think that is a version trouble, because the return of functions 
"slice_from_s" and "slice_del" are "int" in current snowball portuguese 
files, but on include files of version 8.0.2 and 8.0.3 of Portgresql the 
return type is "void".

 Help-me, please.

--
William Leite Araújo


--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] TSearch2 snowball version error

2005-09-15 Thread Teodor Sigaev

Fixed and commited in cvs.

Patches for already existsing versions :
http://www.sigaev.ru/gist/patch_snowball-7.4.gz
http://www.sigaev.ru/gist/patch_snowball-8.0.gz




I'm trying compile a new brazilian portuguese dictionary to TSearch2 
contrib, but found the errors:


--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [GENERAL] re : tsearch2 problem

2005-10-18 Thread Teodor Sigaev
because of its way of "Inverse Order OF Idexing",and "The GiST Tree is 
well for larting lan but doesn't for chinese lan(also ineffifent for 
Japanese lan)"。


About japanese look at http://www.oss.ecl.ntt.co.jp/tsearch2j/ (it's on 
Japanese, but you can contact with developer)






 

I only know that GiST is also  "Inverse Order OF Idexing",but of course 
it is a tree NOT as lucene does


Tsearch2 doesn't use inverted index (yet), and so it easy to update (your index 
is always full - online indexing), but it hasn't good performance on large data 
set. The limit is about 10^5 - 10^6 rows in practice.



--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [GENERAL] really thanks,Teodor Sigaev. HOW ts2 implment that

2005-10-18 Thread Teodor Sigaev

I know the LUCENE " inverted index " is just like follows:

 


keyword   doc_id[frequence]   location
guangzhou 1[2]   3,6
he   2[1]   1
i 1[1]   4
live  1[2],2[1]   2,5,2
shanghai  2[1]   3


Yes


so ,before I  thought that A node In GiST tree is very like the above :

In the tree's node , it has several value for storing its's

turple_id[frequence]   location ,and that is why I thoutht the " 
BUT now , I'm lost again ,how does Gist tree store lexem and how to find 
his correspoding tuples in database( including  frequence   location)?


Tsearch2 use "direct" index: each leaf tuple in tree represent document, 
internal tuples represents union of documents.



--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [GENERAL] Fuzzy text search

2005-11-14 Thread Teodor Sigaev



jennyw wrote:
We've heard that PostgreSQL can do fuzzy search, but haven't had much 
luck. I'm brand new to PostgreSQL, so this might be completely obvious 
for an experienced user.
Are there any how-tos on fuzzy text searching? Someone said to try using 
tsearch2, but it seems that it does full-text searching, but not fuzzy 
search. Ideally, we'd like to use that w/ a fuzzy search option. In 
particular, if someone searches for "imat" we want to return results 
including "immaterial" and "imaterial" (so misspellings plus partial 
match).




Look at contrib/pg_trgm. If you want fulltext search with mispelling correction 
then read "Tsearch2 Integration" in contrib/pg_trgm/README.



--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [GENERAL] tsearch2: more than one index per table?

2005-11-22 Thread Teodor Sigaev

ERROR:  could not find tsearch config by locale
UPDATE t SET idxA=to_tsvector('default', a);



Is it working
select to_tsvector('foo bar')?

I suppose, no. In that case tsearch can't find configuration for current 
database locale, update pg_ts_cfg.locale in wished row to correct value.



--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [GENERAL] TSearch2 / German compound words / UTF-8

2005-11-23 Thread Teodor Sigaev
Tsearch/isepll is not able to break this word into parts, because of the 
"s" in "Produktion/s/intervall". Misspelling the word as 
"Produktionintervall" fixes it:


It should be affixes marked as 'affix in middle of compound word',
Flag is '~', example look in norsk dictionary:

flag ~\\:
[^S]   >S  #~ advarsel > advarsels-

BTW, we develop and debug compound word support on norsk (norwegian) dictionary, 
so look for example there. But we don't know Norwegian, norwegians helped us  :)







The second thing is with UTF-8:

I know there is no, or no full support yet, but I need to get it as good 
as it's possible /now/. Is there anything in CVS that I might be able to 
backport to my version or other tips? My setup works, as for the dict 
and the stop word files, but I fear the stemming and mapping of umlauts 
and other special chars does not as it should. I tried recoding the 
german.aff to UTF-8 as well, but that breaks it with an regex error 
sometimes:


Now in CVS it is deep alpha version and now only text parser is UTF-compliant, 
we continue development...





fts=# SELECT ts_debug('dass');
ERROR:  Regex error in '[^sãŸ]$': brackets [] not balanced
CONTEXT:  SQL function "ts_debug" statement 1

This seems while it tries to map ss to ß, but anyway, I fear, I didn't 
anything good with that.


As suggested in the "Tsearch2 and Unicode/UTF-8" article I have a second 
snowball dict. The first lines of the stem.h I used start with:



extern struct SN_env * german_ISO_8859_1_create_env(void);

Can you use ISO-8859-1?

So I guess this will not work exactly well with UTF-8 ;p Is there any 
other stem.h I could use? Google hasn't returned much for me :/


http://snowball.tartarus.org/

Snowball can generate UTF parser:
http://snowball.tartarus.org/runtime/use.html:
F1 [-o[utput] F2]
   [-s[yntax]]
   [-w[idechars]]  [-u[tf8]] < that's it!
   [-j[ava]]  [-n[ame] C]
   [-ep[refix] S1]  [-vp[refix] S2]
   [-i[nclude] D]
   [-r[untime] P]
At least for Russian there is 2 parsers, for KOI8 and UTF, (
http://snowball.tartarus.org/algorithms/russian/stem.sbl
http://snowball.tartarus.org/algorithms/russian/stem-Unicode.sbl
), diff shows that they different only in stringdef section. So you can make UTF 
parser for german.
BUT, I'm afraid that Snowball uses widechar, and postgres use multibyte for UTF 
internally.




--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [GENERAL] cyclical redundancy checksum algorithm(s)?

2006-09-28 Thread Teodor Sigaev

You sure that's actually what he said?  A change in CRC proves the data
changed, but lack of a change does not prove it didn't.


"To quickly determine if rows have changed, we rely on a cyclic
redundancy checksum (CRC) algorithm.   If the CRC is identical for the

>

"summary" functions, such as an MD5 hash.  I wouldn't trust it at all
with a 32-bit CRC, and not much with a 64-bit CRC.  Too much risk of
collision.


Small example of collisions for crc32:
0x38ee5531
Hundemarke
92294
0x59471e4f
raciner
tranchefiler
0x947bb6c0
Betriebsteile
4245


I had make a lot of work when choosing hash function for tsearch2. Also, I had 
find that popular hash algorithms produce more collision for non-ascii 
languages... CRC32 is more "smooth".
On dictionary with 332296 unique words CRC32 produces 11 collisions, perl's hash 
function - 35, pgsql's hash_any - 12.


--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [GENERAL] Full Text fuzzy search

2006-09-28 Thread Teodor Sigaev

Play around contrib/tsearch2 and contrib/pg_trgm

Michael Vodep wrote:

Hi

How can i do a full text fuzzy search with PgSQL?

For example: i got a text domain. I search for Postgresql
He should return a result for 'Postgresql' if i enter following values:
PostgreSQL, Postgresql, postgresql
but also
bostgresql, posdgresql, bosdgresql

Is this possible?

Regrads
michael

---(end of broadcast)---
TIP 6: explain analyze is your friend


--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [GENERAL] Full Text fuzzy search

2006-09-28 Thread Teodor Sigaev
For the others (bostgresql, posdgresql, bosdgresql) i think u should 
write a function, that's what we did in our application.


contrib/pg_trgm solves it

--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [GENERAL] GiST and Gin technical doc

2006-10-02 Thread Teodor Sigaev

GiST:

http://gist.cs.berkeley.edu/
http://www.sai.msu.su/~megera/postgres/gist/papers/concurrency/

GIN:
GIN is ordinary inverted index which described in many docs.
Look for some implementation details in my presentation on conference:
http://www.sigaev.ru/gin/Gin.pdf

Yannick Warnier wrote:

Hi there,

I'm having trouble finding technical documentation about GiST, Gin and
TSearch2.

I am particularly interested in the internal data structure of a GiST-ed
or Gin-ed index, and the availability of searches by proximity.

Does anyone know of a good place to find such doc, outside from the
source code (I'm not too much into C/C++)?

Thank you,

Yannick


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [GENERAL] NOTICE: word is too long INSERT 0 3014

2006-10-30 Thread Teodor Sigaev

I am running into this limitation ALOT with Tsearch2. What are my
options to get around it. Do I have to compile PostgreSQL with a
different block size?

If yes, what are the downsides to doing so (outside of not being able to
do straight upgrades)?


If you really need that, your should play around WordEntry definition 
(tsvector.h). Sorry, right now I haven't possibility to look closer, 
just tomorrow.
Limit for word is equal 2KB - I supposed that is long enough to store 
any meaningful words.



---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org/


Re: [GENERAL] NOTICE: word is too long INSERT 0 3014

2006-10-30 Thread Teodor Sigaev

For example, redefine by follow way:

typedef struct
{
uint32
haspos:1,
len:31;
uint32  pos;
}   WordEntry;

/* <= 1Gb */
#define MAXSTRLEN ( 1<<30 )
#define MAXSTRPOS ( 1<<30 )

Teodor Sigaev wrote:

I am running into this limitation ALOT with Tsearch2. What are my
options to get around it. Do I have to compile PostgreSQL with a
different block size?

If yes, what are the downsides to doing so (outside of not being able to
do straight upgrades)?


If you really need that, your should play around WordEntry definition 
(tsvector.h). Sorry, right now I haven't possibility to look closer, 
just tomorrow.
Limit for word is equal 2KB - I supposed that is long enough to store 
any meaningful words.



---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org/


--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [GENERAL] Index greater than 8k

2006-10-30 Thread Teodor Sigaev

The problem I am after is the 8k index size issue. It is very easy to
get a GIST index (especially when using tsearch2) that is larger than that.

Hmm, tsearch2 GIST index  is specially designed for support huge index entry:
first, every lexemes in tsvectore are transformed to hash value (with a help of 
crc32), second, it's stripped all position infos, third, if size of array is 
greater than TOAST_INDEX_TARGET then tsearch2 will make bit signature of 
tsvector. Signature's length is fixed and equals to 252 bytes by default (+ 8 
bytes for header of datum). All values on internal pages are represented as 
signatures below.


So, tsearch2 guarantees that index entry will be small enough. If it's not true, 
then there is a bug - pls, make test suite demonstrating the problem.



Is recompiling the block size the option there?
What are the downsides, except for the custom build?


Can you send exact error message?

--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org/


Re: [HACKERS] [GENERAL] Index greater than 8k

2006-10-31 Thread Teodor Sigaev
The problem as I remember it is pg_tgrm not tsearch2 directly, I've sent a 
self contained test case directly to  Teodor  which shows the error. 


'ERROR:  index row requires 8792 bytes, maximum size is 8191'
Uh, I see. But I'm really surprised why do you use pg_trgm on big text? pg_trgm 
is designed to find similar words and use technique known as trigrams. This will 
 work good on small pieces of text such as words or set expression. But all big 
texts (on the same language) will be similar :(. So, I didn't take care about 
guarantee that index tuple's size limitation. In principle, it's possible to 
modify pg_trgm to have such guarantee, but index becomes lossy - all tuples 
gotten  from index should be checked by table's tuple evaluation.


If you want to search similar documents I can recommend to have a look to 
fingerprint technique (http://webglimpse.net/pubs/TR93-33.pdf). It's pretty 
close to trigrams and metrics of similarity is the same, but uses another 
signature calculations. And, there are some tips and trics: removing HTML 
marking,removing punctuation, lowercasing text and so on - it's interesting and 
complex task.

--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] [GENERAL] Index greater than 8k

2006-11-01 Thread Teodor Sigaev

We are trying to get something faster than ~ '%foo%';
Which Tsearch2 does not give us :)


Hmm, why not with GIN? :) Although, right now GIN doesn't support prefix search, 
it's possible to use it.


Brain storm method:

Develop a dictionary which returns all substring for lexeme, for example for 
word foobar it will be 'foobar fooba foob foo fo oobar ooba oob oo obar oba ob 
bar ba ar'. And make GIN functional index over your column (to save disk space).

So, your query will be looked as
select ... where to_tsvector(text_column) @@ 'foo';
Notices:
Time of search in GIN weak depend on number of words (opposite to 
tsearch2/GiST), but insertion of row may be slow enough


--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] tsearch2() with data from other table

2006-11-10 Thread Teodor Sigaev

CREATE FUNCTION euits(int)
RETURNS text AS 'select username || \' \' || firstname || \' \' || 
lastname from site_user where objectid = $1;' LANGUAGE SQL;


CREATE TRIGGER site_item_fts
BEFORE UPDATE OR INSERT ON site_item
FOR EACH ROW EXECUTE PROCEDURE
tsearch2(idxfti, name, keywords, keywords_cb, location_country, 
location_city, media_source, description, euits, site_user);


site_user is a table, isn't it?
tsearch2 trigger accepts only column's names and functions with prototype TEXT 
func(TEXT).


For clarify, show your table's definitions.
--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org/


Re: [GENERAL] tsearch2() with data from other table

2006-11-10 Thread Teodor Sigaev

CREATE FUNCTION euits(int)
RETURNS text AS 'select username || \' \' || firstname || \' \' || 
lastname from site_user where objectid = $1;' LANGUAGE SQL;


CREATE TRIGGER site_item_fts
BEFORE UPDATE OR INSERT ON site_item
FOR EACH ROW EXECUTE PROCEDURE
tsearch2(idxfti, name, keywords, keywords_cb, location_country, 
location_city, media_source, description, euits, site_user);


So, when updating site_item I want to fetch the names from site_user and 
 give this data to tsearch2() along with other stuff from site_item.


Sorry, current interface of tsvector aggregate doesn't support it.

--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [GENERAL] Using GIN indexes on 8.2

2006-11-10 Thread Teodor Sigaev



Alexander Staubo wrote:
Two questions about GIN on 8.2. There's not much documentation about 
GIN, but this should be possible:


  create table foo (values text[]);
  create index foo_values_index on foo using gin (text);

However, this then fails saying the operator "@" does not exist:

  select * from foo where values @ '{hello, world}'


Use @>, <@ operations instead of @ and ~
Look for discussions in -hackers for reasons of changing names

Do I need to reference a specific opclass when creating the index? From 
the documentation I got the impression that GIN bundled operators for 
most built-in types.

if there is a default opclass for your datatype - you may do not specify.



Secondly, are GIN indexes immutable and (unlike Tsearch2) non-lossy and 
therefore useful with functional indexes? I would like to do this:


  create table bar (value text);
  create index bar_value_index on bar using gin (analyze(value));

where analyze() is a function of my own that tokenizes, stems and 
filters the text into a text[] array.


Be careful -
select
pg_opclass.opcname,
pg_operator.oprname,
pg_amop.amopreqcheck
from
pg_opclass,
pg_operator,
pg_amop,
pg_am
where
pg_operator.oid = pg_amop.amopopr and
pg_opclass.oid = pg_amop.amopclaid and
pg_opclass.opcamid = pg_am.oid and
pg_am.amname='gin' and pg_opclass.opcname='_text_ops';
  opcname  | oprname | amopreqcheck
---+-+--
 _text_ops | &&  | f
 _text_ops | @>  | f
 _text_ops | <@  | t
 _text_ops | =   | t
(4 rows)

So, operations <@ and = will recheck result with table's row.

Pls, why don't you use tsearch2 with GIN?


--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org/


Re: [GENERAL] Using GIN indexes on 8.2

2006-11-10 Thread Teodor Sigaev

Use @>, <@ operations instead of @ and ~
Look for discussions in -hackers for reasons of changing names


Ah, many thanks. How about updating those web pages? :)

Now they are in core:
http://developer.postgresql.org/pgdocs/postgres/functions-array.html




Pls, why don't you use tsearch2 with GIN?


Perhaps -- is there any documentation on this?

Nothing special, just use GIN index instead of GiST.

--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [GENERAL] PG_MODULE_MAGIC check in 8.2

2006-11-14 Thread Teodor Sigaev

#ifdef PG_MODULE_MAGIC
PG_MODULE_MAGIC;
#endif

solves your problem
--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [GENERAL] Functional Index

2006-11-22 Thread Teodor Sigaev
use varchar_pattern_ops operator class, LIKE cannot use varchar_ops for non-C 
locales.


Alexander Presber wrote:

Hello everybody,

I am trying to speed up a query on an integer column by defining an 
index as follows


 > CREATE INDEX idx_main_subject ON pdb.main (lower(main_subject::text) 
using varchar_ops);


on column "main_subject".

I had hoped to get speedups for right-fuzzy LIKE-searches, but EXPLAIN 
ANALYZE yields that the index is not used:


 > EXPLAIN ANALYZE SELECT COUNT(*) FROM pdb.main WHERE 
lower(main_subject::text) LIKE lower('10%'::text);
QUERY 
PLAN
 

Aggregate  (cost=137759.92..137759.93 rows=1 width=0) (actual 
time=3421.696..3421.697 rows=1 loops=1)
  ->  Seq Scan on main  (cost=0.00..137727.17 rows=13096 width=0) 
(actual time=0.036..3300.961 rows=77577 loops=1)

Filter: (lower((main_subject)::text) ~~ '10%'::text)
Total runtime: 3421.751 ms
(4 Zeilen)


Am I misunderstanding the concept of functional indexes? Is there 
another way to achieve

Any help is greatly
appreciated.

Yours,
Alexander Presber

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [GENERAL] tsearch to spellcheck

2006-11-24 Thread Teodor Sigaev
You can use lexize() function from tsearch2 (with properly configured ispell 
dictionary) - if it returns not NULL value the word is ok.

Also have a look to pg_trgm contrib modle.

SunWuKung wrote:

Is it possible to use tsearch2 to check spelling? Something like a
function that takes a single word (and a language id) and returns true
if spelled correctly false otherwise.

Thanks.
Balazs


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [GENERAL] postgresql 8.2 rc1 - crash

2006-11-30 Thread Teodor Sigaev

While I'm downloading your file, pls, do follow:
gdb /usr/local/pgsql/bin/postgres your_core_file

If it's needed, change path to postgres file.

In gdb, type
# bt
and send output

hubert depesz lubaczewski wrote:

hi,
i have been testing 8.2 rc1, while i got this problem.

base data:
linux, 32bit, kernel: 2.6.18.3 <http://2.6.18.3>; debian
postgresql version:
PostgreSQL 8.2rc1 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.3.5 
(Debian 1:3.3.5-13)
problematic table is over 2gigabytes in size, and has several indices - 
one of them is gin-index.


problem:
when i issue vacuum full verbose analyze it works, but then crashes with 
signal 11.

always in the same situation.

i was not able to check what is the reason.
i did:
recompile with debug, set ulimit -c unlimited, and rerun the query.
it crashed.
i bundled:
1. logs
2. core file
3. config of postgresql
4. saved output of vacuum
all of this can be fetched from: 
http://depesz.com/various/crash.data.tar.bz2
unfortunatelly i'm not c programmer, so i dont know gdb, but i hope you 
will be ab le to make any sense out of it.


the bz2 file is > 20mb in size.

any help? is it hardware problem? or a missed bug in code?

if i can provide you with more information - please tell me what i 
should tell you.


depesz

--
http://www.depesz.com/ - nowy, lepszy depesz


--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [GENERAL] postgresql 8.2 rc1 - crash

2006-11-30 Thread Teodor Sigaev
#1  0x080bc224 in PageDeletePostingItem (page=0xb28039a0 "\020", 
offset=53719) at gindatapage.c:291
#2  0x080bf558 in ginDeletePage (gvs=0xbfc2ab80, deleteBlkno=29194, 
leftBlkno=29059, parentBlkno=70274, myoff=351, isParentRoot=0 '\0') at 
ginvacuum.c:268


Are you sure about your hardware? myoff in ginDeletePage() and offset in 
PageDeletePostingItem are the same variable...


Pls, send to me postgres file itself - just core isn't very useful for debug.

--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] postgresql 8.2 rc1 - crash

2006-11-30 Thread Teodor Sigaev

I reproduce a problem with small script:
print <5000 and i<40;
VACUUM FULL ANALYZE qq;

EOT

So, I'm digging now...
--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


  1   2   >