Re: [HACKERS] [GENERAL] Index greater than 8k

2006-11-03 Thread Oleg Bartunov
On Fri, 3 Nov 2006, Joshua D. Drake wrote: Oleg Bartunov wrote: Gregory, All you described you do with fti is possible with tsearch2. Just need some think, of course. If you don't need stemming, just don't use it, if you need something like %txt%, just write simple dictionary, which produce an

Re: [HACKERS] [GENERAL] Index greater than 8k

2006-11-03 Thread Joshua D. Drake
Oleg Bartunov wrote: > Gregory, > > All you described you do with fti is possible with tsearch2. Just need > some think, of course. If you don't need stemming, just don't use it, > if you need something like %txt%, just write simple dictionary, which > produce any substrings from input word. Is t

Re: [HACKERS] [GENERAL] Index greater than 8k

2006-11-03 Thread Oleg Bartunov
r reasons that undoubtedly be made clear ;-) Greg Williamson DBA GlobeXplorer LLC -Original Message- From: [EMAIL PROTECTED] on behalf of Joshua D. Drake Sent: Tue 10/31/2006 7:46 PM To: Teodor Sigaev Cc: Darcy Buskermolen; PgSQL General; PostgreSQL-development Subject:

Re: [HACKERS] [GENERAL] Index greater than 8k

2006-11-02 Thread tomas
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On Wed, Nov 01, 2006 at 07:16:37PM -0300, Alvaro Herrera wrote: > [EMAIL PROTECTED] wrote: [...] > > a "functional trigram index"? (this would be very cool). > > Heh :-) I meant an index, using the pg_trgm opclass (which indexes > trigrams; hence the

Re: [HACKERS] [GENERAL] Index greater than 8k

2006-11-01 Thread Oleg Bartunov
On Tue, 31 Oct 2006, Joshua D. Drake wrote: Yes we do (and can) expect to find text among the bytes. We have searches running, we are just running into the maximum size issues for certain rows. you can use substr() to be safe, if schema change doesn't available Sincerely, Joshua D. Drake

Re: [HACKERS] [GENERAL] Index greater than 8k

2006-11-01 Thread Alvaro Herrera
[EMAIL PROTECTED] wrote: > On Tue, Oct 31, 2006 at 08:55:04PM -0800, Joshua D. Drake wrote: > > > > >> We are not storing bytea [...] > [...] > > > Hmm, have you tried to create a functional trigram index on the > > > equivalent of "strings(bytea_column)" or something like that? > > Hrm. Sorry fo

Re: [HACKERS] [GENERAL] Index greater than 8k

2006-11-01 Thread Gregory Maxwell
On 11/1/06, Teodor Sigaev <[EMAIL PROTECTED]> wrote: [snip] Brain storm method: Develop a dictionary which returns all substring for lexeme, for example for word foobar it will be 'foobar fooba foob foo fo oobar ooba oob oo obar oba ob bar ba ar'. And make GIN functional index over your column (

Re: [HACKERS] [GENERAL] Index greater than 8k

2006-11-01 Thread tomas
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On Tue, Oct 31, 2006 at 08:55:04PM -0800, Joshua D. Drake wrote: > > >> We are not storing bytea [...] [...] > > Hmm, have you tried to create a functional trigram index on the > > equivalent of "strings(bytea_column)" or something like that? Hrm. So

Re: [HACKERS] [GENERAL] Index greater than 8k

2006-11-01 Thread Joshua D. Drake
Teodor Sigaev wrote: >> We are trying to get something faster than ~ '%foo%'; >> Which Tsearch2 does not give us :) > > Hmm, why not with GIN? :) Although, right now GIN doesn't support prefix > search, it's possible to use it. Well they run 8.1 :) Joshua D. Drake > > Brain storm method: > >

Re: [HACKERS] [GENERAL] Index greater than 8k

2006-11-01 Thread Teodor Sigaev
We are trying to get something faster than ~ '%foo%'; Which Tsearch2 does not give us :) Hmm, why not with GIN? :) Although, right now GIN doesn't support prefix search, it's possible to use it. Brain storm method: Develop a dictionary which returns all substring for lexeme, for example for

Re: [HACKERS] [GENERAL] Index greater than 8k

2006-10-31 Thread Tom Lane
Alvaro Herrera <[EMAIL PROTECTED]> writes: > Do you mean you actually find stuff based on text attributes in JPEG > images and the like? I thought those were compressed ... Typically not --- the design assumption is that the text size wouldn't amount to anything anyway compared to the image data,

Re: [HACKERS] [GENERAL] Index greater than 8k

2006-10-31 Thread Joshua D. Drake
>> We are not storing bytea, a customer is. We are trying to work around >> customer requirements. The data that is being stored is not always text, >> sometimes it is binary (a flash file or jpeg). We are using escaped text >> to be able to search the string contents of that file . > > Hmm, have

Re: [HACKERS] [GENERAL] Index greater than 8k

2006-10-31 Thread Alvaro Herrera
Joshua D. Drake wrote: > Alvaro Herrera wrote: > > Darcy Buskermolen wrote: > >> On October 31, 2006 08:53 am, Teodor Sigaev wrote: > The problem as I remember it is pg_tgrm not tsearch2 directly, I've sent > a self contained test case directly to Teodor which shows the error. > >

Re: [HACKERS] [GENERAL] Index greater than 8k

2006-10-31 Thread Gregory S. Williamson
ade clear ;-) Greg Williamson DBA GlobeXplorer LLC -Original Message- From: [EMAIL PROTECTED] on behalf of Joshua D. Drake Sent: Tue 10/31/2006 7:46 PM To: Teodor Sigaev Cc: Darcy Buskermolen; PgSQL General; PostgreSQL-development Subject: Re: [HACKERS] [GENERAL] Index

Re: [HACKERS] [GENERAL] Index greater than 8k

2006-10-31 Thread Joshua D. Drake
Alvaro Herrera wrote: > Darcy Buskermolen wrote: >> On October 31, 2006 08:53 am, Teodor Sigaev wrote: The problem as I remember it is pg_tgrm not tsearch2 directly, I've sent a self contained test case directly to Teodor which shows the error. 'ERROR: index row requires 8792

Re: [HACKERS] [GENERAL] Index greater than 8k

2006-10-31 Thread Joshua D. Drake
Teodor Sigaev wrote: >> The problem as I remember it is pg_tgrm not tsearch2 directly, I've >> sent a self contained test case directly to Teodor which shows the >> error. >> 'ERROR: index row requires 8792 bytes, maximum size is 8191' > Uh, I see. But I'm really surprised why do you use pg_trgm

Re: [HACKERS] [GENERAL] Index greater than 8k

2006-10-31 Thread Alvaro Herrera
Darcy Buskermolen wrote: > On October 31, 2006 08:53 am, Teodor Sigaev wrote: > > > The problem as I remember it is pg_tgrm not tsearch2 directly, I've sent > > > a self contained test case directly to Teodor which shows the error. > > > > > > 'ERROR: index row requires 8792 bytes, maximum size

Re: [HACKERS] [GENERAL] Index greater than 8k

2006-10-31 Thread Darcy Buskermolen
On October 31, 2006 08:53 am, Teodor Sigaev wrote: > > The problem as I remember it is pg_tgrm not tsearch2 directly, I've sent > > a self contained test case directly to Teodor which shows the error. > > > > 'ERROR: index row requires 8792 bytes, maximum size is 8191' > > Uh, I see. But I'm rea

Re: [HACKERS] [GENERAL] Index greater than 8k

2006-10-31 Thread Darcy Buskermolen
On October 31, 2006 06:42 am, Joshua D. Drake wrote: > Teodor Sigaev wrote: > >> The problem I am after is the 8k index size issue. It is very easy to > >> get a GIST index (especially when using tsearch2) that is larger than > >> that. The problem as I remember it is pg_tgrm not tsearch2 directl

Re: [HACKERS] [GENERAL] Index greater than 8k

2006-10-31 Thread Joshua D. Drake
Teodor Sigaev wrote: >> The problem I am after is the 8k index size issue. It is very easy to >> get a GIST index (especially when using tsearch2) that is larger than >> that. > Hmm, tsearch2 GIST index is specially designed for support huge index > entry: > first, every lexemes in tsvectore are t

Re: [HACKERS] [GENERAL] Index greater than 8k

2006-10-31 Thread Teodor Sigaev
The problem as I remember it is pg_tgrm not tsearch2 directly, I've sent a self contained test case directly to Teodor which shows the error. 'ERROR: index row requires 8792 bytes, maximum size is 8191' Uh, I see. But I'm really surprised why do you use pg_trgm on big text? pg_trgm is desig