Re: [PERFORM] Can PostgreSQL do the job for software deployed in ASP or SaaS mode?

2006-07-04 Thread Mikael Carneholm
Do you really need to create one *DB* per client - that is, is one
schema (in the same DB) per client out of the question? If not, I would
look into moving all reference tables (read-only data, constants and
such) into a common schema (with read permission granted to each
client/role), reducing the number of objects that need to be created
and maintained, and at the same time reducing memory requirements
(lots of shared objects == lots of reused shared buffers). Also set the
default_tablespace variable per client (login role) so that the I/O
load can be balanced. A system based on Opterons, such as the HP DL385
or DL585 with two CPUs (or four if you go for the 585), 8-16GB of RAM
and a decent storage system with 14-28 disks, could be worth evaluating.
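
Roughly the kind of setup I mean - a minimal sketch only, all
table/role/path names here are made up, and you'd repeat the per-client
part for each client:

-- Reference data lives once, in a schema shared by everybody
CREATE SCHEMA common;
CREATE TABLE common.currencies (code char(3) PRIMARY KEY, name text);

-- One login role and one schema per client, instead of one DB per client
CREATE ROLE client_a LOGIN PASSWORD 'secret';
CREATE SCHEMA client_a AUTHORIZATION client_a;
GRANT USAGE ON SCHEMA common TO client_a;
GRANT SELECT ON common.currencies TO client_a;

-- Balance I/O by giving each client role its own default tablespace
CREATE TABLESPACE ts_client_a LOCATION '/disks/vol1/pg_client_a';
ALTER ROLE client_a SET default_tablespace = 'ts_client_a';
ALTER ROLE client_a SET search_path = client_a, common;

That way each client sees its own tables plus the shared read-only ones,
and its writes land on its own tablespace.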

/Mikael

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of David
Gagnon
Sent: den 3 juli 2006 13:42
To: pgsql-performance@postgresql.org
Subject: [PERFORM] Can PostgreSQL do the job for software deployed in
ASP or SaaS mode?

Hi all,

  I've been working on my personal project for 3.5 years now.  I
developed an ERP system in web/Java.  Now the people I will work with
suggest offering it in SaaS mode, which means my customers will connect
to my website and find their ERP software and data there.  It's not the
deployment I planned initially, so could you just validate some
technical points, to be sure it's not crazy to use PostgreSQL here
rather than a big $$$ DB for the job.

Typically I will have 1 DB per client and around 150 tables per DB.
Since I hope I didn't work all those years for nothing, I expect to
have a bunch of clients, which means the same number of DBs since I
have 1 DB per client.

Can I hope to have several hundred DBs on 1 DB server?  Like 250 DBs =
250 clients = 360,000 tables!!!
Is there a limit on the number of DBs per server (this spec is not on
the website)?  What about performance - can I expect it to stay the
same?

Since I put everything on the web I do need a high-availability
infrastructure.  I looked into Slony-I and Mammoth to replicate the DB,
but since Slony-I uses triggers, can I expect it to do the job?  Or is
Mammoth the only available solution?

Last question, and not the least: I have been reading this performance
list for several years now, and I know hardware suggestions for running
PostgreSQL get discussed.  Since I write software, there are several
points about hardware that I don't understand.  Do you have any
suggestion of a platform to run my SaaS configuration on?  I do need
the WISE one!  I'm pretty sure that if I were a big company I could
throw a bunch of $$$ at it, but that's not my case.  I'm pretty sure
some piece of hardware exists out there that would do the job perfectly
at a fair price.

So far I understand that PostgreSQL loves Opterons, and I have looked
into the DL145 series from HP.  I also gather that Dell hardware is not
reliable.  But it's still not clear what my requirements should be for
memory, disks, number of CPUs, CPU power, etc.

I'm pretty sure it's better to have more, slower CPUs than the latest
Opteron available on the market, or more, slower servers than the
fastest one...  am I right?  But again, what is the optimal choice?

Thank you for sharing your knowledge on these points.  I have
considered PostgreSQL the smart choice for my project since the
beginning, but before putting in all the money (that I don't have..
:-)) to buy hardware, I just want to be sure I'm not crazy!

Thanks for your help, I really appreciate it!!

Best Regards
/David


Re: [PERFORM] optimizing LIKE '%2345' queries

2006-07-04 Thread Tarhon-Onu Victor

On Sun, 2 Jul 2006, Gene wrote:


> can use an index and perform as fast as searching with like '2345%'?
>
> Is the only way to create a reverse function and create an index using
> the reverse function and modify queries to use:
>
> where reverse(column) like reverse('%2345') ?


Hmm.. interesting.
	If (and only if) the records stored in the "column" column have a
fixed length (say, all are 50 characters long) you could create an
index on, say, substring(column,45,50), and use that same expression in
the WHERE clauses of your queries.
	Or, if the length of those records is not the same, maybe it is
feasible to create an index on substring(column, length(column)-5,
length(column)).
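
For illustration, the fixed-width variant as a sketch (names made up;
here every value is assumed to be exactly 50 characters, so LIKE
'%2345' means "the last 4 characters are 2345"):

CREATE TABLE items (code text);  -- assume all codes are 50 chars long

-- Functional index on the suffix
CREATE INDEX items_code_last4_idx ON items (substring(code, 47, 4));

-- The WHERE clause must use the very same expression for the index
-- to be considered
SELECT * FROM items WHERE substring(code, 47, 4) = '2345';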





Re: [PERFORM] Can PostgreSQL do the job for software deployed in ASP or SaaS mode?

2006-07-04 Thread David Gagnon

Hi all,

First, thanks for your help everyone!

Mikael Carneholm wrote:

> Do you really need to create one *DB* per client - that is, is one
> schema (in the same DB) per client out of the question? If not, I
> would look into moving all reference tables (read-only data, constants
> and such) into a common schema (with read permission granted to each
> client/role), reducing the number of objects that need to be created
> and maintained, and at the same time reducing memory requirements
> (lots of shared objects == lots of reused shared buffers).
For my application there is very little info I can share.  Maybe fewer
than 10 tables out of 100, actually, so I'm not sure it's worth it...

> Also set the default_tablespace variable per client (login role) so
> that the I/O load can be balanced. A system based on Opterons, such as
> the HP DL385 or DL585 with two CPUs (or four if you go for the 585),
> 8-16GB of RAM and a decent storage system with 14-28 disks, could be
> worth evaluating.
>
> /Mikael
I looked into the HP DL385 and DL585 on HP's site and they are priced
between $3,000 and $15,000 (base price).  That's quite a difference!
So will the HP DL385 with 2 CPUs do the job?


http://h71016.www7.hp.com/dstore/ctoBases.asp?jumpid=re_NSS_dl585storageserver&oi=E9CED&BEID=19701&SBLID=&ProductLineId=450&FamilyId=2230&LowBaseId=&LowPrice=&familyviewgroup=405&viewtype=Matrix
http://h71016.www7.hp.com/dstore/ctoBases.asp?ProductLineId=431&FamilyId=2048&jumpid=re_hphqiss/Ovw_Buy/DL385

I will look into them more deeply and try to understand things...

Thanks for your help!
Best Regards
/David


Re: [PERFORM] Can PostgreSQL do the job for software deployed in ASP or SaaS mode?

2006-07-04 Thread Mikael Carneholm
> For my application there is very little info I can share.  Maybe fewer
> than 10 tables out of 100, actually, so I'm not sure it's worth it...

Ok, so 90% of the tables are being written to - this either means that
your application uses very few constants, or that it has access to
constants that are stored somewhere else (e.g., a JMX MBean that's
initialized from property files on application startup). Would it be too
much work to redesign the DB model to support more than one client?

> I looked into the HP DL385 and DL585 on HP's site and they are priced
> between $3,000 and $15,000 (base price).  That's quite a difference!
> So will the HP DL385 with 2 CPUs do the job?

Yeah, there's quite a difference between the price tags of those two.
I'd vote for the DL385, since the sockets for the two extra CPUs won't
give you linear scalability per $ in the end. A single machine may be
cheaper to administer, but if administration costs are
irrelevant/negligible I'd go for several 2-socket machines instead of
one 4-socket machine.

/Mikael



Re: [PERFORM] Can PostgreSQL do the job for software deployed in ASP or SaaS mode?

2006-07-04 Thread David Gagnon

Mikael Carneholm wrote:

>> For my application there is very little info I can share.  Maybe
>> fewer than 10 tables out of 100, actually, so I'm not sure it's
>> worth it...
>
> Ok, so 90% of the tables are being written to - this either means that
> your application uses very few constants, or that it has access to
> constants that are stored somewhere else (e.g., a JMX MBean that's
> initialized from property files on application startup). Would it be
> too much work to redesign the DB model to support more than one
> client?
Yes, configuration is in property files or elsewhere.  I will keep this
solution in mind, but for now I really think it would be too complicated
for what it would give in return...

>> I looked into the HP DL385 and DL585 on HP's site and they are priced
>> between $3,000 and $15,000 (base price).  That's quite a difference!
>> So will the HP DL385 with 2 CPUs do the job?
>
> Yeah, there's quite a difference between the price tags of those two.
> I'd vote for the DL385, since the sockets for the two extra CPUs won't
> give you linear scalability per $ in the end. A single machine may be
> cheaper to administer, but if administration costs are
> irrelevant/negligible I'd go for several 2-socket machines instead of
> one 4-socket machine.
I do need 2 machines since I need an HA solution.  So on top of those
questions, I'm trying to figure out whether Slony-I can do the job in
my scenario or whether I need the Mammoth solution.  I'm searching the
list right now and there is not a lot of info... :-(  Any idea?


So thanks for the info about the DL385, I will look into it more deeply!

Best Regards
/David


[PERFORM] how to tune this query.

2006-07-04 Thread Luckys
Hi all,
I've got this query.  I have indexes on the PropertyId and date columns
across all the tables, but it still takes ages to get me the result.
What indexes would you propose for this, or am I helpless?
 
FROM STG_Property a
FULL OUTER JOIN STG_PropConfirmedLogs b
  ON (a.PropertyId = b.PropertyId AND a.p_LastModified = b.p_Modified_Date)
FULL OUTER JOIN STG_PropConnectionFeesLogs c
  ON ((a.PropertyId = c.PropertyId AND a.p_LastModified = c.p_ChangedOn)
   OR (b.PropertyId = c.PropertyId AND b.p_Modified_Date = c.p_ChangedOn))
FULL OUTER JOIN STG_PropDeletedLogs d
  ON ((a.PropertyId = d.PropertyId AND a.p_LastModified = d.p_DeletedOn)
   OR (b.PropertyId = d.PropertyId AND b.p_Modified_Date = d.p_DeletedOn)
   OR (c.PropertyId = d.PropertyId AND c.p_ChangedOn = d.p_DeletedOn))
FULL OUTER JOIN STG_PropFEWALogs e
  ON ((a.PropertyId = e.PropertyId AND a.p_LastModified = e.p_Modified_Date)
   OR (b.PropertyId = e.PropertyId AND b.p_Modified_Date = e.p_Modified_Date)
   OR (c.PropertyId = e.PropertyId AND c.p_ChangedOn = e.p_Modified_Date)
   OR (d.PropertyId = e.PropertyId AND d.p_DeletedOn = e.p_Modified_Date))
FULL OUTER JOIN STG_PropInSewerNetworkLogs f
  ON ((a.PropertyId = f.PropertyId AND a.p_LastModified = f.p_Modified_Date)
   OR (b.PropertyId = f.PropertyId AND b.p_Modified_Date = f.p_Modified_Date)
   OR (c.PropertyId = f.PropertyId AND c.p_ChangedOn = f.p_Modified_Date)
   OR (d.PropertyId = f.PropertyId AND d.p_DeletedOn = f.p_Modified_Date)
   OR (e.PropertyId = f.PropertyId AND e.p_Modified_Date = f.p_Modified_Date))
FULL OUTER JOIN STG_PropTypeLogs g
  ON ((a.PropertyId = g.PropertyId AND a.p_LastModified = g.p_LastModified)
   OR (b.PropertyId = g.PropertyId AND b.p_Modified_Date = g.p_LastModified)
   OR (c.PropertyId = g.PropertyId AND c.p_ChangedOn = g.p_LastModified)
   OR (d.PropertyId = g.PropertyId AND d.p_DeletedOn = g.p_LastModified)
   OR (e.PropertyId = g.PropertyId AND e.p_Modified_Date = g.p_LastModified)
   OR (f.PropertyId = g.PropertyId AND f.p_Modified_Date = g.p_LastModified))

-- Luckys


Re: [PERFORM] how to tune this query.

2006-07-04 Thread Nolan Cafferky
I don't think indexes are going to help you here - with the FULL OUTER 
JOINs, the query will have to look at and include each row from each 
table you query from anyway, so it's going to choose sequential scans. 
In addition, some of the lower join conditions are going to take forever.


What is your goal?  The volume of data that I imagine this query would
produce can't possibly be useful.  I'm guessing that at the very least
you'll want to LEFT OUTER JOIN everything back against STG_Property,
and leave the other join conditions out of each ON clause - see the
sketch below.
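
Something along these lines, as a sketch only (I'm guessing at the
intent, and abbreviating the select list):

SELECT a.PropertyId  -- plus whatever log columns you actually need
FROM STG_Property a
LEFT OUTER JOIN STG_PropConfirmedLogs b
  ON (a.PropertyId = b.PropertyId AND a.p_LastModified = b.p_Modified_Date)
LEFT OUTER JOIN STG_PropConnectionFeesLogs c
  ON (a.PropertyId = c.PropertyId AND a.p_LastModified = c.p_ChangedOn)
LEFT OUTER JOIN STG_PropDeletedLogs d
  ON (a.PropertyId = d.PropertyId AND a.p_LastModified = d.p_DeletedOn);
-- ...and likewise for STG_PropFEWALogs, STG_PropInSewerNetworkLogs and
-- STG_PropTypeLogs, each joined back against a only.

With each ON condition reduced to a single (PropertyId, date) comparison
against STG_Property, the indexes you already have on those columns at
least stand a chance of being used.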




--

Nolan Cafferky
Software Developer
IT Department
RBS Interactive
[EMAIL PROTECTED]




Re: [PERFORM] optimizing LIKE '%2345' queries

2006-07-04 Thread Gene

Thanks for the suggestion.  Actually, I went ahead and created a reverse
function using plpgsql, created an index using the reversed column, and
now my queries use "where reverse(column) like reverse('%2345')" - and
it's using the index like I hoped it would!  Now if I could only figure
out how to optimize like '%2345%' queries.  I don't want to create many
indexes though, as the table is very write-heavy.
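
For reference, roughly what that setup looks like - a sketch only; the
plpgsql below is just one possible reverse(), and the table/column
names are made up:

-- A simple plpgsql reverse(); marked IMMUTABLE so it can be indexed
CREATE OR REPLACE FUNCTION reverse(text) RETURNS text AS $$
DECLARE
    result text := '';
BEGIN
    FOR i IN 1..length($1) LOOP
        result := substring($1, i, 1) || result;
    END LOOP;
    RETURN result;
END;
$$ LANGUAGE plpgsql IMMUTABLE STRICT;

-- Functional index on the reversed column; text_pattern_ops makes it
-- usable for LIKE prefix matching regardless of locale
CREATE INDEX logs_code_rev_idx ON logs (reverse(code) text_pattern_ops);

-- reverse('%2345') = '5432%', a left-anchored pattern that can use
-- the index
SELECT * FROM logs WHERE reverse(code) LIKE reverse('%2345');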







Re: [PERFORM] optimizing LIKE '%2345' queries

2006-07-04 Thread Chris

Gene wrote:

> Thanks for the suggestion.  Actually, I went ahead and created a
> reverse function using plpgsql, created an index using the reversed
> column, and now my queries use "where reverse(column) like
> reverse('%2345')" - and it's using the index like I hoped it would!
> Now if I could only figure out how to optimize like '%2345%' queries.
> I don't want to create many indexes though, as the table is very
> write-heavy.


You can't because that text can be anywhere inside the database field, 
so the whole field basically has to be checked to see if it's there.


You could check out full text indexing (tsearch2).


http://www.designmagick.com/article/27/PostgreSQL/Introduction-to-Full-Text-Indexing
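
For completeness, a rough sketch of what that can look like (names made
up, syntax from memory and it varies between tsearch2 versions - and
note that full text search matches whole words/lexemes, so it helps
word searches rather than arbitrary mid-word substrings):

-- Index the tsvector of the column
CREATE INDEX logs_code_fts_idx
    ON logs USING gist (to_tsvector('simple', code));

-- Matches rows where 2345 appears as a whole token in code
SELECT * FROM logs
WHERE to_tsvector('simple', code) @@ to_tsquery('simple', '2345');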

--
Postgresql & php tutorials
http://www.designmagick.com/



[PERFORM] Problem with bitmap-index-scan plan

2006-07-04 Thread jkapad
Hi,

If I run the query

explain analyze select * from ind_uni_100 where a=1 and b=1 and c=1

I get the following plan:

Bitmap Heap Scan on ind_uni_100 (cost=942.50..1411.12 rows=125 width=104)
(actual time=72.556..72.934 rows=116 loops=1)
  Recheck Cond: ((c = 1) AND (a = 1) AND (b = 1))
  ->  BitmapAnd  (cost=942.50..942.50 rows=125 width=0) (actual
time=72.421..72.421 rows=0 loops=1)
->  Bitmap Index Scan on index_c_ind_uni_100  (cost=0.00..314.00
rows=5 width=0) (actual time=21.854..21.854 rows=49832 loops=1)
  Index Cond: (c = 1)
->  Bitmap Index Scan on index_a_ind_uni_100  (cost=0.00..314.00
rows=5 width=0) (actual time=22.371..22.371 rows=50319 loops=1)
  Index Cond: (a = 1)
->  Bitmap Index Scan on index_b_ind_uni_100  (cost=0.00..314.00
rows=5 width=0) (actual time=14.226..14.226 rows=49758 loops=1)
  Index Cond: (b = 1)
Total runtime: 73.395 ms

Which is quite reasonable.  The table has 1,000,000 rows (17,242 pages).
From pg_stat_get_blocks_fetched I can see that there were 102 page
requests for the table.  So all things seem to work great here!

But if I multiply the size of the table tenfold (10,000,000 rows -
172,414 pages) and run the same query I get:
explain analyze select * from ind_uni_1000 where a=1 and b=1 and c=1

Bitmap Heap Scan on ind_uni_1000  (cost=9369.50..14055.74 rows=1250 width=104)
(actual time=18111.415..176747.937 rows=1251 loops=1)
  Recheck Cond: ((c = 1) AND (a = 1) AND (b = 1))
  ->  BitmapAnd  (cost=9369.50..9369.50 rows=1250 width=0) (actual
time=17684.587..17684.587 rows=0 loops=1)
->  Bitmap Index Scan on index_c_ind_uni_1000  (cost=0.00..3123.00
rows=50 width=0) (actual time=5704.624..5704.624 rows=500910 loops=1)
  Index Cond: (c = 1)
->  Bitmap Index Scan on index_a_ind_uni_1000  (cost=0.00..3123.00
rows=50 width=0) (actual time=6147.962..6147.962 rows=500080 loops=1)
  Index Cond: (a = 1)
->  Bitmap Index Scan on index_b_ind_uni_1000  (cost=0.00..3123.00
rows=50 width=0) (actual time=5767.754..5767.754 rows=500329 loops=1)
  Index Cond: (b = 1)
Total runtime: 176753.200 ms

which is even slower than a seq scan.  Now I see that there were 131,398
page requests for the table in order to retrieve almost 1,250 tuples!
Can someone explain why this is happening?  All memory parameters are
set to their defaults.

Thanks!


