[PERFORM] Windows performance again

2005-12-21 Thread Josep Maria Pinyol Fontseca

Hello,

We have the following scenario:

Linux box with postgresql 7.2.1-1 (client)
Windows XP with postgresql 8.1.1 (server)
Windows XP with postgresql 8.1.1 (client)

All connected on a 10Mb LAN

On the server box, we have a table with 65000 rows, and using "psql" we get
these results:


Linux box with psql version 7.2.1 versus Windows XP server:

select * from ; -> approx. 7 seconds to obtain the results.
Network utilization: 100%

Windows XP client box with psql version 8.1.1 versus Windows XP server:

select * from ; -> approx. 60 seconds to obtain the results
Network utilization: 3%

Windows XP server box with psql version 8.1.1 versus Windows XP server:

select * from ; -> approx. <1 second to obtain the results.
Network utilization: 0%

This is a real problem, because we are migrating an old server to the 8.0
version on a Windows box, and our application runs really slowly.


Thanks in advance,

Josep Maria

--

Josep Maria Pinyol i Fontseca
Head of the programming area

ENDEPRO - Software engineering
Passeig Anselm Clavé, 19 Bx. 08263 Callús (Barcelona)
Tel. +34 936930018 - Mob. +34 600310755 - Fax. +34 938361994
[EMAIL PROTECTED] - http://www.endepro.com







Re: [PERFORM] Windows performance again

2005-12-21 Thread Richard Huxton

Josep Maria Pinyol Fontseca wrote:

> Linux box with psql version 7.2.1 versus Windows XP server:
>
> select * from ; -> approx. 7 seconds to obtain the results.
> Network utilization: 100%
>
> Windows XP client box with psql version 8.1.1 versus Windows XP server:
>
> select * from ; -> approx. 60 seconds to obtain the results
> Network utilization: 3%
>
> Windows XP server box with psql version 8.1.1 versus Windows XP server:
>
> select * from ; -> approx. <1 second to obtain the results.
> Network utilization: 0%


It's *got* to be the network configuration on the client machine. I'd be 
tempted to install ethereal on the linux box and watch for the 
difference between the two networked sessions.


I'm guessing it might be something to do with TCP/IP window/buffer sizes 
and delays on ACKs - this certainly used to be an issue on older MS 
systems, but I must admit I thought they'd fixed it for XP.


If not that, could some firewall/security system be slowing network traffic?

--
  Richard Huxton
  Archonet Ltd



Re: [PERFORM] SAN/NAS options

2005-12-21 Thread Alex Stapleton


> I hope this isn't too far off topic for this list.  Postgres is
> the main application that I'm looking to accommodate.  Anything
> else I can do with whatever solution we find is just gravy...
> You've given me a lot to go on...  Now I'm going to have to do some
> research as to real-world RAID controller performance.  It's vexing
> (to say the least) that most vendors don't supply any raw
> throughput or TPS stats on this stuff...

One word of advice: stay away from Dell kit. The PERC 4 controllers
they use don't implement RAID 10 properly; it's a RAID 1 + JBOD array.
It also has generally dismal IOPS performance. You might get away
with running software RAID, either in conjunction with the card or
avoiding it entirely.




Re: [PERFORM] What's the best hardver for PostgreSQL 8.1?

2005-12-21 Thread Antal Attila

Jim C. Nasby wrote:

> How much space does that equate to?
>
> Have you optimized the queries?
>
> Items that generally have the biggest impact on performance, in
> decreasing order:
> 1. System architecture
> 2. Database design
> 3. (for long-running/problem queries) Query plans
> 4. Disk I/O
> 5. Memory
> 6. CPU
>
> So, I'd make sure that the queries have been optimized (and that
> includes tuning postgresql.conf) before assuming you need more hardware.
>
> Based on what you've told us (very little parallelization), your
> biggest priority is probably either disk IO or memory (or both). Without
> knowing the size of your database/working set it's difficult to provide
> more specific advice.


Hi!

We have 3 Compaq Proliant ML530 servers with dual Xeon 2.8GHz
processors, 3 GB DDR RAM, Ultra Wide SCSI RAID5 1rpm and 1000Gbit
ethernet. We partitioned our databases among these machines, but
theoretically there are cross-references among the machines. The size of
the data is now about 100-110GB. We've used these servers for 3 years
with Debian Linux. We have already optimized the given queries and
postgresql.conf. We tried many tricks and ideas, and we read and asked on
mailing lists. There is nothing more we can do; we have to buy a new
server for the databases, because we are developing our system for newer
services, so the size will keep growing. We also need better
responsiveness and shorter execution times for the big queries (these
queries are too complicated to discuss here, and in many cases we have
optimized them with PL/pgSQL stored procedures).
PostgreSQL 8.1 solved several parallelization and overload problems, and
the average load has decreased significantly on our servers. But the big
queries aren't fast enough. We think the hardware is the limit. Generally
2 parallel queries run during working hours, and we make backups at night.


Regards, Atesz




Re: [PERFORM] SAN/NAS options

2005-12-21 Thread Luke Lonergan
Charles,

On 12/20/05 9:58 PM, "Charles Sprickman" <[EMAIL PROTECTED]> wrote:

> You've given me a lot to go on...  Now I'm going to have to do some
> research as to real-world RAID controller performance.  It's vexing (to
> say the least) that most vendors don't supply any raw throughput or TPS
> stats on this stuff...

Take a look at this:
  http://www.wlug.org.nz/HarddiskBenchmarks

> Anyhow, thanks again.  You'll probably see me back here in the coming
> months as I try to shake some mysql info out of my brain as our pgsql DBA
> gets me up to speed on pgsql and what specifically he's doing to stress
> things.

Cool!

BTW - based on the above benchmark page, I just immediately ordered 2 of
the Areca 1220 SATA controllers (
http://www.areca.com.tw/products/html/pcie-sata.htm) so that we can compare
them to the 3Ware 9550SX that we've been using.  The 3Ware controllers have
been super fast on sequential access, but I'm concerned about their random
IOPs.  The Arecas aren't as popular, and there's consequently less volume
of them, but people who use them rave about them.

- Luke 





Re: [PERFORM] Speed of different procedural language

2005-12-21 Thread Steinar H. Gunderson
On Sun, Dec 18, 2005 at 01:10:21AM -, Ben Trewern wrote:
> I know I should be writing these in C but that's a bit beyond me.  I was 
> going to try PL/Python or PL/Perl or even PL/Ruby.  Has anyone any idea 
> which language is fastest, or is the data access going to swamp the overhead 
> of small functions?

I'm not sure if it's what you ask for, but there _is_ a clear difference
between the procedural languages -- I've had a 10x speed increase from
rewriting PL/PgSQL stuff into PL/Perl, for instance. I'm not sure which ones
would be faster, though -- I believe Ruby is slower than Perl or Python
generally, but I don't know how it all works out in a PL/* setting.

/* Steinar */
-- 
Homepage: http://www.sesse.net/



Re: [PERFORM] [GENERAL] need help

2005-12-21 Thread Alban Medici (NetCentrex)

Try executing your query (in psql) prefixed with EXPLAIN ANALYZE and
send us the result:
 db=# EXPLAIN ANALYZE UPDATE s_apotik SET stock = 100 WHERE obat_id='A';

regards

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Tino Wildenhain
Sent: Tuesday, 6 December 2005 09:55
To: Jenny
Cc: pgsql-general@postgresql.org; pgsql-sql@postgresql.org;
pgsql-performance@postgresql.org
Subject: Re: [PERFORM] [GENERAL] need help

Jenny wrote:
> I'm running PostgreSQL 8.0.3 on i686-pc-linux-gnu (Fedora Core 2).
> I've been dealing with Psql for more than 2 years now, but I've never
> had this case before.
> 
> I have a table that has about 20 rows in it.
> 
>            Table "public.s_apotik"
>     Column    |            Type             | Modifiers
> --------------+-----------------------------+-----------
>  obat_id      | character varying(10)       | not null
>  stock        | numeric                     | not null
>  s_min        | numeric                     | not null
>  s_jual       | numeric                     |
>  s_r_jual     | numeric                     |
>  s_order      | numeric                     |
>  s_r_order    | numeric                     |
>  s_bs         | numeric                     |
>  last_receive | timestamp without time zone |
> Indexes:
>    "s_apotik_pkey" PRIMARY KEY, btree(obat_id)
>
> When I try to UPDATE one of the row, nothing happens for a very long time.
> First, I run it on PgAdminIII, I can see the miliseconds are growing 
> as I waited. Then I stop the query, because the time needed for it is 
> unbelievably wrong.
> 
> Then I try to run the query from the psql shell. For example, the 
> table has obat_id : A, B, C, D.
> db=# UPDATE s_apotik SET stock = 100 WHERE obat_id='A'; ( nothing 
> happens.. I press the Ctrl-C to stop it. This is what comes out
> :)
> Cancel request sent
> ERROR: canceling query due to user request
> 
> (If I try another obat_id)
> db=# UPDATE s_apotik SET stock = 100 WHERE obat_id='B'; (Less than a 
> second, this is what comes out :) UPDATE 1
> 
> I can't do anything to that row. I can't DELETE it. Can't DROP the table. 
> I want this data out of my database.
> What should I do? It's like there's a falsely pointed index here.
> Any help would be very much appreciated.
> 

1) Let's hope you do regular backups - and have actually tested restore.
1a) If not, do it right now.
2) Reindex the table.
3) Try again to modify.

Q: Are there any foreign keys involved? If so, reindex those tables too,
just in case.

Do you vacuum regularly?
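
For steps 2 and 3 above, the commands would look roughly like this (a
minimal sketch using the table from your post; note REINDEX takes an
exclusive lock on the table while it runs):

    -- rebuild all indexes on the problem table
    REINDEX TABLE s_apotik;

    -- reclaim dead tuples and refresh planner statistics
    VACUUM ANALYZE s_apotik;

    -- then retry the modification
    UPDATE s_apotik SET stock = 100 WHERE obat_id = 'A';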

HTH
Tino



Re: [PERFORM] 2 phase commit: performance implications?

2005-12-21 Thread Andy Ballingall



> Why not just query adjacent databases, rather than copying the data around?

The reasons I didn't choose this way were:
1) I didn't think there was a way to write a query that can act on the
data in two databases as though it were all in one, and I didn't want to
get into merging multiple database query results on the application side.
I'd rather have all the needed data sitting in a single database so that
I can perform whatever query I like without complication.
2) Most database accesses are reads, and I didn't want a big network
overhead for these, especially since I'm aiming for each database to be
entirely RAM resident.

> If you really wanted to do this, do you need 2PC?  Once data has been
> uploaded to the database for region A, then asynchronously copy the data
> to B, C, D and E later, using a queue.

I've always assumed that my data needed to be consistent. I guess there are
some circumstances where it isn't really a problem, but each would need to
be carefully evaluated. The easy answer is to say 'yes, it must be
consistent'.

> If you try to commit to all at once, then if one fails, none has the
> data.

Yes, I'd prefer things to be that way in any event.

Regards,
Andy




Re: [PERFORM] Windows performance again

2005-12-21 Thread Greg Stark
Richard Huxton  writes:

> Josep Maria Pinyol Fontseca wrote:

> > Windows XP client box with psql version 8.1.1 versus Windows XP server:
> > select * from ; -> approx. 60 seconds to obtain the results
> > Network utilization: 3%

The 60 seconds sounds suspiciously like a DNS problem.

-- 
greg




Re: [PERFORM] Windows performance again

2005-12-21 Thread Tom Lane
Richard Huxton  writes:
> Josep Maria Pinyol Fontseca wrote:
>> Network utilization: 0%

> It's *got* to be the network configuration on the client machine.

We've seen gripes of this sort before --- check the list archives for
possible fixes.  I seem to recall something about a "QoS patch", as
well as suggestions to get rid of third-party packages that might be
interfering with the TCP stack.

regards, tom lane



[PERFORM] Wrong index used when ORDER BY LIMIT 1

2005-12-21 Thread Szűcs Gábor

Dear Gurus,

Version: 7.4.6

I use a query on a heavily indexed table which picks a wrong index 
unexpectedly. Since this query is used in response to certain user 
interactions thousands of times in succession (with different constants), 
500ms is not affordable for us. I can easily work around this, but I'd like 
to understand the root of the problem.


Basically, there are two relevant indexes:
- muvelet_vonalkod_muvelet btree (muvelet, ..., idopont)
- muvelet_vonalkod_pk3 btree (idopont, ...)

Query is:
SELECT idopont WHERE muvelet = x ORDER BY idopont LIMIT 1.

I expected the planner to choose the index on muvelet, then sort by idopont.
Instead, it took the other index. I think there is heavy correlation, since
muvelet references a sequenced pkey and idopont is a timestamp (both
increase with passing time). Could that be the cause?


See full table description and explain analyze results at end of the email.


TIA,
--
G.

   Table "public.muvelet_vonalkod"
   Column   |   Type   |  Modifiers
+--+---
 az | integer  | not null def. nextval('...')
 olvaso_nev | character varying| not null
 vonalkod   | character varying| not null
 mozgasnem  | integer  | not null
 idopont| timestamp with time zone | not null
 muvelet| integer  |
 minoseg| integer  | not null
 cikk   | integer  |
 muszakhely | integer  |
 muszakkod  | integer  |
 muszaknap  | date |
 repre  | boolean  | not null default false
 hiba   | integer  | not null default 0
Indexes:
"muvelet_vonalkod_pkey" primary key, btree (az)
"muvelet_vonalkod_pk2" unique, btree (olvaso_nev, idopont)
"muvelet_vonalkod_muvelet" btree
(muvelet, mozgasnem, vonalkod, olvaso_nev, idopont)
"muvelet_vonalkod_pk3" btree (idopont, olvaso_nev)
"muvelet_vonalkod_vonalkod" btree
(vonalkod, mozgasnem, olvaso_nev, idopont)
Foreign-key constraints:
"$1" FOREIGN KEY (mozgasnem) REFERENCES mozgasnem(az)
"$2" FOREIGN KEY (muvelet) REFERENCES muvelet(az)
"$3" FOREIGN KEY (minoseg) REFERENCES minoseg(az)
"$4" FOREIGN KEY (cikk) REFERENCES cikk(az)
"$5" FOREIGN KEY (muszakhely) REFERENCES hely(az)
"$6" FOREIGN KEY (muszakkod) REFERENCES muszakkod(az)
"muvelet_vonalkod_muszak_fk"
   FOREIGN KEY (muszakhely, muszaknap, muszakkod)
   REFERENCES muszak(hely, nap, muszakkod)
Triggers:
muvelet_vonalkod_aiud AFTER INSERT OR DELETE OR UPDATE ON 
muvelet_vonalkod FOR EACH ROW EXECUTE PROCEDURE muvelet_vonalkod_aiud()
muvelet_vonalkod_biu BEFORE INSERT OR UPDATE ON muvelet_vonalkod FOR 
EACH ROW EXECUTE PROCEDURE muvelet_vonalkod_biu()
muvelet_vonalkod_noty AFTER INSERT OR DELETE OR UPDATE ON 
muvelet_vonalkod FOR EACH ROW EXECUTE PROCEDURE muvelet_vonalkod_noty()



-- original query, limit
# explain analyze
  select idopont from muvelet_vonalkod
  where muvelet=6859 order by idopont
  limit 1;
                                  QUERY PLAN
-----------------------------------------------------------------------------
 Limit  (cost=0.00..25.71 rows=1 width=8) (actual time=579.528..579.529 rows=1 loops=1)
   ->  Index Scan using muvelet_vonalkod_pk3 on muvelet_vonalkod  (cost=0.00..8304.42 rows=323 width=8) (actual time=579.522..579.522 rows=1 loops=1)
         Filter: (muvelet = 6859)
 Total runtime: 579.606 ms
(4 rows)

-- however, if I omit the limit clause:
# explain analyze
  select idopont from muvelet_vonalkod
  where muvelet=6859 order by idopont;
                                  QUERY PLAN
-----------------------------------------------------------------------------
 Sort  (cost=405.41..405.73 rows=323 width=8) (actual time=1.295..1.395 rows=360 loops=1)
   Sort Key: idopont
   ->  Index Scan using muvelet_vonalkod_muvelet on muvelet_vonalkod  (cost=0.00..400.03 rows=323 width=8) (actual time=0.049..0.855 rows=360 loops=1)
         Index Cond: (muvelet = 6859)
 Total runtime: 1.566 ms
(5 rows)

-- workaround 1: the planner is hard to trick...
# explain analyze
  select idopont from
  (select idopont from muvelet_vonalkod
   where muvelet=6859) foo
  order by idopont limit 1;
                                  QUERY PLAN
-----------------------------------------------------------------------------
 Limit  (cost=0.00..25.71 rows=1 width=8) (actual time=584.403..584.404 rows=1 loops=1)
   ->  Index Scan using muvelet_vonalkod_pk3 on muvelet_vonalkod  (cost=0.00..8304.42 rows=323 width=8) (actual time=584.397..584.397 rows=1 loops=1)
         Filter: (muvelet = 6859)
 Total runtime: 584.482 ms
(4 rows)

-- workaround 2: quite ugly but seems to work (at least for this
-- one test case):
# explain analyze
  select idopont from
  (select idopont from muvelet_vonalkod
   where muvelet=6859 order by idopont) foo
  order by idopont limit 1;

[PERFORM] ORDER BY costs

2005-12-21 Thread Carlos Benkendorf
Hi,

We have a SELECT that even without ORDER BY returns the rows in the order
we want, but when we add the ORDER BY clause the runtime and costs are
much bigger.

We have to use ORDER BY, because otherwise some future PostgreSQL version
will probably stop returning the rows in the correct order. But with
ORDER BY the query is too expensive... is there a way to get the same
costs and runtime but with the ORDER BY clause?

Why doesn't the planner use the access plan built for the "without order
by" select even when we add the order by clause? The results are the same
either way.

PostgreSQL version: 8.0.3

Without order by:

explain analyze
SELECT * FROM iparq.ARRIPT
 where (ANOCALC =  2005 and CADASTRO =  19 and CODVENCTO =  00 and PARCELA >=  00)
    or (ANOCALC =  2005 and CADASTRO =  19 and CODVENCTO >  00)
    or (ANOCALC =  2005 and CADASTRO >  19)
    or (ANOCALC >  2005);

 Index Scan using pk_arript, pk_arript, pk_arript, pk_arript on arript  (cost=0.00..122255.35 rows=146602 width=897) (actual time=9.303..1609.987 rows=167710 loops=1)
   Index Cond: (((anocalc = 2005::numeric) AND (cadastro = 19::numeric) AND (codvencto = 0::numeric) AND (parcela >= 0::numeric)) OR ((anocalc = 2005::numeric) AND (cadastro = 19::numeric) AND (codvencto > 0::numeric)) OR ((anocalc = 2005::numeric) AND (cadastro > 19::numeric)) OR (anocalc > 2005::numeric))
 Total runtime: 1712.456 ms
(3 rows)

With order by:

explain analyze
SELECT * FROM iparq.ARRIPT
 where (ANOCALC =  2005 and CADASTRO =  19 and CODVENCTO =  00 and PARCELA >=  00)
    or (ANOCALC =  2005 and CADASTRO =  19 and CODVENCTO >  00)
    or (ANOCALC =  2005 and CADASTRO >  19)
    or (ANOCALC >  2005)
 order by ANOCALC asc, CADASTRO asc, CODVENCTO asc, PARCELA asc;

 Sort  (cost=201296.59..201663.10 rows=146602 width=897) (actual time=9752.555..10342.363 rows=167710 loops=1)
   Sort Key: anocalc, cadastro, codvencto, parcela
   ->  Index Scan using pk_arript, pk_arript, pk_arript, pk_arript on arript  (cost=0.00..122255.35 rows=146602 width=897) (actual time=0.402..1425.085 rows=167710 loops=1)
         Index Cond: (((anocalc = 2005::numeric) AND (cadastro = 19::numeric) AND (codvencto = 0::numeric) AND (parcela >= 0::numeric)) OR ((anocalc = 2005::numeric) AND (cadastro = 19::numeric) AND (codvencto > 0::numeric)) OR ((anocalc = 2005::numeric) AND (cadastro > 19::numeric)) OR (anocalc > 2005::numeric))
 Total runtime: 10568.290 ms
(5 rows)

Table definition:

      Table "iparq.arript"
   Column   |         Type          | Modifiers
------------+-----------------------+-----------
 anocalc    | numeric(4,0)          | not null
 cadastro   | numeric(8,0)          | not null
 codvencto  | numeric(2,0)          | not null
 parcela    | numeric(2,0)          | not null
 inscimob   | character varying(18) | not null
 codvencto2 | numeric(2,0)          | not null
 parcela2   | numeric(2,0)          | not null
 codpropr   | numeric(10,0)         | not null
 dtaven     | numeric(8,0)          | not null
 anocalc2   | numeric(4,0)          |
...
Indexes:
    "pk_arript" PRIMARY KEY, btree (anocalc, cadastro, codvencto, parcela)
    "iarchave04" UNIQUE, btree (cadastro, anocalc, codvencto, parcela)
    "iarchave02" btree (inscimob, anocalc, codvencto2, parcela2)
    "iarchave03" btree (codpropr, dtaven)
    "iarchave05" btree (anocalc, inscimob, codvencto2, parcela2)

Best regards and thank you very much in advance,

Carlos Benkendorf

Re: [PERFORM] Windows performance again

2005-12-21 Thread Merlin Moncure
> > It's *got* to be the network configuration on the client machine.
> 
> We've seen gripes of this sort before --- check the list archives for
> possible fixes.  I seem to recall something about a "QoS patch", as
> well as suggestions to get rid of third-party packages that might be
> interfering with the TCP stack.

I personally checked out the last report from a poster who got the issue
on win2k but not on winxp.  I ran his exact dump into my 2k server with
no problems.  This is definitely some type of local issue.

Josep: does your table have any large ( > 1k ) fields in it?

Merlin



Re: [PERFORM] Wrong index used when ORDER BY LIMIT 1

2005-12-21 Thread Michael Fuhr
On Wed, Dec 21, 2005 at 07:03:00PM +0100, Szűcs Gábor wrote:
> Version: 7.4.6
[...]
> Query is:
> SELECT idopont WHERE muvelet = x ORDER BY idopont LIMIT 1.
> 
> I expected the planner to choose the index on muvelet, then sort by idopont.
> Instead, it took the other index.

I think the planner is guessing that since you're ordering on
idopont, scanning the idopont index will find the first matching
row faster than using the muvelet index would.  In many cases that's
a good bet, but in this case the guess is wrong and you end up with
a suboptimal plan.

I just ran some tests with 8.1.1 and it chose the better plan for
a query similar to what you're doing.  One of the developers could
probably explain why; maybe it's because of the changes that allow
better use of multicolumn indexes.  Try 8.1.1 if you can and see
if you get better results.

> -- workaround 2: quite ugly but seems to work (at least for this
> -- one test case):
> # explain analyze
>   select idopont from
>   (select idopont from muvelet_vonalkod
>where muvelet=6859 order by idopont) foo
>   order by idopont limit 1;

Another workaround is to use OFFSET 0 in the subquery.
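
Applied to the query above, that would be (same table and constant as in
the original post; OFFSET 0 keeps the planner from flattening the
subquery, so the muvelet index can be used):

    select idopont from
      (select idopont from muvelet_vonalkod
       where muvelet=6859 offset 0) foo
    order by idopont limit 1;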

-- 
Michael Fuhr



Re: [PERFORM] Wrong index used when ORDER BY LIMIT 1

2005-12-21 Thread Tom Lane
Szűcs Gábor <[EMAIL PROTECTED]> writes:
> Query is:
> SELECT idopont WHERE muvelet = x ORDER BY idopont LIMIT 1.

Much the best solution for this would be to have an index on
(muvelet, idopont)
--- perhaps you can reorder the columns of "muvelet_vonalkod_muvelet"
instead of making a whole new index --- and then say

SELECT idopont WHERE muvelet = x ORDER BY muvelet, idopont LIMIT 1

PG 8.1 can apply such an index to your original query, but older
versions will need the help of the modified ORDER BY to recognize
that the index is usable.
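
Sketched out, that would be something like this (the index name here is
illustrative only):

    CREATE INDEX muvelet_vonalkod_muvelet_idopont
        ON muvelet_vonalkod (muvelet, idopont);

    SELECT idopont FROM muvelet_vonalkod
     WHERE muvelet = 6859
     ORDER BY muvelet, idopont LIMIT 1;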

regards, tom lane



Re: [PERFORM] ORDER BY costs

2005-12-21 Thread Tom Lane
Carlos Benkendorf <[EMAIL PROTECTED]> writes:
>  Table "iparq.arript"
>   Column   | Type  | Modifiers
> ---+---+---
>  anocalc   | numeric(4,0)  | not null
>  cadastro  | numeric(8,0)  | not null
>  codvencto | numeric(2,0)  | not null
>  parcela   | numeric(2,0)  | not null
>  inscimob  | character varying(18) | not null
>  codvencto2| numeric(2,0)  | not null
>  parcela2  | numeric(2,0)  | not null
>  codpropr  | numeric(10,0) | not null
>  dtaven| numeric(8,0)  | not null
>  anocalc2  | numeric(4,0)  |

I suspect you'd find a significant performance improvement from changing
the NUMERIC columns to int or bigint as needed.  Numeric comparisons are
pretty slow.
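
For example (a hedged sketch; ALTER COLUMN ... TYPE rewrites the table
and takes an exclusive lock, so try it on a copy first, and repeat for
each numeric key column):

    ALTER TABLE iparq.arript ALTER COLUMN cadastro TYPE integer;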

regards, tom lane



Re: [PERFORM] Speed of different procedural language

2005-12-21 Thread Merlin Moncure
> On Sun, Dec 18, 2005 at 01:10:21AM -, Ben Trewern wrote:
> > I know I should be writing these in C but that's a bit beyond me.  I was
> > going to try PL/Python or PL/Perl or even PL/Ruby.  Has anyone any idea
> > which language is fastest, or is the data access going to swamp the
> > overhead of small functions?
> 
> I'm not sure if it's what you ask for, but there _is_ a clear difference
> between the procedural languages -- I've had a 10x speed increase from
> rewriting PL/PgSQL stuff into PL/Perl, for instance. I'm not sure which
> ones would be faster, though -- I believe Ruby is slower than Perl or
> Python generally, but I don't know how it all works out in a PL/* setting.

So far, I use plpgsql for everything...queries being first class and
all...I don't have any performance problems with it.  I have cut the
occasional C routine, but for flexibility not for speed.

PL/Perl routines cannot directly execute each other, meaning you can't
pass high level objects between them like refcursors. YMMV

Since most database apps are bound by the server one way or another I
would imagine you should be choosing a language on reasons other than
performance. 

Maybe Ben you could provide an example of what you are trying to do that
is not fast enough?

Merlin 



Re: [PERFORM] Speed of different procedural language

2005-12-21 Thread Michael Fuhr
On Wed, Dec 21, 2005 at 12:06:47PM +0100, Steinar H. Gunderson wrote:
> On Sun, Dec 18, 2005 at 01:10:21AM -, Ben Trewern wrote:
> > I know I should be writing these in C but that's a bit beyond me.  I was 
> > going to try PL/Python or PL/Perl or even PL/Ruby.  Has anyone any idea 
> > which language is fastest, or is the data access going to swamp the 
> > overhead 
> > of small functions?
> 
> I'm not sure if it's what you ask for, but there _is_ a clear difference
> between the procedural languages -- I've had a 10x speed increase from
> rewriting PL/PgSQL stuff into PL/Perl, for instance.

The difference is clear only in specific cases; just because you
saw a 10x increase in some cases doesn't mean you can expect that
kind of increase, or indeed any increase, in others.  I've seen
PL/pgSQL beat all other PL/* challengers handily many times,
especially when the function does a lot of querying and looping
through large result sets.

I tend to use PL/pgSQL except in cases where PL/pgSQL can't do what
I want or the job would be much easier in another language (e.g.,
string manipulation, for which I'd use PL/Perl or PL/Ruby).  Even
then I might use the other language only to write small functions
that a PL/pgSQL function could call.

As Merlin suggested, maybe Ben could tell us what he wants to do
that he thinks should be written in C or a language other than
PL/pgSQL.  Without knowing what problem is to be solved it's near
impossible to recommend an appropriate tool.

-- 
Michael Fuhr



Re: [PERFORM] Speed of different procedural language

2005-12-21 Thread Steinar H. Gunderson
On Wed, Dec 21, 2005 at 02:24:42PM -0700, Michael Fuhr wrote:
> The difference is clear only in specific cases; just because you
> saw a 10x increase in some cases doesn't mean you can expect that
> kind of increase, or indeed any increase, in others. I've seen
> PL/pgSQL beat all other PL/* challengers handily many times,
> especially when the function does a lot of querying and looping
> through large result sets.

That's funny, my biggest problems with PL/PgSQL have been (among others)
exactly with large result sets...

Anyhow, the general idea is: It _does_ matter which one you use, so you'd
better test if it matters to you :-)

/* Steinar */
-- 
Homepage: http://www.sesse.net/



Re: [PERFORM] Speed of different procedural language

2005-12-21 Thread Michael Fuhr
On Wed, Dec 21, 2005 at 10:38:10PM +0100, Steinar H. Gunderson wrote:
> On Wed, Dec 21, 2005 at 02:24:42PM -0700, Michael Fuhr wrote:
> > The difference is clear only in specific cases; just because you
> > saw a 10x increase in some cases doesn't mean you can expect that
> > kind of increase, or indeed any increase, in others. I've seen
> > PL/pgSQL beat all other PL/* challengers handily many times,
> > especially when the function does a lot of querying and looping
> > through large result sets.
> 
> That's funny, my biggest problems with PL/PgSQL have been (among others)
> exactly with large result sets...

Out of curiosity, do you have a simple test case?  I'd be interested
in seeing what you're doing in PL/pgSQL that's contradicting what
I'm seeing.

-- 
Michael Fuhr



Re: [PERFORM] What's the best hardver for PostgreSQL 8.1?

2005-12-21 Thread William Yu

Juan Casero wrote:
> Can you elaborate on the reasons the opteron is better than the Xeon when it
> comes to disk io?   I have a PostgreSQL 7.4.8 box running a DSS.   One of our


Opterons have a 64-bit IOMMU -- Xeons don't. That means that in 64-bit
mode, for DMA transfers to memory above 4GB, the OS must allocate a
buffer below 4GB, DMA to that block, and then the CPU must do extra work
copying the data above 4GB. On the Opteron, it's done by the IO adaptor
using DMA in the background.




Re: [PERFORM] ORDER BY costs

2005-12-21 Thread Carlos Benkendorf
I restored the table in another database and repeated the analyze again
with the original column definitions (numeric):

With order by:

 Sort  (cost=212634.30..213032.73 rows=159374 width=897) (actual time=9286.817..9865.030 rows=167710 loops=1)
   Sort Key: anocalc, cadastro, codvencto, parcela
   ->  Index Scan using pk_arript, pk_arript, pk_arript, pk_arript on arript  (cost=0.00..126604.64 rows=159374 width=897) (actual time=0.152..1062.664 rows=167710 loops=1)
         Index Cond: (((anocalc = 2005::numeric) AND (cadastro = 19::numeric) AND (codvencto = 0::numeric) AND (parcela >= 0::numeric)) OR ((anocalc = 2005::numeric) AND (cadastro = 19::numeric) AND (codvencto > 0::numeric)) OR ((anocalc = 2005::numeric) AND (cadastro > 19::numeric)) OR (anocalc > 2005::numeric))
 Total runtime: 10086.884 ms
(5 rows)

Without order by:

 Index Scan using pk_arript, pk_arript, pk_arript, pk_arript on arript  (cost=0.00..126604.64 rows=159374 width=897) (actual time=0.154..809.566 rows=167710 loops=1)
   Index Cond: (((anocalc = 2005::numeric) AND (cadastro = 19::numeric) AND (codvencto = 0::numeric) AND (parcela >= 0::numeric)) OR ((anocalc = 2005::numeric) AND (cadastro = 19::numeric) AND (codvencto > 0::numeric)) OR ((anocalc = 2005::numeric) AND (cadastro > 19::numeric)) OR (anocalc > 2005::numeric))
 Total runtime: 894.218 ms
(3 rows)

Then I recreated the table and changed the primary key column type
definitions to smallint, integer and bigint:

CREATE TABLE arript (
    anocalc    smallint NOT NULL,
    cadastro   integer  NOT NULL,
    codvencto  smallint NOT NULL,
    parcela    smallint NOT NULL,
    inscimob   character varying(18) NOT NULL,
    codvencto2 smallint NOT NULL,
    parcela2   smallint NOT NULL,
    codpropr   bigint   NOT NULL,
    dtaven     integer  NOT NULL,
    anocalc2   smallint,
    dtabase    integer,
    vvt        numeric(14,2),
    vvp        numeric(14,2),
...
);

Now the new analyze:

With order by:

 Sort  (cost=180430.98..180775.10 rows=137649 width=826) (actual time=4461.524..5000.707 rows=167710 loops=1)
   Sort Key: anocalc, cadastro, codvencto, parcela
   ->  Index Scan using pk_arript, pk_arript, pk_arript, pk_arript on arript  (cost=0.00..26.93 rows=137649 width=826) (actual time=0.142..763.255 rows=167710 loops=1)
         Index Cond: (((anocalc = 2005) AND (cadastro = 19) AND (codvencto = 0) AND (parcela >= 0)) OR ((anocalc = 2005) AND (cadastro = 19) AND (codvencto > 0)) OR ((anocalc = 2005) AND (cadastro > 19)) OR (anocalc > 2005))
 Total runtime: 5222.729 ms
(5 rows)

Without order by:

 Index Scan using pk_arript, pk_arript, pk_arript, pk_arript on arript  (cost=0.00..26.93 rows=137649 width=826) (actual time=0.135..505.250 rows=167710 loops=1)
   Index Cond: (((anocalc = 2005) AND (cadastro = 19) AND (codvencto = 0) AND (parcela >= 0)) OR ((anocalc = 2005) AND (cadastro = 19) AND (codvencto > 0)) OR ((anocalc = 2005) AND (cadastro > 19)) OR (anocalc > 2005))
 Total runtime: 589.528 ms
(3 rows)

Total runtime summary:

Primary key columns defined with integer/smallint/bigint, with order by:     5222.729 ms
Primary key columns defined with integer/smallint/bigint, without order by:   589.528 ms
Primary key columns defined with numeric, with order by:                    10086.884 ms
Primary key columns defined with numeric, without order by:                   894.218 ms

With integer/smallint/bigint keys, the ORDER BY runtime (5222.729 ms) is
almost half the runtime of the select over numeric columns (10086.884 ms),
but it is still about 6x the runtime of the original select (without
order by, numeric columns: 894.218 ms).

Is there something more that could be done? Planner cost constants?

Thanks very much in advance!

Benkendorf

Re: [PERFORM] Speed of different procedural language

2005-12-21 Thread Steinar H. Gunderson
On Wed, Dec 21, 2005 at 03:10:28PM -0700, Michael Fuhr wrote:
>> That's funny, my biggest problems with PL/PgSQL have been (among others)
>> exactly with large result sets...
> Out of curiosity, do you have a simple test case?  I'd be interested
> in seeing what you're doing in PL/pgSQL that's contradicting what
> I'm seeing.

I'm not sure if I have the code anymore (it was under 7.4 or 8.0), but it was
largely scanning through ~2 million rows once, noting differences from the
previous rows as it went.

In that case, I didn't benchmark against any of the other PL/* languages, but
it was pretty clear that even on a pretty speedy Opteron, it was CPU bound,
which it really shouldn't have been.

/* Steinar */
-- 
Homepage: http://www.sesse.net/



[PERFORM] CPU and RAM

2005-12-21 Thread Harry Jackson
I am currently using a dual Opteron (248) single core system (RAM
PC3200) and for a change I am finding that the bottleneck is not disk
I/O but CPU/RAM (not sure which). The reason for this is that the most
frequently accessed tables/indexes are all held in RAM and when
querying the database there is almost no disk activity which is great,
most of the time. However, the database is growing and this database
is supporting an OLTP system where the retrieval of the data is an
order of magnitude more important than the insertion and general
upkeep of the data. It supports a search engine[0] and contains a
reverse index, lexicon and the actual data table (currently just under
2Gb for the three tables and associated indexes).

At the moment everything is working OK but I am noticing an almost
linear increase in time to retrieve data from the database as the data
set increases in size. Clustering knocks the access times down by 25%,
but it also knocks users off the website and can take up to 30 minutes,
which is hardly an ideal scenario. I have also considered partitioning
the tables up using extendible hashing and tries to allocate the terms
in the index to the correct table, but after some testing I noticed no
noticeable gain using this method, which surprised me a bit.
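
For reference, the clustering step mentioned above looks like this (a
sketch with illustrative index/table names; CLUSTER holds an exclusive
lock for its whole duration, which is what knocks users off):

    CLUSTER reverse_index_pkey ON reverse_index;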

The actual size of the database is not that big (4Gb) but I am
expecting this to increase to at least 20Gb over the next year or so.
This means that search times are going to jump dramatically which also
means the site becomes completely unusable. This also means that
although disk access is currently low I am eventually going to run out
of RAM and require a decent disk subsystem.

Do people have any recommendations as to what hardware would alleviate
my current CPU/RAM problem but with a mind to the future would still
be able to cope with heavy disk access. My budget is about £2300/$4000
which is not a lot of money when talking databases so suggestions of a
Sun Fire T2000 or similar systems will be treated with the utmost
disdain ;) unless you are about to give me one to keep.

--
Harry
http://www.hjackson.org
http://www.uklug.co.uk


[0] Before anyone asks: I have considered using tsearch2.



[PERFORM] MySQL is faster than PgSQL by a large margin in my program... any ideas why?

2005-12-21 Thread Madison Kelly

Hi all,

  On a user's request, I recently added MySQL support to my backup 
program which had been written for PostgreSQL exclusively until now. 
What surprises me is that MySQL is about 20%(ish) faster than PostgreSQL.


  Now, I love PostgreSQL and I want to continue recommending it as the 
database engine of choice but it is hard to ignore a performance 
difference like that.


  My program is a perl backup app that scans the content of a given 
mounted partition, 'stat's each file and then stores that data in the 
database. To maintain certain data (the backup, restore and display 
values for each file) I first read in all the data from a given table 
(one table per partition) into a hash, drop and re-create the table, 
then start (in PostgreSQL) a bulk 'COPY..' call through the 'psql' shell 
app.
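
For reference, the COPY path looks roughly like this (a sketch; the
column subset here is illustrative, not the program's actual statement):

    -- bulk load in COPY's text format: tab-delimited, backslash-escaped,
    -- no quoting around values
    COPY file_info_1 (file_name, file_size) FROM stdin;
    notes.txt	2048
    photo.jpg	1048576
    \.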


  In MySQL there is no 'COPY...' equivalent so instead I generate a 
large 'INSERT INTO file_info_X (col1, col2, ... coln) VALUES (...), 
(blah) ... (blah);'. This doesn't support automatic quoting, obviously, 
so I manually quote my values before adding the value to the INSERT 
statement. I suspect this might be part of the performance difference?


  I take the total time needed to update a partition (load old data 
into hash + scan all files and prepare COPY/INSERT + commit new data) 
and divide by the number of seconds needed to get a score I call a 
'U.Rate'. On average on my Pentium3 1GHz laptop I get a U.Rate of ~4/500. 
On MySQL though I usually get a U.Rate of ~7/800.


  If the performance difference comes from the 'COPY...' command being 
slower because of the automatic quoting can I somehow tell PostgreSQL 
that the data is pre-quoted? Could the performance difference be 
something else?


  If it would help I can provide code samples. I haven't done so yet 
because it's a little convoluted. ^_^;


  Thanks as always!

Madison



--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
  Madison Kelly (Digimer)
   TLE-BU; The Linux Experience, Back Up
Main Project Page:  http://tle-bu.org
Community Forum:http://forum.tle-bu.org
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-



Re: [PERFORM] MySQL is faster than PgSQL by a large margin in my program... any ideas why?

2005-12-21 Thread Stephen Frost
* Madison Kelly ([EMAIL PROTECTED]) wrote:
>   If the performace difference comes from the 'COPY...' command being 
> slower because of the automatic quoting can I somehow tell PostgreSQL 
> that the data is pre-quoted? Could the performance difference be 
> something else?

I doubt the issue is with the COPY command being slower than INSERTs
(I'd expect the opposite generally, actually...).  What's the table type
of the MySQL tables?  Is it MyISAM or InnoDB (I think those are the main
alternatives)?  IIRC, MyISAM doesn't do ACID and isn't transaction safe,
and has problems with data reliability (aiui, equivalent to doing 'fsync
= false' for Postgres).  InnoDB, again iirc, is transaction safe and
whatnot, and more akin to the default PostgreSQL setup.

I expect some others will comment along these lines too, if my response
isn't entirely clear. :)

Stephen




Re: [PERFORM] Speed of different procedural language

2005-12-21 Thread Michael Fuhr
On Thu, Dec 22, 2005 at 02:08:23AM +0100, Steinar H. Gunderson wrote:
> On Wed, Dec 21, 2005 at 03:10:28PM -0700, Michael Fuhr wrote:
> >> That's funny, my biggest problems with PL/PgSQL have been (among others)
> >> exactly with large result sets...
> > Out of curiosity, do you have a simple test case?  I'd be interested
> > in seeing what you're doing in PL/pgSQL that's contradicting what
> > I'm seeing.
> 
> I'm not sure if I have the code anymore (it was under 7.4 or 8.0), but it was
> largely scanning through ~2 million rows once, noting differences from the
> previous rows as it went.
> 
> In that case, I didn't benchmark against any of the other PL/* languages, but
> it was pretty clear that even on a pretty speedy Opteron, it was CPU bound,
> which it really shouldn't have been.

Try looping through two million rows with PL/Perl or PL/Tcl and
you'll probably see significantly worse performance than with
PL/pgSQL -- so much worse that I'd be surprised to see those languages
make up the difference with whatever processing they'd be doing for
each row unless it was something they're particularly good at and
PL/pgSQL is particularly bad at.

In 8.1 PL/Perl has a couple of ways to fetch query results:
spi_exec_query to fetch all the rows at once into a single data
structure, and spi_query/spi_fetchrow to fetch the rows one at a
time.  In my tests with one million rows, spi_exec_query was around
8 times slower than a loop in PL/pgSQL, not to mention requiring a
lot of memory.  spi_query/spi_fetchrow was about 25 times slower
but didn't require the amount of memory that spi_exec_query did.
A PL/Tcl function that used spi_exec was about 10 times slower than
PL/pgSQL, or only slightly slower than PL/Perl and spi_exec_query.
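
The PL/pgSQL side of that comparison is just a plain FOR loop over the
query (a minimal sketch; the table name is illustrative):

    CREATE FUNCTION scan_rows() RETURNS bigint AS $$
    DECLARE
        r record;
        n bigint := 0;
    BEGIN
        -- rows come from the executor one at a time; no big result
        -- structure is built in memory
        FOR r IN SELECT * FROM big_table LOOP
            n := n + 1;   -- per-row processing goes here
        END LOOP;
        RETURN n;
    END;
    $$ LANGUAGE plpgsql;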

If you didn't benchmark the two million row query, do you have an
example that you did benchmark?  I don't doubt that PL/Perl and
other langauges can do some things faster than PL/pgSQL, but looping
through large result sets doesn't seem to be one of them.

-- 
Michael Fuhr



Re: [PERFORM] MySQL is faster than PgSQL by a large margin in my program... any ideas why?

2005-12-21 Thread Kevin Brown
On Wednesday 21 December 2005 20:14, Stephen Frost wrote:
> * Madison Kelly ([EMAIL PROTECTED]) wrote:
> >   If the performace difference comes from the 'COPY...' command being
> > slower because of the automatic quoting can I somehow tell PostgreSQL
> > that the data is pre-quoted? Could the performance difference be
> > something else?
>
> I doubt the issue is with the COPY command being slower than INSERTs
> (I'd expect the opposite generally, actually...).  What's the table type
> of the MySQL tables?  Is it MyISAM or InnoDB (I think those are the main
> alternatives)?  IIRC, MyISAM doesn't do ACID and isn't transaction safe,
> > and has problems with data reliability (aiui, equivalent to doing 'fsync
> = false' for Postgres).  InnoDB, again iirc, is transaction safe and
> whatnot, and more akin to the default PostgreSQL setup.
>
> I expect some others will comment along these lines too, if my response
> isn't entirely clear. :)

Is fsync() on in your postgres config?  If so, that's why you're slower.  The 
default is to have it on for stability (writes are forced to disk).  It is 
quite a bit slower than just allowing the write caches to do their job, but 
more stable.  MySQL does not force writes to disk.
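
You can check the current setting from psql (fsync itself can only be
changed in postgresql.conf, followed by a server reload):

    SHOW fsync;   -- 'on' by default; forces WAL to disk at commit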




Re: [PERFORM] What's the best hardver for PostgreSQL 8.1?

2005-12-21 Thread Juan Casero
I just sent my boss an email asking him for a Sun v20z with dual 2.2 Ghz 
opterons, 2 Gigs of RAM and RAID 1.  I would have liked a better server 
capable of RAID but that seems to be out of his budget right now.  Ok so I 
assume I get this Sun box.  Most likely I will go with Linux since it is a 
fair bet he doesn't want to pay for the Solaris 10 x86 license.  Although I 
kind of like the idea of using Solaris 10 x86 for this.   I will assume I 
need to install the x64 kernel that comes with say Fedora Core 4.  Should I 
run the Postgresql 8.x binaries in 32 bit mode or 64 bit mode?   My instinct 
tells me 64 bit mode is most efficient for our database size about 20 gigs 
right now but may grow to 100 gigs in a year or so.  I just finished loading 
a 20 gig database on a dual 900 Mhz Ultrasparc III system with 2 gigs of ram 
and about 768 megs of shared memory available for the postgresql server 
running Solaris 10.  The load has smoked a P4 3.2 Ghz system I am using also 
with 2 gigs of ram running postgresql 8.0.3.   I mean I started the sparc 
load after the P4 load.  The sparc load has finished already rebuilding the 
database from a pg_dump file but the P4 system is still going.  The p4 has 
1.3 Gigs of shared memory allocated to postgresql.  How about them apples?


Thanks,
Juan

On Wednesday 21 December 2005 18:57, William Yu wrote:
> Juan Casero wrote:
> > Can you elaborate on the reasons the opteron is better than the Xeon when
> > it comes to disk io?   I have a PostgreSQL 7.4.8 box running a DSS.   One
> > of our
>
> Opterons have a 64-bit IOMMU -- Xeons don't. That means that in 64-bit
> mode, for DMA transfers to memory above 4GB, the OS must allocate a
> buffer below 4GB, DMA to that block, and then the CPU must do extra work
> copying the data above 4GB. On the Opteron, it's done by the IO adaptor
> using DMA in the background.



Re: [PERFORM] MySQL is faster than PgSQL by a large margin in

2005-12-21 Thread Luke Lonergan
What version of postgres?

Copy has been substantially improved in bizgres and also in 8.1.
- Luke
--
Sent from my BlackBerry Wireless Device


-Original Message-
From: [EMAIL PROTECTED] <[EMAIL PROTECTED]>
To: pgsql-performance@postgresql.org 
Sent: Wed Dec 21 21:03:18 2005
Subject: [PERFORM] MySQL is faster than PgSQL by a large margin in my 
program... any ideas why?

[Madison Kelly's message, quoted in full above, elided]


Re: [PERFORM] What's the best hardver for PostgreSQL 8.1?

2005-12-21 Thread Tom Arthurs




AFAIK there are no licensing costs for Solaris, unless you are talking
about a software support agreement, which is not required.

Juan Casero wrote:
> [message quoted in full above, elided]

Re: [PERFORM] Speed of different procedural language

2005-12-21 Thread Tom Lane
Michael Fuhr <[EMAIL PROTECTED]> writes:
> Try looping through two million rows with PL/Perl or PL/Tcl and
> you'll probably see significantly worse performance than with
> PL/pgSQL -- so much worse that I'd be surprised to see those languages
> make up the difference with whatever processing they'd be doing for
> each row unless it was something they're particularly good at and
> PL/pgSQL is particularly bad at.

I'd expect plpgsql to suck at purely computational tasks, compared to
the other PLs, but to win at tasks involving database access.  These
are two sides of the same coin really --- plpgsql is tightly tied to the
PG query execution engine, to the extent of using it even for simply
adding 2 and 2, but that also gives it relatively low overhead for
invoking a database query.  Perl, Tcl, et al have their own
computational engines and can easily beat the PG SQL engine for simple
arithmetic and string-pushing.  But they pay a high overhead for
calling back into the database engine.

regards, tom lane



Re: [PERFORM] CPU and RAM

2005-12-21 Thread Qingqing Zhou

On Thu, 22 Dec 2005, Harry Jackson wrote:

> I am currently using a dual Opteron (248) single core system (RAM
> PC3200) and for a change I am finding that the bottleneck is not disk
> I/O but CPU/RAM (not sure which). The reason for this is that the most
> frequently accessed tables/indexes are all held in RAM and when
> querying the database there is almost no disk activity which is great,
> most of the time.
>
> At the moment everything is working OK but I am noticing an almost
> linear increase in time to retrieve data from the database as the data
> set increases in size. Clustering knocks the access times down by 25%

Let's find out what's going on first. Can you find the most expensive
query? Also, according to what you said: (1) execution time is linear in
data set size, and (2) there is no disk IO -- so why would CLUSTER give a
25% improvement?

Regards,
Qingqing



Re: [PERFORM] What's the best hardver for PostgreSQL 8.1?

2005-12-21 Thread Jignesh K. Shah

Hi Juan,

The Solaris 10 license is free.. in fact I believe you receive the
media with the Sun Fire V20z. If you want support, there are various
"pay" plans depending on the level of support. If not, your license
allows you Right to Use anyway for free.


That said, I haven't done much testing with 32/64-bit differences.
However, for long-term purposes 64-bit always seems to be the safe bet.
As for your load performance, a lot of it depends on your file system
layout as well.


Regards,
Jignesh



Juan Casero wrote:

I just sent my boss an email asking him for a Sun v20z with dual 2.2 GHz
Opterons, 2 GB of RAM and RAID 1.  I would have liked a better server with
more RAID capability, but that seems to be out of his budget right now.  Ok,
so I assume I get this Sun box.  Most likely I will go with Linux, since it
is a fair bet he doesn't want to pay for the Solaris 10 x86 license,
although I kind of like the idea of using Solaris 10 x86 for this.  I will
assume I need to install the x64 kernel that comes with, say, Fedora Core 4.
Should I run the PostgreSQL 8.x binaries in 32-bit or 64-bit mode?  My
instinct tells me 64-bit mode is most efficient for our database size: about
20 GB right now, but it may grow to 100 GB in a year or so.

I just finished loading a 20 GB database on a dual 900 MHz UltraSPARC III
system with 2 GB of RAM and about 768 MB of shared memory available for the
PostgreSQL server, running Solaris 10.  The load has smoked a P4 3.2 GHz
system I am also using, with 2 GB of RAM, running PostgreSQL 8.0.3.  I mean,
I started the SPARC load after the P4 load, and the SPARC load has already
finished rebuilding the database from a pg_dump file while the P4 system is
still going.  The P4 has 1.3 GB of shared memory allocated to PostgreSQL.
How about them apples?
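For reference, the shared memory figures above correspond to shared_buffers
values roughly like these (a sketch; the buffer counts are back-computed
from the sizes given, assuming the default 8 kB page size, and are not taken
from the actual configs):

# postgresql.conf -- shared_buffers is counted in 8 kB buffers
shared_buffers = 98304     # 98304 * 8 kB = 768 MB (the UltraSPARC box)
shared_buffers = 170000    # ~1.3 GB (the P4 box)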



Thanks,
Juan

On Wednesday 21 December 2005 18:57, William Yu wrote:

Juan Casero wrote:

Can you elaborate on the reasons the opteron is better than the Xeon when
it comes to disk io?  I have a PostgreSQL 7.4.8 box running a DSS.  One
of our

Opterons have a 64-bit IOMMU -- Xeons don't.  That means in 64-bit mode,
for transfers to memory above 4GB, the OS must allocate the memory below
4GB, DMA to that block, and then the CPU must do extra work copying the
memory above 4GB.  On the Opteron, this is done by the I/O adapter using
DMA in the background.

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] MySQL is faster than PgSQL by a large margin in my

2005-12-21 Thread Madison Kelly

Stephen Frost wrote:

* Madison Kelly ([EMAIL PROTECTED]) wrote:

 If the performance difference comes from the 'COPY...' command being
slower because of the automatic quoting, can I somehow tell PostgreSQL
that the data is pre-quoted? Could the performance difference be
something else?



I doubt the issue is with the COPY command being slower than INSERTs
(I'd expect the opposite generally, actually...).  What's the table type
of the MySQL tables?  Is it MyISAM or InnoDB (I think those are the main
alternatives)?  IIRC, MyISAM doesn't do ACID and isn't transaction safe,
and has problems with data reliability (AIUI, equivalent to doing 'fsync
= false' for Postgres).  InnoDB, again IIRC, is transaction safe and
whatnot, and more akin to the default PostgreSQL setup.

I expect some others will comment along these lines too, if my response
isn't entirely clear. :)

Stephen


Ah, that makes a lot of sense (I read about the 'fsync' issue before, 
now that you mention it). I am not too familiar with MySQL but IIRC 
MyISAM is their open-source DB and InnoDB is their commercial one, ne? 
If so, then I am running MyISAM.


  Here is the MySQL table. The main difference from the PostgreSQL 
table is that the 'varchar(255)' columns are 'text' columns in PostgreSQL.


mysql> DESCRIBE file_info_1;
+-----------------+--------------+------+-----+---------+-------+
| Field           | Type         | Null | Key | Default | Extra |
+-----------------+--------------+------+-----+---------+-------+
| file_group_name | varchar(255) | YES  |     | NULL    |       |
| file_group_uid  | int(11)      |      |     | 0       |       |
| file_mod_time   | bigint(20)   |      |     | 0       |       |
| file_name       | varchar(255) |      |     |         |       |
| file_parent_dir | varchar(255) |      | MUL |         |       |
| file_perm       | int(11)      |      |     | 0       |       |
| file_size       | bigint(20)   |      |     | 0       |       |
| file_type       | char(1)      |      |     |         |       |
| file_user_name  | varchar(255) | YES  |     | NULL    |       |
| file_user_uid   | int(11)      |      |     | 0       |       |
| file_backup     | char(1)      |      | MUL | i       |       |
| file_display    | char(1)      |      |     | i       |       |
| file_restore    | char(1)      |      |     | i       |       |
+-----------------+--------------+------+-----+---------+-------+

  I will try turning off 'fsync' on my test box to see how much of a
performance gain I get, and whether it comes close to what I am getting
out of MySQL. If that turns out to be the case, I will be able to keep
comfortably recommending PostgreSQL from a stability point of view.
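For the record, that test is a one-line change in postgresql.conf (a sketch;
never do this on a box whose data you care about, since a crash with fsync
off can corrupt the database):

# postgresql.conf, on the disposable test box only
fsync = false        # crash-unsafe: stops forcing WAL writes to disk

# pick up the change with a reload; the data directory path is made up:
pg_ctl reload -D /var/lib/postgresql/data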


Thanks!!

Madison

--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
  Madison Kelly (Digimer)
   TLE-BU; The Linux Experience, Back Up
Main Project Page:  http://tle-bu.org
Community Forum:http://forum.tle-bu.org
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [PERFORM] MySQL is faster than PgSQL by a large margin in

2005-12-21 Thread Madison Kelly

Luke Lonergan wrote:

What version of postgres?

Copy has been substantially improved in bizgres and also in 8.1.
- Luke


Currently 7.4 (what comes with Debian Sarge). I have run my program on 
8.0 but not since I have added MySQL support. I should run the tests on 
the newer versions of both DBs (using v4.1 for MySQL which is also 
mature at this point).


As others have mentioned, though, so far the most likely explanation is
'fsync' being enabled on PostgreSQL.


Thanks for the reply!

Madison

--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
  Madison Kelly (Digimer)
   TLE-BU; The Linux Experience, Back Up
Main Project Page:  http://tle-bu.org
Community Forum:http://forum.tle-bu.org
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] MySQL is faster than PgSQL by a large margin in

2005-12-21 Thread Luke Lonergan
Madison,

On 12/21/05 10:58 PM, "Madison Kelly" <[EMAIL PROTECTED]> wrote:

> Ah, that makes a lot of sense (I read about the 'fsync' issue before,
> now that you mention it). I am not too familiar with MySQL but IIRC
> MyISAM is their open-source DB and InnoDB is their commercial one, ne?
> If so, then I am running MyISAM.

You can run either storage engine with MySQL; I expect the default is
MyISAM.

COPY performance, with or without fsync, was recently nearly doubled in
PostgreSQL.  The Bizgres version (www.bizgres.org, www.greenplum.com) is the
fastest, and PostgreSQL 8.1.1 is close, depending on how fast your disk I/O
is (as I/O speed increases, Bizgres gets faster).

fsync isn't really an "issue", and I'd suggest you not run without it!  If
you want a bit more speed, we've found that "fdatasync" as the WAL sync
method is actually a bit faster than fsync.
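The corresponding postgresql.conf line is just this (a sketch; whether
fdatasync is available, and whether it actually wins, is platform-dependent,
so measure both on your own disks):

wal_sync_method = fdatasync    # vs. fsync; benchmark before committing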

So, I'd recommend you upgrade to either bizgres or Postgres 8.1.1 to get the
maximum COPY speed.

>Here is the MySQL table. The main difference from the PostgreSQL
> table is that the 'varchar(255)' columns are 'text' columns in PostgreSQL.

Shouldn't matter.
 
> mysql> DESCRIBE file_info_1;
> +-----------------+--------------+------+-----+---------+-------+
> | Field           | Type         | Null | Key | Default | Extra |
> +-----------------+--------------+------+-----+---------+-------+
> | file_group_name | varchar(255) | YES  |     | NULL    |       |
> | file_group_uid  | int(11)      |      |     | 0       |       |
> | file_mod_time   | bigint(20)   |      |     | 0       |       |
> | file_name       | varchar(255) |      |     |         |       |
> | file_parent_dir | varchar(255) |      | MUL |         |       |
> | file_perm       | int(11)      |      |     | 0       |       |
> | file_size       | bigint(20)   |      |     | 0       |       |
> | file_type       | char(1)      |      |     |         |       |
> | file_user_name  | varchar(255) | YES  |     | NULL    |       |
> | file_user_uid   | int(11)      |      |     | 0       |       |
> | file_backup     | char(1)      |      | MUL | i       |       |
> | file_display    | char(1)      |      |     | i       |       |
> | file_restore    | char(1)      |      |     | i       |       |
> +-----------------+--------------+------+-----+---------+-------+

What's a bigint(20)?  Are you using "numeric" in Postgresql?
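For comparison, a plausible PostgreSQL rendering of that table (a sketch
built from the DESCRIBE output above, assuming bigint(20) maps to plain
bigint rather than numeric and varchar(255) to text as described earlier in
the thread; the index names are invented):

CREATE TABLE file_info_1 (
    file_group_name  text,
    file_group_uid   integer NOT NULL DEFAULT 0,
    file_mod_time    bigint  NOT NULL DEFAULT 0,
    file_name        text    NOT NULL DEFAULT '',
    file_parent_dir  text    NOT NULL DEFAULT '',
    file_perm        integer NOT NULL DEFAULT 0,
    file_size        bigint  NOT NULL DEFAULT 0,
    file_type        char(1) NOT NULL DEFAULT '',
    file_user_name   text,
    file_user_uid    integer NOT NULL DEFAULT 0,
    file_backup      char(1) NOT NULL DEFAULT 'i',
    file_display     char(1) NOT NULL DEFAULT 'i',
    file_restore     char(1) NOT NULL DEFAULT 'i'
);

-- the two MUL keys become ordinary indexes:
CREATE INDEX file_info_1_parent_dir_idx ON file_info_1 (file_parent_dir);
CREATE INDEX file_info_1_backup_idx ON file_info_1 (file_backup);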
 
> I will try turning off 'fsync' on my test box to see how much of a
> performance gain I get, and whether it comes close to what I am getting
> out of MySQL. If that turns out to be the case, I will be able to keep
> comfortably recommending PostgreSQL from a stability point of view.

Again, fsync is a small part of the performance; you will need to run
either Postgres 8.1.1 or Bizgres to get good COPY speed.

- Luke



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] MySQL is faster than PgSQL by a large margin in

2005-12-21 Thread Luke Lonergan
Madison,


On 12/21/05 11:02 PM, "Madison Kelly" <[EMAIL PROTECTED]> wrote:

> Currently 7.4 (what comes with Debian Sarge). I have run my program on
> 8.0 but not since I have added MySQL support. I should run the tests on
> the newer versions of both DBs (using v4.1 for MySQL which is also
> mature at this point).

Yes, this is *definitely* your problem.  Upgrade to Postgres 8.1.1 or
Bizgres 0_8_1 and your COPY speed could double without even changing fsync
(depending on your disk speed).  We typically get 12-14MB/s from Bizgres on
Opteron CPUs and disk subsystems that can write at least 60MB/s.  This means
you can load 100GB in 2 hours.

Note that indexes will also slow down loading.
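A minimal sketch of that loading pattern, reusing the table and the invented
index names from the earlier sketch (the file path is hypothetical, and COPY
FROM a file path reads on the server side):

-- drop indexes so COPY doesn't have to maintain them row by row
DROP INDEX file_info_1_parent_dir_idx;
DROP INDEX file_info_1_backup_idx;

COPY file_info_1 FROM '/tmp/file_info_1.data';

-- rebuild them in one pass over the loaded table
CREATE INDEX file_info_1_parent_dir_idx ON file_info_1 (file_parent_dir);
CREATE INDEX file_info_1_backup_idx ON file_info_1 (file_backup);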
 
- Luke



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster