[PERFORM] System overload / context switching / oom, 8.3

2010-02-02 Thread Rob
(system-monitor output was pasted here; only the column headers survived:
procs run/blk/new | paging in/out | disk read/writ | system int/csw | total-cpu-usage usr/sys/idl/wai/hiq/siq)

SPECS:

PostgreSQL 8.3.9 on i486-pc-linux-gnu, compiled by GCC cc (GCC) 4.1.2
20061115 (prerelease) (Debian 4.1.1-21)
Installed from the debian etch-backports package.

Linux 2.6.18-6-686-bigmem #1 SMP Thu Nov 5 17:30:05 UTC 2009 i686
GNU/Linux (Debian Etch)

8 GB RAM
4 Quad Core Intel(R) Xeon(R) CPU   E5440  @ 2.83GHz stepping 06
L1 I cache: 32K, L1 D cache: 32K,  L2 cache: 6144K

LSI Logic SAS based MegaRAID driver (battery backed/write cache enabled)
Dell PERC 6/i
# 8 SEAGATE   Model: ST973451SS Rev: SM04  (72 GB) ANSI SCSI revision: 05

RAID Configuration:
sda RAID1  2 disks (with pg_xlog WAL files on its own partition)
sdb RAID10 6 disks (pg base dir only)

POSTGRES:

261 databases
238 active databases (w/connection processes)
863 connections to those 238 databases

postgresql.conf:
max_connections = 1100
shared_buffers = 800MB
max_prepared_transactions = 0
work_mem = 32MB
maintenance_work_mem = 64MB
max_fsm_pages = 330
max_fsm_relations = 1
vacuum_cost_delay = 50ms
bgwriter_delay = 150ms
bgwriter_lru_maxpages = 250
bgwriter_lru_multiplier = 2.5
wal_buffers = 8MB
checkpoint_segments = 32
checkpoint_timeout = 5min
checkpoint_completion_target = 0.9
effective_cache_size = 5000MB
default_statistics_target = 100
log_min_duration_statement = 1000
log_checkpoints = on
log_connections = on
log_disconnections = on
log_temp_files = 0
track_counts = on
autovacuum = on
log_autovacuum_min_duration = 0

Thanks for any ideas!
Rob



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] System overload / context switching / oom, 8.3

2010-02-02 Thread Rob
Scott Marlowe wrote:
> On Tue, Feb 2, 2010 at 12:11 PM, Rob  wrote:
>   
>> postgresql.conf:
>> max_connections = 1100
>> work_mem = 32MB
>> 
>
> 32MB * 1000 = 32,000MB...  And that's if you max out connections and
> they each only do 1 sort.  If you're running many queries that run > 1
> sorts it'll happen a lot sooner.
>
> Either drop max connections or work_mem is what I'd do to start with.
> If you have one or two reporting apps that need it higher, then set it
> higher for just those connections / users

Thanks much.  So does dropping work_mem to the default of 1MB sound good?

By moving databases around we're getting max_connections below 600 or 700.
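
As a sketch of the per-user override Scott describes (the role name
"reporting" is just a placeholder):

-- keep the global default modest in postgresql.conf, then raise it only
-- for the roles or sessions that actually need big sorts:
ALTER ROLE reporting SET work_mem = '64MB';

-- or, for a single session:
SET work_mem = '64MB';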



Re: [PERFORM] System overload / context switching / oom, 8.3

2010-02-02 Thread Rob
Kevin Grittner wrote:
> Rob  wrote:
>  
>   
>> 8gb ram
>> ~240 active databases
>> 800+ db connections via tcp.
>> 
>  
> 8 GB RAM divided by 800 DB connections is 10 MB per connection.  You
> seriously need to find some way to use connection pooling.  I'm not
> sure the best way to do that with 240 active databases.
>   

By wrangling the applications, we've got the number of connections down
to 530 and the number of active databases down to 186.

The application's poor connection management exacerbates the problem.
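
A quick way to see where the connections are concentrated, which helps
decide what to pool or move (8.3's pg_stat_activity):

SELECT datname, count(*) AS connections
FROM pg_stat_activity
GROUP BY datname
ORDER BY count(*) DESC;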

Thanks for the idea,
Rob



[PERFORM] Identical Queries

2007-03-01 Thread Rob Schall
Question for anyone...

I tried posting to the bugs list, and they said this is a better question for here.
I have two queries. One runs in about 2 seconds; the other takes upwards
of 2 minutes. I have a temp table that is created with 2 columns. This
table is joined with the larger database of call detail records.
However, these 2 queries are handled very differently.

The queries:
First

calldetail=> EXPLAIN SELECT current.* FROM current JOIN anitmp ON
current.destnum=anitmp.ani AND istf=true;
  QUERY PLAN
--
 Nested Loop  (cost=0.00..2026113.09 rows=500908 width=108)
   ->  Seq Scan on anitmp  (cost=0.00..33.62 rows=945 width=8)
 Filter: (istf = true)
   ->  Index Scan using i_destnum on current  (cost=0.00..2137.36
rows=531 width=108)
 Index Cond: (current.destnum = "outer".ani)
(5 rows)

Second
calldetail=> EXPLAIN SELECT current.* FROM current JOIN anitmp ON
current.orignum=anitmp.ani AND istf=false;
QUERY PLAN
---
 Hash Join  (cost=35.99..3402035.53 rows=5381529 width=108)
   Hash Cond: ("outer".orignum = "inner".ani)
   ->  Seq Scan on current  (cost=0.00..907191.05 rows=10170805 width=108)
   ->  Hash  (cost=33.62..33.62 rows=945 width=8)
 ->  Seq Scan on anitmp  (cost=0.00..33.62 rows=945 width=8)
   Filter: (istf = false)
(6 rows)


The tables:
   Table "public.current"
  Column  |Type | Modifiers
--+-+---
 datetime | timestamp without time zone |
 orignum  | bigint  |
 destnum  | bigint  |
 billto   | bigint  |
 cost | numeric(6,4)|
 duration | numeric(8,1)|
 origcity | character(12)   |
 destcity | character(12)   |
 file | character varying(30)   |
 linenum  | integer |
 carrier  | character(1)|
Indexes:
"i_destnum" btree (destnum)
"i_orignum" btree (orignum)


Table "public.anitmp"
 Column |  Type   | Modifiers
+-+---
 ani| bigint  |
 istf   | boolean |


I was also asked to post the EXPLAIN ANALYZE for both:

calldetail=> EXPLAIN ANALYZE SELECT current.* FROM anitmp JOIN current ON 
istf=false AND current.orignum=anitmp.ani;
QUERY PLAN
---
 Hash Join  (cost=35.99..3427123.39 rows=5421215 width=108) (actual 
time=1994.164..157443.544 rows=157 loops=1)
   Hash Cond: ("outer".orignum = "inner".ani)
   ->  Seq Scan on current  (cost=0.00..913881.09 rows=10245809 width=108) 
(actual time=710.986..137963.320 rows=10893541 loops=1)
   ->  Hash  (cost=33.62..33.62 rows=945 width=8) (actual time=10.948..10.948 
rows=0 loops=1)
 ->  Seq Scan on anitmp  (cost=0.00..33.62 rows=945 width=8) (actual 
time=10.934..10.939 rows=2 loops=1)
   Filter: (istf = false)
 Total runtime: 157443.900 ms
(7 rows)

calldetail=> EXPLAIN ANALYZE SELECT current.* FROM current JOIN anitmp ON 
current.destnum=anitmp.ani AND istf=true;
 QUERY PLAN
-
 Nested Loop  (cost=0.00..2037526.69 rows=504602 width=108) (actual 
time=88.752..1050.295 rows=1445 loops=1)
   ->  Seq Scan on anitmp  (cost=0.00..33.62 rows=945 width=8) (actual 
time=8.189..8.202 rows=2 loops=1)
 Filter: (istf = true)
   ->  Index Scan using i_destnum on current  (cost=0.00..2149.40 rows=534 
width=108) (actual time=62.365..517.454 rows=722 loops=2)
 Index Cond: (current.destnum = "outer".ani)
 Total runtime: 1052.862 ms
(6 rows)


Anyone have any ideas for me? I have indexes on each of the necessary
columns.

Rob



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Identical Queries

2007-03-01 Thread Rob Schall
There are 4 entries (wanted to make the playing field level for this
test). There are 2 with true for istf and 2 with false.

Rob


Stephan Szabo wrote:
> On Thu, 1 Mar 2007, Rob Schall wrote:
>
>   
>> Question for anyone...
>>
>> I tried posting to the bugs, and they said this is a better question for 
>> here.
>> I have two queries. One runs in about 2 seconds. The other takes upwards
>> of 2 minutes. I have a temp table that is created with 2 columns. This
>> table is joined with the larger database of call detail records.
>> However, these 2 queries are handled very differently.
>> 
>
> How many rows are there in anitmp and how many rows in anitmp have
> istf=true and how many have istf=false? If you don't currently analyze the
> temp table after adding the rows, you might find that doing an analyze
> helps, or at least makes the row estimates better.
>   
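
For reference, a minimal sketch of the ANALYZE step Stephan suggests, run
in the same session that fills the temp table (names taken from the thread):

-- after populating anitmp:
ANALYZE anitmp;

-- then re-check the planner's row estimates:
EXPLAIN SELECT current.* FROM current JOIN anitmp
    ON current.orignum = anitmp.ani AND istf = false;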


---(end of broadcast)---
TIP 6: explain analyze is your friend


[PERFORM] Building multiple indexes concurrently

2010-03-16 Thread Rob Wultsch
Let's say I have a large table bigTable to which I would like to add
two btree indexes. Is there a more efficient way to create the indexes
than:
CREATE INDEX idx_foo on bigTable (foo);
CREATE INDEX idx_baz on bigTable (baz);
Or
CREATE INDEX CONCURRENTLY idx_foo on bigTable (foo);
CREATE INDEX CONCURRENTLY idx_baz on bigTable (baz);

Are there any particular performance optimizations that would be in
play in such a scenario?

At a minimum I assume that if both of the commands were started at
about the same time they would each scan the table in the same
direction and whichever creation was slower would benefit from most of
the table data it needed being prepopulated in shared buffers. Is this
the case?

-- 
Rob Wultsch
wult...@gmail.com

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Building multiple indexes concurrently

2010-03-17 Thread Rob Wultsch
On Wed, Mar 17, 2010 at 7:30 AM, Tom Lane  wrote:
> Greg Smith  writes:
>> Rob Wultsch wrote:
>>> At a minimum I assume that if both of the commands were started at
>>> about the same time they would each scan the table in the same
>>> direction and whichever creation was slower would benefit from most of
>>> the table data it needed being prepopulated in shared buffers. Is this
>>> the case?
>
>> This might be optimistic;
>
> No, it's not optimistic in the least, at least not since we implemented
> synchronized seqscans (in 8.3 or thereabouts).
>
>                        regards, tom lane
>

Where can I find details about this in the documentation?
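
For what it's worth, the behavior is exposed as a GUC (present since 8.3),
so it can be checked directly:

SHOW synchronize_seqscans;          -- on by default
-- SET synchronize_seqscans = off;  -- only to get the old, unsynchronized behavior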

-- 
Rob Wultsch
wult...@gmail.com

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Random Page Cost and Planner

2010-05-25 Thread Rob Wultsch
On Tue, May 25, 2010 at 4:26 PM, David Jarvis  wrote:
> shared_buffers = 1GB
> temp_buffers = 32MB
> work_mem = 32MB
> maintenance_work_mem = 64MB
> effective_cache_size = 256MB

Shouldn't effective_cache_size be significantly larger?
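
Since effective_cache_size is only a planner hint, it can be tried per
session before editing postgresql.conf; the value below is purely
illustrative:

SET effective_cache_size = '6GB';   -- roughly shared_buffers plus the OS cache reported by free(1)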

-- 
Rob Wultsch
wult...@gmail.com

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] performance of temporary vs. regular tables

2010-05-28 Thread Rob Wultsch
On Fri, May 28, 2010 at 4:04 AM, Joachim Worringen
 wrote:
> On 05/26/2010 06:03 PM, Joachim Worringen wrote:
>>
>> Am 25.05.2010 12:41, schrieb Andres Freund:
>>>
>>> On Tuesday 25 May 2010 11:00:24 Joachim Worringen wrote:
>>>>
>>>> Thanks. So, the Write-Ahead-Logging (being used or not) does not matter?
>>>
>>> It does matter quite significantly in my experience. Both from an io
>>> and a cpu
>>> overhead perspective.
>>
>> O.k., looks as if I have to make my own experience... I'll let you know
>> if possible.
>
> As promised, I did a tiny benchmark - basically, 8 empty tables are filled
> with 100k rows each within 8 transactions (somewhat typically for my
> application). The test machine has 4 cores, 64G RAM and RAID1 10k drives for
> data.
>
> # INSERTs into a TEMPORARY table:
> [joac...@testsrv scaling]$ time pb query -d scaling_qry_1.xml
>
> real    3m18.242s
> user    1m59.074s
> sys     1m51.001s
>
> # INSERTs into a standard table:
> [joac...@testsrv scaling]$ time pb query -d scaling_qry_1.xml
>
> real    3m35.090s
> user    2m5.295s
> sys     2m2.307s
>
> Thus, there is a slight hit of about 10% (which may even be within
> measurement variation) - your mileage will vary.
>
>  Joachim
>
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>
I think it would be interesting to create a RAM disk and insert into
it. In the MySQL community, even though MyISAM has fallen out of use,
the Memory table (based on MyISAM) is still somewhat used.
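
A self-contained SQL sketch of the kind of comparison Joachim describes
(row count and table shape are arbitrary; time each block, e.g. with
\timing in psql):

-- temporary table: inserted data is not WAL-logged
CREATE TEMPORARY TABLE scratch_tmp (id int, payload text);
INSERT INTO scratch_tmp SELECT g, md5(g::text) FROM generate_series(1, 100000) g;

-- regular table: same load, WAL-logged
CREATE TABLE scratch_reg (id int, payload text);
INSERT INTO scratch_reg SELECT g, md5(g::text) FROM generate_series(1, 100000) g;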


-- 
Rob Wultsch
wult...@gmail.com

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] PostgreSQL as a local in-memory cache

2010-06-24 Thread Rob Wultsch
On Fri, Jun 18, 2010 at 1:55 PM, Josh Berkus  wrote:
>
>> It must be a setting, not a version.
>>
>> For instance suppose you have a session table for your website and a
>> users table.
>>
>> - Having ACID on the users table is of course a must ;
>> - for the sessions table you can drop the "D"
>
> You're trying to solve a different use-case than the one I am.
>
> Your use-case will be solved by global temporary tables.  I suggest that
> you give Robert Haas some help & feedback on that.
>
> My use case is people using PostgreSQL as a cache, or relying entirely
> on replication for durability.
>
> --
>                                  -- Josh Berkus
>                                     PostgreSQL Experts Inc.
>                                     http://www.pgexperts.com
>


Is he? Wouldn't a global temporary table have content that is not
visible between db connections? A db session may not be the same as a
user session.

-- 
Rob Wultsch
wult...@gmail.com

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] performance on new linux box

2010-07-07 Thread Rob Wultsch
On Wed, Jul 7, 2010 at 4:06 PM, Ryan Wexler  wrote:
> Postgresql was previously running on a single cpu linux machine with 2 gigs
> of memory and a single sata drive (v8.3).  Basically a desktop with linux on
> it.  I experienced slow performance.
>
> So, I finally moved it to a real server.  A dually zeon centos machine with
> 6 gigs of memory and raid 10, postgres 8.4.  But, I am now experiencing even
> worse performance issues.
>
> My system is consistently highly transactional.  However, there is also
> regular complex queries and occasional bulk loads.
>
> On the new system the bulk loads are much slower than on the previous
> machine and so are the more complex queries.  The smaller transactional
> queries seem comparable but i had expected an improvement.  Performing a db
> import via psql -d databas -f dbfile illustrates this problem.  It takes 5
> hours to run this import.  By contrast, if I perform this same exact import
> on my crappy windows box with only 2 gigs of memory and default postgres
> settings it takes 1 hour.  Same deal with the old linux machine.  How is
> this possible?
>
> Here are some of my key config settings:
> max_connections = 100
> shared_buffers = 768MB
> effective_cache_size = 2560MB
> work_mem = 16MB
> maintenance_work_mem = 128MB
> checkpoint_segments = 7
> checkpoint_timeout = 7min
> checkpoint_completion_target = 0.5
>
> I have tried varying the shared_buffers size from 128 all the way to 1500mbs
> and got basically the same result.   Is there a setting change I should be
> considering?
>
> Does 8.4 have performance problems or is this unique to me?
>
> thanks
>
>

I think the most likely explanation is that the crappy box lied about
fsync'ing data and your server is not. Did you purchase a raid card
with a bbu? If so, can you set the write cache policy to write-back?

-- 
Rob Wultsch
wult...@gmail.com

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] now() gives same time within the session

2010-07-12 Thread Rob Wultsch
On Mon, Jul 12, 2010 at 4:15 AM, A. Kretschmer
 wrote:
> In response to atul.g...@globaldatapoint.com :
>> Hi,
>>
>>
>>
>> I need to log the start and end time of the procedures in a table. But the
>> start and end time are same. This is how I recreated the issue.
>>
>>
>>
>> create table test_time (time timestamp);
>>
>> delete from  test_time;
>>
>> insert into test_time select now();
>
>
> Use timeofday() instead, now() returns the transaction starting time.


Is this part of the SQL standard?
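
For illustration, a small session showing the behavior Andreas describes
(clock_timestamp() is the timestamp-returning alternative to timeofday()):

BEGIN;
SELECT now() AS txn_start, clock_timestamp() AS wall_clock;
SELECT pg_sleep(2);
SELECT now() AS txn_start, clock_timestamp() AS wall_clock;
-- now() is unchanged within the transaction; clock_timestamp() has advanced
COMMIT;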

-- 
Rob Wultsch
wult...@gmail.com

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Using more tha one index per table

2010-07-21 Thread Rob Wultsch
On Wed, Jul 21, 2010 at 12:53 AM, A. Kretschmer <
andreas.kretsch...@schollglas.com> wrote:

> In response to Elias Ghanem :
> > Hi,
> > I have a question concerning the uses of indexes in Postgresql.
> > I read that in PG a query can not use more than one index per table: "a
> query or
> > data manipulation command can use at most one index per table".
>
> That's not true, but it's true for MySQL, afaik.
>
>
> Andreas
> --
> Andreas Kretschmer
> Kontakt:  Heynitz: 035242/47150,   D1: 0160/7141639 (mehr: -> Header)
> GnuPG: 0x31720C99, 1006 CCB4 A326 1D42 6431  2EB0 389D 1DC2 3172 0C99
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>

That is not true either, though MySQL is less good at using bitmap'ed
index combinations. MySQL 5.0 can use "index merge".
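
A tiny example of PostgreSQL combining two indexes on one table (table
and index names are made up):

CREATE TABLE t (a int, b int);
CREATE INDEX t_a_idx ON t (a);
CREATE INDEX t_b_idx ON t (b);
-- after loading and ANALYZEing t, EXPLAIN on a query like this can show
-- a BitmapAnd over both indexes:
EXPLAIN SELECT * FROM t WHERE a = 1 AND b = 2;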

-- 
Rob Wultsch
wult...@gmail.com


Re: [PERFORM] Using more tha one index per table

2010-07-22 Thread Rob Wultsch
On Thu, Jul 22, 2010 at 1:35 AM, Richard Huxton  wrote:

> On 22/07/10 03:27, Greg Smith wrote:
>
>> Steve Atkins wrote:
>>
>>> If http://postgresql.org/docs/9.0/* were to 302 redirect to
>>> http://postgresql.org/docs/current/* while 9.0 is the current release
>>> (and similarly for 9.1 and so on) I suspect we'd find many more links
>>> to current and fewer links to specific versions after a year or two.
>>>
>>
>> True, but this would leave people with no way to bookmark a permanent
>> link to whatever is the current version, which will represent a
>> regression for how some people want the site to work.
>>
>
> Having a quick look at the website, a simple change might be to have a
> large "CURRENT MANUALS" link above all the versioned links. That should help
> substantially.
>
> --
>  Richard Huxton
>  Archonet Ltd
>
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>

I suggested a few weeks ago adding a drop-down menu for other versions of the
manual on each page. I have not had time to write a patch, but I think it is
something that MySQL does better than pg.

As an example take a look at the page on select for MySQL:
http://dev.mysql.com/doc/refman/5.1/en/select.html .

If you want an earlier or later version, they are easily accessible via a link
on the left.


-- 
Rob Wultsch
wult...@gmail.com


Re: [PERFORM] BBU Cache vs. spindles

2010-10-22 Thread Rob Wultsch
On Fri, Oct 22, 2010 at 8:37 AM, Greg Smith  wrote:
> Tom Lane wrote:
>>
>> You've got entirely too simplistic a view of what the "delta" might be,
>> I fear.  In particular there are various sorts of changes that involve
>> inserting the data carried in the WAL record and shifting pre-existing
>> data around to make room, or removing an item and moving remaining data
>> around.  If you try to replay that type of action against a torn page,
>> you'll get corrupted results.
>>
>
> I wasn't sure exactly how those were encoded, thanks for the clarification.
>  Given that, it seems to me there are only two situations where
> full_page_writes is safe to turn off:
>
> 1) The operating system block size is exactly the same database block size,
> and all writes are guaranteed to be atomic to that block size.
> 2) You're using a form of journaled filesystem where data blocks are never
> updated, they're always written elsewhere and the filesystem is redirected
> to that new block once it's on disk.
>
> Looks to me like whether or not there's a non-volatile write cache sitting
> in the middle, like a BBU protected RAID card, doesn't really make any
> difference here then.
>
> I think that most people who have thought they were safe to turn off
> full_page_writes in the past did so because they believed they were in
> category (1) here.  I've never advised anyone to do that, because it's so
> difficult to validate the truth of.  Just given that, I'd be tempted to join
> in on suggesting this parameter just go away in the name of safety, except
> that I think category (2) here is growing now.  ZFS is the most obvious
> example where the atomic write implementation seems to always make disabling
> full_page_writes safe.
>

For the sake of argument, has PG considered using a double write
buffer similar to InnoDB's?


-- 
Rob Wultsch
wult...@gmail.com

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] BBU Cache vs. spindles

2010-10-22 Thread Rob Wultsch
On Fri, Oct 22, 2010 at 10:28 AM, Kevin Grittner
 wrote:
> Rob Wultsch  wrote:
>
>> has PG considered using a double write buffer similar to InnodB?
>
> That seems inferior to the full_page_writes strategy, where you only
> write a page twice the first time it is written after a checkpoint.
> We're talking about when we might be able to write *less*, not more.
>
> -Kevin
>

By "write" do you mean number of writes, or the number of bytes of the
writes? For number of writes, yes a double write buffer will lose. In
terms of number of bytes, I would think full_page_writes=off + double
write buffer should be far superior, particularly given that the WAL
is shipped over the network to slaves.

-- 
Rob Wultsch
wult...@gmail.com

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] BBU Cache vs. spindles

2010-10-22 Thread Rob Wultsch
On Fri, Oct 22, 2010 at 12:05 PM, Kevin Grittner
 wrote:
> Rob Wultsch  wrote:
>
>> I would think full_page_writes=off + double write buffer should be
>> far superior, particularly given that the WAL is shipped over the
>> network to slaves.
>
> For a reasonably brief description of InnoDB double write buffers, I
> found this:
>
> http://www.mysqlperformanceblog.com/2006/08/04/innodb-double-write/
>
> One big question before even considering this would by how to
> determine whether a potentially torn page "is inconsistent".
> Without a page CRC or some such mechanism, I don't see how this
> technique is possible.
>
> Even if it's possible, it's far from clear to me that it would be an
> improvement.  The author estimates (apparently somewhat loosely)
> that it's a 5% to 10% performance hit in InnoDB; I'm far from
> certain that full_page_writes cost us that much.  Does anyone have
> benchmark numbers handy?
>
> -Kevin
>

Ignoring (briefly) the performance cost of the different systems, not
needing full_page_writes would make geographically dispersed replication
possible in certain cases where it currently is not (or is at least
rather painful).
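
A rough way to get the kind of numbers Kevin asks about, on 8.3/9.0
(function name as in those releases; the difference between the two
locations is the WAL generated by the workload):

CHECKPOINT;
SELECT pg_current_xlog_location();   -- note the value
-- ... run a representative write workload ...
SELECT pg_current_xlog_location();   -- note again and subtract;
                                     -- repeat with full_page_writes = off to compare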

-- 
Rob Wultsch
wult...@gmail.com

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] BBU Cache vs. spindles

2010-10-22 Thread Rob Wultsch
On Fri, Oct 22, 2010 at 1:15 PM, Kevin Grittner
 wrote:
> Rob Wultsch  wrote:
>
>> not needing full_page_writes would make geographically dispersed
>> replication possible for certain cases where it is not currently
>> (or at least rather painful).
>
> Do you have any hard numbers on WAL file size impact?  How much does
> pglesslog help in a file-based WAL transmission environment?  Should
> we be considering similar filtering for streaming replication?
>
> -Kevin
>

No, I am a DBA who mostly works on MySQL. I have had to deal with
(handwaving...) tangential issues recently. I really would like to
work with PG more, and this seems like it would be a significant
hindrance for certain usage patterns. Lots of replication does not
take place over gig...


-- 
Rob Wultsch
wult...@gmail.com

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] BBU Cache vs. spindles

2010-10-26 Thread Rob Wultsch
On Tue, Oct 26, 2010 at 5:41 AM, Robert Haas  wrote:
> On Fri, Oct 22, 2010 at 3:05 PM, Kevin Grittner
>  wrote:
>> Rob Wultsch  wrote:
>>
>>> I would think full_page_writes=off + double write buffer should be
>>> far superior, particularly given that the WAL is shipped over the
>>> network to slaves.
>>
>> For a reasonably brief description of InnoDB double write buffers, I
>> found this:
>>
>> http://www.mysqlperformanceblog.com/2006/08/04/innodb-double-write/
>>
>> One big question before even considering this would by how to
>> determine whether a potentially torn page "is inconsistent".
>> Without a page CRC or some such mechanism, I don't see how this
>> technique is possible.
>
> There are two sides to this problem: figuring out when to write a page
> to the double write buffer, and figuring out when to read it back from
> the double write buffer.  The first seems easy: we just do it whenever
> we would XLOG a full page image.  As to the second, when we write the
> page out to the double write buffer, we could also write to the double
> write buffer the LSN of the WAL record which depends on that full page
> image.  Then, at the start of recovery, we scan the double write
> buffer and remember all those LSNs.  When we reach one of them, we
> replay the full page image.
>
> The good thing about this is that it would reduce WAL volume; the bad
> thing about it is that it would probably mean doing two fsyncs where
> we only now do one.
>

The double write buffer is one of the few areas where InnoDB does more
IO (in the form of fsyncs) than PG. InnoDB also has fuzzy
checkpoints (which help to keep dirty pages in memory longer),
buffering of writes to secondary indexes, and, recently, tunable
page-level compression.

Given that InnoDB is not shipping its logs across the wire, I don't
think many users would really care if it used the double writer or
full page writes approach to the redo log (other than the fact that
the log files would be bigger). PG on the other hand *is* pushing its
logs over the wire...

-- 
Rob Wultsch
wult...@gmail.com

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] BBU Cache vs. spindles

2010-10-26 Thread Rob Wultsch
On Tue, Oct 26, 2010 at 7:25 AM, Robert Haas  wrote:
> On Tue, Oct 26, 2010 at 10:13 AM, Rob Wultsch  wrote:
>> The double write buffer is one of the few areas where InnoDB does more
>> IO (in the form of fsynch's) than PG. InnoDB also has fuzzy
>> checkpoints (which help to keep dirty pages in memory longer),
>> buffering of writing out changes to secondary indexes, and recently
>> tunable page level compression.
>
> Baron Schwartz was talking to me about this at Surge.  I don't really
> understand how the fuzzy checkpoint stuff works, and I haven't been
> able to find a good description of it anywhere.  How does it keep
> dirty pages in memory longer?  Details on the other things you mention
> would be interesting to hear, too.

For checkpoint behavior:
http://books.google.com/books?id=S_yHERPRZScC&pg=PA606&lpg=PA606&dq=fuzzy+checkpoint&source=bl&ots=JJrzRUKBGh&sig=UOMPsRy5E-YDgjAFkaSVn3dps_M&hl=en&ei=_k8yTOfeHYzZnAepyumLBA&sa=X&oi=book_result&ct=result&resnum=8&ved=0CEYQ6AEwBw#v=onepage&q=fuzzy%20checkpoint&f=false

I would think that, in the best case, "sharp" checkpoints with a large
checkpoint_completion_target would behave similarly to a fuzzy
checkpoint.

Insert buffering (for InnoDB 1.1+ there is evidently also delete and
purge buffering):
http://dev.mysql.com/doc/refman/5.5/en/innodb-insert-buffering.html

For a recent ~800GB db I had to restore, the insert buffer saved 92%
of io needed for secondary indexes.

Compression:
http://dev.mysql.com/doc/innodb-plugin/1.0/en/innodb-compression-internals.html

For many workloads 50% compression results in a negligible impact on
performance. For certain workloads compression can help performance.
Please note that InnoDB also has a non-tunable TOAST-like feature.


>> Given that InnoDB is not shipping its logs across the wire, I don't
>> think many users would really care if it used the double writer or
>> full page writes approach to the redo log (other than the fact that
>> the log files would be bigger). PG on the other hand *is* pushing its
>> logs over the wire...
>
> So how is InnoDB doing replication?  Is there a second log just for that?
>

The other log is the "binary log" and it is one of the biggest
problems with MySQL. Running MySQL in such a way that the binary log
stays in sync with the InnoDB redo has a very significant impact on
performance.
http://www.mysqlperformanceblog.com/2010/10/23/mysql-limitations-part-2-the-binary-log/
http://mysqlha.blogspot.com/2010/09/mysql-versus-mongodb-update-performance.html
(check out the pretty graph)

If you are going to West you should considering heading over to the
Facebook office on Tuesday as the MySQL team is having something of an
open house:
http://www.facebook.com/event.php?eid=160712450628622

Mark Callaghan from the Facebook MySQL engineering team (and several
members of their ops team, for that matter) understands InnoDB
dramatically better than I do.

-- 
Rob Wultsch
wult...@gmail.com

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] BBU Cache vs. spindles

2010-10-27 Thread Rob Wultsch
On Wed, Oct 27, 2010 at 6:55 PM, Robert Haas  wrote:
> On Wed, Oct 27, 2010 at 12:41 AM, Rob Wultsch  wrote:
>> On Tue, Oct 26, 2010 at 7:25 AM, Robert Haas  wrote:
>>> On Tue, Oct 26, 2010 at 10:13 AM, Rob Wultsch  wrote:
>>>> The double write buffer is one of the few areas where InnoDB does more
>>>> IO (in the form of fsynch's) than PG. InnoDB also has fuzzy
>>>> checkpoints (which help to keep dirty pages in memory longer),
>>>> buffering of writing out changes to secondary indexes, and recently
>>>> tunable page level compression.
>>>
>>> Baron Schwartz was talking to me about this at Surge.  I don't really
>>> understand how the fuzzy checkpoint stuff works, and I haven't been
>>> able to find a good description of it anywhere.  How does it keep
>>> dirty pages in memory longer?  Details on the other things you mention
>>> would be interesting to hear, too.
>>
>> For checkpoint behavior:
>> http://books.google.com/books?id=S_yHERPRZScC&pg=PA606&lpg=PA606&dq=fuzzy+checkpoint&source=bl&ots=JJrzRUKBGh&sig=UOMPsRy5E-YDgjAFkaSVn3dps_M&hl=en&ei=_k8yTOfeHYzZnAepyumLBA&sa=X&oi=book_result&ct=result&resnum=8&ved=0CEYQ6AEwBw#v=onepage&q=fuzzy%20checkpoint&f=false
>>
>> I would think that best case behavior "sharp" checkpoints with a large
>> checkpoint_completion_target would have behavior similar to a fuzzy
>> checkpoint.
>
> Well, under that definition of a fuzzy checkpoint, our checkpoints are
> fuzzy even with checkpoint_completion_target=0.
>
> What Baron seemed to be describing was a scheme whereby you could do
> what I might call partial checkpoints.  IOW, you want to move the redo
> pointer without writing out ALL the dirty buffers in memory, so you
> write out the pages with the oldest LSNs and then move the redo
> pointer to the oldest LSN you have left.  Except that doesn't quite
> work, because the page might have been dirtied at LSN X and then later
> updated again at LSN Y, and you still have to flush it to disk before
> moving the redo pointer to any value >X.  So you work around that by
> maintaining a "first dirtied" LSN for each page as well as the current
> LSN.
>
> I'm not 100% sure that this is how it works or that it would work in
> PG, but even assuming that it is and does, I'm not sure what the
> benefit is over the checkpoint-spreading logic we have now.  There
> might be some benefit in sorting the writes that we do, so that we can
> spread out the fsyncs.  So, write all the blocks to a give file,
> fsync, and then repeat for each underlying data file that has at least
> one dirty block.  But that's completely orthogonal to (and would
> actually be hindered by) the approach described in the preceding
> paragraph.

I wish I could answer your questions better. I am a power user who
does not fully understand InnoDB internals. There are not all that
many folks who have a very good understanding of InnoDB internals
(given how well it works, there is not all that much need).

>
>> Insert (for innodb 1.1+ evidently there is also does delete and purge)
>> buffering:
>> http://dev.mysql.com/doc/refman/5.5/en/innodb-insert-buffering.html
>
> We do something a bit like this for GIST indices.  It would be
> interesting to see if it also has a benefit for btree indices.
>
>> For a recent ~800GB db I had to restore, the insert buffer saved 92%
>> of io needed for secondary indexes.
>>
>> Compression:
>> http://dev.mysql.com/doc/innodb-plugin/1.0/en/innodb-compression-internals.html
>>
>> For many workloads 50% compression results in negligible impact to
>> performance. For certain workloads compression can help performance.
>> Please note that InnoDB also has non-tunable toast like feature.
>
> Interesting.  I am surprised this works well.  It seems that this only
> works for pages that can be compressed by >=50%, which seems like it
> could result in a lot of CPU wasted on failed attempts to compress.

In my world, the spinning disk is almost always the bottleneck.
Trading CPU for IO is almost always a good deal for me.

>
>>>> Given that InnoDB is not shipping its logs across the wire, I don't
>>>> think many users would really care if it used the double writer or
>>>> full page writes approach to the redo log (other than the fact that
>>>> the log files would be bigger). PG on the other hand *is* pushing its
>>>> logs over the wire...
>>>
>>> So how is InnoDB doing replication?  Is there a second log just for that?
>>>
>>
>> Th

[PERFORM] Group commit and commit delay/siblings

2010-12-05 Thread Rob Wultsch
Manual: 
http://www.postgresql.org/docs/9.0/static/runtime-config-wal.html#RUNTIME-CONFIG-WAL-SETTINGS
Recent discussion:
http://www.facebook.com/notes/mysql-at-facebook/group-commit-in-postgresql/465781235932

It is my understanding that group commit in PG works without
commit_delay or commit_siblings being enabled. For many people coming
from other databases, the existence of these GUCs seems to suggest that
group commit does not work without them being enabled.

Are these settings useful, and if so, how should they be tuned?
If they are generally not useful, should they be removed?
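
For reference, the knobs in question, with their shipped defaults (the
SET line just shows they can be changed per session for testing):

SHOW commit_delay;      -- 0 (microseconds)
SHOW commit_siblings;   -- 5
-- SET commit_delay = 10;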

-- 
Rob Wultsch
wult...@gmail.com

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Group commit and commit delay/siblings

2010-12-05 Thread Rob Wultsch
On Sun, Dec 5, 2010 at 7:30 PM, Jignesh Shah  wrote:
> The commit_siblings = 5 basically checks that it sleeps only when that
> many backends are active. This I think is a very expensive check and I
> would rather make commit_siblings=0 (which the current code does not
> support.. it only supports minimum of 1) The check is expensive
> irrespective of the settings .. But anyway here is the real kicker.
> In all the tests I did with recent verions 8.4 and version 9.0 , it
> seems that the default behavior handles the load well enough and one
> does not have to use commit_delay at all. Since when the load is very
> high all of them are basically in sync phase and the desired thing
> happens anyway.
>
> Infact using commit_delay will actually add the cost of doing
> commit_siblings check and can hurt the performance by increasing CPU
> consumption.. Doing commit_siblings check for every transaction is a
> killer since it does not return after meeting the minimum backends and
> goes through every backend to calculate the total number before
> comparing with the minimum. This is probably why most people see a
> drop in performance when using commit_delay compared to the default.
>
> Anyway, I would recommend right now sticking with the default and
> not really using it. It does the sync absorption well if you have too
> many users (though not perfect).

Sounds like this setting should go away unless there is a very good
reason to keep it.


-- 
Rob Wultsch
wult...@gmail.com

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Compared MS SQL 2000 to Postgresql 9.0 on Windows

2010-12-17 Thread Rob Wultsch
On Fri, Dec 17, 2010 at 10:08 AM, Tom Polak
 wrote:
> What kind of performance can I expect out of Postgres compare to MSSQL?

You should take any generalizations with a grain of salt. I suggest
that you do a POC.

> Let's assume that Postgres is running on Cent OS x64 and MSSQL is running
> on Windows 2008 x64, both are on identical hardware running RAID 5 (for
> data redundancy/security), SAS drives 15k RPM, dual XEON Quad core CPUs,
> 24 GB of RAM.

RAID-5 = suckage for databases.

Things to think about:
How big is your data set and how big is your working set?
Do you have a raid card? Is it properly configured?
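
One way to answer the size question from inside PostgreSQL (functions
available since 8.1):

SELECT datname,
       pg_size_pretty(pg_database_size(datname)) AS size
FROM pg_database
ORDER BY pg_database_size(datname) DESC;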


-- 
Rob Wultsch
wult...@gmail.com

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Why we don't want hints

2011-02-13 Thread Rob Wultsch
On Thu, Feb 10, 2011 at 9:25 AM, Chris Browne  wrote:
> robertmh...@gmail.com (Robert Haas) writes:
>> On Thu, Feb 10, 2011 at 11:45 AM, Kevin Grittner
>>  wrote:
>>> Well, I'm comfortable digging in my heels against doing *lame* hints
>>> just because "it's what all the other kids are doing," which I think
>>> is the only thing which would have satisfied the OP on this thread.
>>> From both on-list posts and ones exchanged off-list with me, it
>>> seems he was stubbornly resistant to properly tuning the server to
>>> see if any problems remained, or posting particular problems to see
>>> how they would be most effectively handled in PostgreSQL.  We
>>> obviously can't be drawn into dumb approaches because of
>>> ill-informed demands like that.
>>
>> Nor was I proposing any such thing.  But that doesn't make "we don't
>> want hints" an accurate statement.  Despite the impression that OP
>> went away with, the real situation is a lot more nuanced than that,
>> and the statement on the Todo list gives the wrong impression, IMHO.
>
> I have added the following comment to the ToDo:
>
>   We are not interested to implement hints in ways they are commonly
>   implemented on other databases, and proposals based on "because
>   they've got them" will not be welcomed.  If you have an idea that
>   avoids the problems that have been observed with other hint systems,
>   that could lead to valuable discussion.
>
> That seems to me to characterize the nuance.


Where exactly are the problems with other systems noted? Most other
systems have this option, so saying "they have problems" is a giant
cop-out.


-- 
Rob Wultsch
wult...@gmail.com

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] commit so slow program looks frozen

2006-10-31 Thread Rob Lemley
Merlin Moncure wrote:
> On 10/28/06, Simon Riggs <[EMAIL PROTECTED]> wrote:
>> On Thu, 2006-10-26 at 11:06 -0400, Merlin Moncure wrote:
>> > On 10/26/06, Carlo Stonebanks <[EMAIL PROTECTED]> wrote:
>> > > This is pretty interesting - where can I read more on this?
>> Windows isn't
>> > > actually hanging, one single command line window is - from its
>> behaviour, it
>> > > looks like the TCL postgresql package is waiting for pg_exec to
>> come back
>> > > from the commit (I believe the commit has actually gone through).
>> > >
>> > > It could even be that there's something wrong with the TCL
>> package, but from
>> > > my understanding it is one of the most complete interfaces out
>> there - which
>> > > is weird, because TCL seems to be the most unpopular language in the
>> > > community.
>> >
>> > when it happens, make sure to query pg_locks and see what is going on
>> > there lock issues are not supposed to manifest on a commit, which
>> > releases locks, but you never know.  There have been reports of
>> > insonsistent lock ups on windows (espeically multi-processor) which
>> > you might be experiencing. Make sure you have the very latest version
>> > of pg 8.1.x.  Also consider checking out 8.2 and see if you can
>> > reproduce the behavior there...this will require compiling postgresql.
>>
>> Merlin,
>>
>> Rumour has it you managed to get a BT from Windows. That sounds like it
>> would be very useful here.

Could it be there is a hangup in communication with the backend via the
libpq library?

I have a situation on Windows where psql seems to be hanging randomly
AFTER completing (or almost completing) a vacuum full analyze verbose.

I'm running the same databases on a single postgres instance on a Dell
4gb RAM 2 processor xeon (hyper-threading turned off) running Debian
GNU/Linux.  The windows system is an IBM 24gb RAM, 4 processor xeon
(hyperthreading turned off).  No problems on the Dell, it runs pgbench
faster than the windows IBM system.  The Dell Linux system zips through
vacuumdb --all --analyze --full --verbose with no problems.  The windows
machine is running 6 instances of postgresql because of problems trying
to load all of the databases into one instance on windows.

The last output from psql is:

INFO:  free space map contains 474 pages in 163 relations
DETAIL:  A total of 2864 page slots are in use (including overhead).
2864 page slots are required to track all free space.
Current limits are:  42 page slots, 25000 relations, using 4154 KB.

(I've currently restarted postgresql with more reasonable fsm_page_slots
and fsm_relations).
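
As an aside, a lock check along the lines Merlin suggested up-thread (a
sketch; column names as in 8.x-era pg_stat_activity):

SELECT l.pid, l.locktype, l.mode, l.granted, a.current_query
FROM pg_locks l
JOIN pg_stat_activity a ON a.procpid = l.pid
WHERE NOT l.granted;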

It appears that psql is hung in the call to WS2_32!select.
The psql stack trace looks like this:

ntdll!KiFastSystemCallRet
ntdll!NtWaitForSingleObject+0xc
mswsock!SockWaitForSingleObject+0x19d
mswsock!WSPSelect+0x380
WS2_32!select+0xb9
WARNING: Stack unwind information not available. Following frames may be
wrong.
libpq!PQenv2encoding+0x1fb
libpq!PQenv2encoding+0x3a1
libpq!PQenv2encoding+0x408
libpq!PQgetResult+0x58
libpq!PQgetResult+0x188
psql+0x4c0f
psql+0x954d
psql+0x11e7
psql+0x1238
kernel32!IsProcessorFeaturePresent+0x9e

With more detail:

 # ChildEBP RetAddr  Args to Child
00 0022f768 7c822124 71b23a09 07a8 0001
ntdll!KiFastSystemCallRet (FPO: [0,0,0])
01 0022f76c 71b23a09 07a8 0001 0022f794
ntdll!NtWaitForSingleObject+0xc (FPO: [3,0,0])
02 0022f7a8 71b23a52 07a8 0780 
mswsock!SockWaitForSingleObject+0x19d (FPO: [Non-Fpo])
03 0022f898 71c0470c 0781 0022fc40 0022fb30 mswsock!WSPSelect+0x380
(FPO: [Non-Fpo])
04 0022f8e8 6310830b 0781 0022fc40 0022fb30 WS2_32!select+0xb9 (FPO:
[Non-Fpo])
WARNING: Stack unwind information not available. Following frames may be
wrong.
05 0022fd68 631084b1   001d libpq!PQenv2encoding+0x1fb
06 0022fd88 63108518 0001  00614e70 libpq!PQenv2encoding+0x3a1
07 0022fda8 631060f8 0001  00614e70 libpq!PQenv2encoding+0x408
08 0022fdc8 63106228 00614e70 00613a71 00615188 libpq!PQgetResult+0x58
09 0022fde8 00404c0f 00614e70 00613a71 0041ac7a libpq!PQgetResult+0x188
0a 0022fe98 0040954d 00613a71 00423180 00423185 psql+0x4c0f
0b 0022ff78 004011e7 0006 00613b08 00612aa8 psql+0x954d
0c 0022ffb0 00401238 0001 0009 0022fff0 psql+0x11e7
0d 0022ffc0 77e523e5   7ffdc000 psql+0x1238
0e 0022fff0  00401220  78746341
kernel32!IsProcessorFeaturePresent+0x9e

the pg_locks table:
-[ RECORD 1 ]-+
locktype  | relation
database  | 19553
relation  | 10342
page  |
tuple |
transactionid |
classid   |
objid |
objsubid  |
transaction   | 1998424
pid   | 576
mode  | AccessShareLock
granted   | t
-[ RECORD 2 ]-+
locktype  | transactionid
database  |
relation  |
page  |
tuple |
transactionid | 1998424
classid   |
objid |
objsubid  |
transaction   | 1998424
pid   | 576
mode  | Exc

Re: [PERFORM] Quad processor options

2004-05-11 Thread Rob Sell

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Bjoern Metzdorf
Sent: Tuesday, May 11, 2004 3:11 PM
To: scott.marlowe
Cc: [EMAIL PROTECTED]; Pgsql-Admin (E-mail)
Subject: Re: [PERFORM] Quad processor options

scott.marlowe wrote:
>>Next drives I'll buy will certainly be 15k scsi drives.
> 
> Better to buy more 10k drives than fewer 15k drives.  Other than slightly 
> faster select times, the 15ks aren't really any faster.

Good to know. I'll remember that.

>>In peak times we can get up to 700-800 connections at the same time. 
>>There are quite some updates involved, without having exact numbers I'll 
>>think that we have about 70% selects and 30% updates/inserts.
> 
> Wow, a lot of writes then.

Yes, it certainly could also be only 15-20% updates/inserts, but this is 
also not negligible.

> Sure, adaptec makes one, so does lsi megaraid.  Dell resells both of 
> these, the PERC3DI and the PERC3DC are adaptec, then lsi in that order, I 
> believe.  We run the lsi megaraid with 64 megs battery backed cache.

The LSI sounds good.

> Intel also makes one, but I've heard nothing about it.

It could well be the ICP Vortex one, ICP was bought by Intel some time ago..

> I haven't directly tested anything but the adaptec and the lsi megaraid.  
> Here at work we've had massive issues trying to get the adaptec cards 
> configured and installed on, while the megaraid was a snap.  Installed RH,

> installed the dkms rpm, installed the dkms enabled megaraid driver and 
> rebooted.  Literally, that's all it took.

I didn't hear anything about dkms for debian, so I will be hand-patching 
as usual :)

Regards,
Bjoern


-

Personally I would stay away from anything Intel over 2 processors.  I have
done some research and, if memory serves, it's something like this: Intel's
architecture makes each processor compete for bandwidth on the bus to the
RAM. AMD differs in that each processor has its own bus to the RAM.

Don't take this as god's honest fact, but just keep it in mind when
considering a Xeon solution; it may be worth your time to do some deeper
research into this. There is some discussion of it here:
http://www4.tomshardware.com/cpu/20030422/ 

Rob


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [PERFORM] Configuring PostgreSQL to minimize impact of checkpoints

2004-05-11 Thread Rob Sell


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of scott.marlowe
Sent: Tuesday, May 11, 2004 2:23 PM
To: Paul Tuckfield
Cc: [EMAIL PROTECTED]; Matthew Nuzum; [EMAIL PROTECTED]; Rob
Fielding
Subject: Re: [PERFORM] Configuring PostgreSQL to minimize impact of
checkpoints

On Tue, 11 May 2004, Paul Tuckfield wrote:

> If you are having a "write storm" or bursty writes that's burying 
> performance, a scsi raid controler with writeback cache will greatly 
> improve the situation, but I do believe they run around $1-2k.   If 
> it's write specific problem, the cache matters more than the striping, 
> except to say that write specfic perf problems should avoid raid5

Actually, a single channel MegaRAID 320-1 (single channel ultra 320) is 
only $421 at http://www.siliconmechanics.com/c248/u320-scsi.php  It works 
pretty well for me, having 6 months of a production server on one with 
zero hickups and very good performance.  They have a dual channel intel 
card for only $503, but I'm not at all familiar with that card.

The top of the line megaraid is the 320-4, which is only $1240, which 
ain't bad for a four channel RAID controller.

Battery backed cache is an addon, but I think it's only about $80 or so.


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly

-

If you don't mind slumming on ebay :-) keep an eye out for PERC III cards;
they are Dell-branded LSI cards. PERC = PowerEdge RAID Controller. There
are dual channel U320 models on there, and Dell usually sells them with
battery backed cache.  That's how I have acquired all my high end raid
cards.

Rob


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] Configuring PostgreSQL to minimize impact of checkpoints

2004-05-11 Thread Rob Fielding
Matthew Nuzum wrote:
Jack Orenstein <[EMAIL PROTECTED]> writes:

I'm looking at one case in which two successive transactions, each
updating a handful of records, take 26 and 18 *seconds* (not msec) to
complete. These transactions normally complete in under 30 msec.

None of this is necessarily going to fix matters for an installation
that has no spare I/O capacity, though.  And from the numbers you're
quoting I fear you may be in that category.  "Buy faster disks" may
be the only answer ...

I had a computer once that had an out-of-the-box hard drive configuration
that provided horrible disk performance.  I found a tutorial at O'Reilly
that explained how to use hdparm to dramatically speed up disk performance
on Linux.  I've noticed on other computers I've set up recently that hdparm
seems to be used by default out of the box to give good performance.
Maybe your computer is using all of its I/O capacity because it's using PIO
mode or some other non-optimal method of accessing the disk.

There's certainly some scope there. I have an SGI Octane whose SCSI 2
disks were set up by default with no write buffer and a CTQ depth of zero
:/ IDE drivers in Linux may not be detecting your IDE chipset correctly
and stepping down; however unlikely, there may be something odd going on,
but you could check hdparm out. Ensure correct cables too, and that they
aren't crushed or twisted too badly... but I digress.

Assuming you're running with optimal schema and index design (ie you're 
not doing extra work unnecessarily), and your backend has 
better-then-default config options set-up (plenty of tips around here), 
then disk arrangement is critical to smoothing the ride.

Taking things to a relative extreme, we implemented a set-up with issues 
similar sounding to yours. It was resolved by first optimising 
everything but hardware, then finally optimising hardware. This served 
us because it meant we squeezed as much out of the available hardware, 
before finally throwing more at it, getting us the best possible returns 
(plus further post optimisation on the new hardware).

First tip would to take your pg_xlog and put it on another disk (and 
channel). Next if you're running a journalled fs, get that journal off 
onto another disk (and channel). Finally, get as many disks for the data 
store and spread the load across spindles. You're aiming here to 
distribute the contention and disk I/O more evenly to remove the 
congestion. sar and iostat help out as part of the analysis.

You say you're using IDE, for which I'd highly recommend switching to
SCSI and multiple controllers, because IDE isn't great for lots of other
reasons. Obviously budgets count, and playing with SCSI certainly limits
that. We took a total of 8 disks across 2 SCSI 160 channels and split up
the drives into a number of software RAID arrays: RAID1 mirrors for the
os, pg_xlog, data disk journal and swap, and the rest became a RAID5
array for the data. You could instead implement your DATA disk as
RAID1+0 if you wanted more perf at the cost of free space. Anyway, it's
certainly not the fastest config out there, but it made all the
difference to this particular application. In fact, we had so much free
I/O we recently installed another app on there (based on mysql, sorry)
which runs concurrently, and itself runs 4 times faster than it originally did...

YMMV, just my 2p.

--

Rob Fielding
[EMAIL PROTECTED]
www.dsvr.co.uk  Development Designer Servers Ltd

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
 joining column's datatypes do not match


Re: [PERFORM] OT: Help with performance problems

2004-04-26 Thread Rob Fielding
scott.marlowe wrote:
On Fri, 23 Apr 2004, Chris Hoover wrote:

DB's on Powervaults 220S using raid 5 (over 6 disks)

What controller is this, the adaptec?  We've found it to be slower than
the LSI megaraid based controller, but YMMV.

Wow, really? You got any more details of the chipset, mobo and kernel
driver?

I've been driven to my wits' end wrestling with an LSI MegaRAID 320-1
controller on a supermicro board all weekend. I just couldn't get
anything more than 10MB/sec out of it with megaraid driver v1 OR v2 in
Linux 2.4.26, nor the version in 2.6.6-rc2. After 2 days of humming the
Adaptec mantra I gave in and switched the array straight onto the
onboard Adaptec 160 controller (same cable and everything). Software
RAID 5 gets me over 40MB/sec for a nominal cpu hit - more than 4 times
what I could get out of the MegaRAID controller :( Even the 2nd SCSI-2
channel gets 40MB/sec max (pg_xlog :)

And HOW LONG does it take to detect drives during POST... oh, never
mind ... I really just wanna rant :) There should be a free counseling
service for enraged sysops.

--
Rob Fielding
[EMAIL PROTECTED]
www.dsvr.co.uk  Development Designer Servers Ltd
---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
 subscribe-nomail command to [EMAIL PROTECTED] so that your
 message can get through to the mailing list cleanly


Re: [PERFORM] System overload / context switching / oom, 8.3

2010-02-03 Thread Rob Lemley

Andy Colson wrote:

> work_mem = 32MB
> maintenance_work_mem = 64MB


if you have lots and lots of connections, you might need to cut these 
down?


definitely, work_mem is the main focus.

If I understand correctly, the 64MB maintenance_work_mem is per vacuum
task, and on this system there are 3 autovacuum workers.  I was wondering
whether, with this many databases, it might make sense to decrease
maintenance_work_mem significantly and start up more autovacuum workers.
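
For reference, the two settings that interact here (each autovacuum
worker can use up to maintenance_work_mem):

SHOW autovacuum_max_workers;   -- 3 by default on 8.3
SHOW maintenance_work_mem;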


Yes, also moving databases to other servers in order to decrease the 
number of connections.


> effective_cache_size = 5000MB

I see you're running 32bit, but with bigmem support; still, one
process is limited to 4gig.  You'd make better use of all that ram if
you switched to 64bit.  And this cache, I think, would be limited to
4gig.

All of the cache is being used because the operating system kernel is
built with the memory extensions to access outside the 32bit range.
This is the cache size reported by free(1). However, there may be
advantages to switching to 64bit.




The oom-killer is kicking in, at some point, so somebody is using too 
much ram.  There should be messages or logs or something, right?  
(I've never enabled the oom stuff so dont know much about it).  But 
the log messages might be helpful.


Also, do you know what the oom max memory usage is set to?  You said:
"oom_adj -17.  vm_overcommit_memory set to 2, but at this time 
vm_overcommit_ratio was still at 50 (has since been changed to 90, 
should this be 100?)"


Oh man.  I encourage everyone to find out what /proc/<pid>/oom_adj
means.  You have to set this to keep the Linux "oom-killer" from doing a
kill -9 on the postgres postmaster.  On Debian:


echo -17 >> /proc/$(cat /var/run/postgresql/8.3-main.pid)/oom_adj

This is my experience with oom-killer.  After putting -17 into 
/proc/pid/oom_adj, oom-killer seemed to kill one of the database 
connection processes.  Then the postmaster attempted to shut down all 
processes because of possible shared memory corruption.  The database 
then went into recovery mode.  After stopping the database some of the 
processes were stuck and could not be killed.  The operating system was 
rebooted and the database returned with no data loss.


My earlier experience with oom-killer: If you don't have this setting in 
oom_adj, then it seems likely (certain?) that oom-killer kills the 
postmaster because of the algorithm oom-killer uses (called badness()) 
which adds child process scores to their parent's scores.  I don't 
know if sshd was killed but I don't think anyone could log in to the 
OS.  After rebooting there was a segmentation violation when trying to 
start the postmaster.  I don't think that running pg_resetxlog with 
defaults is a good idea.  My colleague who has been investigating the 
crash believes that we could have probably eliminated at least some of 
the data loss with more judicious use of pg_resetxlog.


There was a discussion on the postgres lists about somehow having the 
postgres distribution include the functionality to set oom_adj on 
startup.  To my knowledge, that's not in 8.3 so I wrote a script and 
init.d script to do this on Debian systems.
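
A minimal sketch of what such a startup hook might look like (the pid
file path is taken from the Debian example above; this is an
illustration, not the actual script):

#!/bin/sh
# Hypothetical sketch: run after "postgresql start" so that the
# postmaster process is exempted from the oom-killer.
PIDFILE=/var/run/postgresql/8.3-main.pid
[ -r "$PIDFILE" ] && echo -17 > /proc/$(cat "$PIDFILE")/oom_adj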


As far as vm.overcommit_memory goes, there are three settings and most 
recommend setting it to 2 for postgres.  However, this does not turn off 
oom-killer!  You need to put -17 in /proc/<pid>/oom_adj whether you do 
anything about vm.overcommit_memory or not.  We had vm.overcommit_memory 
set to 2 and oom-killer became active and killed the postmaster.


Kind of off-topic, but a Linux kernel parameter that's often not set on 
database servers is elevator=deadline, which selects the I/O scheduling 
algorithm.  The scheduler can also be viewed/set at runtime, for example 
for the disk /dev/sdc via /sys/block/sdc/queue/scheduler.
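
For example (sketch only; device name assumed):

cat /sys/block/sdc/queue/scheduler        # shows e.g. "noop anticipatory deadline [cfq]"
echo deadline > /sys/block/sdc/queue/scheduler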


Rob




--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] How to force Nested Loop plan?

2003-08-30 Thread Rob Nagler
I'm trying to understand how I can get the planner to always do the
right thing with this query:

EXPLAIN ANALYZE
SELECT
aa_t.min_date_time
FROM
aa_t
, bb_t
, cc_t
WHERE bb_t.bb_id = aa_t.bb_id
AND aa_t.realm_id = cc_t.realm_id
AND aa_t.server_id = 21
ORDER BY aa_t.min_date_time desc
LIMIT 1
OFFSET 674
;

There's a extreme elbow in the performance curve around the sum of
LIMIT and OFFSET.  The two plans follow.  First for the query above:

 Limit  (cost=21569.56..21601.56 rows=1 width=84) (actual time=59.60..59.69 rows=1 
loops=1)
   ->  Nested Loop  (cost=0.00..110535.66 rows=3454 width=84) (actual time=0.19..59.20 
rows=676 loops=1)
 ->  Nested Loop  (cost=0.00..93177.46 rows=3454 width=65) (actual 
time=0.14..44.41 rows=676 loops=1)
   ->  Index Scan Backward using aa_t20 on aa_t  (cost=0.00..76738.77 
rows=3454 width=46) (actual time=0.10..31.30 rows=676 loops=1)
 Filter: (server_id = 21::numeric)
   ->  Index Scan using cc_t1 on cc_t  (cost=0.00..4.75 rows=1 width=19) 
(actual time=0.01..0.01 rows=1 loops=676)
 Index Cond: ("outer".realm_id = cc_t.realm_id)
 ->  Index Scan using bb_t1 on bb_t  (cost=0.00..5.01 rows=1 width=19) (actual 
time=0.02..0.02 rows=1 loops=676)
   Index Cond: (bb_t.bb_id = "outer".bb_id)
 Total runtime: 59.89 msec
(10 rows)

Setting OFFSET to 675 in the above query, results in this 100 times
slower plan:

 Limit  (cost=21614.48..21614.48 rows=1 width=84) (actual time=4762.39..4762.39 rows=1 
loops=1)
   ->  Sort  (cost=21612.79..21621.42 rows=3454 width=84) (actual 
time=4761.45..4761.92 rows=677 loops=1)
 Sort Key: aa_t.min_date_time
 ->  Merge Join  (cost=21139.96..21409.80 rows=3454 width=84) (actual 
time=4399.80..4685.24 rows=41879 loops=1)
   Merge Cond: ("outer".bb_id = "inner".bb_id)
   ->  Sort  (cost=8079.83..8184.53 rows=41879 width=19) (actual 
time=936.99..967.37 rows=41879 loops=1)
 Sort Key: bb_t.bb_id
 ->  Seq Scan on bb_t  (cost=0.00..4864.79 rows=41879 width=19) 
(actual time=0.06..729.60 rows=41879 loops=1)
   ->  Sort  (cost=13060.13..13068.76 rows=3454 width=65) (actual 
time=3462.76..3493.97 rows=41879 loops=1)
 Sort Key: aa_t.bb_id
 ->  Merge Join  (cost=12794.42..12857.14 rows=3454 width=65) 
(actual time=2923.62..3202.78 rows=41879 loops=1)
   Merge Cond: ("outer".realm_id = "inner".realm_id)
   ->  Sort  (cost=12762.78..12771.41 rows=3454 width=46) 
(actual time=2920.78..2950.87 rows=41879 loops=1)
 Sort Key: aa_t.realm_id
 ->  Index Scan using aa_t5 on aa_t  
(cost=0.00..12559.79 rows=3454 width=46) (actual time=0.18..2589.22 rows=41879 loops=1)
   Index Cond: (server_id = 21::numeric)
   ->  Sort  (cost=31.64..32.78 rows=455 width=19) (actual 
time=2.54..33.12 rows=42163 loops=1)
 Sort Key: cc_t.realm_id
 ->  Seq Scan on cc_t  (cost=0.00..11.55 rows=455 
width=19) (actual time=0.04..0.86 rows=455 loops=1)
 Total runtime: 4792.84 msec
(20 rows)

Twiddling effective_cache_size and random_page_cost allows for a large
LIMIT+OFFSET number but not enough.  These tests are made with 40
effective_cache_size and random_page_cost of 4.

I can increase the LIMIT+OFFSET elbow to 1654 by changing the
query thusly:

< AND aa_t.server_id = 21
---
> AND aa_t.server_id IN (21, 0)

The value 0 is an invalid server_id, so I know it won't be returned.
However, I've got 41K rows that could be returned by this query and
growing, and 1654 is obviously not enough.  (aa is 690K rows, bb is
41K rows, and cc is 500 rows.)

If I drop the ORDER BY, the query goes much faster, but the query is
useless without the ORDER BY.

I've figured out that the second plan is slow, because it is writing a
huge result set to disk (+200MB).  This doesn't make sense to me,
since sort_mem is 32000.

Is there a way to tell the optimizer to use Nested Loop plan always
instead of the Merge/Join plan?  Turning off enable_mergejoin is
obviously not an option.

Thanks,
Rob



---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [PERFORM] How to force Nested Loop plan?

2003-08-30 Thread Rob Nagler
Tom Lane writes:
> The reason the planner does not much like this plan is that it's
> estimating that quite a lot of rows will have to be hit in min_date_time
> order before it finds enough rows with server_id = 21.  Thus the high
> cost estimate for the above step.

Thanks for the speedy and useful reply!  More questions follow. :)

Very interesting.  How does it know "quite a lot"?  Is there something
I can do to get the planner to analyze the data better?

> I suspect that the reason you like this plan is that there's actually
> substantial correlation between server_id and min_date_time, such that
> the required rows are found quickly.  Before trying to force the planner
> into what you consider an optimal plan, you had better ask yourself
> whether you can expect that correlation to hold up in the future.
> If not, your plan could become pessimal pretty quickly.

The correlation holds.  min_date_time increases over time as records
are inserted.  server_id is uniformly distributed over time.  There's
no randomness.  There is at least one 21 record for every value of
min_date_time.  21 is a special server_id containing aggregate
(denormalized) data for the other servers.  I thought about putting it
in a separate table, but this would complicate the code as the data is
identical to the non-aggregated case.

Do you have any suggestions for organizing the data/query now that you
know this?

> I'd suggest creating a double-column index:

Thanks.  I'll try this.
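
(The suggested index isn't quoted above; presumably it was something
along these lines, with the ORDER BY adjusted to match -- this is my
reconstruction, not the original text:)

CREATE INDEX aa_t_server_min_date ON aa_t (server_id, min_date_time);

... ORDER BY aa_t.server_id DESC, aa_t.min_date_time DESC LIMIT 1 OFFSET 674;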

I'm a very big fan of declarative programming.  However, there's a
danger in declarative programming when the interpreter isn't smart
enough.  When I add this index, I will slow down inserts (about
20K/day) and increase data size (this is the second largest table in
the database).  Moreover, if the planner is improved, I should fix
my code, delete the index, etc.

Is there a way of giving the planner direct hints as in Oracle?  They
can be ignored when the optimizer is improved, just as "register" is
ignored by C compilers nowadays.

Adding the extra index and ORDER BY is also not easy in our case.  The
query is dynamically generated.  I can force the query ORDER BY to be
whatever I want, but I would lose the ability to do interesting
things, like the automatic generation of ORDER BY when someone clicks
on a column header in the application.  Indeed there are times when
people want to sort on other columns in the query. I reduced the
problem to the salient details for my post to this board.  What if the
ORDER BY was:

ORDER BY aa_t.server_id DESC, cc_t.name ASC

Would the planner do the right thing?

> PS: does server_id really need to be NUMERIC?  Why not integer, or at
> worst bigint?

It is a NUMERIC(18).  It could be a bigint.  What would be the change
in performance of this query if we changed it to bigint?

BTW, my customer is probably going to be switching to Oracle.  This
particular query has been one of the reasons.  Maybe this change will
help us stay with Postgres.

Thanks,
Rob



---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] How to force Nested Loop plan?

2003-08-30 Thread Rob Nagler
attention to it before this.  When we first set up
Oracle, we got into all of its parameters pretty heavily.  With
Postgres, we just tried it and it worked.  This is the first query
where we ran out of ideas to try.

BTW, everybody's help on this list is fantastic.  Usually, I can find
the answer to my question (and have been doing so for 3 years) on this
list without asking.

Thanks,
Rob



---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] How to force Nested Loop plan?

2003-08-31 Thread Rob Nagler
Ron Johnson writes:
> Dumb question: given your out-of-the-box satisfaction, could it be
> that postgresql.conf hasn't been tweaked?

Here are the modified values:

shared_buffers = 8000
wal_buffers = 80
sort_mem = 32000
effective_cache_size = 40
random_page_cost = 4
autocommit = false
timezone = UTC

I had run a test with effective_cache_size set to a high value to see what
would happen.  I also adjusted random_page_cost:

random_page_cost effective_cache_size   elbow
4   4   675
.5  4   592
.1  4   392
4   100030

My conclusion is that random_page_cost should be left alone and
effective_cache_size higher is better.

BTW, the hardware is 2 x 2.4ghz Xeon, 1.2GB, SCSI (linux software
raid) with 10K disks.  This is close to the production box.  Although
we are planning on adding more memory to production.

Rob



---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [PERFORM] How to force Nested Loop plan?

2003-08-31 Thread Rob Nagler
Tom Lane writes:
> Keep in mind though that you seem to be experimenting with a
> fully-cached database; you may find that the planner's beliefs more
> nearly approach reality when actual I/O has to occur.

My hope is that the entire database should fit in memory.  This may
not be in the case right now with only 1GB, but it should be close.
The pgsql/data/base/NNN directory is about 1.5GB on production.  I'm
pretty sure with constant vacuuming, we could keep that size down.
A pgdump is about 60MB now, growing at about .5MB a day.

> Another thing I'd be interested to know about is how closely the
> physical order of the table entries correlates with min_date_time.

Probably "pretty close".  The primary key of aa_t is (bb_id,
server_id), and bb_id is a sequence.  aa_t is updated heavily on
production, but these tests are on a fresh import so vacuuming and
index order is not a factor.  We do a reload every now and then to
improve performance on production.  min_date_time is highly correlated
with bb_id, because both are increasing constantly.  server_id is one
of 16 values.

> A high correlation reduces the actual cost of the indexscan (since
> visiting the rows in index order becomes less of a random-access
> proposition).  We are aware that the planner doesn't model this effect
> very well at present ...

Oracle's optimizer is lacking here, too.  The best optimizer I've seen
was at Tandem, and even then hints were required.

Are there plans for explicit hints to the planner?

Thanks,
Rob



---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


[PERFORM] reindex/vacuum locking/performance?

2003-10-03 Thread Rob Nagler
I've read some posts that say vacuum doesn't lock, but my experience
today indicates the opposite.  It seemed that "vacuum full analyze"
was locked waiting and so were other postmaster processes.  It
appeared to be deadlock, because all were in "WAITING" state according
to ps.  I let this go for about a 1/2 hour, and then killed the vacuum
at which point all other processes completed normally.

The same thing seemed to be happening with reindex on a table.  It
seems that the reindex locks the table and some other resource which
then causes deadlock with other active processes.

Another issue seems to be performance.  A reindex on some indexes is
taking 12 minutes or so.  Vacuum seems to be slow, too.  Way longer
than the time it takes to reimport the entire database (30 mins).

In summary, I suspect that it is better from a UI perspective to bring
down the app on Sat at 3 a.m and reimport with a fixed time period
than to live through reindexing/vacuuming which may deadlock.  Am I
missing something?

Thanks,
Rob



---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [PERFORM] reindex/vacuum locking/performance?

2003-10-03 Thread Rob Nagler
> vacuum full does require exclusive lock, plain vacuum does not.

I think I need full, because there are updates on the table.  As I
understand it, an update in pg is an insert/delete, so it needs
to be garbage collected.

> It's considerably more likely that the vacuum was waiting for an open
> client transaction (that had a read or write lock on some table) to
> finish than that there was an undetected deadlock.  I suggest looking at
> your client code.  Also, in 7.3 or later you could look at the pg_locks
> view to work out exactly who has the lock that's blocking vacuum.

My client code does a lot.  I look at it more often than I'd like to. :-) 

I don't understand why the client transaction would block if vacuum
was waiting.  Does vacuum lock the table and then try to get some
other "open transaction" resource?  Free space?  I guess I don't
understand what other resources would be required of vacuum.  The
client transactions are short (< 1s).  They don't deadlock normally,
only with reindex and vacuum did I see this behavior.

> vacuum full is indeed slow.  That's why we do not recommend it as a
> routine maintenance procedure.  The better approach is to do plain
> vacuums often enough that you don't need vacuum full.

The description of vacuum full implies that it is required if the db
is updated frequently.   This db gets about 1 txn a second, possibly
more at peak load.

> In pre-7.4
> releases you might need periodic reindexes too, depending on whether
> your usage patterns tickle the index-bloat problem.

7.3, and yes, we have date indexes as well as sequences for primary
keys.
  
> But it is easily
> demonstrable that reindexing is cheaper than rebuilding the database.

IOW, vacuum+reindex is faster than dump+restore?  I didn't see this,
then again, I had this locking problem, so the stats are distorted.

One other question: The reindex seems to lock the table for the entire
process as opposed to freeing the lock between index rebuilds.  It was
hard to see, but it seemed like the clients were locked for the entire
"reindex table bla" command.

Sorry for lack of detail, but I didn't expect these issues so I wasn't
keeping track of the system state as closely as I should have.  Next
time. :-)

Thanks,
Rob

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [PERFORM] Speeding up Aggregates

2003-10-10 Thread Rob Nagler
Greg Stark writes:
> Call it a wishlist bug. The problem is it would be a hard feature to
> implement properly. And none of the people paid to work on postgres
> by various companies seem to have this on their to-do lists. So
> don't expect it in the near future.

We are using Postgres heavily, and we should be able to find some time
and/or funding to help.

We're becoming more and more frustrated with the discontinuous
behaviour of the planner.  It seems every complex query we have these
days needs some "hint" like "ORDER BY foo DESC LIMIT 1" to make it run
on the order of seconds, not minutes.  We usually figure out a way to
write the query so the planner does the right thing, and pushes
the discontinuity out far enough that the user doesn't see it.
However, it takes a lot of work, and it seems to me that work would be
put to better use improving the planner than improving our knowledge
of how to get the planner to do the right thing by coding the SQL in
some unusual way.

Please allow me to go out on a limb here.  I know that Tom is
philosophically opposed to planner hints.  However, we do have a
problem that the planner is never going to be smart enough.  This
leaves the SQL coder the only option of collecting a bag of
(non-portable) SQL tricks.  I saw what the best SQL coders did at
Tandem, and frankly, it's scary.  Being at Tandem, the SQL coders also
had the option (if they could argue their case strong enough) of
adding a new rule to the optimizer.  This doesn't solve the problem
for outsiders no matter how good they are at SQL.

Would it be possible to extend the planner with a pattern matching
language?  It would formalize what it is doing already, and would
allow outsiders to teach the planner about idiosyncrasies without
changing the SQL.  It would be like a style sheet in Latex (or Scribe :-)
if you are familiar with these typesetting languages.

Comments?

Rob



---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


[PERFORM] vacuum locking

2003-10-17 Thread Rob Nagler
It seems a simple "vacuum" (not full or analyze) slows down the
database dramatically.  I am running vacuum every 15 minutes, but it
takes about 5 minutes to run even after a fresh import.  Even with
vacuuming every 15 minutes, I'm not sure vacuuming is working
properly.

There are a lot of updates.  The slowest relation is the primary key
index, which is composed of a sequence.  I've appended a csv with the
parsed output from vacuum.  The page counts are growing way too fast
imo.  I believe this is caused by the updates, and index pages not
getting re-used.  The index values aren't changing, but other values
in the table are.

Any suggestions how to make vacuuming more effective and reducing the
time it takes to vacuum?  I'd settle for less frequent vacuuming or
perhaps index rebuilding.  The database can be re-imported in about an
hour.

Rob

Spacing every 15 minutes
Pages,Tuples,Deleted
7974,1029258,1536
7979,1025951,4336
7979,1026129,52
7979,1025618,686
7979,1025520,152
7980,1025583,28
7995,1028008,6
8004,1030016,14
8010,1026149,4965
8012,1026684,6
8014,1025910,960
8020,1026812,114
8027,1027642,50
8031,1027913,362
8040,1028368,784
8046,1028454,1143
8049,1029155,6
8053,1029980,10
8065,1031506,24
8084,1029134,4804
8098,1031004,346
8103,1029412,3044
8118,1029736,1872
8141,1031643,1704
8150,1032597,286
8152,1033222,6
8159,1029436,4845
8165,1029987,712
8170,1030229,268
8176,1029568,1632
8189,1030136,1540
8218,1030915,3963
8255,1033049,4598
8297,1036583,3866
8308,1031412,8640
8315,1031987,1058
8325,1033892,6
8334,1030589,4625
8350,1031709,1040
8400,1033071,5946
8426,1031555,8368
8434,1031638,2240
8436,1031703,872
8442,1031891,612



---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] vacuum locking

2003-10-17 Thread Rob Nagler
Shridhar Daithankar writes:
> You should try 7.4 beta and pg_autovacuum which is a contrib module
> in CVS tip. 

It's on our todo list. :)

How does pg_autovacuum differ from vacuumdb?  I mean it seems to call
the vacuum operation underneath just as vacuumdb does.  I obviously
didn't follow the logic as to how it gets there. :-)

> Make sure that you have FSM properly tuned. Bump it from defaults to
> suit your needs. I hope you have gone thr. this page for general
> purpose setting.

I didn't start vacuuming regularly until recently, so I didn't see
this problem.

> Assuming those were incremental figures, largest you have is ~8000
> tuples per 15 minutes and 26 pages. I think with proper FSM/shared
> buffers/effective cache and a pg_autovacuum with 1 min. polling
> interval, you could end up in lot better shape.

Here are the numbers that are different.  I'm using 7.3:

shared_buffers = 8000
sort_mem = 8000
vacuum_mem = 64000
effective_cache_size = 4

free says:
             total       used       free     shared    buffers     cached
Mem:       1030676    1005500      25176          0      85020     382280
-/+ buffers/cache:      538200     492476
Swap:      2096472     272820    1823652

It seems effective_cache_size is about right.

vacuum_mem might be slowing down the system?  But if I reduce it,
won't vacuuming get slower?

max_fsm_relations is probably too low (the default in my conf file
says 100, probably needs to be 1000).  Not sure how this affects disk
usage.

Here's the summary for the two active tables during a vacuum interval
with high activity.  The other tables don't get much activity, and are
much smaller.  As you see the 261 + 65 adds up to the bulk of the 5
minutes it takes to vacuum.

INFO:  Removed 8368 tuples in 427 pages.
CPU 0.06s/0.04u sec elapsed 1.54 sec.
INFO:  Pages 24675: Changed 195, Empty 0; Tup 1031519: Vac 8368, Keep 254, UnUsed 1739.
Total CPU 2.92s/2.58u sec elapsed 65.35 sec.

INFO:  Removed 232 tuples in 108 pages.
CPU 0.01s/0.02u sec elapsed 0.27 sec.
INFO:  Pages 74836: Changed 157, Empty 0; Tup 4716475: Vac 232, Keep 11, UnUsed
641.
Total CPU 10.19s/6.03u sec elapsed 261.44 sec.

How would vacuuming every minute finish in time?  It isn't changing
much in the second table, but it's taking 261 seconds to wade through
5m rows.

Assuming I vacuum every 15 minutes, it would seem like max_fsm_pages
should be 1000, because that's about what was reclaimed.  The default
is 1.  Do I need to change this?

Sorry to be so dense, but I just don't know what the right values are.

Thanks muchly for the advice,
Rob



---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [PERFORM] vacuum locking

2003-10-17 Thread Rob Nagler
Manfred Koizar writes:
> ISTM you are VACCUMing too aggressively.  You are reclaiming less than
> 1% and 0.005%, respectively, of tuples.  I would increase FSM settings
> to ca. 1000 fsm_relations, 10 fsm_pages and VACUUM *less* often,
> say every two hours or so.

I did this.  We'll see how it goes.

> ... or configure autovacuum to VACUUM a table when it has 10% dead
> tuples.

This solution doesn't really fix the fact that VACUUM consumes the
disk while it is running.  I want to avoid the erratic performance on
my web server when VACUUM is running.

mfg,
Rob


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] vacuum locking

2003-10-17 Thread Rob Nagler
Josh Berkus writes:
> Yes, but it will have less of an impact on the system while it's running.

We'll find out.  I lowered vacuum_mem to 32000.

> What sort of disk array do you have?   That seems like a lot of time 
> considering how little work VACUUM is doing.

Vendor: DELL Model: PERCRAID Mirror  Rev: V1.0
  Type:   Direct-Access                    ANSI SCSI revision: 02

Two 10K disks attached.

Rob

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] vacuum locking

2003-10-22 Thread Rob Nagler
Vivek Khera writes:
> AMI or Adaptec based?

Adaptec, I think.  AIC-7899 LVD SCSI is what dmidecode says, and
Red Hat/Adaptec aacraid driver, Aug 18 2003 is what comes up when it
boots.  I haven't been able to use the aac utilities with this driver,
however, so it's hard to interrogate the device.

> If AMI, make sure it has write-back cache enabled (and you have
> battery backup!), and disable the 'readahead' feature if you can.

I can't do this so easily.  It's at a colo, and it's production.
I doubt this has anything to do with this problem, anyway.  We're
talking about hundreds of megabytes of data.

> What's the disk utilization proir to running vacuum?  If it is
> hovering around 95% or more of capacity, of course you're gonna
> overwhelm it.

Here's the vmstat 5 at a random time:

   procs                      memory      swap          io     system         cpu
 r  b  w   swpd   free   buff  cache   si   so    bi    bo   in    cs  us  sy  id
 0  0  0 272372  38416  78220 375048   0   3 2 00 0   2   2   0
 0  0  0 272372  3  78320 375660   0   034   274  382   284   5   1  94
 0  1  0 272372  23012  78372 375924   0   025   558  445   488   8   2  90
 1  0  0 272368  22744  78472 376192   0   6   125   594  364   664   9   3  88

And here's it during vacuum:

   procs                      memory      swap          io     system         cpu
 r  b  w   swpd   free   buff  cache   si   so    bi    bo   in    cs  us  sy  id
 1  2  1 277292   9620  72028 409664  46  32  4934  4812 1697   966   8   4  88
 0  3  0 277272   9588  72096 412964  61   0  7303  2478 1391   976   3   3  94
 2  2  0 277336   9644  72136 393264 1326  32  2827  2954 1693  1519   8   3  89
The pages are growing proportionately with the number of tuples, btw.
Here's a vacuum snippet from a few days ago after a clean import,
running every 15 minutes:

INFO:  Removed 2192 tuples in 275 pages.
CPU 0.06s/0.01u sec elapsed 0.91 sec.
INFO:  Pages 24458: Changed 260, Empty 0; Tup 1029223: Vac 2192, Keep 3876, UnUsed 26.
Total CPU 2.91s/2.22u sec elapsed 65.74 sec.

And here's the latest today, running every 2 hours:

INFO:  Removed 28740 tuples in 1548 pages.
CPU 0.08s/0.06u sec elapsed 3.73 sec.
INFO:  Pages 27277: Changed 367, Empty 0; Tup 1114178: Vac 28740, Keep 1502, UnUsed 
10631.
Total CPU 4.78s/4.09u sec elapsed 258.10 sec.

The big tables/indexes are taking longer, but it's a big CPU/elapsed
time savings to vacuum every two hours vs every 15 minutes.

There's still the problem that when vacuum is running interactive
performance drops dramatically.  A query that takes a couple of
seconds to run when the db isn't being vacuumed will take minutes when
vacuum is running.  It's tough for me to correlate exactly, but I
suspect that while postgres is vacuuming an index or table, nothing else
runs.  In between relations, other stuff gets to run, and then vacuum
hogs all the resources again.  This could be for disk reasons or
simply because postgres locks the index or table while it is being
vacuumed.  Either way, the behavior is unacceptable.  Users shouldn't
have to wait minutes while the database picks up after itself.

The concept of vacuuming seems to be problematic.  I'm not sure why
the database simply can't garbage collect incrementally.  AGC is very
tricky, especially AGC that involves gigabytes of data on disk.
Incremental garbage collection seems to be what other databases do,
and it's been my experience that other databases don't have the type
of unpredictable behavior I'm seeing with Postgres.  I'd rather the
database be a little bit slower on average than have to figure out the
best time to inconvenience my users.

Since my customer already has Oracle, we'll be running tests in the
coming month(s :-) with Oracle to see how it performs under the same
load and hardware.  I'll keep this group posted.

Rob



---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] vacuum locking

2003-10-23 Thread Rob Nagler
Tom Lane writes:
> ... if all tuples are the same size, and if you never have any

Incorrect.  If the tuples are smaller, Oracle does the right thing.  If
there's enough space in the page, it shifts the tuples to make room.
That's what pctfree, pctused and pctincrease allow you to control.
It's all in memory so it's fast, and I don't think it has to update any
indices.

> transactions that touch enough tuples to overflow your undo segment

That's easily configured, and hasn't been a problem in the databases
I've managed.

> (or even just sit there for a long time, preventing you from recycling

That's probably bad software or a batch system--which is tuned
differently.  Any OLTP system has to be able to partition its problems
to keep transactions short and small.  If it doesn't, it will not be
usable.

> undo-log space; this is the dual of the VACUUM-can't-reclaim-dead-tuple
> problem).  And a few other problems that any Oracle DBA can tell you
> about.  I prefer our system.

Oracle seems to make the assumption that data changes, which is why it
manages free space within each page as well as within free lists.  The
database will be bigger but you get much better performance on DML.
It is very good at caching so reads are fast.

Postgres seems to make the assumption that updates and deletes are
rare.  A delete/insert policy for updates means that a highly indexed
table requires lots of disk I/O when the update happens and the
concomitant garbage collection when vacuum runs.  But then MVCC makes
the assumption that there's lots of DML.  I don't understand the
philosophical split here.

I guess I don't understand what application profiles/statistics makes
you prefer Postgres' approach over Oracle's.

> The increased I/O activity is certainly to be expected, but what I find
> striking here is that you've got substantial swap activity in the second
> trace.  What is causing that?  Not VACUUM I don't think.  It doesn't have
> any huge memory demand.  But swapping out processes could account for
> the perceived slowdown in interactive response.

The box is a bit memory starved, and we'll be addressing that
shortly.  I don't think it accounts for 3 minute queries, but perhaps
it might.  vacuum_mem is 32mb, btw.

Rob



---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


[PERFORM] Use of multipart index with "IN"

2003-10-24 Thread Rob Messer
I have a reporting system that does regular queries on a table with a
multipart index.  I am running version 7.3.4.  Here is the table
definition:

   Table "public.ds_rec_fld"
Column |  Type   | Modifiers
---+-+---
 dsid  | character varying(20)   | not null
 recid | integer | not null
 field_name| character varying(20)   | not null
 option_tag| character varying(10)   | not null
 option_value  | integer |
 field_text| character varying(2000) |
 field_type_cd | character varying(8)|
Indexes: ds_rf_ndx1 btree (recid, field_name, option_value)

Normally queries are done using recid and field_name, so Postgresql
returns rows very quickly as expected.  Here is a sample explain
analyze output for a typical query:

db=> explain analyze
db-> select field_name, option_tag from ds_rec_fld where recid = 3000
and field_name = 'Q3A1';
   QUERY PLAN

-
 Index Scan using ds_rf_ndx1 on ds_rec_fld  (cost=0.00..163.09 rows=40
width=38) (actual time=0.06..0.07 rows=1 loops=1)
   Index Cond: ((recid = 3001) AND (field_name = 'Q3A1'::character
varying))
 Total runtime: 0.12 msec
(3 rows)
 
The problem comes in when we are selecting multiple field_name values
in one query.  The normal SQL syntax we have been using is like this:

select field_name, option_tag from ds_rec_fld where recid = 3001 and
field_name in ('Q3A1', 'Q3A9');

This is just a simplified example, at times there can be a lot of
field_name values in one query in the "in" clause.  Here postgresql
refuses to use the full index, instead doing a filter based on part of
the first recid part of index.  Here is the explain analyze output:

 Index Scan using ds_rf_ndx1 on ds_rec_fld  (cost=0.00..30425.51
rows=80 width=38) (actual time=0.18..1.08 rows=2 loops=1)
   Index Cond: (recid = 3001)
   Filter: ((field_name = 'Q3A1'::character varying) OR (field_name =
'Q3A9'::character varying))
 Total runtime: 1.12 msec
(4 rows)

So, 10 times longer.  This is an issue because at times we are
iterating through thousands of recid values.

I did a vacuum analyze, adjusted random_page_cost, etc. all to no
avail. 

I also noticed that the problem goes away when I reformat the query
like this:

select field_name, option_tag from ds_rec_fld where 
(recid = 3001 and field_name = 'Q3A1') or
(recid = 3001 and field_name = 'Q3A9')

Here is the explain analyze output for this:

  Index Scan using ds_rf_ndx1, ds_rf_ndx1 on ds_rec_fld 
(cost=0.00..326.57 rows=80 width=38) (actual time=0.07..0.10 rows=2
loops=1)
   Index Cond: (((recid = 3001) AND (field_name = 'Q3A1'::character
varying)) OR ((recid = 3001) AND (field_name = 'Q3A9'::character
varying)))
 Total runtime: 0.16 msec
(3 rows)

Much better.  So I have partially solved my own problem, but there are
other places that this is not this simple to fix.

Therefore, my question is, is there some way to force postgresql to use
the full index and still stick with the shorter "field_name in ('...',
'...')" syntax?  

If anyone has any thoughts please let me know.  Also it strikes me that
perhaps the optimizer could be tweaked to treat the first case like the
second one.  Thanks in advance,

Rob


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] vacuum locking

2003-10-24 Thread Rob Nagler
Stephen writes:
> I ran into the same problem with VACUUM on my Linux box. If you are running
> Linux, take a look at "elvtune" or read this post:

The default values were -r 64 -w 8192.  The article said this was
"optimal".  I just futzed with different values anywere from -w 128 -r
128 to -r 16 -w 8192.  None of these mattered much when vacuum is
running. 
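
(For reference, the invocations were roughly of this form -- block
device name assumed:)

elvtune -r 128 -w 128 /dev/sda
elvtune -r 64 -w 8192 /dev/sda     # back toward the defaults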

This is a RAID1 box with two disks.  Even with vacuum and one other
postmaster running, it's still got to get a lot of blocks through the
I/O system.

Rob



---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [PERFORM] vacuum locking

2003-10-24 Thread Rob Nagler
Mario Weilguni writes:
> of course both approaches have advantages, it simply depends on the usage 
> pattern. A case where oracle really rules over postgresql are m<-->n 
> connection tables where each record consist of two foreign keys, the 
> overwrite approach is a big win here.

That's usually our case.  My company almost always has "groupware"
problems to solve.  Every record has a "realm" (security) foreign key
and typically another key.  The infrastructure puts the security
key on queries to avoid returning the wrong realm's data.

Rob



---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] vacuum locking

2003-10-24 Thread Rob Nagler
Vivek Khera writes:
> Also, how close are you to the capacity of your disk bandwidth?  I
> don't see that in your numbers.  I know in freebsd I can run "systat
> -vmstat" and it gives me a percentage of utilization that lets me know
> when I'm near the capacity.

The vacuum totally consumes the system.  It's in a constant "D".  As
near as I can tell, it's hitting all blocks in the database.

The problem is interactive performance when vacuum is in a D state.
Even with just two processes doing "stuff" (vacuum and a select, let's
say), the select is very slow.

My understanding of the problem is that if a query hits the disk hard
(and many of my queries do) and vacuum is hitting the disk hard, they
contend for the same resource and nobody wins.  The query optimizer
has lots of problems with my queries and ends up doing silly sorts.
As a simple example, one query goes like this:

select avg(f1) from t1 group by f2;

This results in a plan like:

 Aggregate  (cost=171672.95..180304.41 rows=115086 width=32)
   ->  Group  (cost=171672.95..177427.26 rows=1150862 width=32)
 ->  Sort  (cost=171672.95..174550.10 rows=1150862 width=32)
   Sort Key: f2
   ->  Seq Scan on t1  (cost=0.00..39773.62 rows=1150862 width=32)

This is of course stupid, because it sorts 1M rows, which probably
means it has to hit disk (sort_mem can only be so large).  Turns out
there are only about 20 different values of f2, so it would be much
better to aggregate without sorting.  This is the type of query which
runs while vacuum runs and I'm sure the two are just plain
incompatible.  vacuum is read intensive and this query is write
intensive.

Rob



---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] vacuum locking

2003-10-24 Thread Rob Nagler
Greg Stark writes:
> Note that pctfree/pctused are a big performance drain on the usual case. Try
> setting them to 0/100 on a table that doesn't get updates (like a many-many
> relation table) and see how much faster it is to insert and scan.

Right.  You can optimize each table independently.  The "usual" case
doesn't exist in most databases, I've found, which is why Oracle does
better. 

> Judging by the number of FAQ lists out there that explain various quirks of
> rollback segment configuration I wouldn't say it's so easily configured.

Maybe we just got lucky. :-)

> The biggest problem is on systems where there's a combination of both users.

As is ours.

> You need tremendous rollback segments to deal with the huge volume of oltp
> transactions that can occur during a single DSS query. And the DSS query
> performance is terrible as it has to check the rollback segments for a large
> portion of the blocks it reads.

The DSS issues only come into play I think if the queries are long.
This is our problem.  Postgres does a bad job with DSS, I believe.  I
mentioned the select avg(f1) from t1 group by f2 in another message.
If it were optimized for "standard" SQL, such as avg, sum, etc., I
think it would do a lot better with DSS-type problems.  Our problem
seems to be that the DSS queries almost always hit disk to sort.

> Arguably it's the other way around. Postgres's approach wins whenever most of
> the tuples in a table have been updated, in that case it just has to scan the
> whole table ignoring old records not visible to the transaction. Oracle has to
> consult the rollback segment for any recently updated tuple. Oracle's wins in
> the case where most of the tuples haven't changed so it can just scan the
> table without consulting lots of rollback segments.

I see what you're saying.  I'm not a db expert, just a programmer
trying to make his queries go faster, so I'll acknowledge that the
design is theoretically better. 

In practice, I'm still stuck.  As a simple example, this query
select avg(f1) from t1 group by f2

Takes 33 seconds (see explain analyze in another note in this thread)
to run on idle hardware with about 1GB available in the cache.  It's
clearly hitting disk to do the sort.  Being a dumb programmer, I
changed the query to:

select f1 from t1;

And wrote the rest in Perl.  It takes 4 seconds to run.  Why?  The
Perl doesn't sort to disk, it aggregates in memory.  There are 18 rows
returned.  What I didn't mention is that I originally had:

select avg(f1), t2.name from t1, t2 where t2.f2 = t1.f2 group by t2.name;

Which is much worse:

 Aggregate  (cost=161046.30..162130.42 rows=8673 width=222) (actual 
time=72069.10..87455.69 rows=18 loops=1)
   ->  Group  (cost=161046.30..161479.95 rows=86729 width=222) (actual 
time=71066.38..78108.17 rows=963660 loops=1)
 ->  Sort  (cost=161046.30..161263.13 rows=86729 width=222) (actual 
time=71066.36..72445.74 rows=963660 loops=1)
   Sort Key: t2.name
   ->  Merge Join  (cost=148030.15..153932.66 rows=86729 width=222) 
(actual time=19850.52..27266.40 rows=963660 loops=1)
 Merge Cond: ("outer".f2 = "inner".f2)
 ->  Sort  (cost=148028.59..150437.74 rows=963660 width=58) 
(actual time=19850.18..21750.12 rows=963660 loops=1)
   Sort Key: t1.f2
   ->  Seq Scan on t1  (cost=0.00..32479.60 rows=963660 
width=58) (actual time=0.06...39 rows=963660 loops=1)
 ->  Sort  (cost=1.56..1.60 rows=18 width=164) (actual 
time=0.30..737.59 rows=931007 loops=1)
   Sort Key: t2.f2
   ->  Seq Scan on t2  (cost=0.00..1.18 rows=18 width=164) 
(actual time=0.05..0.08 rows=18 loops=1)
 Total runtime: 87550.31 msec

Again, there are about 18 values of f2.  The optimizer even knows this
(it's a foreign key to t2.f2), but instead it does the query plan in
exactly the wrong order.  It hits disk probably 3 times as much as the
simpler query judging by the amount of time this query takes (33 vs 88
secs).  BTW, adding an index to t1.f2 has seriously negative effects
on many other DSS queries.

I'm still not sure that the sort problem is our only problem when
vacuum runs.  It's tough to pin down.  We'll be adding more memory to
see if that helps with the disk contention.

Rob



---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [PERFORM] vacuum locking

2003-10-27 Thread Rob Nagler
Greg Stark writes:
> Sorry I was unclear. By "usual case" I meant reading, as opposed to updates.
> The size of the on-disk representation turns out to be a major determinant in
> a lot of database applications, since the dominant resource is i/o bandwidth.
> Try doing a fresh import of a large table with pctfree 0 pctuse 100 and
> compare how long a select takes on it compared to the original table.

BTW, I greatly appreciate your support on this stuff.  This list is a
fantastic resource.

I think we agree.  The question is what is the workload.  On tables
without updates, postgres will be fast enough.  However, postgres is
slow on tables with updates afaict.  I think of OLTP as a system with
updates.  One can do DSS on an OLTP database with Oracle, at least it
seems to work for one of our projects.

> FIrstly, that type of query will be faster in 7.4 due to implementing a new
> method for doing groups called hash aggregates.

We'll be trying it as soon as it is out.

> Secondly you could try raising sort_mem. Postgres can't know how much memory
> it really has before it swaps, so there's a parameter to tell it. And swapping
> would be much worse than doing disk sorts.

It is at 8000.  This is probably as high as I can go with multiple
postmasters.  The sort area is shared in Oracle (I think :-) in the
UGA.

> You can raise sort_mem to tell it how much memory it's allowed to
> use before it goes to disk sorts. You can even use ALTER SESSION to
> raise it in a few DSS sessions but leave it low the many OLTP
> sessions. If it's high in OLTP sessions then you could quickly hit
> swap when they all happen to decide to use the maximum amount at the
> same time. But then you don't want to be doing big sorts in OLTP
> sessions anyways.

This is a web app.  I can't control what the user wants to do.
Sometimes they update data, and other times they simply look at it.

I didn't find ALTER SESSION for postgres (isn't that Oracle?), so I
set sort_mem in the conf file to 512000, restarted postgres.  Reran
the simpler query (no name) 3 times, and it was still 27 secs.

> Unfortunately there's no way to tell how much memory it thinks it's
> going to use. I used to use a script to monitor the pgsql_tmp
> directory in the database to watch for usage.

I don't have to.  The queries that run slow are hitting disk.
Anything that takes a minute has to be writing to disk.

> Well, first of all it doesn't really because you said to group by t2.name not
> f1. You might expect it to at least optimize something like this:

I put f2 in the group by, and it doesn't matter.  That's the point.
It's the on-disk sort before the aggregate that's killing the query.

> but even then I don't think it actually is capable of using foreign keys as a
> hint like that. I don't think Oracle does either actually, but I'm not sure.

I'll be finding out this week.

> To convince it to do the right thing you would have to do either:
> 
> SELECT a, t2.name 
>   FROM (SELECT avg(f1),f2 FROM t1 GROUP BY f2) AS t1 
>   JOIN t2 USING (f2)
> 
> Or use a subquery:
> 
> SELECT a, (SELECT name FROM t2 WHERE t2.f2 = t1.f2)
>   FROM t1
>  GROUP BY f2 

This doesn't solve the problem.  It's the GROUP BY that is doing the
wrong thing.  It's grouping, then aggregating.

Rob



---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] vacuum locking

2003-10-27 Thread Rob Nagler
Greg Stark writes:
> I don't understand why you would expect overwriting to win here. 
> What types of updates do you do on these tables? 

These are statistics that we're adjusting.  I think that's pretty
normal stuff.  The DSS component is the avg() of these numbers on
particular groups.  The groups are related to foreign keys to
customers and other things.

> Normally I found using update on such a table was too awkward to
> contemplate so I just delete all the relation records that I'm
> replacing for the key I'm working with and insert new ones. This
> always works out to be cleaner code. In fact I usually leave such
> tables with no UPDATE grants on them.

In accounting apps, we do this, too.  It's awkward with all the
relationships to update all the records in the right order.  But
Oracle wins on delete/insert, too, because it reuses the tuples it
already has in memory, and it can reuse the same foreign key index
pages, too, since the values are usually the same.

The difference between Oracle and postgres seems to be optimism.
postgres assumes the transaction will fail and/or that a transaction
will modify lots of data that is used by other queries going on in
parallel.  Oracle assumes that the transaction is going to be
committed, and it might as well make the changes in place.

> In that situation I would have actually expected Postgres to do as well as or
> better than Oracle since that makes them both functionally
> equivalent.

I'll find out soon enough. :-)

Rob



---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] vacuum locking

2003-10-29 Thread Rob Nagler
Greg Stark writes:
> > > SELECT a, (SELECT name FROM t2 WHERE t2.f2 = t1.f2)
> > >   FROM t1
> > >  GROUP BY f2 
> > 
> > This doesn't solve the problem.  It's the GROUP BY that is doing the
> > wrong thing.  It's grouping, then aggregating.
> 
> But at least in the form above it will consider using an index on f2, and it
> will consider using indexes on t1 and t2 to do the join.

There are 20 rows in t2, so an index actually slows down the join.
I had to drop the index on t1.f2, because it was trying to use it
instead of simply sorting 20 rows.

I've got preliminary results for a number of "hard" queries between
oracle and postgres (seconds):

 PG ORA 
  0   5 q1
  1   0 q2
  0   5 q3
  2   1 q4
219   7 q5
217   5 q6
 79   2 q7
 31   1 q8

These are averages of 10 runs of each query.  I didn't optimize
pctfree, etc., but I did run analyze after the oracle import.

One of the reasons postgres is faster on q1-q4 is that postgres
supports OFFSET/LIMIT, and oracle doesn't.  q7 and q8 are the queries
that I've referred to recently (avg of group by).
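
(To illustrate the point -- this example is mine, not from the
benchmark: the Postgres form is simply "... ORDER BY x LIMIT 1 OFFSET
674", while the usual Oracle workaround is a nested ROWNUM query such
as:)

SELECT * FROM (
    SELECT a.*, ROWNUM rn FROM ( ... ORDER BY x DESC ) a
    WHERE ROWNUM <= 675
) WHERE rn > 674;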

q5 and q6 are too complex to discuss here, but the fundamental issue
is the order in which postgres decides to do things.  The choice for
me is clear: the developer time trying to figure out how to make the
planner do the "obviously right thing" has been too high with
postgres.  These tests demonstrate to me that for even complex queries,
oracle wins for our problem.

It looks like we'll be migrating to oracle for this project from these
preliminary results.  It's not just the planner problems.  The
customer is more familiar with oracle, and the vacuum performance is
another problem.

Rob

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] vacuum locking

2003-10-30 Thread Rob Nagler
Josh Berkus writes:
> I hope that you'll stay current with PostgreSQL developments so that you can 
> do a similarly thorough evaluation for your next project.

Oh, no worries.  This project just happens to be a tough one.  We're
heavily invested in Postgres.  Other projects we maintain that use
Postgres are zoescore.com, colosla.org, and paintedsnapshot.com.

I am currently working on a very large project where the customer is
very committed to Postgres/open source.  We're in discussions about
what to do about the scalability problems we saw in the other project.
You can help by addressing a dilemma we (my new customer and I) see.
I apologize for the length of what follows, but I'm trying to be as
clear as possible about our situation.

I have had a lot push back from the core Postgres folks on the idea of
planner hints, which would go a long way to solve the performance
problems we are seeing.  I presented an alternative approach: have a
"style sheet" (Scribe, LaTex) type of solution in the postmaster,
which can be customized by end users.  That got no response so I
assume it wasn't in line with the "Postgres way" (more below).

The vacuum problem is very serious for the problematic database to the
point that one of my customer's customers said:

However, I am having a hard time understanding why the system is so
slow... from my perspective it seems like you have some fundamental
database issues that need to be addressed.

This is simply unacceptable, and that's why we're moving to Oracle.
It's very bad for my business reputation.

I don't have a ready solution to vacuuming, and none on the list have
been effective.  We'll be adding more memory, but it seems to be a disk
bandwidth problem.  I run Oracle on a much slower system, and I've never
noticed problems of this kind, even when a database-wide validation is
running.  When vacuum is running, it's going through the entire
database, and that pretty much trashes all other queries, especially
DSS queries.  As always it is just software, and there's got to be an
80/20 solution.

Our new project is large, high-profile, but not as data intensive as
the problematic one.  We are willing to commit significant funding and
effort to make Postgres faster.  We are "business value" driven.  That
means we solve problems practically instead of theoretically.  This
seems to be in conflict with "the Postgres way", which seems to be
more theoretical.  Our business situation comes ahead of theories.

My customer (who monitors this list) and I believe that our changes
would not be accepted back into the Postgres main branch.  That
presents us with a difficult situation, because we don't want to own a
separate branch.  (Xemacs helped push emacs, and maybe that's what has
to happen here, yet it's not a pretty situation.)

We'll be meeting next week to discuss the situation, and how we'll go
forward.   We have budget in 2003 to spend on this, but only if the
situation can be resolved.  Otherwise, we'll have to respect the data
we are seeing, and think about our choice of technologies.

Thanks for the feedback.

Rob



---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] vacuum locking

2003-10-30 Thread Rob Nagler
scott.marlowe writes:
> t2 was 'vacuum full'ed and analyzed, right?  Just guessing.

Fresh import.  I've been told this includes an ANALYZE.

Rob



---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] vacuum locking

2003-10-30 Thread Rob Nagler
Tom Lane writes:
> Rob Nagler <[EMAIL PROTECTED]> writes:
> > q5 and q6 are too complex to discuss here,
> 
> How do you expect us to get better if you don't show us the problems?

With all due respect and thanks for the massive amount of help, I have
presented the problems.  q5 and q6 are a subset of the following
general problems:

* Multiple ORDER BY results in no index used.
  Solution: drop multiple ORDER BY, only use first

* Vacuum locks out interactive users
  Solution: don't run vacuum full and only run vacuum at night

* Low cardinality index on large table confuses planner
  Solution: Drop (foreign key) index, which hurts other performance

* Grouped aggregates result in disk sort
  Solution: Wait to 7.4 (may work), or write in Perl (works today)

* Extreme non-linear performance (crossing magic number in
  optimizer drops performance three orders of magnitude)
  Solution: Don't cross magic number, or code in Perl

The general problem is that our system generates 90% of the SQL we
need.  There are many advantages to this, such as being able to add
OFFSET/LIMIT support with a few lines of code in a matter of hours.
Every time we have to custom code a query, or worse, code it in Perl,
we lose many benefits.  I understand the need to optimize queries, but
my general experience with Oracle is that I don't have to do this very
often.  When the 80/20 rule inverts, there's something fundamentally
wrong with the model.  That's where we feel we're at.  It's cost us a
tremendous amount of money to deal with these query optimizations.

The solution is not to fix the queries, but to address the root
causes.  That's what my other note in this thread is about.  I hope
you understand the spirit of my suggestion, and work with us to
finding an acceptable approach to the general problems.

> BTW, have you tried any of this with a 7.4beta release?

I will, but for my other projects, not this one.  I'll run this data,
because it's a great test case.

We have a business decision to make: devote more time to Postgres or
go with Oracle.  I spent less than a day getting the data into Oracle
and creating the benchmark.  The payoff is clear now.  The risk of
7.4 is still very high, because the vacuum problem still looms and,
simply put, past performance is a good indicator of future performance.
Going forward, there's no choice.  We've had to limit end-user
functionality to get Postgres working as well as it does, and that's
way below where Oracle is without those same limits and without any
effort put into tuning.

Thanks again for all your support.

Rob



---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] Pg+Linux swap use

2003-10-31 Thread Rob Sell
Not being one to hijack threads, but I haven't heard of this performance hit
when using HT.  I have what should by all rights be a pretty fast server: dual
2.4 Xeons with HT, a 205GB RAID 5 array, and 1 gig of memory.  And it is only 50% as
fast as my old server, which was a dual AMD MP 1400 with a 45GB RAID 5
array and 1GB of RAM.  I have read everything I could find on Pg performance and
tweaked all the variables that were suggested, and nothing.  Which is why I
subscribed to this list; I've just been lurking so far, but this caught my eye. 

Rob

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Andrew Sullivan
Sent: Friday, October 31, 2003 8:36 AM
To: [EMAIL PROTECTED]
Subject: Re: [PERFORM] Pg+Linux swap use

On Fri, Oct 31, 2003 at 12:03:59PM -0200, alexandre :: aldeia digital wrote:
> Scott, Jeff and Shridhar:
> 
> 1 GB RAM :)
> 
> The stock kernels are not the same, HyperThreading enabled. 80

Some people have reported that things actually slow down with HT
enabled.  Have you tried turning it off?

A

-- 

Andrew Sullivan 204-4141 Yonge Street
Afilias CanadaToronto, Ontario Canada
<[EMAIL PROTECTED]>  M2P 2A8
 +1 416 646 3304 x110


---(end of broadcast)---
TIP 8: explain analyze is your friend


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Pg+Linux swap use

2003-10-31 Thread Rob Sell
For the record, I am running on SuSE with a pretty much stock kernel.  Not to
sound naïve, but is turning off HT something done in the BIOS?

Rob

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Bill Moran
Sent: Friday, October 31, 2003 9:56 AM
To: [EMAIL PROTECTED]
Subject: Re: [PERFORM] Pg+Linux swap use

Just for an additional viewpoint.  I'm finishing up a project based on
FreeBSD
and PostgreSQL.  The target server is a Dual 2.4G Intel machine.  I have
tested
the application with hyperthreading enabled and disabled.  To all
appearances,
enabling hyperthreading makes the box act like a quad, with the expected
increase
in processing capability - _for_this_application_.

I have also heard the claims and seen the tests that show hyperthreading
occasionally decreasing performance.  I think in the end, you just have to
test your particular application to see how it reacts.

Rob Sell wrote:
> Not being one to hijack threads, but I haven't heard of this performance hit
> when using HT. I have what should by all rights be a pretty fast server, dual
> 2.4 Xeons with HT, a 205gb raid 5 array, and 1 gig of memory. And it is only 50% as
> fast as my old server which was a dual AMD MP 1400 with a 45gb raid 5
> array and 1gb of ram. I have read everything I could find on Pg performance,
> tweaked all the variables that were suggested, and nothing. Which is why I
> subscribed to this list, just been lurking so far but this caught my eye.
> 
> Rob
> 
> -Original Message-
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Andrew
Sullivan
> Sent: Friday, October 31, 2003 8:36 AM
> To: [EMAIL PROTECTED]
> Subject: Re: [PERFORM] Pg+Linux swap use
> 
> On Fri, Oct 31, 2003 at 12:03:59PM -0200, alexandre :: aldeia digital
wrote:
> 
>>Scott, Jeff and Shridhar:
>>
>>1 GB RAM :)
>>
>>The stock kernels are not the same, HyperThreading enabled. 80
> 
> 
> Some people have reported that things actually slow down with HT
> enabled.  Have you tried turning it off?
> 
> A
> 


-- 
Bill Moran
Potential Technologies
http://www.potentialtech.com


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html




Re: [PERFORM] Where to start for performance problem?

2003-11-25 Thread Rob Fielding
The problem with giving suggestions about increasing performance is 
that one person's increase is another person's decrease.

Having said that, there are a few general suggestions:

Set up some shared memory, about a tenth of your available RAM, and 
configure shared_buffers and max_connections correctly. I've used the 
following formula, ripped off the net from somewhere. It's not entirely 
accurate, as other settings steal a little shared memory, but it works 
for the most part:

((1024*RAM_SIZE) - (14.2 * max_connections) - 250) / 8.2

As I say, it should get you a good value; otherwise lower it bit by bit 
if you have trouble starting your db.
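
To make that concrete, here's a worked example with made-up numbers (a box 
where you give roughly 100MB to the shared segment and run 
max_connections = 64 - purely illustrative, adjust for your own machine). 
You can even let the database do the arithmetic:

   -- illustrative values only, not a recommendation
   SELECT floor(((1024 * 100) - (14.2 * 64) - 250) / 8.2) AS shared_buffers;
   -- => 12346, i.e. ~12346 buffers of 8kB, roughly 96MB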

Increase effective_cache_size (50%-70% of available RAM) and sort_mem (about 
1/20th of RAM), and lower your random_page_cost to around 2 or less (as low as 
0.3) if you have fast SCSI drives in a RAID 10 set-up - this was a big speedup ;)
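
For a quick experiment you can set those per-session before touching 
postgresql.conf; e.g. (values picked for a hypothetical 1GB box, and note 
effective_cache_size is in 8kB pages here and sort_mem is in kB):

   SET effective_cache_size = 65536;  -- ~512MB, about half of RAM
   SET sort_mem = 51200;              -- ~50MB, about 1/20th of RAM
   SET random_page_cost = 2;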

But this might not be the answer, though. The values detailed above are 
for tuning an already stable setup.

Perhaps you need to look at your system resource usage. If performance is 
degrading over time, it sounds to me like you are slowly running out of 
memory and swap.

Generally, if I take something over, I'll try and get it onto my terms. 
Have you tried importing the DB to a fresh installation, one where you 
know sensible defaults are set, so you aren't inheriting any cruft from 
the previous sysadmin?

To be honest though, I've never had pg actually shut down because 
it was running so badly - I just wouldn't think it would do that.

--

Rob Fielding
[EMAIL PROTECTED]
www.dsvr.co.uk  Development Designer Servers Ltd

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
 joining column's datatypes do not match


Re: [PERFORM] tuning questions

2003-12-04 Thread Rob Fielding

> I've tweaked shared buffers to 8192, pushed sort memory to 2048, vacuum
> memory to 8192, and effective cache size to 1.
> /proc/sys/kernel/shmmax is set to 16 and /proc/sys/fs/file-max
> is set to 65536. Ulimit -n 3192.
Your shared memory is too high, and not even being used effectively. Your 
other settings are too low.

Ball park guessing here, but I'd say first read (and understand) this:

http://www.varlena.com/varlena/GeneralBits/Tidbits/perf.html

Then make shared memory about 10-20% of available RAM, and set:

((shmmax/1024) - ( 14.2 * max_connections ) - 250 ) / 8.2 = shared_buffers

decrease random_page_cost to 0.3, whack up sort_mem by 16 times and 
effective_cache_size to about 50% of RAM (depending on your other settings), 
and try that for starters.
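
For example (rough numbers only - assuming a shmmax of 128MB and 
max_connections = 32, neither of which is necessarily what you have):

   -- ((shmmax/1024) - (14.2 * max_connections) - 250) / 8.2
   SELECT floor(((134217728 / 1024) - (14.2 * 32) - 250) / 8.2) AS shared_buffers;
   -- => 15898 buffers of 8kB, roughly 124MB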

--

Rob Fielding
[EMAIL PROTECTED]
www.dsvr.co.uk  Development Designer Servers Ltd

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?
  http://www.postgresql.org/docs/faqs/FAQ.html


[PERFORM] Pgbench results

2003-12-06 Thread Rob Sell
Greetings all, 

I'm wondering if there is a website where people can submit their pgbench
results along with their hardware and configurations? If so, where is it? I
have yet to find any. I think this could be a very useful tool, not only for
people looking at setting up a new server but for people trying to tune
their db...

Thanks
Rob 


---(end of broadcast)---
TIP 8: explain analyze is your friend


[PERFORM] WAL Optimisation - configuration and usage

2004-02-28 Thread Rob Fielding
  48468
$ vmstat 5
  procs                      memory      swap          io     system         cpu
 r  b  w   swpd   free   buff  cache  si  so    bi    bo   in    cs  us  sy  id
 0  7  1  29588  10592  15700 809060   1   097750   103  13   9  79
 3  8  0  29588  11680  15736 807620   0   0  3313   438 1838  3559  19  13  68
 2 13  1  29588  12808  15404 800328   0   0  4470   445 1515  1752   7   7  86
 0  9  1  29588  10992  15728 806476   0   0  2933   781 1246  2686  14  10  76
 2  5  1  29588  11336  15956 807884   0   0  3354   662 1773  5211  27  17  57
 4  5  0  29696  13072  16020 813872   0  24  4282   306 2632  7862  45  25  31
 4  6  1  29696  10400  16116 815084   0   0  5086   314 2668  7893  47  26  27
 9  2  1  29696  13060  16308 814232  27   0  3927   748 2586  7836  48  29  23
 3  8  1  29696  10444  16232 812816   3   0  4015   433 2443  7180  47  28  25
 8  4  0  29696  10904  16432 812488   0   0  4537   500 2616  8418  46  30  24
 4  6  2  29696  11048  16320 810276   0   0  6076   569 1893  3919  20  14  66
 0  5  0  29696  10480  16600 813788   0   0  4595   435 2400  6215  33  21  46
 3  6  0  29696  10536  16376 812248   0   0  3802   504 2417  7921  43  25  32
 1  6  1  29696  11236  16500 809636   0   0  3691   357 2171  5199  24  15  61
 0 14  1  29696  10228  16036 801368   0   0  4038   561 1566  3288  16  12  72

Sorry it's so long but I thought some brief info would be better than 
not. Thanks for reading,

--

Rob Fielding
Development
Designer Servers Ltd
---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] WAL Optimisation - configuration and usage

2004-02-29 Thread Rob Fielding
Rod Taylor wrote:
>> random_page_cost = 0.5
>
> Try a value of 2 for a while.

OK, thanks Richard and Rod. I've upped this to 2. I think I left this 
over from a previous play with settings on my IDE RAID 0 workstation. 
It seemed to have a good effect being set as a low float, so it stuck.

I've set it to 2.

From another post off list, I've also bumped up

max_fsm_relations = 1000   # min 10, fsm
max_fsm_pages = 2  # min 1000, fs
vacuum_mem = 32768  # min 1024
as they did seem a little low. I'm hesitant to set them too high at this 
stage as I'd prefer to keep as much RAM available for runtime at this time.

I'm still hoping that perhaps the uber-pgadmin Mr Lane might reply about 
my WAL issue :) However, I'm getting the feeling now that the server is 
running with a much higher level of performance than it has been. Won't 
know until tomorrow though.

Cheers,

--
Rob Fielding
Development
Designer Servers Ltd
---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
 joining column's datatypes do not match


Re: [PERFORM] WAL Optimisation - configuration and usage

2004-03-01 Thread Rob Fielding
A further update to my WAL experimentation: pg_xlog files have increased 
to 81, and checking today, up to 84. Currently there's nothing much going on 
with the server save a background process running a select every 30 seconds 
with almost no impact (according to IO from vmstat).

This in itself is a good sign - an improvement on how it was running last 
week - but I'd still like to get clarification on WAL file usage if possible.

Log file tailing has nothing more interesting than a whole set of 
"recycled transaction log file" entries :

2004-03-01 16:01:55 DEBUG:  recycled transaction log file 00710017
2004-03-01 16:07:01 DEBUG:  recycled transaction log file 00710018
2004-03-01 16:17:14 DEBUG:  recycled transaction log file 00710019
2004-03-01 16:22:20 DEBUG:  recycled transaction log file 0071001A
2004-03-01 16:32:31 DEBUG:  recycled transaction log file 0071001B
2004-03-01 16:37:36 DEBUG:  recycled transaction log file 0071001C
2004-03-01 16:47:48 DEBUG:  recycled transaction log file 0071001D
2004-03-01 16:52:54 DEBUG:  recycled transaction log file 0071001E
2004-03-01 17:03:05 DEBUG:  recycled transaction log file 0071001F
Looks kinda automated, but the times aren't quite even at around 6-10 
minutes apart.

cheers,
--
Rob Fielding
[EMAIL PROTECTED]
www.dsvr.co.uk  Development Designer Servers Ltd

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
 joining column's datatypes do not match


Re: [PERFORM] Time to put theory to the test?

2011-04-25 Thread Rob Wultsch
On Mon, Apr 25, 2011 at 12:30 PM, J Sisson  wrote:
> machines, and virtually every MySQL machine has required data cleanup
> and table scans and tweaks to get it back to "production" status.

Tip from someone who manages thousands of MySQL servers: Use InnoDB
when using MySQL. Using a crash-unsafe product will yield undesirable
results when a server crashes. It is also faster for many use cases.

InnoDB is crash safe. It is just that simple.

-- 
Rob Wultsch
wult...@gmail.com

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Hardware advice for scalable warehouse db

2011-07-15 Thread Rob Wultsch
On Fri, Jul 15, 2011 at 11:49 AM, chris r.  wrote:
> Hi list,
>
> Thanks a lot for your very helpful feedback!
>
>> I've tested MD1000, MD1200, and MD1220 arrays before, and always gotten
>> seriously good performance relative to the dollars spent
> Great hint, but I'm afraid that's too expensive for us. But it's a great
> way to scale over the years, I'll keep that in mind.
>
> I had a look at other server vendors who offer 4U servers with slots for
> 16 disks for 4k in total (w/o disks), maybe that's an even
> cheaper/better solution for us. If you had the choice between 16 x 2TB
> SATA vs. a server with some SSDs for WAL/indexes and a SAN (with SATA
> disk) for data, what would you choose performance-wise?
>
> Again, thanks so much for your help.
>
> Best,
> Chris

SATA drives can easily flip bits and postgres does not checksum data,
so it will not automatically detect corruption for you. I would steer
well clear of SATA unless you are going to be using a fs like ZFS
which checksums data. I would hope that a SAN would detect this for
you, but I have no idea.


-- 
Rob Wultsch
wult...@gmail.com

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] TCP Overhead on Local Loopback

2012-04-01 Thread Rob Wultsch
On Sun, Apr 1, 2012 at 1:24 PM, Ofer Israeli  wrote:
> Hi all,
>
> We are running performance tests using PG 8.3 on a Windows 2008 R2 machine
> connecting locally over TCP.

8.3 will be not supported in under a year. Time to start testing upgrades.

http://www.postgresql.org/support/versioning/

-- 
Rob Wultsch
wult...@gmail.com

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] effective_cache_size on 32-bits postgres

2013-03-18 Thread Rob Wultsch
On Mon, Mar 18, 2013 at 10:53 AM, Rodrigo Barboza
 wrote:
> Hi guys, I am worried about the effective_cache_size.
> I run a 32-bits postgres installation on a machine with 64 bits kernel.
> Should I limit effective_cache_size to a maximum of 2.5gb?

That variable refers to the filesystem cache, so 32-bit pg should not matter.
Shared buffers and similar variables will be another matter.

Why the heck are you running 32 bit pg on a 64 bit system? You are
almost certainly doing it wrong.



--
Rob Wultsch
wult...@gmail.com


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM]

2013-04-21 Thread Rob Wultsch
On Sun, Apr 21, 2013 at 5:46 AM, sunil virmani  wrote:

> My DB version is little old - 8.1.18.
>

Your db is exceptionally old and very much unsupported. Vacuum has
massively improved since 8.1.

See http://www.postgresql.org/support/versioning/ regarding supported
versions.


[PERFORM] Deleting Rows From Large Tables

2013-05-17 Thread Rob Emery
Hi All,

We've got 3 quite large tables that, due to an unexpected surge in
usage (!), have grown to about 10GB each, with 72, 32 and 31 million
rows in them. I've been tasked with cleaning out about half of them; the
problem I've got is that even deleting the first 1,000,000 rows seems
to take an unreasonable amount of time. Unfortunately this is on quite
an old server (a Dell 2950 with a RAID-10 over 6 disks) running Postgres
8.4, which also serves other things like our logging systems.

If I run a sustained (more than about 5 minutes) delete, it'll have a
detrimental effect on the other services. I'm trying to batch up the
deletes into small chunks of approximately 1 month of data; even this
seems to take too long - I originally reduced it down to a single
day's data and had the same problem. I can keep decreasing the size of
the window I'm deleting, but I feel I must be doing something either
fundamentally wrong or over-complicating this enormously. I've
switched over to retrieving a list of IDs to delete, storing them in
temporary tables, and deleting based on the primary keys of each of the
tables with something similar to this:

BEGIN TRANSACTION;

CREATE TEMPORARY TABLE table_a_ids_to_delete (id INT);
CREATE TEMPORARY TABLE table_b_ids_to_delete (id INT);

INSERT INTO table_a_ids_to_delete
SELECT id FROM table_a WHERE purchased ='-infinity' AND created_at
< '2007-01-01T00:00:00';

INSERT INTO table_b_ids_to_delete
SELECT table_b_id FROM table_a_table_b_xref
INNER JOIN table_a_ids_to_delete ON (table_a_ids_to_delete.id =
table_a_table_b_xref.table_a_id);

DELETE FROM table_a_table_b_xref USING table_a_ids_to_delete
WHERE table_a_table_b_xref.table_a_id = table_a_ids_to_delete.id;

DELETE FROM table_b USING table_b_ids_to_delete
WHERE table_b.id = table_b_ids_to_delete.id;

DELETE FROM table_a USING table_a_ids_to_delete
WHERE table_a.id =  table_a_ids_to_delete.id;

COMMIT;

There are indices on table_a on the queried columns, table_b's primary
key is its id, and table_a_table_b_xref has an index on (table_a_id,
table_b_id). There are FKs defined on the xref table, hence why I'm
deleting from it first.
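
For reference, the schema is shaped roughly like this (names as above,
column types are my guesses):

   CREATE TABLE table_a (
       id         serial PRIMARY KEY,
       purchased  timestamp,
       created_at timestamp
       -- plus the other application columns
   );
   CREATE INDEX table_a_purchased_created_idx ON table_a (purchased, created_at);

   CREATE TABLE table_b (
       id serial PRIMARY KEY
       -- plus the other application columns
   );

   CREATE TABLE table_a_table_b_xref (
       table_a_id integer REFERENCES table_a (id),
       table_b_id integer REFERENCES table_b (id)
   );
   CREATE INDEX table_a_table_b_xref_idx
       ON table_a_table_b_xref (table_a_id, table_b_id);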

Does anyone have any ideas as to what I can do to make the deletes any
faster? I'm running out of ideas!

Thanks in advance,

--
Rob Emery


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Performant queries on table with many boolean columns

2016-04-20 Thread Rob Imig
Hey all,

New to the lists so please let me know if this isn't the right place for
this question.

I am trying to understand how to structure a table to allow for optimal
performance on retrieval. The data will not change frequently so you can
basically think of it as static and only concerned about optimizing reads
from basic SELECT...WHERE queries.

The data:

   - ~20 million records
   - Each record has 1 id and ~100 boolean properties
   - Each boolean property has ~85% of the records as true


The retrieval will always be something like "SELECT id FROM  WHERE
.

 will be some arbitrary set of the ~100 boolean columns and you
want the ids that match all of the conditions (true for each boolean
column). Example:
WHERE prop1 AND prop18 AND prop24


The obvious thing seems to be to make a table with ~100 columns, with 1 column
for each boolean property. Though, what type of indexing strategy would one
use on that table? It doesn't make sense to do BTREE. Is there a better way to
structure it?
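
To illustrate, I mean something along these lines (hypothetical names,
trimmed to three of the ~100 columns):

   CREATE TABLE properties (
       id     bigint PRIMARY KEY,
       prop1  boolean NOT NULL,
       prop18 boolean NOT NULL,
       prop24 boolean NOT NULL
       -- ... and so on, up to ~100 boolean columns
   );

   -- the retrieval pattern from above
   SELECT id FROM properties WHERE prop1 AND prop18 AND prop24;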


Any and all advice/tips/questions appreciated!

Thanks,
Rob


Re: [PERFORM] Performant queries on table with many boolean columns

2016-04-21 Thread Rob Imig
Hey all,

Lots of interesting suggestions! I'm loving it.

I just came back to this a bit earlier today and made a sample table to see
what non-indexed performance would be. I constructed data just like above
(12M rows and 80% true for all 100 boolean columns).
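
In case anyone wants to reproduce it, the sample data can be generated with
something along these lines (trimmed to three of the 100 columns here):

   CREATE TABLE bloomtest AS
   SELECT g AS id,
          random() < 0.8 AS prop0,
          random() < 0.8 AS prop1,
          random() < 0.8 AS prop2
          -- ... and so on for prop3 through prop99
   FROM generate_series(1, 12000000) AS g;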

Here's an analyze for what I'd expect to be the types of queries that I'll
be handling from the frontend. I would expect around 40-70 properties per
query.

Now I'm going to start experimenting with some of the ideas above and other
tuning. This isn't as bad as I thought it would be, though I would like to
get this under 200ms.

rimig=# explain analyze select count(*) from bloomtest where prop0 AND
prop1 AND prop2 AND prop3 AND prop4 AND prop5 AND prop6 AND prop7 AND prop8
AND prop9 AND prop10 AND prop11 AND prop12 AND prop13 AND prop14 AND prop15
AND prop16 AND prop17 AND prop18 AND prop19 AND prop20 AND prop21 AND
prop22 AND prop23 AND prop24 AND prop25 AND prop26 AND prop27 AND prop28
AND prop29 AND prop30 AND prop31 AND prop32 AND prop33 AND prop34 AND
prop35 AND prop36 AND prop37 AND prop38 AND prop39 AND prop40 AND prop41
AND prop42 AND prop43 AND prop44 AND prop45 AND prop46 AND prop47 AND
prop48 AND prop49 AND prop50 AND prop51 AND prop52 AND prop53 AND prop54
AND prop55 AND prop56 AND prop57 AND prop58 AND prop59 AND prop60 AND
prop61 AND prop62 AND prop63 AND prop64;

 Aggregate  (cost=351563.03..351563.04 rows=1 width=0) (actual
time=2636.829..2636.829 rows=1 loops=1)

   ->  Seq Scan on bloomtest  (cost=0.00..351563.02 rows=3 width=0) (actual
time=448.200..2636.811 rows=9 loops=1)

 Filter: (prop0 AND prop1 AND prop2 AND prop3 AND prop4 AND prop5
AND prop6 AND prop7 AND prop8 AND prop9 AND prop10 AND prop11 AND prop12
AND prop13 AND prop14 AND prop15 AND prop16 AND prop17 AND prop18 AND
prop19 AND prop20 AND prop21 AND prop22 AND prop23 AND prop24 AND prop25
AND prop26 AND prop27 AND prop28 AND prop29 AND prop30 AND prop31 AND
prop32 AND prop33 AND prop34 AND prop35 AND prop36 AND prop37 AND prop38
AND prop39 AND prop40 AND prop41 AND prop42 AND prop43 AND prop44 AND
prop45 AND prop46 AND prop47 AND prop48 AND prop49 AND prop50 AND prop51
AND prop52 AND prop53 AND prop54 AND prop55 AND prop56 AND prop57 AND
prop58 AND prop59 AND prop60 AND prop61 AND prop62 AND prop63 AND prop64)

 Rows Removed by Filter: 1191

 Total runtime: 2636.874 ms

On Thu, Apr 21, 2016 at 12:45 PM, Jeff Janes  wrote:

> On Wed, Apr 20, 2016 at 11:54 AM, Teodor Sigaev  wrote:
> >>
> >> The obvious thing seems to make a table with ~100 columns, with 1 column
> >> for each boolean property. Though, what type of indexing strategy would
> >> one use on that table? Doesn't make sense to do BTREE. Is there a better
> >> way to structure it?
> >>
> > looks like a deal for contrib/bloom index in upcoming 9.6 release
>
> Not without doing a custom compilation with an increased INDEX_MAX_KEYS:
>
> ERROR:  cannot use more than 32 columns in an index
>
> But even so, I'm skeptical this would do better than a full scan.  It
> would be interesting to test that.
>
> Cheers,
>
> Jeff
>


Re: [PERFORM] Performant queries on table with many boolean columns

2016-04-22 Thread Rob Imig
.595..9359.566 rows=9 loops=1)

   Filter: (((bitstr)::"bit" &
B'1'::"bit")
=
B'1'::"bit")

   Rows Removed by Filter: 1191

 Total runtime: 9359.593 ms

(4 rows)


*Time: 9360.072 ms*


On Thu, Apr 21, 2016 at 3:34 PM, Rob Imig  wrote:

> Hey all,
>
> Lots of interesting suggestions! I'm loving it.
>
> Just came back to this a bit earlier today and made a sample table to see
> what non-index performance would be. Constructed data just like above (used
> 12M rows and 80% true for all 100 boolean columns)
>
> Here's an analyze for what I'd expect to be the types of queries that I'll
> be handling from the frontend. I would expect around 40-70 properties per
> query.
>
> Now I'm going to start experimenting with some ideas above and other
> tuning. This isn't as bad as I thought it would be, though would like to
> get this under 200ms.
>
> rimig=# explain analyze select count(*) from bloomtest where prop0 AND
> prop1 AND prop2 AND prop3 AND prop4 AND prop5 AND prop6 AND prop7 AND prop8
> AND prop9 AND prop10 AND prop11 AND prop12 AND prop13 AND prop14 AND prop15
> AND prop16 AND prop17 AND prop18 AND prop19 AND prop20 AND prop21 AND
> prop22 AND prop23 AND prop24 AND prop25 AND prop26 AND prop27 AND prop28
> AND prop29 AND prop30 AND prop31 AND prop32 AND prop33 AND prop34 AND
> prop35 AND prop36 AND prop37 AND prop38 AND prop39 AND prop40 AND prop41
> AND prop42 AND prop43 AND prop44 AND prop45 AND prop46 AND prop47 AND
> prop48 AND prop49 AND prop50 AND prop51 AND prop52 AND prop53 AND prop54
> AND prop55 AND prop56 AND prop57 AND prop58 AND prop59 AND prop60 AND
> prop61 AND prop62 AND prop63 AND prop64;
>
>  Aggregate  (cost=351563.03..351563.04 rows=1 width=0) (actual
> time=2636.829..2636.829 rows=1 loops=1)
>
>->  Seq Scan on bloomtest  (cost=0.00..351563.02 rows=3 width=0)
> (actual time=448.200..2636.811 rows=9 loops=1)
>
>  Filter: (prop0 AND prop1 AND prop2 AND prop3 AND prop4 AND prop5
> AND prop6 AND prop7 AND prop8 AND prop9 AND prop10 AND prop11 AND prop12
> AND prop13 AND prop14 AND prop15 AND prop16 AND prop17 AND prop18 AND
> prop19 AND prop20 AND prop21 AND prop22 AND prop23 AND prop24 AND prop25
> AND prop26 AND prop27 AND prop28 AND prop29 AND prop30 AND prop31 AND
> prop32 AND prop33 AND prop34 AND prop35 AND prop36 AND prop37 AND prop38
> AND prop39 AND prop40 AND prop41 AND prop42 AND prop43 AND prop44 AND
> prop45 AND prop46 AND prop47 AND prop48 AND prop49 AND prop50 AND prop51
> AND prop52 AND prop53 AND prop54 AND prop55 AND prop56 AND prop57 AND
> prop58 AND prop59 AND prop60 AND prop61 AND prop62 AND prop63 AND prop64)
>
>  Rows Removed by Filter: 1191
>
>  Total runtime: 2636.874 ms
>
> On Thu, Apr 21, 2016 at 12:45 PM, Jeff Janes  wrote:
>
>> On Wed, Apr 20, 2016 at 11:54 AM, Teodor Sigaev  wrote:
>> >>
>> >> The obvious thing seems to make a table with ~100 columns, with 1
>> column
>> >> for each boolean property. Though, what type of indexing strategy would
>> >> one use on that table? Doesn't make sense to do BTREE. Is there a
>> better
>> >> way to structure it?
>> >>
>> > looks like a deal for contrib/bloom index in upcoming 9.6 release
>>
>> Not without doing a custom compilation with an increased INDEX_MAX_KEYS:
>>
>> ERROR:  cannot use more than 32 columns in an index
>>
>> But even so, I'm skeptical this would do better than a full scan.  It
>> would be interesting to test that.
>>
>> Cheers,
>>
>> Jeff
>>
>
>