Re: [PERFORM] Partial index slower than regular index

2011-04-06 Thread Thom Brown
On 6 April 2011 05:44, Tom Lane  wrote:
> Thom Brown  writes:
>> The index doesn't get used.  There's probably a logical explanation,
>> which is what I'm curious about.
>
> Er ... it's broken?
>
> It looks like the index predicate expression isn't getting the right
> collation assigned, so predtest.c decides the query doesn't imply the
> index's predicate.  Too tired to look into exactly why right now, but
> it's clearly bound up in all the recent collation changes.

Testing it again with very explicit collations, it still has issues:

CREATE INDEX indextest_stuff ON indextest(stuff COLLATE "en_GB.UTF-8")
WHERE stuff COLLATE "en_GB.UTF-8" = 'bark' COLLATE "en_GB.UTF-8";

postgres=# explain analyze select * from indextest where stuff collate
"en_GB.UTF-8" = 'bark' collate "en_GB.UTF-8";
QUERY PLAN
---
 Seq Scan on indextest  (cost=0.00..143387.00 rows=8312 width=9)
(actual time=163.759..1308.316 rows=8000 loops=1)
   Filter: ((stuff)::text = 'bark'::text COLLATE "en_GB.UTF-8")
 Total runtime: 1308.821 ms
(3 rows)

But I'm possibly missing the point here.
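
For reference, a rough, self-contained sketch of the case being discussed (the
test data below is invented purely for illustration; the real table has roughly
8000 matching rows):

CREATE TABLE indextest (id serial PRIMARY KEY, stuff text);
INSERT INTO indextest (stuff)
  SELECT CASE WHEN i % 500 = 0 THEN 'bark' ELSE md5(i::text) END
  FROM generate_series(1, 4000000) AS s(i);
ANALYZE indextest;

-- The plain form, with no explicit COLLATE clauses, that started the thread:
CREATE INDEX indextest_stuff ON indextest (stuff) WHERE stuff = 'bark';

EXPLAIN SELECT * FROM indextest WHERE stuff = 'bark';
-- Expected once predtest.c handles collations correctly: a scan using
-- indextest_stuff rather than the Seq Scan shown above.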

-- 
Thom Brown
Twitter: @darkixion
IRC (freenode): dark_ixion
Registered Linux user: #516935

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [PERFORM] Partial index slower than regular index

2011-04-06 Thread Thom Brown
On 6 April 2011 00:02, Kenneth Marshall  wrote:
> The stats seem off. Are you certain that an analyze has run?
>
> Cheers,
> Ken
>

Yes, an ANALYZE was definitely run against the table.

-- 
Thom Brown
Twitter: @darkixion
IRC (freenode): dark_ixion
Registered Linux user: #516935

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [PERFORM] Postgres Performance Tuning

2011-04-06 Thread Ákos Gábriel
On Apr 5, 2011, at 9:33 AM, Adarsh Sharma wrote:

> Now I have to start more queries on Database Server and issue new connections 
> after some time. Why the cached memory is not freed.

It's freed on-demand.

> Flushing the cache memory is needed & how it could use so much if I set

Why would forced flushing be needed? And why would it be useful? It is not.

> effective_cache_size = 4096 MB.

Watch the "cached" field of free's output and set effective_cache_size to that 
amount (given that your server is running postgres only and has no other major
tasks).
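
A sketch of what that looks like in practice (all numbers below are made up
for illustration):

$ free -m
             total       used       free     shared    buffers     cached
Mem:          7982       7629        353          0        215       4096
-/+ buffers/cache:       3318       4664
Swap:         2047          0       2047

# postgresql.conf -- roughly the "cached" figure above on a machine dedicated
# to PostgreSQL; changing it only needs a reload, not a restart:
effective_cache_size = 4096MB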

> I think if i issue some new select queries on large set of data, it will use 
> Swap Memory & degrades Performance.

Have you ever tried that? It will not.

> Please correct if I'm wrong.

You seem to know very little about Unix/Linux memory usage and how to interpret 
the tools' output.
Please read some (very basic) documentation for sysadmins regarding these 
subjects.
It will help you a lot to understand how things work.

-- 
Akos Gabriel
General Manager
Liferay Hungary Ltd.
Liferay Hungary Symposium, May 26, 2011 | Register today: 
http://www.liferay.com/hungary2011




[PERFORM] help speeding up a query in postgres 8.4.5

2011-04-06 Thread Maria L. Wilson
Would really appreciate someone taking a look at the query below  
Thanks in advance!



this is on a linux box...
Linux dsrvr201.larc.nasa.gov 2.6.18-164.9.1.el5 #1 SMP Wed Dec 9 
03:27:37 EST 2009 x86_64 x86_64 x86_64 GNU/Linux


explain analyze
select MIN(IV.STRTDATE), MAX(IV.ENDDATE)
from GRAN_VER GV
left outer join INVENTORY IV on GV.GRANULE_ID = IV.GRANULE_ID, INVSENSOR 
INVS
where IV.INV_ID='65' and GV.GRANULE_ID = INVS.granule_id and 
INVS.sensor_id='13'



"Aggregate  (cost=736364.52..736364.53 rows=1 width=8) (actual 
time=17532.930..17532.930 rows=1 loops=1)"
"  ->  Hash Join  (cost=690287.33..734679.77 rows=336949 width=8) 
(actual time=13791.593..17323.080 rows=924675 loops=1)"

"Hash Cond: (invs.granule_id = gv.granule_id)"
"->  Seq Scan on invsensor invs  (cost=0.00..36189.41 
rows=1288943 width=4) (actual time=0.297..735.375 rows=1277121 loops=1)"

"  Filter: (sensor_id = 13)"
"->  Hash  (cost=674401.52..674401.52 rows=1270865 width=16) 
(actual time=13787.698..13787.698 rows=1270750 loops=1)"
"  ->  Hash Join  (cost=513545.62..674401.52 rows=1270865 
width=16) (actual time=1998.702..13105.578 rows=1270750 loops=1)"

"Hash Cond: (gv.granule_id = iv.granule_id)"
"->  Seq Scan on gran_ver gv  (cost=0.00..75224.90 
rows=4861490 width=4) (actual time=0.008..1034.885 rows=4867542 loops=1)"
"->  Hash  (cost=497659.81..497659.81 rows=1270865 
width=12) (actual time=1968.918..1968.918 rows=1270750 loops=1)"
"  ->  Bitmap Heap Scan on inventory iv  
(cost=24050.00..497659.81 rows=1270865 width=12) (actual 
time=253.542..1387.957 rows=1270750 loops=1)"

"Recheck Cond: (inv_id = 65)"
"->  Bitmap Index Scan on inven_idx1  
(cost=0.00..23732.28 rows=1270865 width=0) (actual time=214.364..214.364 
rows=1270977 loops=1)"

"  Index Cond: (inv_id = 65)"
"Total runtime: 17533.100 ms"

some additional info.
the table inventory is about 4481 MB and also has postgis types.
the table gran_ver is about 523 MB
the table INVSENSOR is about 217 MB

the server itself has 32G RAM with the following set in the postgres conf
shared_buffers = 3GB
work_mem = 64MB
maintenance_work_mem = 512MB
wal_buffers = 6MB

let me know if I've forgotten anything!  thanks a bunch!!

Maria Wilson
NASA/Langley Research Center
Hampton, Virginia
m.l.wil...@nasa.gov





Re: [PERFORM] help speeding up a query in postgres 8.4.5

2011-04-06 Thread Szymon Guz
On 5 April 2011 21:25, Maria L. Wilson  wrote:

>  Would really appreciate someone taking a look at the query below
> Thanks in advance!
>
>
> this is on a linux box...
> Linux dsrvr201.larc.nasa.gov 2.6.18-164.9.1.el5 #1 SMP Wed Dec 9 03:27:37
> EST 2009 x86_64 x86_64 x86_64 GNU/Linux
>
> explain analyze
> select MIN(IV.STRTDATE), MAX(IV.ENDDATE)
> from GRAN_VER GV
> left outer join INVENTORY IV on GV.GRANULE_ID = IV.GRANULE_ID, INVSENSOR
> INVS
> where IV.INV_ID='65' and GV.GRANULE_ID = INVS.granule_id and
> INVS.sensor_id='13'
>
>
> "Aggregate  (cost=736364.52..736364.53 rows=1 width=8) (actual
> time=17532.930..17532.930 rows=1 loops=1)"
> "  ->  Hash Join  (cost=690287.33..734679.77 rows=336949 width=8) (actual
> time=13791.593..17323.080 rows=924675 loops=1)"
> "Hash Cond: (invs.granule_id = gv.granule_id)"
> "->  Seq Scan on invsensor invs  (cost=0.00..36189.41 rows=1288943
> width=4) (actual time=0.297..735.375 rows=1277121 loops=1)"
> "  Filter: (sensor_id = 13)"
> "->  Hash  (cost=674401.52..674401.52 rows=1270865 width=16)
> (actual time=13787.698..13787.698 rows=1270750 loops=1)"
> "  ->  Hash Join  (cost=513545.62..674401.52 rows=1270865
> width=16) (actual time=1998.702..13105.578 rows=1270750 loops=1)"
> "Hash Cond: (gv.granule_id = iv.granule_id)"
> "->  Seq Scan on gran_ver gv  (cost=0.00..75224.90
> rows=4861490 width=4) (actual time=0.008..1034.885 rows=4867542 loops=1)"
> "->  Hash  (cost=497659.81..497659.81 rows=1270865
> width=12) (actual time=1968.918..1968.918 rows=1270750 loops=1)"
> "  ->  Bitmap Heap Scan on inventory iv
> (cost=24050.00..497659.81 rows=1270865 width=12) (actual
> time=253.542..1387.957 rows=1270750 loops=1)"
> "Recheck Cond: (inv_id = 65)"
> "->  Bitmap Index Scan on inven_idx1
> (cost=0.00..23732.28 rows=1270865 width=0) (actual time=214.364..214.364
> rows=1270977 loops=1)"
> "  Index Cond: (inv_id = 65)"
> "Total runtime: 17533.100 ms"
>
> some additional info.
> the table inventory is about 4481 MB and also has postgis types.
> the table gran_ver is about 523 MB
> the table INVSENSOR is about 217 MB
>
> the server itself has 32G RAM with the following set in the postgres conf
> shared_buffers = 3GB
> work_mem = 64MB
> maintenance_work_mem = 512MB
> wal_buffers = 6MB
>
> let me know if I've forgotten anything!  thanks a bunch!!
>
> Maria Wilson
> NASA/Langley Research Center
> Hampton, Virginia
> m.l.wil...@nasa.gov
>
>
>
Hi,
could you show us indexes that you have on all tables from this query? Have
you tried running vacuum analyze on those tables? Do you have autovacuum
active?

regards
Szymon


Re: [PERFORM] help speeding up a query in postgres 8.4.5

2011-04-06 Thread tv
> some additional info.
> the table inventory is about 4481 MB and also has postgis types.
> the table gran_ver is about 523 MB
> the table INVSENSOR is about 217 MB
>
> the server itself has 32G RAM with the following set in the postgres conf
> shared_buffers = 3GB
> work_mem = 64MB
> maintenance_work_mem = 512MB
> wal_buffers = 6MB

Not sure how to improve the query itself - it's rather simple and the
execution plan seems reasonable. You're dealing with a lot of data, so it
takes time to process.

Anyway, I'd try to bump up the shared buffers a bit (the tables you've
listed have about 5.5 GB, so 3GB of shared buffers won't cover it). OTOH
most of the data will be in pagecache maintained by the kernel anyway.

Try to increase the work_mem a bit, that might speed up the hash joins
(the two hash joins consumed about 15s, the whole query took 17s). This
does not require a restart, just do

set work_mem = '128MB'

(or 256MB) and then run the query in the same session. Let's see if that
works.
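
For example, in a single psql session, using the query from the original post:

SET work_mem = '256MB';

EXPLAIN ANALYZE
SELECT MIN(IV.STRTDATE), MAX(IV.ENDDATE)
FROM GRAN_VER GV
LEFT OUTER JOIN INVENTORY IV ON GV.GRANULE_ID = IV.GRANULE_ID, INVSENSOR INVS
WHERE IV.INV_ID = '65' AND GV.GRANULE_ID = INVS.granule_id
  AND INVS.sensor_id = '13';

RESET work_mem;  -- drop back to the postgresql.conf value afterwards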

regards
Tomas




Re: [PERFORM] help speeding up a query in postgres 8.4.5

2011-04-06 Thread Maria L. Wilson

Autovacuum is not running - but regular vacuums are being done twice daily.

indexes on inventory:

CREATE INDEX inven_idx1
  ON inventory
  USING btree
  (inv_id);

CREATE UNIQUE INDEX inven_idx2
  ON inventory
  USING btree
  (granule_id);

indexes on gran_ver:
CREATE UNIQUE INDEX granver_idx1
  ON gran_ver
  USING btree
  (granule_id);

indexes on invsensor:
CREATE INDEX invsnsr_idx2
  ON invsensor
  USING btree
  (sensor_id);




On 4/6/11 7:41 AM, Szymon Guz wrote:



On 5 April 2011 21:25, Maria L. Wilson wrote:


Would really appreciate someone taking a look at the query
below  Thanks in advance!


this is on a linux box...
Linux dsrvr201.larc.nasa.gov 
2.6.18-164.9.1.el5 #1 SMP Wed Dec 9 03:27:37 EST 2009 x86_64
x86_64 x86_64 GNU/Linux

explain analyze
select MIN(IV.STRTDATE), MAX(IV.ENDDATE)
from GRAN_VER GV
left outer join INVENTORY IV on GV.GRANULE_ID = IV.GRANULE_ID,
INVSENSOR INVS
where IV.INV_ID='65' and GV.GRANULE_ID = INVS.granule_id and
INVS.sensor_id='13'


"Aggregate  (cost=736364.52..736364.53 rows=1 width=8) (actual
time=17532.930..17532.930 rows=1 loops=1)"
"  ->  Hash Join  (cost=690287.33..734679.77 rows=336949 width=8)
(actual time=13791.593..17323.080 rows=924675 loops=1)"
"Hash Cond: (invs.granule_id = gv.granule_id)"
"->  Seq Scan on invsensor invs  (cost=0.00..36189.41
rows=1288943 width=4) (actual time=0.297..735.375 rows=1277121
loops=1)"
"  Filter: (sensor_id = 13)"
"->  Hash  (cost=674401.52..674401.52 rows=1270865
width=16) (actual time=13787.698..13787.698 rows=1270750 loops=1)"
"  ->  Hash Join  (cost=513545.62..674401.52
rows=1270865 width=16) (actual time=1998.702..13105.578
rows=1270750 loops=1)"
"Hash Cond: (gv.granule_id = iv.granule_id)"
"->  Seq Scan on gran_ver gv 
(cost=0.00..75224.90 rows=4861490 width=4) (actual

time=0.008..1034.885 rows=4867542 loops=1)"
"->  Hash  (cost=497659.81..497659.81
rows=1270865 width=12) (actual time=1968.918..1968.918
rows=1270750 loops=1)"
"  ->  Bitmap Heap Scan on inventory iv 
(cost=24050.00..497659.81 rows=1270865 width=12) (actual

time=253.542..1387.957 rows=1270750 loops=1)"
"Recheck Cond: (inv_id = 65)"
"->  Bitmap Index Scan on
inven_idx1  (cost=0.00..23732.28 rows=1270865 width=0) (actual
time=214.364..214.364 rows=1270977 loops=1)"
"  Index Cond: (inv_id = 65)"
"Total runtime: 17533.100 ms"

some additional info.
the table inventory is about 4481 MB and also has postgis types.
the table gran_ver is about 523 MB
the table INVSENSOR is about 217 MB

the server itself has 32G RAM with the following set in the
postgres conf
shared_buffers = 3GB
work_mem = 64MB
maintenance_work_mem = 512MB
wal_buffers = 6MB

let me know if I've forgotten anything!  thanks a bunch!!

Maria Wilson
NASA/Langley Research Center
Hampton, Virginia
m.l.wil...@nasa.gov 



Hi,
could you show us indexes that you have on all tables from this query? 
Have you tried running vacuum analyze on those tables? Do you have 
autovacuum active?


regards
Szymon


Re: [PERFORM] help speeding up a query in postgres 8.4.5

2011-04-06 Thread Maria L. Wilson
thanks for the reply, Tomas.  I'll test bumping up work_mem and see how 
that helps.


thanks again,  Maria Wilson

On 4/6/11 9:16 AM, t...@fuzzy.cz wrote:

some additional info.
the table inventory is about 4481 MB and also has postgis types.
the table gran_ver is about 523 MB
the table INVSENSOR is about 217 MB

the server itself has 32G RAM with the following set in the postgres conf
shared_buffers = 3GB
work_mem = 64MB
maintenance_work_mem = 512MB
wal_buffers = 6MB

Not sure how to improve the query itself - it's rather simple and the
execution plan seems reasonable. You're dealing with a lot of data, so it
takes time to process.

Anyway, I'd try to bump up the shared buffers a bit (the tables you've
listed have about 5.5 GB, so 3GB of shared buffers won't cover it). OTOH
most of the data will be in pagecache maintained by the kernel anyway.

Try to increase the work_mem a bit, that might speed up the hash joins
(the two hash joins consumed about 15s, the whole query took 17s). This
does not require a restart, just do

set work_mem = '128MB'

(or 256MB) and then run the query in the same session. Let's see if that
works.

regards
Tomas





Re: [PERFORM] help speeding up a query in postgres 8.4.5

2011-04-06 Thread Kevin Grittner
"Maria L. Wilson"  wrote:
 
> Autovacuum is not running - but regular vacuums are being done
> twice daily.
 
Is the ANALYZE keyword used on those VACUUM runs?  What version of
PostgreSQL is this?  If it's old enough to need fsm settings, do you run
with the VERBOSE option and check the end of the output to make sure
they are set high enough?
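
In other words, something like this as part of the twice-daily maintenance
(table names taken from this thread; on pre-8.4 releases the VERBOSE output
ends with free-space-map usage to compare against max_fsm_pages and
max_fsm_relations, while 8.4 manages the FSM automatically):

VACUUM VERBOSE ANALYZE inventory;
VACUUM VERBOSE ANALYZE gran_ver;
VACUUM VERBOSE ANALYZE invsensor;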
 
-Kevin



Re: [PERFORM] help speeding up a query in postgres 8.4.5

2011-04-06 Thread Tomas Vondra
Dne 6.4.2011 17:33, Kevin Grittner napsal(a):
> "Maria L. Wilson"  wrote:
>  
>> Autovacuum is not running - but regular vacuums are being done
>> twice daily.
>  
> Is the ANALYZE keyword used on those VACUUM runs?  What version of
> PostgreSQL is this?  If it's old enough to need fsm settings, do you run
> with the VERBOSE option and check the end of the output to make sure
> they are set high enough?

Why do you think the problem is related to stale stats? They seem fairly
accurate to me (see the explain analyze in the first post). All the
nodes are less than 1% off (which is great), except for the top-level hash
join, which returns 924675 rows where 336949 were estimated.

Maybe I'm missing something, but the stats seem to be quite accurate and
there are probably very few dead tuples.

regards
Tomas



Re: [PERFORM] very long updates very small tables

2011-04-06 Thread Tomas Vondra
Dne 4.4.2011 16:32, Kevin Grittner napsal(a):
> Nothing there makes a write glut on checkpoint less likely to be the
> cause.  Without a BBU write-back cache it is actually *more* likely,
> and having enough RAM to hold the whole database makes it *more*
> likely.  If you haven't placed your pg_xlog directory on a separate
> file system, it is also more likely.
>  
> Turning on logging of checkpoint activity and checking whether that
> correlates with your problem times is strongly indicated.
>  
> -Kevin

Checkpoints would be my first guess too, but the whole database is just
500MB. Lars, how did you get this number? Did you measure the amount of
disk space occupied, or determine it some other way?

BTW how much memory is there (total RAM and dedicated to shared
buffers)? How many checkpoint segments are there?

Have you monitored the overall behavior of the system (what processes
are running etc.) when the problems occur? I don't have much experience
with Windows but tools from sysinternals are reasonable.

And yet another idea - have you tried to use the stats collected by
PostgreSQL? I mean the pg_stat_ tables, especially pg_stat_bgwriter and
maybe pg_stat_all_tables. Those numbers are cumulative, so take two
snapshots while the problems are happening and subtract them to get an
idea of what's going on.
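
For instance (the snapshot table name here is purely illustrative):

CREATE TABLE bgwriter_snap AS
  SELECT now() AS taken_at, * FROM pg_stat_bgwriter;

-- ... wait until the slow updates are happening again, then:
INSERT INTO bgwriter_snap
  SELECT now(), * FROM pg_stat_bgwriter;

-- Compare the rows; a big jump in checkpoints_req / buffers_checkpoint
-- during the problem window points at checkpoint I/O:
SELECT * FROM bgwriter_snap ORDER BY taken_at;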

regards
Tomas




Re: [PERFORM] Intel SSDs that may not suck

2011-04-06 Thread Scott Carey
I have generation 1 and 2 Intel MLC drives in production (~150+).  Some
have been around for 2 years.

None have died.  None have hit the write cycle limit.  We do ~ 75GB of
writes a day.

The data and writes on these are not transactional (if one dies, we have
copies).  But the reliability has been excellent.  We had the performance
degradation issues in the G1's that required a firmware update, and have
had to do a secure-erase a on some to get write performance back to
acceptable levels on a few.

I could care less about the 'fast' sandforce drives.  They fail at a high
rate and the performance improvement is BECAUSE they are using a large,
volatile write cache.  If I need higher sequential transfer rate, I'll
RAID some of these together.  A RAID-10 of 6 of these will make a simple
select count(1) query be CPU bound anyway.

I have some G3 SSD's I'll be doing power-fail testing on soon for database
use (currently, we only use the old ones for indexes in databases or
unimportant clone db's).

I have had more raid cards fail in the last 3 years (out of a couple
dozen) than Intel SSD's fail (out of ~150).  I do not trust the Intel 510
series yet -- it's based on a non-Intel controller and has worse
random-write performance anyway.



On 3/28/11 9:13 PM, "Merlin Moncure"  wrote:

>On Mon, Mar 28, 2011 at 7:54 PM, Andy  wrote:
>> This might be a bit too little too late though. As you mentioned there
>>really isn't any real performance improvement for the Intel SSD.
>>Meanwhile, SandForce (the controller that OCZ Vertex is based on) is
>>releasing its next generation controller at a reportedly huge
>>performance increase.
>>
>> Is there any benchmark measuring the performance of these SSD's (the
>>new Intel vs. the new SandForce) running database workloads? The
>>benchmarks I've seen so far are for desktop applications.
>
>The random performance data is usually a rough benchmark.  The
>sequential numbers are mostly useless and always have been.  The
>performance of either the ocz or intel drive is so disgustingly fast
>compared to hard drives that the main stumbling block is life span
>and write endurance now that they are starting to get capacitors.
>
>My own experience with MLC drives is that write cycle expectations are
>more or less as advertised. They do go down (hard), and have to be
>monitored. If you are writing a lot of data this can get pretty
>expensive although the cost dynamics are getting better and better for
>flash. I have no idea what would be precisely prudent, but maybe some
>good monitoring tools and phased obsolescence at around 80% duty cycle
>might not be a bad starting point.  With hard drives, you can kinda
>wait for em to pop and swap em in -- this is NOT a good idea for flash
>raid volumes.
>
>merlin
>
>-- 
>Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
>To make changes to your subscription:
>http://www.postgresql.org/mailpref/pgsql-performance




Re: [PERFORM] Intel SSDs that may not suck

2011-04-06 Thread Andy

--- On Wed, 4/6/11, Scott Carey  wrote:


> I could care less about the 'fast' sandforce drives. 
> They fail at a high
> rate and the performance improvement is BECAUSE they are
> using a large,
> volatile write cache.  

The G1 and G2 Intel MLC also use volatile write cache, just like most SandForce 
drives do.



[PERFORM] Background fsck

2011-04-06 Thread Ireneusz Pluta

Hello,

I saw some recommendations from people on the net not to use background fsck when running PostgreSQL 
on FreeBSD. As I recall, these opinions were just thoughts people shared with the community after bad 
experiences caused by using background fsck. So, without digging any deeper into why, I took that as a 
clear recommendation for myself and keep background fsck turned off on all my machines, regardless of 
how much faster a server could come up after a crash.


But waiting so long (like right now) through a foreground fsck of a large data filesystem after an 
unclean shutdown makes me come to this group and ask: do I really need to avoid background fsck on a 
PostgreSQL machine? Could I hear your opinions?


Thanks

Irek.



Re: [PERFORM] Background fsck

2011-04-06 Thread Scott Marlowe
On Wed, Apr 6, 2011 at 4:33 PM, Ireneusz Pluta  wrote:
> Hello,
>
> I saw some recommendations from people on the net not to use background fsck
> when running PostgreSQL on FreeBSD. As I recall, these opinions were just
> thoughts people shared with the community after bad experiences caused by
> using background fsck. So, without digging any deeper into why, I took that
> as a clear recommendation for myself and keep background fsck turned off on
> all my machines, regardless of how much faster a server could come up after
> a crash.
>
> But waiting so long (like right now) through a foreground fsck of a large
> data filesystem after an unclean shutdown makes me come to this group and
> ask: do I really need to avoid background fsck on a PostgreSQL machine?
> Could I hear your opinions?

Shouldn't a journaling file system just come back up almost immediately?



Re: [PERFORM] Background fsck

2011-04-06 Thread Ireneusz Pluta

>> But waiting so long (like right now) through a foreground fsck of a large data
>> filesystem after an unclean shutdown makes me come to this group and ask: do I
>> really need to avoid background fsck on a PostgreSQL machine? Could I hear
>> your opinions?
>
> Shouldn't a journaling file system just come back up almost immediately?

It's ufs2 with softupdates in my case. That's why I am asking about background
fsck.



Re: [PERFORM] Intel SSDs that may not suck

2011-04-06 Thread gnuoytr
Not for user data, only controller data.



 Original message 
>Date: Wed, 6 Apr 2011 14:11:10 -0700 (PDT)
>From: pgsql-performance-ow...@postgresql.org (on behalf of Andy 
>)
>Subject: Re: [PERFORM] Intel SSDs that may not suck  
>To: Merlin Moncure ,Scott Carey 
>Cc: "pgsql-performance@postgresql.org" ,Greg 
>Smith 
>
>
>--- On Wed, 4/6/11, Scott Carey  wrote:
>
>
>> I could care less about the 'fast' sandforce drives. 
>> They fail at a high
>> rate and the performance improvement is BECAUSE they are
>> using a large,
>> volatile write cache.  
>
>The G1 and G2 Intel MLC also use volatile write cache, just like most 
>SandForce drives do.
>
>-- 
>Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
>To make changes to your subscription:
>http://www.postgresql.org/mailpref/pgsql-performance



Re: [PERFORM] Intel SSDs that may not suck

2011-04-06 Thread Scott Carey


On 3/29/11 7:16 AM, "Jeff"  wrote:

>
>The write degradation could probably be monitored looking at svctime
>from sar. We may be implementing that in the near future to detect
>when this creeps up again.


For the X25-M's, overcommit.  Do a secure erase, then only partition and
use 85% or so of the drive (~7% is already hidden).  This helps a lot with
the write performance over time.  The Intel rep claimed that the new G3's
are much better at limiting the occasional write latency, by splitting
longer delays into slightly more frequent smaller delays.
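
A rough sketch of that procedure (the commands are illustrative -- verify them
against your hdparm/parted versions, and note that a secure erase destroys
everything on the drive):

hdparm --user-master u --security-set-pass tmppass /dev/sdX
hdparm --user-master u --security-erase tmppass /dev/sdX

# then partition only ~85% of the device and leave the rest unallocated,
# so the controller can use it for wear leveling:
parted /dev/sdX mklabel gpt
parted /dev/sdX mkpart primary 0% 85%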

Some of the benchmark reviews have histograms that demonstrate this
(although the authors of the review only note average latency or
throughput, the deviations have clearly gone down in this generation).

I'll know more for sure after some benchmarking myself.


>
>
>--
>Jeff Trout 
>http://www.stuarthamm.net/
>http://www.dellsmartexitin.com/
>
>
>
>
>-- 
>Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
>To make changes to your subscription:
>http://www.postgresql.org/mailpref/pgsql-performance




Re: [PERFORM] Intel SSDs that may not suck

2011-04-06 Thread Scott Carey


On 3/29/11 7:32 AM, "Jeff"  wrote:

>
>On Mar 29, 2011, at 10:16 AM, Jeff wrote:
>
>> Now that all sounds awful and horrible until you get to overall
>> performance, especially with reads - you are looking at 20k random
>> reads per second with a few disks.  Adding in writes does kick it
>> down a noch, but you're still looking at 10k+ iops. That is the
>> current trade off.
>>
>
>We've been doing a burn in for about 4 days now on an array of 8
>x25m's behind a p812 controller: here's a sample of what it is
>currently doing (I have 10 threads randomly seeking, reading, and 10%
>of the time writing (then fsync'ing) out, using my pgiosim tool which
>I need to update on pgfoundry)

Your RAID card is probably disabling the write cache on those.  If not, it
isn't power failure safe.

When the write cache is disabled, the negative effects of random writes on
longevity and performance are significantly amplified.

For the G3 drives, you can force the write caches on and remain power
failure safe.  This will significantly decrease the effects of the below.
You can also use a newer linux version with a file system that supports
TRIM/DISCARD which will help as long as your raid controller passes that
through.  It might end up that for many workloads with these drives, it is
faster to use software raid than hardware raid + raid controller.
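
For example, with ext4 on a recent kernel (illustrative only; the RAID
controller has to pass DISCARD through for this to reach the drives):

mount -o noatime,discard /dev/sdX1 /var/lib/postgresql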


>
>that was from a simple dd, not random writes. (since it is in
>production, I can't really do the random write test as easily)
>
>theoretically, a nice rotation of disks would remove that problem.
>annoying, but it is the price you need to pay
>
>--
>Jeff Trout 
>http://www.stuarthamm.net/
>http://www.dellsmartexitin.com/
>
>
>
>
>-- 
>Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
>To make changes to your subscription:
>http://www.postgresql.org/mailpref/pgsql-performance




Re: [PERFORM] Intel SSDs that may not suck

2011-04-06 Thread Scott Carey


On 4/6/11 2:11 PM, "Andy"  wrote:

>
>--- On Wed, 4/6/11, Scott Carey  wrote:
>
>
>> I could care less about the 'fast' sandforce drives.
>> They fail at a high
>> rate and the performance improvement is BECAUSE they are
>> using a large,
>> volatile write cache.
>
>The G1 and G2 Intel MLC also use volatile write cache, just like most
>SandForce drives do.

1. People are complaining that the Intel G3's aren't as fast as the
SandForce drives (they are faster than the 1st gen SandForce, but not the
yet-to-be-released ones like Vertex 3).  From a database perspective, this
is complete BS.

2. 256K versus 64MB write cache.   Power + time to flush a cache matters.

3. None of the performance benchmarks of drives are comparing the
performance with the cache _disabled_ which is required when not power
safe.  If the SandForce drives are still that much faster with it
disabled, I'd be shocked.  Disabling a 256K write cache will affect
performance less than disabling a 64MB one.




Re: [PERFORM] Intel SSDs that may not suck

2011-04-06 Thread Scott Carey


On 4/6/11 4:03 PM, "gnuo...@rcn.com"  wrote:

>Not for user data, only controller data.
>

False.  I used to think so, but there is volatile write cache for user
data -- it's in the 256K of on-chip SRAM rather than the DRAM, though.

Simple power failure tests demonstrate that you lose data with these
drives unless you disable the cache.  Disabling the cache roughly drops
write performance by a factor of 3 to 4 on G1 drives and significantly
hurts wear-leveling and longevity (I haven't tried G2's).

>
>
> Original message 
>>Date: Wed, 6 Apr 2011 14:11:10 -0700 (PDT)
>>From: pgsql-performance-ow...@postgresql.org (on behalf of Andy
>>)
>>Subject: Re: [PERFORM] Intel SSDs that may not suck
>>To: Merlin Moncure ,Scott Carey
>>
>>Cc: "pgsql-performance@postgresql.org"
>>,Greg Smith 
>>
>>
>>--- On Wed, 4/6/11, Scott Carey  wrote:
>>
>>
>>> I could care less about the 'fast' sandforce drives.
>>> They fail at a high
>>> rate and the performance improvement is BECAUSE they are
>>> using a large,
>>> volatile write cache.
>>
>>The G1 and G2 Intel MLC also use volatile write cache, just like most
>>SandForce drives do.
>>
>>-- 
>>Sent via pgsql-performance mailing list
>>(pgsql-performance@postgresql.org)
>>To make changes to your subscription:
>>http://www.postgresql.org/mailpref/pgsql-performance
>
>-- 
>Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
>To make changes to your subscription:
>http://www.postgresql.org/mailpref/pgsql-performance




Re: [PERFORM] Intel SSDs that may not suck

2011-04-06 Thread Scott Carey

On 4/5/11 7:07 AM, "Merlin Moncure"  wrote:

>On Mon, Apr 4, 2011 at 8:26 PM, Greg Smith  wrote:
>>
>> If you really don't need more than 120GB of storage, but do care about
>> random I/O speed, this is a pretty easy decision now--presuming the
>>drive
>> holds up to claims.  As the claims are reasonable relative to the
>> engineering that went into the drive now, that may actually be the case.
>
>One thing about MLC flash drives (which the industry seems to be
>moving towards) is that you have to factor drive lifespan into the
>total system balance of costs. Data point: had an ocz vertex 2 that
>burned out in ~ 18 months.  In the post mortem, it was determined that
>the drive met and exceeded its 10k write limit -- this was a busy
>production box.

What OCZ Drive?  What controller?  Indilinx? SandForce?  Wear-leveling on
these vary quite a bit.

Intel claims write lifetimes in the single digit PB sizes for these 310's.
 They are due to have an update to the X25-E line too at some point.
Public roadmaps say this will be using "enterprise" MLC.  This stuff
trades off write endurance for data longevity -- if left without power for
too long the data will be lost.  This is a tradeoff for all flash -- but
the stuff that is optimized for USB sticks is quite different than the
stuff optimized for servers.

>
>merlin
>
>-- 
>Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
>To make changes to your subscription:
>http://www.postgresql.org/mailpref/pgsql-performance




Re: [PERFORM] Intel SSDs that may not suck

2011-04-06 Thread David Rees
On Wed, Apr 6, 2011 at 5:42 PM, Scott Carey  wrote:
> On 4/5/11 7:07 AM, "Merlin Moncure"  wrote:
>>One thing about MLC flash drives (which the industry seems to be
>>moving towards) is that you have to factor drive lifespan into the
>>total system balance of costs. Data point: had an ocz vertex 2 that
>>burned out in ~ 18 months.  In the post mortem, it was determined that
>>the drive met and exceeded its 10k write limit -- this was a busy
>>production box.
>
> What OCZ Drive?  What controller?  Indilinx? SandForce?  Wear-leveling on
> these vary quite a bit.

SandForce SF-1200

-Dave



Re: [PERFORM] Intel SSDs that may not suck

2011-04-06 Thread Greg Smith

On 04/06/2011 08:22 PM, Scott Carey wrote:

Simple power failure tests demonstrate that you lose data with these
drives unless you disable the cache.  Disabling the cache roughly drops
write performance by a factor of 3 to 4 on G1 drives and significantly
hurts wear-leveling and longevity (I haven't tried G2's).
   


Yup.  I have a customer running a busy system with Intel X25-Es, and 
another with X25-Ms, and every time there is a power failure at either 
place their database gets corrupted.  That those drives are worthless 
for a reliable database setup has been clear for two years now:  
http://www.mysqlperformanceblog.com/2009/03/02/ssd-xfs-lvm-fsync-write-cache-barrier-and-lost-transactions/ 
and sometimes I even hear reports about those drives getting corrupted 
even when the write cache is turned off.  If you aggressively replicate 
the data to another location on a different power grid, you can survive 
with Intel's older drives.  But odds are you're going to lose at least 
some transactions no matter what you do, and the risk of "database won't 
start" levels of corruption is always lingering.


The fact that Intel is making so much noise over the improved write 
integrity features on the new drives gives you an idea how much these 
problems have hurt their reputation in the enterprise storage space.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books




Re: [PERFORM] Intel SSDs that may not suck

2011-04-06 Thread Greg Smith

Here's the new Intel 3rd generation 320 series drive:

$ sudo smartctl -i /dev/sdc
Device Model: INTEL SSDSA2CW120G3
Firmware Version: 4PC10302
User Capacity:    120,034,123,776 bytes
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 4

Since I have to go chant at the unbelievers next week (MySQL Con), don't 
have time for a really thorough look here.  But I made a first pass 
through my usual benchmarks without any surprises.


bonnie++ meets expectations with 253MB/s reads, 147MB/s writes, and 3935 
seeks/second:


Version 1.03e   --Sequential Output-- --Sequential Input- --Random-
                -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine    Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
toy      32144M          147180   7  77644   3          253893   5  3935  15


Using sysbench to generate a 100GB file and randomly seek around it 
gives a similar figure:


Extra file open flags: 0
100 files, 1Gb each
100Gb total file size
Block size 8Kb
Number of random requests for random IO: 10000
Read/Write ratio for combined random IO test: 1.50
Using synchronous I/O mode
Doing random read test
Threads started!
Done.

Operations performed:  10000 reads, 0 writes, 0 Other = 10000 Total
Read 78.125Mb  Written 0b  Total transferred 78.125Mb  (26.698Mb/sec)
 3417.37 Requests/sec executed

So that's the basic range of performance:  up to 250MB/s on reads, but 
potentially as low as 3400 IOPS = 27MB/s on really random workloads.  I 
can make it do worse than that as you'll see in a minute.


At a database scale of 500, I can get 2357 TPS:

postgres@toy:~$ /usr/lib/postgresql/8.4/bin/pgbench -c 64 -T 300 pgbench
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 500
query mode: simple
number of clients: 64
duration: 300 s
number of transactions actually processed: 707793
tps = 2357.497195 (including connections establishing)
tps = 2357.943894 (excluding connections establishing)

This is basically the same performance as the 4-disk setup with 256MB 
battery-backed write controller I profiled at 
http://www.2ndquadrant.us/pgbench-results/index.htm ; there XFS got as 
high as 2332 TPS, albeit with a PostgreSQL patched for better 
performance than I used here.  This system has 16GB of RAM, so this is 
exercising write speed only without needing to read anything from disk; 
not too hard for regular drives to do.  Performance holds at a scale of 
1000 however:


postgres@toy:~$ /usr/lib/postgresql/8.4/bin/pgbench -c 64 -T 300 -l pgbench
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 1000
query mode: simple
number of clients: 64
duration: 300 s
number of transactions actually processed: 586043
tps = 1953.006031 (including connections establishing)
tps = 1953.399065 (excluding connections establishing)

Whereas my regular drives are lucky to hit 350 TPS here.  So this is the 
typical sweet spot for SSD:  workload is bigger than RAM, but not so 
much bigger than RAM that reads & writes become completely random.


If I crank the scale way up, to 4000 = 58GB, now I'm solidly in 
seek-bound behavior, which does about twice as fast as my regular drive 
array here (that's around 200 TPS on this test):


postgres@toy:~$ /usr/lib/postgresql/8.4/bin/pgbench -T 1800 -c 64 -l pgbench
starting vacuum...end.

transaction type: TPC-B (sort of)
scaling factor: 4000
query mode: simple
number of clients: 64
duration: 1800 s
number of transactions actually processed: 731568
tps = 406.417254 (including connections establishing)
tps = 406.430713 (excluding connections establishing)

Here's a snapshot of typical drive activity when running this:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           2.29    0.00    1.30   54.80    0.00   41.61

Device:  rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
sdc        0.00   676.67  443.63  884.00     7.90    12.25    31.09    41.77   31.45   0.75  99.93


So we're down to around 20MB/s, just as sysbench predicted a seek-bound 
workload would be on these drives.


I can still see checkpoint spikes here where sync times go upward:

2011-04-06 20:40:58.969 EDT: LOG:  checkpoint complete: wrote 2959 
buffers (9.0%); 0 transaction log file(s) added, 0 removed, 0 recycled; 
write=147.300 s, sync=32.885 s, total=181.758 s
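
Those messages come from checkpoint logging, which is enabled in
postgresql.conf with:

log_checkpoints = on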


But the drive seems to never become unresponsive for longer than a second:

postgres@toy:~$ cat pgbench_log.4585 | cut -d" " -f 6 | sort -n | tail
41
52
56
59
60
70
77
84
92
94

Power-plug pull tests with diskchecker.pl and a write-heavy database 
load didn't notice anything funny about the write cache:


[witness]
$ wget http://code.sixapart.com/svn/tools/trunk/diskchecker.pl
$ chmod +x ./diskchecker.pl
$ ./diskchecker.pl -l

[server with SSD]
$ wget http://code.sixapart.com/svn/tools/trunk/diskchecker.pl
$ chmod +x ./diskchecker.pl
$ diskchecker.pl -s

Re: [PERFORM] Intel SSDs that may not suck

2011-04-06 Thread David Boreham
Had to say a quick thanks to Greg and the others who have posted 
detailed test results on SSDs here.
For those of us watching for the inflection point where we can begin the 
transition from mechanical to solid state storage, this data and 
experience is invaluable. Thanks for sharing it.


A short story while I'm posting : my Dad taught electronics engineering 
and would often visit the local factories with groups of students. I 
remember in particular after a visit to a disk drive manufacturer 
(Burroughs), in 1977 he came home telling me that he'd asked the plant 
manager what their plan was once solid state storage made their products 
obsolete. The manager looked at him like he was from another planet...


So I've been waiting patiently 34 years for this hopefully 
soon-to-arrive moment ;)






Re: [PERFORM] Intel SSDs that may not suck

2011-04-06 Thread gnuoytr
SSDs have been around for quite some time.  The earliest maker I've found is Texas 
Memory.  Not quite 1977, and not flash either, although they have been doing flash 
for a couple of years now.

http://www.ramsan.com/company/history

 Original message 
>Date: Wed, 06 Apr 2011 20:56:16 -0600
>From: pgsql-performance-ow...@postgresql.org (on behalf of David Boreham 
>)
>Subject: Re: [PERFORM] Intel SSDs that may not suck  
>To: pgsql-performance@postgresql.org
>
>Had to say a quick thanks to Greg and the others who have posted 
>detailed test results on SSDs here.
>For those of us watching for the inflection point where we can begin the 
>transition from mechanical to solid state storage, this data and 
>experience is invaluable. Thanks for sharing it.
>
>A short story while I'm posting : my Dad taught electronics engineering 
>and would often visit the local factories with groups of students. I 
>remember in particular after a visit to a disk drive manufacturer 
>(Burroughs), in 1977 he came home telling me that he'd asked the plant 
>manager what their plan was once solid state storage made their products 
>obsolete. The manager looked at him like he was from another planet...
>
>So I've been waiting patiently 34 years for this hopefully 
>soon-to-arrive moment ;)
>
>
>
>-- 
>Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
>To make changes to your subscription:
>http://www.postgresql.org/mailpref/pgsql-performance



Re: [PERFORM] Intel SSDs that may not suck

2011-04-06 Thread David Boreham

On 4/6/2011 9:19 PM, gnuo...@rcn.com wrote:

SSDs have been around for quite some time.  The first that I've found is Texas 
Memory.  Not quite 1977, but not flash either, although they've been doing so 
for a couple of years.
Well, I built my first ram disk (which of course I thought I had 
invented, at the time) in 1982.
But today we're seeing solid state storage seriously challenging 
rotating media across all applications, except at the TB and beyond 
scale. That's what's new.






Re: [PERFORM] Intel SSDs that may not suck

2011-04-06 Thread Jesper Krogh

On 2011-03-28 22:21, Greg Smith wrote:
Some may still find these two cheap for enterprise use, given the use 
of MLC limits how much activity these drives can handle.  But it's 
great to have a new option for lower budget system that can tolerate 
some risk there.



Drifting off the topic slightly...  Does anyone have opinions/experience with:
http://www.ocztechnology.com/ocz-z-drive-r2-p88-pci-express-ssd.html

They seem to be "like" the FusionIO drives, just quite a lot cheaper; I
wonder what the state of that 512MB cache is in case of a power loss.


--
Jesper



Re: [PERFORM] Intel SSDs that may not suck

2011-04-06 Thread Greg Smith

On 04/07/2011 12:27 AM, Jesper Krogh wrote:

On 2011-03-28 22:21, Greg Smith wrote:
Some may still find these two cheap for enterprise use, given the use 
of MLC limits how much activity these drives can handle.  But it's 
great to have a new option for lower budget system that can tolerate 
some risk there.



Drifting off the topic slightly...  Does anyone have opinions/experience with:
http://www.ocztechnology.com/ocz-z-drive-r2-p88-pci-express-ssd.html

They seem to be "like" the FusionIO drives, just quite a lot cheaper; I
wonder what the state of that 512MB cache is in case of a power loss.


What I do is assume that if the vendor doesn't say outright how the 
cache is preserved, that means it isn't, and the card is garbage for 
database use.  That rule is rarely wrong.  The available soon Z-Drive R3 
includes a Sandforce controller and supercap for preserving writes:  
http://hothardware.com/News/OCZ-Unveils-RevoDrive-X3-Vertex-3-and-Other-SSD-Goodness/


Since they're bragging about it there, the safe bet is that the older R2 
unit had no such facility.


I note that the Z-Drive R2 is basically some flash packed on top of an 
LSI 1068e controller, mapped as a RAID0 volume.  It's possible they left 
the battery-backup unit on that card exposed, so it may be possible to 
do better with it.  The way they just stack those card layers together, 
the thing is practically held together with duct tape though.  That's 
not a confidence inspiring design to me.  The R3 drives are much more 
cleanly integrated.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books




Re: [PERFORM] Partial index slower than regular index

2011-04-06 Thread Tom Lane
Thom Brown  writes:
> On 6 April 2011 05:44, Tom Lane  wrote:
>> It looks like the index predicate expression isn't getting the right
>> collation assigned, so predtest.c decides the query doesn't imply the
>> index's predicate.  Too tired to look into exactly why right now, but
>> it's clearly bound up in all the recent collation changes.

> Testing it again with very explicit collations, it still has issues:

Yeah, any sort of collation-sensitive operator in an index WHERE clause
was just plain broken.  Fixed now.

regards, tom lane
