to be jumping through your data in strides
of stripe size * number of disks available (accounting for RAID type), as all your reads
will end up hitting the same disk.
David Lang
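to make that stride arithmetic concrete (numbers invented for illustration): on a raid 0 of 8 disks with a 64KB stripe, a reader that jumps in strides of 64KB * 8 = 512KB lands on the same spindle every time, so you get the throughput of one disk instead of eight.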
On Thu, 29 Mar 2007, Matt Smiley wrote:
Hi David,
Thanks for your feedback! I'm rather a newbie at this, and I do appreciate the
critique.
First, let me correct myself: The formulas for the risk of losing data when
you lose 2 and 3 disks shouldn't have included the first
ur
primary concern)
If you care about how often you'll have to replace a failed drive, then the
SCSI option wins, no question, although check the cases for hot-swappability.
note that the CMU and Google studies both commented on being surprised at
the lack of difference between the reliability
tually get much better reliability than RAID 10
David Lang
e a _very_ good
improvement, but ext3 has very definite limits that it inherited from
ext2.
David Lang
the vendors did the burn-in before delivering the drives
to the sites that installed them, there was no 'infant mortality' spike on
the drives (both studies commented on this, they expected to find one)
David Lang
typing of possible new calls. if a
sequence of syscalls ends up being common enough the kernel devs will look
at making a new, combined, syscall (for example lock, write, unlock could
be made into one if it's common enough and there's enough of a performance
table spaces it would be an
interesting experiment to be able to define a tablespace that used a raw
device instead of a filesystem to see if there are any noticeable
performance gains
David Lang
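for reference, the tablespace syntax that experiment would start from (today it only accepts a filesystem directory, which is the point; the path and names here are made up):
  CREATE TABLESPACE fastspace LOCATION '/mnt/rawtest/pgdata';
  CREATE TABLE hits (hit_id serial, url text) TABLESPACE fastspace;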
rates.
this I will agree with.
Server class drives are designed with a longer lifespan in mind.
Server class hard drives are rated at higher temperatures than desktop
drives.
these two I question.
David Lang
these two I question.
David Lang
Both statements are the literal truth. Not that I would suggest abusing your
server class HDs just because they are designed to live longer and in more
demanding environments.
Overheating, nasty electrical phenomena, and abusive physical shocks will
trash a ser
whereas with the filesystem cache you can just write/read part of a
page in memory and let the cache decide when it needs to write and read from
disk. If there are other caveats to direct i/o on Linux I'd love to hear
them.
other than bad interactions with "normal" utilities not
On Fri, 6 Apr 2007, Ron wrote:
Bear in mind that Google was and is notorious for pushing their environmental
factors to the limit while using the cheapest "PoS" HW they can get their
hands on.
Let's just say I'm fairly sure every piece of HW they were using for those
studies was operating outside
than grade.
this I will agree with fully.
David Lang
I will admit that I've removed the cover from my tivo to allow it to run
cooler, and I'm still on the original drive + 100G drive I purchased way
back when (7+ years ago). before I removed the cover I did have times when
the tivo would die from the heat (Los Angeles area in the summer with
work)
it's not that the people who disagree with you don't care about their
data, it's that they have different experiences than you do (experiences
that come close to matching the studies where they tracked hundreds of
thousands of drives of different types), and as a result believe
whether there's a name for it or not, there's definitely an important
distinction between each of anecdote, professional opinion, and study result.
the line between an anecdote and a professional opinion is pretty blurry,
and hard to see without wasting a lot of time getting everyone to give
when every performance question is answered with "did you change the
defaults? they are way too low for modern hardware, raise them by 2 orders
of magnitude and then we'll start investigating"
David Lang
just being naive?
unfortunately yes.
across different linux distros they are fairly standard (however different
kernel versions will change them)
however different kernels need drastically different tools to get the info
from them.
David Lang
would be useful. the problem is that the defaults are _so_
far off that no sane incremental program is going to be able to converge
on the right answer rapidly.
David Lang
Take two, those parameters pass their sanity tests, but
since you're actually running at a reasonable speed now t
do this a few times until
you get a loop that takes long enough (a few seconds) on a fast processor
David Lang
--- doesn't PG already know that information? I mean,
./configure does gather that information --- does it not?
we're not talking about compiling PG, we're talking about getting sane
defaults for a pre-compiled binary. if it's a 32 bit binary assume a 32
bit cpu, if
t would go for the binaries as well --- we're pretty much saying the
same thing :-)
I'm thinking along the lines of a script or pre-compiled binary (_not_
initdb) that you could run and have it generate a new config file that has
values that are within about an order of magnitude
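purely as an illustration of the kind of output such a tool might emit (every number here is invented, not a recommendation), the generated file could look like:
  # assumes an 8GB machine mostly dedicated to postgres
  shared_buffers = 2GB         # ~25% of RAM
  effective_cache_size = 6GB   # what the OS can cache for us
  work_mem = 32MB
  maintenance_work_mem = 512MB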
ms then I
agree that we should use the most accurate clock we can get. I just don't
think we have that.
David Lang
s
will average out quite a bit.
the key is to be going for a reasonable starting point. after that the
full analysis folks can start in with all their monitoring and
tuning, but the 80/20 rule really applies here. 80% of the gain is from
getting 'fairly close' to the right values,
On Tue, 8 May 2007, Chris wrote:
David Levy wrote:
Hi,
I am about to order a new server for my Postgres cluster. I will
probably get a Dual Xeon Quad Core instead of my current Dual Xeon.
Which OS would you recommend to optimize Postgres behaviour (i/o
access, multithreading, etc) ?
I
journal your data then you avoid the problems above, but in a
crash you may find that you lost data, even though the filesystem is
'intact' according to fsck.
David Lang
Steve Atkins wrote:
On May 7, 2007, at 2:55 PM, David Levy wrote:
> Hi,
>
> I am about to order a n
d userspace tools are pushing people to
create similar tools for Linux (without needing to combine the various
pieces in the kernel)
David Lang
On Tue, 8 May 2007, Trygve Laugstøl wrote:
currently ZFS is only available on Solaris, parts of it have been released
under GPLv2, but it doesn't look like enough of it to be ported to Linux
(enough was released for grub to be able to access it read-only, but not
the full filesystem). there
year or
two or you end up working with obsolete data.
David Lang
nd on it, I
suspect that either the SD or CFS schedulers will clean things up for you.
David Lang
are off by a huge amount
(especially if you have never done a vacuum analyze after loading your
table) then it's very likely that postgres will be slow because it's doing
expensive operations that it thinks are cheap.
David Lang
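the fix is a one-liner (table name hypothetical):
  VACUUM ANALYZE mytable;
or just ANALYZE mytable; if all you need is fresh statistics.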
endering
(i.e. it may reduce the stutters, but won't eliminate them)
a single query will confine itself to one core, but if you have a vacuum
or autovacuum run it will affect the second core.
I don't know what you can do on windows beyond this though.
David Lang
P.S. make s
?
if this is limited to 1, what happens when a vacuum run hits (or
autovacuum)
David Lang
rpm drive ~150 transactions/sec sounds about right
to really speed things up you want to get a disk controller with a
battery-backed cache so that the writes don't need to hit the disk to be
safe. that should get your speeds up to (and possibly above) what you got
by turning fsync off.
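the arithmetic behind numbers like that: with fsync on and no write cache, each commit has to wait for the WAL block to come back under the head, so commits are bounded by the rotation rate. e.g. 7200 rpm / 60 = 120 rotations/sec, and a 15k rpm drive tops out around 15000 / 60 = 250 commits/sec.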
his analysis.
I've seen similar comments from some of the linux kernel devs, they've
used low-end raid controllers with small processors on them and think that
a second core/socket in the main system to run software raid on is better.
David Lang
, I'm absolutely sure that doing so won't satisfy people (and would
generate howls of outrage from some parts), but having watched other
groups try and get things into the kernel that the kernel devs felt were
layering violations I think that it's what will ultimately happen.
David
antly faster than RAM,
calculating a checksum is free if the CPU has to touch the data anyway
(cycles where it would be waiting for a cache miss are spent doing the
calculations)
if you don't believe me, hack the source to remove the checksum and see if
you can measure any difference.
[8] sdm1[7] sdl1[6] sdk1[5] sdj1[4] sdi1[3]
sdh1[2] sdg1[9] sdf1[1] sde1[11](S) sdd1[0]
896 blocks [10/10] [UU]
David Lang
able down to some
50-100 thousand rows, with smaller indexes to match.
what if he doesn't use the postgres internal partitioning, but instead
makes his code access the tables named responsesN where N is the
id of the customer?
this is what it sounded like he was asking initially.
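a minimal sketch of that approach (column names invented here):
  -- one table per customer, created when the customer signs up
  CREATE TABLE responses42 (
      response_id serial PRIMARY KEY,
      answer      text
  );
  -- the application picks the table name from the customer id
  SELECT count(*) FROM responses42 WHERE answer = 'yes';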
e data to find things that _may_ be relevant would have a noticeable cost
in and of itself.
David Lang
time in the application
which 'partition' you care about, it would seem that searching through
significantly less data should be a win.
David Lang
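for comparison, the built-in way to get that effect in this era is inheritance plus constraint exclusion; a hedged sketch with made-up names:
  CREATE TABLE responses (customer_id int, answer text);
  CREATE TABLE responses_c42 (CHECK (customer_id = 42)) INHERITS (responses);
  SET constraint_exclusion = on;
  -- the planner can now skip children whose CHECK excludes customer 42
  SELECT count(*) FROM responses WHERE customer_id = 42;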
individual partitions
as read-only, thus bypassing MVCC and allowing queries to be satisfied using
indexes only?
Not that I think I've seen it on the TODO... :-)
now that's a very interesting idea, especially when combined with
time-based data where the old times will never change
the replication solution offered by the
big-name vendors.
the database folks won't comment much on this either way, but they don't
seem enthusiastic to combine all the data together.
I think they can see the fecal matter heading towards the rotational cooling
device on this one. I
of an advantage going from 64G to 128G as
it does going from 32G to 64G, but that depends on many variables as
others have been asking.
David Lang
you have
an answer even for the deeply buried isolated headless servers.
David Lang
On Mon, 18 Jun 2007,
Campbell, Lance wrote:
I am a Java Software architect, DBA, and project manager for the
University of Illinois, Department of Web Services. We use PostgreSQL
to serve about 2 million
rations, etc is so
variable that I strongly believe that the right answer is going to be
something along the lines of 'run this tool and then enter the number(s)
that the tool reports' and then let the tool measure the end result of all
the variables rather than
available for postgres
3. needed by the OS
it's not clear if what you are asking is #2 or a combination of #2 and #3
IMHO you should ask for #2 and #3, possibly along the lines of "how much
memory is in the machine that isn't already used by other applications"
David Lang
just about everyone (we just can't decide
what better defaults would be)
this horrible starting point gives people a bad first impression, one that a
simple tool like what's being discussed can go a long way towards fixing.
David Lang
king you need to do
depends on how large the OLTP database ends up being
David Lang
postgres instances
is started up against the same disk (due to some sysadmin mistake), it
could be disastrous.
when you are using a SAN for a database the SAN vendor will have you
allocate complete disks to each box, so you don't have multiple boxes
hitting the same drive, but you also don't
rare for one to fail.
not necessarily. direct attached doesn't mean in the same chassis,
external drive shelves attached via SCSI are still DAS
you can even have DAS attached to a pair of machines, with the second box
configured to mount the drives only if the first one dies.
David Lang
deleting them, is really slow on
XFS.
Databases don't care about that.
accessing lots of small files works really well on XFS compared to ext* (I
use XFS with a cyrus mail server which keeps each message as a separate
file and XFS vastly outperforms ext2/3 there). deleting is slow as you say
drive, unless the OS issues a trim
command the drive can't know that the data there is worthless and can be
ignored, it has to try and preserve that data, which makes doing the wear
leveling harder and slower.
David Lang
On Sat, 24 Jul 2010, David Boreham wrote:
Do you guys have any more ideas to properly 'feel this disk at its teeth' ?
While an 'end-to-end' test using PG is fine, I think it would be easier to
determine if the drive is behaving correctly by using a simple test program
rack space, powering
and cooling for big arrays full of spinning disks to flash based solutions,
the price comparison evens itself out even more.
check your SSD specs, some of the high performance ones draw quite a bit
of power.
David Lang
's promises.
and this is just on linux
on BSD or solaris (or with out-of-kernel patches) you also have ZFS, which
some people swear by, and other people swear at.
David Lang
As it happens, this is a system where all of the heavy workload is in the
form of sequential scan type load. The
based count in the
face of other transactions, wouldn't you run into the same issues if you
are doing a sequential scan with the same transactions in process?
David Lang
for is cases where the criteria of the search is
unpredictable, i.e.
ask the user for input
select count(*) from table where field=$input;
David Lang
Another more all-DB approach is to create a statistics table into which you
place aggregated statistics rows (num deleted, num inserted, totals
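a minimal sketch of such a table (names invented):
  CREATE TABLE table_stats (
      table_name   text PRIMARY KEY,
      num_inserted bigint NOT NULL DEFAULT 0,
      num_deleted  bigint NOT NULL DEFAULT 0
  );
kept current by triggers or by the application, so that
  SELECT num_inserted - num_deleted FROM table_stats WHERE table_name = 'foo';
replaces a full count(*) scan.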
ng different for postgres than they do for other databases is
raising a barrier against postgres utilization in anything.
David Lang
My
assessment that there is a problem with sequential scan was wrong. Now, let's
again take Oracle as the measure.
Someone asked me about caching the data
everything
else down.
complexity probably, although given how complex the planner already is,
is this significant?
as far as slowing everything else down, why would it do that? (beyond the
simple fact that any new thing the planner can do makes the planner take a
little longer)
David Lang
The
uld fit in RAM.
Neil, can you just double the size of your initial test to make sure that
it's too large to fit in ram to start with?
David Lang
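one easy way to build a test table of arbitrary size (row count and padding are placeholders, double them until the table is clearly bigger than RAM):
  CREATE TABLE test AS
      SELECT i, repeat('x', 100) AS padding
      FROM generate_series(1, 10000000) AS s(i);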
I'm not sure
there's any advantage to it. Managing address space mappings is a
pain in the butt.
keep in mind that you have no way of knowing what order the data in the
mmap region gets written out to disk.
David Lang
, it could be
written out immediately without you doing _anything_ (and thanks to
multiple cores/threads, it could get written out while you are still in
the middle of updating it). When you update an internal buffer and then
write that, you know that nothing will hit the disk before you issue
and reloading the data? or would you lose
irreplaceable data?
David Lang
it helps to separate your indexes.
now, if you pull the data from this main table into a smaller table for
analysis, you may want to do more interesting things with the drives that
you use for this smaller table as you are going to be loading data into
them more frequently.
David Lang
On
doesn't make sense for updates to existing databases, but the use
case of loading a bunch of data and then querying it right away isn't
_that_ uncommon.
David Lang
essentially making the data feed into the Analyze run be a fork of the
data coming out of the insert run going to disk. So the Analyze run
doesn't do any I/O and isn't going to complete until the insert is
complete. At which time it will have seen one copy of the entire table.
David
. :-)
I will point out that 1% of a very large table can still be a lot of disk
I/O that is avoided (especially if it's random I/O that's avoided)
David Lang
't be used for the next X time
while we wait for analyze to finish running"
I don't understand why the algorithm would have to be so different than
what's done today, surely the analyze thread could easily be tweaked to
ignore the rest of the data (assuming we don't have the thr
you get a known-bad transfer that you can abort or resume.
David Lang
used
for the load, but cores/chip are still climbing rapidly so I expect that
it's still pretty easy to end up with enough CPU to handle the extra load.
David Lang
And you can't start after some threshold of data has passed by, since you may
lose significant information (like minimal values
not be able to
optimize them as effectively.
David Lang
.
but with software raid you will have updated some parts of the array and
not others. this will result in a corrupted stripe in the array.
David Lang
significantly slower than raid 10 is when you are doing
small random writes. Also the performance variation between raid
controllers will be much higher with raid 6 than with raid 10
David Lang
Cheers
Dave
On Sun, Feb 13, 2011 at 2:12 PM, sergey wrote:
Hello,
I got a disk array appliance of 8
normal operations,
but when you need to do an expensive query you can change it for that
query.
David Lang
consistent with each
other.
will a plan with an estimated cost of 1,000,000 take approximately 100 times
as long as one with a cost of 10,000?
or more importantly, will a plan with an estimated cost of 2000 reliably
take longer than one with an estimated cost of 1000?
David Lang
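for anyone following along, those are the numbers EXPLAIN prints (table, index, and costs here are invented):
  EXPLAIN SELECT * FROM orders WHERE customer_id = 42;
  --  Index Scan using orders_customer_idx on orders
  --        (cost=0.00..1543.21 rows=380 width=64)
the cost unit is abstract (roughly the cost of one sequential page fetch), which is exactly why it need not track wall-clock time linearly.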
Any brigh
drives which don't have noticeable seek
times, things are completely different ;-)
David Lang
On Wed, 26 Dec 2007, Fernando Hevia wrote:
David Lang Wrote:
with only four drives the space difference between raid 1+0 and raid 5
isn't that much, but when you do a write you must write to two drives (the
drive holding the data you are changing, and the drive that holds the
parity data
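(the full accounting for a small raid 5 write is actually four I/Os: read the old data block and the old parity, then write the new data and the new parity, since new parity = old parity XOR old data XOR new data.)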
read operation. 8-(
I could see a raid 1 array not doing consistency checking (after all, it
has no way of knowing what's right if it finds an error), but since raid
5/6 can repair the data I would expect them to do the checking each time.
David Lang
primary purpose.
that said, I have heard of raid1 setups where it only reads off of one of
the drives, but I have not heard of higher raid levels doing so.
David Lang
On Wed, 26 Dec 2007, Mark Mielke wrote:
[EMAIL PROTECTED] wrote:
Thanks for the explanation David. It's good to know not only what but also
why. Still I wonder why reads do hit all drives. Shouldn't only 2 disks be
read: the one with the data and the parity disk?
no, because the parity
d into the question of it detecting bad data.
David Lang
if
they disagree it returns a read error rather than possibly invalid data
(it's up to the admin to figure out which drive is bad at that point).
no, I don't remember which card this was. I've been playing around with
things in this space for quite a while.
David
how
would you find out what it is?
DNS doesn't report the netmask, and it's arguably the biggest database of
IP addresses around ;-)
one of the biggest reasons for storing IP addresses in a SQL database is
as part of log analysis.
David Lang
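for what it's worth, postgres has first-class types for exactly this; a small example (table and addresses invented):
  CREATE TABLE weblog (ts timestamptz, client inet);
  -- all hits from one /24, using the inet containment operator
  SELECT count(*) FROM weblog WHERE client << '192.168.0.0/24'::cidr;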
I believe this sort of detail is why s
this area that (at
least on initial reports) is showing significant improvements.
I haven't had the time to test this out yet, so I can't add personal
experience, but it's definitely something to look at on a test system.
David Lang
ly it is to make the right decision.
David Lang
es (depending on your workload and controller
cache)
David Lang
features.
going from 7.x to 8.0 is the most painful step. going from 8.0 to 8.1
to 8.2 to 8.3 are relatively minor steps in comparison.
David Lang
out of date.
by the time the next stable release is made you will be running on ancient
versions unless you make use of backports.org (or compile it yourself,
which is usually my choice).
David Lang
backports.org and you will be pretty safe.
David Lang
You can get 8.3 from backports: http://www.backports.org/ - it's a
debian project to get more up to date versions for existing stable
releases (they package everything exactly the same way).
--
Postgresql & php tutorials
http://www.designma
should use raid6 (allowing for dual failures)
David Lang
re than one drive's worth of
redundancy). you can have hot-spare drives, but that's a different thing.
what controller/software lets you do this?
David Lang
power outage. one of
the dead drives ended up being the hot-spare, and another drive that acted
up worked well enough to let me eventually recover all the data (seek
errors), but it was a very scary week while I worked through this.
David Lang
and the reports are significantly better
activity on a tuned system.
David Lang
ready to test
fixes. the kernel developers base success or failure on the results of
tests. if the only people providing test results are MySQL people, how
would they know there is a problem?
David Lang
pgpool on both boxes and use heartbeat (from http://linux-ha.org ) to
move an IP address from one box to the other. clients connect to this
virtual IP and then pgpool will distribute the connections to both systems
from there.
David Lang
written synchronously and sequentially so the journal doesn't help you),
but for the data partition you may benefit from the journal.
David Lang
n with the WAL on it you may as well do ext2 (the WAL is
written synchronously and sequentially so the journal doesn't help you), but
for the data partition you may benefit from the journal.
a fairly recent article on the subject
http://www.commandprompt.com/blogs/joshua_drake/2008/04/is_that_performa
main families I
am aware of are the md and dm ones)
David Lang
s.
other filesystems are much less sensitive to file (and directory) sizes.
my suggestion would be to first make sure you are doing async writes to
syslog, and then try putting the logfiles on different filesystems to see
how they differ. personally I use XFS most of the time where I expect lots
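on classic syslogd, a leading '-' on the file name is what makes the log writes asynchronous, e.g. in syslog.conf (facility and path are examples):
  # the '-' means don't sync after every line
  local0.*    -/var/log/postgresql.log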