Re: [PERFORM] Poor plan choice in prepared statement

2009-01-30 Thread Scott Carey
> da...@lang.hm writes:
>> the poster who started this thread had a query where the parsing phase
>> took significantly longer than the planning stage.
>
> That was an anecdote utterly unsupported by evidence.
>
> regards, tom lane

The issue of prepared statements having atrocious query plans has hit me again.
 I feel very strongly about this topic and about the need for Postgres to have
an option that lets prepared statements re-plan based on their inputs, one that
actually works and is user friendly.  Pardon my bluntness, but in the current
situation the system is brain dead in many important use cases and data sets.

I believe my statement referenced above was about parsing time relative to the
remaining time, not parsing compared to planning.  But either way, it's a minor
detail, and it isn't needed to justify the enhancement requested here.
Yeah, it's anecdotal from your perspective.  Go ahead and ignore all that if
you wish.  I am making several points in this message that are independent of
such evidence, IMHO.
I have tried to rearrange this message so that the anecdotal narrative is at 
the end, after the first dashed line.


Unnamed prepared statements do solve much of the problem in theory, since the
most common issues are poor execution plans and the lack of a clean way to deal
with SQL injection while writing less bug-prone client code; parsing being very
expensive is rarer.  But there IS a performance savings, not insignificant for
many workloads, to be had by avoiding utterly avoidable re-parsing.

HOWEVER:
What is overlooked WRT unnamed prepared statements is that they are hard to
use, and changing client code or behavior is difficult, error prone, and
sometimes impossible.  Not all client APIs play nice with them at the moment
(see Postgres' JDBC driver).  The behavior has some global tweaks, but these
are useless in many situations where you need behavior that varies.

Every time the answer to a problem is to change the client behavior, I ask
myself whether the DB could have a better default or a configuration parameter
so that clients don't have to change.  Some database instances have dozens of
applications and hundreds of client types, including ad-hoc usage.  Some client
code is legacy code that simply can't be changed.
Changing the clients is a lot harder than changing a db parameter or
configuring a new default for a particular db user.  If the client must change,
adding something like SET prepare_level = 'parse_only' (sketched below) is the
least intrusive option and the easiest to test -- but I stress again that in
many real-world cases the client is not flexible.

A session-level parameter that controls the default prepared statement behavior
(never cache by default?  parse cache only?  parse cache and plan cache?) would
be a blessing.  A DBA could administer a fix to certain problems without having
to force clients to change behavior or wait for new client API versions with
fixes.
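
To make that concrete, here is a rough sketch of how such a knob might look
from the DBA or client side.  The prepare_level parameter is purely
hypothetical (it is the proposal above, not an existing setting), and the role,
table, and statement names are made up:

    -- Hypothetical parameter; 'prepare_level' does not exist in any release.
    -- A DBA could set it for one problem application, no client changes:
    ALTER ROLE reporting_app SET prepare_level = 'parse_only';

    -- Or a flexible client could opt in per session:
    SET prepare_level = 'parse_only';  -- keep the parse tree, re-plan per EXECUTE
    PREPARE recent_orders (date) AS
        SELECT * FROM orders WHERE order_date >= $1;
    EXECUTE recent_orders('2009-01-01');  -- would be planned with the real value,
                                          -- so partition selection could work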

That reminds me: isn't there a global parameter that can force prepared
statements never to be cached?  Does that make them all behave as if they were
unnamed, or are they simply re-created each time?  I believe I tried this in
the past and the prepared statements were still unable to use the parameter
values for partition table selection, which suggests the latter.

Typically, I run into the issue with queries in a back-end process that
operates on large data sets or partitioned tables.  Prepared statements
essentially kill performance by several orders of magnitude (think: scanning 1
partition versus scanning 5000).  However, my recent issue is brutally simple.
I need to support an auto-complete web form, and thus need a query with a
portion like
WHERE name LIKE 'userenteredtext%'
So I make an index with varchar_pattern_ops and off we go! ... Or not.  It
works fine with explicit queries, but not with a prepared query.
Unfortunately, building the query text by hand is highly prone to SQL
injection, and the industry-standard way to deal with that is parameterization.
http://www.owasp.org/index.php/Guide_to_SQL_Injection
(that site is a wealth of information, tools, and tests on the topic; for
example: http://www.owasp.org/index.php/Testing_for_SQL_Injection).
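
To make the LIKE problem concrete, here is a minimal sketch (the table, column,
and index names are made up).  On 8.3 the prepared form is planned once for an
unknown pattern, which might not be left-anchored, so the pattern-ops index
cannot be used:

    -- Minimal illustration; table and index names are hypothetical.
    CREATE TABLE people (name varchar);
    CREATE INDEX people_name_pattern_idx ON people (name varchar_pattern_ops);

    -- Literal query: the planner knows the pattern is left-anchored, so the
    -- pattern-ops index is usable (given enough rows to make it worthwhile).
    EXPLAIN SELECT * FROM people WHERE name LIKE 'smi%';

    -- Prepared query: planned for an unknown $1 that might start with a
    -- wildcard, so the index cannot be used and we get a seq scan.
    PREPARE autocomplete (varchar) AS
        SELECT * FROM people WHERE name LIKE $1;
    EXPLAIN EXECUTE autocomplete('smi%');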

Prepared statements are a blessing from a client code perspective, preventing
all sorts of bugs and catching many others early, on the client side.  Not
being able to use them because they cause the database to execute very bad
query plans is a flaw in the database, not the client.

----------------------------------------


Unnamed prepared statements did not work for me when I tried them as a solution 
(8.3.2, supposedly after the fix).  I was in a hurry to fix my issue, and just 
moved on when they were plainly not working.  It is possible I did something 
wrong back then.  They are also poorly supported by many client APIs -- when 
they did not work for me, I supposed it was a JDBC issue or perhaps user error, 
but maybe it was ser

Re: [PERFORM] Using multiple cores for index creation?

2009-01-30 Thread Scott Carey
This may not be possible with your schema, but when I had to restore a 2.5TB
database (with plenty of fast I/O -- it was never I/O bound) I used 3 or 4
copies of pg_restore operating on different tables.

With a -Fc (custom format) dump, as with a plain dump, you can have each
command restore just certain schemas or tables.  It is a bit of manual work,
but you don't have to modify the dump file(s); you just build several different
commands that operate on different sections of the database.  How hard that is
depends on the schema.  In my case, most of the data is in partitioned tables,
so once the parent tables and schema are in place we can use a regex to peel
off chunks of them by date and restore the chunks in different processes.
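
Something along these lines, for example -- this is only a sketch, with made-up
dump/database names and a made-up partition naming pattern, and the exact TOC
text can differ:

    # 1. Get the dump's table of contents:
    pg_restore -l dump.fc > full.list

    # 2. Split the big date-partitioned TABLE DATA entries into several
    #    lists with a regex (partition names here are hypothetical):
    grep -E 'TABLE DATA .* sales_2008_0[1-6]'          full.list > data1.list
    grep -E 'TABLE DATA .* sales_2008_(0[7-9]|1[0-2])' full.list > data2.list

    # 3. Restore the schema first, then run the data loads concurrently:
    pg_restore -d mydb --schema-only dump.fc
    pg_restore -d mydb -L data1.list dump.fc &
    pg_restore -d mydb -L data2.list dump.fc &
    wait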

It still took all day though, and it wasn't I/O bound.

On 1/29/09 12:21 PM, "henk de wit"  wrote:

Hi,

When I try to restore a database dump on PostgreSQL 8.3 that's approximately
130GB in size, it takes about 1 hour, and I noticed that index creation makes
up the bulk of that time.  I'm using a very fast I/O subsystem (16 Mtron Pro
7535 SSDs behind a dual 1.2GHz IOP / 4GB cache RAID controller), fast CPUs (2
quad-core C2Qs at 2.6GHz) and 32GB RAM.  From monitoring the restore process, I
learned that only 10 minutes are spent doing I/O, while the rest of the time is
spent creating the indexes.  Index creation seems to be completely CPU bound.

The problem is that only 1 CPU core is used. My other 7 cores are just sitting 
there doing nothing. It seems to me that creating each index, especially for 
different tables, is something that can be done independently.

Is there some way I can let PostgreSQL use multiple cores for creating the 
indexes?

Thanks in advance


Re: [PERFORM] SSD performance

2009-01-30 Thread Scott Carey



On 1/23/09 3:35 AM, "da...@lang.hm"  wrote:
http://techreport.com/articles.x/15931/1

one thing that both of these reviews show is that if you are doing a
significant amount of writing the X-25M is no better than a normal hard
drive (and much of the time in the middle to bottom of the pack compared
to normal hard drives)

David Lang


The X25-M may not have high sustained sequential write (STR) rates compared to
normal disks, but its write latency is FAR superior to a normal disk's, and for
random writes it will demolish most small and medium sized RAID arrays by
itself.  It will push 30MB to 60MB/sec of random 8k writes, or roughly 2000 to
12000 8k fsyncs/sec.  The -E is definitely a lot better, but the -M can get you
pretty far.

For any Postgres installation where you don't expect to write WAL at more than
30MB/sec (the vast majority), it is good enough to use (mirrored) as a WAL
device, without a battery-backed cache, and with very good performance.  A
normal disk cannot do that.
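
The usual way to do that is to relocate pg_xlog onto the SSD mirror.  A rough
sketch -- the /ssd mount point is an assumption, and the stop/start commands
vary by platform:

    # Assumes the mirrored SSD pair is mounted at /ssd (hypothetical path)
    # and $PGDATA points at the cluster's data directory.
    pg_ctl -D $PGDATA stop
    mv $PGDATA/pg_xlog /ssd/pg_xlog
    ln -s /ssd/pg_xlog $PGDATA/pg_xlog
    pg_ctl -D $PGDATA start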

Also, it can be used very well for the OS swap, and some other temp space to 
prevent swap storms from severely impacting the system.

For anyone worried about the X25-M's ability to withstand lots of write cycles
... calculate how long it would take you to write 800TB to the drive at a
typical rate.  For most use cases that's going to be > 5 years.  The 160GB
version will take 2x as much data, and time, to wear down.
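
For example, assuming a sustained average of 5MB/sec of writes (an assumed
figure, on the heavy side for most OLTP write loads):

    800TB / 5MB/sec = 160,000,000 seconds, or roughly 5 years of continuous writing.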

Samsung, SanDisk, Toshiba, Micron, and several others are expected to ship
next-gen SSDs with low random write latency this year.  A few of these claim >
150MB/sec writes, even for MLC based drives.

A RAM based device is intriguing, but an ordinary SSD will be enough to make
most Postgres databases CPU bound, and with those there is no concern about
data loss on power failure.  The Intel X25 series does not even use its onboard
RAM for write cache!  (It uses some SRAM on the controller chip for that, and
it's fsync safe.)  The RAM is working memory for the controller chip, caching
the LBA-to-physical-flash-block mappings and other data needed for wear
leveling, contrary to what many reviews claim.