[PERFORM] MIT benchmarks pgsql multicore (up to 48) performance

2010-10-04 Thread Hakan Kocaman
Hi,

to whom it may concern:
http://pdos.csail.mit.edu/mosbench/

They tested with 8.3.9; I wonder what results 9.0 would give.

Best regards and keep up the good work

Hakan


Re: [PERFORM] Issue for partitioning with extra check constraints

2010-10-04 Thread Josh Berkus

> And your point is?  The design center for the current setup is maybe 5
> or 10 partitions.  We didn't intend it to be used for more partitions
> than you might have spindles to spread the data across.

Where did that come from?  It certainly wasn't anywhere when the feature
was introduced.  Simon intended for this version of partitioning to
scale to 100-200 partitions (and it does, provided that you dump all
other table constraints), and partitioning has nothing to do with
spindles.  I think you're getting it mixed up with tablespaces.

The main reason for partitioning is ease of maintenance (VACUUM,
dropping partitions, etc.), not any kind of I/O optimization.
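
For instance (a hypothetical inheritance-based setup; the table name is
made up), removing a month of data is a cheap DDL operation rather than
a massive DELETE:

  -- Instead of: DELETE FROM measurements WHERE logdate < '2010-09-01';
  -- which scans and dead-tuples millions of rows, you can simply:
  DROP TABLE measurements_2010_08;  -- removes one child partition outright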

I'd like to add the following statement to our docs on partitioning, in
section 5.9.4:

=

Constraint exclusion is tested for every CHECK constraint on the
partitions, even CHECK constraints which have nothing to do with the
partitioning scheme.  This can add significant extra planner time,
especially if your partitions have CHECK constraints which are costly to
evaluate.  For performance, it can be a good idea to eliminate all extra
CHECK constraints on partitions or to re-implement them as triggers.

=
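
To illustrate (a hypothetical schema; in 8.x/9.0 partitioning is done
via inheritance, and reading_is_plausible() plus the constraint name
below are made up for this sketch):

  CREATE TABLE measurements (
      logdate  date    NOT NULL,
      reading  numeric NOT NULL
  );

  CREATE TABLE measurements_2010_09 (
      -- This is the constraint that drives constraint exclusion:
      CHECK ( logdate >= DATE '2010-09-01' AND logdate < DATE '2010-10-01' ),
      -- This unrelated check is ALSO examined while planning every
      -- query; an expensive function here slows the planner down:
      CHECK ( reading_is_plausible(reading) )   -- hypothetical function
  ) INHERITS (measurements);

  -- Cheaper alternative: drop the extra constraint and enforce the rule
  -- in a trigger, which fires only at write time, never at plan time:
  ALTER TABLE measurements_2010_09
      DROP CONSTRAINT measurements_2010_09_reading_check;

  CREATE FUNCTION check_reading() RETURNS trigger AS $$
  BEGIN
      IF NOT reading_is_plausible(NEW.reading) THEN
          RAISE EXCEPTION 'implausible reading: %', NEW.reading;
      END IF;
      RETURN NEW;
  END;
  $$ LANGUAGE plpgsql;

  CREATE TRIGGER measurements_2010_09_check_reading
      BEFORE INSERT OR UPDATE ON measurements_2010_09
      FOR EACH ROW EXECUTE PROCEDURE check_reading();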

> In case you haven't noticed, we have very finite
> amounts of manpower that's competent to do planner surgery.

Point.


-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com



Re: [PERFORM] Issue for partitioning with extra check constraints

2010-10-04 Thread Joshua D. Drake
On Mon, 2010-10-04 at 11:34 -0700, Josh Berkus wrote:
> > And your point is?  The design center for the current setup is maybe 5
> > or 10 partitions.  We didn't intend it to be used for more partitions
> > than you might have spindles to spread the data across.
> 
> Where did that come from? 

Yeah, that is a bit odd. I don't recall any discussion regarding such a
weird limitation.

>  It certainly wasn't anywhere when the feature
> was introduced.  Simon intended for this version of partitioning to
> scale to 100-200 partitions (and it does, provided that you dump all
> other table constraints), and partitioning has nothing to do with
> spindles.  I think you're getting it mixed up with tablespaces.

Great! That would be an excellent addition.


> 
> The main reason for partitioning is ease of maintenance (VACUUM,
> dropping partitions, etc.) not any kind of I/O optimization.

Well, that is certainly "a" main reason, but it is not "the" main reason.
We have lots of customers using it to manage very large amounts of data
via the constraint exclusion features (and gaining from the smaller
index sizes).
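
For instance (hypothetical schema, matching the partitioning sketch
earlier in this thread), constraint exclusion lets the planner skip
every child whose CHECK constraint rules it out:

  SET constraint_exclusion = on;  -- or 'partition' in 8.4 and later

  EXPLAIN SELECT count(*)
    FROM measurements
   WHERE logdate >= DATE '2010-09-01' AND logdate < DATE '2010-10-01';
  -- The plan should touch only measurements_2010_09 (plus the empty
  -- parent), and each child keeps its own, much smaller indexes.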


Jd

-- 
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579
Consulting, Training, Support, Custom Development, Engineering
http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt




Re: [PERFORM] [HACKERS] MIT benchmarks pgsql multicore (up to 48) performance

2010-10-04 Thread Josh Berkus
Dan,

(btw, OpenSQL Conference is going to be at MIT in 2 weeks.  Think
anyone from the MOSBENCH team could attend?
http://www.opensqlcamp.org/Main_Page)

> The big takeaway for -hackers, I think, is that lock manager
> performance is going to be an issue for large multicore systems, and
> the uncontended cases need to be lock-free. That includes cases where
> multiple threads are trying to acquire the same lock in compatible
> modes.

Yes; we were aware of this due to work Jignesh did at Sun on TPC-E.

> Currently even acquiring a shared heavyweight lock requires taking out
> an exclusive LWLock on the partition, and acquiring shared LWLocks
> requires acquiring a spinlock. All of this gets more expensive on
> multicores, where even acquiring spinlocks can take longer than the
> work being done in the critical section.
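
For reference, the heavyweight locks in question are the ones exposed in
the pg_locks view; a minimal way to see compatible-mode acquisitions on
the same object (the table name t is hypothetical):

  -- Two sessions each take AccessShareLock on the same relation:
  BEGIN;
  SELECT count(*) FROM t;

  -- From a third session, observe the compatible locks that each still
  -- had to pass through the LWLock-protected lock manager partitions:
  SELECT locktype, mode, granted, pid
    FROM pg_locks
   WHERE relation = 't'::regclass;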

Certainly, the question has always been how to fix it without breaking
major features and endangering data integrity.

> Note that their implementation of the lock manager omits some features
> for simplicity, like deadlock detection, 2PC, and probably any
> semblance of portability. (These are the sort of things we're allowed
> to do in the research world! :-)

Well, it's nice that you did!  We'd never have that much time to
experiment with non-production code as a group in the project.  So now
we have a theoretical solution, parts of which we can look at
implementing in some watered-down form.

> The other major bottleneck they ran into was a kernel one: reading from
> the heap file requires a couple lseek operations, and Linux acquires a
> mutex on the inode to do that. The proper place to fix this is
> certainly in the kernel but it may be possible to work around in
> Postgres.

Or we could complain to Kernel.org.  They've been fairly responsive in
the past.  Too bad this didn't get posted earlier; I just got back from
LinuxCon.

So, do you know someone who can speak technically to this issue? I can put
them in touch with the Linux geeks in charge of that part of the kernel
code.

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com



Re: [PERFORM] MIT benchmarks pgsql multicore (up to 48) performance

2010-10-04 Thread Scott Marlowe
On Mon, Oct 4, 2010 at 8:44 AM, Hakan Kocaman wrote:
> Hi,
> for whom it may concern:
> http://pdos.csail.mit.edu/mosbench/
> They tested with 8.3.9; I wonder what results 9.0 would give.
> Best regards and keep up the good work

They mention that these tests were run on the older 8xxx-series
Opterons, which have much slower memory and HyperTransport speeds.  I
wonder how much better the newer 6xxx-series Magny-Cours would have
done on it...  When I ran some simple benchmarks like pgbench, I got
scalability right up to 48 processes on our 48-core Magny-Cours
machines.

Still, lots of room for improvement in kernel and pgsql.

-- 
To understand recursion, one must first understand recursion.



Re: [PERFORM] How does PG know if data is in memory?

2010-10-04 Thread Cédric Villemain
2010/10/4 Greg Smith:
> Craig Ringer wrote:
>>
>> If some kind of cache awareness was to be added, I'd be interested in
>> seeing a "hotness" measure that tracked how heavily a given relation/index
>> has been accessed and how much has been read from it recently. A sort of
>> age-scaled blocks-per-second measure that includes both cached and uncached
>> (disk) reads. This would let the planner know how likely parts of a given
>> index/relation are to be cached in memory without imposing the cost of
>> tracking the cache in detail. I'm still not sure it'd be all that useful,
>> though...
>
> Yup, that's one of the design ideas scribbled in my notes, as is the idea of
> what someone dubbed a "heat map" that tracked which parts of the relation
> where actually the ones in RAM, the other issue you mentioned.  The problem
> facing a lot of development possibilities in this area is that we don't have
> any continuous benchmarking of complicated plans going on right now.  So if
> something really innovative is done, there's really no automatic way to test
> the result and then see what types of plans it improves and what it makes
> worse.  Until there's some better performance regression work like that
> around, development on the optimizer has to favor being very conservative.

* Tracking a specific block is not very easy because of readahead.  You
end up measuring whether a block was in memory at the moment you
physically requested it, not at the moment the first seek/read happened.
It is still an interesting stat, IMHO.

I wonder how that can add value to the planner.

* If the planner knew more about the OS cache, it could guess
effective_cache_size on its own, which would already be nice to
have.
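
Today that estimate is left to the DBA; a quick illustration of the knob
in question (the value and the query are made up):

  -- Tell the planner roughly how much data the OS cache plus
  -- shared_buffers can hold; this only shifts cost estimates toward
  -- index scans, it allocates no memory:
  SET effective_cache_size = '8GB';
  EXPLAIN SELECT * FROM t WHERE id = 42;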

Extract from postgres code:
 * We use an approximation proposed by Mackert and Lohman, "Index Scans
 * Using a Finite LRU Buffer: A Validated I/O Model", ACM Transactions
 * on Database Systems, Vol. 14, No. 3, September 1989, Pages 401-424.
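
Spelled out as in the comments of index_pages_fetched() in costsize.c,
with T = # pages in table, N = # tuples in table, s = selectivity
(fraction of tuples retrieved), and b = # buffer pages available, the
predicted number of pages fetched PF is:

  PF = min(2TNs/(2T+Ns), T)           when T <= b
       2TNs/(2T+Ns)                   when T > b and Ns <= 2Tb/(2T-b)
       b + (Ns - 2Tb/(2T-b))*(T-b)/T  when T > b and Ns > 2Tb/(2T-b)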

The planner uses that in conjunction with effective_cache_size to guess
whether it is interesting to scan the index.
The question is whether this model is still valid given a more precise
knowledge of the OS page cache, and whether it matches how different
systems like Windows and Linux handle their page caches.

Hooks around cost estimation should make it possible to write a module
that rethinks that part of the planner and makes it use the statistics
about the cache.  I wonder whether adding such hooks to core would hurt
its performance?  Anyway, doing that is probably the easiest and
shortest way to test the behavior.


>
> --
> Greg Smith, 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD
> PostgreSQL Training, Services and Support  www.2ndQuadrant.us
> Author, "PostgreSQL 9.0 High Performance"    Pre-ordering at:
> https://www.packtpub.com/postgresql-9-0-high-performance/book



-- 
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support



Re: [PERFORM] How does PG know if data is in memory?

2010-10-04 Thread Jeremy Harris

On 10/04/2010 04:22 AM, Greg Smith wrote:

> I had a brain-storming session on this subject with a few of the
> hackers in the community in this area a while back that I haven't had
> a chance to do something with yet (it exists only as a pile of
> scribbled notes so far).  There's a couple of ways to collect data on
> what's in the database and OS cache, and a couple of ways to then
> expose that data to the optimizer.  But that needs to be done very
> carefully, almost certainly as only a manual process at first, because
> something that's producing cache feedback all of the time will cause
> plans to change all the time, too.  Where I suspect this is going is
> that we may end up tracking various statistics over time, then
> periodically providing a way to export a mass of "typical % cached"
> data back to the optimizer for use in plan cost estimation purposes.
> But the idea of monitoring continuously and always planning based on
> the most recent data available has some stability issues, both from a
> "too many unpredictable plan changes" and a "bad short-term feedback
> loop" perspective, as mentioned by Tom and Kevin already.


Why not monitor the distribution of response times, rather than
"cached" vs. not?

That a) avoids the issue of discovering what was a cache hit, b) deals
neatly with multilevel caching, and c) feeds directly into cost
estimation.
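
As a data point, 9.0's EXPLAIN can already show the cache-hit vs.
disk-read split for one query (table and column names hypothetical):

  -- BUFFERS is new in 9.0; "shared hit" means found in shared_buffers,
  -- while "read" went to the OS, which may itself have served it from
  -- its page cache: exactly the multilevel problem noted above.
  EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM t WHERE id < 1000;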

Cheers,
   Jeremy



Re: [PERFORM] Issue for partitioning with extra check constraints

2010-10-04 Thread Tom Lane
Josh Berkus writes:
>> And your point is?  The design center for the current setup is maybe 5
>> or 10 partitions.  We didn't intend it to be used for more partitions
>> than you might have spindles to spread the data across.

> Where did that come from?  It certainly wasn't anywhere when the feature
> was introduced.  Simon intended for this version of partitioning to
> scale to 100-200 partitions (and it does, provided that you dump all
> other table constraints), and partitioning has nothing to do with
> spindles.  I think you're getting it mixed up with tablespaces.

[ shrug... ]  If Simon thought that, he obviously hadn't done any
careful study of the planner's performance.  You can maybe get that far
as long as the partitions have just very simple constraints, but
anything nontrivial won't scale.  As you found out.

regards, tom lane
