On 02/25/2012 06:54 PM, Stefan Keller wrote:
Hi,

Recently Mike Stonebraker identified four areas where "old elephants"
lack performance [1]:

1. Buffering/paging
2. Locking/Multithreading
3. WAL logging
4. Latches (aka memory locks for concurrent access of btree structures
in buffer pool?).

He claims having solved these issues while retaining SQL and ACID.
But the only issue I understood is #1 by loading all tuples in-memory.
=>  Are there any ideas on how to tell Postgres to aggressively load
all data into memory (issue #1)? All remaining issues make me wonder.
I actually doubt that there are alternatives even theoretically.
=>  Can anyone help explaining me issues 2,3 and 4, their solutions,
and why Postgres would be unable to resolve them?


1. Buffering/paging

PG, and your operating system, already do this for reads.  It also keeps things that are hit harder 
and lets things go that are not hit as much.  On the writing side, you can configure it PG from 
"write it! write it NOW!", to "running with scissors" depending on how safe you 
want to feel.


2. Locking/Multithreading

PG does have some internal structures that it needs to lock (and anytime you 
read lock, think single user access, or one at a time, or slow).  Any time you 
hear about lock contention, it's multiple processes waiting in a line for a 
lock.  If you only had one client, then you really would not need locks.  There 
is where multithreading comes from, but in PG we use multi-process instead of 
multi-thread, but its the same thing.  Two (or more) people are needing to lock 
something so they can really screw with it.  PG does not need as many locks as 
other db's however.  It uses an MVCC architecture so under normal operations 
(insert, update, select, delete) people dont block eacth other. (ie readers 
dont block writers and visa versa).

I don't see locking going away, but there are not many loads that are lock 
bound.  Most database loads are IO bound, and then you'd probably be CPU bound 
before you are lock bound.  (although it can be hard to tell if its a spin lock 
that's making you cpu bound).  I'm sure there are loads that hit lock 
contention, but there are probably ways to mitigate it.  Say you have a process 
that alters the table and adds a new column every two seconds, thing updates a 
single row to add data to the new column just added.  I can see that being lock 
bound.  And a really stupid implementation.

3. WAL logging

PG writes a transaction twice.  Once to WAL and once to the DB.  WAL is a simple and 
quick write, and is only ever used if your computer crashes and PG has to re-play 
transactions to get the db into a good/known state.  Its a safety measure that doesn't 
really take much time, and I don't think I've heard of anyone being WAL bound.  Although 
it does increase IO ops, it's not the biggest usage of IO.  This one falls under 
"lets be safe" which is something NoSQL did away with.  Its not something I 
want to give up, personally.  I like using a net.

4. Latches

I can only guess at this one.  Its similar to locks I think.  Data structures 
come in different types.  In the old days we only had single user access to 
data structures, then when we wanted two users to access it we just locked it 
to serialize access (one at a time mode), but that does not scale well at all, 
so we invented two new types: lock free and wait free.

An index is stored as a btree.  To insert a new record into the index you have 
to reorganize it (rotate it, sort it, add/delete nodes, etc), and while one 
client is doing that it can make it hard for another to try and search it.  
Lock free (and wait free) let multiple people work on a btree at the same time 
with much less contention.  Wikipedia does a better job of explaining them than 
I could:

http://en.wikipedia.org/wiki/Non-blocking_algorithm

I have no idea if PG uses single user locks or some kind of lock free structure 
for its internals.  I can see different parts of the internals needing 
different levels.

Maybe I'm old fashioned, but I don't see how you'll get rid of these.  You have 
to insert a record.  You have to have 12 people hitting the db at the same 
time.  You have to organize that read/write access somehow so they dont blow 
each other up.

-Andy

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Reply via email to