Re: [HACKERS] Proposal for CSN based snapshots

2013-06-07 Thread Markus Wanner
ple processes may need to
do such a conversion, all starting at the same point in time.

> Rolling back a transaction
> --------------------------
> 
> Rolling back a transaction is basically the same as committing, but
> the CSN slots need to be stamped with an AbortedCSN.

Is that really necessary? After all, an aborted transaction behaves
pretty much the same as a transaction in progress WRT visibility: it's
simply not visible.

Or why do you need to tell apart aborted from in-progress transactions
by CSN?

> Sizing the CSN maps
> -------------------
> 
> CSN maps need to be sized to accommodate the number of backends.
> 
> Dense array size should be picked so that most xids commit before
> being evicted from the dense map and sparse array will contain slots
> necessary for either long running transactions or for long snapshots
> not yet converted to XID based snapshots. I did a few quick
> simulations to measure the dynamics. If we assume a power law
> distribution of transaction lengths and snapshots for the full length
> of transactions with no think time, then 16 slots per backend is
> enough to make the probability of eviction before commit less than
> 1e-6 and being needed at eviction due to a snapshot about 1:1. In
> the real world very long transactions are more likely than the model
> predicts, but at least the normal variation is mostly buffered by this
> size. 16 slots = 128 bytes per backend ends up at a 12.5KB buffer for
> the default value of 100 backends, or 125KB for 1000 backends.

Sounds reasonable to me.

> Sparse buffer needs to be at least big enough to fit CSN slots for the
> xids of all active transactions and non-overflowed subtransactions. At
> the current level PGPROC_MAX_CACHED_SUBXIDS=64, the minimum comes out
> at 16 bytes * (64 + 1) slots * 100 backends = 101.6KB per buffer,
> or 203KB total in the default configuration.

A CSN is 8 bytes, the XID 4, resulting in 12 bytes per slot. So I guess
the given 16 bytes includes alignment to 8 byte boundaries. Sounds good.
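
Just to illustrate the arithmetic, a 16-byte slot could look roughly like
this on a typical 64-bit ABI (names and layout are my own sketch, not
taken from the patch):

#include <assert.h>
#include <stdint.h>

typedef uint32_t TransactionId;   /* 4 bytes, as in PostgreSQL */
typedef uint64_t CommitSeqNo;     /* 8 bytes, wants 8-byte alignment */

typedef struct CsnSlot
{
    TransactionId xid;            /* 4 bytes */
    /* 4 bytes of implicit padding to align csn */
    CommitSeqNo   csn;            /* 8 bytes, written atomically */
} CsnSlot;

int main(void)
{
    /* 65 slots * 100 backends * 16 bytes = 104000 bytes, i.e. ~101.6KB */
    assert(sizeof(CsnSlot) == 16);
    return 0;
}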

> Performance discussion
> ----------------------
> 
> Taking a snapshot is extremely cheap in this scheme, I think the cost
> will be mostly for publishing the csnmin and rechecking validity after
> it. Taking snapshots in the shadow of another snapshot (often the case
> for the proposed MVCC catalog access) will be even cheaper as we don't
> have to advertise the new snapshot. The delayed XID based snapshot
> construction should be unlikely, but even if it isn't the costs are
> similar to GetSnapshotData, but we don't need to hold a lock.
> 
> Visibility checking will also be simpler as for the cases where the
> snapshot is covered by the dense array it only requires two memory
> lookups and comparisons.

Keep in mind, though, that both of these lookups are into shared memory.
Especially the dense ring buffer may well turn into a point of
contention. Or at least the cache lines holding the most recent XIDs
within that ring buffer.

Whereas currently, the snapshot's xip array resides in process-local
memory. (Granted, often enough, the proc array also is a point of
contention.)
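
For illustration, the visibility test quoted above might look roughly
like the following sketch; all names are made up, and the real dense map
lives in shared memory and is indexed relative to a global xid base
rather than by a simple modulo:

#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;
typedef uint64_t CommitSeqNo;

#define DENSE_MAP_SLOTS    1600         /* e.g. 16 slots * 100 backends */
#define InvalidCommitSeqNo ((CommitSeqNo) 0)

/* stand-in for the dense ring buffer in shared memory */
static CommitSeqNo csn_dense_map[DENSE_MAP_SLOTS];

/* lookup: the CSN stamped at commit, or invalid while still in progress */
CommitSeqNo
CSNForXid(TransactionId xid)
{
    return csn_dense_map[xid % DENSE_MAP_SLOTS];
}

/* compare against the snapshot's CSN, which is kept locally per snapshot */
bool
XidVisibleInSnapshot(TransactionId xid, CommitSeqNo snapshot_csn)
{
    CommitSeqNo csn = CSNForXid(xid);

    return csn != InvalidCommitSeqNo && csn <= snapshot_csn;
}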

> At this point I don't see any major issues with this approach. If the
> ensuing discussion doesn't find any major showstoppers then I will
> start to work on a patch bit-by-bit.

Bit-by-bit? Reminds me of punched cards. I don't think we accept patches
in that format.  :-)

> we have a small one in the family.

Congratulations on that one.

Regards

Markus Wanner




Re: [HACKERS] Proposal for CSN based snapshots

2013-06-10 Thread Markus Wanner
Ants,

the more I think about this, the more I start to like it.

On 06/07/2013 02:50 PM, Ants Aasma wrote:
> On Fri, Jun 7, 2013 at 2:59 PM, Markus Wanner  wrote:
>> Agreed. Postgres-R uses a CommitOrderId, which is very similar in
>> concept, for example.
> 
> Do you think having this snapshot scheme would be helpful for Postgres-R?

Yeah, it could help to reduce patch size, after a rewrite to use such a CSN.

>> Or why do you need to tell apart aborted from in-progress transactions
>> by CSN?
> 
> I need to detect aborted transactions so they can be discarded during
> the eviction process, otherwise the sparse array will fill up. They
> could also be filtered out by cross-referencing uncommitted slots with
> the procarray. Having the abort case do some additional work to make
> xid assignment cheaper looks like a good tradeoff.

I see.

>>> Sparse buffer needs to be at least big enough to fit CSN slots for the
>>> xids of all active transactions and non-overflowed subtransactions. At
>>> the current level PGPROC_MAX_CACHED_SUBXIDS=64, the minimum comes out
>>> at 16 bytes * (64 + 1) slots * 100 backends = 101.6KB per buffer,
>>> or 203KB total in the default configuration.
>>
>> A CSN is 8 bytes, the XID 4, resulting in 12 bytes per slot. So I guess
>> the given 16 bytes includes alignment to 8 byte boundaries. Sounds good.
> 
> 8 byte alignment for CSNs is needed for atomic if not something else.

Oh, right, atomic writes.

> I think the size could be cut in half by using a base value for CSNs
> if we assume that no xid is active for longer than 2B transactions as
> is currently the case. I didn't want to include the complication in
> the first iteration, so I didn't verify if that would have any
> gotchas.

In Postgres-R, I effectively used a 32-bit order id which wraps around.

In this case, I guess adjusting the base value will get tricky. Wrapping
could probably be used as well, instead.
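
For what it's worth, comparing such wrapping 32-bit values stays cheap,
in the same spirit as TransactionIdPrecedes(); the sketch below assumes,
as stated above, that no two live values are ever more than 2^31 apart:

#include <stdbool.h>
#include <stdint.h>

bool
CommitOrderIdPrecedes(uint32_t a, uint32_t b)
{
    /* modulo-2^32 comparison: correct as long as the window of live
     * values stays below 2^31, i.e. "no xid active for longer than
     * 2B transactions" */
    return (int32_t) (a - b) < 0;
}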

> The number of times each cache line can be invalidated is
> bounded by 8.

Hm.. good point.

Regards

Markus Wanner




Re: [HACKERS] Change authentication error message (patch)

2013-06-18 Thread Markus Wanner
On 06/16/2013 06:02 PM, Joshua D. Drake wrote:
> Instead of pushing extra info to the logs I decided that we could
> without giving away extra details per policy. I wrote the error message
> in a way that tells the most obvious problems, without admitting to any
> of them. Please see attached:

+1 for solving this with a bit of word-smithing.

However, the proposed wording doesn't sound like a full sentence to my
ears, because a password or username cannot fail per-se.

How about:
"password authentication failed or account expired for user \"%s\""

It's a bit longer, but sounds more like a full sentence, no?

Regards

Markus Wanner




Re: [HACKERS] Change authentication error message (patch)

2013-06-19 Thread Markus Wanner
This probably is nit-picking, but it interests me in terms of how the
language is used and understood.

On 06/19/2013 08:55 PM, Joshua D. Drake wrote:
> I believe it actually can. The error message that is returned for a bad
> password, bad user or expired password is all the same. Which is why I
> put the username in there.

Sure, the authentication can fail for all these reasons. What I stumbled
over was the formulation of a "failed username". If an engine fails, it
might literally fall apart. The username itself - even if it doesn't
pass authentication - is not falling apart in the same sense. But does
the username (or the password) fail if authentication with it (in
combination with password and account expiration time) is not possible?
After all, it might still be a valid and complete username for another
cluster or another service.

You can probably say: "that username failed" when you actually mean it
"failed to authenticate together with the provided password". Or how do
English native speakers perceive this?

> "Authentication failed or password has expired for user \"%s\""
> 
> Authentication failed covers any combination of a username/password
> being wrong and obviously password expired covers the other.

Works for me. Considering the password to be the thing that expires
(rather than the account) is probably more accurate as well.

Regards

Markus Wanner




Re: [HACKERS] Change authentication error message (patch)

2013-06-19 Thread Markus Wanner
On 06/20/2013 12:51 AM, Jeff Janes wrote:
> I think we need to keep the first "password".  "Password authentication"
> is a single thing, it is the authentication method attempted.  It is the
> password method (which includes MD5) which failed, as opposed to the
> LDAP method or the Peer method or one of the other methods.

That's against the rule of not revealing any more knowledge than a
potential attacker already has, no? For that reason, I'd rather go with
just "authentication failed".

> Without this level of explicitness, it might be hard to figure out which
> row in pg_hba.conf was the one that PostgreSQL glommed onto to use for
> authentication.

As argued before, that should go into the logs for diagnosis by the
sysadmin, but should not be revealed to an attacker.

Regards

Markus Wanner




Re: [HACKERS] Change authentication error message (patch)

2013-06-20 Thread Markus Wanner
On 06/20/2013 12:27 PM, Marko Tiikkaja wrote:
> My understanding is that the attacker would already have that
> information since the server would have sent an
> AuthenticationMD5Password message to get to the error in the first
> place.  And we still reveal the authentication method to the frontend in
> all other cases ("peer authentication failed", for example).

Oh, right, I wasn't aware of that. Never mind, then.

+1 for keeping the explicit mention of "password authentication".

However, thinking about this a bit more: Other authentication methods
may also provide password (or even account) expiration times. And may
fail to authenticate a user for entirely different reasons.

Given that, I wonder if "password expired" is such a special case worth
mentioning in case of the "password auth" method. If we go down that
path, don't we also have to include "auth server unreachable" as a
possible cause for authentication failure for methods that use an
external server?

Regards

Markus Wanner




Re: [HACKERS] dynamic background workers

2013-06-20 Thread Markus Wanner
Robert,

On 06/14/2013 11:00 PM, Robert Haas wrote:
> Parallel query, or any subset of that project such as parallel sort,
> will require a way to start background workers on demand.

thanks for continuing this, very much appreciated. Postgres-R and thus
TransLattice have been using a similar approach for years now.

I've only had a quick glance over the patch so far. Some comments on the design:

> This requires some communication channel from ordinary workers to the
> postmaster, because it is the postmaster that must ultimately start
> the newly-registered workers.  However, that communication channel has
> to be designed pretty carefully, lest a shared memory corruption take
> out the postmaster and lead to inadvertent failure to restart after a
> crash.  Here's how I implemented that: there's an array in shared
> memory of a size equal to max_worker_processes.  This array is
> separate from the backend-private list of workers maintained by the
> postmaster, but the two are kept in sync.  When a new background
> worker registration is added to the shared data structure, the backend
> adding it uses the existing pmsignal mechanism to kick the postmaster,
> which then scans the array for new registrations.

That sounds like a good simplification. Even if it's an O(n) operation,
the n in question here has relatively low practical limits. It's
unlikely to be much of a concern, I guess.
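
To paraphrase my understanding of that mechanism in code (purely
illustrative; names, sizes and the locking details do not match the
actual patch):

#include <stdbool.h>
#include <string.h>

#define MAX_WORKER_PROCESSES 8

typedef struct BgWorkerSlot
{
    bool in_use;                  /* scanned by the postmaster */
    char worker_name[64];
    /* ... entry point, restart policy, start time, etc. ... */
} BgWorkerSlot;

/* stand-in for the shared memory array sized by max_worker_processes */
static BgWorkerSlot worker_slots[MAX_WORKER_PROCESSES];

bool
register_dynamic_worker(const char *name)
{
    /* a real implementation would hold an lwlock over this loop */
    for (int i = 0; i < MAX_WORKER_PROCESSES; i++)
    {
        if (!worker_slots[i].in_use)
        {
            strncpy(worker_slots[i].worker_name, name,
                    sizeof(worker_slots[i].worker_name) - 1);
            worker_slots[i].in_use = true;    /* payload first, flag last */

            /* then kick the postmaster (pmsignal) to scan the array */
            return true;
        }
    }
    return false;                 /* no free slot left */
}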

Back then, I solved it by having a "fork request slot". After starting,
the bgworker then had to clear that slot and register with a coordinator
process (i.e. the original requestor), so that one learned its fork
request was successful. At some point I expanded that to multiple
request slots to better handle multiple concurrent fork requests.
However, it was difficult to get right and requires more IPC than your
approach.

On the pro side: The shared memory area used by the postmaster was very
small in size and read-only to the postmaster. These were my main goals,
which I'm not sure were the best ones, now that I read your concept.

> I have attempted to
> make the code that transfers the shared_memory state into the
> postmaster's private state as paranoid as humanly possible.  The
> precautions taken are documented in the comments.  Conversely, when a
> background worker flagged as BGW_NEVER_RESTART is considered for
> restart (and we decide against it), the corresponding slot in the
> shared memory array is marked as no longer in use, allowing it to be
> reused for a new registration.

Sounds like the postmaster is writing to shared memory. Not sure why
I've been trying so hard to avoid that, though. After all, it can hardly
hurt itself *writing* to shared memory.

> Since the postmaster cannot take locks, synchronization between the
> postmaster and other backends using the shared memory segment has to
> be lockless.  This mechanism is also documented in the comments.  An
> lwlock is used to prevent two backends that are both registering a new
> worker at about the same time from stomping on each other, but the
> postmaster need not care about that lwlock.
> 
> This patch also extends worker_spi as a demonstration of the new
> interface.  With this patch, you can CREATE EXTENSION worker_spi and
> then call worker_spi_launch(int4) to launch a new background worker,
> or combine it with generate_series() to launch a bunch at once.  Then
> you can kill them off with pg_terminate_backend() and start some new
> ones.  That, in my humble opinion, is pretty cool.

It definitely is. Thanks again.

Regards

Markus Wanner




Re: [HACKERS] dynamic background workers

2013-06-20 Thread Markus Wanner
On 06/20/2013 04:41 PM, Robert Haas wrote:
> The constant factor is also very small.  Generally, I would expect
> num_worker_processes <~ # CPUs

That assumption might hold for parallel querying, yes. In case of
Postgres-R, it doesn't. In the worst case, i.e. with a 100% write load,
a cluster of n nodes, each with m backends performing transactions, all
of them replicated to all other (n-1) nodes, you end up with ((n-1) * m)
bgworkers. Which is pretty likely to be way above the # CPUs on any
single node.

I can imagine other extensions or integral features like autonomous
transactions that might possibly want many more bgworkers as well.

> and scanning a 32, 64, or even 128
> element array is not a terribly time-consuming operation.

I'd extend that to say scanning an array with a few thousand elements is
not terribly time-consuming, either. IMO the simplicity is worth it,
ATM. It's all relative to your definition of ... eh ... "terribly".

.oO( ... premature optimization ... all evil ... )

> We might
> need to re-think this when systems with 4096 processors become
> commonplace, but considering how many other things would also need to
> be fixed to work well in that universe, I'm not too concerned about it
> just yet.

Agreed.

> One thing I think we probably want to explore in the future, for both
> worker backends and regular backends, is pre-forking.  We could avoid
> some of the latency associated with starting up a new backend or
> opening a new connection in that way.  However, there are quite a few
> details to be thought through there, so I'm not eager to pursue that
> just yet.  Once we have enough infrastructure to implement meaningful
> parallelism, we can benchmark it and find out where the bottlenecks
> are, and which solutions actually help most.

Do you mean pre-forking and connecting to a specific database? Or really
just the forking?

> I do think we need a mechanism to allow the backend that requested the
> bgworker to know whether or not the bgworker got started, and whether
> it unexpectedly died.  Once we get to the point of calling user code
> within the bgworker process, it can use any number of existing
> mechanisms to make sure that it won't die without notifying the
> backend that started it (short of a PANIC, in which case it won't
> matter anyway).  But we need a way to report failures that happen
> before that point.  I have some ideas about that, but decided to leave
> them for future passes.  The remit of this patch is just to make it
> possible to dynamically register bgworkers.  Allowing a bgworker to be
> "tied" to the session that requested it via some sort of feedback loop
> is a separate project - which I intend to tackle before CF2, assuming
> this gets committed (and so far nobody is objecting to that).

Okay, sounds good. Given my background, I considered that a solved
problem. Thanks for pointing it out.

>> Sounds like the postmaster is writing to shared memory. Not sure why
>> I've been trying so hard to avoid that, though. After all, it can hardly
>> hurt itself *writing* to shared memory.
> 
> I think there's ample room for paranoia about postmaster interaction
> with shared memory, but all it's doing is setting a flag, which is no
> different from what CheckPostmasterSignal() already does.

Sounds good to me.

Regards

Markus Wanner




Re: [HACKERS] Hash partitioning.

2013-06-26 Thread Markus Wanner
On 06/25/2013 11:52 PM, Kevin Grittner wrote:
> At least until we have parallel
> query execution.  At *that* point this all changes.

Can you elaborate on that, please? I currently have a hard time
imagining how partitions can help performance in that case, either. At
least compared to modern RAID and read-ahead capabilities.

After all, RAID can be thought of as hash partitioning with a very weird
hash function. Or maybe rather range partitioning on an internal key.

Put another way: ideally, the system should take care of optimally
distributing data across its physical storage itself. If you need to do
partitioning manually for performance reasons, that's actually a
deficiency of it, not a feature.

I certainly agree that manageability may be a perfectly valid reason to
partition your data. Maybe there even exist other good reasons. I don't
think performance optimization is one. (It's more like giving the system
a hint. And we all dislike hints, don't we? *ducks*)

Regards

Markus Wanner




Re: [HACKERS] Hash partitioning.

2013-06-26 Thread Markus Wanner
On 06/26/2013 04:10 PM, Yuri Levinsky wrote:
> You typically don't want to use a b-tree index when you select
> more than ~1-2% of your data.

Agreed. Indices on columns with very low selectivity don't perform well.
(Postgres knows that and uses a sequential scan based on selectivity
estimates. Being able to eliminate entire partitions from such a seq
scan would certainly be beneficial, yes.)

In the Postgres world, though, I think CLUSTERing might be the better
approach to solve that problem. (Note: this has nothing to do with
distributed systems in this case.) I'm not sure what the current status
of auto clustering or optimized scans on such a permanently clustered
table is, though.

The minmax indices proposed for 9.4 might be another feature worth
looking at.

Both of these approaches may eventually provide a more general and more
automatic way to speed up scans on large portions of a relation, IMO.

Regards

Markus Wanner




Re: [HACKERS] Hash partitioning.

2013-06-26 Thread Markus Wanner
On 06/26/2013 04:01 PM, k...@rice.edu wrote:
> I think he is referring to the fact that with parallel query execution,
> multiple partitions can be processed simultaneously instead of serially
> as they are now with the resulting speed increase.

Processing simultaneously is the purpose of parallel query execution,
yes. But I see no reason for that not to work equally well for
unpartitioned tables.

Disk I/O is already pretty well optimized and parallelized, I think.
Trying to parallelize a seq scan on the Postgres side is likely to yield
far inferior performance.

Regards

Markus Wanner




Re: [HACKERS] Kudos for Reviewers -- straw poll

2013-06-26 Thread Markus Wanner
On 06/25/2013 08:26 PM, Andres Freund wrote:
> It's not about the reviewers being less. It's a comparison of
> effort. The effort for a casual review simply isn't comparable with the
> effort spent on developing a nontrivial patch.

Remember: "Debugging is twice as hard as writing the code in the first
place. ..." (Brian Kernighan)

IMO, the kind of reviews we need are of almost "debug level" difficulty.
(To the point where the reviewer becomes a co-author or even takes over
and submits a completely revamped patch instead.)

I agree that the casual review is several levels below that, so your
point holds. I doubt we need more reviews of that kind, though.

Thus, I'm in the AAB camp. The remaining question being: What's the
criterion for becoming a co-author (and thus getting mentioned in the
release notes)?

If at all, we should honor quality work with a "prize". Maybe a prize
for the best reviewer per CF? Maybe even based on votes from the general
public on who's been the best, so reviews gain attention that way...
"Click here to vote for my review." ... Maybe a crazy idea.

Regards

Markus Wanner




Re: [HACKERS] Hash partitioning.

2013-06-26 Thread Markus Wanner
On 06/26/2013 05:46 PM, Heikki Linnakangas wrote:
> We could also allow a large query to search a single table in parallel.
> A seqscan would be easy to divide into N equally-sized parts that can be
> scanned in parallel. It's more difficult for index scans, but even then
> it might be possible at least in some limited cases.

So far reading sequentially is still faster than hopping between
different locations. Purely from the I/O perspective, that is.

For queries where the single CPU core turns into a bottle-neck and which
we want to parallelize, we should ideally still do a normal, fully
sequential scan and only fan out after the scan and distribute the
incoming pages (or even tuples) to the multiple cores to process.

Regards

Markus Wanner




Re: [HACKERS] Hash partitioning.

2013-06-27 Thread Markus Wanner
On 06/27/2013 11:12 AM, Nicolas Barbier wrote:
> Imagine that there are a lot of indexes, e.g., 50. Although a lookup
> (walking one index) is equally fast, an insertion must update all 50
> indexes. When each index requires one extra I/O (because each index is
> one level taller), that is 50 extra I/Os. In the partitioned case,
> each index would require the normal smaller amount of I/Os. Choosing
> which partition to use must only be done once: The result “counts” for
> all indexes that are to be updated.

I think you're underestimating the cost of partitioning. After all, the
lookup of which index to update for a given partition is a lookup in
pg_index via pg_index_indrelid_index - a btree index.

Additionally, the depth of an index doesn't directly translate to the
number of I/O writes per insert (or delete). I'd rather expect the avg.
number of I/O writes per insert into a b-tree to be reasonably close to
one - depending mostly on the number of keys per page, not depth.

> Additionally: Imagine that the data can be partitioned along some
> column that makes sense for performance reasons (e.g., some “date”
> where most accesses are concentrated on rows with more recent dates).
> The other indexes will probably not have such a performance
> distribution. Using those other indexes (both for look-ups and
> updates) in the non-partitioned case, will therefore pull a huge
> portion of each index into cache (because of the “random distribution”
> of the non-date data). In the partitioned case, more cache can be
> spent on the indexes that correspond to the “hot partitions.”

That's a valid point, yes. I'd call this index partitioning. And with
partial indices, Postgres already has something that gets pretty close,
I think. Though, I don't consider this to be related to how the tuples
of the relation are laid out on disk.

Regards

Markus Wanner




Re: [HACKERS] Hash partitioning.

2013-06-27 Thread Markus Wanner
On 06/27/2013 06:35 PM, Nicolas Barbier wrote:
> I am assuming that this (comparatively very small and super-hot) index
> is cached all the time, while for the other indexes (that are
> supposedly super-huge) only the top part stays cached.
> 
> I am mostly just trying to find out where Yuri’s “partitioning is
> needed for super-huge tables” experience might come from, and noting
> that Heikki’s argument might not be 100% valid.

I think the OP made that clear by stating that his index has relatively
low selectivity. That seems to be a case that Postgres doesn't handle
very well.

> I think that the
> “PostgreSQL-answer” to this problem is to somehow cluster the data on
> the “hotness column” (so that all hot rows are close to one another,
> thereby improving the efficiency of the caching of relation blocks) +
> partial indexes for the hot rows (as first mentioned by Claudio; to
> improve the efficiency of the caching of index blocks).

Agreed, sounds like a sane strategy.

> My reasoning was: To determine which index block to update (typically
> one in both the partitioned and non-partitioned cases), one needs to
> walk the index first, which supposedly causes one additional (read)
> I/O in the non-partitioned case on average, because there is one extra
> level and the lower part of the index is not cached (because of the
> size of the index). I think that pokes a hole in Heikki’s argument of
> “it really doesn’t matter, partitioning == using one big table with
> big non-partial indexes.”

Heikki's argument holds for the general case, where you cannot assume a
well defined hot partition. In that case, the lowest levels of all the
b-trees of the partitions don't fit in the cache, either. A single index
performs better in that case, because it has lower overhead.

I take your point that in case you *can* define a hot partition and
apply partitioning, the hot(test) index(es) are more likely to be cached
and thus require less disk I/O.

Regards

Markus Wanner




Re: [HACKERS] Hash partitioning.

2013-06-27 Thread Markus Wanner
On 06/27/2013 11:13 PM, Jeff Janes wrote:
> Wouldn't any IO system being used on a high-end system be fairly good
> about making this work through interleaved read-ahead algorithms?

To some extent, certainly. It cannot possibly get better than a fully
sequential load, though.

> That sounds like it would be much more susceptible to lock contention,
> and harder to get bug-free, than dividing into bigger chunks, like whole
> 1 gig segments.  

Maybe, yes. Splitting a known amount of work into equal pieces sounds
like a pretty easy parallelization strategy. In case you don't know the
total amount of work or the size of each piece in advance, it gets a bit
harder. Choosing chunks that turn out to be too big certainly hurts.
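
For the known-amount-of-work case, the split really is trivial; a toy
illustration (not PostgreSQL code):

#include <stdio.h>

/* hand each of nworkers a contiguous range of the relation's blocks */
void
assign_block_ranges(unsigned long nblocks, unsigned nworkers)
{
    for (unsigned w = 0; w < nworkers; w++)
    {
        unsigned long start = nblocks * w / nworkers;
        unsigned long end   = nblocks * (w + 1) / nworkers;

        printf("worker %u scans blocks [%lu, %lu)\n", w, start, end);
    }
}

int main(void)
{
    assign_block_ranges(131072, 4);   /* a 1GB relation in 8kB blocks */
    return 0;
}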

Regards

Markus Wanner




[HACKERS] Review: extension template

2013-07-05 Thread Markus Wanner
e place. However,
that's reasonably far down the wish-list.

[ Also note the nifty difference between "extension already available"
vs "extension already exists". The former seems to mean the "template"
exists, while the latter refers to the "instantiation". ]

However, the other way around cannot be prevented by Postgres: Files
representing an extension (template, if you want) can always be created
alongside a pre-existing system catalog installation. Postgres has no
way to prevent that (other than ignoring the files, maybe, but...). It
seems the file system variant takes precedence over anything
pre-existing in the system catalogs. This should be documented.

The documentation for ALTER TEMPLATE FOR EXTENSION says: "Currently, the
only supported functionality is to change the template's default
version." I don't understand what that's supposed to mean. You can
perfectly well rename and alter the template's code.

I didn't look too deep into the code, but it seems Jaime and Hitoshi
raised some valid points.

Assorted very minor nit-picks:

In catalog/objectaddress.c, get_object_address_unqualified(): the case
OBJECT_TABLESPACE is being moved down. That looks like an
unintended change. Please revert.

src/backend/commands/event_trigger.c, definition of
event_trigger_support: several unnecessary whitespaces added. These make
it hard(er) than necessary to review the patch.

Regards

Markus Wanner


[1]: CF: Extension Templates
https://commitfest.postgresql.org/action/patch_view?id=1032

[2]: Dimitri's github branch 'tmpl4':
https://github.com/dimitri/postgres/tree/tmpl4




[HACKERS] Proposal: template-ify (binary) extensions

2013-07-05 Thread Markus Wanner
Hi,

let me elaborate on an idea I had to streamline extension templates. As
I wrote in my recent review [1], I didn't have the mental model of a
template in mind for extensions, so far.

However, writing the review, I came to think of it. I certainly agree
that extensions which do not carry a binary executable can be considered
templates. The SQL scripts can be deleted after creation of the
extension in the database, because the objects created with them are an
instantiation in the database itself. They do not depend on the template.

That's not the case for compiled, executable code that usually lives in
a shared library on the filesystem. There, the "instantiation" of the
"template" merely consists of a link to the library on the file-system.
If you remove that file, you break the extension.

One way to resolve that - and that seems to be the direction Dimitri's
patch is going - is to make the extension depend on its template. (And
thereby breaking the mental model of a template, IMO. In the spirit of
that mental model, one could argue that code for stored procedures
shouldn't be duplicated, but instead just reference its ancestor.)

The other possible resolution is to make all extensions fit the template
model, i.e. make extensions independent of their templates.

This obviously requires Postgres to internalize the compiled binary code
of the library upon instantiation. (Storing the binary blobs by hash in
a shared catalog could easily prevent unwanted duplication between
databases.)

Advantages:
 - Consistent mental model: template
 - Closes the gap between how SQL scripts and binary parts of
   an extension are handled. (Or between binary and non-binary
   extensions.)
 - Compatibility and co-existence between file-system and
   system catalog based extension templates is simplified.
 - Independence of extensions from library files shipped by
   3rd party extensions (those would only ship a template, not
   the extension per se).
 - Paves the way to uploading extensions that carry a binary,
   executable payload via libpq.

Challenges:
 - CPU Architecture must be taken into account for backups. Restoring
   a backup on a different architecture would still require the
   (external) binary code for that specific architecture, because we
   cannot possibly store binaries for all architectures.
 - A bit of a mind shift for how binary extensions work.


Regards

Markus Wanner





Re: [HACKERS] Review: extension template

2013-07-05 Thread Markus Wanner
On 07/05/2013 09:05 PM, Markus Wanner wrote:
> The patch has already
> been marked as 'returned with feedback', and I can support that
> resolution (for this CF).

Oops.. I just realized it's only set to "waiting on author" now. I guess
I confused the two states. Please excuse my glitch.

Dimitri, do you agree to resolve with "returned with feedback" for this
CF? Or how would you like to proceed?

Josh, thanks for linking the review in the CF.

Regards

Markus Wanner




Re: [HACKERS] Review: extension template

2013-07-07 Thread Markus Wanner
elling point for this feature?
> 
> The main issue to fix when you want to have that feature, which I want
> to have, is how to define a sane pg_dump policy around the thing. As we
> couldn't define that in the last rounds of reviews, we decided not to
> allow the case yet.

I bet that's because people have different mental models in mind. But I
probably stressed that point enough by now...

> I think that's a fair remark that we want to get there eventually, and
> that like the superuser only limitation, that's something for a future
> patch and not this one.

I agree to defer a specific implementation. I disagree in that we should
*not* commit something that follows a mixture of mental models
underneath and sets in stone a strange API we later need to be
compatible with.

Specifically, I request to either follow the template model more closely
(accompanied by a separate patch to adjust binary, out-of-line
templates) or follow the link model more closely. The current naming
doesn't match any of the two, so renaming seems inevitable.

> Yeah, we might want to find some better naming for the on-file extension
> "templates". We can't just call them extension templates though because
> that is the new beast implemented in that patch and it needs to be part
> of your pg_dump scripts, while the main feature of the file based kind
> of templates is that they are *NOT* part of your dump.

That's one issue where the mixture of concepts shines through, yeah. I
fear simply (re)naming things is not quite enough, at this point.
Especially because the thing on the file-system is closer to being a
template than the system catalog based variant.

[ Maybe we could even go with "template extensions" being the
file-system ones vs. "inline extensions" being the system catalog
ones...  In that case, they would follow different mental models. Which
might even be a reasonable solution. However, we need to be aware of
that difference and document it properly, so that users have a chance to
grasp the nifty difference. I'd rather prefer to eliminate such nifty
differences, though. ]

> It's just something I forgot to cleanup when completing the feature set.
> Cleaned up in my git branch.

Great!

>> src/backend/commands/event_trigger.c, definition of
>> event_trigger_support: several unnecessary whitespaces added. These make
>> it hard(er) than necessary to review the patch.
> 
> Here with the following command I see no problem, please advise:
> 
> git diff --check --color-words postgres/master.. -- 
> src/backend/commands/event_trigger.c

That's why I personally don't trust git. Please look at the
patch you submitted.

> Markus Wanner  writes:
>> Dimitri, do you agree to resolve with "returned with feedback" for this
>> CF? Or how would you like to proceed?
> 
> I'd like to proceed, it's the 3rd development cycle that I'm working on
> this idea

Okay, I'm certainly not opposed to that and welcome further development.
I didn't change the status, so it should still be "waiting on author",
which seems reasonable to me, ATM.

> (used to be called "Inline Extensions", I also had a selective
> pg_dump patch to allow dumping some extension scripts, etc). I really
> wish we would be able to sort it out completely in this very CF and
> that's why I'm not proposing any other patch this time.

I understand it's tedious work, sometimes. Please look at this as
incremental progress. I would not have been able to reason about mental
models without your patch. Thanks for that, it was a necessary and good
step from my point of view. Please keep up the good work!

Regards

Markus Wanner


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: extension template

2013-07-07 Thread Markus Wanner
On 07/06/2013 10:30 PM, Dimitri Fontaine wrote:
> I still think that we shouldn't allow creating a template for an
> extension that is already installed, though. Do you have any suggestions
> for a better error message?

Oh, I just realize that pg_extension_{template,control,uptmpl} are not
SHARED catalogs, but you need to install the template per-database and
then need to enable it - per-database *again*. Why is that?

If you want to just upload extensions to a database via libpq, that
should be a single step (maybe rather just CREATE EXTENSION ... AS ...)
If you want "templates", those should be applicable to all databases, no?

Regards

Markus Wanner




Re: [HACKERS] Review: extension template

2013-07-07 Thread Markus Wanner
On 07/07/2013 02:55 PM, Markus Wanner wrote:
> If you want to just upload extensions to a database via libpq..

Let's rephrase this in a (hopefully) more constructive way: I get the
impression you are trying to satisfy many different needs. Way more than
you need to scratch your own itch. To the point that I had trouble
figuring out what exactly the goal of your patch is.

My advice would be: Be brave! Dare to put down any request for
"templates" (including mine) and go for the simplest possible
implementation that allows you to create extensions via libpq. (Provided
that really is the itch you want to scratch. I'm still not quite sure I
got that right.)

As it stands, I get the impression the patch is trying to sell me a
feature that it doesn't really provide. If you stripped the template
stuff, including the CREATE TEMPLATE FOR EXTENSION command, but just
sold me this patch as a simple way to create an extension via libpq,
i.e. something closer to CREATE EXTENSION AS .. , I would immediately
buy that. Currently, while allowing an upload, it seems far from simple,
but adds quite a bit of unwanted complexity. If all I want is to upload
code for an extension via libpq, I don't want to deal with nifty
distinctions between templates and extensions.

Just my opinion, though. Maybe I'm still missing something.

Regards

Markus Wanner




Re: [HACKERS] Review: extension template

2013-07-07 Thread Markus Wanner
Hello Dimitri,

On 07/07/2013 09:51 PM, Dimitri Fontaine wrote:
> Markus Wanner  writes:
>> Oh, I just realize that pg_extension_{template,control,uptmpl} are not
>> SHARED catalogs, but you need to install the template per-database and
>> then need to enable it - per-database *again*. Why is that?
> 
> Because the current model is not serving us well enough, with a single
> module version per major version of PostgreSQL. Meaning for all the
> clusters on the server, and all the databases in them.

That's not an excuse for letting the user duplicate the effort of
installing templates for every version of his extension in every database.

> We want to be able to have postgis 1.5 and 2.x installed in two
> different databases in the same cluster, don't we?

The extension, yes. The template versions, no. There's utterly no point
in having different 2.0 versions of the same extension in different
databases.

> Well the current patch we still can't because of the dynamically shared
> object side of things, but that's not a good reason to impose the same
> limitation on to the "template" idea.

Without a dynamically shared object, you can well have different
versions of an extension at work in different databases already.

> After playing around with several ideas around that in the past two
> development cycles, the community consensus clearly is that "extensions"
> are *NEVER* going to be part of your dump scripts.

Sounds strange to me. If you want Postgres to manage extensions, it
needs the ability to backup and restore them. Otherwise, it doesn't
really ... well ... manage them.

> Now when using templates you have no other source to install the
> extensions from at pg_restore time, given what I just said.
> 
> The whole goal of the "template" idea is to offer a way to dump and
> restore the data you need for CREATE EXTENSION to just work at restore
> time, even when you sent the data over the wire.

Which in turn violates the above cited community consensus, then. You
lost me here. What's your point? I thought the goal of your patch was
the ability to upload an extension via libpq.

How does that address my concerns about usability and understandability
of how these things work? Why the strange dependencies between templates
and extensions? Or the ability to rename the template, but not the
extension - while still having the later depend on the former?

These things are what I'm opposing to. And I don't believe it
necessarily needs to be exactly that way for dump and restore to work.
Quite the opposite, in fact. Simpler design usually means simpler backup
and restore procedures.

> Current extension are managed on the file system, the contract is that
> it is the user's job to maintain and ship that, externally to PostgreSQL
> responsibilities. All that PostgreSQL knows about is to issue the CREATE
> EXTENSION command at pg_restore time.
> 
> With templates or in-line extensions, the contract is that the user asks
> PostgreSQL to manage its extensions in the same way it does for the
> other objects on the system.

Understood.

> The design we found to address that is
> called "Extension Templates" and is implemented in the current patch.

I placed my concerns with the proposed implementation. It's certainly
not the only way how Postgres can manage its extensions. And I still
hope we can come up with something that's simpler to use and easier to
understand.

However, I'm not a committer nor have I written code for this. I did my
review and proposed two possible (opposite) directions for clean up and
simplification of the design. I would now like others to chime in.

Regards

Markus Wanner




Re: [HACKERS] Review: extension template

2013-07-08 Thread Markus Wanner
On 06/10/2013 09:43 PM, Hannu Krosing wrote:
> On 07/08/2013 09:26 AM, Heikki Linnakangas wrote:
>> The concept was useful, but not something we want
>> to call an extension, because the distinguishing feature of an
>> extension is that it lives outside the database and is *not* included
>> in pg_dump.
> Either MODULE or PACKAGE would be good name candidates.
> 
> Still, getting this functionality in seems more important than exact
> naming, though naming them "right" would be nice.

Remember that we already have quite a lot of extensions. And PGXN. Are
we really so wedded to the idea of extensions "living" outside of the
database that we need to come up with something different and incompatible?

Or do you envision modules or packages to be compatible with extensions?
Just putting another label on it so we can still claim extensions are
strictly external to the database? Sorry, I don't get the idea, there.

From a user's perspective, I want extensions, modules, or packages to be
managed somehow. Including upgrades, migrations (i.e. dump & restore)
and removal. The approach of letting the distributors handle that
packaging clearly has its limitations. What's so terribly wrong with
Postgres itself providing better tools to manage those?

Inventing yet another type of extension, module or package (compatible
or not) doesn't help, but increases confusion even further. Or how do
you explain to an author of an existing extension, whether or not he
should convert his extension to a module (if you want those to be
incompatible)?

If it's the same thing, just with different loading mechanisms, please
keep calling it the same: an extension. (And maintain compatibility
between the different ways to load it.)

I fully agree with the fundamental direction of Dimitri's patch. I think
Postgres needs to better manage its extensions itself. Including dump
and restore cycles. However, I think the implementation isn't optimal,
yet. I pointed out a few usability issues and gave reasons why
"template" is a misnomer (with the proposed implementation). "Extension"
is not.

(I still think "template" would be a good mental model. See my other
thread...
http://archives.postgresql.org/message-id/51d72c1d.7010...@bluegap.ch)

Regards

Markus Wanner




Re: [HACKERS] Review: extension template

2013-07-08 Thread Markus Wanner
On 07/08/2013 10:20 AM, Dimitri Fontaine wrote:
> Bypassing the file system entirely in order to install an extension. As
> soon as I figure out how to, including C-coded extensions.

Do I understand correctly that you want to keep the extensions (or their
templates) out of the dump and require the user to "upload" it via libpq
prior to the restore, instead of him having to install them via .deb or .rpm?

This would explain why you keep the CREATE TEMPLATE FOR EXTENSION as a
separate step from CREATE EXTENSION. And why you, too, insist on wanting
templates, and not just a way to create an extension via libpq.

However, why don't you follow the template model more closely? Why
should the user be unable to create a template, if there already exists
an extension of the same name? That's an unneeded and disturbing
limitation, IMO.

My wish: Please drop the pg_depend link between template and extension
and make the templates shared across databases. So I also have to
install the template only once per cluster. Keep calling them templates,
then. (However, mind that file-system extension templates are templates
as well. In-line vs. out-of-line templates, if you want.)

I think you could then safely allow an upgrade of an extension that has
been created from an out-of-line template by an upgrade script that
lives in-line. And vice-versa. Just as an example. It all gets nicer and
cleaner, if the in-line thing better matches the out-of-line one, IMO.

An extension should look and behave exactly the same, independent of
what kind of template it has been created from. And as we obviously
cannot add a pg_depend link to a file on the file system, we better
don't do that for the in-line variant, either, to maintain the symmetry.

> The main feature of the extensions system is its ability to have a clean
> pg_restore process even when you use some extensions. That has been the
> only goal of the whole feature development.

Great! Very much appreciated.

> Let me stress that the most important value in that behavior is to be
> able to pg_restore using a newer version of the extension, the one that
> works with the target major version. When upgrading from 9.2 to 9.3 if
> you depend on keywords that now are reserved you need to install the
> newer version of the extension at pg_restore time.
> 
> The main features I'm interested into beside a clean pg_restore are
> UPDATE scripts for extensions and dependency management, even if that
> still needs improvements. Those improvements will be relevant for both
> ways to make extensions available for your system.

+1

> We can not use the name "module" for anything else, IMNSHO.

Agreed.

> The main goal here is not to have the extension live inside the database
> but rather to be able to bypass using the server's filesystem in order
> to be able to CREATE EXTENSION foo; and then to still have pg_restore do
> the right thing on its own.

Note that with the current, out-of-line approach, the *extension*
already lives inside the database. It's just the *template*, that
doesn't. (Modulo DSO, but the patch doesn't handle those either, yet. So
we're still kind of excluding those.)

Allowing for templates to live inside the database as well is a good
thing, IMO.

> If you want to scratch the new catalogs part, then just say that it's
> expected to be really complex to pg_restore a database using extensions,
> back to exactly how it was before 9.1: create the new database, create
> the extensions your dump depends on in that new database, now pg_restore
> your backup manually filtering away the extensions' objects or ignoring
> the errors when pg_restore tries to duplicate functions you already
> installed in the previous step. No fun.

Definitely not. Nobody wants to go back there. (And as Heikki pointed
out, if you absolutely want to, you can even punish yourself that way.)

Regards

Markus Wanner




Re: [HACKERS] Review: extension template

2013-07-08 Thread Markus Wanner
Hello Dimitri,

On 07/08/2013 11:49 AM, Dimitri Fontaine wrote:
> Please find attached to this mail version 9 of the Extension Templates
> patch with fixes for the review up to now.

Thanks, cool.

> Markus Wanner  writes:
>>> I still think that we shouldn't allow creating a template for an
>>> extension that is already installed, though. Do you have any suggestions
>>> for a better error message?
>>
>> If we go for the template model, I beg to differ. In that mind-set, you
>> should be free to create (or delete) any kind of template without
>> affecting pre-existing extensions.
> 
> Then what happens at pg_restore time? the CREATE EXTENSION in the backup
> script will suddenly install the other extension's that happen to have
> the same name? I think erroring out is the only safe way here.

Extensions are commonly identified by name (installed ones as well as
available ones, i.e. templates).

Thus I think if a user renames a template, he might have good reasons to
do so. He likely *wants* it to be a template for a different extension.
Likewise when (re-)creating a template with the very same name of a
pre-existing, installed extension.

Maybe the user just wanted to make a "backup" of the template prior to
modifying it. If he then gives the new template the same name as the old
one had, it very likely is similar, compatible or otherwise intended to
replace the former one.

The file-system templates work exactly that way (modulo DSOs). If you
create an extension, then modify (or rename and re-create under the same
name) the template on disk, then dump and restore, you end up with the
new version of it. That's how it worked so far. It's simple to
understand and use.

> It's all about pg_restore really, but it's easy to forget about that and
> get started into other views of the world. I'll try to be better at not
> following those tracks and just hammer my answers with "pg_restore" now.

That's unlikely to be of much help. It's not like pg_restore would stop
to work. It would just work differently. More like the file-system
templates. More like the users already knows and (likely) expects it.
And more like the "template" model that you advertise.

Or how do you think would pg_restore fail, if you followed the mental
model of the template?

>> In any case, I'm arguing this "template" renaming behavior (and the
>> subsequent error) are the wrong thing to do, anyways. Because an
>> extension being linked to a parent of a different name is weird and IMO
>> not an acceptable state.
> 
> Yes, you're right on spot here. So I've amended the patch to implement
> the following behavior (and have added a new regression test):
> 
>   # CREATE TEMPLATE FOR EXTENSION foo VERSION 'v' AS '';
>   # CREATE EXTENSION foo;
>   # ALTER TEMPLATE FOR EXTENSION foo RENAME TO bar;
>   ERROR:  55006: template for extension "foo" is in use
>   DETAIL:  extension "foo" already exists
>   LOCATION:  AlterExtensionTemplateRename, template.c:1040
>   STATEMENT:  ALTER TEMPLATE FOR EXTENSION foo RENAME TO bar;

Okay, good, this prevents the strange state.

However, this also means you restrict the user even further... How can
he save a copy of an existing template, before (re-)creating it with
CREATE TEMPLATE FOR EXTENSION?

On the file system, a simple cp or mv is sufficient before
(re)installing the package from your distribution, for example.

>> I bet that's because people have different mental models in mind. But I
>> probably stressed that point enough by now...
> 
> FWIW, I do agree.

Good. Why do you continue to propose the link model?

> But my understanding is that the community consensus
> is not going that way.

What way? And what community consensus?

Re-reading some of the past discussions, I didn't find anybody voting
for a dependency between the template and the created extension. And at
least Tom pretty clearly had the template model in mind, when he wrote
[1]: "We don't want it to look like manipulating a template has anything
to do with altering an extension of the same name (which might or might
not even be installed)." or [2]: "But conflating this functionality
[i.e. extension templates] with installed extensions is just going to
create headaches."

The closest I found was Robert Haas mentioning [3], that "[he doesn't]
see a problem having more than one kind of extensions". However, please
mind the context. He doesn't really sound enthusiastic, either.

I'm puzzled about some of your words in that thread. In the very message
Robert responded to, you wrote [4]: "Having more than one way to ship an
extension is good, having two different animals with two

Re: [HACKERS] Millisecond-precision connect_timeout for libpq

2013-07-09 Thread Markus Wanner
On 07/08/2013 08:31 PM, ivan babrou wrote:
> Seriously, I don't get why running 150 poolers is easier.

Did you consider running pgbouncer on the database servers?

Regards

Markus Wanner




Re: [HACKERS] Millisecond-precision connect_timeout for libpq

2013-07-09 Thread Markus Wanner
On 07/09/2013 09:15 AM, ivan babrou wrote:
> Database server lost network — boom, 2 seconds delay. What's the point then?

Oh, I see. Good point. It could still improve connection time during
normal operation, though.

None the less, I now agree with you: we recommend a pooler, which may be
capable of millisecond timeouts, but arguably is vastly more complex
than the proposed patch. And it even brings its own set of gotchas (lots
of connections). I guess I don't quite buy the complexity argument, yet.

Sure, gettimeofday() is subject to clock adjustments. But so is time().
And if you're setting timeouts that low, you probably know what you're
doing (or at least care about latency a lot). Or is gettimeofday() still
considerably slower on certain architectures or in certain scenarios?
Where's the complexity?

Regards

Markus Wanner


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Millisecond-precision connect_timeout for libpq

2013-07-09 Thread Markus Wanner
Ian,

On 07/05/2013 07:28 PM, ivan babrou wrote:
> - /*
> -  * Rounding could cause connection to fail; need at 
> least 2 secs
> -  */

You removed this above comment... please check why it's there. The
relevant revision seems to be:

###
commit 2908a838ac2cf8cdccaa115249f8399eef8a731e
Author: Tom Lane 
Date:   Thu Oct 24 23:35:55 2002 +

Code review for connection timeout patch.  Avoid unportable assumption
that tv_sec is signed; return a useful error message on timeout failure;
honor PGCONNECT_TIMEOUT environment variable in PQsetdbLogin; make code
obey documentation statement that timeout=0 means no timeout.
###

> - if (timeout < 2)
> - timeout = 2;
> - /* calculate the finish time based on start + timeout */
> - finish_time = time(NULL) + timeout;
> + gettimeofday(&finish_time, NULL);
> + finish_time.tv_usec += (int) timeout_usec;

I vaguely recall tv_usec only being required to hold values up to
1000000 by some standard. A signed 32 bit value would qualify, but only
hold up to a good half hour worth of microseconds. That doesn't quite
seem enough to calculate finish_time the way you are proposing to do it.

> + finish_time.tv_sec  += finish_time.tv_usec / 1000000;
> + finish_time.tv_usec  = finish_time.tv_usec % 1000000;
>   }
>   }
>  
> @@ -1073,15 +1074,15 @@ pqSocketPoll(int sock, int forRead, int forWrite, 
> time_t end_time)
>   input_fd.events |= POLLOUT;
>  
>   /* Compute appropriate timeout interval */
> - if (end_time == ((time_t) -1))
> + if (end_time == NULL)
>   timeout_ms = -1;
>   else
>   {
> - time_t  now = time(NULL);
> + struct timeval now;
> + gettimeofday(&now, NULL);
>  
> - if (end_time > now)
> - timeout_ms = (end_time - now) * 1000;
> - else
> + timeout_ms = (end_time->tv_sec - now.tv_sec) * 1000 + 
> (end_time->tv_usec - now.tv_usec) / 1000;

I think that's incorrect on a platform where tv_sec and/or tv_usec is
unsigned. (And the cited commit above indicates there are such platforms.)
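
Roughly, what I'd expect instead is something along these lines (untested
sketch, not taken from the patch; the helper names are mine):

    #include <sys/time.h>

    /* Add timeout_usec to "now", keeping tv_usec within [0, 1000000). */
    static void
    finish_time_from_now(struct timeval *finish, long timeout_usec)
    {
        gettimeofday(finish, NULL);
        finish->tv_sec  += timeout_usec / 1000000;
        finish->tv_usec += timeout_usec % 1000000;
        if (finish->tv_usec >= 1000000)
        {
            finish->tv_sec += 1;
            finish->tv_usec -= 1000000;
        }
    }

    /*
     * Remaining milliseconds until "finish". Compare before subtracting, so
     * nothing breaks if tv_sec or tv_usec happens to be an unsigned type.
     */
    static int
    remaining_ms(const struct timeval *finish)
    {
        struct timeval now;

        gettimeofday(&now, NULL);
        if (finish->tv_sec < now.tv_sec ||
            (finish->tv_sec == now.tv_sec && finish->tv_usec <= now.tv_usec))
            return 0;           /* already expired */

        return (int) ((long) (finish->tv_sec - now.tv_sec) * 1000 +
                      ((long) finish->tv_usec - (long) now.tv_usec) / 1000);
    }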


On 07/09/2013 02:25 PM, ivan babrou wrote:
> There's no complexity here :)

Not so fast, cowboy...  :-)

Regards

Markus Wanner


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: extension template

2013-07-09 Thread Markus Wanner
Dimitri,

leaving the template vs link model aside for a moment, here are some
other issues I run into. All under the assumption that we want the link
model.

On 07/08/2013 11:49 AM, Dimitri Fontaine wrote:
> Please find attached to this mail version 9 of the Extension Templates
> patch with fixes for the review up to now.

First of all, I figured that creation of a template of a newer version
is prohibited in case an extension exists:

> db1=# CREATE TEMPLATE FOR EXTENSION foo VERSION '0.0' AS $foo$ SELECT 1; 
> $foo$;
> CREATE TEMPLATE FOR EXTENSION
> db1=# CREATE EXTENSION foo;
> CREATE EXTENSION
> db1=# CREATE TEMPLATE FOR EXTENSION foo VERSION '0.1' AS $foo$ SELECT 2; 
> $foo$;
> ERROR:  extension "foo" already exists

Why is that?

I then came to think of the upgrade scripts... what do we link against
if an extension has been created from some full version and then one or
more upgrade scripts got applied?

Currently, creation of additional upgrade scripts is not blocked. Which
is a good thing, IMO. I don't like the block on newer full versions.
However, the upgrade doesn't seem to change the dependency, so you can
still delete the update script after the update. Consider this:

> db1=# CREATE TEMPLATE FOR EXTENSION foo VERSION '0.0' AS $$ $$;
> CREATE TEMPLATE FOR EXTENSION
> db1=# CREATE EXTENSION foo;
> CREATE EXTENSION
> db1=# CREATE TEMPLATE FOR EXTENSION foo FROM '0.0' TO '0.1' AS $$ $$;
> CREATE TEMPLATE FOR EXTENSION
> db1=# ALTER EXTENSION foo UPDATE TO '0.1';
> ALTER EXTENSION
> db1=# SELECT * FROM pg_extension;
>  extname | extowner | extnamespace | extrelocatable | extversion | extconfig 
> | extcondition 
> -+--+--+++---+--
>  plpgsql |   10 |   11 | f  | 1.0|   
> | 
>  foo |   10 | 2200 | f  | 0.1|   
> | 
> (2 rows)
> 
> db1=# DROP TEMPLATE FOR EXTENSION foo FROM '0.0' TO '0.1';
> DROP TEMPLATE FOR EXTENSION
>> db1=# SELECT * FROM pg_extension;
>>  extname | extowner | extnamespace | extrelocatable | extversion | extconfig 
>> | extcondition 
>> -+--+--+++---+--
>>  plpgsql |   10 |   11 | f  | 1.0|   
>> | 
>>  foo |   10 | 2200 | f  | 0.1|   
>> | 
>> (2 rows)
>> 

In this state, extension foo as of version '0.1' is installed, but
running this through dump & restore, you'll only get back '0.0'.

Interestingly, the following "works" (in the sense that the DROP of the
upgrade script is prohibited):

> db1=# CREATE TEMPLATE FOR EXTENSION foo VERSION '0.0' AS $$ $$;
> CREATE TEMPLATE FOR EXTENSION
> db1=# CREATE TEMPLATE FOR EXTENSION foo FROM '0.0' TO '0.1' AS $$ $$;
> CREATE TEMPLATE FOR EXTENSION
> db1=# CREATE EXTENSION foo VERSION '0.1';
> CREATE EXTENSION
> db1=# DROP TEMPLATE FOR EXTENSION foo FROM '0.0' TO '0.1';
> ERROR:  cannot drop update template for extension foo because other objects 
> depend on it
> DETAIL:  extension foo depends on control template for extension foo
> HINT:  Use DROP ... CASCADE to drop the dependent objects too.

However, in that case, you are free to drop the full version:

> db1=# DROP TEMPLATE FOR EXTENSION foo VERSION '0.0';
> DROP TEMPLATE FOR EXTENSION

This certainly creates a bad state that leads to an error, when run
through dump & restore.


Maybe this one...

> db1=# CREATE TEMPLATE FOR EXTENSION foo VERSION '0.0' AS $$ $$;
> CREATE TEMPLATE FOR EXTENSION
> db1=# CREATE TEMPLATE FOR EXTENSION foo FROM '0.0' TO '0.1' AS $$ $$;
> CREATE TEMPLATE FOR EXTENSION
> db1=# DROP TEMPLATE FOR EXTENSION foo VERSION '0.0';
> DROP TEMPLATE FOR EXTENSION

... should already err out here ...

> db1=# CREATE EXTENSION foo;
> ERROR:  Extension "foo" is not available from 
> "/tmp/pginst/usr/local/pgsql/share/extension" nor as a template
> db1=# CREATE EXTENSION foo VERSION '0.1';
> ERROR:  Extension "foo" is not available from 
> "/tmp/pginst/usr/local/pgsql/share/extension" nor as a template

... and not only here.

I.e. the TO version should probably have a dependency on the FROM
version (that might even be useful in the template model).


Another thing that surprised me is the inability to have an upgrade
script *and* a full version (for the same extension target version).
Even if that's intended behavior, the e

Re: [HACKERS] Review: extension template

2013-07-09 Thread Markus Wanner
Salut Dimitri,

On 07/09/2013 12:40 PM, Dimitri Fontaine wrote:
> Markus Wanner  writes:
>> Or how do you think would pg_restore fail, if you followed the mental
>> model of the template?
> 
>   # create template for extension foo version 'x' as '';
>   # create extension foo;
>   # alter template for extension foo rename to bar;
> 
>   $ pg_dump | psql
> 
> And now it's impossible to CREATE EXTENSION foo, because there's no
> source to install it from available. I think we should actively prevent
> that scenario to happen in the field (current patch prevents it).

I see. You value that property a lot.

However, I don't think the plain ability to create an extension is quite
enough to ensure a consistent restore. You'd also have to ensure that
you recreate the very *same* contents of the extension that you dumped.

Your patch doesn't do that. It seems to stop enforcing consistency at
some arbitrary point in between. For example, you can ALTER a function
that's part of the extension. Or ALTER TEMPLATE FOR EXTENSION while an
extension of that version is installed.

> Usually what you do when you want to change an extension is that you
> provide an upgrade script then run it with ALTER EXTENSION UPDATE.

As a developer, I often happen to work on one and the same version, but
test multiple modifications. This ability to change an extension
"on-the-fly" seems pretty valuable to me.

> The current file system based extensions allow you to maintain
> separately the files foo--1.0.sql and foo--1.2.sql, and you don't need
> to copy a current version of the whole extension away before hacking the
> new version.
>
> The Extension Template facility in the current patch allows you to do
> much the same

Sure. I'm aware of that ability and appreciate it.

> I see that you've spent extra time and effort to better understand any
> community consensus that might exist around this patch series, and I
> want to say thank you for that!

Thank you for your patience in pursuing this and improving user
experience with extensions. I really appreciate this work.

> Basically what I'm saying in this too long email is that I need other
> contributors to voice-in.

I fully agree. I don't want to hold this patch back any further, if it's
just me.

>> I really recommend to rename the feature (and the
>> commands), in that case, though. We may rather call the existing
>> file-system thingie an "extension template", instead, as it becomes a
>> good differentiator to what you're proposing.
> 
> Any proposals?

"Inline extension" is a bit contradictory. Maybe "managed extension"
describes best what you're aiming for.

In a similar vein, "out-of-line extension" sounds redundant to me, so
I'd rather characterize the file-system thingie as "extension templates"
wherever a clear distinction between the two is needed.

>> How about ALTER EXTENSION ADD (or DROP)? With the link on the
>> "template", you'd have to prohibit that ALTER as well, based on the
>> exact same grounds as the RENAME: The installed extension would
>> otherwise differ from the "template" it is linked to.
> 
> You're "supposed" to be using that from within the template scripts
> themselves. The main use case is the upgrade scripts from "unpackaged".

Agreed. The documentation says it's "mainly useful in extension update
scripts".

> I could see foreclosing that danger by enforcing that creating_extension
> is true in those commands.

For managed extensions only, right? It's not been prohibited for
extensions so far.

>> See how this creates an animal pretty different from the current
>> extensions? And IMO something that's needlessly restricted in many ways.
> 
> Well really I'm not convinced. If you use ALTER EXTENSION ADD against an
> extension that you did install from the file system, then you don't know
> what will happen after a dump and restore cycle, because you didn't
> alter the files to match what you did, presumably.

Yeah. The user learned to work according to the template model. Maybe
that was not the best model to start with. And I certainly understand
your desire to ensure a consistent dump & restore cycle. However...

> If you do the same thing against an extension that you did install from
> a catalog template, you just managed to open yourself to the same
> hazards… 

... I think the current patch basically just moves the potential
hazards. Maybe these moved hazards are less dramatic and justify the
move. Let's recap:

In either case (template model & link model) the patch:

 a) guarantees to restore the templa

Re: [HACKERS] Review: extension template

2013-07-10 Thread Markus Wanner
Peter,

On 07/09/2013 11:04 PM, Peter Eisentraut wrote:
> I think there is an intrinsic conflict here.  You have things inside the
> database and outside.  When they depend on each other, it gets tricky.
> Extensions were invented to cope with that.  They do the job, more or
> less.

I agree. And to extend upon that, I think it's important to distinguish
between the created extension and the available one, i.e. the template.
Only the template lives outside. The created extension itself is firmly
sitting in the database, possibly with multiple dependencies from other
objects. It does not depend on anything outside of the database
(assuming the absence of a DSO of the extension, which does not follow
that template concept).

And yes, we decided the objects that are part of the extension should
not get dumped with pg_dump. Nobody argues to change that. Note,
however, that this very decision is what raises the "intrinsic conflict"
for pg_restore, because CREATE EXTENSION in the dump depends on the
outside extension. If anything, Dimitri's patch solves that.

> Now you want to take the same mechanism and apply it entirely
> inside the database.  But that wasn't the point of extensions!  That's
> how you get definitional issues like, should extensions be dumped or not.

IMO the point of extensions is to extend Postgres (with code that's not
part of core). Whether their templates (SQL sources, if you want) are
stored on the file system (outside) or within the database is irrelevant
to the concept.

Think of it that way: Take one of those FUSE-Postgres-FS things [1],
which uses Postgres as the backend for a file system and allows you to
store arbitrary files. Mount that to the extensions directory of your
Postgres instance and make your extension templates available there
(i.e. copy them there). CREATE EXTENSION would just work, reading the
templates for the extension to create from itself, via that fuse
wrapper. (If the FUSE wrapper itself was using an extension, you'd get
into an interesting chicken and egg problem, but even that would be
resolvable, because the installed extension doesn't depend on the
template it was created from.)

Think of Dimitri's patch as a simpler and more elegant way to achieve
the very same thing. (Well, modulo our disagreement about the dependency
between extension and templates.) And as opposed to the file system or
fuse approach, you'd even gain transactional safety, consistency (i.e. a
constraint can enforce a full version exists as the basis for an upgrade
script), etc... But who am I to tell you the benefits of storing data in
a database?

Of course, you then also want to be able to backup your templates (not
the extensions) stored in the database. Just like you keep a backup of
your file-system templates. Either by simply making a copy, or maybe by
keeping an RPM or DEB package of it available. Thus, of course,
templates for extensions need to be dumped as well.

> I don't believe the above use case.  (Even if I did, it's marginal.)
> You should always be able to arrange things so that an upgrade of an
> inside-the-database-package is possible before or after pg_restore.

Dimitri's scenario assumes an old and a new version of an extension as
well as an old and a new Postgres major version. Where the old extension
is not compatible with the new Postgres major version. Which certainly
seems like a plausible scenario to me (postgis-2.0 is not compatible
with Postgres-9.3, for example - granted, it carries a DSO, so it's not
really a good example).

Given how extensions work, to upgrade to the new Postgres major version
*and* to the required new version of the extension, you don't ever need
to "upgrade an inside-the-database-package". Instead, you need to:

createdb -> provide templates -> CREATE EXTENSION -> restore data

Now, CREATE EXTENSION and restoring your data has effectively been
merged for the user, as pg_dump emits proper CREATE EXTENSION commands.
"Providing templates" so far meant installing an RPM or DEB. Or copying
the files manually.

But in fact, how and where you provide templates for the extension is
irrelevant to that order. And the possibility to merge the second step
into the 'restore data' step certainly sounds appealing to me.

> pg_dump and pg_restore are interfaces between the database and the
> outside. They should have nothing to do with upgrading things that live
> entirely inside the database.

I don't get your point here. In my view, libpq is intended to modify the
things that live inside the database, including extensions (e.g. ALTER
EXTENSION ADD FUNCTION). Or what kind of "things that live entirely
inside the database" do you have in mind?

> There would be value to inside-the-database package management, but it
> should be a separate concept.

Anything that's

Re: [HACKERS] Proposal: template-ify (binary) extensions

2013-07-15 Thread Markus Wanner
Robert,

thanks for answering. I think you responded to the (or some) idea in
general, as I didn't see any relation to the quoted part. (Otherwise
this is a hint that I've not been clear enough.)

On 07/15/2013 05:49 AM, Robert Haas wrote (in a slightly different order):
> There is a lot of
> (well-founded, IMHO) resistance to the idea of letting users install C
> code via an SQL connection.

Nowhere did I propose anything close to that. I'm in full support of the
resistance. Pitchforks and torches ready to rumble. :-)

> The security problems with this model have been discussed in every
> email thread about the extension template feature.

I read through a lot of these discussions, recently, but mostly found
misunderstanding and failure to distinguish between available extension
(template) and created extension (instantiation). I think this is a new
proposal, which I didn't ever see discussed. It does not have much, if
anything, to do with Dimitri's patch, except for it having made the
point obvious to me, as it continues to mix the two mental models.

I don't see much of a difference security wise between loading the DSO
at extension creation time vs. loading it at every backend start. More
details below.

My major gripe with the current way extensions are handled is that SQL
scripts are treated as a template, whereas the DSO is only "pointed
to". Changing the SQL scripts in your extension directory has no effect
on the installed extension. Modifying the DSO file certainly has. That's
the inconsistency that I'd like to get rid of.


A security analysis of the proposal follows.

Granted, the "internalization" of the DSO is somewhat critical to
implement. Running as a non-root user, Postgres has less power than root
and can certainly not protect the internalized DSO from modification by
root. It can, however, store the DSO well protected from other users
(e.g. in a directory within $PGDATA).

Relying on the external DSO only exactly once can provide additional
safety. Consider an extensions directory that's accidentally
world-writable. As it stands, an attacker can modify the DSO and gain
control as soon as a new connection loads it. With my proposal, the
attacker would have to wait until CREATE EXTENSION. A point in time
where the newly created extension is more likely to be tested and
cross-checked.

I'm arguing both of these issues are very minor and negligible,
security-wise. Barring other issues, I conclude there's no security
impact from this proposal.

Arguably, internalizing the DSO (together with the SQL) protects much
better against accidental modifications (e.g. removing the DSO by
un-installing the extension template via the distribution's packaging
system while some database still has the extension instantiated). That
clearly is a usability and not a security issue, though.

If nothing else (and beneficial in either case): Postgres should
probably check the permissions on the extension directory and at least
emit a warning, if it's world-writable. Or refuse to create (or even
load) an extension, right away.
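
An untested sketch of such a check (how it would be hooked into the
extension code is left open; error reporting as usual in the backend):

    #include "postgres.h"
    #include <sys/stat.h>

    /* Warn if the directory we read extension files from is world-writable. */
    static void
    check_extension_dir_permissions(const char *path)
    {
        struct stat st;

        if (stat(path, &st) != 0)
            return;             /* let the subsequent open fail loudly */

        if ((st.st_mode & S_IWOTH) != 0)
            elog(WARNING,
                 "extension directory \"%s\" is world-writable", path);
    }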

Regards

Markus Wanner


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: template-ify (binary) extensions

2013-07-15 Thread Markus Wanner
On 07/15/2013 12:05 PM, Andres Freund wrote:
> A superuser can execute native code as postges user. That's it.

Oh, I though Robert meant postgres users, i.e. non-superusers.

I hereby withdraw my pitchforks: I'm certainly not opposing to simplify
the life of superusers, who already have the power.

> Now, I don't think Markus proposal is a good idea on *other* grounds
> though... but that's for another email.

Separation of concerns, huh? Good thing, thanks :-)

Regards

Markus Wanner


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: template-ify (binary) extensions

2013-07-15 Thread Markus Wanner
On 07/15/2013 12:19 PM, Andres Freund wrote:
> On 2013-07-15 12:12:36 +0200, Markus Wanner wrote:
>> Granted, the "internalization" of the DSO is somewhat critical to
>> implement. Running as a non-root user, Postgres has less power than that
>> and can certainly not protect the internalized DSO from modification by
>> root. It can, however, store the DSO well protected from other users
>> (e.g. in a directory within $PGDATA).
> 
> There's heaps of problems with that though. Think noexec mounts or selinux.

Good point.

Note, however, that "internalize" doesn't necessarily mean exactly that
approach. It could be yet another directory, even system wide, which any
distribution should well be able to prepare.

I intentionally left the "internalizing" somewhat undefined. It could
even be more equivalent to what is done with the SQL stuff, i.e. some
system catalog. But that just poses other implementation issues.
(Loading a DSO from memory doesn't sound very portable to me.)

>> Relying on the external DSO only exactly once can provide additional
>> safety.
> 
> Not necessarily. Think of a distribution provided upgrade with a
> security fix in an extension.

Ugh.. only to figure out the patched DSO is incompatible with the old
version of the SQL scripts? And therefore having to re-create the
extension, anyways? That only highlights why this difference in
treatment of SQL and DSO is troublesome.

> On a machine with dozens of clusters. Now
> you need to make sure each and every one of them has the new .so.

Agreed.

So this sounds like the other approach to unification may be more
useful: linking the SQL scripts better and making them (re-)load from the
extensions directory, just like the DSOs.

> I think protecting against the case where such directories are writable
> is a rather bad argument.

I agree. That's why I'm neglecting the security implications and stated
there are none.

Regards

Markus Wanner


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: template-ify (binary) extensions

2013-07-15 Thread Markus Wanner
Robert,

On 07/15/2013 12:12 PM, Markus Wanner wrote:
> On 07/15/2013 05:49 AM, Robert Haas wrote (in a slightly different order):
>> There is a lot of
>> (well-founded, IMHO) resistance to the idea of letting users install C
>> code via an SQL connection.
> 
> Nowhere did I propose anything close to that. I'm in full support of the
> resistance. Pitchforks and torches ready to rumble. :-)

To clarify: In the above agreement, I had the Postgres level ordinary
users in mind, who certainly shouldn't ever be able to upload and run
arbitrary native code.

I'm not quite as enthusiastic if you meant the system level user that
happens to have superuser rights on Postgres. As Andres pointed out,
there are some more holes to plug, if you want to lock down a superuser
account. [1]

I'm not quite clear what kind of "user" you meant in your statement above.

Regards

Markus Wanner


[1]: I see the theoretical benefits of providing yet another layer of
security. Closing the existing holes would require restricting LOAD, but
that seems to suffice (given the sysadmin makes sure he doesn't
accidentally provide any of the harmful extensions).

How about the (optional?) ability to restrict LOAD to directories and
files that are only root writable? Is that enough for a root to disallow
the untrusted Postgres superuser to run arbitrary native code? Or does
that still have other holes? How many real use cases does it break? How
many does it fix?
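
An untested sketch of the check I have in mind (the exact policy is
certainly up for discussion):

    #include "postgres.h"
    #include <sys/stat.h>

    /*
     * Accept only libraries that are owned by root and not writable by group
     * or others. Whether that is enough to keep an untrusted superuser from
     * running arbitrary native code is exactly the open question above.
     */
    static bool
    library_is_root_controlled(const char *path)
    {
        struct stat st;

        if (stat(path, &st) != 0)
            return false;
        if (st.st_uid != 0)
            return false;       /* not owned by root */
        if ((st.st_mode & (S_IWGRP | S_IWOTH)) != 0)
            return false;       /* group or world writable */
        return true;
    }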


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: template-ify (binary) extensions

2013-07-16 Thread Markus Wanner
On 07/16/2013 01:27 AM, Robert Haas wrote:
> Andres points out that you can install adminpack to obtain
> local filesystem access, and that is true.  But the system
> administrator can also refuse to allow adminpack, and/or use selinux
> or other mechanisms to prevent the postgres binary from writing a file
> with execute permissions.

I think execute permissions (on the FS) are irrelevant. It's about
loading a shared library. The noexec mount option can prevent that, though.

But okay, you're saying we *have* and *want* a guarantee that even a
superuser cannot execute arbitrary native code via libpq (at least in
default installs w/o extensions).

Andres made two contrib-free suggestions: with COPY TO BINARY, you get a
header prepended, which I think is sufficient to prevent a dlopen() or
LoadLibrary(). Text and CSV formats of COPY escape their output, so it's
hard to write \000 or other control bytes. ESCAPE and DELIMITER also
have pretty restrictive requirements. So COPY doesn't seem quite "good"
enough to write a valid DSO.

His second suggestion was tuplesort tapes. tuplesort.c says: "We require
the first "unsigned int" of a stored tuple to be the total size on-tape
of the tuple...". That's kind of a header as well. Writing a proper DSO
certainly does not sound trivial, either.

From a security perspective, I wouldn't want to rely on that guarantee.
Postgres writes too many files to be sure none of those can be abused to
write a loadable DSO, IMO.

Mounting $PGDATA 'noexec' and allowing the postgres user to write only
to such noexec mounts sounds like a good layer. It's independent, though
- it can be used whether or not the above guarantee holds.

> Things aren't quite so bad if we write the bits to a file first and
> then dynamically load the file.  That way at least noexec or similar
> can provide protection.  But it still seems like a pretty dangerous
> direction.

I agree now. Thanks for elaborating.

Regards

Markus Wanner


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: template-ify (binary) extensions

2013-07-17 Thread Markus Wanner
On 07/17/2013 08:52 PM, Dimitri Fontaine wrote:
> the next step of this discussion should be about talking about the
> problems we have and figuring out *if* we want to solve them, now that
> we managed to explicitely say what we want as a project.
> 
>   - per-installation (not even per-cluster) DSO availability
> 
> If you install PostGIS 1.5 on a system, then it's just impossible to
> bring another cluster (of the same PostgreSQL major version), let
> alone database, with PostGIS 2.x, even for migration assessment
> purposes. The "By Design™" part is really hard to explain even to
> security concious users.

On Debian, that should be well possible. Certainly installing Postgres
9.1 w/ postgis-1.5 in parallel to Postgres 9.2 w/ postgis-2.0 is. I
designed it to be.

On distributions that do not allow parallel installation of multiple
Postgres major versions, it's certainly not the extension's fault.

I think I'm misunderstanding the problem statement, here.

>   - hot standby and modules (aka DSO)
> 
> As soon as you use some functions in 'language C' you need to
> carefully watch your external dependencies ahead of time. If you do
> CREATE EXTENSION hstore;, create an hstore column and a GiST index
> on it, then query the table on the standby… no luck. You would tell
> me that it's easy enough to do and that it's part of the job, so see
> next point.

Agreed, that's an area where Postgres could do better. I'd argue this
should be possible without relaxing the security guarantees provided,
though. Because there likely are people wanting both.

Can CREATE EXTENSION check if the standbys have the extension installed?
And refuse creation, if they don't?

>   - sysadmin vs dba, or PostgreSQL meets the Cloud
> 
> The current model of operations is intended for places where you
> have separate roles: the sysadmin cares about the OS setup and will
> provide with system packages (compiled extensions and the like), and
> DBA are never root on the OS. They can CREATE EXTENSION and maybe
> use the 'postgres' system account, but that's about it.

I'm sure you are aware that even without this clear separation of roles,
the guarantee means we provide an additional level of security against
attackers.

> Given the current raise of the Cloud environements and the devops
> teams, my understanding is that this model is no longer the only
> one. Depending on who you talk to, in some circles it's not even a
> relevant model anymore: user actions should not require the
> intervention of a "sysadmin" before hand.
> 
> While I appreciate that many companies still want the old behavior
> that used to be the only relevant model of operations, I think we
> should also provide for the new one as it's pervasive enough now for
> us to stop ignoring it with our "I know better" smiling face.

I'd even think it's a minority who actually uses the guarantee we're
talking about. Mostly because of the many and widespread untrusted PLs
(which undermine the guarantee). And thus even before the rise of the cloud.

None the less, the "safe by default" has served us well, I think.

> Now it should be possible to solve at least some of those items while
> still keeping the restriction above, or with an opt-in mechanism to
> enable the "works by default, but you have to solve the security
> implications yourself" behaviour. A new GUC should do it, boolean,
> defaults false:
> 
>   runs_in_the_cloud_with_no_security_concerns = false

[ I usually associate "cloud" with (increased) "security concerns", but
  that's an entirely different story. ]

> I don't think the relaxed behaviour we're talking about is currently
> possible to develop as an extension, by the way.

It's extensions that undermine the guarantee, at the moment. But yeah,
it depends a lot on what kind of "relaxed behavior" you have in mind.

>> Andres made two contrib-free suggestions: with COPY TO BINARY, you get a
> 
> Well, what about using lo_import()?

That only reads from the file-system. You probably meant lo_export(),
which is writing. Although not on the server's, but only on the (libpq)
client's file-system. No threat to the server.

> Yes it's dangerous. It's also solving real world problems that I see no
> other way to solve apart from bypassing the need to ever load a DSO
> file, that is embedding a retargetable C compiler in the backend.

If the sysadmin wants to disallow arbitrary execution of native code to
postgres (the process), any kind of embedded compiler likely is equally
unwelcome.

Regards

Markus Wanner


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: template-ify (binary) extensions

2013-07-21 Thread Markus Wanner
Salut Dimitri,

On 07/20/2013 01:23 AM, Dimitri Fontaine wrote:
> Markus Wanner  writes:
>>>   - per-installation (not even per-cluster) DSO availability
>>>
>>> If you install PostGIS 1.5 on a system, then it's just impossible to
>>> bring another cluster (of the same PostgreSQL major version), let
>> On Debian, that should be well possible. Certainly installing Postgres
>> 9.1 w/ postgis-1.5 in parallel to Postgres 9.2 w/ postgis-2.0 is. I
>> designed it to be.
>>
>> I think I'm misunderstanding the problem statement, here.
> 
> (of the same PostgreSQL major version)

Not sure what the issue is, here, but I agree that should be possible.

>> Can CREATE EXTENSION check if the standbys have the extension installed?
>> And refuse creation, if they don't?
> 
> No, because we don't register standbies so we don't know who they are,
> and also because some things that we see connected and using the
> replication protocol could well be pg_basebackup or pg_receivexlog.

Can the standby check? In any case, these seem to be problems we can
solve without affecting security.

> Also, it's possible that the standby is only there for High Availability
> purposes and runs no user query.

Requiring the sysadmin to install the extensions there, too, seems
justified to me. Sounds like good advice, anyways.

>> I'm sure you are aware that even without this clear separation of roles,
>> the guarantee means we provide an additional level of security against
>> attackers.
> 
> Given lo_import() to upload a file from the client to the server then
> LOAD with the absolute path where the file ended up imported (or any
> untrusted PL really), this argument carries no sensible weight in my
> opinion.

lo_import() won't write a file for LOAD to load. An untrusted PL (or any
other extension allowing the superuser to do that) is currently required
to do that.

Or to put it another way: Trusted PLs exist for a good reason. And some
people just value security a lot and want that separation of roles.

>> None the less, the "safe by default" has served us well, I think.
> 
> That's true. We need to consider carefully the proposal at hand though.
> 
> It's all about allowing the backend to automatically load a file that it
> finds within its own $PGDATA so that we can have per-cluster and
> per-database modules (DSO files).

As someone mentioned previously, $PGDATA may well be mounted noexec, so
that seems to be a bad choice.

> The only difference with before is the location where the file is read
> from, and the main security danger comes from the fact that we used to
> only consider root-writable places and now propose to consider postgres
> bootstrap user writable places.

FWIW, I only proposed to let postgres check write permissions on
libraries it loads. IIUC we don't currently do that, yet. And Postgres
happily loads a world-writable library, ATM.

> Having the modules in different places in the system when it's a
> template and when it's instanciated allows us to solve a problem I
> forgot to list:
> 
>   - upgrading an extension at the OS level
> 
> Once you've done that, any new backend will load the newer module
> (DSO file), so you have to be real quick if installing an hot fix in
> production and the SQL definition must be changed to match the new
> module version…

I agree, that's a problem.

Alternatively, we could solve that problem the other way around: Rather
than template-ify the DSO, we could instead turn the objects created by
the SQL scripts into something that's more linked to the script.
Something that would reload as soon as the file on disk changes.

(Note how this would make out-of-line extensions a lot closer to the
in-line variant your recent patch adds? With the dependency between
"template" and "instantiation"?)

> With the ability to "instanciate" the module in a per-cluster
> per-database directory within $PGDATA the new version of the DSO module
> would only put in place and loaded at ALTER EXTENSION UPDATE time.
> 
> I'm still ok with allowing to fix those problems only when a security
> option that defaults to 'false' has been switched to 'true', by the way,
> so that it's an opt-in,

Okay, good.

For the issues you raised, I'd clearly prefer fixes that maintain
current security standards, though.

> but I will admit having a hard time swallowing
> the threat model we're talking about…

An attacker having access to a libpq connection with superuser rights
cannot currently run arbitrary native code. He can try a DOS by
exhausting system resources, but that's pretty far from being
invisible.

Re: [HACKERS] Proposal: template-ify (binary) extensions

2013-07-22 Thread Markus Wanner
On 07/22/2013 12:11 AM, Hannu Krosing wrote:
>> Dropping this barrier by installing an untrusted PL (or equally insecure
>> extensions), an attacker with superuser rights can trivially gain
>> root.
> Could you elaborate ?
> 
> This is equivalent to claiming that any linux user can trivially gain root.

Uh. Sorry, you're of course right, the attacker can only gain postgres
rights in that case. Thanks for correcting.

The point still holds. It's another layer that an attacker would have to
overcome.

>>> You already mentioned untrusted PL languages, and I don't see any
>>> difference in between offering PL/pythonu and PL/C on security grounds,
>>> really.
>> I agree. However, this also means that any kind of solution it offers is
>> not a good one for the security conscious sysadmin.
> This is usually the case with a "security conscious sysadmin" - they very
> seldom want to install anything.

Exactly. That's why I'm favoring solutions that don't require any
extension and keep the guarantee of preventing arbitrary native code.

Regards

Markus Wanner


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: template-ify (binary) extensions

2013-07-22 Thread Markus Wanner
Dimitri,

On 07/22/2013 08:44 PM, Dimitri Fontaine wrote:
> That's the trade-off we currently need to make to be able to ship with
> the current security protections we're talking about.

Anything wrong with shipping postgis-1.5.so and postgis-2.0.so, as we do
for Debian?

> Ok, here's the full worked out example:
> 
>   ~# select lo_import('/path/to/local/on/the/client/adminpack.so');
>lo_import 
>   ---
>82153
>   (1 row)
>   
>   ~# select lo_export(82153, '/tmp/foo.so');
>lo_export 
>   ---
>1
>   (1 row)
>   
>   ~# LOAD '/tmp/foo.so';
>   LOAD
>   
>   ~# CREATE FUNCTION pg_catalog.pg_file_write(text, text, bool)
>  RETURNS bigint AS '/tmp/foo.so', 'pg_file_write'
>  LANGUAGE C VOLATILE STRICT;
>   CREATE FUNCTION

Good point.

Note that these lo_import() and lo_export() methods - unlike the client
counterparts - read and write to the server's file system.

So, your example is incomplete in that SELECT lo_import() doesn't load
data from the client into the database. But there are many other ways to
do that.

None the less, your point holds: lo_export() is a way for a superuser to
write arbitrary files on the server's file-system via libpq.

>> Or to put it another way: Trusted PLs exist for a good reason. And some
>> people just value security a lot and want that separation of roles.
> 
> Yeah, security is important, but it needs to be addressing real threats
> to have any value whatsoever. We'not talking about adding new big holes
> as extensions containing modules are using LANGUAGE C in their script
> and thus are already restricted to superuser. That part would not
> change.

I agree, it's not a big hole. But with security, every additional layer
counts. I think the next step is to clarify whether or not we want to
provide that guarantee.

>> As someone mentioned previously, $PGDATA may well be mounted noexec, so
>> that seems to be a bad choice.
> 
> I don't understand. You want good default security policy in place, and
> you're saying that using PGDATA allows for a really easy policy to be
> implemented by people who don't want the feature. It seems to me to be
> exactly what you want, why would that be a bad choice then?

I'm just saying that mounting $PGDATA with noexec makes $PGDATA unusable
to load libraries from; i.e. pg_dlopen() just outright fails.

And noexec is another protective layer. Building a solution that
requires $PGDATA to be mounted with exec permissions means that very
solution won't work with that protective layer. Which is a bad thing.

>> Alternatively, we could solve that problem the other way around: Rather
>> than template-ify the DSO, we could instead turn the objects created by
>> the SQL scripts into something that's more linked to the script.
>> Something that would reload as soon as the file on disk changes.
> 
> I think that's the wrong way to do it, because you generally want more
> control over those two steps (preparing the template then instanciating
> it) and you're proposing to completely lose the control and have them be
> a single step instead.

Doesn't versioning take care of that? I.e. if you have version 1.0 created
in your databases and then install 1.1 system-wide, that shouldn't
immediately "instantiate".

> Use case: shipping a new plproxy bunch of functions to 256 nodes. You
> want to push the templates everywhere then just do the extension update
> as fast as possible so that it appears to be about 'in sync'.
> 
> Pro-tip: you can use a plproxy function that runs the alter extension
> update step using RUN ON ALL facility.

Contrary use case: you actually *want* to *replace* the created version
installed, as it's a plain security fix that's backwards compatible. A
plain 'apt-get upgrade' would do the job.

Anyways, I think this is bike-shedding: The question of what mental
model fits best doesn't matter too much, here.

I'm concerned about being consistent with whatever mental model we
choose and about an implementation that would work with whatever
security standards we want to adhere to.

>> (Note how this would make out-of-line extensions a lot closer to the
>> in-line variant your recent patch adds? With the dependency between
>> "template" and "instantiation"?)
> 
> Those dependencies are almost gone now except for being able to deal
> with DROP TEMPLATE FOR EXTENSION and guaranteeing we can actually
> pg_restore our pg_dump script.

I appreciate the bug fixes. However, there still is at least one
dependency. And that might be a good thing. Depending on what mental
model you follow.

> I've been changing the implementation to be what you proposed because I
> think it's making more sense (thanks!), and regression tests are
> reflecting that. My github branch is up-to-date with the last changes
> and I'm soon to send an updated patch that will be really as ready for
> commit as possible without a commiter review.

Good, thanks.

>> An attacker having access to a libpq connection with superuser 

Re: [HACKERS] Proposal: template-ify (binary) extensions

2013-07-23 Thread Markus Wanner
On 07/16/2013 09:14 PM, I wrote:
> But okay, you're saying we *have* and *want* a guarantee that even a
> superuser cannot execute arbitrary native code via libpq (at least in
> default installs w/o extensions).

I stand corrected and have to change my position, again. For the record:

We do not have such a guarantee. Nor does it seem reasonable to want
one. On a default install, it's well possible for the superuser to run
arbitrary code via just libpq.

There are various ways to do it, but the simplest one I was shown is:
 - upload a DSO from the client into a large object
 - SELECT lo_export() that LO to a file on the server
 - LOAD it

There are a couple other options, so even if we let LOAD perform
permission checks (as I proposed before in this thread), the superuser
can still fiddle with function definitions. To the point that it doesn't
seem reasonable to try to protect against that.

Thus, the argument against the original proposal based on security
grounds is moot. Put another way: There already are a couple of
"backdoors" a superuser can use. By default. Or with plpgsql removed.

Thanks to Dimitri and Andres for patiently explaining and providing
examples.

Regards

Markus Wanner


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] logical changeset generation v3 - comparison to Postgres-R change set format

2012-11-16 Thread Markus Wanner
Andres,

On 11/15/2012 01:27 AM, Andres Freund wrote:
> In response to this you will soon find the 14 patches that currently
> implement $subject.

Congratulations on that piece of work.


I'd like to provide a comparison of the proposed change set format to
the one used in Postgres-R. I hope for this comparison to shed some
light on the similarities and differences of the two projects. As the
author of Postgres-R, I'm obviously biased, but I try to be as neutral
as I can.


Let's start with the representation: I so far considered the Postgres-R
change set format to be an implementation detail and I don't intend it
to be readable by humans or third party tools. It's thus binary only and
doesn't offer a textual representation. The approach presented here
seems to target different formats for different audiences, including
binary representations. More general, less specific.


Next, contents: this proposal is more verbose. In the textual
representation shown, it provides (usually redundant) information about
attribute names and types. Postgres-R doesn't ever transmit attribute
name or type information for INSERT, UPDATE or DELETE operations.
Instead, it relies on attribute numbers and pg_attributes being at some
known consistent state.

Let's compare by example:

> table "replication_example": INSERT: id[int4]:1 somedata[int4]:1 
> text[varchar]:1
> table "replication_example": UPDATE: id[int4]:1 somedata[int4]:-1 
> text[varchar]:1
> table "replication_example": DELETE (pkey): id[int4]:1

In Postgres-R, the change sets for these same operations would carry the
following information, in a binary representation:

> table "replication_example": INSERT: VALUES (1, 1, '1')
> table "replication_example": UPDATE: PKEY(1) COID(77) MODS('nyn') VALUES(-1)
> table "replication_example": DELETE: PKEY(1) COID(78)

You may have noticed that there's an additional COID field. This is an
identifier for the transaction that last changed this tuple. Together
with the primary key, it effectively identifies the exact version of a
tuple (during its lifetime, for example before vs after an UPDATE). This
in turn is used by Postgres-R to detect conflicts.

It may be possible to add that to the proposed format as well, for it to
be able to implement a Postgres-R-like algorithm.


To finish off this comparison, let's take a look at how and where the
change sets are generated: in Postgres-R the change set stream is
constructed directly from the heap modification routines, i.e. in
heapam.c's heap_{insert,update,delete}() methods, whereas the patches
proposed here parse the WAL to reconstruct the modifications and add the
required meta information.

To me, going via the WAL first sounded like a step that unnecessarily
complicates matters. I recently talked to Andres and brought that up.
Here's my current view of things:

The Postgres-R approach is independent of WAL and its format, whereas
the approach proposed here clearly is not. Either way, there is a
certain overhead - however minimal it is - which the former adds to the
transaction processing itself, while the latter postpones it to a
separate XLogReader process. If there's any noticeable difference, it
might reduce latency in case of asynchronous replication, but can only
increase latency in the synchronous case. As far as I understood Andres,
it was easier to collect the additional meta data from within the
separate process.


In summary, I'd say that Postgres-R is an approach specifically
targeting and optimized for multi-master replication between Postgres
nodes, where as the proposed patches are kept more general.

I hope you found this to be an insightful and fair comparison.

Regards

Markus Wanner


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] logical changeset generation v3 - comparison to Postgres-R change set format

2012-11-17 Thread Markus Wanner
On 11/16/2012 03:05 PM, Andres Freund wrote:
>> I'd like to provide a comparison of the proposed change set format to
>> the one used in Postgres-R.
> 
> Uh, sorry to interrupt you right here, but thats not the "proposed
> format" ;)

Understood. Sorry, I didn't mean to imply that. It's pretty obvious to
me that this is more of a human readable format and that others,
including binary formats, can be implemented. I apologize for the bad
wording of a "proposed format", which doesn't make that clear.

>> The Postgres-R approach is independent of WAL and its format, where as
>> the approach proposed here clearly is not. Either way, there is a
>> certain overhead - however minimal it is - which the former adds to the
>> transaction processing itself, while the later postpones it to a
>> separate XLogReader process. If there's any noticeable difference, it
>> might reduce latency in case of asynchronous replication, but can only
>> increase latency in the synchronous case. As far as I understood Andres,
>> it was easier to collect the additional meta data from within the
>> separate process.
> 
> There also is the point that if you do the processing inside heap_* you
> need to make sure the replication targeted data is safely received &
> fsynced away, in "our" case thats not necessary as WAL already provides
> crash safety, so should the replication connection break you can simply
> start from the location last confirmed as being safely sent.

In the case of Postgres-R, the "safely received" part isn't really
handled at the change set level at all. And regarding the fsync
guarantee: you can well use the WAL to provide that, without basing
change set generation on it. In that regard, Postgres-R is probably the
more general approach: you can run Postgres-R with WAL turned off
entirely - which may well make sense if you take into account the vast
amount of cloud resources available, which don't have a BBWC. Instead of
WAL, you can add more nodes at more different locations. And no, you
don't want your database to ever go down, anyway  :-)

>> In summary, I'd say that Postgres-R is an approach specifically
>> targeting and optimized for multi-master replication between Postgres
>> nodes, where as the proposed patches are kept more general.
> 
> One major aim definitely was optionally be able to replicate into just
> about any target system, so yes, I certainly agree.

I'm glad I got that correct ;-)

Regards

Markus Wanner


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] logical changeset generation v3 - comparison to Postgres-R change set format

2012-11-17 Thread Markus Wanner
On 11/16/2012 03:14 PM, Andres Freund wrote:
> Whats the data type of the "COID" in -R?

It's short for CommitOrderId, a 32-bit global transaction identifier
that wraps around, very much like TransactionIds do. (In that sense,
it's global, but unique only for a certain amount of time.)
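
For wrap-around aware comparison, the same idiom as TransactionIdPrecedes()
can be used; roughly (sketch only):

    typedef uint32 CommitOrderId;

    /*
     * True if a logically precedes b on the 32-bit circle. Only meaningful
     * as long as the two values are less than 2^31 apart.
     */
    static bool
    CommitOrderIdPrecedes(CommitOrderId a, CommitOrderId b)
    {
        int32 diff = (int32) (a - b);

        return diff < 0;
    }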

> In the patchset the output plugin has enough data to get the old xid and
> the new xid in the case of updates (not in the case of deletes, but
> thats a small bug and should be fixable with a single line of code), and
> it has enough information to extract the primary key without problems.

It's the xmin of the old tuple that Postgres-R needs to get the COID.

Regards

Markus Wanner


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] logical changeset generation v3 - comparison to Postgres-R change set format

2012-11-17 Thread Markus Wanner
On 11/17/2012 02:30 PM, Hannu Krosing wrote:
> Is it possible to replicate UPDATEs and DELETEs without a primary key in
> PostgreSQL-R

No. There must be some way to logically identify the tuple. Note,
though, that theoretically any (unconditional) unique key would suffice.
In practice, that usually doesn't matter, as you rarely have one or more
unique keys without a primary.

Also note that the underlying index is useful for remote application of
change sets (except perhaps for very small tables).

In some cases, for example for n:m linking tables, you need to add a
uniqueness key that spans all columns (as opposed to a simple index on
one of the columns that's usually required, anyway). I hope for
index-only scans eventually mitigating this issue.

Alternatively, I've been thinking about the ability to add a hidden
column, which can then be used as a PRIMARY KEY without breaking legacy
applications that rely on SELECT * not returning that primary key.

Are there other reasons to want tables without primary keys that I'm
missing?

Regards

Markus Wanner


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] logical changeset generation v3 - comparison to Postgres-R change set format

2012-11-17 Thread Markus Wanner
Hannu,

On 11/17/2012 03:40 PM, Hannu Krosing wrote:
> On 11/17/2012 03:00 PM, Markus Wanner wrote:
>> On 11/17/2012 02:30 PM, Hannu Krosing wrote:
>>> Is it possible to replicate UPDATEs and DELETEs without a primary key in
>>> PostgreSQL-R
>> No. There must be some way to logically identify the tuple.
> It can be done as selecting on _all_ attributes and updating/deleting
> just the first matching row
> 
> create cursor ...
> select from t ... where t.* = ()
> fetch one ...
> delete where current of ...

That doesn't sound like it could possibly work for Postgres-R. At least
not when there can be multiple rows with all the same attributes, i.e.
without a unique key constraint over all columns.

Otherwise, some nodes could detect two concurrent UPDATES as a conflict,
while other nodes select different rows and don't handle it as a conflict.

Regards

Markus Wanner


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: [pgsql-cluster-hackers] Question: Can i cut NON-HOT chain Pointers if there are no concurrent updates?

2012-11-23 Thread Markus Wanner
Henning,

On 11/23/2012 03:17 PM, "Henning Mälzer" wrote:
> Can somebody help me?

Sure, but you might get better answers on the -hackers mailing list. I'm
redirecting there. The cluster-hackers list is pretty low volume and has
few subscribers, I think.

> Question:
> What would be the loss if i cut NON-HOT chain Pointers, meaning i set 
> t_ctid=t_self in the case where it points to a tuple on another page?

READ COMMITTED would stop working correctly in the face of concurrent
transactions, I guess. See the fine manual:

http://www.postgresql.org/docs/devel/static/transaction-iso.html#XACT-READ-COMMITTED

The problem essentially boils down to: READ COMMITTED transactions need
to learn about tuples *newer* than what their snapshot would see.
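
They do that by following the t_ctid chain from the version their snapshot
sees to the latest version (the EvalPlanQual recheck). A rough sketch of
that walk (heap_fetch_tuple() is a made-up placeholder; locking and buffer
handling omitted):

    ItemPointerData tid = visible_tuple_tid;    /* where the snapshot left us */

    for (;;)
    {
        HeapTupleData tup;

        heap_fetch_tuple(&tup, &tid);           /* placeholder */

        /* the chain ends when a tuple points at itself */
        if (ItemPointerEquals(&tup.t_self, &tup.t_data->t_ctid))
            break;

        tid = tup.t_data->t_ctid;               /* follow to the newer version */
    }

With t_ctid forced to t_self for cross-page updates, this walk could never
leave the page, so the newer version would simply not be found.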

> I am working on a project based on "postgres (PostgreSQL) 8.5devel" with the 
> code from several master thesises befor me.

Care to elaborate a bit? Can (part of) that code be released under an
open license?

Regards

Markus Wanner


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Review: Extra Daemons / bgworker

2012-11-28 Thread Markus Wanner
r extra
daemons. I don't want the postmaster to turn into a general purpose
watchdog for everything and the kitchen sink.)


# Tests

No additional automated tests are included - hard to see how that could
be tested automatically, given the lack of a proper test harness.


I hope this review provides useful input.

Regards

Markus Wanner


[1]: That was during commit fest 2010-09. Back then, neither extensions,
latches nor the walsender existed. The patches had a dependency on
imessages, which was not accepted. Other than that, it's a working,
pooled bgworker implementation on top of which autovacuum runs just
fine. It lacks extensibility and the extra daemons feature, though. For
those who missed it:

https://commitfest.postgresql.org/action/patch_view?id=339
http://archives.postgresql.org/message-id/4c3c789c.1040...@bluegap.ch
http://git.postgres-r.org/?p=bgworker;a=summary


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: Extra Daemons / bgworker

2012-11-28 Thread Markus Wanner
On 11/28/2012 03:51 PM, Alvaro Herrera wrote:
> I remember your patchset.  I didn't look at it for this round, for no
> particular reason.  I did look at KaiGai's submission from two
> commitfests ago, and also at a patch from Simon which AFAIK was never
> published openly.  Simon's patch merged the autovacuum code to make
> autovac workers behave like bgworkers as handled by his patch, just like
> you suggest.  To me it didn't look all that good; I didn't have the guts
> for that, hence the separation.  If later bgworkers are found to work
> well enough, we can "port" autovac workers to use this framework, but
> for now it seems to me that the conservative thing is to leave autovac
> untouched.

Hm.. interesting to see how the same idea grows into different patches
and gets refined until we end up with something considered committable.

Do you remember anything in particular that didn't look good? Would you
help review a patch on top of bgworker-7 that changes autovacuum into a
bgworker?

> (As an example, we introduced "ilist" some commits back and changed some
> uses to it; but there are still many places where we're using SHM_QUEUE,
> or List, or open-coded lists, which we could port to the new
> infrastructure, but there's no pressing need to do it.)

Well, I usually like cleaning things up earlier rather than later (my
desk clearly being an exception to that rule, though). But yeah, new
code usually brings new bugs with it.

> Sorry about that -- forgot to push to github.  bgworker-7 corresponds to
> commit 0a49a540b which I have just pushed to github.

Thanks.

> Maybe one thing to do in this area would be to ensure that there is a
> certain number of PGPROC elements reserved specifically for bgworkers;
> kind of like autovacuum workers have.  That would avoid having regular
> clients exhausting slots for bgworkers, and vice versa.

Yeah, I think that's mandatory, anyways, see below.

> How are you envisioning that the requests would occur?

Just like av_launcher does it now: set a flag in shared memory and
signal the postmaster (PMSIGNAL_START_AUTOVAC_WORKER).

(That's why I'm so puzzled: it looks like it's pretty much all there,
already. I even remember a discussion about that mechanism probably not
being fast enough to spawn bgworkers. And a proposal to add multiple
such flags, so an avlauncher-like daemon could ask for multiple
bgworkers to be started in parallel. I've even measured the serial
bgworker fork rate back then, IIRC it was in the hundreds of forks per
second...)
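
For reference, this is roughly the shape of the existing mechanism
(names as in autovacuum.c, postmaster.c and pmsignal.h; heavily
simplified, not a literal excerpt):

  /* launcher side: publish what to start, then poke the postmaster */
  AutoVacuumShmem->av_startingWorker = worker;
  SendPostmasterSignal(PMSIGNAL_START_AUTOVAC_WORKER);

  /* postmaster side, in its SIGUSR1 handler: nothing but the
   * atomically written per-reason flag is read from shared memory */
  if (CheckPostmasterSignal(PMSIGNAL_START_AUTOVAC_WORKER))
      StartAutovacuumWorker();

A generalized variant would presumably add a similar signal reason (or a
small request queue) for starting registered bgworkers on demand.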

>> # Minor technical issues or questions
>>
>> In assign_maxconnections, et al, GetNumRegisteredBackgroundWorkers() is
>> used in relation to MAX_BACKENDS or to calculate MaxBackends. That seems
>> to "leak" the bgworkers that registered with neither
>> BGWORKER_SHMEM_ACCESS nor BGWORKER_BACKEND_DATABASE_CONNECTION set. Or
>> is there any reason to discount such extra daemon processes?
> 
> No, I purposefully let those out, because those don't need a PGPROC.  In
> fact that seems to me to be the whole point of non-shmem-connected
> workers: you can have as many as you like and they won't cause a
> backend-side impact.  You can use such a worker to connect via libpq to
> the server, for example.

Hm.. well, in that case, the shmem-connected ones are *not* counted. If
I create and run an extension that registers 100 shmem-connected
bgworkers, I cannot connect to a system with max_connections=100
anymore, because the bgworkers already occupy all of the connections.

Please add the registered shmem-connected bgworkers to the
max_connections limit. I think it's counterintuitive to have autovacuum
workers reserved separately, but an extension's bgworkers eat (client)
connections. Or put another way: max_connections should always be the
max number of *client* connections the DBA wants to allow.

(Or, if that's in some way complicated, please give the DBA an
additional GUC akin to max_background_workers. That can be merged with
the current autovacuum_max_workers, once/if we rebase autovacuum onto
bgworkers.)

>> The additional contrib modules auth_counter and worker_spi are missing
>> from the contrib/Makefile. If that's intentional, they should probably
>> be listed under "Missing".
>>
>> The auth_counter module leaves worker.bgw_restart_time uninitialized.
>>
>> Being an example, it might make sense for auth_counter to provide a
>> signal that just calls SetLatch() to interrupt WaitLatch() and make
>> auth_counter emit its log line upon request, i.e. show how to use SIGUSR1.
> 
> KaiGai proposed that we remove auth_counter.  I don't see why not; I
> mean, worker_spi is already doing most of what auth

Re: [HACKERS] Review: Extra Daemons / bgworker

2012-11-30 Thread Markus Wanner
On 11/30/2012 11:09 AM, Dimitri Fontaine wrote:
> Markus Wanner  writes:
>>> However, as you say, maybe we need more coding examples.
>>
>> Maybe a minimally usable extra daemon example? Showing how to avoid
>> common pitfalls? Use cases, anybody? :-)
> 
> What about the PGQ ticker, pgqd?
> 
>   https://github.com/markokr/skytools/tree/master/sql/ticker
>   https://github.com/markokr/skytools/blob/master/sql/ticker/pgqd.c

AFAICS pgqd currently uses libpq, so I think it would rather turn into
what I call a background worker, with a connection to Postgres shared
memory. I perfectly well see use cases (plural!) for those.

What I'm questioning is the use for what I rather call "extra daemons",
that is, additional processes started by the postmaster without a
connection to Postgres shared memory (and thus without a database
connection).

I was asking for a minimal example of such an extra daemon, similar to
worker_spi, showing how to properly write such a thing. Ideally showing
how to avoid common pitfalls.

> Maybe it would be easier to have a version of GNU mcron as an extension,
> with the abitity to fire PostgreSQL stored procedures directly?  (That
> way the cron specific parts of the logic are already implemented)
> 
>   http://www.gnu.org/software/mcron/

Again, that's something that would eventually require a database
connection. Or at least access to Postgres shared memory to eventually
start the required background jobs. (Something Alvaro's patch doesn't
implement, yet. But which exactly matches what the coordinator and
bgworkers in Postgres-R do.)

For ordinary extra daemons, I'm worried about things like the OOM killer
deciding to kill the postmaster, being their parent. Or (io)niceness
settings. Or Linux cgroups. IMO all of these things just get harder to
administer for extra daemons once they move under the hat of the
postmaster.

Without access to shared memory, the only thing an extra daemon would
gain by moving under postmaster is the "knowledge" of the start, stop
and restart (crash) events of the database. And that in a very
inflexible way: the extra process is forced to start, stop and restart
together with the database to "learn" about these events.

Using a normal client connection arguably is a better way to learn about
crash events - it doesn't have the above limitation. And as for the
start and stop events, well, the DBA or sysadmin is in control of these
already. We can possibly improve on that, yes. But extra daemons are not
the way to go, IMO.

> Another idea would be to have a pgbouncer extension. We would still need
> of course to have pgbouncer as a separate component so that client
> connection can outlive a postmaster crash, but that would still be very
> useful as a first step into admission control. Let's not talk about the
> feedback loop and per-cluster resource usage monitoring yet, but I guess
> that you can see the drift.

Sorry, I don't. Especially not with something like pgbouncer, because I
definitely *want* to control and administrate that separately.

So I guess this is a vote to disallow `worker.bgw_flags = 0`, on the
basis that a process which doesn't need to access Postgres shared memory
better does whatever it does outside of Postgres - for the benefit of
Postgres' stability and for ease of administration of the two.

Or, maybe, rather drop the BGWORKER_SHMEM_ACCESS flag and imply that
every registered process wants to have access to Postgres shared memory.
Then document the gotchas and requirements of living under the
Postmaster, as I've requested before. (If you want a foot gun, you can
still write an extension that doesn't need to access Postgres shared
memory, but hey.. we can't help those who desperately try to shoot
themselves in the foot.)

Regards

Markus Wanner


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: Extra Daemons / bgworker

2012-11-30 Thread Markus Wanner
On 11/30/2012 01:40 PM, Dimitri Fontaine wrote:
> I totally missed the need to connect to shared memory to be able to
> connect to a database and query it. Can't we just link the bgworkder
> code to the client libpq library, just as plproxy is doing I believe?

Well, sure you can use libpq. But so can external processes. So that's
no benefit of running under the postmaster.

Regards

Markus Wanner


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: Extra Daemons / bgworker

2012-11-30 Thread Markus Wanner
On 11/30/2012 01:59 PM, Andres Freund wrote:
> On 2012-11-30 09:57:20 -0300, Alvaro Herrera wrote:
>> One of the uses for bgworkers that don't have shmem connection is to
>> have them use libpq connections instead.  I don't really see the point
>> of forcing everyone to use backend connections when libpq connections
>> are enough.

Requiring a libpq connection is a good indication for *not* wanting the
process to run under the postmaster, IMO.

>> In particular, they are easier to port from existing code;
>> and they make it easier to share code with systems that still have to
>> support older PG versions.
> 
> They also can get away with a lot more crazy stuff without corrupting
> the database.

Exactly. That's a good reason to *not* tie that to the postmaster, then.
Please keep as much of the potentially dangerous stuff separate (and
advise developers to do so as well, instead of offering them a foot
gun), so that our postmaster can do its job - and do it reliably,
without trying to be a general-purpose start/stop daemon. There are
better and well-established tools for that.

Regards

Markus Wanner


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: Extra Daemons / bgworker

2012-11-30 Thread Markus Wanner
Alvaro,

On 11/30/2012 02:44 PM, Alvaro Herrera wrote:
> So it
> makes easier to have processes that need to run alongside postmaster.

That's where we are in respectful disagreement, then, as I don't think
that's easier overall; in my eyes, this looks like a foot gun.

As long as things like pgbouncer, pgqd, etc. continue to be available as
separate daemons, I'm happy, though.

Regards

Markus Wanner


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: Extra Daemons / bgworker

2012-11-30 Thread Markus Wanner
On 11/30/2012 03:16 PM, Kohei KaiGai wrote:
> This feature does not enforce them to implement with this new framework.
> If they can perform as separate daemons, it is fine enough.

I'm not clear on what exactly you envision, but if a process needs
access to shared buffers, it sounds like it should be a bgworker. I
don't quite understand why that process also wants a libpq connection,
but that's certainly doable.

> But it is not all the cases where we want background workers being tied
> with postmaster's duration.

Not wanting a process to be tied to postmaster's duration is a strong
indication that it better not be a bgworker, I think.

Regards

Markus Wanner


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: Extra Daemons / bgworker

2012-11-30 Thread Markus Wanner
On 11/30/2012 03:58 PM, Kohei KaiGai wrote:
> It seemed to me you are advocating that any use case of background-
> worker can be implemented with existing separate daemon approach.

That sounds like a misunderstanding. All I'm advocating is that only
3rd-party processes with a real need (like accessing shared memory)
should run under the postmaster.

> What I wanted to say is, we have some cases that background-worker
> framework allows to implement such kind of extensions with more
> reasonable design.

I absolutely agree to that. And I think I can safely claim to be the
first person to publish a patch that provides some kind of background
worker infrastructure for Postgres.

> Yes, one reason I want to use background-worker is access to shared-
> memory segment. Also, it want to connect databases simultaneously
> out of access controls; like as autovacuum.

Yeah, that's the entire reason for background workers. For clarity and
differentiation, I'd add: .. without having a client connection. That's
what makes them *background* workers. (Not to be confused with the
frontend vs backend differentiation. They are background backends, if
you want).

> It is a reason why I'm saying
> SPI interface should be only an option, not only libpq.

I'm extending that to say extensions should better *not* use libpq.
After all, they already have more direct access.

> It also probably means, in case when a process whose duration wants to
> be tied with duration of postmaster, its author can consider to implement
> it as background worker.

I personally don't count that as a real need. There are better tools for
this job, and there clearly are dangers in (ab)using the postmaster to
do it.

Regards

Markus Wanner


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: Extra Daemons / bgworker

2012-12-03 Thread Markus Wanner
On 12/03/2012 03:28 PM, Alvaro Herrera wrote:
> I'm not really that interested in it currently ... and there are enough
> other patches to review.  I would like bgworkers to mature a bit more
> and get some heavy real world testing before we change autovacuum.

Understood.

>> Just like av_launcher does it now: set a flag in shared memory and
>> signal the postmaster (PMSIGNAL_START_AUTOVAC_WORKER).
> 
> I'm not sure how this works.  What process is in charge of setting such
> a flag?

The only process that currently starts background workers ... ehm ...
autovacuum workers is the autovacuum launcher. It uses the above
Postmaster Signal in autovacuum.c:do_start_autovacuum_worker() to have
the postmaster launch bg/autovac workers on demand.

(And yes, this is kind of an exception to the rule that the postmaster
must not rely on shared memory. However, these are just flags we write
atomically, see pmsignal.[ch].)

I'd like to extend that, so that other processes can request to start
(pre-registered) background workers more dynamically. I'll wait for an
update of your patch, though.

> This is actually a very silly bug: it turns out that guc.c obtains the
> count of workers before workers have actually registered.  So this
> necessitates some reshuffling.

Okay, thanks.

Regards

Markus Wanner


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: Extra Daemons / bgworker

2012-12-03 Thread Markus Wanner
On 12/03/2012 04:27 PM, Simon Riggs wrote:
> My understanding was that the patch keep autovac workers and
> background workers separate at this point.

I agree with that, for this patch. However, that might not keep me from
trying to (re-)submit some of the bgworker patches in my queue. :-)

Regards

Markus Wanner


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: Extra Daemons / bgworker

2012-12-03 Thread Markus Wanner
On 12/03/2012 04:44 PM, Alvaro Herrera wrote:
> Simon Riggs wrote:
>> Is there anything to be gained *now* from merging those two concepts?
>> I saw that as refactoring that can occur once we are happy it should
>> take place, but isn't necessary.
> 
> IMO it's a net loss in robustness of the autovac implementation.

Agreed.

That's only one aspect of it, though. From the other point of view, it
would be a proof of stability for the bgworker implementation if
autovacuum worked on top of it.

Regards

Markus Wanner


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: Extra Daemons / bgworker

2012-12-03 Thread Markus Wanner
Alvaro,

in general, please keep in mind that there are two aspects that I tend
to mix and confuse: one is what's implemented and working for
Postgres-R. The other is how I envision (parts of it) being merged back
into Postgres itself. Sorry if that causes confusion.

On 12/03/2012 04:43 PM, Alvaro Herrera wrote:
> Oh, I understand that.  My question was more about what mechanism are
> you envisioning for new bgworkers.  What process would be in charge of
> sending such signals?  Would it be a persistent bgworker which would
> decide to start up other bgworkers based on external conditions such as
> timing?  And how would postmaster know which bgworker to start -- would
> it use the bgworker's name to find it in the registered workers list?

Well, in the Postgres-R case, I've extended the autovacuum launcher to a
more general purpose job scheduler, named coordinator. Lacking your
bgworker patch, it is the only process that is able to launch background
workers.

Your work now allows extensions to register background workers. If I
merge the two concepts, I can easily imagine allowing other (bgworker)
processes to launch bgworkers.

Another thing I can imagine is turning that coordinator into something
that can schedule and trigger jobs registered by extensions (or even
user defined ones). Think: cron daemon for SQL jobs in the background.
(After all, that's pretty close to what the autovacuum launcher does,
already.)

> The trouble with the above rough sketch is how does the coordinating
> bgworker communicate with postmaster.

I know. I've gone through that pain.

In Postgres-R, I've solved this with imessages (which was the primary
reason for rejection of the bgworker patches back in 2010).

The postmaster only needs to start *a* background worker process. That
process itself then has to figure out (by checking its imessage queue)
what job it needs to perform. Or if you think in terms of bgworker
types: what type of bgworker it needs to be.

> Autovacuum has a very, um,
> peculiar mechanism to make this work: avlauncher sends a signal (which
> has a hardcoded-in-core signal number) and postmaster knows to start a
> generic avworker; previously avlauncher has set things up in shared
> memory, and the generic avworker knows where to look to morph into the
> specific bgworker required.  So postmaster never really looks at shared
> memory other than the signal number (which is the only use of shmem in
> postmaster that's acceptable, because it's been carefully coded to be
> robust).

In Postgres-R, I've extended exactly that mechanism to work for other
jobs than autovacuum.

> This doesn't work for generic modules because we don't have a
> hardcoded signal number; if two modules wanted to start up generic
> bgworkers, how would postmaster know which worker-main function to call?

You've just described above how this works for autovacuum: the
postmaster doesn't *need* to know. Leave it to the bgworker to determine
what kind of task it needs to perform.

> As posted, the feature is already useful and it'd be good to have it
> committed soon so that others can experiment with whatever sample
> bgworkers they see fit.  That will give us more insight on other API
> refinements we might need.

I completely agree. I didn't ever intend to provide an alternative patch
or hold you back. (Except for the extra daemon issue, where we disagree,
but that's not a reason to keep this feature back). So please, go ahead
and commit this feature (once the issues I've mentioned in the review
are fixed).

Please consider all of these plans or ideas in here (or in Postgres-R)
as extending your work, rather than competing with it.

> I'm going to disappear on paternity leave, most likely later this week,
> or early next week; I would like to commit this patch before that.  When
> I'm back we can discuss other improvements.  That will give us several
> weeks before the end of the 9.3 development period to get these in.  Of
> course, other committers are welcome to improve the code in my absence.

Okay, thanks for sharing that. I'd certainly appreciate your inputs on
further refinements for bgworkers and/or autovacuum.

Regards

Markus Wanner


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: Extra Daemons / bgworker

2012-12-04 Thread Markus Wanner
On 12/03/2012 10:35 PM, Alvaro Herrera wrote:
> So here's version 8.  This fixes a couple of bugs and most notably
> creates a separate PGPROC list for bgworkers, so that they don't
> interfere with client-connected backends.

Nice, thanks. I'll try to review this again, shortly.

Regards

Markus Wanner


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: logical changeset generation v3 - comparison to Postgres-R change set format

2013-01-13 Thread Markus Wanner
On 01/13/2013 12:30 PM, Hannu Krosing wrote:
> On 01/13/2013 10:49 AM, Hannu Krosing wrote:
>> Does this hint that postgreSQL also needs an sameness operator
>> ( "is" or "===" in same languages).
> 
> How do people feel about adding a real sameness operator ?

We'd need to define what "sameness" means. If this goes toward "exact
match in binary representation", this gets a thumbs-up from me.

As a first step in that direction, I'd see adjusting send() and recv()
functions to use a portable binary format. A "sameness" operator could
then be implemented by simply comparing two values' send() outputs.
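
For example, today's send() functions already expose differences that
ordinary equality hides:

  SELECT '0'::float8 = '-0'::float8;                          -- true
  SELECT float8send('0'::float8) = float8send('-0'::float8);  -- false

  SELECT '1.0'::numeric = '1.00'::numeric;                    -- true
  SELECT numeric_send('1.0'::numeric)
       = numeric_send('1.00'::numeric);                       -- false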

Regards

Markus Wanner


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: logical changeset generation v3 - comparison to Postgres-R change set format

2013-01-14 Thread Markus Wanner
On 01/13/2013 09:04 PM, Hannu Krosing wrote:
> I'd just start with what send() and recv() on each type produces
> now using GCC on 64bit Intel and move towards adjusting others
> to match. For a period anything else would still be allowed, but
> be "non-standard"

Intel being little endian seems like a bad choice to me, given that
send/recv kind of implies network byte ordering. I'd rather not tie this
to any particular processor architecture at all (at least not solely on
the ground that it's the most common one at the time).

I have no strong opinion on "sameness" of NULLs and could also imagine
having that throw some kind of invalid operation error, on the grounds
that neither is a value and it's unclear which send() method to use at
all.

FWIW, trying to determine the length of a sent NULL gives an interesting
result that I don't currently understand.

> psql (9.2.2)
> Type "help" for help.
> 
> postgres=# SELECT length(int4send(NULL));
>  length 
> --------
>
> (1 row)
> 
> postgres=# SELECT length(float4send(NULL));
>  length 
> --------
>
> (1 row)
> 
> postgres=# SELECT length(textsend(NULL));
>  length 
> --------
>
> (1 row)
>
> postgres=# SELECT length(textsend(NULL) || '\000'::bytea);
>  length 
> --------
>
> (1 row)


Regards

Markus Wanner


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Horizontal Write Scaling

2010-11-25 Thread Markus Wanner
Eliot,

On 11/23/2010 09:43 PM, Eliot Gable wrote:
> I know there has been a lot of talk about replication getting built into
> Postgres and I know of many projects that aim to fill the role. However,
> I have not seen much in the way of a serious attempt at multi-master
> write scaling.

Postgres-XC and Postgres-R are two pretty serious projects, IMO.

> I understand the fundamental problem with write scaling
> across multiple nodes is Disk I/O and inter-node communication latency
> and that in the conventional synchronous, multi-master replication type
> setup you would be limited to the speed of the slowest node,

That's not necessarily true for Postgres-R, which is why I call it an
'eager' solution (as opposed to fully synchronous). While it guarantees
that all transactions that got committed *will* be committable on all
nodes at some time in the future, nodes may still lag behind others.

Thus, even a slower / busy node doesn't hold back the others, but may
serve stale data. Ideally, your load balancer accounts for that and
gives that node a break or at least reduces the amount of transactions
going to that node, so it can catch up again.

Anyway, that's pretty Postgres-R specific.

> plus the
> communication protocol overhead and latency. However, it occurs to me
> that if you had a shared disk system via either iSCSI, Fiber Channel,
> NFS, or whatever (which also had higher I/O capabilities than a single
> server could utilize), if you used a file system that supported locks on
> a particular section (extent) of a file, it should theoretically be
> possible for multiple Postgres instances on multiple systems sharing the
> database to read and write to the database without causing corruption.

Possible, yes. Worthwhile to do, probably not.

> Has anyone put any thought into what it would take to do this in
> Postgres? Is it simply a matter of making the database file interaction
> code aware of extent locking, or is it considerably more involved than
> that? It also occurs to me that you probably need some form of
> transaction ordering mechanism across the nodes based on synchronized
> timestamps, but it seems Postgres-R has the required code to do that
> portion already written.

If you rely on such an ordering, why use additional locks at all? That
seems like a waste of resources compared to Postgres-R. Not to mention
the introduction of a SPOF with the SAN.

> Wouldn't this type of setup be far
> simpler to implement

That's certainly debatable, yes. I obviously think that the benefit per
cost ratio for Postgres-R is better :-)

> and provide better scalability than trying to do
> multi-master replication using log shipping or binary object shipping or
> any other techniques?

It's more similar to replication using two-phase commit, which provably
doesn't scale (see for example [1]). And using a SAN for locking
certainly doesn't beat 2PC via an equally modern/expensive interconnect.

> Wouldn't it also be far more efficient since you
> don't need to have a copy of your data on each master node and therefor
> also don't have to ship your data to each node and have each node
> process it?

You have to ship the data from the SAN to the node, so I definitely
don't think so; I rather see this as an argument against it. Each node
having a local copy and only exchanging locking information and
transactional changes sounds like much less traffic overall.

Regards

Markus Wanner


[1]: The Dangers of Replication and a Solution, Gray et al, In Proc. of
the SIGMOD Conf., 1996,
http://research.microsoft.com/apps/pubs/default.aspx?id=68247

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Default mode for shutdown

2010-12-16 Thread Markus Wanner
On 12/15/2010 03:47 PM, Tom Lane wrote:
> Yeah, and more to the point, do I want to finish whatever I was doing in
> that window?  Fast-by-default is a nice hammer to swing, but one day
> you'll pound your finger.

Magnus pointed out that most distributions already use fast shutdown.
So it seems most people are used to that hammer, no?

Regards

Markus Wanner

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] WAL_DEBUG logs spurious data

2012-10-11 Thread Markus Wanner
Hi,

I stumbled across a minor issue in xlog.c:1030: the WAL_DEBUG code block
there passes rdata->data to the rm_desc() method. However, that's only
the first XLogRecData struct, not the entire XLog record. So the
rm_desc() method effectively reports spurious data for any subsequent
part.

Take a commit record with subxacts as an example: during XLogInsert,
Postgres reports the following:

LOG:  INSERT @ 0/16F3270: prev 0/16F3234; xid 688; len 16: Transaction -
commit: 2012-10-11 09:31:17.790368-07; subxacts: 3214563816

Note that the xid in subxacts is way off. During recovery from WAL, the
record is logged correctly:

LOG:  REDO @ 0/16F3270; LSN 0/16F329C: prev 0/16F3234; xid 688; len 16:
Transaction - commit: 2012-10-11 09:31:17.790368-07; subxacts: 689

Attached is a possible fix.

Regards

Markus Wanner
*** a/src/backend/access/transam/xlog.c
--- b/src/backend/access/transam/xlog.c
***
*** 1033,1039  begin:;
  #ifdef WAL_DEBUG
  	if (XLOG_DEBUG)
  	{
! 		StringInfoData buf;
  
  		initStringInfo(&buf);
  		appendStringInfo(&buf, "INSERT @ %X/%X: ",
--- 1033,1056 
  #ifdef WAL_DEBUG
  	if (XLOG_DEBUG)
  	{
! 		StringInfoData	buf;
! 		char		   *full_rec = palloc(write_len), *ins_ptr;
! 
! 		/*
! 		 * We need assemble the entire record once, to be able to dump it out
! 		 * properly.
! 		 */
! 		rdt = rdata;
! 		ins_ptr = full_rec;
! 		while (rdt)
! 		{
! 			if (rdt->data != NULL)
! 			{
! memcpy(ins_ptr, rdt->data, rdt->len);
! ins_ptr += rdt->len;
! 			}
! 			rdt = rdt->next;
! 		}
  
  		initStringInfo(&buf);
  		appendStringInfo(&buf, "INSERT @ %X/%X: ",
***
*** 1042,1051  begin:;
  		if (rdata->data != NULL)
  		{
  			appendStringInfo(&buf, " - ");
! 			RmgrTable[rechdr->xl_rmid].rm_desc(&buf, rechdr->xl_info, rdata->data);
  		}
  		elog(LOG, "%s", buf.data);
  		pfree(buf.data);
  	}
  #endif
  
--- 1059,1069 
  		if (rdata->data != NULL)
  		{
  			appendStringInfo(&buf, " - ");
! 			RmgrTable[rechdr->xl_rmid].rm_desc(&buf, rechdr->xl_info, full_rec);
  		}
  		elog(LOG, "%s", buf.data);
  		pfree(buf.data);
+ 		pfree(full_rec);
  	}
  #endif
  

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL_DEBUG logs spurious data

2012-10-11 Thread Markus Wanner
Tom,

On 10/11/2012 03:11 PM, Tom Lane wrote:
> The original design intention was that rm_desc should not attempt to
> print any such data, but obviously some folks didn't get the word.

FWIW: in case the source code contains comments explaining that
intention, I certainly missed them so far.

Regards

Markus


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL_DEBUG logs spurious data

2012-10-11 Thread Markus Wanner
On 10/11/2012 04:06 PM, Tom Lane wrote:
> Yeah, if we decide to stick with the limitation, some documentation
> would be called for.  I remember having run into this and having removed
> functionality from an rm_desc function rather than question the premise.
> But maybe the extra functionality is worth the cycles.

Well, I've been interested in getting it correct, and didn't care about
performance.

But I can certainly imagine someone enabling it on a production system
to get more debug info, in which case performance would matter more.
However, WAL_DEBUG being a #define, I bet only very few admins do that.
So I tend towards sacrificing performance for better debug info in the
WAL_DEBUG case (especially given that you can still disable it via the
GUC).

Regards

Markus


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Global Sequences

2012-10-16 Thread Markus Wanner
On 10/16/2012 12:47 AM, Josh Berkus wrote:
> I'd also love to hear from the PostgresXC folks on whether this solution
> works for them.  Postgres-R too.

In Postgres-R, option 3) is implemented. By default, though, sequences
work just like on a single machine, giving you monotonically increasing
sequence values - independent of the node you call nextval() from. IMO
that's the user's expectation. (And yes, this has a performance penalty.
But no, there's no compromise in availability.)

It is implemented very much like the per-backend cache we already have
in vanilla Postgres, but taken to the per-node level. This gives the
user a nice compromise between strongly ordered and entirely random
values, allowing fine-tuning of the trade-off between performance and
laziness in the ordering (think of CACHE 10 vs. CACHE 1).
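
The single-node analogy, for illustration:

  CREATE SEQUENCE seq CACHE 10;
  -- backend A: nextval('seq') -> 1, 2, 3, ...    (pre-allocated 1..10)
  -- backend B: nextval('seq') -> 11, 12, 13, ... (pre-allocated 11..20)

Values stay unique, but the global order follows the allocation of the
chunks rather than the order of the individual nextval() calls. The
per-node cache in Postgres-R behaves the same way, just across nodes.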

> If it works for all three of those
> tools, it's liable to work for any potential new tool.

In Postgres-R, this per-node cache uses additional attributes in the
pg_sequence system catalog to store its state. This is something I'm
sure is not feasible to do from within a plugin.

Why does a "Global Sequences" API necessarily hook at the nextval() and
setval() level? That sounds like it yields an awkward amount of
duplicate work. Reading this thread, so far it looks like we agree that
option 3) is the most feasible optimization (the strict ordering being
the un-optimized starting point). Do we really need an API that allows
for implementations of options 1) and 2)?

What I'd appreciate more is a common implementation for option 3) with
an API to plug in different solutions to the underlying consensus problem.

Regards

Markus Wanner


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Global Sequences

2012-10-17 Thread Markus Wanner
Simon,

On 10/16/2012 02:36 PM, Simon Riggs wrote:
> Where else would you put the hook? The hook's location as described
> won't change whether you decide you want 1, 2 or 3.

You assume we want an API that supports all three options. In that case,
yes, the hooks need to be very general.

Given that option 3 got by far the most support, I question whether we
need such a highly general API. I envision an API that keeps the
bookkeeping and cache lookup functionality within Postgres. So we have a
single, combined-effort, known working implementation for that.

What remains to be done within the plugin effectively is the consensus
problem: it all boils down to the question of which node gets the next
chunk of N sequence numbers, where N can be 1 (the default CACHE setting
in Postgres) or any higher number for better performance (it reduces the
total communication overhead by a factor of N - or at least pretty close
to that, if you take into account "lost" chunks due to node failures).

A plugin providing that has to offer a method to request a global
ordering, and would have to trigger a callback upon reaching consensus
with other nodes on who gets the next chunk of sequence numbers. That
works for all N >= 1 and properly implements option 3 (but doesn't allow
implementations of options 1 or 2, which I claim we don't need anyway).
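
To sketch what I mean - names invented here purely for illustration, not
a proposal for the actual signatures:

  /* plugin notifies the core once this node has won a chunk */
  typedef void (*chunk_granted_fn) (Oid seqrelid,
                                    int64 first_value, int32 nvalues);

  typedef struct GlobalSequencePlugin
  {
      /*
       * The core asks the plugin to obtain the next chunk of nvalues
       * numbers for the given sequence. The plugin resolves the
       * consensus problem with the other nodes and eventually fires
       * the callback on the node that got the chunk.
       */
      void (*request_chunk) (Oid seqrelid, int32 nvalues,
                             chunk_granted_fn granted);

      /* setval(): invalidate all nodes' caches, agree on a new start */
      void (*reset_sequence) (Oid seqrelid, int64 new_value);
  } GlobalSequencePlugin;

Bookkeeping, caching and catalog access would all remain in core; the
plugin only ever deals with whole chunks.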

> Implementations will be similar, differing mostly in the topology and
> transport layer

I understand that different users have different needs WRT transport
layers - moving the hooks as outlined above still allows flexibility in
that regard.

What different topologies do you have in mind? I'd broadly categorize
this all as multi-master. Do you need finer grained differentiation? Or
do you somehow include slaves (i.e. read-only transactions) in this process?

As you yourself are saying, implementations will only differ in that
way, so let's keep the common code the same and not require plugins to
duplicate it. (This also allows us to use the system catalogs for
bookkeeping, as another benefit.)

> which means its not going to be possible to provide
> such a thing initially without slowing it down to the point we don't
> actually get it at all.

Sorry, I don't quite understand what you are trying to say, here.

Overall, thanks for bringing this up. I'm glad to see something
happening in this area, after all.

Regards

Markus Wanner


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Global Sequences

2012-10-17 Thread Markus Wanner
Tom,

On 10/16/2012 06:15 PM, Tom Lane wrote:
> I challenge you to find anything in the SQL standard that suggests that
> sequences have any nonlocal behavior.  If anything, what you propose
> violates the standard, it doesn't make us follow it more closely.

If you look at a distributed database as a transparent equivalent of a
single-node system, I'd say the SQL standard applies to the entire
distributed system. From that point of view, I'd rather argue that any
"local-only" behavior violates the standard.

Regards

Markus Wanner


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Global Sequences

2012-10-17 Thread Markus Wanner
Simon,

On 10/17/2012 10:34 AM, Simon Riggs wrote:
> IMHO an API is required for "give me the next allocation of numbers",
> essentially a bulk equivalent of nextval().

Agreed. That pretty exactly matches what I described (and what's
implemented in Postgres-R). The API then only needs to be called every N
invocations of nextval(), because otherwise nextval() can simply return
a cached number previously allocated in a single step, eliminating a lot
of the communication overhead.

You realize an API at that level doesn't allow for an implementation of
options 1 and 2? (Which I'm convinced we don't need, so that's fine with
me).

> Anything lower level is going to depend upon implementation details
> that I don't think we should expose.

Exactly. Just like we shouldn't expose other implementation details,
like writing to system catalogs or WAL.

> I'm sure there will be much commonality between 2 similar
> implementations, just as there is similar code in each index type. But
> maintaining modularity is important and ahead of us actually seeing 2
> implementations, trying to prejudge that is going to slow us all down
> and likely screw us up.

Agreed. Let me add that modularity only serves a purpose if the
boundaries between the modules are chosen wisely. It sounds like we are
on the same page, though.

To illustrate this: IMHO an API for setval() is required to invalidate
all nodes' caches and re-set an initial value, as a starting point for
the next bulk of numbers that nextval() will return.

currval() doesn't need to be changed or "hooked" at all, because it's a
read-only operation.

Regards

Markus Wanner


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Logical to physical page mapping

2012-10-30 Thread Markus Wanner
On 10/29/2012 12:05 PM, Robert Haas wrote:
> OK, I'll stop babbling now...

Not perceived as babbling here. Thanks for that nice round-up of options
and ideas around the torn page problem.

Regards

Markus Wanner


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Enabling Checksums

2012-11-09 Thread Markus Wanner
Jeff,

On 11/09/2012 02:01 AM, Jeff Davis wrote:
> For the sake of simplicity (implementation as well as usability), it
> seems like there is agreement that checksums should be enabled or
> disabled for the entire instance, not per-table.

Agreed. I briefly thought about making it a per-database setting, but
what about shared system catalogs... Let's keep it simple and have a
single per-cluster switch for now.

> I don't think a GUC entirely makes sense (in its current form, anyway).
> We basically care about 3 states:
>   1. Off: checksums are not written, nor are they verified. Pages that
> are newly dirtied have the checksum information in the header cleared.
>   2. Enabling: checksums are written for every dirty page, but only
> verified for pages where the checksum is present (as determined by
> information in the page header).
>   3. On: checksums are written for every dirty page, and verified for
> every page that's read. If a page does not have a checksum, it's
> corrupt.

Sounds sane, yes.

> And the next question is what commands to add to change state. Ideas:
> 
>CHECKSUMS ENABLE; -- set state to "Enabling"
>CHECKSUMS DISABLE; -- set state to "Off"

Yet another SQL command doesn't feel like the right thing for such a
switch. Quick googling revealed that CHECKSUM is a system function in MS
SQL and that MySQL has a CHECKSUM TABLE command. And you never know what
the committee is coming up with next.

Apart from that, I'd like something more descriptive than just
"checksums". Block checksums? Heap checksums? Data checksums?

Regards

Markus Wanner


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Enabling Checksums

2012-11-09 Thread Markus Wanner
On 11/09/2012 06:18 AM, Jesper Krogh wrote:
> I would definately stuff our system in state = 2 in your
> description if it was available.

Hm.. that's an interesting statement.

What's probably worst when switching from OFF to ON is the VACUUM run
that needs to touch every page (provided you haven't ever turned
checksumming on before). Maybe you want to save that step and still get
the additional safety for newly dirtied pages, right?

A use case worth supporting?

Regards

Markus Wanner


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Enabling Checksums

2012-11-09 Thread Markus Wanner
On 11/09/2012 07:53 PM, Jeff Davis wrote:
> One problem is telling which pages are protected and which aren't. We
> can have a couple bits in the header indicating that a checksum is
> present, but it's a little disappointing to have only a few bits
> protecting a 16-bit checksum.

Given your description of option 2 I was under the impression that each
page already has a bit indicating whether or not the page is protected
by a checksum. Why do you need more bits than that?

> Also, I think that people will want to have a way to protect their old
> data somehow.

Well, given that this specific set of users is not willing to go through
a rewrite of each and every page of their database, it's hard to see how
we can protect their old data better.

However, we certainly need to provide the option to go through the
rewrite for other users, who are quite willing to bite that bullet.

From a user's perspective, the trade-off seems to be: if you want your
old data to be covered by checksums, you need to go through an expensive
VACUUM run that touches every page in your database.

If you don't want to or cannot do that, you can still turn on
checksumming for newly written pages. You won't get full protection and
it's hard to tell what data is protected and what not, but it's still
better than no checksumming at all. Especially for huge databases, that
might be a reasonable compromise.

One could even argue that this just leads to a prolonged migration, and
that with time the remaining VACUUM step becomes less and less
frightening.

Do you see any real foot-guns or other show-stoppers for permanently
allowing that in-between state?

Or do we have other viable options that prolong the migration and thus
spread the load better over time?

Regards

Markus Wanner


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Enabling Checksums

2012-11-12 Thread Markus Wanner
Jeff,

On 11/10/2012 12:08 AM, Jeff Davis wrote:
> The bit indicating that a checksum is present may be lost due to
> corruption.

Hm.. I see.

Sorry if that has been discussed before, but can't we do without that
bit at all? It adds a checksum switch to each page, where we just agreed
we don't even want a per-database switch.

Can we simply write a progress indicator to pg_control or someplace
saying that all pages up to X of relation Y are supposed to have valid
checksums?

That would mean having to re-calculate the checksums on pages that got
dirtied before VACUUM came along to migrate them to having a checksum,
but that seems acceptable. VACUUM could even detect that case and
wouldn't have to re-write them with the same contents.
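
A rough sketch of what such a progress indicator could look like -
entirely hypothetical, field names invented:

  typedef struct ChecksumConversionProgress
  {
      bool        in_progress;    /* conversion running at all? */
      Oid         current_relid;  /* relation Y currently being converted */
      BlockNumber next_blkno;     /* pages of relations with smaller OIDs,
                                   * and pages < next_blkno of relation Y,
                                   * are supposed to carry valid checksums */
  } ChecksumConversionProgress;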

I realize this doesn't support Jesper's use case of wanting to have the
checksums only for newly dirtied pages. However, I'd argue that
prolonging the migration to spread the load would allow even big shops
to go through this without much of an impact on performance.

Regards

Markus Wanner


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Enabling Checksums

2012-11-12 Thread Markus Wanner
On 11/12/2012 05:55 AM, Greg Smith wrote:
> Adding an initdb option to start out with everything checksummed seems
> an uncontroversial good first thing to have available.

+1

So the following discussion really is for a future patch extending that
initial checksum support.

> One of the really common cases I was expecting here is that conversions
> are done by kicking off a slow background VACUUM CHECKSUM job that might
> run in pieces.  I was thinking of an approach like this:
> 
> -Initialize a last_checked_block value for each table
> -Loop:
> --Grab the next block after the last checked one
> --When on the last block of the relation, grab an exclusive lock to
> protect against race conditions with extension
> --If it's marked as checksummed and the checksum matches, skip it
> ---Otherwise, add a checksum and write it out
> --When that succeeds, update last_checked_block
> --If that was the last block, save some state saying the whole table is
> checkedsummed

Perfect, thanks. That's the rough idea I had in mind as well, written
out in detail and catching the extension case.

> With that logic, there is at least a forward moving pointer that removes
> the uncertainty around whether pages have been updated or not.  It will
> keep going usefully if interrupted too.  One obvious this way this can
> fail is if:
> 
> 1) A late page in the relation is updated and a checksummed page written
> 2) The page is corrupted such that the "is this checksummed?" bits are
> not consistent anymore, along with other damage to it
> 3) The conversion process gets to this page eventually
> 4) The corruption of (2) isn't detected

IMO this just outlines how limited the use of the "is this checksummed"
bit in the page itself is. It just doesn't catch all cases. Is it worth
having that bit at all, given your block-wise approach above?

It really only serves to catch corruptions to *newly* dirtied pages
*during* the migration phase that *keep* that single bit set. Everything
else is covered by the last_checked_block variable. Sounds narrow enough
to be negligible. Then again, it's just a single bit per page...

> The only guarantee I see that we can give for online upgrades is that
> after a VACUUM CHECKSUM sweep is done, and every page is known to both
> have a valid checksum on it and have its checksum bits set, *then* any
> page that doesn't have both set bits and a matching checksum is garbage.

From that point in time on, we'd theoretically better use that bit as an
additional checksum bit rather than requiring it to be set at all times.
Really just theoretically - I'm certainly not advocating a 33-bit
checksum :-)

Regards

Markus Wanner


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Enabling Checksums

2012-11-12 Thread Markus Wanner
On 11/12/2012 10:44 AM, Craig Ringer wrote:
> That'll make it hard for VACUUM, hint-bit setting, etc to
> opportunistically checksum pages whenever they're doing a page write anyway.

It *is* a hard problem, yes. And the single bit doesn't really solve it.
So I'm arguing against opportunistic checksumming in general. Who needs
that anyway?

> Is it absurd to suggest using another bitmap, like the FSM or visibility
> map, to store information on page checksumming while checksumming is
> enabled but incomplete?

Not absurd. But arguably inefficient, because that bitmap may well
become a bottleneck itself. Plus there's the problem of making sure
those pages are safe against corruptions, so you'd need to checksum the
checksum bitmap... doesn't sound like a nice solution to me.

This has certainly been discussed before.

Regards

Markus Wanner


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Enabling Checksums

2012-11-12 Thread Markus Wanner
Jeff,

On 11/12/2012 06:52 PM, Jeff Davis wrote:
> OK, so here's my proposal for a first patch (changes from Simon's
> patch):
> 
>   * Add a flag to the postgres executable indicating that it should use
> checksums on everything. This would only be valid if bootstrap mode is
> also specified.
>   * Add a multi-state checksums flag in pg_control, that would have
> three states: OFF, ENABLING, and ON. It would only be set to ON during
> bootstrap, and in this first patch, it would not be possible to set
> ENABLING.
>   * Remove GUC and use this checksums flag everywhere.
>   * Use the TLI field rather than the version field of the page header.
>   * Incorporate page number into checksum calculation (already done).
>   
> Does this satisfy the requirements for a first step? Does it interfere
> with potential future work?

As described before in this thread, I think we might be able to do
without the "has checksum"-bit, as yet another simplification. But I
don't object to adding it, either.

> It's slightly better than that. It's more like: "we can tell you if any
> of your data gets corrupted after transaction X". If old data is
> corrupted before transaction X, then there's nothing we can do. But if
> it's corrupted after transaction X (even if it's old data), the
> checksums should catch it.

I (mis?)read that as Greg referring to the intermediate (enabling)
state, where pages with old data may or may not have a checksum, yet. So
I think it was an argument against staying in that state any longer than
necessary.

Regards

Markus Wanner


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Enabling Checksums

2012-11-13 Thread Markus Wanner
On 11/13/2012 01:22 AM, Greg Smith wrote:
> Once you accept that eventually there need to be online conversion
> tools, there needs to be some easy way to distinguish which pages have
> been processed for several potential implementations.

Agreed. What I'm saying is that this identification doesn't need to be
as fine-grained as a per-page bit. A single "horizon" or "border" is
enough, given an ordering of relations (for example by OID) and an
ordering of pages within each relation (obvious).
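
A hypothetical check against such a horizon - the two variables being
whatever the conversion process last persisted, names invented:

  Oid         horizon_relid;  /* relation currently being converted */
  BlockNumber horizon_blkno;  /* first block of it not yet converted */

  static bool
  page_must_have_checksum(Oid relid, BlockNumber blkno)
  {
      if (relid < horizon_relid)
          return true;                    /* relation fully converted */
      if (relid == horizon_relid)
          return blkno < horizon_blkno;   /* converted up to this page */
      return false;                       /* not reached, yet */
  }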

> All of the table-based checksum enabling ideas ...

This is not really one - it doesn't allow per-table switching. It's just
meant to be a more compact way of representing which pages have been
checksummed and which not.

> I'm thinking of this in some ways like the way creation of a new (but
> not yet valid) foreign key works.  Once that's active, new activity is
> immediately protected moving forward.  And eventually there's this
> cleanup step needed, one that you can inch forward over a few days.

I understand that. However, I question whether users really care. If a
corruption is detected, the clever DBA tells his trainee to immediately
check the file and disk subsystems - no matter whether the corruption
was in old or new data.

You have a point in that pages with "newer" data are often more likely
to be re-read and thus get checked, whereas the checksums written to
pages with old data might not be re-read any time soon. Starting to
write checksums from the end of the relation could mitigate this to some
extent, though.

Also keep in mind the "quietly corrupted after being checked once, but
still in the middle of checking the relation" case. Thus a single bit
doesn't really give us the guarantee you ask for. Sure, we can add more
than one bit. And yeah, if done properly, adding more bits exponentially
reduces the likelihood of a corruption inadvertently turning off
checksumming for a page.

All that said, I'm not opposed to using a few bits of the page header. I
wanted to outline an alternative that I think is viable and less intrusive.

> This is why I think any good solution to this problem needs to
> incorporate restartable conversion.

I fully agree to that.

Regards

Markus Wanner


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] synchronized snapshots

2010-02-05 Thread Markus Wanner

Hello Joachim,

A little daughter eats lots of spare cycles - among other things. Sorry
it took so long to review.


On Fri, 8 Jan 2010 20:36:44 +0100, Joachim Wieland 
wrote:

The attached patch implements the idea of Heikki / Simon published in

http://archives.postgresql.org/pgsql-hackers/2009-11/msg00271.php


I must admit I didn't read that up front, but thought your patch could 
be useful for implementing parallel querying.


So, let's first concentrate on the intended use case: allowing parallel 
pg_dump. To me it seems like a pragmatic and quick solution, however, 
I'm not sure if requiring superuser privileges is acceptable.


The patch currently compiles (modulo some OID changes in pg_proc.h to 
prevent duplicates) and the test suite runs through fine. I haven't 
tested the new functions, though.



Reading the code, I'm missing the part that actually acquires the 
snapshot for the transaction(s). After setting up multiple transactions 
with pg_synchronize_snapshot and pg_synchronize_snapshot_taken, they 
still don't have a snapshot, do they?


Also, you should probably ensure the calling transactions don't have a 
snapshot already (let alone a transaction id).


In a similar vein, and answering your question in a comment: yes, I'd 
say you want to ensure your transactions are in SERIALIZABLE isolation 
mode. There's no other isolation level for which that kind of snapshot 
serialization makes sense, is there?



Using the exposed functions in a more general sense, I think it's
important to note that the patch only intends to synchronize snapshots
at the start of the transaction, not continuously. Thus, normal
transaction isolation applies for concurrent writes and each of the
transactions can commit or roll back independently.


The timeout is nice, but is it really required? Isn't the normal query
cancellation infrastructure sufficient?

Hope that helps. Thanks for working on this issue.

Regards

Markus Wanner

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot standby documentation

2010-02-07 Thread Markus Wanner

Bruce,

Bruce Momjian wrote:

Ah, I now realize it only mentions "warm" standby, not "hot", so I just
updated the documentation to reflect that;  you can see it here:


Maybe the table below also needs an update, because unlike "Warm Standby
using PITR", a hot standby accepts read-only queries and can be
configured to not lose data on master failure.



Do we want to call the feature "hot standby"?  Is a read-only standby a
"standby" or a "slave"?


I think hot standby is pretty much the term, now.

Regards

Markus Wanner

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] synchronized snapshots

2010-02-10 Thread Markus Wanner

Hi Joachim,

On Wed, 10 Feb 2010 11:36:41 +0100, Joachim Wieland 
wrote:

http://www.postgresql.org/docs/8.4/static/backup-dump.html already
states about pg_dump: "In particular, it must have read access to all
tables that you want to back up, so in practice you almost always have
to run it as a database superuser." so I think there is not a big loss
here...


Hm.. I somewhat doubt that's common practice. After all, read access to 
all tables is still a *lot* less than superuser privileges. But yeah, 
the documentation currently states that.



They more or less get it "by chance" :-)  They acquire a snapshot when
they call pg_synchronize_snapshot_taken()


Oh, I see, calling the function by itself already acquires a snapshot.
Even in case of a fast path call, it seems. Then your approach is correct.

(I'd still feel more comfortable if I had seen a 
GetTransactionSnapshot() or something akin to it in there).



and if all the backends do
it while the other backend holds the lock in shared mode, we know that
the snapshot won't change, so they all get the same snapshot.


Agreed, that works.

(Ab)using the ProcArrayLock for synchronization is probably acceptable 
for pg_dump; however, I'd rather take another approach for a more 
general implementation.



Also, you should probably ensure the calling transactions don't have a
snapshot already (let alone a transaction id).


True...


Hm.. realizing that a function call per se acquires a snapshot, I fail 
to see how we could check whether we really acquired a snapshot. Consider the 
following (admittedly stupid) example:


BEGIN;
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
SELECT version();
 ... time goes by ...
SELECT pg_synchronize_snapshot_taken(..);

As it stands, your function would silently fail to "synchronize" the
snapshots, if other transactions committed in between the two function 
calls.



It seemed more robust and convenient to have an expiration in the
backend itself. What would happen if you called
pg_synchronize_snapshots() and if right after that your network
connection dropped? Without the server noticing, it would continue to
hold the lock and you could not log in anymore...


Hm.. that's a point. Given this approach uses the ProcArrayLock, it's
probably better to use an explicit timeout.


But you are right: The proposed feature is a pragmatic and quick
solution for pg_dump and similar but we might want to have a more
general snapshot cloning procedure instead. Not having a delay for
other activities at all and not requiring superuser privileges would
be a big advantage over what I have proposed.


Agreed.

Regards

Markus Wanner

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Testing with concurrent sessions

2010-02-13 Thread Markus Wanner

Hi,

Kevin Grittner wrote:

I just got to the point of having what appears to be a working but
poorly optimized version of serializable transactions,


Cool.


so it is
critical that I create a good set of tests to confirm correct
behavior and monitor for regressions as I optimize.  I see that
you've been working on dtester recently -- should I grab what you've
got here, stick with 0.1, or do you want to package something?


I'm preparing for a 0.1 release, yes. However, I'm trying to keep up 
with the postgres-dtests branch, so that should continue to work with 
the latest dtester code.



If I
should pull from your git, any hints on the best git statements to
merge that in are welcome, I'm still rather new to git, and I tend
not to get things right on a first try without some hints.  :-/


I'd recommend merging postgres-dtests into your branch, so you'll get 
all of the adjustments from there, if I change dtester itself.


Once 0.1 is out, I'm *trying* to remain backwards compatible in future 
releases. However, that's what keeps me from releasing 0.1 in the first place :-)



Independent of versions and releases: if you build your tests on top of 
dtester, I'm more than willing to support you with that. If you keep 
your code in a git repository, I could even provide patches, in case I 
need (or want) to change the dtester interface.


Regards

Markus Wanner

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] ProcSignalSlot vs. PGPROC

2010-02-26 Thread Markus Wanner

Hi,

do I understand correctly that a BackendId is just an index into the 
ProcSignalSlots array and not (necessarily) the same as the index into 
ProcArrayStruct's procs?


If yes, could these be synchronized? Why is ProcSignalSlot not part of 
PGPROC at all? Both are shared memory structures per backend (plus 
NUM_AUXILIARY_PROCS). What am I missing?
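
For context, the slot structure itself is tiny; from memory it looks 
roughly like this (see src/backend/storage/ipc/procsignal.c; treat this 
as a sketch, not an exact quote):

/* one slot per backend, plus the auxiliary processes */
typedef struct
{
	pid_t		pss_pid;
	sig_atomic_t pss_signalFlags[NUM_PROCSIGNALS];
} ProcSignalSlot;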


Regards

Markus Wanner


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] dtester-0.1 released

2010-03-21 Thread Markus Wanner
Hi,

the test harness used for Postgres-R has been unbundled and is now
available separately in a first release, version 0.1. With its async,
event-driven nature it specifically targets distributed systems
and is designed to allow easy reuse and rearrangement of test components.

Special thanks to Kevin Grittner for being the motivating early adopter
with patience, good suggestions and the invaluable ability to read and
understand others' code. He uses dtester for the development of true
serializability.

To learn more about the dtester project, visit its project website:
http://www.bluegap.ch/projects/dtester/

A git repository for dtester as well as some integration code for
testing Postgres based projects is available at:
http://git.postgres-r.org/

Regards

Markus Wanner


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] dtester-0.1 released

2010-03-24 Thread Markus Wanner

Steve,

Steve Singer wrote:

$ git clone http://git.postgres-r.org/dtester
Initialized empty Git repository in 
/local/home/ssinger/src/dtester/dtester/.git/
fatal: http://git.postgres-r.org/dtester/info/refs download error - The 
requested URL returned error: 500


Oh, thank you for pointing this out. I've fixed that for now.

Do I need to run git update-server-info after every pull for the http 
protocol to work? Or just once to initialize the repository?


Note that the git protocol is said to be more efficient. At least that's 
what I've heard. You might want to use that instead (especially if http 
continues to pose problems).


Kind Regards

Markus Wanner

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [RFC] nodeToString format and exporting the SQL parser

2010-04-03 Thread Markus Wanner

Hi,

Michael Tharp wrote:
I have been spending a little time making the internal SQL parser 
available to clients via a C-language SQL function.


This sounds very much like one of the Cluster Features:
http://wiki.postgresql.org/wiki/ClusterFeatures#API_into_the_Parser_.2F_Parser_as_an_independent_module

Is this what you (or David) have in mind?

Regards

Markus Wanner

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Pruning the TODO list

2012-06-30 Thread Markus Wanner
Hi,

On 06/22/2012 05:38 PM, Andrew Dunstan wrote:
> I think the real problem with the TODO list is that some people see it
> as some sort of official roadmap, and it really isn't. Neither is there
> anything else that is.

To me, it looks like TODO is just a misnomer. The file should be named
TODISCUSS, IDEAS, or something. But the current file name implies consensus.

Wouldn't renaming solve that kind of misunderstanding? (..in the vein of
"address(ing) real problems in the simplest way"..)

Regards

Markus Wanner

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] BlockNumber initialized to InvalidBuffer?

2012-07-10 Thread Markus Wanner
Hackers,

I stumbled across an initialization of a BlockNumber with InvalidBuffer,
which seems strange to me, as the values for "invalid" of the two types
are different; see the attached patch.

In case the 'stack' argument passed to that function is not NULL, the
variable in question gets overwritten immediately, in which case it
certainly doesn't matter. I don't know nor did I check whether or not it
can ever be NULL. So this might not be a real issue at all.
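
For reference, the two "invalid" values come from different headers; 
roughly (excerpted from src/include/storage/block.h and 
src/include/storage/buf.h):

typedef uint32 BlockNumber;
#define InvalidBlockNumber	((BlockNumber) 0xFFFFFFFF)

typedef int Buffer;
#define InvalidBuffer	(0)		/* but 0 is a perfectly valid BlockNumber */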

Regards

Markus Wanner
# InvalidBlockNumber is -1 (or rather 0xFFFFFFFF), while
# the currently used InvalidBuffer is 0, which is a valid
# BlockNumber.

*** src/backend/access/gin/ginbtree.c	2d3e63387737b4034fc25ca3cb128d9ac57f4f01
--- src/backend/access/gin/ginbtree.c	67351e1b6541b25ab3c8e8dc7a57487c2422e124
*** ginInsertValue(GinBtree btree, GinBtreeS
*** 276,282 
  ginInsertValue(GinBtree btree, GinBtreeStack *stack, GinStatsData *buildStats)
  {
  	GinBtreeStack *parent = stack;
! 	BlockNumber rootBlkno = InvalidBuffer;
  	Page		page,
  rpage,
  lpage;
--- 276,282 
  ginInsertValue(GinBtree btree, GinBtreeStack *stack, GinStatsData *buildStats)
  {
  	GinBtreeStack *parent = stack;
! 	BlockNumber rootBlkno = InvalidBlockNumber;
  	Page		page,
  rpage,
  lpage;


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] BlockNumber initialized to InvalidBuffer?

2012-07-13 Thread Markus Wanner
On 07/11/2012 05:45 AM, Tom Lane wrote:
> I'm also inclined to think that the "while (stack)" coding of the rest
> of it is wrong, misleading, or both, on precisely the same grounds: if
> that loop ever did fall out at the test, the function would have failed
> to honor its contract.  The only correct exit points are the "return"s
> in the middle.

I came to the same conclusion, yes. Looks like the additional asserts in
the attached patch all hold true.

As another minor improvement, it doesn't seem necessary to repeatedly
set the rootBlkno.

Regards

Markus Wanner
#
# old_revision [d90761edf47c6543d4686a76baa0b4e2a7ed113b]
#
# patch "src/backend/access/gin/ginbtree.c"
#  from [2d3e63387737b4034fc25ca3cb128d9ac57f4f01]
#to [62efe71b8e3429fc1bf2d2742f8d2fa69f2f4de6]
#

*** src/backend/access/gin/ginbtree.c	2d3e63387737b4034fc25ca3cb128d9ac57f4f01
--- src/backend/access/gin/ginbtree.c	62efe71b8e3429fc1bf2d2742f8d2fa69f2f4de6
*** ginInsertValue(GinBtree btree, GinBtreeS
*** 276,294 
  ginInsertValue(GinBtree btree, GinBtreeStack *stack, GinStatsData *buildStats)
  {
  	GinBtreeStack *parent = stack;
! 	BlockNumber rootBlkno = InvalidBuffer;
  	Page		page,
  rpage,
  lpage;
  
  	/* remember root BlockNumber */
! 	while (parent)
  	{
  		rootBlkno = parent->blkno;
  		parent = parent->parent;
! 	}
  
! 	while (stack)
  	{
  		XLogRecData *rdata;
  		BlockNumber savedRightLink;
--- 276,298 
  ginInsertValue(GinBtree btree, GinBtreeStack *stack, GinStatsData *buildStats)
  {
  	GinBtreeStack *parent = stack;
! 	BlockNumber rootBlkno;
  	Page		page,
  rpage,
  lpage;
  
  	/* remember root BlockNumber */
! 	Assert(stack != NULL);
! 	parent = stack;
! 	do
  	{
  		rootBlkno = parent->blkno;
  		parent = parent->parent;
! 	} while (parent);
  
! 	Assert(BlockNumberIsValid(rootBlkno));
! 
! 	for (;;)
  	{
  		XLogRecData *rdata;
  		BlockNumber savedRightLink;
*** ginInsertValue(GinBtree btree, GinBtreeS
*** 469,474 
--- 473,479 
  
  		UnlockReleaseBuffer(stack->buffer);
  		pfree(stack);
+ 		Assert(parent != NULL);		/* parent == NULL case is handled above */
  		stack = parent;
  	}
  }

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] BlockNumber initialized to InvalidBuffer?

2012-07-13 Thread Markus Wanner
On 07/13/2012 11:33 AM, Markus Wanner wrote:
> As another minor improvement, it doesn't seem necessary to repeatedly
> set the rootBlkno.

Sorry, my mail program delivered an older version of the patch, which
didn't reflect that change. Here's what I intended to send.

Regards

Markus Wanner
#
# Minor cleanup of ginInsertValue with additional asserts and
# a tighter loop for finding the root in the GinBtreeStack.
#

*** src/backend/access/gin/ginbtree.c	2d3e63387737b4034fc25ca3cb128d9ac57f4f01
--- src/backend/access/gin/ginbtree.c	f6dc88ae5716275e62a6f6715aa7204abd430089
*** ginInsertValue(GinBtree btree, GinBtreeS
*** 276,294 
  ginInsertValue(GinBtree btree, GinBtreeStack *stack, GinStatsData *buildStats)
  {
  	GinBtreeStack *parent = stack;
! 	BlockNumber rootBlkno = InvalidBuffer;
  	Page		page,
  rpage,
  lpage;
  
! 	/* remember root BlockNumber */
! 	while (parent)
! 	{
! 		rootBlkno = parent->blkno;
  		parent = parent->parent;
- 	}
  
! 	while (stack)
  	{
  		XLogRecData *rdata;
  		BlockNumber savedRightLink;
--- 276,296 
  ginInsertValue(GinBtree btree, GinBtreeStack *stack, GinStatsData *buildStats)
  {
  	GinBtreeStack *parent = stack;
! 	BlockNumber rootBlkno;
  	Page		page,
  rpage,
  lpage;
  
! 	/* extract root BlockNumber from stack */
! 	Assert(stack != NULL);
! 	parent = stack;
! 	while (parent->parent)
  		parent = parent->parent;
  
! 	rootBlkno = parent->blkno;
! 	Assert(BlockNumberIsValid(rootBlkno));
! 
! 	for (;;)
  	{
  		XLogRecData *rdata;
  		BlockNumber savedRightLink;
*** ginInsertValue(GinBtree btree, GinBtreeS
*** 469,474 
--- 471,477 
  
  		UnlockReleaseBuffer(stack->buffer);
  		pfree(stack);
+ 		Assert(parent != NULL);		/* parent == NULL case is handled above */
  		stack = parent;
  	}
  }

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: Interruptible sleeps (was Re: [HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!)

2010-09-06 Thread Markus Wanner

Hi,

On 09/06/2010 06:27 PM, Heikki Linnakangas wrote:

Here's an updated patch, with all the issues reported this far fixed,
except for that naming issue, and Fujii's suggestion to use poll()
instead of select() where available. I've also polished it quite a bit,
improving comments etc. Magnus, can you take a look at the Windows
implementation to check that it's sane? At least it seems to work.


Is pselect() really as unportable as stated in the patch? What platforms 
have problems with pselect()?


Using the self-pipe trick, don't we risk running into the open file 
handles limitation? Or is it just two handles per process?
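
(For anyone not familiar with it, the self-pipe trick amounts to 
something like the following sketch; the names are made up for 
illustration and not taken from the patch:)

#include <stdlib.h>
#include <unistd.h>

static int	selfpipe_readfd = -1;
static int	selfpipe_writefd = -1;

/* run once at process start; both ends should also be made non-blocking */
static void
init_selfpipe(void)
{
	int			fds[2];

	if (pipe(fds) < 0)
		abort();
	selfpipe_readfd = fds[0];
	selfpipe_writefd = fds[1];
}

/* signal handler: async-signal-safe, just writes one byte */
static void
wakeup_handler(int signum)
{
	char		dummy = 0;

	(void) write(selfpipe_writefd, &dummy, 1);
}

The waiter then select()s on selfpipe_readfd (along with any other 
descriptors it cares about) and drains the pipe before checking for 
work, so the cost is just the two ends of one pipe per process.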


Do I understand correctly that the purpose of this patch is to work 
around the brokenness of select() on very few platforms? Or is there any 
additional feature that plain signals don't give us?



+ * It's important to reset the latch *before* checking if there's work to
+ * do. Otherwise, if someone sets the latch between the check and the
+ * ResetLatch call, you will miss it and Wait will block.
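
The calling pattern that comment describes presumably looks something 
like this (a sketch only; mylatch, work_available() and do_work() are 
placeholders, and WaitLatch()'s exact arguments are whatever the patch 
defines):

for (;;)
{
	ResetLatch(&mylatch);		/* reset *before* checking for work */

	if (work_available())
	{
		do_work();
		continue;
	}

	/*
	 * If SetLatch() fired between the check above and here, the latch
	 * is still set and WaitLatch() returns immediately, so no wakeup
	 * is lost.  Resetting *after* the check would open exactly that
	 * race.
	 */
	WaitLatch(&mylatch, timeout);
}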


Why doesn't WaitLatch() clear it? What's the use case for waiting for a 
latch and *not* wanting to reset it?


Regards

Markus

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: Interruptible sleeps (was Re: [HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!)

2010-09-06 Thread Markus Wanner

On 09/06/2010 08:46 PM, Tom Lane wrote:

Well, it's not defined in the Single Unix Spec, which is our customary
reference for portability.


FWIW, I bet the self-pipe trick isn't mentioned there, either... any 
good precedent that it actually works as expected on all of the target 
platforms? Existing users of the self-pipe trick?


(You are certainly aware that pselect() is defined in newer versions of 
POSIX).



Also, it's alleged that some platforms have
it but in a form that's not actually any safer than select().  For
example, I read in the Darwin man page for it

IMPLEMENTATION NOTES
  The pselect() function is implemented in the C library as a wrapper
  around select().


Ouch. Indeed, quick googling reveals the following source code for Darwin:

http://www.opensource.apple.com/source/Libc/Libc-391.5.22/gen/FreeBSD/pselect.c

Now that you mention it, I seem to recall that even glibc had a 
user-space "implementation" of pselect. Fortunately, that was quite a 
few years ago.
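
Such a wrapper can't actually deliver pselect()'s guarantee, because the 
whole point of pselect() is to install the signal mask and start waiting 
atomically. A library emulation necessarily does it in two steps, 
roughly like this (a simplified sketch, not any particular libc's code):

#include <sys/select.h>
#include <signal.h>
#include <time.h>

static int
fake_pselect(int nfds, fd_set *readfds, fd_set *writefds, fd_set *errorfds,
			 const struct timespec *timeout, const sigset_t *sigmask)
{
	sigset_t	oldmask;
	struct timeval tv;
	struct timeval *tvp = NULL;
	int			rc;

	if (timeout)
	{
		tv.tv_sec = timeout->tv_sec;
		tv.tv_usec = timeout->tv_nsec / 1000;
		tvp = &tv;
	}

	sigprocmask(SIG_SETMASK, sigmask, &oldmask);
	/*
	 * RACE: a signal delivered right here runs its handler, but the
	 * select() below still blocks -- possibly forever.  Closing this
	 * window is exactly what a real pselect() syscall is for.
	 */
	rc = select(nfds, readfds, writefds, errorfds, tvp);
	sigprocmask(SIG_SETMASK, &oldmask, NULL);
	return rc;
}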



and that man page appears to be borrowed verbatim from FreeBSD.


At least FreeBSD seems to have fixed this about 8 months ago:
http://svn.freebsd.org/viewvc/base?view=revision&revision=200725

Maybe Darwin catches up eventually?


AFAICT the custom select() implementation we are using for Windows could 
easily be changed to mimic pselect() instead. Thus most reasonably 
up-to-date Linux distributions plus Windows certainly provide a workable 
pselect() syscall. Would it be worth using pselect() for those (and 
maybe others that support pselect() appropriately)?



It's just two handles per process.


Good. How about syscall overhead? One more write operation to the 
self-pipe per signal from within the signal handler and one read to 
actually clear the 'ready' state of the pipe from the waiter portion of 
the code, right?


Do we plan to replace all (or most) existing internal signals with these 
latches to circumvent the interruption problem? Or just the ones we need 
to wait for using pg_usleep()?


For Postgres-R, I'd probably have to extend it to call select() not only 
on the self-pipe, but on at least one other socket as well (to talk to 
the GCS). As long as that's possible, it looks like a more portable 
replacement for the pselect() variant that's currently in place there.


Regards

Markus

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Synchronization levels in SR

2010-09-07 Thread Markus Wanner

Hi,

On 05/27/2010 01:28 PM, Robert Haas wrote:

How do you propose to guarantee that?  ISTM that you have to either
commit locally first, or send the commit to the remote first.  Either
way, the two events won't occur exactly simultaneously.


I'm not getting the point of this discussion. As long as the database 
confirms the commit to the client only *after* having an ack from the 
standby and *after* committing locally, there's no problem.


In any case, a server failure in between the commit request of the 
client and the commit confirmation for the client results in a client 
that can't tell if its transaction committed or not.


So why do we care what to do first internally? Ideally, these two tasks 
should happen concurrently, IMO.


Regards

Markus Wanner

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: Interruptible sleeps (was Re: [HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!)

2010-09-07 Thread Markus Wanner

On 09/07/2010 09:06 AM, Heikki Linnakangas wrote:

Setting a latch that's already set is very fast, so you want to keep it
set until the last moment. See the coding in walsender for example, it
goes to some lengths to avoid clearing the latch until it's very sure
there's no more work for it to do. That helps to keep the overhead in
backends committing transactions low. (no-one has tried to measure that
yet, though)


Understood, thanks.

Markus

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Synchronization levels in SR

2010-09-07 Thread Markus Wanner

On 09/07/2010 02:16 PM, Robert Haas wrote:

Right, definitely.  The trouble is that if they happen concurrently,
and there's a crash, you have to be prepared for the possibility that
either one of the two has completed and the other is not.


Understood.


In
practice, this means that the master and standby need to compare notes
on the ending WAL location and whichever one is further advanced needs
to stream the intervening records to the other.


Not necessarily, no. Remember that the client didn't get a commit 
confirmation. So reverting might also be a correct solution (i.e. not 
violating the durability constraint).



This would be an
awesome feature, but it's hard, so for a first version, it makes sense
to commit on the master first and then on the standby after the master
is known done.


The obvious downside of that is that latency adds up, instead of just 
being the max of the two operations. And that cost is paid in normal 
operation, while at best it saves an unconfirmed transaction in the 
failure case.


It might be harder to implement, yes.

Regards

Markus Wanner

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Synchronization levels in SR

2010-09-07 Thread Markus Wanner

On 09/07/2010 04:15 PM, Robert Haas wrote:

In theory, that's true, but if we do that, then there's an even bigger
problem: the slave might have replayed WAL ahead of the master
location; therefore the slave is now corrupt and a new base backup
must be taken.


The slave isn't corrupt. It would suffice to "late abort" committed 
transactions the master doesn't know about.


However, I realize that undoing WAL isn't something that's 
implemented (nor planned). So it's probably easier to roll the master 
forward in such a case.



Yeah, I hope we'll get there eventually.


Understood. Thanks.

Markus Wanner


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Synchronization levels in SR

2010-09-07 Thread Markus Wanner

On 09/07/2010 04:47 PM, Ron Mayer wrote:

In that situation, wouldn't it be possible that a different client
queried the slave and already saw the result of that transaction
which would later be rolled back?


Good point, yes.

Regards

Markus Wanner

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

