[GENERAL] Central management for regular tasks on multiple databases

2015-01-07 Thread Jiří Hlinka
Hi list,

I'd like to ask whether you have any experience with a solution for, or how
you manage, the tasks described below.

We have tens of separate databases (for different apps and purposes) to which
we want to send regular tasks. The tasks are always SQL queries, for example
computing some balance after midnight on part of the databases, deleting
unneeded items, and so on.

Is there any solution that would behave something like a central cron for
databases? It should be able to manage these tasks:
 - ideally with a web UI + auth
 - configure which databases are queried by a task
 - configure the schedule on which a task (query) is run

How do you manage these kinds of tasks?

Thanks for tips.

Best Regards,
Jiri


Re: [GENERAL] Improving performance of merging data between tables

2015-01-07 Thread Pawel Veselov
On Tue, Dec 30, 2014 at 7:25 PM, Maxim Boguk  wrote:
>
> On Wed, Dec 31, 2014 at 11:10 AM, Pawel Veselov 
> wrote
>>
>>
>> [skipped]
>>
>> 2) try pg_stat_statements, setting "pg_stat_statements.track = all".  see:
 http://www.postgresql.org/docs/9.4/static/pgstatstatements.html

 I have used this to profile some functions, and it worked pretty well.
 Mostly I use it on a test box, but I once ran it on the live server, which
 was scary, but worked great.

>>>
>>> That looks promising. Turned it on, waiting for when I can turn the
>>> server at the next "quiet time".
>>>
>>
>> I have to say this turned out to be a bit of a disappointment for this use
>> case. It only measures total time spent in a call. So, it surfaces
>> operations that waited a lot on some lock. It's good, but it would be great
>> if total_time was provided along with wait_time (and maybe io_time as well,
>> since I also see operations that just naturally have to fetch a lot of
>> data).
>>
>
> 1) pg_stat_statements provides information about the io_time of each
> statement, but you need track_io_timing enabled for that.
>
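
For reference, a minimal sketch of the settings and query involved (this
assumes pg_stat_statements is already loaded via shared_preload_libraries;
track_io_timing is normally set in postgresql.conf so all backends report it):

-- superuser; shown per-session here only for experimentation
SET track_io_timing = on;

-- statements ordered by time spent on block I/O
SELECT query, calls, total_time,
       blk_read_time + blk_write_time AS io_time
  FROM pg_stat_statements
 ORDER BY io_time DESC
 LIMIT 20;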

Enabled that now. Still, the top winners are the functions that probably
lock for a (relatively) long time. This did help me find some queries that
were either missing an index or using an unreasonable join, and just needed
re-writing. One entry that doesn't make sense to me is:

total_time - io_time = 1,366,773
calls = 666,542
query = SELECT * FROM q_SCHEDULER_STATE WHERE SCHED_NAME = ?
The table only has 18 rows; there is an index, but the planner chooses to
ignore it, which is right, since the sched_name column has the same value for
all rows, so all rows are returned by the SELECT. The time to run that query
under database load varies from 0.09 to 70 ms.
This is a distraction from the main topic, though, but it does stand out as odd.


> 2) About locking: I suggest enabling log_lock_waits and setting
> deadlock_timeout to, say, 100ms (just for testing purposes), and then any
> lock waited on for more than 100ms will be logged with some useful
> additional info.
>
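
One possible way to apply those suggested settings (requires superuser; uses
9.4's ALTER SYSTEM, and the values are only examples):

ALTER SYSTEM SET log_lock_waits = on;
ALTER SYSTEM SET deadlock_timeout = '100ms';
SELECT pg_reload_conf();  -- both settings take effect on configuration reload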


> PPS: btw, please check the database logs for deadlock messages; your
> setup around the "and then call a pgsql function to merge the data from its
> tables into the common tables" part could easily be deadlock-prone.
>

I don't think I have an abnormal problem with locking. I wanted to eliminate
locking time from the pg_stat_statements numbers, to first address queries
that aren't waiting on disk and/or locks, as my problem is high CPU, not
specific query performance. I don't have deadlocks for sure -- I had them
before, and I would normally get an error if there were a deadlock. We process
all the records in exactly the same key order to avoid deadlocks.

> PPPS: and the last suggestion: after you have finished with the "write all
> the data into its own tables" step, the application should perform an ANALYZE
> of these own tables (or you could get weird/inefficient plans during the last
> stage).


Any references to back this up? I don't particularly mind doing it, but I
wonder if the analysis can be more expensive than the processing. These tables
get a few hundred records inserted/updated, then are entirely processed (with
expected full scans), and then deleted...


> PS: your setup look pretty complicated and hard to analyze without seeing
> all involved table structures, transaction/query flow, and (especially)
> involved procedures source code.
>

Sure :) At this point, I've put together the "bulk merge" code as well. I
can't quite see much of a difference, actually, but it's hard to trust the
execution times, as on the same amount of data they vary from, say, 0.5s to
2s, and the sample data is not stepping on any other locks. In general, I'm
afraid all those left joins and multiple scans, even over a small amount of
data, are nullifying any positive effect.

primary table: http://pastebin.com/gE2TjZd3
secondary table(s): http://pastebin.com/aDVakUkp
There are actually 10 secondary tables, but they are more or less of the
same structure.
The node tables have identical structure to the main tables.

In the first stage, which I don't particularly question (though perhaps
wrongly), the application:
- creates data suitable for the rows in the primary/secondary tables
- starts a transaction
- merges data into its own node tables (using merge_xxx PL/pgSQL functions)
(<100 rows in the primary table)
- prepares the transaction
- deals with other data sources
- commits/rolls back the prepared transaction depending on the success of the
previous step.

An example of a merge_xxx function: http://pastebin.com/6YYm8BVM

Second stage is really:
- start transaction
- call PL/pgSQL merge_all()
- commit

Two reasons for the two stages:
- if stage #2 fails, the data will be merged during the next iteration
- the lock time on the shared tables is minimized (a toy sketch of this
two-phase flow follows below)
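
A toy sketch of that two-phase flow (requires max_prepared_transactions > 0;
demo_node and the GID 'batch_42' are stand-ins, not the real schema):

-- Stage 1: merge into the node table, then prepare so the final outcome can
-- wait for the other data sources
CREATE TABLE IF NOT EXISTS demo_node (id int PRIMARY KEY, cnt int NOT NULL);
BEGIN;
INSERT INTO demo_node VALUES (1, 1);        -- stand-in for the merge_xxx calls
PREPARE TRANSACTION 'batch_42';
-- ... deal with other data sources ...
COMMIT PREPARED 'batch_42';                 -- or ROLLBACK PREPARED 'batch_42'

-- Stage 2: fold the node tables into the shared tables in a short transaction
BEGIN;
-- SELECT merge_all();                      -- the PL/pgSQL function from the thread
COMMIT;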

It's possible that an external process may take over writing data for a
certain key subset (a combination of (tagid, blockid)); to make sure there is
no race condition with such a process, 

Re: [GENERAL] Advice for using integer arrays?

2015-01-07 Thread Martijn van Oosterhout
On Tue, Jan 06, 2015 at 12:09:56PM -0500, Michael Heaney wrote:
> I'm fairly new to Postgres, and have a design issue for which an
> array of integers might be a good solution.  But I'd like to hear
> from the experts before proceeding down this path.

The biggest consideration is whether you are ever intending to use the
values in a join condition.  Arrays are efficient space-wise, and you also
have good indexing strategies with GIN indexes.  You will need to
reframe your queries in terms of ([x] subset-of field), but that's
relatively straightforward.

What doesn't work or is fiddly:

- foreign keys

- selecting part of the list

- reordering or otherwise manipulating the list.

Basically, if the list is conceptually a single object that you're really
only going to want to access as a whole, but you still want good indexing,
then arrays are for you.
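
A minimal sketch of the array-plus-GIN approach (table and column names are
made up for illustration):

CREATE TABLE docs (
    id   serial PRIMARY KEY,
    tags integer[] NOT NULL DEFAULT '{}'
);
CREATE INDEX docs_tags_gin ON docs USING gin (tags);

INSERT INTO docs (tags) VALUES (ARRAY[1, 42]), (ARRAY[2, 3]);

-- containment / overlap queries can use the GIN index
SELECT * FROM docs WHERE tags @> ARRAY[42];       -- tags include 42
SELECT * FROM docs WHERE tags && ARRAY[1, 2, 3];  -- tags share any of 1, 2, 3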

BTW, looking at your example, you might be more interested in ranges,
see for example:
http://www.postgresql.org/docs/9.2/static/rangetypes.html

Conceptually they are a bit different and there isn't support for
multi-ranges AFAIK but they might be more appropriate.
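
For example (a hypothetical table, just to show the shape of range queries):

CREATE TABLE spans (id serial PRIMARY KEY, r int4range NOT NULL);
INSERT INTO spans (r) VALUES (int4range(100, 200));
CREATE INDEX spans_r_gist ON spans USING gist (r);  -- supports @>, &&, etc.

SELECT * FROM spans WHERE r @> 150;  -- the range contains the value 150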

Hope this helps,
-- 
Martijn van Oosterhout  http://svana.org/kleptog/
> He who writes carelessly confesses thereby at the very outset that he does
> not attach much importance to his own thoughts.
   -- Arthur Schopenhauer




Re: [GENERAL] Advice for using integer arrays?

2015-01-07 Thread Thomas Kellerer
> BTW, looking at your example, you might be more interested in ranges,
> see for example:
> http://www.postgresql.org/docs/9.2/static/rangetypes.html
> 
> Conceptually they are a bit different and there isn't support for
> multi-ranges AFAIK 

You could have an array of ranges
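
For instance (purely illustrative):

SELECT ARRAY[int4range(1, 10), int4range(20, 30)] AS multi;

CREATE TABLE multi_spans (id serial PRIMARY KEY, rs int4range[]);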






Re: [GENERAL] BDR Error restarted

2015-01-07 Thread Craig Ringer
On 01/07/2015 11:28 AM, deans wrote:
> Hi Guys.
> 
> First of all, BDR is cool, should have tried it earlier.
> 
> Environment: CentOS 6.5, PostgreSQL 9.4.0 with BDR from yum repository
> 
> Done the PostgreSQL 9.4 with BDR setup successfully by following the User
> Guide and Admin Doc, but got an issue when trying to restart the postgresql
> service on any of the cluster nodes: it always loses items in the relation
> "pg_stat_replication" after the service restart.

It'd be weird if it didn't. That's not a table, it's a view over a
function that reports server status information.

It's like pg_stat_activity for replication.
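
For example, each live walsender connection shows up as one row:

SELECT pid, application_name, client_addr, state, sent_location
  FROM pg_stat_replication;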

How exactly do you restart the node?

> means lost replication
> connections; normally we have 26 items (13 databases and 3 servers in the
> cluster, so 13x2 items in pg_stat_replication on each node -- yes, lots of
> dbs on one cluster), but after the restart we get some random count of items
> in the relation, like 20, 16, or 4, not 26 as expected, i.e. not all the
> replication connections are running well.

That's normal, in that it can take time for peers to notice and
re-connect. This is true whether it's BDR, other logical replication, or
conventional streaming replication.

> And the logs show replication is at
> "wait until the node has caught up", but they never catch up again.

Please show the full log line, not just an excerpt, along with the lines
around it.

If possible run with log_error_verbosity=verbose and show the extra
line(s) printed after each log line too, please.

> BDR Settings(replaced the real db name here):
> 1. on node 01, the replication src one:
> 
> 2. on node 02, mostly like 03

You forgot these.

> Error logs:

and these.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [GENERAL] Inconsistent bgworker behaviour

2015-01-07 Thread Craig Ringer
On 01/07/2015 11:54 AM, Beena Emerson wrote:
> 
> ResetLatch(&MyProc->procLatch);
> TerminateBackgroundWorker(workers[i]->handle);
> WaitLatch(&MyProc->procLatch, WL_LATCH_SET, 0);

This doesn't guarantee that the worker of interest has terminated, just
that your latch got set.

You should make sure the worker of interest is actually dead, and you
didn't get a SIGUSR1 for some other reason.

We could probably use a WaitForBackgroundWorkerTermination(...) to
correspond to WaitForBackgroundWorkerStartup(...) .

I think you'll probably want to GetBackgroundWorkerPid(...) and examine
the returned BgwHandleStatus to see if it's BGWH_STOPPED . If not, keep
waiting. You might want a timeout to re-check.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [GENERAL] NODE

2015-01-07 Thread Craig Ringer
On 01/06/2015 03:27 AM, Ravi Kiran wrote:
> Could someone explain what exactly a Node means in Postgres?

You can think of it like a simple abstract base class that everything
else extends.

Lots of PostgreSQL is pseudo-object-oriented, so you'll see this pattern
around a fair bit.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [GENERAL] ALTER TABLE to ADD BDR global sequence

2015-01-07 Thread Craig Ringer
On 01/04/2015 12:33 AM, John Casey wrote:
> While attempting to alter a table to add a global sequence as a primary
> key using the following commands:
> 
> CREATE SEQUENCE my_table_id_seq USING bdr;
> 
> ALTER TABLE my_table
> 
> ADD COLUMN id integer PRIMARY KEY NOT NULL DEFAULT
> nextval('my_table_id_seq'::regclass);
> 

> I started to notice some issues that concerned me. In order to create
> the sequence, I had to have the replication running. To alter the table,
> I had to stop replication. The only way I really knew how to do this was
> to remove the bdr properties in the postgres configuration file and
> restart. At that point, I executed the ALTER TABLE code, when it got to
> 15000 records, Postgres informed me that I needed to turn replication
> back on so the nodes could agree on additional sequence allocations.

In general global sequences don't play well with full table rewrites.
That's why BDR prevents the full table rewrite.

What you need to do is do the ALTER without the NOT NULL DEFAULT. Then
ALTER to add the DEFAULT so new rows get it. Now UPDATE the table in
chunks to allocate IDs where they're null. Finally, once it's fully
populated, ALTER it to add the NOT NULL DEFAULT (...) .
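
A rough sketch of those steps, reusing the table and sequence names from the
original post (the chunk size is arbitrary):

ALTER TABLE my_table ADD COLUMN id integer;
ALTER TABLE my_table ALTER COLUMN id SET DEFAULT nextval('my_table_id_seq');

-- repeat until it reports 0 rows updated
UPDATE my_table SET id = nextval('my_table_id_seq')
 WHERE ctid IN (SELECT ctid FROM my_table WHERE id IS NULL LIMIT 10000);

ALTER TABLE my_table ALTER COLUMN id SET NOT NULL;
ALTER TABLE my_table ADD PRIMARY KEY (id);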

To get rid of the need for this it'd have to be possible to allow
blocking nextval(..) on global sequences, which for internal reasons is
way more complicated than you might expect.

> When I turned it back on, it just kind-of wigged out.

Disabling replication during sequence voting isn't something that's been
specifically tested for. Some details on "wigged out" would be useful,
though.

In general, once BDR is active it's not a great idea to disable it, make
changes, then re-activate it on a database.

> So, how is this supposed to work?

As above - create the sequence, populate IDs in batches, then set the
default and not-null constraint at the end.

> In addition, what happens when you
> have very disparate databases that are both updated often and connected
> occasionally (which is what we have). Will it quit doing inserts until
> it is connected to the other databases again? That would be really bad.

If you're relying on global sequences and your write rates are fairly
high but your databases are only intermittently connected then yes,
you're probably going to have times where you run out of allocated
sequence values.

You may want to use UUID keys instead, or one of the other conventional
approaches.

Down the track some more control over global sequences might be possible
- controlling how early new chunks are allocated, how big the chunks
are, etc. At the moment it's all pretty fixed, and it's really suited to
systems where they're connected most of the time.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [GENERAL] Loading Data Dumps for Tables Containing BDR Global Sequence

2015-01-07 Thread Craig Ringer
On 01/04/2015 12:14 AM, John Casey wrote:
> We have been trying to load our existing database that contains local
> sequences into a BDR-based database with global sequences.

[snip]

> There just doesn’t seem to be a good way to accomplish this operation,
> at least not a documented way I have been able to find.

I think you want default_seqam = bdr .

https://wiki.postgresql.org/wiki/BDR_Global_Sequences#Global_sequences_created_by_default

You can pass this to psql or pg_restore using a PGOPTIONS setting or an
'options=' entry in a connection dsn, e.g.

pg_restore -d "dbname=mydb options='-c default_sequenceam=bdr'" dumpfile


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [GENERAL] NODE

2015-01-07 Thread Craig Ringer
On 01/07/2015 09:08 PM, Craig Ringer wrote:
> You can think of it like a simple abstract base class that everything
> else extends.

Ahem. Every other parse/plan tree node, that is.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [GENERAL] ALTER TABLE to ADD BDR global sequence

2015-01-07 Thread John Casey
I have been thinking about an alternate means of implementing global sequences 
that I feel would simplify things. 

Rather than chunking out blocks, set an increment value for each sequence equal
to the number of nodes in the "cluster". Each node has an offset. So, if you
have 10 nodes, node 1 has offset 0 and node 10 has offset 9. The first time a
nextval is requested on a sequence after starting up, it makes certain that its
value is set where val mod 10 is equal to the node's offset. If not, it
increments up to that value. From that point forward, sequences are incremented
by 10 each time.

This would work even if you added new nodes, and you could add intelligence to 
support setting proper initial sequence values when tables are altered.

I may be overlooking something, but it seems like a fairly simple solution
that would work.
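
For illustration only, the increment/offset scheme can be emulated today with
plain (non-global) sequences, configured differently on each node:

-- node 1 of a 10-node cluster (offset 0):
CREATE SEQUENCE my_table_id_seq INCREMENT BY 10 START WITH 10;  -- 10, 20, 30, ...
-- node 2 (offset 1) would instead use:
-- CREATE SEQUENCE my_table_id_seq INCREMENT BY 10 START WITH 11;  -- 11, 21, 31, ...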


On Jan 7, 2015, at 8:25 AM, Craig Ringer  wrote:

> On 01/04/2015 12:33 AM, John Casey wrote:
>> While attempting to alter a table to add a global sequence as a primary
>> key using the following commands:
>>
>> CREATE SEQUENCE my_table_id_seq USING bdr;
>>
>> ALTER TABLE my_table
>>
>> ADD COLUMN id integer PRIMARY KEY NOT NULL DEFAULT
>> nextval('my_table_id_seq'::regclass);
>
>> I started to notice some issues that concerned me. In order to create
>> the sequence, I had to have the replication running. To alter the table,
>> I had to stop replication. The only way I really knew how to do this was
>> to remove the bdr properties in the postgres configuration file and
>> restart. At that point, I executed the ALTER TABLE code, when it got to
>> 15000 records, Postgres informed me that I needed to turn replication
>> back on so the nodes could agree on additional sequence allocations.
>
> In general global sequences don't play well with full table rewrites.
> That's why BDR prevents the full table rewrite.
>
> What you need to do is do the ALTER without the NOT NULL DEFAULT. Then
> ALTER to add the DEFAULT so new rows get it. Now UPDATE the table in
> chunks to allocate IDs where they're null. Finally, once it's fully
> populated, ALTER it to add the NOT NULL DEFAULT (...) .
>
> To get rid of the need for this it'd have to be possible to allow
> blocking nextval(..) on global sequences, which for internal reasons is
> way more complicated than you might expect.
>
>> When I turned it back on, it just kind-of wigged out.
>
> Disabling replication during sequence voting isn't something that's been
> specifically tested for. Some details on "wigged out" would be useful,
> though.
>
> In general, once BDR is active it's not a great idea to disable it, make
> changes, then re-activate it on a database.
>
>> So, how is this supposed to work?
>
> As above - create the sequence, populate IDs in batches, then set the
> default and not-null constraint at the end.
>
>> In addition, what happens when you
>> have very disparate databases that are both updated often and connected
>> occasionally (which is what we have). Will it quit doing inserts until
>> it is connected to the other databases again? That would be really bad.
>
> If you're relying on global sequences and your write rates are fairly
> high but your databases are only intermittently connected then yes,
> you're probably going to have times where you run out of allocated
> sequence values.
>
> You may want to use UUID keys instead, or one of the other conventional
> approaches.
>
> Down the track some more control over global sequences might be possible
> - controlling how early new chunks are allocated, how big the chunks
> are, etc. At the moment it's all pretty fixed, and it's really suited to
> systems where they're connected most of the time.
>
> --
>  Craig Ringer   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services





Re: [GENERAL] postgresql versus riak for a global exchange

2015-01-07 Thread Seref Arikan
Hi Jonathan,
I'm really interested in the type of hybrid architecture you've mentioned.
How is the read-only index constructed in the design you're mentioning?
It'd be much appreciated if you could briefly describe the order of
writes/reads given postgres and non-postgres components of the
architecture. Another popular approach is to use Lucene for fast but not
necessarily real time indexing, similar to what you're describing but I'm
curious about the designs you've sene.

Best regards

On Tue, Jan 6, 2015 at 5:07 PM, Jonathan Vanasco  wrote:

>
> A very popular design I see is often this:
>
> - PostgreSQL for account, inventory, transactional; and all writes
> - NoSQL (Redis, Riak, Mongo, etc) for read-only index postgres
> (almost like a read-through cache) and assembled documents
>
>
> On Jan 5, 2015, at 5:46 PM, Raymond Cote wrote:
>
> > I’m familiar with both PostgreSQL and Riak (1.4, not 2.0).
> > I know that Riak 2.0 now offers strong consistency. Have not yet seen
> what that does to performance.
> > Big plusses for PostgreSQL:
> >   - you can do both relational and NOSQL tasks (the Binary JSON in the
> latest PostgreSQL).
> >   - well-tested consistency, ACID, etc.
> >   - lots of adapters and support.
> >   - big community
> >
> > Big plusses for Riak:
> >  - multi-master replication
> >  - multi-data center replication
> >  - easy to scale up
> >
> > We use PostgreSQL in combination with Riak for data storage (we have a
> tokenization service).
> > We're currently using the EnterpriseDB multi-master PostgreSQL
> replication and are quite happy with it.
> > The replication runs periodically, not streaming, so there is at least a
> 1 second delay for replication to occur.
> > Riak replicates quicker — but then you don’t have the strong relational
> structure on top.
> >
> > As mentioned earlier, ‘exchange…trade…asset’ is a bit vague.
> > In addition to just storing things, you’ll need to keep track of all
> sorts of log-in and contact info — perhaps not ideal for Riak.
> > Probably best to consider precisely what traits your planned application
> has and then look to match against the database storage.
> > May even end up with a mix of the two just as we have.
> >
> > Your decision may also depend on which development language/framework
> you chose for the implementation.
>
>
>


[GENERAL] How to exclude building/installing contrib modules on Windows

2015-01-07 Thread deepak
Hi,

I would like to exclude building and installing contrib modules on Windows.

Is there an easy way to do this?  I largely rely on the tools available in
src\tools\msvc to build using Visual Studio 2008.

Thanks,

Deepak


[GENERAL] min/max performance inequality.

2015-01-07 Thread Pawel Veselov
Hi.

I was wondering how come there is such a drastic difference between finding
max and min. Seems like "index scan backwards" is really bad... The table
is freshly re-indexed just in case. I added a count(*) in there, forcing
the seq scan, and it's even better than the backwards index scan...

db=> EXPLAIN ANALYZE select min(rowdate) from r_agrio where blockid = 4814;
   QUERY
PLAN
-
 Result  (cost=495.89..495.90 rows=1 width=0) (actual time=24.149..24.150
rows=1 loops=1)
   InitPlan 1 (returns $0)
 ->  Limit  (cost=0.00..495.89 rows=1 width=13) (actual
time=24.139..24.140 rows=1 loops=1)
   ->  Index Scan using rowdate_r_agrio on r_agrio
 (cost=0.00..222160.24 rows=448 width=13) (actual time=24.137..24.137
rows=1 loops=1)
 Index Cond: ((rowdate)::text IS NOT NULL)
 Filter: (blockid = 4814::numeric)
 Total runtime: 24.186 ms
(7 rows)

db=> EXPLAIN ANALYZE select max(rowdate) from r_agrio where blockid = 4814;

 QUERY PLAN


 Result  (cost=495.89..495.90 rows=1 width=0) (actual time=926.032..926.033
rows=1 loops=1)
   InitPlan 1 (returns $0)
 ->  Limit  (cost=0.00..495.89 rows=1 width=13) (actual
time=926.019..926.021 rows=1 loops=1)
   ->  Index Scan Backward using rowdate_r_agrio on r_agrio
 (cost=0.00..222160.24 rows=448 width=13) (actual time=926.017..926.017
rows=1 loops=1)
 Index Cond: ((rowdate)::text IS NOT NULL)
 Filter: (blockid = 4814::numeric)
 Total runtime: 926.070 ms
(7 rows)

db=> EXPLAIN ANALYZE select count(*), max(rowdate) from r_agrio where
blockid = 4814;
   QUERY PLAN

-
 Aggregate  (cost=31585.18..31585.19 rows=1 width=13) (actual
time=461.079..461.080 rows=1 loops=1)
   ->  Seq Scan on r_agrio  (cost=0.00..31582.94 rows=448 width=13) (actual
time=8.912..460.999 rows=15 loops=1)
 Filter: (blockid = 4814::numeric)
 Total runtime: 461.134 ms
(4 rows)

db=> \d r_agrio
  Table "public.r_agrio"
   Column| Type  | Modifiers
-+---+
 id  | numeric(38,0) | not null
 tagid   | numeric(38,0) | not null
 blockid | numeric(38,0) | not null
 rowdate | character varying(15) | not null
 count   | numeric(38,0) | not null default 0
 events  | numeric(38,0) | not null default 0
 devents | numeric(38,0) | not null default 0
 duration| numeric(38,0) | not null default 0
 device_type | numeric(38,0) | not null
 placement   | numeric(38,0) | not null default 0
 unserved| numeric(38,0) | not null default 0
 unconfirmed | numeric(38,0) | not null default 0
 version | numeric(38,0) | not null default 1
Indexes:
"pk_r_agrio" PRIMARY KEY, btree (id)
"u_r_agrio" UNIQUE, btree (tagid, blockid, rowdate, device_type,
placement)
"rowdate_r_agrio" btree (rowdate)


Re: [GENERAL] ALTER TABLE to ADD BDR global sequence

2015-01-07 Thread Craig Ringer
On 01/07/2015 11:57 PM, John Casey wrote:
> I have been thinking about an alternate means of implementing global 
> sequences that I feel would simplify things. 
> 
> Rather than chunking out blocks, set an increment value for each sequence 
> equal to the number of nodes in the "cluster". Each node has an offset. So, 
> if you have 10 nodes, mode 1 has offset 0 and node 10 has offset 9. The first 
> time a nextval is requested on a sequence after starting up, it makes certain 
> that its value is set where val mod 10 is equal to the nodes offset. If not, 
> it increments up to that value. From that point forward, sequences are 
> incremented by 10 each time.
> 
> This would work even if you added new nodes, and you could add intelligence 
> to support setting proper initial sequence values when tables are altered.
> 
> I may be overlooking something; but, it seems like a fairly simple solution 
> that would work.

That's the classic approach used with multiple independent nodes in a
sharding system.

I suspect that doing this in an async multi-master environment would be
risky; you'd need to be very sure that everyone knew of a new node
join/removal to avoid issues with duplicate allocations.

I wasn't involved in the design of global sequences though, and I'm not
entirely sure. I've CC'd Andres in case he has a chance to comment.

(By the way, please reply in-line, not at the top. It's difficult to
follow mailing list threads where some people post at the top and some
post in-line.)

> 
> 
> On Jan 7, 2015, at 8:25 AM, Craig Ringer  wrote:
> 
>> On 01/04/2015 12:33 AM, John Casey wrote:
>> While attempting to alter a table to add a global sequence as a primary
>> key using the following commands:
>>
>> CREATE SEQUENCE my_table_id_seq USING bdr;
>>
>> ALTER TABLE my_table
>>
>> ADD COLUMN id integer PRIMARY KEY NOT NULL DEFAULT
>> nextval('my_table_id_seq'::regclass);
> 
>> I started to notice some issues that concerned me. In order to create
>> the sequence, I had to have the replication running. To alter the table,
>> I had to stop replication. The only way I really knew how to do this was
>> to remove the bdr properties in the postgres configuration file and
>> restart. At that point, I executed the ALTER TABLE code, when it got to
>> 15000 records, Postgres informed me that I needed to turn replication
>> back on so the nodes could agree on additional sequence allocations.
> 
> In general global sequences don't play well with full table rewrites.
> That's why BDR prevents the full table rewrite.
> 
> What you need to do is do the ALTER without the NOT NULL DEFAULT. Then
> ALTER to add the DEFAULT so new rows get it. Now UPDATE the table in
> chunks to allocate IDs where they're null. Finally, once it's fully
> populated, ALTER it to add the NOT NULL DEFAULT (...) .
> 
> To get rid of the need for this it'd have to be possible to allow
> blocking nextval(..) on global sequences, which for internal reasons is
> way more complicated than you might expect.
> 
>> When I turned it back on, it just kind-of wigged out.
> 
> Disabling replication during sequence voting isn't something that's been
> specifically tested for. Some details on "wigged out" would be useful,
> though.
> 
> In general, once BDR is active it's not a great idea to disable it, make
> changes, then re-activate it on a database.
> 
>> So, how is this supposed to work?
> 
> As above - create the sequence, populate IDs in batches, then set the
> default and not-null constraint at the end.
> 
>> In addition, what happens when you
>> have very disparate databases that are both updated often and connected
>> occasionally (which is what we have). Will it quit doing inserts until
>> it is connected to the other databases again? That would be really bad.
> 
> If you're relying on global sequences and your write rates are fairly
> high but your databases are only intermittently connected then yes,
> you're probably going to have times where you run out of allocated
> sequence values.
> 
> You may want to use UUID keys instead, or one of the other conventional
> approaches.
> 
> Down the track some more control over global sequences might be possible
> - controlling how early new chunks are allocated, how big the chunks
> are, etc. At the moment it's all pretty fixed, and it's really suited to
> systems where they're connected most of the time.
> 


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [GENERAL] min/max performance inequality.

2015-01-07 Thread Jeff Janes
On Wed, Jan 7, 2015 at 3:00 PM, Pawel Veselov 
wrote:

> Hi.
>
> I was wondering how come there is such a drastic difference between
> finding max and min. Seems like "index scan backwards" is really bad... The
> table is freshly re-indexed just in case. I added a count(*) in there,
> forcing the seq scan, and it's even better than the backwards index scan...
>
> db=> EXPLAIN ANALYZE select min(rowdate) from r_agrio where blockid = 4814;
>

It crawls the data in rowdate order (either forward or reverse) until it
finds the first 4814.  Crawling forward it finds 4814 very early. Crawling
backwards it has to pass through a bunch of non-4814 before it finds the
first 4814.

This fact doesn't show up in your EXPLAIN ANALYZE, but if you used a more
modern version of postgresql (9.2 or above) there would be another line for
"Rows Removed by Filter:" which would tell the story of what is going on.

If you have a composite index on (blockid, rowdate), it would help make
this much faster, as it can go directly to the desired row.
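
Something like (the index name is arbitrary; the equality column goes first):

CREATE INDEX CONCURRENTLY r_agrio_blockid_rowdate ON r_agrio (blockid, rowdate);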

Cheers,

Jeff


Re: [GENERAL] min/max performance inequality.

2015-01-07 Thread Tom Lane
Pawel Veselov  writes:
> I was wondering how come there is such a drastic difference between finding
> max and min. Seems like "index scan backwards" is really bad...

It's probably an artifact of your data distribution, ie, the "blockid =
4814" condition is skipping lots of rows at one end of the index but few
or none at the other.

If you're concerned about the performance of this type of query, an index
on (blockid, rowdate) would work a lot better than the ones you've
provided.

regards, tom lane




Re: [GENERAL] min/max performance inequality.

2015-01-07 Thread Pawel Veselov
Thanks Jeff (and Tom)

On Wed, Jan 7, 2015 at 3:34 PM, Jeff Janes  wrote:

> On Wed, Jan 7, 2015 at 3:00 PM, Pawel Veselov 
> wrote:
>
>> Hi.
>>
>> I was wondering how come there is such a drastic difference between
>> finding max and min. Seems like "index scan backwards" is really bad... The
>> table is freshly re-indexed just in case. I added a count(*) in there,
>> forcing the seq scan, and it's even better than the backwards index scan...
>>
>> db=> EXPLAIN ANALYZE select min(rowdate) from r_agrio where blockid =
>> 4814;
>>
>
> It crawls the data in rowdate order (either forward or reverse) until it
> finds the first 4814.  Crawling forward it finds 4814 very early. Crawling
> backwards it has to pass through a bunch of non-4814 before it finds the
> first 4814.
>
> This fact doesn't show up in your EXPLAIN ANALYZE, but if you used a more
> modern version of postgresql (9.2 or above) there would be another line for
> "Rows Removed by Filter:" which would tell the story of what is going on.
>

Yeah, there are 10x more rows when going backwards.


>
> If you have a composite index on (blockid, rowdate), it would help make
> this much faster, as it can go directly to the desired row.
>

That does help a lot. So, when does postgres use a multi-column index when
not all of its columns are engaged (there is an index that involves those 2
fields, and more)? I definitely see it do that in some cases...

Even with that index, however, there is still a good difference in time
(the interest is theoretical at this point, as I found a better way to
extract that data anyway).

On a newer db.

db=> EXPLAIN analyze select min(rowdate) from r_agrio where blockid = 4814;
   QUERY
PLAN
-
 Result  (cost=521.54..521.55 rows=1 width=0) (actual time=39.770..39.770
rows=1 loops=1)
   InitPlan 1 (returns $0)
 ->  Limit  (cost=0.55..521.54 rows=1 width=13) (actual
time=39.765..39.766 rows=1 loops=1)
   ->  Index Scan using rowdate_r_agrio on r_agrio
 (cost=0.55..303738.47 rows=583 width=13) (actual time=39.763..39.763
rows=1 loops=1)
 Index Cond: ((rowdate)::text IS NOT NULL)
 Filter: (blockid = 4814::numeric)
 Rows Removed by Filter: 37246
 Total runtime: 39.798 ms
(8 rows)

db=> EXPLAIN analyze select max(rowdate) from r_agrio where blockid = 4814;

QUERY PLAN

--
 Result  (cost=521.54..521.55 rows=1 width=0) (actual
time=1497.377..1497.378 rows=1 loops=1)
   InitPlan 1 (returns $0)
 ->  Limit  (cost=0.55..521.54 rows=1 width=13) (actual
time=1497.371..1497.372 rows=1 loops=1)
   ->  Index Scan Backward using rowdate_r_agrio on r_agrio
 (cost=0.55..303738.47 rows=583 width=13) (actual time=1497.370..1497.370
rows=1 loops=1)
 Index Cond: ((rowdate)::text IS NOT NULL)
 Filter: (blockid = 4814::numeric)
 Rows Removed by Filter: 317739
 Total runtime: 1497.407 ms
(8 rows)
db=> CREATE INDEX concurrently xxx on r_agrio(rowdate,blockid);
CREATE INDEX

db=> EXPLAIN analyze select min(rowdate) from r_agrio where blockid = 4814;
   QUERY PLAN

-
 Result  (cost=85.05..85.06 rows=1 width=0) (actual time=17.585..17.585
rows=1 loops=1)
   InitPlan 1 (returns $0)
 ->  Limit  (cost=0.43..85.05 rows=1 width=13) (actual
time=17.580..17.581 rows=1 loops=1)
   ->  Index Only Scan using xxx on r_agrio  (cost=0.43..37827.09
rows=447 width=13) (actual time=17.578..17.578 rows=1 loops=1)
 Index Cond: ((rowdate IS NOT NULL) AND (blockid =
4814::numeric))
 Heap Fetches: 0
 Total runtime: 17.616 ms
(7 rows)

db=> EXPLAIN analyze select max(rowdate) from r_agrio where blockid = 4814;
QUERY
PLAN
--
 Result  (cost=85.04..85.05 rows=1 width=0) (actual time=89.141..89.142
rows=1 loops=1)
   InitPlan 1 (returns $0)
 ->  Limit  (cost=0.43..85.04 rows=1 width=13) (actual
time=89.135..89.136 rows=1 loops=1)
   ->  Index Only Scan Backward using xxx on r_agrio
 (cost=0.43..37823.09 rows=447 width=13) (actual time=89.134..89.134 rows=1
loops=1)
 Index Cond: ((rowdate IS NOT NULL) AND (blockid =
4814::numeric))
 Heap Fetches: 1
 Total runtime: 89.173 ms
(7 rows)




>
> Cheers,
>
>


Re: [GENERAL] How to exclude building/installing contrib modules on Windows

2015-01-07 Thread Michael Paquier
On Thu, Jan 8, 2015 at 6:08 AM, deepak  wrote:
> I would like to exclude building and installing contrib modules on Windows.
>
> Is there an easy way to do this?  I largely rely on the tools available in
> src\tools\msvc to build using Visual Studio 2008.
Have a look at @contrib_excludes at the top of Mkvcbuild.pm. All the
contrib modules listed there will be ignored at build and install, so
just update it according to your needs if you want to ignore one thing
or another, including any custom thing you may have copied in the code
tree.
-- 
Michael




Re: [GENERAL] Improving performance of merging data between tables

2015-01-07 Thread Maxim Boguk
On Wed, Jan 7, 2015 at 8:49 PM, Pawel Veselov 
wrote:

>
> PPPS: and the last suggestion, after you finished with the "write all the
>> data into its own tables", then application should perform analyze of these
>> own tables (or you could have weird/inefficient plans during last stage).
>
>
> Any references to back this up? I don't particularly mind doing it, but I
> wonder if analysis can be more expensive the processing. These tables get a
> few hundreds of records inserted/updated, then are entirely processed (with
> expected full scans), and then deleted...
>
​
If these "own tables" used only in full table selects but never used in
joins - than there should be no issues.
However, once you start join these tables with anything else, you could
have very inefficient/weird plans because the database doesn't know
(without analyze) how many rows you have in these tables.​
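
For example, right after the bulk load into a node table (hypothetical name):

ANALYZE my_node_table;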



>
>
>> PS: your setup look pretty complicated and hard to analyze without seeing
>> all involved table structures, transaction/query flow, and (especially)
>> involved procedures source code.
>>
>
> Sure :) At this point, I've put together the "bulk merge" code as well. I
> can't quite see much of a difference, actually, but it's hard to trust the
> execution times, as on the same amount of data they vary from, say, 0.5s to
> 2s, and the sample data is not stepping on any other locks. In general, I'm
> afraid all those left joins and multiple scans, even over small amount of
> data, is nullifying any positive effect.
>

​
Now some ideas to check.
High CPU usage usually isn't related to locking but to seq scans, wrong
plans, or simply inefficient PL/pgSQL code; locked processes usually don't
use much CPU.

1) On the test database perform select pg_stat_reset(); then perform a full
round of merges, then check
select * from pg_stat_user_tables where seq_scan>0 order by seq_tup_read;
and if you find a lot of seq_scan and seq_tup_read on a particular table,
try to find where they are coming from (it could be the reason for the high
CPU usage).

2) Enable track_functions in postgresql.conf and perform the same sequence
(select pg_stat_reset() + a full round of merges), then check
select * FROM pg_stat_user_functions order by self_time desc;
and check which function uses the most time.

3) Old/lost prepared transactions can have a deadly effect on database
performance as a whole. So check select * from pg_prepared_xacts; and verify
that you don't have hours- (or weeks-) old prepared xacts lying around.

PS: btw, I still don't fully understand the relation between:
"
- merges data into its own node tables (using merge_xxx PL/pgSQL functions)
"
and the provided code for public."merge_all02-9A-46-8B-C1-DD" and
PUBLIC.merge_agrio.
As I see it,
public."merge_all02-9A-46-8B-C1-DD" calls PUBLIC.merge_agrio, and
PUBLIC.merge_agrio updates a global table R_AGRIO (but not the "own node
table").


I think the best implementation of such a task is asynchronous processing of
these changes via a background process. The application only inserts events
into a queue table (a lock-free operation), and a single background process
reads the data from the queue table, merges it into the main table (again
without lock contention, since a single thread means no concurrent writes),
and then deletes the merged data from the queue table. A rough sketch follows.
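
A rough sketch of that queue-table idea (all names are hypothetical):

CREATE TABLE event_queue (
    id      bigserial   PRIMARY KEY,
    tagid   numeric     NOT NULL,
    blockid numeric     NOT NULL,
    payload text        NOT NULL,
    created timestamptz NOT NULL DEFAULT now()
);

-- the application only does INSERT INTO event_queue (...), so no contention
-- on the shared tables

-- a single background worker, run periodically:
BEGIN;
-- ... merge one batch from event_queue into the main tables here ...
DELETE FROM event_queue
 WHERE id IN (SELECT id FROM event_queue ORDER BY id LIMIT 1000);
COMMIT;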


-- 
Maxim Boguk
Senior Postgresql DBA
http://www.postgresql-consulting.ru/ 

Phone RU: +7 910 405 4718
Phone AU: +61 45 218 5678

LinkedIn: http://www.linkedin.com/pub/maksym-boguk/80/b99/b1b
Skype: maxim.boguk
Jabber: maxim.bo...@gmail.com
МойКруг: http://mboguk.moikrug.ru/

"People problems are solved with people.
If people cannot solve the problem, try technology.
People will then wish they'd listened at the first stage."