Re: [GENERAL] Strange security issue with Superuser access

2015-03-10 Thread PT

I've read this email over multiple times, and I don't understand
what your question is. I don't see what it is that you think is
working in an unexpected way; every situation you describe, you
say works. Did you possibly forget to put something in the email,
or am I just a poor reader?

On Mon, 9 Mar 2015 17:00:14 -0400
Andrzej Pilacik  wrote:

> I ran into this yesterday and I wanted to post this to see if this is
> working as expected or if it is a bug.
> 
> By creating 2 tables and creating a FK between them and then changing the
> owner of the tables to a group, I lost the ability to insert into the first
> table executing as SUPERUSER.
> I thought that SUPERUSER does not check any permissions...
> 
> Scenario:
> create role rs;
> create schema ap authorization postgres;
> 
> create table ap.table1 (a int)
> alter table ap.table1 owner to rs; -- this is a group with nobody in it
> 
> create table ap.tablefk (b INT)
> alter table ap.tablefk owner to rs;
> insert into ap.tablefk values (12)
> 
> select * from ap.tablefk
> select * from ap.table1
> 
> alter table ap.table1 add constraint apk1 primary key (a)
> alter table ap.tablefk add constraint apkfk1 primary key (b)
> 
> --- Insert as superuser
> Insert into ap.table1 values (12) --- works without an issue
> 
> 
> -- create a foreign key to second table
> ALTER TABLE ap.table1
>   ADD CONSTRAINT id_fk FOREIGN KEY (a)
>   REFERENCES ap.tablefk (b) MATCH SIMPLE
>   ON UPDATE NO ACTION ON DELETE NO ACTION;
> 
> --- Insert as superuser
> delete from ap.table1
> Insert into ap.table1 values (12) --- works without an issue
> 
> /*
> It seems like even though you execute the insert as a superuser, the
> constraint check executes as the owner of the object (table) and therefore
> needs usage permission on the ap schema
> I thought that superuser privs are "god" like and always work regardless of
> the scenario
> */
> 
> --- TO FIX
> grant usage on schema ap to rs;
> Insert into ap.table1 values (12)
> 
> 
> Can anyone explain how the FK constraint function works?  Is it executed as
> the owner of the object.  That is the only thing that would make sense for
> me.
> 
> Thank you.
> 
> Andrzej


-- 
PT 


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Error in the connection to the server

2015-04-15 Thread PT
On Thu, 16 Apr 2015 02:05:34 +0530
Ravi Kiran  wrote:
> 
> I have installed postgresql-9.4.0 version.
> 
> I have started the server from eclipse indigo version using Run
> configurations.
> 
> There is a table in my database whose name is "b". whenever I give a query
> which is related to this table I get the error *"The connection to the
> server was lost. Attempting reset: Failed." *Even for the drop command it
> is giving the same error.
> But it is working fine for the rest of the queries which are not related to
> the table b.
> Please help me in resolving this error.

Find the postgresql.conf, read through it to understand where
PostgreSQL is logging, then go look at the PostgreSQL logs.
There's a 90% chance that there will be a detailed message in
the logs that will help you quickly resolve the problem. If not,
you can increase the log verbosity.
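
If you're not sure where to look, the server itself can tell you. A rough
sketch, assuming you can still connect with psql as a superuser (the exact
settings available vary a bit by version and packaging):

  SHOW config_file;        -- where postgresql.conf lives
  SHOW data_directory;
  SHOW logging_collector;  -- is the server capturing its own output?
  SHOW log_destination;
  SHOW log_directory;      -- relative paths are under data_directory
  SHOW log_filename;

  -- If the logs are too quiet, raise the verbosity and reload
  -- (ALTER SYSTEM is 9.4+ and requires superuser):
  ALTER SYSTEM SET log_min_messages = 'debug1';
  ALTER SYSTEM SET log_error_verbosity = 'verbose';
  SELECT pg_reload_conf();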


-- 
PT 


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] What constitutes "reproducible" numbers from pgbench?

2015-04-23 Thread PT
On Thu, 23 Apr 2015 11:07:05 +0200
 wrote:

> On Tuesday, April 21, 2015 7:43 PM, Andy Colson wrote:
> > On 4/21/2015 9:21 AM, holger.friedrich-fa-triva...@it.nrw.de wrote:
> >> Exactly what constitutes "reproducible" values from pgbench?  I keep 
> >> getting a range between 340 tps and 440 tps or something like that 
> > I think its common to get different timings.  I think its ok because things 
> > are changing (files, caches, indexes, etc).
> 
> As I found out, our test server is a virtual machine, so while I should be 
> "alone" on that virtual machine, of course I have no idea what else might be 
> going on on the physical server the virtual machine is running on.  That 
> would explain the somewhat wide variations.
> 
> Qingqing Zhou wrote that the range between 340 tps and 440 tps I keep getting 
> is not ok and numbers should be the same within several per cent.  Of course, 
> if other things are going on on the physical server, I can't always expect a 
> close match.
> 
> Since someone asked, the point of the exercise is to see if and how various 
> configurations in postgresql.conf are affecting performance.

You're going to have difficulty doing that sort of tuning and testing
on a VM. Even when there's nothing else going on, VMs tend to have
a wider range of behaviors than native installs (things like
cron jobs can run on both the host and the guest OS, and there
are other reasons as well, I'm sure).

Whether such an endeavour is worthwhile depends on your reason for
doing it. If your production environment will also be a VM of similar
configuration to this one, then I would proceed with the tests, simply
tracking the +/- variance and keeping it in mind, since you'll likely
see the same variance in production.

If you're doing it for your own general learning, then it might still
be worth it, but it's hardly an ideal setup for that kind of thing.

-- 
PT 


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] documenting tables version control

2015-05-05 Thread PT
On Tue, 5 May 2015 15:13:42 -0500
Suresh Raja  wrote:
> 
> 
> I have tables with different versions in the same schema.  Like
> T1a, T1b, T1c
> T2a, T2b, T2c, T2d
> 
> ...
> etc.
> 
> 
> I'm interested in documenting the various versions of tables, maybe in an
> Excel sheet or maybe in another schema in the database.  I would like to
> store information on table name, column names, column order, primary key
> column, etc.  Also, is there a way I can reverse engineer this from the
> database itself?
> 
> Let me know if anybody has a suggestion or a format that they have used.  I
> appreciate all your help; it shall be suitably acknowledged.

This section of the documentation should tell you everything you
need to know:
http://www.postgresql.org/docs/9.4/static/information-schema.html
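
For example, something along these lines will pull out table names, column
order, and primary key columns. This is only a rough sketch, and it assumes
your tables live in the 'public' schema (adjust as needed):

  -- Columns, in ordinal order, for every table in a schema
  SELECT c.table_name, c.ordinal_position, c.column_name, c.data_type
  FROM information_schema.columns c
  WHERE c.table_schema = 'public'
  ORDER BY c.table_name, c.ordinal_position;

  -- Primary key columns per table
  SELECT tc.table_name, kcu.column_name, kcu.ordinal_position
  FROM information_schema.table_constraints tc
  JOIN information_schema.key_column_usage kcu
    ON kcu.constraint_name = tc.constraint_name
   AND kcu.constraint_schema = tc.constraint_schema
  WHERE tc.constraint_type = 'PRIMARY KEY'
    AND tc.table_schema = 'public'
  ORDER BY tc.table_name, kcu.ordinal_position;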

-- 
PT 


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] pg_xlog Concern

2015-05-20 Thread PT
(I'm not going to try to fix the top-posting)

I've seen what I think you're describing on several occasions.

What we basically discovered is very simple: disks have a top speed and
you can't exceed it.

If I understand you correctly, you see the # of wal segments grow very
large during and shortly after bulk data changes, then they shrink back
down to what you'd expect, but grow again during the next bulk data
change.

Essentially, we discovered that we were doing bulk data manipulations
at about 100% of the speed of the disk subsystem. As a result, there
was no capacity left for WAL archiving to copy files. Archiving still
runs, but it's slowed down considerably, just like any other disk
activity when the disks are very busy. Since PostgreSQL thinks your
data is important, it won't discard WAL files that have not yet been
archived, so they stick around until it can catch up.

If I'm diagnosing your situation correctly, you have a few options:

1) Just accept it; that's what we decided, since the bulk operations
   only happened about once a month and the problem cleared up in a
   few hours.
2) Faster disks: move to SSDs or a better RAID controller or whatever
   it takes to make the disks fast enough not to have the problem.
3) Move the WAL and/or the WAL archive directories onto a different disk
   subsystem, which essentially increases the speed of the disks through
   "divide and conquer". You may not even need new hardware to accomplish
   this -- if you have enough disks you might benefit from rearranging
   how they're organized in the RAID controller. Of course, you'll have
   to back up and restore the system to do so.
4) Change the application that does the bulk loading to throttle itself
   so it doesn't overload the disks, which will then allow WAL archiving
   to keep up throughout the process.

Of course, each of these solutions has its benefits and drawbacks, so
you'll have to decide which is right for you.
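
If you want to confirm that archiving is what's falling behind (rather than
something else), a quick sketch for 9.4 -- compare what the archiver has
finished against the segment currently being written (these function and
view names changed in later major versions):

  SELECT archived_count, last_archived_wal, last_archived_time,
         failed_count, last_failed_wal, last_failed_time
  FROM pg_stat_archiver;

  -- the segment currently being written; if this is far ahead of
  -- last_archived_wal, archiving is the bottleneck
  SELECT pg_xlogfile_name(pg_current_xlog_location());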

On Mon, 18 May 2015 17:34:21 +0300
Koray Eyidoğan  wrote:

> Hi,
> 
> Any kind of slowness on your archive directory may cause the archiving
> process to fall behind, thus accumulating segment files in your cluster's
> pg_xlog directory.
> 
> I assume that you are on PostgreSQL 9.4. Could you please check your
> archiver status by typing "select * from pg_catalog.pg_stat_archiver;" in
> psql ? If the last_archived_wal column's value is not so close to your
> current xlog location, then it probably means a slow write speed on your
> archive path compared to your pg_xlog path.
> 
> You can check your current xlog file by typing "select
> pg_xlogfile_name(pg_current_xlog_location());" in psql.
> 
> If you are not on PostgreSQL 9.4, you can check your archiver status by
> typing "ps -ef | grep archiver" in your shell.
> 
> I've also assumed that you are not using replication slots.
> 
> As far as I know, long running transactions (just sitting in idle) won't
> affect pg_xlog directory size. Correct me if I'm wrong.
> 
> Hope that helps.
> 
> Have a nice day.
> 
> Koray
> 
> 
> 
> 
> On Mon, May 18, 2015 at 5:00 PM, Torsten Förtsch 
> wrote:
> 
> > On 18/05/15 13:44, Sachin Srivastava wrote:
> > > But currently my pg_xlog size is 60 GB and there are 3740 WAL file in
> > > this folder and in Last week this was 400 GB(pg_xlog folder) and WAL
> > > file were approx. 3. Due to archiving pg_xlog folder size is
> > > decreasing now but it’s taking one week to come in normal size.
> >
> > Any chance you have unfinished transactions running for a week?
> >
> > pg_stat_activity should be able to tell you that.
> >
> > Torsten
> >
> >
> > --
> > Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> > To make changes to your subscription:
> > http://www.postgresql.org/mailpref/pgsql-general
> >


-- 
PT 


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Allowing postgresql to accept 0xff syntax for data types that it makes sense for?

2015-05-27 Thread PT

Well, whether good or bad, my employer has nixed the idea of paying me
to work on this, and I don't have personal time right now to do it,
so it's not going to be addressed by me at this time.

-- 
Bill 


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Fwd: Raster performance

2015-05-27 Thread PT
On Tue, 26 May 2015 12:52:24 -0500
David Haynes II  wrote:

> Hello,
> 
> I have a question about the query optimizer and its performance on spatial
> datasets, specifically rasters. My use case is rather unique, the
> application that I am developing allows users to request summarizations of
> various geographic boundaries around the world. Therefore our raster
> datasets are global. We are in the process of conducting some benchmarks
> for our system and we noticed something unexpected.
> 
> The query is the same except the first is run on a raster (46gigs) in out
> of database (outdb) and the second is the same raster (46gigs) stored in
> database (indb). The raster is multibanded (13), with each band
> representing one entire MODIS global scene. A single year of MODIS is
> approximately 3.6 gigs.
> 
> The outdb is being out performed by indb, because the query optimizer gets
> smarter. But what is also interesting is all the extra pieces that are
> brought in with outdb.
> 
> with poly as
> ( SELECT gid, label as name, ST_Transform(geom, 6842) as geom  FROM
> us_counties )
> , rast_select as
> ( SELECT r.rid as id, s.name , ST_CLIP(r.rast, s.geom) as rast from
> rasters.glc2000 as r inner join poly as s on ST_Intersects(r.rast, s.geom) )
> select r.id, r.name, ST_Count(r.rast, 1, True)
> 
> 
>QUERY PLAN With Outdb
> --
> Sort   (cost=93911.29..93926.80 rows=6204 width=254)
>   Sort Key: r.id, r.name
>   CTE poly
> ->  Seq Scan on us_counties  (cost=0.00..112.86 rows=3109 width=62247)
>   CTE rast_select
> ->  Nested Loop  (cost=0.28..76131.41 rows=62033 width=1086)
>   ->  CTE Scan on poly s  (cost=0.00..62.18 rows=3109 width=250)
>   ->  Index Scan using modis_rast_gist on modis r_1
>  (cost=0.28..24.40 rows=2 width=836)
> Index Cond: ((rast)::geometry && s.geom)
> Filter: _st_intersects(s.geom, rast, NULL::integer)
>   ->  HashAggregate  (cost=17214.16..17276.20 rows=6204 width=254)
> ->  CTE Scan on rast_select r  (cost=0.00..1240.66 rows=62033
> width=254)
> 
> QUERY PLAN With Indb
> 
> -
> Sort   (cost=69547.29..69562.80 rows=6204 width=254)
>   Sort Key: r.id, r.name
>   CTE poly
> ->  Seq Scan on us_counties  (cost=0.00..112.86 rows=3109 width=62247)
>   CTE rast_select
> ->  Nested Loop  (cost=0.28..51767.41 rows=62033 width=272)
>   ->  CTE Scan on poly s  (cost=0.00..62.18 rows=3109 width=250)
>   ->  Index Scan using modis_noout_rast_gist on modis_noout r_1
>  (cost=0.28..16.56 rows=2 width=22)
> Index Cond: ((rast)::geometry && s.geom)
> Filter: _st_intersects(s.geom, rast, NULL::integer)
>   ->  HashAggregate  (cost=17214.16..17276.20 rows=6204 width=254)
> ->  CTE Scan on rast_select r  (cost=0.00..1240.66 rows=62033
> width=254)

I could be missing something here, but I don't see how long the queries
actually take to run. Have you actually run the queries and timed them?
Keep in mind that plain EXPLAIN does not actually run the query, it only
plans it, so the actual run time is unknown if all you do is EXPLAIN.
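
To be explicit about the difference, here's a minimal sketch (the table and
column names are just placeholders, not your schema):

  -- plain EXPLAIN: planner estimates only, the query never executes
  EXPLAIN
  SELECT count(*) FROM some_table WHERE some_col > 42;

  -- EXPLAIN ANALYZE: actually runs the query and reports real times and
  -- row counts next to the estimates
  EXPLAIN (ANALYZE, BUFFERS)
  SELECT count(*) FROM some_table WHERE some_col > 42;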

The query plans appear to be equivalent, allowing for slight differences
in the names of the tables from one DB to the other (and I assume that
your description of indb and outdb reflects the fact that there are,
for reasons unknown, two copies of the data).

The only purpose of those estimates is to choose a good plan. If the
plan is bad for one database and both databases have the same data, then
the plan will be bad for both.

Since there have been no other responses, I'm guessing that others are
confused by your question as well. Can you describe the actual problem
that you're seeing?

-- 
Bill Moran 


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Fwd: Raster performance

2015-05-29 Thread PT
On Thu, 28 May 2015 10:06:24 -0500
David Haynes II  wrote:
> 
> The query run times are significantly slower on outdb than on indb;
> here are the run times for 2 queries.
> 
> ST_Count/ST_Clip(select single band here)/Inner Join/ST_Transform (US
> Counties)
> OutDB: 873.564s (14 minutes 33s) InDB:  127.36s (2 minutes 7s)
> 
> ST_Count(select single band here)/ST_Clip(on all bands)/Inner
> Join/ST_Transform (US Counties)
> OutDB: 9537.371s (2 hours 38minutes)   InDB:  310s (5 minutes 10 seconds)
> 
> In the query planner it shows a large change in the number of columns
> (width) that are picked up in the CTE_rast_select.
> These extra columns slow down the ability to process the data.

As I mentioned before, that's not how that works. Those are estimates. If
those numbers are actually different between the two tables, then your data
is different between the two databases and you can't expect the performance
to be the same.

Additionally, the part you snipped out below isn't the part that's different.
In particular, the difference is coming from the fact that one of the
plans uses modis and the other uses modis_noout.

Does modis exist in indb? Does modis_noout exist on outdb? What is the
difference between these two tables? Because _that_ is where the time
difference is most likely happening (based on the explain output).

Additionally, run EXPLAIN ANALYZE on these queries to get the actual
times in addition to the estimates. Furthermore, given that a lot of
the confusion in this question is due to a lack of information, it would
be a good idea to include the table definitions.

> OUT DB
> CTE rast_select
> > ->  Nested Loop  (cost=0.28..76131.41 rows=62033 *width=1086)*
> >   ->  CTE Scan on poly s  (cost=0.00..62.18 rows=3109 width=250)
> 
> 
> In DB
> Nested Loop  (cost=0.28..51767.41 rows=62033 *width=272*)
> >   ->  CTE Scan on poly s  (cost=0.00..62.18 rows=3109 width=250)
> 
> On Wed, May 27, 2015 at 4:31 PM, PT  wrote:
> 
> > On Tue, 26 May 2015 12:52:24 -0500
> > David Haynes II  wrote:
> >
> > > Hello,
> > >
> > > I have a question about the query optimizer and its performance on
> > spatial
> > > datasets, specifically rasters. My use case is rather unique, the
> > > application that I am developing allows users to request summarizations
> > of
> > > various geographic boundaries around the world. Therefore our raster
> > > datasets are global. We are in the process of conducting some benchmarks
> > > for our system and we noticed something unexpected.
> > >
> > > The query is the same except the first is run on a raster (46gigs) in out
> > > of database (outdb) and the second is the same raster (46gigs) stored in
> > > database (indb). The raster is multibanded (13), with each band
> > > representing one entire MODIS global scene. A single year of MODIS is
> > > approximately 3.6 gigs.
> > >
> > > The outdb is being out performed by indb, because the query optimizer
> > gets
> > > smarter. But what is also interesting is all the extra pieces that are
> > > brought in with outdb.
> > >
> > > with poly as
> > > ( SELECT gid, label as name, ST_Transform(geom, 6842) as geom  FROM
> > > us_counties )
> > > , rast_select as
> > > ( SELECT r.rid as id, s.name , ST_CLIP(r.rast, s.geom) as rast from
> > > rasters.glc2000 as r inner join poly as s on ST_Intersects(r.rast,
> > s.geom) )
> > > select r.id, r.name, ST_Count(r.rast, 1, True)
> > >
> > >
> > >QUERY PLAN With Outdb
> > >
> > --
> > > Sort   (cost=93911.29..93926.80 rows=6204 width=254)
> > >   Sort Key: r.id, r.name
> > >   CTE poly
> > > ->  Seq Scan on us_counties  (cost=0.00..112.86 rows=3109
> > width=62247)
> > >   CTE rast_select
> > > ->  Nested Loop  (cost=0.28..76131.41 rows=62033 width=1086)
> > >   ->  CTE Scan on poly s  (cost=0.00..62.18 rows=3109 width=250)
> > >   ->  Index Scan using modis_rast_gist on modis r_1
> > >  (cost=0.28..24.40 rows=2 width=836)
> > > Index Cond: ((rast)::geometry && s.geom)
> > > Filter: _st_intersects(s.geom, rast, NULL::integer)
> > >   ->  HashAggregate  (cost=17214.16..17276.20 rows=6204 width=254)
> > > ->  CTE Scan on rast_select r  (cost=0.00..1240.66 rows=62033
> > > width=254)
> > >
> > > 

Re: [GENERAL] Planner cost adjustments

2015-05-29 Thread PT
On Fri, 29 May 2015 09:39:00 -0400
Daniel Begin  wrote:

> Hi all,
> 
> Running some queries, I found that the planner often selects sequential scan
> instead of an index scan, even if the latter is way faster (one order of
> magnitude faster if I consider some tests I made by setting enable_seqscan =
> ON/OFF). How can I figure out what parameter I should adjust to get the
> planner select an appropriate plan that would better consider my DB setup?
> 
> I had a look at
> http://www.postgresql.org/docs/9.3/static/runtime-config-query.html but at
> this point it is too much information for me;-)  Any rules of thumb, recipes
> I could use to select which parameters I should look at first?

Here's the correct way to handle this:

1) As mentioned elsewhere, first take the time to ensure that your
   cost estimate settings are reasonable for your hardware.  See
   section 18.7.2:
   http://www.postgresql.org/docs/9.4/static/runtime-config-query.html
2) If #1 doesn't fix it, don't change enable_seqscan. Run a bunch of
   tests on the query(s) to see how well they perform. Then connect to
   the database and run ANALYZE; (which refreshes statistics for the
   whole database) and run all the tests again.
   If performance/planning improves, then the analyze settings on
   your server aren't aggressive enough. Make changes to the related
   config settings to fix that.
3) If #2 doesn't uncover the problem, run EXPLAIN ANALYZE on all the
   queries in your test. It takes a bit of understanding to do this
   step, so you'll want to read up a bit and possibly ask questions
   if you have trouble interpreting the output, but you're looking
   for discrepancies between the estimated and actual times for any
   particular table. If you find them, that tends to indicate that
   you'll need to update statistics targets on any tables with the
   problem. See:
   http://www.postgresql.org/docs/9.4/static/planner-stats.html
4) If #3 doesn't fix things, then the PostgreSQL developers want to
   know about your problem so they can improve the planner. First,
   if there are queries that are causing you problems, update the
   application to disable sequential scans _for_those_particular_
   _queries_ (see the sketch after this list) so your application
   continues to trundle along, but don't disable sequential scans
   globally, as that may cause other queries to perform badly. Once
   that immediate problem is
   out of the way, put together a test case that demonstrates the
   problem you're having (but doesn't contain any proprietary
   data, etc) and post it to the list so the developers can figure
   out what to do to improve Postgres.
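
A minimal sketch of the commands involved in steps 2-4 (the table, column,
and numbers are placeholders, not recommendations):

  -- step 2: refresh planner statistics for the whole database
  ANALYZE;

  -- step 3: compare estimated vs actual row counts and times
  EXPLAIN ANALYZE SELECT * FROM some_table WHERE some_col = 42;

  -- step 3: raise the statistics target for a column whose estimates
  -- are off, then re-analyze that table
  ALTER TABLE some_table ALTER COLUMN some_col SET STATISTICS 500;
  ANALYZE some_table;

  -- step 4: disable sequential scans for one problem query only,
  -- leaving the global setting alone
  BEGIN;
  SET LOCAL enable_seqscan = off;
  SELECT * FROM some_table WHERE some_col = 42;
  COMMIT;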

Hope this helps.

-- 
Bill Moran 


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Fwd: Raster performance

2015-05-29 Thread PT
> [...] (actual time=257.610..863474.778 rows=7677 loops=1)
>   ->  CTE Scan on poly s  (cost=0.00..62.18 rows=3109 width=250)
> (actual time=1.341..7030.138 rows=3109 loops=1)
>   ->  Index Scan using modis_rast_gist on modis r_1
>  (cost=0.28..24.40 rows=2 width=836) (actual time=1.481..3.952 rows=2
> loops=3109)
> Index Cond: ((rast)::geometry && s.geom)
>     Filter: _st_intersects(s.geom, rast, NULL::integer)
> Rows Removed by Filter: 0
>   ->  HashAggregate  (cost=17214.16..17276.20 rows=6204 width=254) (actual
> time=866317.923..866320.316 rows=7612 loops=1)
> ->  CTE Scan on rast_select r  (cost=0.00..1240.66 rows=62033
> width=254) (actual time=257.625..863555.082 rows=7677 loops=1)
> Total runtime: 866691.113 ms
> 
> On Fri, May 29, 2015 at 6:40 AM, PT  wrote:
> 
> > On Thu, 28 May 2015 10:06:24 -0500
> > David Haynes II  wrote:
> > >
> > > The query run times are significantly slower on outdb as that using indb
> > > here are the run times on 2 queries.
> > >
> > > ST_Count/ST_Clip(select single band here)/Inner Join/ST_Transform (US
> > > Counties)
> > > OutDB: 873.564s (14 minutes 33s) InDB:  127.36s (2 minutes 7s)
> > >
> > > ST_Count(select single band here)/ST_Clip(on all bands)/Inner
> > > Join/ST_Transform (US Counties)
> > > OutDB: 9537.371s (2 hours 38minutes)   InDB:  310s (5 minutes 10 seconds)
> > >
> > > In the query planner it shows a large change in the number of columns
> > > (width) that are picked up in the CTE_rast_select.
> > > These extra columns slow down the ability to process the data.
> >
> > As I mentioned before, that's not how that works. Those are estimates. If
> > those rows are actually different between the two tables, then your data
> > is different between the two databases and you can't expect the performance
> > to be the same.
> >
> > Additionally, the part you snip out below isn't the part that's different.
> > In particular, the difference is coming from the fact that one of the
> > plans uses modis and the other uses modis_noout.
> >
> > Does modis exist in indb? Does modis_noout exist on outdb? What is the
> > difference between these two tables? Because _that_ is where the time
> > difference is most likely happening (based on the explain output).
> >
> > Additionally, run EXPLAIN ANALYZE on these queries to get the actual
> > times in addition to the estimates. Furthermore, given that a lot of
> > the confusion in this question is due to a lack of iformation, it would
> > be a good idea to include the table definitions.
> >
> > > OUT DB
> > > CTE rast_select
> > > > ->  Nested Loop  (cost=0.28..76131.41 rows=62033 *width=1086)*
> > > >   ->  CTE Scan on poly s  (cost=0.00..62.18 rows=3109
> > width=250)
> > >
> > >
> > > In DB
> > > Nested Loop  (cost=0.28..51767.41 rows=62033 *width=272*)
> > > >   ->  CTE Scan on poly s  (cost=0.00..62.18 rows=3109
> > width=250)
> > >
> > > On Wed, May 27, 2015 at 4:31 PM, PT  wrote:
> > >
> > > > On Tue, 26 May 2015 12:52:24 -0500
> > > > David Haynes II  wrote:
> > > >
> > > > > Hello,
> > > > >
> > > > > I have a question about the query optimizer and its performance on
> > > > spatial
> > > > > datasets, specifically rasters. My use case is rather unique, the
> > > > > application that I am developing allows users to request
> > summarizations
> > > > of
> > > > > various geographic boundaries around the world. Therefore our raster
> > > > > datasets are global. We are in the process of conducting some
> > benchmarks
> > > > > for our system and we noticed something unexpected.
> > > > >
> > > > > The query is the same except the first is run on a raster (46gigs)
> > in out
> > > > > of database (outdb) and the second is the same raster (46gigs)
> > stored in
> > > > > database (indb). The raster is multibanded (13), with each band
> > > > > representing one entire MODIS global scene. A single year of MODIS is
> > > > > approximately 3.6 gigs.
> > > > >
> > > > > The outdb is being out performed by indb, because the query optimizer
> > > > gets
> > > > > smarter. But what is also interesting is all the 

Re: [GENERAL] postgres: dbname dbuser 9.9.9.9[2222] PARSE waiting

2017-06-27 Thread PT
On Tue, 27 Jun 2017 18:41:25 -0400
Melvin Davidson  wrote:

> On Tue, Jun 27, 2017 at 6:32 PM, Adrian Klaver 
> wrote:
> 
> > On 06/27/2017 01:10 PM, DrakoRod wrote:
> >
> >> Hi folks.
> >>
> >> Today I had a problem with production's database PostgreSQL version
> >> 9.4.4.9.
> >> The server have max_connections set to 200, but today I reviewed
> >> pg_stat_activity and saw 199 active connections, obviously the server
> >> rejected any new connection and the production stopped.
> >>
> >> I saw another posts with a similar problems, but this was because the
> >> pg_xlog was full or disk does'nt write, but the directory and disk  had no
> >> problems.
> >>
> >> I just canceled some SELECTs querys and the server returned to normality.
> >> Now a monitoring activity of server and I can see some backends like this:
> >>
> >> postgres  9737 23340  2 14:55 ?00:00:15 postgres: dbname user
> >> 8.8.8.8[37082] idle in transaction
> >> postgres  9741 23340  9 14:55 ?00:00:47 postgres: dbname user
> >> 8.8.8.8[54286] idle in transaction
> >>
> >> Any suggestions?
> >>
> >
> > https://www.postgresql.org/docs/9.4/static/monitoring-stats.
> > html#PG-STAT-ACTIVITY-VIEW
> >
> > SELECT * FROM pg_stat_activity WHERE state = 'idle in transaction';
> >
> > To get more detail on what is holding these transactions open.
> >
> 
> *His problem is NOT 'idle in transaction' per se. It is all connections are
> used up.*
> *Hence the need for pg_bouncer for connection pooling.*

That assessment lacks enough information to be substantiated.

One of the things I've frequently seen happen is that applications written
with ORMs will create a new connection because the existing connections in
the pool are already in a transaction. If his application is not properly
committing transactions, an additional pooler layer will not improve the
situation. Hence, what he needs to do first is gather more information and
understand exactly what's going on.

Of course, if usage has just scaled up to the point where he doesn't have
any free connections, then your assessment might be correct. But he hasn't
provided enough information to be sure of that.

Regardless, lots of "idle in transaction" connections that stick around for
a long time are a clear sign of application bugs. If they're not the cause
of his immediate problem, they will be the cause of problems at some point,
so he might as well track them down and fix them.
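
If you want to see whether stale transactions really are what's eating the
connection slots, something along these lines against 9.4's pg_stat_activity
should show it (adjust column names for other versions):

  SELECT pid, usename, application_name, client_addr,
         xact_start, state_change,
         now() - xact_start AS xact_age
  FROM pg_stat_activity
  WHERE state = 'idle in transaction'
  ORDER BY xact_start;

  -- if one of them has to be killed to free a slot:
  -- SELECT pg_terminate_backend(<pid from the query above>);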

-- 
PT 


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] postgres: dbname dbuser 9.9.9.9[2222] PARSE waiting

2017-06-27 Thread PT
On Tue, 27 Jun 2017 16:16:53 -0700 (MST)
DrakoRod  wrote:

> Yep, the real problem was all connections are used up. A ps command showed
> this:
> 
> postgres  1172 23340  1 13:00 ?00:01:23 postgres: dbsomething
> dbsomething 8.8.8.1[34024] PARSE waiting
> postgres  1527 23340  3 13:07 ?00:02:47 postgres: dbsomething
> dbsomething 8.8.8.2[49193] PARSE waiting
> postgres  1869 23340  1 13:13 ?00:01:05 postgres: dbsomething
> dbsomething 8.8.8.1[34209] PARSE waiting
> postgres  1963 23340  0 13:15 ?00:00:23 postgres: dbsomething
> dbsomething 8.8.8.1[34244] PARSE waiting
> postgres  2408 23340  2 13:23 ?00:01:31 postgres: dbsomething
> dbsomething 8.8.8.3[38324] PARSE waiting
> postgres  2442 23340  3 13:23 ?00:02:19 postgres: dbsomething
> dbsomething 8.8.8.3[38359] PARSE waiting
> postgres  2526 23340  2 13:25 ?00:01:39 postgres: dbsomething
> dbsomething 8.8.8.2[49994] PARSE waiting
> postgres  2533 23340  2 13:25 ?00:02:00 postgres: dbsomething
> dbsomething 8.8.8.4[58916] PARSE waiting
> postgres  2616 23340  2 13:26 ?00:01:28 postgres: dbsomething
> dbsomething 8.8.8.3[38496] PARSE waiting
> postgres  2632 23340  3 13:27 ?00:02:09 postgres: dbsomething
> dbsomething 8.8.8.2[50088] idle in transaction
> postgres  2644 23340  0 13:27 ?00:00:25 postgres: dbsomething
> dbsomething 8.8.8.4[58999] PARSE waiting
> postgres  2787 23340  0 13:30 ?00:00:16 postgres: dbsomething
> dbsomething 8.8.8.5[57944] PARSE waiting
> postgres  2815 23340  1 13:31 ?00:00:52 postgres: dbsomething
> dbsomething 8.8.8.2[50263] PARSE waiting
> postgres  2822 23340  0 13:31 ?00:00:29 postgres: dbsomething
> dbsomething 8.8.8.4[59158] PARSE waiting
> postgres  2825 23340  1 13:31 ?00:00:47 postgres: dbsomething
> dbsomething 8.8.8.4[59161] PARSE waiting
> postgres  2826 23340  0 13:31 ?00:00:11 postgres: dbsomething
> dbsomething 8.8.8.4[59163] PARSE waiting
> postgres  2876 23340  0 13:32 ?00:00:26 postgres: dbsomething
> dbsomething 8.8.8.1[34469] PARSE waiting
> postgres  2888 23340  0 13:32 ?00:00:36 postgres: dbsomething
> dbsomething 8.8.8.3[38729] PARSE waiting
> postgres  2911 23340  0 13:33 ?00:00:11 postgres: dbsomething
> dbsomething 8.8.8.2[50352] PARSE waiting
> postgres  2912 23340  0 13:33 ?00:00:36 postgres: dbsomething
> dbsomething 8.8.8.2[50353] PARSE waiting
> postgres  2916 23340  0 13:33 ?00:00:30 postgres: dbsomething
> dbsomething 8.8.8.3[38750] PARSE waiting
> postgres  2922 23340  0 13:33 ?00:00:33 postgres: dbsomething
> dbsomething 8.8.8.4[59238] PARSE waiting
> postgres  2927 23340  1 13:33 ?00:00:38 postgres: dbsomething
> dbsomething 8.8.8.4[59242] PARSE waiting
> postgres  3012 23340  0 13:35 ?00:00:03 postgres: dbsomething
> dbsomething 8.8.8.2[50439] PARSE waiting
> postgres  3017 23340  0 13:35 ?00:00:01 postgres: dbsomething
> dbsomething 8.8.8.3[38833] PARSE waiting
> postgres  3018 23340  0 13:35 ?00:00:27 postgres: dbsomething
> dbsomething 8.8.8.3[38834] PARSE waiting
> postgres  3020 23340  0 13:35 ?00:00:24 postgres: dbsomething
> dbsomething 8.8.8.4[59318] PARSE waiting
> postgres  3026 23340  0 13:35 ?00:00:04 postgres: dbsomething
> dbsomething 8.8.8.4[59323] PARSE waiting
> postgres  3033 23340  0 13:35 ?00:00:15 postgres: dbsomething
> dbsomething 8.8.8.4[59328] PARSE waiting
> 
> 
> When I ran *SELECT * FROM pg_stat_activity*, the state of all the queries
> was active and most were SELECTs, so the server did not open new
> connections. I canceled many queries (only SELECTs) and the server went
> back to normal.
> 
> I understand that the principal problem is probably the application, of
> that I'm sure, but I'm still in the process of debugging it. Is the best
> way to avoid or "fix" this a connection pool like pgbouncer? What is the
> safest way to free connections without restarting the service?

There are various timeout settings that can be configured:
https://www.postgresql.org/docs/9.6/static/runtime-config-client.html
idle_in_transaction_session_timeout is probably the one you want to
enable.
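
A rough sketch of turning it on, assuming you're on 9.6 or later where the
setting exists (the 5 minute value is only an example):

  -- cluster-wide, takes effect on reload
  ALTER SYSTEM SET idle_in_transaction_session_timeout = '5min';
  SELECT pg_reload_conf();

  -- or limit it to the application's role (dbsomething, per your ps output)
  ALTER ROLE dbsomething SET idle_in_transaction_session_timeout = '5min';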

It's likely that your application developers will start to complain about
database "errors" once you enable that, as connections will get killed and
cause errors in the application. You'll need to work to educate your
developers on how to fix their application so the situation stops happening.

-- 
PT 


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Strange case of database bloat

2017-07-05 Thread PT
On Wed, 5 Jul 2017 07:18:03 +0200
Chris Travers  wrote:

> Hi;
> 
> First, I haven't seen major problems of database bloat in a long time which
> is why I find this case strange.  I wanted to ask here what may be causing
> it.
> 
> Problem:
> ==
> Database is in the 100GB to 200GB size range, running on btrfs (not my
> choice) with nodatacow enabled (which I set up to fix a performance
> issue).  The workload is a very heavy batch-update workload.
> 
> The database bloats linearly.  I have measured this on one  table (of 149M
> rows).
> 
> After vacuum full this table is (including indexes): 17GB
> Every 24 hrs, seems to add its original space in size to the file system
> +/-.
> 
> Bloat seems to be affecting both indexes and underlying tables.
> 
> Vacuum verbose does not indicate a disproportionate number of rows being
> unremovable.  So autovacuum is keeping up without too much difficulty.
> 
> 
> Troubleshooting so far
> ===
> 
>  filefrag finds a single extent on each file, so copy-on-write is not the
> culprit
> 
> Selecting the smallest 10 values of ctid from one of the bloating tables
> shows the first page used is around page 35 with one row per used page (and
> large gaps in between).
> 
> Questions
> ===
> I assume that it is the fact that rows update frequently which is the
> problem here? But why doesn't Postgres re-use any of the empty disk pages?
> 
> More importantly, is there anything that can be done to mitigate this issue
> other than a frequent vacuum full?

2x the working size for a frequently updated table isn't terrible bloat. Or are
you saying it grows 2x every 24 hours and keeps growing? The real question is
how often the table is being vacuumed. How long have you let the experiment run
for? Does the table find an equilibrium size where it stops growing? Have you
turned on logging for autovacuum to see how often it actually runs on this
table?

No unremovable rows does not indicate that autovacuum is keeping up. It just
indicates that you don't have a problem with uncommitted transactions holding
rows for long periods of time.

Have you looked at tuning the autovacuum parameters for this table? More
frequent vacuums should keep things more under control. However, if the write
load is heavy, you'll probably want to lower autovacuum_vacuum_cost_delay.
Personally, I feel like the default value for this should be 0, but there are
likely those who would debate that. In any event, if that setting is too high
it can cause autovacuum to take so long that it can't keep up. In theory,
setting it too low can cause autovacuum to have a negative performance impact,
but I've never seen that happen on modern hardware.

But that's all speculation until you know how frequently autovacuum runs on
that table and how long it takes to do its work.
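
As a sketch of the sort of per-table tuning and logging I mean (the table
name and the numbers are placeholders to adjust for your workload):

  -- vacuum this table more aggressively and without throttling
  ALTER TABLE the_bloating_table SET (
      autovacuum_vacuum_cost_delay = 0,
      autovacuum_vacuum_scale_factor = 0.01
  );

  -- log every autovacuum run so you can see how often it fires and how
  -- long it takes (cluster-wide; 0 logs all runs)
  ALTER SYSTEM SET log_autovacuum_min_duration = 0;
  SELECT pg_reload_conf();

  -- when autovacuum last touched the table, and how many dead tuples it sees
  SELECT relname, last_autovacuum, autovacuum_count, n_dead_tup
  FROM pg_stat_user_tables
  WHERE relname = 'the_bloating_table';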

-- 
PT 


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Perfomance of IN-clause with many elements and possible solutions

2017-07-23 Thread PT
> [...] (actual time=0.003..0.330 rows=200 loops=1)
>  Planning time: 1.094 ms
>  Execution time: 4086.333 ms
> 
> 
> -- '...'::hstore ? id
> SELECT ('''' || string_agg(id::text || '=>NULL', ',') || '''::hstore') AS
> hstore_clause
> FROM (SELECT id FROM ids GROUP BY id ORDER BY id LIMIT 200) AS s \gset
> 
> EXPLAIN (ANALYZE, BUFFERS) SELECT count(*) FROM ids WHERE :hstore_clause ?
> id::text;
>  Planning time: 0.206 ms
>  Execution time: 5032.794 ms
> 
> 
> -- '...'::jsonb ? id
> SELECT ('''{' || string_agg('"' || id::text || '": null', ',') ||
> '}''::jsonb') AS jsonb_clause
> FROM (SELECT id FROM ids GROUP BY id ORDER BY id LIMIT 200) AS s \gset
> 
> EXPLAIN (ANALYZE, BUFFERS) SELECT count(*) FROM ids WHERE :jsonb_clause ?
> id::text;
>  Planning time: 0.114 ms
>  Execution time: 9277.307 ms
> 
> 
> IN-VALUES clause has the bestest perfomance. So I have some questions:
> 
> - May be exist better solution?
> - Does PostgreSQL have support of hashset structure? Extension (I don't
> found)?
> - IN-VALUES clause adds new node to plan. Has additional node big overhead?
> How about filter by two or more IN-VALUES clause?
> 
> Thanks.


-- 
PT 


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Perfomance of IN-clause with many elements and possible solutions

2017-07-24 Thread PT
On Mon, 24 Jul 2017 13:17:56 +0300
Dmitry Lazurkin  wrote:

> On 07/24/2017 01:40 AM, PT wrote:
> > In this example you count approximately 40,000,000 values, which is
> > about 40% of the table. 
>
> 4 000 000 (:
>
> > If you really need these queries to be faster, I would suggest
> > materializing the data, i.e. create a table like:
> >
> > CREATE TABLE id_counts (
> >  id BIGINT PRIMARY KEY,
> >  num BIGINT
> > )
> >
> > Then use a trigger or similar technique to keep id_counts in sync
> > with the id table. You can then run queries of the form:
> >
> > SELECT sum(num) FROM id_counts WHERE id IN :values:
> >
> > which I would wager houseboats will be significantly faster.
> I use count only for example because it uses seqscan. I want optimize
> IN-clause ;-).

The IN clause is not what's taking all the time. It's the processing of
millions of rows that's taking all the time.

Perhaps you should better describe what it is you really want to accomplish.
Regardless of what it is, if it involves processing many millions of rows,
you're probably going to need to do some sort of materialization.
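
If you do end up materializing as suggested earlier in the thread, here is a
rough, untested sketch of keeping the counts in sync with a trigger (the
names follow the earlier example; ON CONFLICT needs 9.5+):

  CREATE TABLE id_counts (
      id  BIGINT PRIMARY KEY,
      num BIGINT NOT NULL DEFAULT 0
  );

  -- seed it from the existing data
  INSERT INTO id_counts (id, num)
  SELECT id, count(*) FROM ids GROUP BY id;

  CREATE OR REPLACE FUNCTION ids_maintain_counts() RETURNS trigger AS $$
  BEGIN
      IF TG_OP IN ('INSERT', 'UPDATE') THEN
          INSERT INTO id_counts (id, num) VALUES (NEW.id, 1)
          ON CONFLICT (id) DO UPDATE SET num = id_counts.num + 1;
      END IF;
      IF TG_OP IN ('DELETE', 'UPDATE') THEN
          UPDATE id_counts SET num = num - 1 WHERE id = OLD.id;
      END IF;
      RETURN NULL;
  END;
  $$ LANGUAGE plpgsql;

  CREATE TRIGGER ids_counts_trg
  AFTER INSERT OR DELETE OR UPDATE OF id ON ids
  FOR EACH ROW EXECUTE PROCEDURE ids_maintain_counts();

  -- the earlier suggestion then becomes
  SELECT sum(num) FROM id_counts WHERE id IN (1, 2, 3);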

-- 
PT 


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] PG and database encryption

2017-08-22 Thread PT
On Tue, 22 Aug 2017 12:48:13 -0700 (MST)
rakeshkumar464  wrote:

> We have a requirement to encrypt the entire database.  What is the best tool
> to accomplish this. Our primary goal is that it should be transparent to the
> application, with no change in the application, as compared to un-encrypted
> database. Reading about pgcrypto module, it seems it is good for few columns
> only and using it to encrypt entire database is not a good use-case.
> 
> Is this which can be done best by file level encryption?  What are the good
> tools on Linux (RHES), preferably open-source.

"encrypt the database" is bullshit wank terminology for "we're a government
agency and don't know what we're talking about"

On multiple occasions, I demonstrated that an unencrypted database was the
least likely disclosure vector for sensitive data, and that we shouldn't
waste any time on it until we had ensured that all other breach vectors had
been fixed.  Over the course of 4 years at that job, we never managed to get
all the other (more likely) breach vectors secured.

While it's possible that you've already fixed all other breach
vectors, I'd be willing to bet actual money that you have not.
The very fact that you ask for something that "is transparent to the
application" tells me that you're not going to actually implement it
effectively anyway.

As a result, my opinion would be that you use filesystem encryption. It's
very efficient, has low management overhead, and is proven technology that
doesn't interfere with anything else you're doing. You can then check that
box on whatever form you have to fill out and the bureaucrats will leave you
alone. On top of that, it effectively protects against the breach vectors
that don't require changing the application to fix.

Real security will require changing the application. But take my word for it,
nobody wants to hear the list of breach vectors that can only be fixed by
modifying the application. Because people aren't interested in real security,
they're just interested in checking boxes on a form.

-- 
PT 


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] VM-Ware Backup of VM safe?

2017-09-20 Thread PT
On Wed, 20 Sep 2017 10:03:15 +0200
Thomas Güttler  wrote:

> We run a PostgreSQL 9.6 server in a virtual machine.
> 
> The virtual machine is managed by the customer.
> 
> He does backup the VM.
> 
> Is this enough, is this safe?

There are so many variables involved with doing that ... I don't think
anyone can reliably answer that question.

I recommend you put together a periodic test schedule where you restore
a machine from the backup and ensure everything works. To be honest, you
should be doing that anyway.

-- 
PT 


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general