[GENERAL] Domain check constraint not honored?

2015-10-29 Thread Eric Schwarzenbach
I have created a custom type as a domain based on text, which adds a 
check constraint using a regexp to limit it to containing digits and 
'.'. However I am finding I can add values with other characters to a 
column of this type. Is this to be expected for some reason?


Or alternately, did I define the constraint wrong somehow? It is defined 
thus:


CREATE DOMAIN hierpath AS text
CHECK(
   VALUE ~ '[0-9.]+'
);

Checking the docs I'm pretty sure the '.' doesn't need escaping but I 
also tried a test leaving it out ( '[0-9]+') and the result is the same. 
It lets me store letters in a column defined to be of this type.


The version() function tells me
"PostgreSQL 9.4.5 on x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu 
4.8.2-19ubuntu1) 4.8.2, 64-bit"


Thanks,
Eric




Re: [GENERAL] Domain check constraint not honored?

2015-10-29 Thread Eric Schwarzenbach


Thank you! (Slapping head)
Your regexp seems to do the trick.

On 10/29/2015 01:49 PM, Rob Sargent wrote:

On 10/29/2015 11:41 AM, Eric Schwarzenbach wrote:
I have created a custom type as a domain based on text, which adds a 
check constraint using a regexp to limit it to containing digits and 
'.'. However I am finding I can add values with other characters to a 
column of this type. Is this to be expected for some reason?


Or alternately, did I define the constraint wrong somehow? It is 
defined thus:


CREATE DOMAIN hierpath AS text
CHECK(
   VALUE ~ '[0-9.]+'
);

Checking the docs I'm pretty sure the '.' doesn't need escaping but I 
also tried a test leaving it out ( '[0-9]+') and the result is the 
same. It lets me store letters in a column defined to be of this type.


The version() function tells me
"PostgreSQL 9.4.5 on x86_64-unknown-linux-gnu, compiled by gcc 
(Ubuntu 4.8.2-19ubuntu1) 4.8.2, 64-bit"


Thanks,
Eric


I think your regexp is too weak.  So long as the value has a digit or 
period, it's good.

'^[0-9.]+$' might work
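
For reference, the fix works because ~ is an unanchored match: the original
pattern succeeds if any one character in the value is a digit or period. A
minimal sketch of the anchored version (the table name is invented for
illustration):

CREATE DOMAIN hierpath AS text
CHECK(
   VALUE ~ '^[0-9.]+$'
);

CREATE TABLE paths (p hierpath);
INSERT INTO paths VALUES ('1.2.10');  -- accepted
INSERT INTO paths VALUES ('1.2x.3');  -- rejected; the unanchored pattern let this through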




Re: [GENERAL] Domain check constraint not honored?

2015-10-29 Thread Eric Schwarzenbach

On 10/29/2015 03:44 PM, Alvaro Herrera wrote:

Rob Sargent wrote:
  

Also thought I should mention that there is an ip address type if that's
what you're trying to accomplish.

Looking at the domain name, I wonder whether contrib/ltree would be
helpful.
Very observant! This is indeed part of a hierarchical data solution. 
Thanks for the suggestion, but this solution has been in place and 
working for a few years already. I'm not positive, but I think I may 
have looked at ltree when I first implemented it, but decided against it 
in favor of a transitive closure table, augmented with this path for 
sorting.
(I do sometimes wonder whether the transitive closure table is worth it 
vs just a materialized path.)


I'm just now converting that path to use a custom domain (along with 
custom operators) instead of just being a string. (The custom operators 
allow the paths to be sorted properly without each segment needing to be 
filled with zeros to a fixed length.) (Also FWIW, the latest version of 
this regexp is now '^([0-9]+\.)*[0-9]+$')
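
As a quick illustration of the ordering problem the custom operators solve
(values invented): text comparison is character-by-character, while an
integer-array comparison is segment-by-segment.

SELECT '1.9' > '1.10';    -- true: as text, '9' compares greater than '1'

SELECT string_to_array('1.9', '.')::int[]
     > string_to_array('1.10', '.')::int[];   -- false: segment 9 < 10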




Cheers,

Eric




Re: [GENERAL] Domain check constraint not honored?

2015-10-30 Thread Eric Schwarzenbach

On 10/30/2015 09:53 AM, Jim Nasby wrote:

On 10/29/15 5:29 PM, Eric Schwarzenbach wrote:

I'm just now converting that path to use a custom domain (along with
custom operators) instead of just being a string. (The custom operators
allow the paths to be sorted properly without each segment needing to be
filled with zeros to a fixed length.) (Also FWIW, the latest version of
this regexp is now '^([0-9]+\.)*[0-9]+$')


Have you looked at using int[]? It wouldn't be hard to go between that 
and the string representation using string_to_array() and 
array_to_string(). There's also a chance that eventually you'd be able 
to do FKs on it.
Do you mean making the column int[] and converting to string if needed, 
or converting the string column to int[] for the purposes of the 
ordering algorithm?


I did consider making the column int[] instead of a string, and it would 
probably be slightly more efficient in a few ways. My main hesitations 
were having to revisit the code that puts together this path, and 
compatibility (at the moment we're only using PostgreSQL but we've had 
to run on other databases for certain clients in the past, and in theory 
are open to that in the future). I realize the compatibility concern is 
a little humorous in light of having gone down the 
custom-operator-for-sorting route, but I can always fall back to 0 padding.
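
For what it's worth, a sketch of the conversion Jim describes, assuming an
invented table and a plain-text path column; int[] compares element-wise, so
the custom operators (and any zero padding) become unnecessary:

ALTER TABLE node
  ALTER COLUMN path TYPE int[]
  USING string_to_array(path, '.')::int[];

-- round-trip back to the dotted form when a string is needed:
SELECT array_to_string(path, '.') AS path_text
FROM node
ORDER BY path;   -- numeric, segment-by-segment ordering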





Re: [GENERAL] PDF files: to store in database or not

2016-12-06 Thread Eric Schwarzenbach

On 12/06/2016 01:30 PM, Rich Shepard wrote:
  With no experience of storing binary data in a bytea column I don't know
when its use is appropriate. I suspect that for an application I'm
developing it would be better to store row-related documents outside the
database, and want to learn if that is the appropriate approach.

  Consider an application that manages a fleet of vehicles. There's a
Vehicles table with information on each one (perhaps make, model, VIN, year
of purchase) and a Services table. There are many PDF documents associated
with each row in the tables: purchase contract, insurance form, service and
maintenance records, etc.

  My thinking is to not store these documents in the database, but to store
them in subdirectories outside the database.

  Your thoughts?

Rich
I'd also be interested in answers to this that give specific pros 
and cons, and not in terms of "it's better to do this than that." 
What's "better" depends on how much you value the various pros and 
cons.


One of the pros of keeping them in the database is the ease of protecting 
adds and updates to the files and their related data with a transaction, 
and being able to have a system where it is pretty much impossible for 
the documents to ever be out of sync with the related data.
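
A sketch of what that buys you, with an invented schema (an application
would normally bind the bytes as a parameter rather than writing a literal;
the vehicle table is assumed):

CREATE TABLE document (
    id         serial PRIMARY KEY,
    vehicle_id integer NOT NULL,
    filename   text NOT NULL,
    content    bytea NOT NULL
);

BEGIN;
INSERT INTO document (vehicle_id, filename, content)
VALUES (42, 'insurance.pdf', '\x25504446');  -- '%PDF' magic bytes stand in for the real file
UPDATE vehicle SET updated_at = now() WHERE id = 42;
COMMIT;  -- file bytes and metadata commit or roll back together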


I maintain some systems that do keep the documents outside of the 
database, and the application code maintains the transactional integrity 
of the files and data, and for the most part we don't have integrity 
problems. In the worst case, when an add or update operation is interrupted 
by a system crash or unexpected error, we have a new document saved but 
the data about this document has not been written to the database, and it 
is as if that operation never happened. The file may really be there but 
the system does not "know about it." This works even for updates because 
our system versions documents and the old version is not written over, 
there is simply a new version that the system never "knows" about. 
Without versioning this would be more of a problem, and you would 
probably need to protect yourself with code that does something like 
temporarily keeping the last version of a file during an update and 
switching over the metadata to reference the new document only at the 
very last operation in the transaction.


We also have the potential of the database not matching the file store 
when a system is migrated or "cloned." We are very careful about this, 
but we've at least once had a case where a client's IT department screwed it 
up, and got a mismatched system to which they started writing new data. 
Luckily this was a test or staging system and no production data was lost.


I've often wondered if we'd have been better off storing the files in 
the database. This design decision was made some years ago, and our 
concerns around this had to do with performance, but I don't know that 
we had any real data showing this should have been a concern, and I suspect 
you could ameliorate if not eliminate this as an issue by careful 
design. I'd love to hear this idea confirmed or debunked by someone who 
has more expertise (and ideally has done actual testing).


Cheers,

Eric




Re: [GENERAL] PDF files: to store in database or not

2016-12-06 Thread Eric Schwarzenbach

On 12/06/2016 01:34 PM, Joshua D. Drake wrote:

On 12/06/2016 10:30 AM, Rich Shepard wrote:

  My thinking is to not store these documents in the database, but to 
store

them in subdirectories outside the database.

  Your thoughts?


Due to the widely variable size of a PDF document, I would say no. I 
would store the metadata and file location.



Can you elaborate on this? Why is the variable size an issue? Are you 
assuming the files go into the same table as the rest of the data? (They 
certainly don't have to, and I would assume that not to be the smartest 
design.)





Re: [GENERAL] PDF files: to store in database or not

2016-12-06 Thread Eric Schwarzenbach

On 12/06/2016 02:40 PM, Joshua D. Drake wrote:

On 12/06/2016 11:12 AM, Eric Schwarzenbach wrote:

On 12/06/2016 01:34 PM, Joshua D. Drake wrote:

On 12/06/2016 10:30 AM, Rich Shepard wrote:


  My thinking is to not store these documents in the database, but to
store
them in subdirectories outside the database.

  Your thoughts?


Due to the widely variable size of a PDF document, I would say no. I
would store the metadata and file location.



Can you elaborate on this? Why is the variable size an issue?


Because it will use at least that size in memory to deliver the 
document to you. Consider a 100MB PDF (not at all uncommon), now 
imagine 40 connections requesting that PDF.


Are you sure the whole thing necessarily gets pulled into memory? JDBC 
and ODBC support streaming on their BLOB interfaces, and isn't the whole 
point of this that an application can stream large files a chunk at a 
time, the same way it would from the file system? Of course, if the db 
engine always pulls the whole thing into memory to work with it 
regardless of the API, that's another matter, but that wouldn't seem like 
a very good design, and I have more faith in the PostgreSQL developers 
than that...but I'd certainly like to know for sure.
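
For what it's worth, whether bytea streams depends on the driver and
protocol, but the large-object facility was designed for chunked access,
and even bytea can be read piecewise server-side. A rough sketch using
9.4-era functions (table, id, and the large-object OID are all hypothetical):

-- read the first 64 kB of a bytea column without fetching the rest;
-- with ALTER TABLE ... ALTER COLUMN content SET STORAGE EXTERNAL,
-- such substring reads fetch only the needed TOAST chunks:
SELECT substring(content FROM 1 FOR 65536) FROM document WHERE id = 1;

-- the large-object API reads by offset and length:
SELECT lo_from_bytea(0, '\x25504446');  -- returns the new large object's OID
SELECT lo_get(16385, 0, 65536);         -- 64 kB starting at offset 0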


Cheers,

Eric






Re: [GENERAL] Proper relational database?

2016-04-22 Thread Eric Schwarzenbach

On 04/22/2016 06:21 AM, David Goodenough wrote:

On Thursday 21 April 2016 13:36:54 Guyren Howe wrote:

Anyone familiar with the issue would have to say that the tech world would
be a significantly better place if IBM had developed a real relational
database with an elegant query language rather than the awful camel of a
thing that is SQL.

If I had a few $million to spend in a philanthropical manner, I would hire
some of the best PG devs to develop a proper relational database server.
Probably a query language that expressed the relational algebra in a
scheme-like syntax, and the storage model would be properly relational (eg
no duplicate rows).

It's an enormous tragedy that all the development effort that has gone into
NoSQL databases has pretty much all gotten it wrong: by all means throw out
SQL, but not the relational model with it. They're all just rehashing the
debate over hierarchical storage from the 70s. Comp Sci courses should
feature a history class.

It's a bit odd to me that someone isn't working on such a thing.

Just curious what folks here have to say…

Well when IBM were first developing relational databases there were two
different teams.  One in California which produced System-R which became
what we now know as DB2 and spawned SQL, and the other in Peterlee in
the UK which was called PRTV (the Peterlee Relational Test Vehicle).  PRTV
rather died but bits of it survived.  In particular it was the first system
to include a relational optimiser.  You can find some details on the PRTV
page in Wikipedia.

It was written in PL/1, although it also used some modified microcode
and therefore some assembler.

It never appeared as a product, but there was a geographical system
built on top of it which was, if I recall correctly, used by the Greater
London Council and Central Region Scotland, and which did something of
what postgis does for PostgreSQL.

According to the Wikipedia page it did have a language (ISBL) but from what
I recall (and it was nearly 40 years ago) there were a series of PL/1
function calls we used rather than encoding the request as a string
as SQL systems require.

The IBM centre in Peterlee was closed, and the lab moved to Winchester
where I think it still resides.
One of the people involved in that was Hugh Darwen, who is one of the 
authors of The Third Manifesto, which is an attempt to define what a 
properly relational language and system should look like. So you could 
say the experience of ISBL vs SQL has been folded into that effort.







Re: [GENERAL] Schema, database, or tables in different folders?

2009-06-05 Thread Eric Schwarzenbach
Alan Hodgson wrote:
> On Tuesday 02 June 2009, "Carlos Oliva"  wrote:
>   
>> Is there a way to create a database or a table of a database in its own
>> folder?  We are looking for ways to backup the system files of the
>> database to tape, and a way to exclude some tables from this backup.  We
>> can selectively backup folders of the file system so we figure that if
>> we can create a schema or database or table in its own folder, we can
>> backup our database and exclude the tables selectively.  We are using
>> Linux RedHat.  Thank you.
>> 
>
> You can put them in their own tablespace, but backing them up would not be 
> useful. Filesystem backups of the cluster must include the whole database 
> to be useful (and see the documentation on PITR for how to do it right).
>
> You can easily back up single databases with pg_dump, though.
>
>   
And pg_dump has a parameter to dump only a particular schema.
I'd also suggest that a schema is a unit of logical partitioning whereas
a tablespace is a unit of physical partitioning, and the need to back up
only certain tables suggests a difference/requirement at the logical
level to me, so I suspect schema is the more appropriate design choice
anyway.
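
A sketch of the distinction, with invented names:

CREATE TABLESPACE bulk LOCATION '/mnt/bulk/pgdata';  -- physical: where files live
CREATE SCHEMA reporting;                             -- logical: how objects are grouped

CREATE TABLE reporting.facts (id integer) TABLESPACE bulk;

-- a logical backup of just that schema:
--   pg_dump -n reporting mydb > reporting.sql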




Re: [GENERAL] When to use cascading deletes?

2009-06-11 Thread Eric Schwarzenbach
My rule of thumb for when to use or not use cascading deletes is this:

If what the record represents can essentially be thought of as "part
of" what the record it references represents, I use cascading
deletes. If what the record represents has an independent existence,
that is, it does not necessarily have the same life cycle, I prohibit
cascading. (This is more or less the distinction between composition and
aggregation in UML terms, if I remember correctly.)

This amounts to the same advice others have already given, but in other
terms, and may be helpful if you conceive of your data this way.
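
A sketch of the rule of thumb in DDL terms (schema invented for
illustration):

CREATE TABLE customer (id serial PRIMARY KEY);

CREATE TABLE invoice (
    id          serial PRIMARY KEY,
    customer_id integer NOT NULL
        REFERENCES customer(id) ON DELETE RESTRICT  -- independent life cycle
);

CREATE TABLE invoice_line (
    invoice_id integer NOT NULL
        REFERENCES invoice(id) ON DELETE CASCADE,   -- "part of" the invoice
    line_no    integer NOT NULL,
    PRIMARY KEY (invoice_id, line_no)
);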

Eric

David wrote:
> Hi there.
>
> When is a good time to use cascading deletes?
>
> Usually, for safety reasons, I prefer to not ever use cascading
> deletes. But that can lead to some complex code that uses topological
> sorts etc to manually delete records in the right order, when a
> cascading delete is needed.
>
> Ideally, I'd like postgresql to not do cascading deletes, *except*
> when I tell it to, and the rest of the time fail when the user didn't
> explicitly "opt in" for cascading deletes. When it comes to enabling
> cascading deletes, I don't really like the idea that deleting or
> updating a row from one table can have a possibly unexpected (to the
> app programmer, using the database) chain reaction to other tables.
>
> I don't know, maybe I have the wrong mindset, and cascading is
> preferable (in terms of object model) in some cases? I'd like to read
> more on this subject (general best practices for what types of
> cascading are appropriate to use when).
>
> Any tips?
>
> Thanks,
>
> David.
>
>   




Re: [GENERAL] Very slow joins

2009-07-25 Thread Eric Schwarzenbach
Alban Hertroys wrote:
> On 25 Jul 2009, at 11:36, MS wrote:
>
>>> can we see an explain analyze at least?
>>>
>>
>> Hi,
>> Well, it won't be necessary - I mean it looks just like the explain I
>> sent in my first post.
>
> What first post? The only thing I can find is a reference in a message
> by you from yesterday, to a two-year old post that you claim is about
> the same problem. Though it's possible that it is the same problem,
> you don't provide any data to back that up. 
Yeah I'm confused too. The first post in this thread that I received was
the same one you mention and began with a "Re:" in the subject line as
if it wasn't the first message, but I can find no sign of a message
prior to it. Was this a cross-post where the thread started in another
group? Or did something go awry with the listserv and the first post or
three get lost?

Eric



Re: [GENERAL] Does derby have an embedded Mode like Derby ?

2009-08-04 Thread Eric Schwarzenbach
Bayless Kirtley wrote:
> Yes, I'm afraid you're gonna be out of luck on finding the array type in
> any of the smaller embedded databases. Honestly, the big project I've been
> on for a year or so has used Postgres right through full development and
> testing. It's not hard to start and stop the database whenever I need to,
> although I usually just leave it running.
>
Even if you find one supporting all the same SQL syntax that you are
using, another database implementation could, in some rare
circumstances, return slightly different data for the same query. In
particular, odd situations like this arise around nulls. Conceivably this
could happen between versions of PostgreSQL, and running such unit tests
to check your code's expectations against a real database guards against
this possibility.

Eric




[GENERAL] explicit JOIN faster than implicit?

2009-08-05 Thread Eric Schwarzenbach
I'm in the process of taking a large SELECT statement which had been
written using implicit join syntax (that is, just listing all the tables
in the FROM clause, and listing join conditions in the WHERE clause) and
rewriting it to use explicit JOIN syntax (they are all inner joins).
This has sped up the query by 50%.
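
The two forms look like this (tables invented; both are inner joins and
semantically equivalent):

-- implicit: join conditions buried in WHERE
SELECT *
FROM a, b, c
WHERE a.id = b.a_id
  AND b.id = c.b_id;

-- explicit: conditions attached to each JOIN
SELECT *
FROM a
JOIN b ON a.id = b.a_id
JOIN c ON b.id = c.b_id;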

This is using Postgres 8.3 on a database with GEQO turned off.

Is this what would be expected? Does the query planner / optimizer
generally do better with explicit JOIN syntax?

Cheers,

Eric



Re: [GENERAL] explicit JOIN faster than implicit?

2009-08-05 Thread Eric Schwarzenbach
Um, ok. You've listed some conditions in order of how well they should
perform and these generally agree with my understanding. But how does
this relate to the relative performance of the semantically equivalent
explicit and implicit join syntaxes?

Eric

Martin Gainty wrote:
> here is my best -> worst join scenario starting with best
> 1)low cardinality tables
> 2)where or join on indexed columns
> 3)function indexes
> 4)concatenated indexes
> .
> 5)cartesian join ..every row join specifically joins every other row
> from every other table
>
> Martin Gainty
>
>
>
>
>
> > Date: Wed, 5 Aug 2009 17:43:20 -0400
> > From: subscri...@blackbrook.org
> > To: pgsql-general@postgresql.org
> > Subject: [GENERAL] explicit JOIN faster than implicit?
> >
> > I'm in the process of taking a large SELECT statement which had been
> > written using implicit join syntax (that is, just listing all the tables
> > in the FROM clause, and listing join conditions in the WHERE clause) and
> > rewriting it to use explicit JOIN syntax (they are all inner joins).
> > This has sped up the query by 50%.
> >
> > This is using Postgres 8.3 on a database with GEQO turned off.
> >
> > Is this what would be expected? Does the query planner / optimizer
> > generally do better with explicit JOIN syntax?
> >
> > Cheers,
> >
> > Eric
> >




Re: [GENERAL] explicit JOIN faster than implicit?

2009-08-05 Thread Eric Schwarzenbach
I tried playing with the join_collapse_limit setting. The implicit join
query was unaffected. The explicit join out-performed the implicit one
by 50% when the join_collapse_limit  was low enough, and took just as
long as the implicit one when the join_collapse_limit was high enough.

If I'm reading the documentation right, when the join_collapse_limit is
high enough, the planner is rewriting my explicit join syntax into the
implicit join syntax...with the effect of slowing it down!

I'm not especially concerned, as the performance is acceptable either
way, and I'll be getting the better performance anyway, but I'm just
wondering if this behavior is expected.
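
A sketch of the experiment (the invented three-table query stands in for
the real one):

SET join_collapse_limit = 1;   -- planner follows the written JOIN order
EXPLAIN ANALYZE
SELECT * FROM a JOIN b ON a.id = b.a_id JOIN c ON b.id = c.b_id;

SET join_collapse_limit = 8;   -- the default: explicit JOINs are collapsed
                               -- back into one search problem and reordered
EXPLAIN ANALYZE
SELECT * FROM a JOIN b ON a.id = b.a_id JOIN c ON b.id = c.b_id;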


Eric

Eric Schwarzenbach wrote:
> I'm in the process of taking a large SELECT statement which had been
> written using implicit join syntax (that is, just listing all the tables
> in the FROM clause, and listing join conditions in the WHERE clause) and
> rewriting it to use explicit JOIN syntax (they are all inner joins).
> This has sped up the query by 50%.
>
> This is using Postgres 8.3 on a database with GEQO turned off.
>
> Is this what would be expected? Does the query planner / optimizer
> generally do better with explicit JOIN syntax?
>
> Cheers,
>
> Eric
>
>   




Re: [GENERAL] "Number of columns exceed limit" on a hierarchy of views

2009-08-21 Thread Eric Schwarzenbach
Tom Lane wrote:
> David Waller  writes:
>   
>> I'm struggling with a database query that under some circumstances returns 
>> the error "ERROR:  number of columns (2053) exceeds limit (1664)".  
>> Confusingly, though, no table is that wide.
>> 
>
> This limit would be enforced against the output rows of any intermediate
> join step.  Without looking at EXPLAIN plans it's hard to say exactly
> what's biting you, but it doesn't surprise me a a whole lot that joining
> multiple 500-column tables would get you into trouble.  I'd suggest
> reconsidering your table schemas.  Array columns might help.
>
>   regards, tom lane
>
>   
I have a question that may be relevant, if a little tangential, to this
problem. When using views that perform joins, will the join order be
enforced by the groupings created by the views, or does the planner
decide the join order just as if there were no views and it all was
written as a single query?

In other words, if you have

create view C as select * from A join B on (A.foo = B.foo);
create view D as select * from C join E on (C.foo = E.foo);
and you execute some select query on D, does it necessarily join A and B
before joining the result to E, or might it decide to join B with E
first before joining the result to A?
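
One way to check for a given query, and the knob that governs it: views are
inlined as subqueries, and from_collapse_limit decides whether those
subqueries are merged into the outer join search (names from the example
above):

EXPLAIN SELECT * FROM D;       -- shows the join order actually chosen

SET from_collapse_limit = 1;   -- keep each view's join as its own unit
EXPLAIN SELECT * FROM D;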

Eric



[GENERAL] Wildly erratic query performance

2008-10-31 Thread Eric Schwarzenbach
I've got a particular query that is giving me ridiculously erratic query
performance. I have the SQL in a pgadmin query window, and from one
execution to another, with no changes, the time it takes varies from
half a second to, well, at least 10 minutes or so at which point I give
up an cancel the query. A typical time is 2-3 seconds, but it's all over
the map. I've seen numbers like 112 seconds for one which returns
without exceeding my patience. In every half a dozen or so execution
there will be one time which is an order of magnitude bigger than the
others. A typical series of executions might be something like 2
seconds, 5 seconds, 8 seconds, 0.5 seconds, 80 seconds, 5 seconds.

Note that the database is running on my local machine, the same machine
I'm running the queries from, and nothing else is using this postgresql
installation. The data in the database is also not changing--there are
no inserts or updates happening between queries. I ran a vaccuum (full,
analyze) just before I trying these queries. I do monitor my CPU usage
and there is definitely not some other process on my machine sucking up
all the cpu cycles now and then to explain this.

This is postgresql 8.3, on Windows XP. The query joins about 17 tables
(without an explicit JOIN, just using the WHERE criteria) with a few
further predicates. One thing which distinguishes it from other similar
queries I've been doing where I haven't seen this odd erratic behavior is
there are 2 predicates ORred together (and then ANDed with all the other
conditions which are all ANDed) which effectively divides 2 subsets of
joined tables which are not joined to each other, but both joined to
another set of tables. (I don't know if that was a comprehensible way of
explaining this...but I don't know if it's relevant enough to be worth
explaining in more detail).

I've tried running explain, however the wild erratic behavior seems to go
away when I use explain, taking in the ballpark of 1.5 seconds every
time. This is faster than my average query time using a plain execute,
even if I don't discount all the unusually long times.

Is there any reasonable explanation for this phenomenon?

I do realize I could help the query planner with explicit JOINs, however
I have not yet embarked on this optimization, and might not bother if
the query performance is acceptable without doing so. I don't expect the
execution plan to be optimal, however I do expect it to be deterministic.
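
One setting worth checking with a 17-table FROM list (values shown are the
stock defaults):

SHOW geqo;             -- on
SHOW geqo_threshold;   -- 12: at 17 tables the randomized GEQO search kicks in
SET geqo = off;        -- retest with the exhaustive planner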

Thanks,
Eric



[GENERAL] GEQO randomness?

2008-10-31 Thread Eric Schwarzenbach
This is in a sense a followup to my post with subject "Wildly erratic
query performance".

The more I think about it the only thing that makes sense of my results
is if the query planner really WAS choosing my join order truly randomly
each time. I went digging into the manual and Section 49.3.1.
"Generating Possible Plans with GEQO" says

"In the initial stage, the GEQO code simply generates some possible join
sequences at random."

Now ordinarily I would interpret this use of the word "random" loosely, to
mean "arbitrarily" or "using some non-meaningful selection criteria". But
given what I am seeing, this leads me to consider that "random" is meant
literally, and that it actually uses a random number generator to choose paths. 
Can
someone confirm that this really is the case?

If so, is this really a good idea? Is non-deterministic behavior really
acceptable? I would think it would be much more sensible to have it
operate deterministically (such as with some predetermined random
sequence of numbers used repeatedly).
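
Later PostgreSQL releases (9.0 and up, well after this exchange) added
essentially this, giving GEQO its own generator seeded by a settable
parameter:

SET geqo_seed = 0.0;   -- 9.0+: fixes GEQO's random sequence, so repeated
                       -- planning of the same query is repeatable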

Eric




[Fwd: Re: [GENERAL] GEQO randomness?]

2008-10-31 Thread Eric Schwarzenbach

My problem with GEQO using a random number generator is that
non-deterministic behavior is really hard to debug, and problems can go
undiagnosed for ages. Frankly I would rather something fail all the
time, than it work most of the time and fail just now and then. Never
getting a good plan for a query would be an improvement because I would
immediately be aware there's a problem and be forced to do something about
it, as opposed to maybe realizing there is going to *sometimes* be a
problem.

Suppose a complex query, like mine, had an even more rarely occurring
bad plan result where, as with mine, now and then the query would simply
go out to lunch for all intents and purposes and bog down the server for
the next 30 minutes.  But suppose that result was rarer than in my case,
and the developer never saw it, and blithely sent it out into
production. Every now and then the system would start performing
horribly and no one would know why. The developers might hear of it and
bring in the debugger, and perhaps simply never duplicate it because
it's so erratic. In fact, I'd be willing to bet there are any number of
production applications out in the wild using postgresql with that very
problem and the problem is just never traced back to postgresql.

I'm sorry if I sound strident, but I feel strongly about non-determinacy
in systems being a Bad Thing, and wish to convey why. I understand from
the documentation that the postgresql team is aware the algorithm is not
ideal, and I appreciate the non-triviality of replacing it. And I do
appreciate your responses and your suggestions.

For my own case, I'll certainly be doing one or more of the alternatives
you mentioned (#1 for the short term, at least), and I've had #3 in mind
even before I ran into this problem (the only question is when I will
have time to do it).

Thanks again,

Eric

Tom Lane wrote:
> Eric Schwarzenbach <[EMAIL PROTECTED]> writes:
>   
>> Now ordinarily I would interpret this use of the word "random" loosely, to
>> mean "arbitrarily" or "using some non-meaningful selection criteria". But
>> given what I am seeing, this leads me to consider that "random" is meant
>> literally, and that it actually uses a random number generator to choose 
>> paths. Can
>> someone confirm that this really is the case?
>> 
>
> What it's doing is searching a subset of the space of all possible join
> orders.  It still picks the best (according to cost estimate) plan
> within that subset, but if you're unlucky there may be no very good plan
> in that subset.  And yes, there is a random number generator in there.
>
>   
>> If so, I is this really a good idea?
>> 
>
> The alternatives are not very appealing either ...
>
>   
>> I would think it would be much more sensible to have it
>> operate deterministically (such as with some predetermined random
>> sequence of numbers used repeatedly).
>> 
>
> ... in particular, that one's hardly a panacea.  For one thing, a
> not-unlikely outcome would be that you *never* get a good plan and thus
> don't even get a hint that you might be missing something.  For another,
> the data values used in the query and the current ANALYZE statistics
> also affect the search, which means that in the real world where those
> things change, you'd still be exposed to getting the occasional
> unexpectedly bad plan.
>
> There are a number of alternatives you can consider though:
>
> 1. Disable geqo or bump up the threshold enough that it's not used for
> your query.  Whether this is a feasible answer is impossible to say with
> the limited detail you've provided.  (Remember that potentially
> exponential search time.)
>
> 2. Increase geqo_effort to make the randomized search run a bit longer
> and examine more plans.  This just decreases the probability of losing,
> but maybe it will do so enough that you won't care anymore.
>
> 3. Figure out what's a good join order, rewrite your query to explicitly
> join in that order, and *decrease* join_collapse_limit to force the
> planner to follow that order instead of searching.  Permanent solution
> but the initial development effort is high, especially if you have a lot
> of different queries that need this treatment.
>
> 4. Write a better randomized-search algorithm and submit a patch ;-)
> We have good reason to think that the GEQO code is not a really
> intelligent approach to doing randomized plan searching --- it's based
> on an algorithm designed to solve traveling-salesman problems, which is
> not such a good match to join-order problems --- but no one's yet gotten
> motivated to replace it.
>
>   regards, tom lane
>
>   
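
A sketch of alternatives 1 and 3 above in session form (table names
invented):

SET geqo = off;                -- alternative 1: exhaustive search instead of GEQO

SET join_collapse_limit = 1;   -- alternative 3: pin the written join order
SELECT *
FROM small_t s
JOIN mid_t   m ON m.s_id = s.id
JOIN big_t   b ON b.m_id = m.id;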






Re: [GENERAL] Wildly erratic query performance

2008-10-31 Thread Eric Schwarzenbach
Dann,

Thanks for your response. I thought I'd covered most of what you are
asking in my first message, but these results are weird enough that I
can understand you might not give me the benefit of the doubt
without very explicit confirmation. To answer your questions:

YES the query each time is IDENTICAL. I am not changing a single
character, I am simply clicking the execute button in pgadmin for each run.

This is my personal laptop, so of course there are other processes
running, like say, a web browser, an email client, etc. And of course,
as on any such machine there may be minor processes that execute in the
background without my awareness. But I am not actively doing anything
else (like running a large compile, yeesh!) while the query is
executing. This is why I noted that I was monitoring the CPU usage (and
processes), so that I can be confident that something major is not
suddenly running in the background without my initiating it directly.
And to make the difference between 2 seconds and a minute, let alone 10
minutes, would take a pretty major and hard not to notice process.

As I explained already (no pun intended) running the query using EXPLAIN
makes the wild variation go away. So I cannot get explain results for a
fast and for a slow execution.

I did not include schema information and such because I am not clear I
am allowed to make them public, and because I'm not looking for a highly
specific answer, merely whether there are ANY conditions where the SAME
EXACT QUERY should perform so radically differently. If the query planner, for
example, used a random number generator to choose the order in which it
performed my joins, such that the join order would be different each
time, this would explain it--that possibility would seem bizarre to me,
but it would certainly answer my question.

Eric

Dann Corbit wrote:
>> -Original Message-
>> From: [EMAIL PROTECTED] [mailto:pgsql-general-
>> [EMAIL PROTECTED] On Behalf Of Eric Schwarzenbach
>> Sent: Friday, October 31, 2008 12:35 PM
>> To: pgsql-general@postgresql.org
>> Subject: [GENERAL] Wildly erratic query performance
>>
>> I've got a particular query that is giving me ridiculously erratic
>> query performance. I have the SQL in a pgadmin query window, and from
>> one execution to another, with no changes, the time it takes varies
>> from half a second to, well, at least 10 minutes or so at which point
>> I give up and cancel the query. A typical time is 2-3 seconds, but
>> it's all over the map. I've seen numbers like 112 seconds for one
>> which returns without exceeding my patience. In every half a dozen or
>> so executions there will be one time which is an order of magnitude
>> bigger than the others. A typical series of executions might be
>> something like 2 seconds, 5 seconds, 8 seconds, 0.5 seconds, 80
>> seconds, 5 seconds.
>>
>> Note that the database is running on my local machine, the same
>> machine I'm running the queries from, and nothing else is using this
>> postgresql installation. The data in the database is also not
>> changing--there are no inserts or updates happening between queries.
>> I ran a vacuum (full, analyze) just before trying these queries. I do
>> monitor my CPU usage and there is definitely not some other process
>> on my machine sucking up all the cpu cycles now and then to explain
>> this.
>>
>> This is postgresql 8.3, on Windows XP. The query joins about 17
>> tables (without an explicit JOIN, just using the WHERE criteria) with
>> a few further predicates. One thing which distinguishes it from other
>> similar queries I've been doing where I haven't seen this odd erratic
>> behavior is there are 2 predicates ORred together (and then ANDed
>> with all the other conditions which are all ANDed) which effectively
>> divides 2 subsets of joined tables which are not joined to each
>> other, but both joined to another set of tables. (I don't know if
>> that was a comprehensible way of explaining this...but I don't know
>> if it's relevant enough to be worth explaining in more detail).
>>
>> I've tried running explain, however the wild erratic behavior seems
>> to go away when I use explain, taking in the ballpark of 1.5 seconds
>> every time. This is faster than my average query time using a plain
>> execute, even if I don't discount all the unusually long times.
>>
>> Is there any reasonable explanation for this phenomenon?
>>
>> I do realize I could help the query planner with explicit JOINs,


Re: [GENERAL] Ubuntu for servers (was TurnKey PostgreSQL)

2008-12-10 Thread Eric Schwarzenbach
Robert Treat wrote:
> On Tuesday 09 December 2008 19:43:02 Liraz Siri wrote:
>   
>> Greg has a good point. Ubuntu is a bit of a moving target. In contrast,
>> Debian has a much slower release cycle than Ubuntu and is thus
>> considered by many people to be preferable for production server
>> applications.
>>
>> 
>
> Another option for folks is to switch to another operating system that's a bit 
> more stable *cough*solaris*cough*bsd*cough* 
>
> :-)
>   
And don't forget about BSD.



Re: [GENERAL] metadata development

2009-03-28 Thread Eric Schwarzenbach
Dara Olson wrote:
>
> Happy spring.
>
> I am new to postgres/postgis and am trying to figure out the best way
> to approach documenting metadata within postgres.  Has there been
> anything developed to add FGDC or Dublin Core standard metadata
> records into postgres for each table within the database?  Is there
> any program that can access postgres to insert metadata based on a
> standard or any other way to document metadata for a postgres
> database?  Any help, suggestions or advice from prior experiences
> would be greatly appreciated.
>
> Thanks in advance.
>
> Dara
>
I'm curious how this would be useful. Why would you want to associate
that sort of information with tables as metadata? Dublin core is a
metadata standard for "information resources" like documents (Tile,
Creator, Publisher, Author, Rights, etc), and FGDC is Geographic data. I
can easily imagine wanting to store such information in your database
using actual tables, as information about "entities" represented in your
database schema, but I don't think attaching it as metadata to a table
is really an appropriate solution. But I don't understand the scenario
you have in mind, and perhaps you could explain the situation where this
makes sense.

I suspect you are reaching for database metadata simply because these
standards are often described as metadata; but these are usages of the
word "metadata" in very different contexts. What is called metadata in
many contexts is simply becomes data from the context of using a
relational database to represent it.
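
For completeness, the closest built-in per-table metadata facility is the
free-form COMMENT, which is nothing like a structured FGDC or Dublin Core
record (table name invented):

COMMENT ON TABLE parcel IS 'Parcel boundaries imported from the county GIS office';
SELECT obj_description('parcel'::regclass, 'pg_class');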

Eric



Re: [GENERAL] PostgreSQL Object-Oriented Database?

2009-04-27 Thread Eric Schwarzenbach
Bill Moran wrote:
> In response to "Robert Pepersack" :
>
>   
>> I read the document on array data types.  Do they have anything at all to do 
>> with PostgreSQL being "object-oriented"?
>> 
>
> If you want to be pedantic, not really.  Technically, Postgres isn't
> "object-oriented", it's "object-relational".
>
> But then again, C isn't considered to be object-oriented, but I've
> seen some very clever object-oriented code written in C.  Of course,
> there are languages that have object-oriented syntax as more of the
> core of their design, which usually is what's meant by saying that
> a language is "object-oriented".
>
> Going from that statement, you could argue that PostgreSQL is very
> object-oriented.  Arrays are the least of the objecty features in
> the system: stored procedures, triggers and table inheritance are
> much more objectivy than arrays, although arrays could arguably
> be a part of Postgres' object friendliness.
>
> Looking for a more concise, more to-the-point answer?  Ask a
> salesperson, I'm sure they'll tell you whatever you want to hear.
>
>   
>> Also, these comma-delimited fields make creating reports with our reporting 
>> tool impossible.
>> 
>
> Sounds like your reporting tool is horribly limited.  Of course,
> if you knew that you'd be using this reporting tool, then it was
> your database designer's fault for not considering this limitation.
> If you chose the reporting tool after the database was designed, then
> it was a poor decision on your part.
>
> If you're looking for someone to blame (and it seems like you are)
> then you should just pick someone and start making up reasons.  That's
> what politicians do with great success.
>
> Honestly ... what are you attempting to accomplish with this thread?
> It seems to me that you're trying get the people on this mailing list
> to help you justify being angry with your database designer.
>   
It seems to me he's quite legitimately trying to determine whether there
is more to his database designer's claim that these comma-separated
fields are "object-oriented" than he might otherwise think. PostgreSQL's
(not very meaningful or helpful, IMO) characterization of itself as an
"object-relational database system" no doubt leads to his very
reasonable question of whether he should be taking something more into
account than normal relational database design principles.

I think it's uncalled for to be attacking him or his motives.

Eric




Re: [GENERAL] XML -> PG ?

2009-05-06 Thread Eric Schwarzenbach
Gauthier, Dave wrote:
>
> Is there a way to read an XML file into a postgres DB?  I’m thinking
> that it will create and relate whatever tables are necessary to
> reflect whatever’s implied by the XML file structure.
>
>  
>
> Thanks for any pointers !
>
That's a pretty common problem, and not one that needs to have a
postgresql-specific solution. There are definitely solutions out there,
but it may take some Googling to find a good one for your needs. One
option might be data binding tools (Castor and Liquid XML come to mind)
with which you can definitely go from XML to SQL, though many of them
may require going through an intermediate object model.

A simple approach (simple depending on what technologies you're using and 
are already familiar with) might be to use XSLT to transform your XML
either directly into SQL or into some other format easily loaded to a
database (such as the XML format used by DbUnit).
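
If the XML is simple enough, the shredding can even be done inside
PostgreSQL with the xpath() function (document shape invented):

SELECT unnest(xpath('//vehicle/@vin', x))::text AS vin
FROM (SELECT '<fleet><vehicle vin="V1"/><vehicle vin="V2"/></fleet>'::xml) AS t(x);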

Eric
