Re: [GENERAL] pervasiveness of surrogate (also called synthetic) keys

2011-05-04 Thread Greg Smith

David Johnston wrote:

Is there any rules-of-thumb on the performance of a PK as a function of key 
length?  I like using varchar based identifiers since I tend to query tables 
directly and writing where clauses is much easier if you can avoid the joins.  
I'm likely better off creating views and querying those but am still curious on 
any basic thoughts on having a 100+ length primary key.
  


The shorter the better, but it may not be as bad as you fear.  The way 
B-tree indexes are built, it isn't that expensive to hold a longer key 
so long as the unique part doesn't average out to be that long.  So if 
you insert "12345" and "12346", that's not going to be much different 
than navigating two much longer keys that also diverge that early, 
because once you get that far you've already reached a unique prefix.  
But if your entries have a really long common prefix--say two long 
strings that differ only in their final characters, like "...12" and 
"...13"--that's going to be more expensive to deal with, even though 
the strings are the same length.


If your identifiers become unique after only a few characters, it may 
not be so bad.  But if they go many characters before you can 
distinguish between any two entries, you're probably not going to be 
happy with the performance or size of the indexes, relative to simple 
integer keys.
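
A quick way to get a feel for this is to build two toy indexes and compare
their sizes; this is only a sketch, with made-up table names and key shapes:

   -- keys that diverge early vs. keys sharing a 100-character prefix
   CREATE TABLE keys_early (id varchar(120) PRIMARY KEY);
   CREATE TABLE keys_late  (id varchar(120) PRIMARY KEY);

   INSERT INTO keys_early
       SELECT n::text || '-suffix' FROM generate_series(1, 100000) n;
   INSERT INTO keys_late
       SELECT repeat('X', 100) || n::text FROM generate_series(1, 100000) n;

   SELECT pg_size_pretty(pg_relation_size('keys_early_pkey')) AS early,
          pg_size_pretty(pg_relation_size('keys_late_pkey'))  AS late;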


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Bidirectional replication

2011-05-04 Thread Greg Smith

Merlin Moncure wrote:

I know some people do some cool, usable things with that stuff,
but the whole concept seems awfully awkward to me.  I suppose I'm a
crotchety, cane shaking fundamentalist...


It's possible--do you sometimes find yourself yelling at young 
developers, telling them to stop replicating in your yard?


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Question on Wal time lines

2011-05-04 Thread Greg Smith

dabicho wrote:


For restoring a database from WAL files, if I omit a target in the 
recovery.conf file, can I make it so the database continues the existing 
timeline instead of starting a new one?
Or is there a tool to pick the most recent timeline from a bunch of 
WAL files?




When recovery finishes, you get a new timeline.  That way it's always 
possible to distinguish a server that has exited recovery, and started 
generating new WAL data, from one that is still sync'd to the master and 
running recovery on the original timeline.  If you don't want a new 
timeline, don't let recovery finish.


As for the options you can tweak, see 
http://www.postgresql.org/docs/9.0/static/continuous-archiving.html#BACKUP-TIMELINES 
and the recovery target parameters at 
http://www.postgresql.org/docs/9.0/static/recovery-target-settings.html  
You can navigate among multiple timelines in a set of WAL files using 
recovery_target_timeline and the other target settings.  It really is 
worth the trouble to run some experiments with these ideas to see what 
you can do, before you're forced to do so by an emergency.
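
For instance, a minimal recovery.conf sketch along these lines (paths and
values are illustrative only):

   restore_command = 'cp /mnt/archive/%f %p'
   # follow the newest timeline found in the archive, rather than
   # the timeline the base backup was taken on:
   recovery_target_timeline = 'latest'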


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Bidirectional replication

2011-05-04 Thread tushar nehete
Hi, thanks to all,
John, I tried the Perl built into RHEL 5.5 but I got some errors, so I
downloaded ActivePerl 5.12 and installed it.
After that, when starting the installation, I got stuck on this error:

FAILED! (psql:/usr/local/share/bucardo/bucardo.schema:40: ERROR:  didn't get
a return item from mksafefunc)

Can anyone help me deal with this error?

Thanks,
Tushar



On Wed, May 4, 2011 at 12:59 PM, Greg Smith  wrote:

> Merlin Moncure wrote:
>
>> I know some people do some cool, usable things with that stuff,
>> but the whole concept seems awfully awkward to me.  I suppose I'm a
>> crotchety, cane shaking fundamentalist...
>>
>
> It's possible--do you sometimes find yourself yelling at young developers,
> telling them to stop replicating in your yard?
>
> --
> Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
>
>
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>


[GENERAL] multiple group by on same table

2011-05-04 Thread Leonardo Francalanci
Hi,


I'm going to need to GROUP BY the same table
multiple times. That is, something like:

select (some aggregate functions here) from
tableA group by f1, f2

select (some other aggregate functions here) from
tableA group by f3, f4

etc

The table is pretty large; can someone suggest the
best way of doing it? Is running N queries at the same
time (that is, using N connections with N threads in
the client code) the only way to speed things up (so
that the "concurrent scan" thing can help)? Or is it
more likely that it won't help that much, given that
we have fairly good storage? Just trying to get some
ideas before starting testing.

(the table will be 5M rows, where some of the GROUP BY
selects could return 300-400K groups)

Leonardo

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] multiple group by on same table

2011-05-04 Thread Sim Zacks

On 05/04/2011 01:51 PM, Leonardo Francalanci wrote:


Hi,


I'm going to need to GROUP BY the same table
multiple times. That is, something like:

select (some aggregate functions here) from
tableA group by f1, f2

select (some other aggregate functions here) from
tableA group by f3, f4

etc

The table is pretty large; can someone suggest the
best way of doing it? Is running N queries at the same
time (that is, using N connections with N threads in
the client code) the only way to speed up things (so
that the "concurrent scan" thing can help)? Or it's
more likely that it won't help that much, given that
we have a fairly good storage? Just trying to get some
ideas before starting testing

(table will be 5M rows, where some of the group by
select could return 3-400K groups)

Leonardo


The fastest way of doing it would probably be to build materialized 
views for each reporting query and call it your data warehouse.
In any case, each resultset will require its own query, so the only way 
to get them done in parallel is separate connections. If you use a single 
connection, the queries will have to run serially (one after the other).
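
A minimal sketch of that idea (column names are invented; PostgreSQL has no
built-in materialized views as of this writing, so a plain table refreshed
on whatever schedule you need stands in for one):

   -- pre-aggregate once, then report from the small table
   CREATE TABLE tablea_by_f1_f2 AS
       SELECT f1, f2, sum(amount) AS total, count(*) AS cnt
       FROM tableA
       GROUP BY f1, f2;

   -- refresh by truncating and re-filling, e.g. from cron:
   -- TRUNCATE tablea_by_f1_f2;
   -- INSERT INTO tablea_by_f1_f2
   --     SELECT f1, f2, sum(amount), count(*) FROM tableA GROUP BY f1, f2;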


Sim

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] pervasiveness of surrogate (also called synthetic) keys

2011-05-04 Thread Misa Simic
2011/4/28 Merlin Moncure 

> On Thu, Apr 28, 2011 at 12:29 PM, Jim Irrer  wrote:
> *) most tables don't have unique natural keys (let's see em)
> etc
>
>
i.e. for an Invoice, we have at least 2 tables (more in practice...):

Invoice Header
-Invoice Number
-Date
-CustomerID
-Currency

Invoice Line
-ItemID
-qty
-Price


This is not a real table design, just the basic information stored to
represent one invoice document.

Now, let's say the "natural" PK for Invoice Header is Invoice Number, so we
will add that column to the Invoice Line table as an FK.

What should the "natural" PK for the Invoice Line table be?

Also, if we now add additional tables, e.g. Serial Numbers, related to
Invoice Line, what should the "natural" key in those be?

Thanks,

Misa


Re: [GENERAL] multiple group by on same table

2011-05-04 Thread Gabriele Bartolini

Ciao Leonardo,

 I am not sure if this could apply to your case, but maybe - unless you 
have done it before - you could look at windowing functions 
(http://www.postgresql.org/docs/current/interactive/tutorial-window.html). 
They require PG8.4+ though.
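
For example, a sketch of what that could look like here (assuming an
aggregate column called amount; note that unlike GROUP BY, this keeps one
output row per input row, with the per-partition aggregate attached):

   SELECT f1, f2, f3, f4,
          sum(amount) OVER (PARTITION BY f1, f2) AS sum_f1_f2,
          sum(amount) OVER (PARTITION BY f3, f4) AS sum_f3_f4
   FROM tableA;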


Cheers,
Gabriele

On Wed, 4 May 2011 11:51:08 +0100 (BST), Leonardo Francalanci 
 wrote:

Hi,


I'm going to need to GROUP BY the same table
multiple times. That is, something like:

select (some aggregate functions here) from
tableA group by f1, f2

select (some other aggregate functions here) from
tableA group by f3, f4

etc

The table is pretty large; can someone suggest the
best way of doing it? Is running N queries at the same
time (that is, using N connections with N threads in
the client code) the only way to speed up things (so
that the "concurrent scan" thing can help)? Or it's
more likely that it won't help that much, given that
we have a fairly good storage? Just trying to get some
ideas before starting testing

(table will be 5M rows, where some of the group by
select could return 3-400K groups)

Leonardo


--
 Gabriele Bartolini - 2ndQuadrant Italia
 PostgreSQL Training, Services and Support
 gabriele.bartol...@2ndquadrant.it - www.2ndQuadrant.it

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] postgres segfaulting on pg_restore

2011-05-04 Thread Chris Curvey
>
>
> It occurred to me that a simple explanation for a core dump there would
> be if something had scribbled past the end of the preceding palloc
> chunk.  That would tend to clobber the "context" link of the palloc
> chunk after it, which would send GetMemoryChunkSpace off into
> never-never land following a trashed pointer.
>
> That doesn't get us very much closer to a solution, but it does suggest
> that you might learn something if you try this with an assert-enabled
> build (configure --enable-cassert).  Also, are there any contrib modules
> or third-party add-on modules or home-brew C functions that you're using?
>
>
in reverse order: no third-party, no contrib, no home-brew C.  Here is a
stack trace from a fresh build of 9.1 beta 1 built with --enable-cassert.

 Program received signal SIGSEGV, Segmentation fault.
0x00744777 in AllocSetAlloc (context=0x13f6d08, size=16) at
aset.c:639
639 Assert(chunk->size >= size);
#0  0x00744777 in AllocSetAlloc (context=0x13f6d08, size=16) at
aset.c:639
#1  0x0074d07b in readtup_index (state=0x1453568,
stup=0x7fffa5135010, tapenum=0, len=) at
tuplesort.c:3258
#2  0x0074a57c in mergeprereadone (state=0x1453568, srcTape=0) at
tuplesort.c:2157
#3  0x0074a9d3 in tuplesort_gettuple_common (state=0x1453568,
forward=, stup=0x7fffa51350a0, should_free=) at tuplesort.c:1491
#4  0x0074ac96 in tuplesort_getindextuple (state=0x13f6d08,
forward=16 '\020', should_free=0x10 ) at
tuplesort.c:1580
#5  0x0048d076 in _bt_load (btspool=,
btspool2=) at nbtsort.c:786
#6  _bt_leafbuild (btspool=, btspool2=) at nbtsort.c:224
#7  0x0048788c in btbuild (fcinfo=) at
nbtree.c:142
#8  0x0072d55c in OidFunctionCall3Coll (functionId=, collation=0, arg1=140144035752040, arg2=21827704, arg3=20972800) at
fmgr.c:1645
#9  0x004d095d in index_build (heapRelation=0x7f75d377e868,
indexRelation=0x14d1078, indexInfo=0x1400500, isprimary=, isreindex=0 '\000') at index.c:1737
#10 0x004d31c5 in index_create (heapRelation=,
indexRelationName=, indexRelationId=, indexInfo=0x1400500, indexColNames=,
accessMethodObjectId=, tableSpaceId=0,
collationObjectId=0x1400618, classObjectId=0x1400638, coloptions=0x1400658,
reloptions=0, isprimary=1 '\001', isconstraint=1 '\001', deferrable=0
'\000', initdeferred=0 '\000', allow_system_table_mods=0 '\000',
skip_build=0 '\000', concurrent=0 '\000') at index.c:1067
#11 0x0054bd81 in DefineIndex (heapRelation=,
indexRelationName=, indexRelationId=, accessMethodName=, tableSpaceName=, attributeList=0x85cf3e, predicate=0x0, options=0x0,
exclusionOpNames=0x0, unique=1 '\001', primary=1 '\001', isconstraint=1
'\001', deferrable=, initdeferred=, is_alter_table=1 '\001', check_rights=1 '\001', skip_build=0 '\000',
quiet=0 '\000', concurrent=) at indexcmds.c:397
#12 0x0055b115 in ATExecAddIndex (tab=0x13bb038, rel=, stmt=0x14001d8, is_rebuild=0 '\000', lockmode=) at tablecmds.c:5173
#13 0x005653c9 in ATExecCmd (wqueue=,
tab=0x13bb038, rel=0x7f75d377e868, cmd=0x14001a0, lockmode=6) at
tablecmds.c:3074
#14 0x00565fc5 in ATRewriteCatalogs (rel=,
cmds=, recurse=, lockmode=6) at
tablecmds.c:3001
#15 ATController (rel=, cmds=,
recurse=, lockmode=6) at tablecmds.c:2735
#16 0x0066bda5 in standard_ProcessUtility (parsetree=, queryString=0x13ad538 "ALTER TABLE ONLY transactions\n
 ADD CONSTRAINT transactions_pkey PRIMARY KEY (id);", params=0x0,
isTopLevel=1 '\001', dest=0x13ae5a8, completionTag=0x7fffa5135e30 "") at
utility.c:767
#17 0x006674b7 in PortalRunUtility (portal=0x1329078,
utilityStmt=0x13ae220, isTopLevel=16 '\020', dest=0x13ae5a8,
completionTag=0x7fffa5135e30 "") at pquery.c:1184
#18 0x00668740 in PortalRunMulti (portal=0x1329078, isTopLevel=1
'\001', dest=, altdest=0x13ae5a8,
completionTag=0x7fffa5135e30 "") at pquery.c:1315
#19 0x00668f52 in PortalRun (portal=,
count=, isTopLevel=16 '\020', dest=, altdest=, completionTag=) at
pquery.c:813
#20 0x0066518b in exec_simple_query (query_string=) at postgres.c:1018
#21 0x0066615b in PostgresMain (argc=,
argv=, username=) at
postgres.c:3924
#22 0x0061d2f1 in BackendRun () at postmaster.c:3600
#23 BackendStartup () at postmaster.c:3285
#24 ServerLoop () at postmaster.c:1454
#25 0x0061fc9d in PostmasterMain (argc=3, argv=0x1305880) at
postmaster.c:1115
#26 0x005b9720 in main (argc=3, argv=0x1305870) at main.c:199

-- 
e-Mail is the equivalent of a postcard written in pencil.  This message may
not have been sent by me, or intended for you.  It may have been read or
even modified while in transit.  e-Mail disclaimers have the same force in
law as a note passed in study hall.  If your corporate attorney says that
you need a disclaimer in your signature, you need a new corporate attorney.


[GENERAL] postgresql not updating the sequence

2011-05-04 Thread kosna
hi everyone,

I've created a table with an id column whose datatype is serial, and I've
created a sequence on that table. Whenever I insert a new row, the id is not
being incremented and it gives the exception "entityalreadyexists". Please
help me.
thanks and regards,
kosna.

--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/postgresql-not-updating-the-sequence-tp4369180p4369180.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] undead index

2011-05-04 Thread Jens Wilke
Hi,

pg_upgrade breaks with the following error:

pg_upgrade 8.4.5 to 9.0.4:
Restoring user relation files
  /data1/postgres/pgsql/foo/data_8.4/base/11564/2613
  /data1/postgres/pgsql/foo/data_8.4/base/11564/2683
Could not find foo.bar_idx in old cluster

This index was deleted several weeks ago.

server_version | 8.4.5

foo=# select * from pg_class where relname = 'bar_idx';
(No rows)

after pg_dumpall|psql from 8.4 to 9.0 the undead index revived on the target 
DB:

server_version | 9.0.4

foo=# select * from pg_class where relname = 'bar_idx';
-[ RECORD 1 ]---+
relname | bar_idx
relnamespace| 16409

Besides the question of how this could happen, is there another way to
correct this without using dump|restore?
I'd like to get pg_upgrade working.

Regards, Jens

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] postgresql not updating the sequence

2011-05-04 Thread Raymond O'Donnell

On 04/05/2011 08:10, kosna wrote:

hi everyone,

I've created a table with an id column whose datatype is serial, and I've
created a sequence on that table. Whenever I insert a new row, the id is
not being incremented and it gives the exception "entityalreadyexists".
Please help me.
thanks and regards,
kosna.


Can you show us your query? Also, are you using any sort of framework or 
ORM, or just sending the query directly to the server?


The more detail you give, the better people will be able to help you.

Ray.

--
Raymond O'Donnell :: Galway :: Ireland
r...@iol.ie

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] pervasiveness of surrogate (also called synthetic) keys

2011-05-04 Thread Merlin Moncure
On Wed, May 4, 2011 at 7:14 AM, Misa Simic  wrote:
>
>
> 2011/4/28 Merlin Moncure 
>>
>> On Thu, Apr 28, 2011 at 12:29 PM, Jim Irrer  wrote:
>> *) most tables don't have unique natural keys (let's see em)
>> etc
>>
>
> i.e for an Invoice, we have at least 2 tables (more in practice...):
> Invoice Header
>     -Invoice Number
>     -Date
>     -CustomerID
>     -Currency
> Invoice Line
>     -ItemID
>     -qty
>     -Price
>
> This is not real design for tables, just basic real info stored to represent
> 1 Invoice Document.
> Now, let say for Invoice Header, "Natural" PK is Invoice Number, so we will
> add that column to Invoice Line table as FK
> What should be "Natural" PK for Invoice Line table?
> Also, if we now add additional tables i.e. Serial Numbers, related to
> Invoice Line, what should be "Natural" key in it?

Most of the old school accounting systems maintained an invoice line number.

> Invoice Line
> -Invoice Number
> -LineNo
> -ItemID
> -qty
> -Price

The line number started from 1 (the first line on the invoice) on
every unique invoice.  This has the added benefit that the primary key
index (invoice_no, line_no) allows you to pull up the invoice line
records in correct order without requiring an extra sort and, if you
don't need any fields from the invoice, without a join.

Compare the two queries pulling up invoice lines over a range of
invoice numbers.
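
For instance, a sketch with guessed column types:

   CREATE TABLE invoice_line (
       invoice_no integer,
       line_no    integer,
       item_id    integer,
       qty        numeric,
       price      numeric,
       PRIMARY KEY (invoice_no, line_no)
   );

   -- answered straight from the primary key index, already in
   -- (invoice_no, line_no) order -- no extra sort, no join:
   SELECT invoice_no, line_no, item_id, qty, price
   FROM invoice_line
   WHERE invoice_no BETWEEN 1000 AND 1100
   ORDER BY invoice_no, line_no;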

merlin

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] postgresql not updating the sequence

2011-05-04 Thread Dickson S. Guedes
2011/5/4 kosna :
> hi everyone,
>
> I've created a table with an id column whose datatype is serial, and I've
> created a sequence on that table. Whenever I insert a new row, the id is
> not being incremented and it gives the exception "entityalreadyexists".
> Please help me.
> thanks and regards,
> kosna.

Could you send us the result of \d your_table?

Does the column have a 'nextval' default?
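
For reference, a working serial column should look roughly like this (sketch
with an invented table name):

   CREATE TABLE your_table (id serial PRIMARY KEY, val text);
   -- \d your_table should then show something like:
   --   id | integer | not null default nextval('your_table_id_seq'::regclass)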


--
Dickson S. Guedes
mail/xmpp: gue...@guedesoft.net - skype: guediz
http://guedesoft.net - http://www.postgresql.org.br

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] pervasiveness of surrogate (also called synthetic) keys

2011-05-04 Thread Misa Simic
2011/5/4 Merlin Moncure 

> Most of the old school accounting systems maintained an invoice line
> number.
>
> > Invoice Line
> > -Invoice Number
> > -LineNo
> > -ItemID
> > -qty
> > -Price
>
> The line number started from 1 (the first line on the invoice) on
> every unique invoice.  This has the added benefit of allowing the
> primary key index (invoice_no, line_no) allowing you to pull up the
> invoice line records in correct order without requiring an extra sort
> and, if you don't need any fields from the invoice, a join.
>
> Compare the two queries pulling up invoice lines over a range of
> invoice numbers.
>
> merlin
>


Thanks, merlin,

And in that case, what is "natural" in LineNo? I would say that by adding
LineNo we are creating a synthetic/surrogate key (just instead of 1
surrogate column, it will be a compound key with more columns...)? The
same goes for all other tables that are "parts" of an entity: Serial
Numbers, Accounting Distribution produced by an Invoice... etc. etc.


Re: [GENERAL] pervasiveness of surrogate (also called synthetic) keys

2011-05-04 Thread David Johnston
 

> Thanks, merlin,
>
> And in that case, what is "natural" in LineNo? I would say that by adding
> LineNo we are creating a synthetic/surrogate key (just instead of 1
> surrogate column, it will be a compound key with more columns...)? The
> same goes for all other tables that are "parts" of an entity: Serial
> Numbers, Accounting Distribution produced by an Invoice... etc. etc.

 

Being the “first line” or the “second line” of a physical invoice is a property 
of that line.  Identifying its position on the invoice is only natural.

 

By your reasoning all identifiers are synthetically generated if you consider 
there is never truly only a single instance of anything in the multi-verse.  
The only truly unique identifier would be the time+place of an object's creation.

 

“Hello - person born in Liverpool London, St. Whatever hospital, Room 101 @ 
13:14:57AM on the 5th of March 2001 – how may I direct your call?” (I guess you 
could use the conception date as well although twins+ might be tough to 
distinguish in that case).

 

Generally it could be argued that any well-normalized compound key is 
inherently natural (whether left as multiple fields or concatenated into a 
single field).  The identifier that is assigned to the “part” individually is 
likely to be “synthetic” but its membership in the hierarchy naturalizes it.

 

David J.




Re: [GENERAL] pervasiveness of surrogate (also called synthetic) keys

2011-05-04 Thread Karsten Hilbert
On Wed, May 04, 2011 at 09:33:57AM -0400, David Johnston wrote:

> “Hello - person born in Liverpool London, St. Whatever
> hospital, Room 101 @ 13:14:57AM on the 5th of March 2001 –
> how may I direct your call?” (I guess you could use the
> conception date as well

That will rarely be known to any sufficient precision.

Karsten
-- 
GPG key ID E4071346 @ gpg-keyserver.de
E167 67FD A291 2BEA 73BD  4537 78B9 A9F9 E407 1346

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] pervasiveness of surrogate (also called synthetic) keys

2011-05-04 Thread Merlin Moncure
On Wed, May 4, 2011 at 7:50 AM, Misa Simic  wrote:
> 2011/5/4 Merlin Moncure 
>>
>> Most of the old school accounting systems maintained an invoice line
>> number.
>> > Invoice Line
>> >     -Invoice Number
>> >     -LineNo
>> >     -ItemID
>> >     -qty
>> >     -Price
>>
>> The line number started from 1 (the first line on the invoice) on
>> every unique invoice.  This has the added benefit that the primary key
>> index (invoice_no, line_no) allows you to pull up the invoice line
>> records in correct order without requiring an extra sort and, if you
>> don't need any fields from the invoice, without a join.
>>
>> Compare the two queries pulling up invoice lines over a range of
>> invoice numbers.
>>
>> merlin
>
>
> Thanks, merlin,
> And in that case, what is "natural" in LineNo? I would say that by adding
> LineNo we are creating a synthetic/surrogate key (just instead of 1
> surrogate column, it will be a compound key with more columns...)? The
> same goes for all other tables that are "parts" of an entity: Serial
> Numbers, Accounting Distribution produced by an Invoice... etc. etc.

It's natural because it contains information that is not synthetic,
and is unique/well defined.  Specifically, the position of the line on
the invoice; you can't have two invoice lines at the second line of an
invoice, for example.  This information is visible and important to
users because presumably the invoice lines as entered into an
application are in the order you would want them printed out on a
physical invoice.

The invoice number itself is a more interesting case because it's
generated, and what it might represent (the 77th invoice for this
customer) isn't very interesting.  An actual, 'true' natural key for
an invoice might be a combination of user, entry time, and customer,
but this key is fairly weak and not very useful for identification
purposes by human beings -- so we created the invoice number concept
as a proxy for that information.  Also, timestamps and floating point
values tend to suck for a primary key.  This is what Jeff D was
talking about in detail largely upthread, to wit: if your users (man
or machine) really prefer to work with a generated identifier for some
reason or another, by all means make one, just make sure the record
has a rigorous definition.

A line number table would be a good candidate for adding an additional
surrogate key for purposes of relating on performance grounds if you
have foreign keys pointing to the record.  Any shifts in invoice line
position require renumbering the invoice, which would annoyingly
cascade the updates.

The reason, by the way, that I *know* you aren't going to turn up many
interesting cases of not being able to define a key, at least in the
accounting and manufacturing world, is that I happen to have worked on
a large ERP/CRM that dated from the pre-SQL era.  Like many systems of
the day, it was written in COBOL over an ISAM data store which didn't
have the technical capability to do what sequences or uuids do today
(or at least, it was more trouble than it was worth)...100% natural
keys.  Somehow those guys seemed to manage just fine working under
hardware constraints much tighter than today's.  Although there were
some real headaches in that system, for example when, say, part numbers
changed, it had an inner elegance that grew on me over time, and
taught me the value of rigorous definition of records and their keys.

merlin

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Rearranging simple where clauses

2011-05-04 Thread Michael Graham
Hi,

I was playing around with some SQL in postgres and got to wondering why
the optimiser can't figure out that rearranging some expressions can
result in massive improvements in the query plan.  For example, id + 5 <
100 compared with id < 100 - 5.

Is it simply that no one has got around to doing it, or is there some
deeper reason?  It's not really important, I'm just curious.
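
For a concrete illustration of the kind of rewrite meant here (assuming a
table t with an index on id):

   -- the indexed column is buried inside an expression, so a plain
   -- b-tree index on id cannot be used:
   SELECT * FROM t WHERE id + 5 < 100;

   -- manually rearranged so the bare column faces the constant,
   -- which the index can serve:
   SELECT * FROM t WHERE id < 95;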

Cheers,
-- 
Michael Graham 



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Default Operator Class for datatype

2011-05-04 Thread Jakub Królikowski
Hello,

I've set the pl_PL.UTF-8 collation on my database after upgrading to 9.0.
That means indexes on varchar columns don't work anymore in selects
using "like" or "=" operators on those columns.
I know the solution - operator classes - which works very well.
But that means I have to find and recreate all indexes on varchar
columns like this:
CREATE INDEX test_index ON test_table (col varchar_pattern_ops);

Does anybody know if there is any possibility to set a default
operator class for the varchar datatype?
I can see that in the system table pg_catalog.pg_opclass there is a column
called "opcdefault". Does anybody have experience with changing that
value? The idea is to set this default operator class and then
dump/restore the database data. It should work.
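
For reference, a read-only sketch for inspecting which operator classes
carry that default flag:

   SELECT opcname, opcintype::regtype AS input_type, opcdefault
   FROM pg_catalog.pg_opclass
   WHERE opcname LIKE '%pattern_ops';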

Best regards,

Jakub

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] postgresql not updating the sequence

2011-05-04 Thread Dickson S. Guedes
2011/5/4 Kosna Sridhar :
>
> actually i resolved the problem thanks for quick response

Great!

For the history and reference of this list, could you tell us how, with a "Reply all"?

regards
-- 
Dickson S. Guedes
mail/xmpp: gue...@guedesoft.net - skype: guediz
http://guedesoft.net - http://www.postgresql.org.br

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] pervasiveness of surrogate (also called synthetic) keys

2011-05-04 Thread Merlin Moncure
On Wed, May 4, 2011 at 2:25 AM, Greg Smith  wrote:
> David Johnston wrote:
>>
>> Is there any rules-of-thumb on the performance of a PK as a function of
>> key length?  I like using varchar based identifiers since I tend to query
>> tables directly and writing where clauses is much easier if you can avoid
>> the joins.  I'm likely better off creating views and querying those but am
>> still curious on any basic thoughts on having a 100+ length primary key.
>>
>
> The shorter the better, but it may not be as bad as you fear.  The way
> B-tree indexes are built, it isn't that expensive to hold a longer key so
> long as the unique part doesn't average out to be that long.  So if you
> insert "12345" and "12346", that's not going to be much different than
> navigating two much longer keys that also diverge that early, because once
> you get that far you've already reached a unique prefix.  But if your
> entries have a really long common prefix--say two long strings that differ
> only in their final characters, like "...12" and "...13"--that's going to
> be more expensive to deal with, even though the strings are the same length.
>
> If your identifiers become unique after only a few characters, it may not be
> so bad.  But if they go many characters before you can distinguish between
> any two entries, you're probably not going to be happy with the performance
> or size of the indexes, relative to simple integer keys.

yeah.  The number of comparisons should be basically the same, but
situational things are going to make/break you.  As the in house
'performance guy', it might interest you to work through them all --
indexing strategies are the key to good database performance and,
modeling concepts and religious debates aside, this is an interesting
discussion from a strictly performance point of view.

One reason why natural keys work much better than expected is you get
much better index utilization and potentially *much* better tuple/page
efficiency on certain very common classes of lookups/scans, especially
if you cluster.  Speaking of clustering, you no longer have to agonize
over cases of, say, having to cluster on 'email' or 'email_id'.  The
identifying and ordering/searching criteria are in the same index,
which can be an enormous win in some cases.  Do not underestimate the
value of this when the table is large and dependent scans are common.

If you key on email and the query coming from another table doesn't
need any other email properties, you just saved yourself a join
without having to de-normalize in the classic sense.  You also get to
cut out many sorts on similar principles.
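
A sketch of that point, with an invented schema (users and subscriptions are
hypothetical tables):

   -- natural key: subscriptions carries email itself, so no join is needed
   SELECT s.created_at
   FROM subscriptions s
   WHERE s.email = 'user@example.com';

   -- surrogate key: the same lookup must join to resolve the email
   SELECT s.created_at
   FROM subscriptions s
   JOIN users u ON u.user_id = s.user_id
   WHERE u.email = 'user@example.com';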

OTOH, updates as noted tend to suck.  In some cases hilariously so.
The larger index is going to cause more cache pressure which is a
point against...sometimes the large index just isn't worth the cost
for what you get. Also, you have a hard limit on key sizes imposed by
postgres.  It's rare to hit that in the real world but it should be
noted.  Another negative point is that the postgres stats system also
doesn't deal well with composite keys for range scans.  I have a
pretty good idea on how to fix this, but I haven't gotten around to it
yet.

merlin

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Rearranging simple where clauses

2011-05-04 Thread Tom Lane
Michael Graham  writes:
> I was playing around with some SQL in postgres and got to wondering why
> the optimiser can't figure out that rearranging some expressions can
> result in massive improvements in the query plan.  For example, id + 5 <
> 100 compared with id < 100 - 5.

> Is it simply that no one has got around to doing it, or is there some
> deeper reason?  It's not really important, I'm just curious.

Well, it'd require a very large amount of
type-specific/operator-specific knowledge, and it's not clear what would
drive the planner towards doing useful rearrangements rather than
counterproductive ones, and the number of real-world queries where it'd
actually help doesn't seem to be that large.  I've seen one or two
complaints about that sort of thing, but it's way way down the list
of serious problems.

regards, tom lane

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] SSDD reliability

2011-05-04 Thread Scott Ribe
Yeah, on that subject, anybody else see this:

<>

Absolutely pathetic.

-- 
Scott Ribe
scott_r...@elevated-dev.com
http://www.elevated-dev.com/
(303) 722-0567 voice





-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] postgres segfaulting on pg_restore

2011-05-04 Thread Tom Lane
Chris Curvey  writes:
> in reverse order: no third-party, no contrib, no home-brew C.  Here is a
> stack trace from a fresh build of 9.1 beta 1 built with --enable-cassert.

>  Program received signal SIGSEGV, Segmentation fault.
> 0x00744777 in AllocSetAlloc (context=0x13f6d08, size=16) at
> aset.c:639
> 639 Assert(chunk->size >= size);
> #0  0x00744777 in AllocSetAlloc (context=0x13f6d08, size=16) at
> aset.c:639

Hmm, that's pretty interesting.  Possibly it's just another
manifestation of something scribbling past the end of its allocated
chunk, but my credence for that theory is way down if there's no add-on
code involved.  Does the postmaster log show any messages like
WARNING: detected write past chunk end in ...
before the crash?

regards, tom lane

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] undead index

2011-05-04 Thread Tom Lane
Jens Wilke  writes:
> pg_upgrade brakes with the following error:
> Could not find foo.bar_idx in old cluster

Hmm, is this an autogenerated index?  I suspect pg_upgrade can't cope if
it's been assigned a different name in the new cluster.

regards, tom lane

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Rearranging simple where clauses

2011-05-04 Thread Michael Graham
On Wed, 2011-05-04 at 10:49 -0400, Tom Lane wrote:
> Well, it'd require a very large amount of
> type-specific/operator-specific knowledge, and it's not clear what
> would drive the planner towards doing useful rearrangements rather
> than counterproductive ones, and the number of real-world queries
> where it'd actually help doesn't seem to be that large.  I've seen one
> or two complaints about that sort of thing, but it's way way down the
> list of serious problems. 

I did suspect that the answer would be that the difficulty outweighs the
benefit.  But in terms of driving the planner, don't we always want to be
looking to move all the constants to one side of the expression, since
the planner seems to like those?

-- 
Michael Graham 



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Rearranging simple where clauses

2011-05-04 Thread Tom Lane
Michael Graham  writes:
> I did suspect that the answer would be that the difficulty outweighs the
> benefit.  But in terms of driving the planner, don't we always want to be
> looking to move all the constants to one side of the expression, since
> the planner seems to like those?

Well, you failed to show us any concrete examples of the cases you were
looking at, but no I don't think the planner necessarily likes "all the
constants on one side".  Most likely the win cases are where one side of
a WHERE-condition operator exactly matches an index, so you'd need to be
looking for places where rearrangement could make that happen.

regards, tom lane

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] auto-reconnect: temp schemas, sequences, transactions

2011-05-04 Thread Marek Więckowski
On Monday 02 May 2011 17:32:26 Tom Lane wrote:
> If the client-side logic tries to re-issue these queries
> after re-connecting, it would be up to that logic to be careful about
> what to reissue or not.  Possibly this is a question for the author
> of your client library.

I see. So I have two use cases:
1) "my" client library;
2) psql :p

Let's look briefly at psql, shall we? (I would look at what needs changing
in "my" client library once I understand what is, in your view, a proper
solution for psql.)


If there is a script executed in psql, there is no easy way to catch that
psql has reconnected in the middle of it... The simplest example that could
do a lot of damage would be a simple script executed in psql (by e.g.
\i file_name or "psql -f"):
begin;
update test1 set value = value+1;
update test1 set value = value+10;
update test1 set value = value+100;
commit;
Obviously the intention is that all three queries succeed (values would be
increased by 111) or all three fail (values not changed).

See the attached "orig" output: if the backend dies just after the first
query, then the next query triggers a reconnect and psql does not execute
it, but... the third update query gets happily executed outside of a
transaction.  So the result is that in this case the value was increased by
100 - a really unexpected (and possibly very, very bad) result.

I think it should be considered a problem which should be solved in psql. What 
follows is my rough solution => it does solve the problem, although there is 
probably a much easier way to solve it.


Now, the problem is that psql (as well as any other client program/library)
* would have a hard time keeping track of whether its session is in a
transaction;
* libpq already does this for us, in conn->xactStatus;
* but (because of the check that conn->status == CONNECTION_OK in
PQtransactionStatus()) there is no way to get at the value of
conn->xactStatus once the connection is dead (i.e. the last transaction
status of a now-disconnected connection).


You will probably have a much better idea, but what I did (see attached
patch) is remove the part of PQtransactionStatus(const PGconn *conn) which
says that "if conn->status != CONNECTION_OK then the returned value is
PQTRANS_UNKNOWN".  Thus the meaning of the function PQtransactionStatus()
changes slightly (instead of "transaction status of an active connection" it
would now mean "transaction status of an active connection, or the last
transaction status of a disconnected connection"), but the API and binary
compatibility of libpq are preserved.

Anyway, after this change I am able to get the pre-disconnect transaction
status in psql just before reconnecting.  And if we were in a transaction,
then after reconnecting I immediately create a trans-in-error (again, see
the psql part of the attached patch; BTW: how does one trigger an error in a
way nicer than "select 1/0", preferably with a message? i.e., is there some
libpq equivalent of "raise exception"?).

See the "new" output in the attachment: the result is that, in the example I
gave at the beginning of this mail, if there is a disconnect after the first
update, then the second query triggers a reconnect, but we are in
trans-in-error, so all subsequent queries fail => it is as if a proper db
transaction had been rolled back.  I think this is a much, much better
behaviour for psql, isn't it?


PS: It would be more straightforward to change PQreset() in a similar way
instead of changing psql (PQreset has direct access to conn->xactStatus),
but of course PQreset() is part of the public API of libpq; client code
could in principle execute PQreset() within a db transaction, and the
expectation would be that after the call you get a "clean" new session.
Still, maybe a not-bad idea for the future would be to keep PQreset() as it
is and add e.g. a PQreconnect() which would do basically the same, but in
the case where the old connection was in a transaction it would right away
create a new trans-in-error in the new session?  Well, just an idea => it
would lead to even less handling in the client programs (like psql => it
would just call PQreconnect() and not have to issue "begin" plus "create an
error").

Best,
~Marek
test=# select * from test1 ;
 id | value
----+-------
  1 |    11
  2 |    21
  3 |    31
(3 rows)

test=# begin;
BEGIN

test=# update test1 set value = value + 1;
UPDATE 3

test=# update test1 set value = value + 10;
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.

test=# update test1 set value = value + 100;
UPDATE 3

test=# commit;
NOTICE:  there is no transaction in progress
COMMIT

test=# select * from test1 ;
 id | value
----+-------
  1 |   111
  2 |   121
  3 |   131
(3 rows)
diff -ruN postgresql-8.4.8_orig/src/bin/psql/common.c postgresql-8.4.8_new/src/bin/psql/common.c
--- postgr

Re: [GENERAL] Rearranging simple where clauses

2011-05-04 Thread Michael Graham
On Wed, 2011-05-04 at 11:49 -0400, Tom Lane wrote:
> Well, you failed to show us any concrete examples of the cases you
> were looking at, but no I don't think the planner necessarily likes
> "all the constants on one side".  Most likely the win cases are where
> one side of a WHERE-condition operator exactly matches an index, so
> you'd need to be looking for places where rearrangement could make
> that happen. 

The reason I never showed you any was because I don't have any I was
just curious.  But yeah making one side match an index exactly is
probably the biggest win.

-- 
Michael Graham 



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] auto-reconnect: temp schemas, sequences, transactions

2011-05-04 Thread Tom Lane
Marek Więckowski  writes:
> If there is a script executed in psql there is no easy way to catch that psql 
> has reconnected in the middle of it...

As far as psql goes, it should certainly abandon executing any script
file if it loses the connection.  I rather thought it did already.
See the bit just above where you propose patching:

if (!pset.cur_cmd_interactive)
{
psql_error("connection to server was lost\n");
exit(EXIT_BADCONN);
}


regards, tom lane

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] auto-reconnect: temp schemas, sequences, transactions

2011-05-04 Thread Marek Więckowski
On Wednesday 04 May 2011 18:04:16 Tom Lane wrote:
> Marek Wieckowski  writes:
> > If there is a script executed in psql there is no easy way to catch that
> > psql has reconnected in the middle of it...
> 
> As far as psql goes, it should certainly abandon executing any script
> file if it loses the connection.  I rather thought it did already.
> See the bit just above where you propose patching:
> 
>   if (!pset.cur_cmd_interactive)
>   {
>   psql_error("connection to server was lost\n");
>   exit(EXIT_BADCONN);
>   }

Hmm, indeed, I missed it.  Clear for psql.  And yes, I agree that for psql
it is better to exit.  Thanks for the answer.

But for the library which I'm using, simply exiting/aborting is not an option
(and this is why I was looking into this in the first place).  There is a
danger that client programs will continue issuing queries while believing
that they are in a transaction... They do expect db errors and rolled-back
transactions, but not that their begin-commit section would be executed only
partially.  Solving this at the level of "my" library would solve it once
and for all (and the alternative sounds more complex: it would require
exposing extra info to the programs using this library, adding handling of
the reconnect situation in each of these programs, etc.).

In my head, it wraps up to the following structure, in the library (which
gives access to libpq functionality):
1. any time we use the db connection, we would check if the connection is OK;
2. if yes, we would ask for PQtransactionStatus() and keep a copy of the
returned status;
3. if not, we would try to reconnect, BUT based on (2.) we would know
whether we were in a transaction before the connection was lost;
4. if we were in a transaction before the disconnect, then immediately after
reconnecting we would create a trans-in-error.

Does the above make sense to you? Any points of attention?


Also, would you have any suggestion for how to create a trans-in-error in a 
way nicer than  
begin;
select 1/0;
preferably with a message? In other words, is there some libpq equivalent of 
"raise exception"?

Best,
~Marek

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] SSDD reliability

2011-05-04 Thread Scott Ribe
On May 4, 2011, at 10:50 AM, Greg Smith wrote:

> Your link didn't show up on this.

Sigh... Step 2: paste link in ;-)




-- 
Scott Ribe
scott_r...@elevated-dev.com
http://www.elevated-dev.com/
(303) 722-0567 voice





-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] auto-reconnect: temp schemas, sequences, transactions

2011-05-04 Thread Tom Lane
Marek Więckowski  writes:
> But for the library which I'm using, simply exiting/aborting is not an
> option (and this is why I was looking into this in the first place).
> There is a danger that client programs will continue issuing queries while
> believing that they are in a transaction... They do expect db errors and
> rolled-back transactions, but not that their begin-commit section would be
> executed only partially.  Solving this at the level of "my" library would
> solve it once and for all (and the alternative sounds more complex: it
> would require exposing extra info to the programs using this library,
> adding handling of the reconnect situation in each of these programs, etc.).

Well, I think it's foolish to imagine that a client library should try
to do transparent reconnection: it's somewhere between difficult and
impossible to keep track of all the server-side state that the
application might be relying on, above and beyond the immediate problem
of an unfinished transaction.  It's almost always better to punt the
problem back to the application, and let it decide whether to try again
or just curl up and die.

If you have server restarts occurring often enough that this seems
useful to work on, then I submit that you have problems you ought to be
fixing on the server side instead.

regards, tom lane

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] SSDD reliability

2011-05-04 Thread David Boreham

On 5/4/2011 11:15 AM, Scott Ribe wrote:


Sigh... Step 2: paste link in ;-)




To be honest, like the article author, I'd be happy with 300+ days to 
failure, IF the drives provide an accurate predictor of impending doom.
That is, if I can be notified "this drive will probably quit working in 
30 days", then I'd arrange to cycle in a new drive.

The performance benefits vs rotating drives are, for me, worth this hassle.

OTOH if the drive says it is just fine and happy, then suddenly quits 
working, that's bad.

Given the physical characteristics of the cell wear-out mechanism, I 
think it should be possible to provide a reasonably accurate remaining 
lifetime estimate, but so far my attempts to read this information via 
SMART have failed for the drives we have in use here.

FWIW I have a server with 481 days of uptime, and 31 months in operation, 
that has an el-cheapo SSD for its boot/OS drive.




--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] auto-reconnect: temp schemas, sequences, transactions

2011-05-04 Thread Andrew Sullivan
On Wed, May 04, 2011 at 07:03:31PM +0200, Marek Więckowski wrote:

> (and this is why I was looking into this in the first place). There is a 
> danger that client programs will continue issuing queries while believing 
> that 
> they are in a transaction... They do expect db errors and rolled back 
> transactions, but not that their begin-commit section would be executed only 
> partially.

I don't understand.  If they are prepared for errors, then if they
have any error they have to roll back to the beginning of the
savepoint or the transaction if there's no savepoint.  What is this
"partial execution" of which you speak?  Nothing is partially
executed: if the transaction rolls back, everything is lost.

> good (and the alternative sounds more complex: it would require exposing 
> extra 
> info to the programs using this library, and add handling of reconnect 
> situation in each of these programs etc.).

[. . .]

> 4. if we were in a trans before disconnect, then immediately after 
> reconnecting we would create a trans-in-error.

You're going to have to expose trans-in-error to the client anyway.
What's the difference between that and exposing the program to needing
to fix its connection?

A


-- 
Andrew Sullivan
a...@crankycanuck.ca

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] SSDD reliability

2011-05-04 Thread Scott Ribe
On May 4, 2011, at 11:31 AM, David Boreham wrote:

> To be honest, like the article author, I'd be happy with 300+ days to 
> failure, IF the drives provide an accurate predictor of impending doom.

No problem with that, for a first step. ***BUT*** the failures in this article 
and many others I've read about are not in high-write db workloads, so they're 
not write wear, they're just crappy electronics failing.

-- 
Scott Ribe
scott_r...@elevated-dev.com
http://www.elevated-dev.com/
(303) 722-0567 voice





-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] ZEOS or PGDAC - How to lock a resource?

2011-05-04 Thread durumdara

Hi!

We will be porting an application to PGSQL from a table-based app 
(BDE-like).

The older application used a special technique of the driver: if a record 
was edited, an exclusive (outliving the transaction), "forever living" 
lock was put on it.

On exit, cancel, or post, this lock was removed.

We used this to lock the main resource against concurrent edits.

For example:
A product (a car) has many properties and sublists (details), like 
color, transport date, elements (what we need to build into the car: 
wheels, etc.), bill information, subtransport times, etc.
Because ALL of them are the product, we must protect it with an "edit 
lock" on open.

The sub-information is easily editable and postable (there is autocommit).

Now I am searching for a similar technique in PGSQL.
As I read, the locks are transaction-dependent: they vanish 
on rollback/commit.

But we want to save the subelements while editing (one by one), not when 
saving the main record.
This means that we break the transaction with a commit - ergo the lock 
vanishes.

For example:
Car Edit:
- Lock this car
- Edit color
- Open product elements tab
- Add two new elements
- Save them (ApplyUpdates, Commit)
- Add a bill date
- Save it (Apply, Commit)
- Post car record (Apply, Commit)
- Release resource
- Close Form

I read the help, but I saw only transaction-dependent locks.

Zeos and PGDAC are not like IBX/IBO (Firebird), so they don't have a 
Transaction component; I can use only one transaction per connection.

How can I do a lock mechanism that is:
- Session based
- Unlimited in how many locks I use
- Linked to a row, or a resource name
- Testable for whether it exists or not

Like a mutex in Windows, but in PGSQL...
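
One built-in feature that appears to match all four requirements is
PostgreSQL's advisory locks, which are held by the session (not the
transaction) and survive commits; a minimal sketch, where the hashtext()
mapping from a resource name to a lock key is just one possible convention:

   SELECT pg_advisory_lock(hashtext('car:42'));      -- block until acquired
   SELECT pg_try_advisory_lock(hashtext('car:42'));  -- true/false, no waiting
   SELECT pg_advisory_unlock(hashtext('car:42'));    -- release explicitly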

Thanks for your help:
   dd

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Fwd: Re: [GENERAL] SSDD reliability

2011-05-04 Thread David Boreham



No problem with that, for a first step. ***BUT*** the failures in this article 
and
many others I've read about are not in high-write db workloads, so they're not 
write wear,
they're just crappy electronics failing.


As a (lapsed) electronics design engineer, I'm suspicious of the notion that
a subassembly consisting of solid state devices surface-mounted on FR4
substrate will fail except in very rare (and of great interest to the
manufacturer) circumstances, and I'm especially suspicious that one product
category (SSD) happens to have a much higher failure rate than all others.

Consider that an SSD is much simpler (just considering the electronics) than
a traditional disk drive, and subject to less vibration and heat.
Therefore one should see disk drives failing at the same (or a higher) rate.
Even if the owner is highly statically charged, you'd expect them to destroy
all categories of electronics at roughly the same rate (rather than just
SSDs).

So if someone says that SSDs have "failed", I'll assume that they suffered
from Flash cell wear-out unless there is compelling proof to the contrary.





--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] GROUP BY Wildcard Syntax Thought

2011-05-04 Thread David Johnston
When specifying columns in a GROUP BY clause, would it be possible to use a
wildcard to specify all columns coming from a given relation?

 

SELECT
    rosum.*,
    sum(ld.amount) AS ldcost,
    count(ld.amount) AS ldcount,
    rosum.rocost + sum(ld.amount) AS netbal
FROM (
    SELECT w.s_id, w.accountnumber, w.date_reference, w.invoicenumber,
           sum(w.amount_cost) AS rocost, count(w.amount_cost) AS rocount
    FROM wiplaboradpstaging w
    WHERE 
    GROUP BY w.s_id, w.accountnumber, w.date_reference, w.invoicenumber
) rosum
LEFT JOIN ledgerdetail ld ON (
    rosum.s_id = ld.s_id AND
    rosum.accountnumber = ld.accountnumber AND
    rosum.invoicenumber = ld.reference
)
GROUP BY rosum.*  <--- wildcard GROUP BY, since the rosum sub-query has
                       already been subjected to a GROUP BY

Attempting to run this query in 9.0.3 results in:

"SQL Error: ERROR:  column "rosum.s_id" must appear in the GROUP BY clause
or be used in an aggregate function
LINE 2: rosum.*,"
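
For comparison, a version of the tail of that query which 9.0 does accept is
to spell out the columns the wildcard would expand to:

   GROUP BY rosum.s_id, rosum.accountnumber, rosum.date_reference,
            rosum.invoicenumber, rosum.rocost, rosum.rocount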

 

David J.




Re: [GENERAL] pervasiveness of surrogate (also called synthetic) keys

2011-05-04 Thread Misa Simic
> Being the “first line” or the “second line” of a physical invoice is a
> property for that line.  Identifying its position on the invoice is only
> natural.



> Specifically, the position of the line on the invoice; you can't have two
> invoice lines at the second line of an invoice, for example.  This
> information is visible and important to users because presumably the
> invoice lines as entered into an application are in the order you would
> want them printed out on a physical invoice.


I agree that LineNo could, say, be the order in which an item was added to
the invoice - but it definitely does not mean it should be printed in that
order... Items could be grouped by category or whatever on the invoice... in
which order items will be printed, I would say, is more of a business rule...
and like all business rules it can change over time... for example, one rule
known to me: the SLA item (charge for a Service Licence Agreement) is always
at the end, and its price is 20% of the subtotal of the other items for which
the SLA should be paid (so for some items on the invoice SLA should be paid,
for some not - and we don't know in which order the user will enter items)...




> By your reasoning all identifiers are synthetically generated if you
> consider there is never truly only a single instance of anything in the
> multi-verse.  The only truly unique identifier would be the time+place of an
> object's creation.
>
>
>
> “Hello - person born in Liverpool London, St. Whatever hospital, Room 101 @
> 13:14:57AM on the 5th of March 2001 – how may I direct your call?” (I
> guess you could use the conception date as well although twins+ might be
> tough to distinguish in that case).
>







Yes, I agree - I really think that "Natural" keys do not exist... but
definitely there are Entity properties, like e-mail or InvoiceNo (in whatever
format), that must be Unique (if the Business Rule says that property must be
unique)... It is even hard for humans to think about and identify someone
that way - so humans invented the SSN, to make their lives easier... but it is
well known that the SSN is not the best candidate for a PK...

How humans identify someone/something is a totally different thing from how a
Relational Database identifies a record...

Humans better identify a person by a picture (if the person is known to them,
if they have it in their memory) - a Relational Database, well, today could
match a person only by one exact digital picture, and just that one...
Tomorrow, maybe...


> If you key on email and the query coming from another table doesn't
> need any other email properties, you just saved yourself a join
> without having to de-normalize in the classic sense.  You also get to
> cut out many sorts on similar principles.



Performance and Reliability - those are the main things I worry about... so I
ran some tests...

JOIN - that is a natural thing for a Relational Database... especially a JOIN
on Keys - I agree that JOINs on non-key columns are expensive...

I can't imagine that we could get any information from an ERP system on an
every-day basis without a join - like just an e-mail and an Amount as the
salary in some period, without First Name and Last Name... but let's do the
test:

So we have two tables, in two models...

Employees table: 50 000 records

Salaries table:  100 000 records

(For a first test, I didn't have time to wait to fill the table with
100 000 000 records... but I will run that overnight...)

Model one:
Employee
-e-mail (PK)
-FirstName
-LastName


Salary (PK: e-mail, StartDate, EndDate)
-e-mail (FK to employee)
-StartDate
-EndDate
-Amount

(I recommend replacing StartDate/EndDate with a Period datatype for many
reasons - but that is outside the topic of this test)


Model two:
Employee
-EmpID (UUID -PK)
-e-mail
-FirstName
-LastName


Salary
-SalID (UUID PK)
-EmpID (UUID FK to Employee)
-StartDate
-EndDate
-Amount
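
For reference, a minimal DDL sketch of the two models as described - the exact
table and column names here are my assumptions, not the actual test schema:

-- Model one: natural key
CREATE TABLE employee (
    email     text PRIMARY KEY,
    firstname text,
    lastname  text
);

CREATE TABLE salary (
    email     text NOT NULL REFERENCES employee(email),
    startdate date NOT NULL,
    enddate   date NOT NULL,
    amount    numeric,
    PRIMARY KEY (email, startdate, enddate)
);

-- Model two: surrogate UUID keys
CREATE TABLE employee2 (
    emp_id    uuid PRIMARY KEY,
    email     text,
    firstname text,
    lastname  text
);

CREATE TABLE salary2 (
    sal_id    uuid PRIMARY KEY,
    emp_id    uuid NOT NULL REFERENCES employee2(emp_id),
    startdate date NOT NULL,
    enddate   date NOT NULL,
    amount    numeric
);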


Test for Model 1:

SELECT e-mail, startdate, enddate, Amount FROM Salary

Returns 100 000 records in 1000 ms

SELECT Salary.e-mail, FirstName, LastName, startdate, enddate, Amount FROM
Salary INNER JOIN Employees ON Salary.e-mail = Employees.e-mail

Returns 100 000 records in 240 ms

*WOW - the query with a JOIN returned the same result 4 times faster!!!*

Then I tried:

SELECT Salary.e-mail, startdate, enddate, Amount FROM
Salary INNER JOIN Employees ON Salary.e-mail = Employees.e-mail

i.e. all info from the Salary table, just joined to Employees without
selecting any columns from the Employees table...

Returns 100 000 records in 240 ms


test for Model2:

SELECT emp_id, startdate, enddate, amount FROM Salary

Returned 100 000 rows in 1314 ms (the worst result - though in practice that
query would never be run, because emp_id as a GUID means nothing to a
human...)

SELECT email, fname, lname, startdate, enddate, amoun

Re: [GENERAL] ZEOS or PGDAC - How to lock a resource?

2011-05-04 Thread Merlin Moncure
2011/5/4 durumdara :
> Hi!
>
> We will be porting an application to PGSQL from a table-based app (BDE
> like).
>
> The older application used a special technique of the driver: if a record
> was edited, an exclusive, "forever living" lock (surviving across
> transactions) was put on it. On exit, cancel, or post, this lock was removed.
>
> We used this to lock the main resource against concurrent edits.
>
> For example:
> A product (a Car) has many properties and sublists (details), like color,
> transport date, elements (what we need to build into the car: wheels, etc.),
> billing information, subtransport times, etc.
> Because ALL of these belong to the product, we must protect it with an "Edit
> lock" on open.
> The subinformation is easily editable and postable (there is autocommit).
>
> Now I am looking for a similar technique in PGSQL.
> As I read, the locks are transaction dependent, because they vanish on
> rollback/commit.
>
> But we want to save the subelements while editing (one by one), not only when
> saving the main record.
> This means we break the transaction with a commit - ergo the lock
> vanishes.
>
> For example:
> Car Edit:
> - Lock This car
> - Edit color
> - Open product elements tab
> - Add two new elements
> - Save them (ApplyUpdates, Commit)
> - Add a bill date
> - Save it (Apply, Commit)
> - Post car record (Apply, Commit)
> - Release resource
> - Close Form
>
> I read the help, but I saw only transaction-dependent locks.
>
> Zeos and PGDAC are not like IBX/IBO (Firebird): they don't have a Transaction
> Component, so I can use only one transaction per connection.
>
> How can I build a lock mechanism that is:
> - Session based
> - Unlimited in how many locks I can hold
> - Linked to a Row, or a Resource Name
> - Testable, so I can check whether it exists or not
>
> Like a Mutex in Windows, but in PGSQL...

advisory locks
http://www.postgresql.org/docs/current/interactive/functions-admin.html#FUNCTIONS-ADVISORY-LOCKS
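
A minimal sketch of how the session-level flavor could map onto the workflow
above (the key 42 is a hypothetical stand-in for the car's row id):

-- Session A: take a session-level advisory lock keyed on the car's id.
-- It survives commits and is held until released or the session ends.
SELECT pg_advisory_lock(42);

-- ... edit the car, save sub-elements, commit as often as you like ...

SELECT pg_advisory_unlock(42);    -- release when the form closes

-- Session B: attempt the same lock without blocking; returns false
-- (and acquires nothing) while Session A still holds it
SELECT pg_try_advisory_lock(42);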

(aside: borland delphi is increasingly obsolete in the scheme of
things, but zeos is one of the best postgres drivers ever written!)


merlin

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] GROUP BY Wildcard Syntax Thought

2011-05-04 Thread Tom Lane
"David Johnston"  writes:
> When specifying columns in a GROUP BY clause would it be possible to use a
> wildcard to specify all columns coming from a given relation?

I think the need for this will be largely superseded by the SQL-standard
behavior that grouping by a primary key is sufficient (which we've
finally implemented as of 9.1).
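
A minimal sketch with hypothetical tables (my names, for illustration only):

CREATE TABLE invoice (id int PRIMARY KEY, customer text);
CREATE TABLE invoice_line (invoice_id int REFERENCES invoice(id),
                           amount numeric);

SELECT i.id, i.customer, sum(l.amount) AS total
FROM invoice i JOIN invoice_line l ON l.invoice_id = i.id
GROUP BY i.id;    -- on 9.1+, customer need not be listed: it is
                  -- functionally dependent on the primary key i.id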

regards, tom lane

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] pervasiveness of surrogate (also called synthetic) keys

2011-05-04 Thread Scott Marlowe
On Tue, May 3, 2011 at 8:03 PM, Greg Smith  wrote:

> With a uniqueness constraint in this situation, the unexpected data--row
> with a non unique MAC--will be rejected and possibly lost when the insertion
> happens.  You say that's a good thing, plenty of people will say that's the
> worst possible thing that can happen.

But remember, the original discussion is about using these as PK/FK.
That's where things get really ugly.  I can change my data model to
not have a unique MAC, or do something to make them unique (add IP
or something), much more easily if they're NOT a PK/FK.  That's the
real issue to me.
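
A minimal sketch of that point (hypothetical table and names): keep the MAC as
a constrained attribute rather than as the key other tables reference:

CREATE TABLE device (
    device_id bigserial PRIMARY KEY,      -- what foreign keys point at
    mac       macaddr NOT NULL,
    ip        inet,
    CONSTRAINT device_mac_key UNIQUE (mac)
);

-- If non-unique MACs later turn up, only the constraint changes;
-- no foreign key elsewhere in the schema has to move:
ALTER TABLE device DROP CONSTRAINT device_mac_key;
ALTER TABLE device ADD CONSTRAINT device_mac_ip_key UNIQUE (mac, ip);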

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: Fwd: Re: [GENERAL] SSDD reliability

2011-05-04 Thread Greg Smith

On 05/04/2011 03:24 PM, David Boreham wrote:
So if someone says that SSDs have "failed", I'll assume that they 
suffered from Flash cell

wear-out unless there is compelling proof to the contrary.


I've been involved in four recovery situations similar to the one 
described in that coding horror article, and zero of them were flash 
wear-out issues.  The telling sign is that the device should fail to 
read-only mode if it wears out.  That's not what I've seen happen 
though; what reports from the field are saying is that sudden, complete 
failures are the more likely event.


The environment inside a PC of any sort, desktop or particularly 
portable, is not a predictable environment.  Just because the drives 
should be less prone to heat and vibration issues doesn't mean 
individual components can't slide out of spec because of them.  And hard 
drive manufacturers have a giant head start at working out reliability 
bugs in that area.  You can't design that sort of issue out of a new 
product in advance; all you can do is analyze returns from the field, 
see what you screwed up, and do another design rev to address it.


The idea that these new devices, which are extremely complicated and 
based on hardware that hasn't been manufactured in volume before, should 
be expected to have high reliability is an odd claim.  I assume that any 
new electronics gadget has an extremely high failure rate during its 
first few years of volume production, particularly from a new 
manufacturer of that product.


Intel claims their Annual Failure Rate (AFR) on their SSDs in IT 
deployments (not OEM ones) is 0.6%.  Typical measured AFR rates for 
mechanical drives are around 2% during their first year, spiking to 5% 
afterwards.  I suspect that Intel's numbers are actually much better 
than the other manufacturers' here, so an SSD from anyone else can easily 
be less reliable than a regular hard drive still.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: Fwd: Re: [GENERAL] SSDD reliability

2011-05-04 Thread David Boreham

On 5/4/2011 6:02 PM, Greg Smith wrote:

On 05/04/2011 03:24 PM, David Boreham wrote:
So if someone says that SSDs have "failed", I'll assume that they 
suffered from Flash cell

wear-out unless there is compelling proof to the contrary.


I've been involved in four recovery situations similar to the one 
described in that coding horror article, and zero of them were flash 
wear-out issues.  The telling sign is that the device should fail to 
read-only mode if it wears out.  That's not what I've seen happen 
though; what reports from the field are saying is that sudden, 
complete failures are the more likely event.


Sorry to harp on this (last time I promise), but I somewhat do know what 
I'm talking about, and I'm quite motivated to get to the bottom of this 
"SSDs fail, but not for the reason you'd suspect" syndrome (because we 
want to deploy SSDs in production soon).


Here's my best theory at present : the failures ARE caused by cell 
wear-out, but the SSD firmware is buggy insofar as it fails to boot up 
and respond to host commands due to the wear-out state. So rather than 
the expected outcome (SSD responds but has read-only behavior), it 
appears to be (and is) dead. At least to my mind, this is a more 
plausible explanation for the reported failures vs. the alternative (SSD 
vendors are uniquely clueless at making basic electronics 
subassemblies), especially considering the difficulty in testing the 
firmware under all possible wear-out conditions.


One question worth asking is: in the cases you were involved in, was 
manufacturer failure analysis performed (and if so, what was the reported 
failure cause)?


The environment inside a PC of any sort, desktop or particularly 
portable, is not a predictable environment.  Just because the drives 
should be less prone to heat and vibration issues doesn't mean 
individual components can't slide out of spec because of them.  And 
hard drive manufacturers have a giant head start at working out 
reliability bugs in that area.  You can't design that sort of issue 
out of a new product in advance; all you can do is analyze returns 
from the field, see what you screwed up, and do another design rev to 
address it.
That's not really how it works (I've been the guy responsible for this 
for 10 years in a prior career, so I feel somewhat qualified to argue 
about this). The technology and manufacturing processes are common 
across many different types of product. They either all work, or they 
all fail. In fact, I'll eat my keyboard if SSDs are not manufactured on 
the exact same production lines as regular disk drives, DRAM modules, 
and so on (manufacturing tends to be contracted to high-volume factories 
that make all kinds of things on the same lines). The only different 
thing about SSDs vs. any other electronics you'd come across is the 
Flash devices themselves. However, those are used in extraordinarily high 
volumes all over the place, and if there were a failure mode with the 
incidence suggested by these stories, I suspect we'd be reading about it 
on the front page of the WSJ.




Intel claims their Annual Failure Rate (AFR) on their SSDs in IT 
deployments (not OEM ones) is 0.6%.  Typical measured AFR rates for 
mechanical drives is around 2% during their first year, spiking to 5% 
afterwards.  I suspect that Intel's numbers are actually much better 
than the other manufacturers here, so a SSD from anyone else can 
easily be less reliable than a regular hard drive still.


Hmm, this is speculation I don't buy (that non-Intel vendors have a 10x 
worse early failure rate). The entire industry uses very similar 
processes (often the same factories). One rogue vendor with a bad 
process... sure, but all of them??


For the benefit of anyone reading this who may have a failed SSD : all 
the tier 1 manufacturers have departments dedicated to the analysis of 
product that fails in the field. With some persistence, you can usually 
get them to take a failed unit and put it through the FA process (and 
tell you why it failed). For example, here's a job posting for someone 
who would do this work :

http://www.internmatch.com/internships/4620/intel/ssd-failure-analysis-intern-592345
I'd encourage you to at least try to get your failed devices into the 
failure analysis pile. If units are not returned, the manufacturer never 
finds out what broke, and therefore can't fix the problem.







--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: Fwd: Re: [GENERAL] SSDD reliability

2011-05-04 Thread Scott Marlowe
On Wed, May 4, 2011 at 6:31 PM, David Boreham  wrote:
>
> this). The technology and manufacturing processes are common across many
> different types of product. They either all work , or they all fail.

Most of it is.  But certain parts are fairly new, i.e. the
controllers.  It is quite possible that all these various failing
drives share some long-term ~1 year degradation issue, like the 3Gb/s
SATA ports on the early Sandy Bridge Intel chipsets.  If that's the case
then the just plain up and dying thing makes some sense.

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: Fwd: Re: [GENERAL] SSDD reliability

2011-05-04 Thread David Boreham

On 5/4/2011 9:06 PM, Scott Marlowe wrote:

Most of it is.  But certain parts are fairly new, i.e. the
controllers.  It is quite possible that all these various failing
drives share some long-term ~1 year degradation issue, like the 3Gb/s
SATA ports on the early Sandy Bridge Intel chipsets.  If that's the case
then the just plain up and dying thing makes some sense.


That Intel SATA port circuit issue was an extraordinarily rare screwup.

So ok, yeah...I said that chips don't just keel over and die mid-life
and you came up with the one counterexample in the history of
the industry :)  When I worked in the business in the 80's and 90's
we had a few things like this happen, but they're very rare and
typically don't escape into the wild (as Intel's pretty much didn't).
If a similar problem affected SSDs, they would have been recalled
and lawsuits would be underway.

SSDs are just not that different from anything else.
No special voodoo technology (besides the Flash devices themselves).



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: Fwd: Re: [GENERAL] SSDD reliability

2011-05-04 Thread Scott Marlowe
On Wed, May 4, 2011 at 9:34 PM, David Boreham  wrote:
> On 5/4/2011 9:06 PM, Scott Marlowe wrote:
>>
>> Most of it is.  But certain parts are fairly new, i.e. the
>> controllers.  It is quite possible that all these various failing
>> drives share some long-term ~1 year degradation issue, like the 3Gb/s
>> SATA ports on the early Sandy Bridge Intel chipsets.  If that's the case
>> then the just plain up and dying thing makes some sense.
>
> That Intel SATA port circuit issue was an extraordinarily rare screwup.
>
> So ok, yeah...I said that chips don't just keel over and die mid-life
> and you came up with the one counterexample in the history of
> the industry :)  When I worked in the business in the 80's and 90's
> we had a few things like this happen, but they're very rare and
> typically don't escape into the wild (as Intel's pretty much didn't).
> If a similar problem affected SSDs, they would have been recalled
> and lawsuits would be underway.

Not necessarily.  If there's a chip that has a 15% failure rate
instead of the predicted <1%, it might not fail often enough for people to
have noticed, since a user with a typically small sample might think
he just got a bit unlucky, etc.  Nvidia made GPUs that overheated and
died by the thousand, but took 1 to 2 years to die.  There WAS a
lawsuit, and now to settle it, they're offering to buy everybody who
got stuck with the broken GPUs a nice single core $279 Compaq
computer, even if they bought a $4,000 workstation with one of those
dodgy GPUs.

There's a lot of possibilities as to why some folks are seeing high
failure rates, it'd be nice to know the cause.  But we can't assume
it's not an inherent problem with some part in them any more than we
can assume that it is.

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] SSDD reliability

2011-05-04 Thread Toby Corkindale

On 05/05/11 03:31, David Boreham wrote:

On 5/4/2011 11:15 AM, Scott Ribe wrote:


Sigh... Step 2: paste link in ;-)




To be honest, like the article author, I'd be happy with 300+ days to
failure, IF the drives provide an accurate predictor of impending doom.
That is, if I can be notified "this drive will probably quit working in
30 days", then I'd arrange to cycle in a new drive.
The performance benefits vs rotating drives are for me worth this hassle.

OTOH if the drive says it is just fine and happy, then suddenly quits
working, that's bad.

Given the physical characteristics of the cell wear-out mechanism, I
think it should be possible to provide a reasonable accurate remaining
lifetime estimate, but so far my attempts to read this information via
SMART have failed, for the drives we have in use here.


In what way has the SMART read failed?
(I get the relevant values out successfully myself, and have Munin graph 
them.)



FWIW I have a server with 481 days uptime, and 31 months operating that
has an el-cheapo SSD for its boot/OS drive.


Likewise, I have a server with a first-gen SSD (Kingston 60GB) that has 
been running constantly for over a year, without any hiccups. It runs a 
few small websites, a few email lists, all of which interact with 
PostgreSQL databases.. lifetime writes to the disk are close to 
three-quarters of a terabyte, and despite its lack of TRIM support, the 
performance is still pretty good.


I'm pretty happy!

I note in the comments of that blog post above, it includes:

"I have shipped literally hundreds of Intel G1 and G2 SSDs to my 
customers and never had a single in the field failure (save for one 
drive in a laptop where the drive itself functioned fine but one of the 
contacts on the SATA connector was actually flaky, probably from 
vibrational damage from a lot of airplane flights, and one DOA drive). I 
think you just got unlucky there."


I do have to wonder if this Portman Wills guy was somehow Doing It Wrong 
to get a 100% failure rate across eight disks...


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general