Re: [GENERAL] Stored Procedures Performance: Java vs. PL/Pgsql

2007-05-28 Thread Chris Browne
[EMAIL PROTECTED] (Daniel Kastenholz) writes:
> Hi,
>
> is there a rule of thumb or an estimate which language provides the
> best performance for stored procedures?
>
> I'm currently using PL/Pgsql, and wondering
> - to what extent these stored procedures are actually precompiled
> and/or optimized
> - if Java might provide better performance, or if parameter
> transformation and VM calls eat up the performance gains (if any)
>
> I'm aware that these questions can't be answered in a "universally
> valid" fashion, but I would appreciate your experiences.

The sort of thing I'd expect Java SPs to do better is the handling
of complex arithmetic computations.

Notably, you can use native Java types to calculate things; doing the
equivalents in pl/pgsql would require iterating across sets of
interpreted SQL statements.

If your code is not "rich" in that, then Java won't provide an
advantage in that area.

A place where I'd expect Java to be an anti-optimization (as compared
to pl/pgsql) is that there's little opportunity for the query
optimizer to penetrate into the code to be able to optimize usage of
the stored procedure.  As pl/pgsql is a pretty thin veneer atop SQL,
there is more that the query optimizer can do, analytically...

I'm speaking analytically here; I haven't used Java SPs.  I have used
various other SPs, though, and this sort of result does normally hold
true.
-- 
select 'cbbrowne' || '@' || 'cbbrowne.com';
http://www3.sympatico.ca/cbbrowne/emacs.html
(1) Sigs are preceded by the "sigdashes" line, ie "\n-- \n" (dash-dash-space).
(2) Sigs contain at least the name and address of the sender in the first line.
(3) Sigs are at most four lines and at most eighty characters per line.



Re: [GENERAL] Slightly OT.

2007-06-01 Thread Chris Browne
[EMAIL PROTECTED] writes:
> I'm disappointed because SLONY-II has not been released yet to support
> multi-master replication!  PostgreSQL is going through all of the
> releases - and that's great - BUT, where is the sync-up with the
> powerhouse of a component, that Slony-II would bring to the table?
> Slony-I is pretty sweet, but if Slony-II were released, I can imagine
> that this would introduce some major competition in the enterprise
> world against the commercial dynamos.

There is some effort still going into Postgres-R from a research
perspective, but as far as I know, nobody has been working on Slony-II
for well over a year now.

Unfortunately, a combination of factors went together to make it "not
workable."

- Spread licensing was something of an issue;
- Spread scalability caused quite a few issues (it's apparently
  tough to make it stable when using it under HEAVY load);
- There was a token passing bottleneck in Spread limiting
  its performance;
- New application failure scenarios emerged that wouldn't have
  taken place in non-MM systems.

The issues built to the point of making it not worthwhile to continue
the development effort :-(.

If someone completed a suitable reimplementation of the wheel on the
GCS (Group Communication System) front, and produced something more
usable for the purpose, such as the (theorized) Anasazi system, it
might be worth proceeding again.

http://www.lethargy.org/~jesus/archives/53-Lets-reimplement-the-wheel...-or-at-least-another-GCS..html

But it's fair to say that reality did not live up to the early hopes.
Supposing we came up with a Wicked Better GCS, that might merely allow
efforts to get a bit further, and then get hung up on something else.
Don't hold your breath expecting Slony-II to be around any corners...
-- 
let name="cbbrowne" and tld="acm.org" in name ^ "@" ^ tld;;
http://linuxfinances.info/info/sap.html
"It seems  certain that much of  the success of Unix  follows from the
readability, modifiability, and portability of its software."
-- Dennis M. Ritchie, September, 1979



Re: [GENERAL] multimaster

2007-06-01 Thread Chris Browne
[EMAIL PROTECTED] ("Alexander Staubo") writes:
> On 6/1/07, Andrew Sullivan <[EMAIL PROTECTED]> wrote:
>> These are all different solutions to different problems, so it's not
>> surprising that they look different.  This was the reason I asked,
>> "What is the problem you are trying to solve?"
>
> You mean aside from the obvious one, scalability?

I'd have to call that expectation "obviously WRONG."

It was *CERTAIN* that Slony-II, even if it had turned out as well as
the *most optimistic hopes* suggested, would have some substantial
losses of performance compared to a single DB instance, due to the
need to apply locks across all nodes.

There would be *some* scalability gains to be had, but the primary
reason for looking for multimaster replication is that you need high
availability so badly that you are willing to give up performance to
get it.

> As it stands today, horizontally partitioning a database into multiple
> separate "shards" is incredibly invasive on the application
> architecture, and typically relies on brittle and non-obvious hacks
> such as configuring sequence generators with staggered starting
> numbers, omitting referential integrity constraints, sacrificing
> transactional semantics, and moving query aggregation into the app
> level. On top of this, dumb caches such as Memcached are typically
> layered to avoid hitting the database in the first place.

Question: In what way would you expect an attempt to do
mostly-trying-to-be-transparent multimaster replication to help with
these issues you're bringing up?

Slony-II was trying to provide answers to several of those
"non-obvious hacks"; several of them point at areas where it would
*have to* be somewhat slow.

> Still, with MySQL and a bit of glue, guys like eBay, Flickr and
> MySpace are partitioning their databases relatively successfully using
> such tricks. These guys are not average database users, but they
> are not the only ones that have suffered from database bottlenecks and
> overcome them using clever, if desperate, measures. Cal Henderson (or
> was it Stewart Butterfield?) of Flickr has famously said he would
> never again start a project that didn't have partitioning from the
> start.
>
> I would love to see a discussion about how PostgreSQL could address
> these issues.

Partitioning isn't multimaster replication; it's something worthy of
having a discussion independent of anything about MMR.
-- 
(reverse (concatenate 'string "ofni.sesabatadxunil" "@" "enworbbc"))
http://linuxdatabases.info/info/sap.html
"There  are  almost  unlimited  ways  for making  your  programs  more
complicated or bizarre" -- Arthur Norman



Re: [GENERAL] Running OS-level programs from Postgres?

2007-06-20 Thread Chris Browne
[EMAIL PROTECTED] (Sean Murphy) writes:
> My present need is for email notification; if there's a pg function or
> module that would handle this (I haven't turned up anything in my
> searches, but maybe I'm using the wrong search terms in the wrong
> places) I'd be OK for now, but I'd rather have the flexibility of
> starting whatever process a given circumstance calls for.

There are ways of doing this via untrusted stored function languages,
but this approach tends to be discouraged.

Consider: If you submit 200 of these requests, you may spawn your
mailer system 200 times, which may be a spectacularly inefficient
thing to do.  Indeed, spawning it 200 times in a second might trigger
Negative Reactions in system components.  (Consider: What if your
system tried delivering 200 pieces of mail concurrently, and spawned,
as a side-effect, 200 copies of SpamAssassin?)

The better answer tends to be to try to decouple this.

- Write the data that you want sent out into a row in a table that
  implements a Message Queue.

- Perhaps submit a NOTIFY request, if you want things to kick off at
  once.

- Have some other, separate process, that LISTENs for notifications
  (or which simply wakes up once in a while).  This process goes
  through the Message Queue, doing some work on each item to send the
  message on towards its destination.

This way there is only ONE process that wakes up and processes the
work queue.  It might be a program that's mostly outside PostgreSQL...
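
As a sketch of that shape (the table, channel, and column names here
are invented for illustration, not anything standard):

  -- the Message Queue table
  create table message_queue (
      id           serial primary key,
      recipient    text not null,
      body         text not null,
      processed_at timestamptz      -- stays NULL until a worker handles it
  );

  -- enqueueing, inside whatever transaction generates the event:
  insert into message_queue (recipient, body)
      values ('someone@example.com', 'your report is ready');
  notify mail_queue;

  -- the single worker process, roughly:
  --   LISTEN mail_queue;
  --   then, on each notification (or periodic wakeup):
  select id, recipient, body from message_queue
   where processed_at is null order by id;
  --   ... deliver each message, then mark it done:
  update message_queue set processed_at = now() where id = 1;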
-- 
"cbbrowne","@","cbbrowne.com"
http://linuxdatabases.info/info/rdbms.html
"It is not enough to succeed, others must fail."  -- Gore Vidal



Re: [GENERAL] Experiences of PostgreSQL on-disk bitmap index patch

2007-06-25 Thread Chris Browne
[EMAIL PROTECTED] ("Christan Josefsson") writes:
> So you indicate that the so-called bitmap index scan, a.k.a.
> in-memory bitmap indexes (right?), already adds such an
> improvement when it comes to optimized response time on large
> query sets (having the characteristics normally used to
> identify cases where bitmap indexes improve performance, like:
> low cardinality keys, large data volumes, etc.), so that the
> on-disk indexes are not really needed, or at least not
> worthwhile implementing?

It looks very much like that may be the case...

Bitmap index scans have a somewhat different set of functionality, but
there is enough overlap that the cases where on-disk bitmap indexes
are useful (and in-memory bitmap scans aren't) look like rare edge
cases.

There may be users that see those "rare edge cases" all the time;
they'll find on-disk bitmap indexes worth having, and, possibly, worth
implementing.

But to be sure, there used to be a lot of "burning interest" in
on-disk bitmap indexes, and in-memory bitmap index scans have quenched
many of the flames...
-- 
"cbbrowne","@","cbbrowne.com"
http://linuxfinances.info/info/advocacy.html
">WindowsNT will not accept fecal matter in its diet... it's that simple.

I suppose that is a good ward against cannibalism." -- Nick Manka



Re: [GENERAL] SMTP

2007-07-16 Thread Chris Browne
[EMAIL PROTECTED] (paddy carroll) writes:
> store mail , send mail, receive mail, filter mail.
> I need a mail firewall for a set of bespoke applications in a secure
> environment.
>
> I will probably use postfix

It is possible to configure various parts of postfix (e.g. - aliases
and other such lookups) via queries against PostgreSQL tables, so this
is something that PostgreSQL could help with.
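
For illustration, the lookup-table side of that might look roughly
like the following; the file name, credentials, schema, and query are
all assumptions, with only the pgsql: map syntax coming from postfix
itself:

  # /etc/postfix/pgsql-aliases.cf -- illustrative values throughout
  hosts = localhost
  user = postfix
  password = secret
  dbname = mail
  query = SELECT destination FROM aliases WHERE alias = '%s'

  # and then, in main.cf:
  virtual_alias_maps = pgsql:/etc/postfix/pgsql-aliases.cf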
-- 
"cbbrowne","@","linuxdatabases.info"
http://linuxfinances.info/info/linuxxian.html
"The problem with the current Lisp Machine system is that nothing ever
calls anything anymore."  -- KMP



Re: [GENERAL] Delete/update with limit

2007-07-23 Thread Chris Browne
[EMAIL PROTECTED] (Csaba Nagy) writes:
> In postgres we're currently not chunking, due to the fact that the code
> to do it is simply overly contorted and inefficient compared to the
> other DBs we use. At least all the solutions we could think of to do the
> chunking in a safe way while the inserts are running in parallel,
> without disturbing them, have invariably resulted in overly complicated
> code compared to the simple delete with limit + delete trigger solution
> we have for the other DBs.
>
> Now I don't put too much hope I can convince anybody that the limit on
> the delete/update commands has valid usage scenarios, but then can
> anybody help me find a good solution to chunk-wise process such a buffer
> table where insert speed is the highest priority (thus no indexes, the
> minimum of fields), and batch processing should still work fine with big
> table size, while not impacting at all the inserts, and finish in short
> time to avoid long running transactions ? Cause I can't really think of
> one... other than our scheme with the delete with limit + trigger +
> private temp table thing.

All that comes to mind is to put a SERIAL primary key on the table,
which shouldn't be *too* terribly expensive an overhead, assuming
there is reasonably complex processing going on; you then do something
like:

- select ID from the incoming table, order by ID, limit 500, to grab a
  list of IDs;

- delete from the table for that set of IDs.

Actually, is there any particular reason why you couldn't simply have
your "batch processing" loop look like:

Loop Forever
  DELETE from incoming_table;
  VACUUM incoming_table;
End Loop;

???

The alternative that I suggested amounts to:

Loop Forever
  DELETE from incoming_table where id in
    (select id from incoming_table limit 500);
  VACUUM incoming_table;
End Loop;

I realize you're concerned that maintaining the index will be too
costly; I don't think it is obvious without actual benchmarking that
this is *in fact* too costly.

I'm pretty sure of one countervailing consideration: VACUUMing the
table throws in costs of its own; it is possible that the cost of the
index would be noise against that.
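
To make the suggested loop concrete, here is a rough pl/pgsql sketch;
the table and column names follow the discussion above, the function
name is made up, and the "real work" on each row is left out:

  create or replace function process_one_chunk(batch_size integer)
  returns integer as $$
  declare
      n integer;
  begin
      -- grab the oldest batch_size rows and remove them; any real
      -- processing of those rows would happen here as well
      delete from incoming_table
       where id in (select id from incoming_table
                     order by id limit batch_size);
      get diagnostics n = row_count;
      return n;   -- 0 means the buffer table is drained
  end;
  $$ language plpgsql;

The caller would loop on process_one_chunk(500) until it returns 0,
VACUUMing the table every so often along the way.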
-- 
"cbbrowne","@","acm.org"
http://cbbrowne.com/info/lisp.html
When a man talks dirty to a woman, it's sexual harassment. When a woman
talks dirty to a man, it's 3.95 per minute.



Re: [GENERAL] Linux distro

2007-08-02 Thread Chris Browne
[EMAIL PROTECTED] ("Merlin Moncure") writes:
> On 8/2/07, Ron Johnson <[EMAIL PROTECTED]> wrote:
>> Pardon me for being the contrarian, but why does a server need a
>> GUI?  Isn't that just extra RAM & CPU overhead that could be more
>> profitably put to use powering the application?
>
> A server with a GUI sitting on a login screen is wasting zero
> resources.  Some enterprise management tools are in java which
> require a GUI to use so there is very little downside to installing
> X, so IMO a lightweight window manager is appropriate...a full gnome
> is maybe overkill.  Obviously, you want to turn off the 3D screen
> saver :-)

The server does not need the overhead of having *any* of the "X
desktop" things running; it doesn't even need an X server.

You don't need X running on the server in order to use those
"enterprise management" tools; indeed, in a "lights out" environment,
that server hasn't even got a graphics card, which means that an X
server *can't* be running on it.
-- 
"cbbrowne","@","linuxfinances.info"
http://linuxfinances.info/info/x.html
"Linux poses  a real challenge for  those with a  taste for late-night
hacking (and/or conversations with God)." -- Matt Welsh



Re: [GENERAL] is there a psql equivilent of fsck?

2007-08-13 Thread Chris Browne
[EMAIL PROTECTED] (Ben) writes:
> We recently installed and populated a new postgres 7.3 server, which
> was quickly abused with a good 12 hours of 115-degree heat. Now, we
> see ~1000 rows missing from a single table, and given our application,
> a delete of those rows seems a very remote possibility. Is there some
> database analogy to fsck I can run?
>
> FWIW the hardware raid claims everything is just fine.

What we tend to use when we run into such situations is:
  "VACUUM VERBOSE ANALYZE;"

This walks through all tables and indices in the database, and seeks
to clean them up.  If the disk has been mussed up, this will tend to
terminate with suitably scary looking error messages.

FYI, is there some particular reason why you went with PostgreSQL 7.3?
That's almost five years old, which is like (hmm... (* 5 6)) thirty
years old in "Internet Years."  That's, like, way, way, way obsolete.

We haven't been quick about jumping onto fresh new releases - we only
got to 8.1 this year, when 8.2 is now *last* year's flavour.
Nonetheless, we got rid of our last 7.3 instance several years ago...
-- 
output = ("cbbrowne" "@" "linuxdatabases.info")
http://cbbrowne.com/info/x.html
Rules of the Evil Overlord  #105. "I will design all doomsday machines
myself. If I must hire a mad  scientist to assist me, I will make sure
that he is sufficiently twisted to never regret his evil ways and seek
to undo the damage he's caused." 



Re: [GENERAL] Compound Indexes

2007-08-14 Thread Chris Browne
[EMAIL PROTECTED] ("Phoenix Kiula") writes:
> I have a table with ten columns. My queries basically use one column
> as the first WHERE condition, so an index on that column is certain. But
> the columns after that one vary depending on end-user's choice (this
> is a reporting application) and so does the sorting order.
>
> In MySQL world, I had sort_buffer in the config file, and I made a
> compound index with the columns most often used in these types of
> queries. So my index looked like:
>
>   INDEX idx_trades(id, t_id, c_id, s_id,  t_brief, created_on);
>
> This has five columns in it. While reading the pgsql documentation, I
> gather that anything beyond three columns offers diminishing benefits.
>
> My queries will look like these:
>
>SELECT * from trades where id = 9
>and c_id = 
>ORDER by s_id;
>
>SELECT * from trades where id = 9
>and s_id = 0
>ORDER by created_on desc ;
>
>SELECT * from trades where id = 9
>and s_id = 0
>and t_brief ~* 'more|than|one|word'
>ORDER by created_on desc ;
>
> So my question: how does PGSQL optimize its sorts? If I were to index
> merely the columns that are most commonly used in the reporting WHERE
> clause, would that be ok? Some of these columns may be "TEXT" type --
> how should I include these in the index (in MySQL, I included only the
> first 100 words in the index).

If you have only these three sorts of queries, then I would speculate
that the following indices *might* be useful:

 create index idx1 on trades (id);
 create index idx2 on trades (c_id);
 create index idx3 on trades (s_id);
 create index idx4 on trades (created_on);
 create index idx5 on trades (created_on) where t_brief ~* 'more|than|one|word';
 create index idx6 on trades (id, s_id) where t_brief ~* 'more|than|one|word';

(I'm assuming with idx5 and idx6 that you were actually searching for
'more|than|one|word'; if what is searched for can vary, then idx5/idx6
are worthless.)

You could try adding them all, and check out which of them are
actually used by the query planner.  And eventually drop out the
irrelevant ones.

PostgreSQL has a rather sophisticated query planner (pretty much
"rocket science," compared to MySQL), and it is even possible that it
would use several of those indices simultaneously for some of the
queries.  Which indexes, if any, it uses will vary from query to
query based on the parameters in the query.

You can determine the query plan by prefixing the query with the
keyword "EXPLAIN."

Supposing you add the above 6 indexes, you could get query plans by
running the following:

>EXPLAIN SELECT * from trades where id = 9
>and c_id = 
>ORDER by s_id;
>
>EXPLAIN SELECT * from trades where id = 9
>and s_id = 0
>ORDER by created_on desc ;
>
>EXPLAIN SELECT * from trades where id = 9
>and s_id = 0
>and t_brief ~* 'more|than|one|word'
>ORDER by created_on desc ;

You may want to post the output to the list; learning to read query
planner output is a bit of an art, and you won't necessarily make the
right sense of the results on day #1...
-- 
output = ("cbbrowne" "@" "cbbrowne.com")
http://www3.sympatico.ca/cbbrowne/spreadsheets.html
Consciousness - that annoying time between naps.



Re: [GENERAL] a provocative question?

2007-09-06 Thread Chris Browne
[EMAIL PROTECTED] ("TJ O'Donnell") writes:
> I am getting in the habit of storing much of my day-to-day
> information in postgres, rather than "flat" files.
> I have not had any problems of data corruption or loss,
> but others have warned me against abandoning files.
> I like the benefits of enforced data types, powerful searching,
> data integrity, etc.
> But I worry a bit about the "safety" of my data, residing
> in a big scary database, instead of a simple friendly
> folder-based files system.
>
> I ran across this quote on Wikipedia at
> http://en.wikipedia.org/wiki/Eudora_%28e-mail_client%29
>
> "Text files are also much safer than databases, in that should disk
> corruption occur, most of the mail is likely to be unaffected, and any
> that is damaged can usually be recovered."
>
> How naive (optimistic?) is it to think that "the database" can
> replace "the filesystem"?

There is certainly some legitimacy to the claim; the demerits of
things like the Windows Registry as compared to "plain text
configuration" have been pretty clear.

If the "monstrous fragile binary data structure" gets stomped on, by
any means, then you can lose data in pretty massive and invisible
ways.  It's most pointedly true if the data representation conflates
data and indexes in some attempt to "simplify" things by having Just
One File.  In such a case, if *any* block gets corrupted, that has the
potential to irretrievably destroy the database.

However, the argument may also be taken too far.

-> A PostgreSQL database does NOT assemble data into "one monstrous
   fragile binary data structure."

   Each table consists of data files that are separate from index
   files.  Blowing up an index file *doesn't* blow up the data.

-> You are taking regular backups, right???

   If you are, that's a considerable mitigation of risks.  I don't
   believe it's typical to set up off-site backups of one's Windows
   Registry, in contrast...

-> In the case of PostgreSQL, mail stored in tuples is likely to get
   TOASTed, which changes the shape of things further; the files get
   smaller (due to compression), which changes the "target profile"
   for this data.

-> In the contrary direction, storing the data as a set of files,
   each of which requires storing metadata in binary filesystem data
   structures, provides an (invisible-to-the-user) interface to what
   is no more and no less than a "monstrous fragile binary data
   structure."

   That is, after all, what a filesystem is, if you strip out the
   visible APIs that turn it into open()/close()/mkdir() calls.

   If the wrong directory block gets "crunched," then /etc could get
   munched just like the Windows Registry could.

Much of the work going into filesystem efforts, over the last dozen
years, is *exceedingly* similar to the work going into managing
storage in DBMSes.  People working in both areas borrow from each other.

The natural result is that they live in fairly transparent homes in
relation to one another.  Someone who "casts stones" of the sort in
your quote is making the fallacious assumption that, because the fact
that a filesystem is a database of file information is kept largely
invisible, a filesystem is somehow fundamentally less vulnerable to
the same kinds of corruption.

Reality is that they are vulnerable in similar ways.

The one thing I could point to, in Eudora, as a *further* visible
merit that DOES retain validity is that there is not terribly much
metadata entrusted to the filesystem.  Much the same is true for the
Rand MH "Mail Handler", where each message is a file with very little
filesystem-based metadata.

If you should have a filesystem failure, and discover you have a
zillion no-longer-named files in lost+found, and decline to recover
from a backup, it should nonetheless be possible to re-process them
through any mail filters, and rebuild a mail filesystem that will
appear roughly similar to what it was like before.

That actually implies that there is *more* "conservatism of format"
than first meets the eye; in effect, the data is left in raw form,
replete with redundancies that can, in order to retain the ability to
perform this recovery process, *never* be taken out.

There is, in effect, more than meets the eye here...
-- 
(format nil "[EMAIL PROTECTED]" "cbbrowne" "acm.org")
http://linuxfinances.info/info/advocacy.html
"Lumping configuration data,  security data, kernel tuning parameters,
etc. into one monstrous fragile binary data structure is really dumb."
- David F. Skoll



Re: [GENERAL] Do AGGREGATES consistently use sort order?

2007-09-06 Thread Chris Browne
[EMAIL PROTECTED] (Gregory Stark) writes:
> "Webb Sprague" <[EMAIL PROTECTED]> writes:
>
>> I can always count on (note the order name):
>>
>> \a
>> oregon_2007_08_20=# select array_accum(name) from (select name from
>> placenames where desig='crater' order by name desc) a;
>> array_accum
>> {"Yapoah Crater","West Crater","Twin Craters","Timber Crater","Red
>> Crater","Newberry Crater","Nash Crater","Mount Mazama","Millican
>> Crater","Little Nash Crater","Le Conte Crater","Jordan
>> Craters","Diamond Craters","Coffeepot Crater","Cayuse Crater","Black
>> Crater","Big Hole","Belknap Crater"}
>> (1 row)
>>
>> I am interested in stitching a line out of points in postgis, but
>> the order/aggregate thing is a general question.
>
> Yes.
>
> You can even do this with GROUP BY as long as the leading columns of
> the ORDER BY inside the subquery exactly matches the GROUP BY
> columns.
>
> In theory we can't promise anything about future versions of
> Postgres but there are lots of people doing this already so if ever
> this was lost there would probably be some new explicit way to
> achieve the same thing.

Is there not some risk that the query planner might choose to do
hash-based accumulation, which could discard the subquery's ordering?

Under the visible circumstances, it's unlikely, but isn't it possible
for the aggregation to pick hashing and make a hash of this?
-- 
output = reverse("gro.mca" "@" "enworbbc")
http://linuxfinances.info/info/spiritual.html
If anyone ever  markets  a really  well-documented Unix that   doesn't
require  babysitting by a phalanx of  provincial Unix clones, there'll
be a  lot of unemployable,  twinky-braindamaged misfits out deservedly
pounding the pavements.



Re: [GENERAL] a provocative question?

2007-09-06 Thread Chris Browne
[EMAIL PROTECTED] ("Trevor Talbot") writes:
> There's also a point in regard to how modifications are made to your
> data store.  In general, things working with text files don't go to
> much effort to maintain durability like a real database would.  The
> most direct way of editing a text file is to make all the changes in
> memory, then write the whole thing out.  Some editors make backup
> files, or use a create-delete-rename cycle, but they won't
> necessarily force the data to disk -- if it's entirely in cache you
> could end up losing the contents of the file anyway.

In the case of Eudora, if its filesystem access protocol involves
writing a new text file, and completing that before unlinking the old
version, then the risk of "utter destruction" remains fairly low,
specifically because of the nature of the access protocol.

> In the general case on the systems I work with, corruption is a
> relatively low concern due to the automatic error detection and
> correction my disks perform, and the consistency guarantees of
> modern filesystems.  Interruptions (e.g. crashes or power failures)
> are much more likely, and in that regard the typical modification
> process of text files is more of a risk than working with a
> database.

Error rates are not so low that it's safe to be cavalier about this.

> I've also had times where faulty RAM corrupted gigabytes of data on
> disk due to cache churn alone.

Yeah, and there is the factor that as disk capacities grow, the
chances of there being errors grow (more bytes, more opportunities),
and along with that, the number of opportunities for broken checksums
to match by accident also grows.  (Ergo "don't be cavalier" unless you
can be pretty sure that your checksums are getting more careful...)

> It will always depend on your situation.  In both cases, you
> definitely want backups just for the guarantees neither approach can
> make.

Certainly.

> [way off topic]
> In regard to the Windows Registry in particular...
>
>> There is certainly some legitimacy to the claim; the demerits of
>> things like the Windows Registry as compared to "plain text
>> configuration" have been pretty clear.
>
>> -> You are taking regular backups, right???
>>
>>If you are, that's a considerable mitigation of risks.  I don't
>>believe it's typical to set up off-site backups of one's Windows
>>Registry, in contrast...
>
> Sometimes I think most people get their defining impressions of the
> Windows Registry from experience with the Windows 9x line.  I'll
> definitely agree that it was simply awful there, and there's much to
> complain about still, but...
>
> The Windows Registry in NT is an actual database, with a WAL,
> structured and split into several files, replication of some portions
> in certain network arrangements, redundant backup of key parts in a
> local system, and any external storage or off-site backup system for
> Windows worth its salt does, indeed, back it up.
>
> It's been that way for about a decade.

I guess I deserve that :-).

There is a further risk that is not directly mitigated by backups:
namely, that if you don't have some lowest common denominator that's
easy to recover from, you may not have a place to recover that data.

In the old days, Unix filesystems were sufficiently buggy and
corruptible that it was worthwhile to have an /sbin partition, all
statically linked, generally read-only, and therefore seldom
corrupted, to have as a base for recovering the rest of the system.

Using files in /etc, for config, and /sbin for enough tools to recover
with, provided a basis for recovery.

In contrast, there is definitely risk to stowing all config in a DBMS
such that you may have the recursive problem that you can't get the
parts of the system up to help you recover it without having the DBMS
running, but since it's corrupted, you don't have the config needed to
get the system started, and so we recurse...
-- 
let name="cbbrowne" and tld="linuxdatabases.info" in name ^ "@" ^ tld;;
http://www3.sympatico.ca/cbbrowne/linuxdistributions.html
As of next Monday, TRIX will be flushed in favor of VISI-CALC.
Please update your programs.



Re: [GENERAL] an other provokative question??

2007-09-06 Thread Chris Browne
[EMAIL PROTECTED] writes:
> Relational database pioneer says technology is obsolete
> http://www.computerworld.com/action/article.do?command=viewArticleBasic&articleId=9034619
> kindly explain how??

There are several spins relevant to this:

  1.  He's trying to sell His New Thing, and it certainly makes good
  copy to say "your old stuff is obsolete - buy our new stuff!"

  2.  There are problems with SQL which cause many to want something
  better.

  The thing is, SQL isn't necessarily particularly "relational";
  there is a purist view which says that it definitely *isn't*.

  http://en.wikipedia.org/wiki/RDBMS#Current_Usage

  They suggest that people interpret "relational" as implying:

 - System supports having collections of tables
 - System supports certain relational operators that work,
   in some contexts, on these tables.

  SQL provides that sort of thing, but is not as expressive as
  people would like.

  In effect, SQL has gotten hobbled so many ways over the years
  that people seem to find it easier to say "relational ==
  obsolete" than to try to explain that what they're trying to do
  is perhaps *more* faithful to the theoretical relational model
  than the existing products.
-- 
let name="cbbrowne" and tld="linuxdatabases.info" in name ^ "@" ^ tld;;
http://cbbrowne.com/info/nonrdbms.html
Rules of  the Evil Overlord #14. "The  hero is not entitled  to a last
kiss, a last cigarette, or any other form of last request."




Re: [GENERAL] Event-driven programming?

2007-09-12 Thread Chris Browne
[EMAIL PROTECTED] ("Jay Dickon Glanville") writes:
> On 9/12/07, Martijn van Oosterhout <[EMAIL PROTECTED]> wrote:
>> On Wed, Sep 12, 2007 at 12:36:22PM -0400, Jay Dickon Glanville wrote:
>> > Hello all.
>> >
>> > Is it possible for PostgreSQL to notify me of a successful transaction 
>> > commit?
>>
>> There is the LISTEN/NOTIFY mechanism. It's not automatic but I think it
>> does most of what you want.
>
> Thanks for the suggestion Martijn, but I get the impression from the
> documentation that LISTEN/NOTIFY is for client to client
> communication.  This type of communication isn't what I'm looking for.
>  I want server to client communication, where the server is notifying
> me on successful commit of transaction.

It may be closer to what you want than you think.

The server *does* notify listeners upon successful commit of a
transaction, and the timing is indeed right; the time of the
notification is COMMIT time, not before.
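
A quick way to see this, using two psql sessions (the channel name is
made up for the example):

  -- session A:
  LISTEN txn_done;

  -- session B:
  BEGIN;
  NOTIFY txn_done;
  -- at this point session A has received nothing...
  COMMIT;
  -- ...and only now is session A handed the notification.

(In session A, the notification shows up the next time that connection
interacts with the server.)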
-- 
output = ("cbbrowne" "@" "cbbrowne.com")
http://cbbrowne.com/info/nonrdbms.html
And me, with this terrible pain in all the diodes down my left side...
-- Marvin the Paranoid Android



Re: [GENERAL] Pgsql roles, SQL injection, and utility statements

2007-09-14 Thread Chris Browne
[EMAIL PROTECTED] (Chris Travers) writes:
> Since the utility statements are not parameterized, the easiest way to
> manage the roles in an application is to use stored procedures which
> EXECUTE strings to create SQL queries.   These EXECUTE statements
> include user-supplied data, and since these would generally run with
> some sort of administrative rights, I am worried about people doing
> things like:
> select * from add_user_to_role('username', 'rolename; drop table foo;');
>
> Is this a problem?  Is there a way to do this safely?

Sure - validate that 'rolename; drop table foo;' is the legitimate
name of a role, and raise an exception if it isn't.

And have the stored function use "double quotes" to make sure that the
names are suitably quoted.

That provides a "belt" and a "pair of suspenders" for this case...
-- 
output = ("cbbrowne" "@" "acm.org")
http://www3.sympatico.ca/cbbrowne/nonrdbms.html
I just removed the instructions in MC:COMMON;LINS > which specify that
it should be installed on AI.  We'll certainly  miss that machine, and
probably spend the rest of our lives fixing programs that mention it.



Re: [GENERAL] ON INSERT => execute AWK/SH/EXE?

2007-09-19 Thread Chris Browne
[EMAIL PROTECTED] ("Bima Djaloeis") writes:
> Hi there,
> I am new to PostgreSQL, is it possible to create something so that
> 1) If I insert / update / delete an item from my DB...
> 2) ... an awk / shell / external program is executed in my UNIX System?
> If yes, how do I do this and if no, thanks for telling.
> Thanks for reading, any help is appreciated.

I Would Not try to do that directly, as that could lead to arbitrary
numbers of processes getting scheduled, which could cause Plenty O
Heartburn.

I would instead suggest having a trigger in place that, upon each
such change, does the following:

 a) Insert an ID, if needed, into a work queue table.
(This may be optional.)

 b) Use NOTIFY to tell a process that uses LISTEN to wake up and
do whatever work is necessary, possibly processing *multiple*
items.

The LISTENING process needs to be prepared to process all the
queued-up work; that should lead to *vastly* more efficient processing
than spawning a worker for each item.
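
The trigger side of that might look like the following sketch, where
the table and channel names are invented for the example and only
INSERTs are handled:

  create table work_queue (
      id      serial primary key,
      item_id integer not null
  );

  create or replace function enqueue_work() returns trigger as $$
  begin
      insert into work_queue (item_id) values (new.id);
      notify work_available;    -- wakes the LISTENing worker, if any
      return new;
  end;
  $$ language plpgsql;

  create trigger watched_table_enqueue
      after insert on watched_table
      for each row execute procedure enqueue_work();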
-- 
"cbbrowne","@","acm.org"
http://www3.sympatico.ca/cbbrowne/rdbms.html
Rules of the  Evil Overlord #60. "My five-year-old  child advisor will
also  be asked to  decipher any  code I  am thinking  of using.  If he
breaks the code  in under 30 seconds, it will not  be used. Note: this
also applies to passwords." 



Re: [GENERAL] It's time to support GRANT SELECT,UPDATE,...,...,... ON database.* to username

2007-10-02 Thread Chris Browne
[EMAIL PROTECTED] ("Scott Marlowe") writes:
> About 75% of the time I see that response, it comes with the actual
> code to do just that.  I.e. cut and paste and voila, you've got the
> functions.
>
>> You write the function. Fuck the standard and
>> wake up.
>
> Me?  What about you?  The fact is there's a limited number of hackers
> capable of writing what you're asking for cleanly and correctly, and
> they're working on other stuff.  Asking them politely has been know to
> work.  Using the F word not so much.

"Feel free  to contribute build  files.  Or work on  your motivational
skills, and maybe someone somewhere will write them for you..."
-- "Fredrik Lundh" <[EMAIL PROTECTED]>

This is the usual sort of *right* answer to this...

It has tended to turn into recommendations to "write a function"
because the desired functionality is almost never a constant.  People
*claim* that they want to grant access to everything, but there are
commonly exceptions.

"Oh, but that table needs to be kept secure from the users..."

- Does it cover all tables?  Really?
- How about views?
- How about functions?  Operators?  
- What about the security definer functions?  Are they exceptions?

- How to deal with the exceptions that there are sure to be?

The trouble is that "GRANT ON *.*" seems to be a lazy shortcut for
someone who *thinks* they're trying to secure their system, but who
would rather say "well, everything" as opposed to looking at things
properly.

That is, if you don't know what tables and other objects need to be
secured, how can you have any idea that you're handling the securing
of your application properly???
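
For what it's worth, the "write a function" answer usually amounts to
a catalog query that generates the GRANTs, which has the virtue of
forcing you to spell out the exceptions.  A sketch, with the role name
and the exception list invented for the example:

  select 'grant select, update on '
         || quote_ident(schemaname) || '.' || quote_ident(tablename)
         || ' to appuser;'
    from pg_tables
   where schemaname = 'public'
     and tablename not in ('users', 'audit_log');  -- the exceptions

Feed the output back into psql, and you have your "grant on
everything" - minus whatever you consciously excluded.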
-- 
let name="cbbrowne" and tld="cbbrowne.com" in name ^ "@" ^ tld;;
http://linuxdatabases.info/info/nonrdbms.html
Should vegetarians eat animal crackers? 



Re: [GENERAL] improvement proposition

2007-10-16 Thread Chris Browne
[EMAIL PROTECTED] (Ron Johnson) writes:
> On 10/16/07 08:21, hubert depesz lubaczewski wrote:
>> hi,
>> would it be possible for someone to add "last query" for
>> pg_stat_activity view?
>> 
>> there are a lot of cases where we have "idle in transaction" sitting
>> for a long time, and since we don't log all queries it is next to
>> impossible to tell where in app code the problem lies.
>> it would be very useful to get something like "previous query".
>
> Transaction analysis is the way to go here.  It requires a serious
> code review, though.
>
>> is adding something like this possible?

I don't think it's likely to happen any time soon, certainly not for
"legacy" versions of PostgreSQL.  (e.g. - this might *conceivably* be
material for version 8.4, but it won't be happening for any databases
presently in production)

On The Other Hand, the last time this happened and I got "called into
action" (which wasn't terribly long ago), I did a search on pg_locks
to see what locks the "idle in transaction" connection was holding
onto, and that provided enough information for us to infer which part
of the user code the system was in.
-- 
(format nil "[EMAIL PROTECTED]" "cbbrowne" "linuxdatabases.info")
http://linuxfinances.info/info/linuxdistributions.html
Signs of a Klingon Programmer - 7. "Klingon function calls do not have
'parameters' - they have 'arguments' -- and they ALWAYS WIN THEM."



Re: [GENERAL] DROP VIEW lasts forever

2007-10-17 Thread Chris Browne
[EMAIL PROTECTED] ("Christian Rengstl") writes:
> Whenever I try dropping a view using DROP VIEW myschema.myview the
> query runs forever. The last time I tried was ten minutes ago and the
> query is still running even though the view queries just data of one
> single table with around 5 fields.
> What could be the reason?
>
> By the way, I'm using psql 8.2

Chances are that the problem was that some connection had a lock on
the VIEW.

As a rule, when something "takes forever" even though there is no
reasonable reason to expect it to run for a long time (in contrast to,
say, a cross join of two tables that each have a billion tuples), it
is entirely likely that you have run afoul of some locks, and you can
expect that NOTHING will happen until such time as whatever is holding
onto the locks lets go of them.
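
One way to find the culprit, sketched against the 8.2 catalogs (the
view name below is a placeholder; substitute your own):

  select l.pid, l.mode, l.granted, a.current_query
    from pg_locks l
    join pg_stat_activity a on a.procpid = l.pid
   where l.relation = 'myschema.myview'::regclass;  -- placeholder name

Any granted lock there belonging to some other connection is what is
making your DROP VIEW wait.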
-- 
output = reverse("gro.mca" "@" "enworbbc")
http://cbbrowne.com/info/nonrdbms.html
"Unlike computers, guns don't have Y2K problems..."



Re: [GENERAL] How to use 'toast'

2007-10-22 Thread Chris Browne
Sascha Bohnenkamp <[EMAIL PROTECTED]> writes:
> I tried to use large objects and saw some 'toast' while reading the
> documentation :)
>
> How do i use it?

You generally merely need to add data to your tables; if columns are
large enough, then PostgreSQL will consider TOASTing them without you
needing to take any action yourself.

In 8.4, it is fairly likely that there will be some extra "knobs" to
allow tuning TOAST usage so that you could encourage its use at lower
levels for particular tables/columns.

At present, the parameterization is at the #define level (that is,
compiled deep inside the binaries); if a tuple is >4096 bytes in size,
then TOASTing will be considered.
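
You can check whether a given table has acquired a TOAST side-table;
a small sketch (the table name is illustrative):

  select relname, reltoastrelid::regclass as toast_table
    from pg_class
   where relname = 'mytable'    -- placeholder name
     and reltoastrelid <> 0;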
-- 
select 'cbbrowne' || '@' || 'linuxfinances.info';
http://cbbrowne.com/info/lisp.html
When I met th'POPE back in '58, I scrubbed him with a MILD SOAP or
DETERGENT for 15 minutes.  He seemed to enjoy it ...



Re: [GENERAL] select random order by random

2007-11-01 Thread Chris Browne
[EMAIL PROTECTED] ("=?UTF-8?Q?piotr=5Fsobolewski?=") writes:
> I was very surprised when I executed this SQL query (under PostgreSQL 8.2):
> select random() from generate_series(1, 10) order by random();
>
> I thought I would receive ten random numbers in random order. But I received 
> ten random numbers sorted numerically:
>       random
> -------------------
>  0.102324520237744
>   0.17704638838768
>  0.533014383167028
>   0.60182224214077
>  0.644065519794822
>  0.750732169486582
>  0.821376844774932
>   0.88221683120355
>  0.889879426918924
>  0.924697323236614
> (10 rows)
>
> I don't understand - why the result is like that? It seems like in each row 
> both random()s were giving the same result. Why is it like that? What caused 
> it?

At first, I thought this was unsurprising, but it's pretty easy to
show that there's more going on than meets the eye...  It is a bit
more clear that something interesting is going on if you add extra
columns, and name them all.

For instance, consider:

[EMAIL PROTECTED]:5433=# select random() as r1, random() as r2, random() as r3 
from generate_series(1,10) order by r3;
        r1         |        r2         |        r3
-------------------+-------------------+-------------------
 0.246548388153315 | 0.700139089021832 | 0.119033687748015
 0.627153669018298 | 0.813135434407741 | 0.197322080843151
 0.306632998399436 | 0.545771937351674 |  0.25644090725109
 0.345610864460468 | 0.474996185861528 | 0.350776285864413
 0.580351672600955 | 0.673816084861755 | 0.443187412340194
  0.73298008274287 | 0.756699057295918 | 0.594754341989756
 0.932091740425676 | 0.383943342603743 | 0.749452064745128
 0.955010122619569 | 0.972370331641287 | 0.893978256732225
 0.675367069896311 | 0.800306641962379 | 0.922712546307594
 0.778622157406062 |  0.51328693702817 | 0.978598471730947
(10 rows)

You can see that it's ordering by the third column.

If I replicate your query, with extra columns, AND NAMES, I get the following:

[EMAIL PROTECTED]:5433=# select random() as r1, random() as r2, random() as r3 
from generate_series(1,10) order by random();
         r1         |         r2         |        r3
--------------------+--------------------+-------------------
 0.0288224648684263 |  0.904462072532624 |  0.27792159980163
  0.144174488261342 |  0.406729203648865 | 0.452183415647596
  0.263208176475018 |  0.752340068109334 | 0.927179601509124
  0.443778183776885 |  0.197728976141661 | 0.556072968058288
  0.613984462339431 | 0.0589730669744313 | 0.472951539326459
  0.641100264620036 |  0.152739099226892 | 0.528443300165236
  0.700987075921148 |  0.160180815029889 | 0.752044326625764
  0.778274529613554 |  0.579829142428935 | 0.078228241764009
  0.849023841321468 |  0.570575307123363 | 0.742937533650547
  0.870425369590521 |  0.837595224380493 | 0.986238476354629
(10 rows)

It is indeed somewhat curious that the query parser chose to interpret
that the "order by random()" was referring to column #1.
-- 
(format nil "[EMAIL PROTECTED]" "cbbrowne" "linuxfinances.info")
http://cbbrowne.com/info/lisp.html
"...I'm not one  of those who think Bill Gates is  the devil. I simply
suspect that if Microsoft ever met up with the devil, it wouldn't need
an interpreter." -- Nicholas Petreley, InfoWorld, Sept 16, 1996



Re: [GENERAL] INSERT performance deteriorates quickly during a large import

2007-11-08 Thread Chris Browne
[EMAIL PROTECTED] ("=?UTF-8?B?VG9tw6HFoSBWb25kcmE=?=") writes:
> Try to one of these:
>
> a) don't use INSERT statements, use a COPY instead
>
> b) from time to time run ANALYZE on the "public" table (say 1000
>inserts, then one analyze)
>
> c) create the table without constraints (primary / foreign keys in this
>case), import all the data, and then create the constraints
>
> The (b) and (c) may be combined, i.e. import without constraints and
> analyze from time to time. I'd probably try the (a) at first, anyway.
>
> Try to gather some more statistics - is the problem related to CPU or
> I/O? Use 'dstat' for example - this might say give you a hint in case
> the advices mentioned above don't help.

I agree with all but b).

- If you use COPY, that copies data "in bulk" which is *way* faster
  than submitting individual INSERT statements that must each be parsed.

  So I certainly agree with a).

- There are two prime reasons to expect the data load to slow down:

  1. Because adding entries to the index gets more expensive the
 larger the table gets;

  2. Because searching through foreign key constraints tends to get
 more expensive as the target table grows.

  Those point to doing c).

If you put off evaluating indices and foreign key constraints until
all of the data is loaded, there should be no need to run ANALYZE
during the COPY process.

And there should be no reason for loading data to get more costly as
the size of the table increases.
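
Put together, the load might look like the following sketch; the
table, file, and referenced table are all invented for the example:

  -- create the table bare, with no indexes or constraints:
  create table orders (id integer, cust_id integer, payload text);

  -- bulk-load via COPY rather than row-at-a-time INSERTs:
  copy orders from '/tmp/orders.dat';

  -- only now build the index and constraints, once, over all the data:
  alter table orders add primary key (id);
  alter table orders
      add foreign key (cust_id) references customers (id);
  analyze orders;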
-- 
let name="cbbrowne" and tld="linuxfinances.info" in name ^ "@" ^ tld;;
http://linuxdatabases.info/info/advocacy.html
Rules of the Evil Overlord #116.  "If I capture the hero's starship, I
will keep it in  the landing bay with the ramp down,  only a few token
guards on  duty and a ton  of explosives set to  go off as  soon as it
clears the blast-range." 



Re: [GENERAL] select with recursive support

2007-11-21 Thread Chris Browne
Sascha Bohnenkamp <[EMAIL PROTECTED]> writes:
>> I saw on the todo list that the "with recursive" option for selects
>> (equivalent to oracle's connect by, as far as I know) is on the todo
>> list for postgresql, but apparently not for the upcoming 8.3 release.
> the postgresql derivative enterprisedb has this feature
> (exactly the oracle syntax)

What is intended for PostgreSQL is decidedly NOT the "Oracle syntax"
(which more or less predetermines a special case of recursion), but
rather the "WITH [definition] select [which can reference
[definition]]" approach seen both in DB2 and the SQL 2003 standard.

There have been a couple of abortive starts to this; I seem to recall
someone having offered to work on it in the 8.4 branch.
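
For reference, the SQL 2003 shape looks like this; the emp table and
its columns are invented for the example, and (at the time of writing)
PostgreSQL itself will not run it:

  with recursive subordinates (id, boss_id) as (
      select id, boss_id from emp where id = 1    -- the starting row
    union all
      select e.id, e.boss_id
        from emp e
        join subordinates s on e.boss_id = s.id   -- walk the hierarchy
  )
  select * from subordinates;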
-- 
select 'cbbrowne' || '@' || 'linuxfinances.info';
http://linuxfinances.info/info/oses.html
"The  Unix philosophy  is  to  provide some  scraps  of  metal and  an
enormous roll of duct tape.  With those -- and possibly some scraps of
your own -- you can conquer the world." -- G. Sumner Hayes



Re: [GENERAL] replication in Postgres

2007-11-26 Thread Chris Browne
[EMAIL PROTECTED] ("Jeff Larsen") writes:
>> Alvaro Herrera wrote:
>> > Glyn Astill wrote:
>> >> Thanks everyone for your replies. EnterpriseDB looks like the way to
>> >> go if we want good replication.
>> >
>> > Sorry, this makes no sense to me -- EnterpriseDB has no replication
>> > solution that I know of.
>>
>> Yeah, there is:
>>
>> http://www.enterprisedb.com/products/enterprisedb_replication.do
>
> Yes, but I'd like something better than "near real time" as the above
> page describes. Or maybe someone could clarify that. Besides,
> EnterpriseDB does not save me enough money. In my current commercial
> DB, if a transaction is committed on the master, it is guaranteed to
> be committed to the secondary. In our business, losing one customer
> order could lose us the customer for good.

I believe that what they are using is a version of Slony-I, which
certainly falls into the "near real time" replication category.

Historically, when people think they require "something better than
near-real-time," they frequently find that the "something better"
turns out to be too expensive to live with.

"Near real time" usually refers to the notion of asynchronous
replication, where it is a little bit nondeterministic how far behind
a replica may be.  (Which is definitely the case for Slony-I.)

Unfortunately, the only way to make things deterministic (or to get
from "near real time" to "*GUARANTEED* real time") is to jump to
synchronous replication, which is not much different from 2PC (Two
Phase Commit), and which is certain to be prohibitively expensive
across a WAN.

At this point, I tend to get visions of Tom Cruise telling Jack
Nicholson, "I want real time replication!", and getting the response:
"You can't HANDLE real time replication!"
-- 
(format nil "[EMAIL PROTECTED]" "cbbrowne" "linuxfinances.info")
http://cbbrowne.com/info/slony.html
"Any sufficiently complicated C or Fortran program contains an ad hoc
informally-specified bug-ridden slow implementation of half of Common
Lisp."  -- Philip Greenspun 



Re: [GENERAL] replication in Postgres

2007-11-26 Thread Chris Browne
[EMAIL PROTECTED] (Erik Jones) writes:
> Since no one's mentioned it, and while I don't have any personal
> experience with it, I thought I'd mention the recently released
> Bucardo (http://bucardo.org/) as another Postgres replication option.

It's Yet Another Asynchronous Replication System, ergo as
unsatisfactory for "forcibly real time" requirements as any of the
other async systems...
-- 
let name="cbbrowne" and tld="linuxfinances.info" in name ^ "@" ^ tld;;
http://cbbrowne.com/info/advocacy.html
Self Reference is its Own Reward
"If tautologies do not convey information, mathematicians would not be
surprised by them."
-- Mark Miller



Re: [GENERAL] replication in Postgres

2007-11-26 Thread Chris Browne
[EMAIL PROTECTED] (Glyn Astill) writes:
> Is it possible to get a system that does synchronous replication and
> also allows slaves to catch up if they're down for a period of time,
> like you can with asynchronous?

Well, a "modal approach" is possible - that's what Postgres-R tries to
do.

Of course, once you drop into a mode that "allows slaves to catch up,"
then you have given up on synchronicity, and have fallen back to
asynchronous replication.

If you systematically have a way to do that, then you no longer have a
replication system that can honestly be called "synchronous."  

If it's *truly* synchronous, then when nodes fall over, the system
MUST stop accepting transactions.
-- 
output = reverse("ofni.secnanifxunil" "@" "enworbbc")
http://www3.sympatico.ca/cbbrowne/linux.html
Signs of  a Klingon Programmer - 2.  "Specifications are  for the weak
and timid!"



Re: [GENERAL] Import LDAP data to a Postgres database

2007-12-06 Thread Chris Browne
[EMAIL PROTECTED] (Marco Curtolo) writes:
> Hi, my question is this: is it possible to query, using SQL, an LDAP
> server and put these data into a table of a Postgres database (under
> linux) ???

Unfortunately, the LDAP model is more or less a "network model," which
doesn't fit terribly elegantly onto SQL.

You can in fact use PostgreSQL as the backend for OpenLDAP; there's a
HOWTO on this:

  

There are also ways of exporting SQL data via LDAP; see here:

  
-- 
select 'cbbrowne' || '@' || 'linuxdatabases.info';
http://www3.sympatico.ca/cbbrowne/sap.html
Microsoft Outlook: Deploying Viruses Has Never Been This Easy!



Re: [GENERAL] Replication Monitoring

2007-12-07 Thread Chris Browne
[EMAIL PROTECTED] (Glyn Astill) writes:
> [posted again as it found its way into another thread]
>
> Hi people,
>
> I intend to set up two slave servers, one using WAL shipping and one
> using Slony I.
>
> Are there any good tools, or scripts that'll help us check that both
> replication methods are working?
>
> I know there's Nagios - but what does this actually allow us to
> monitor?
>
> Also if I want to make backups from the slaves to save master
> downtime / load what are my options?

The Slony-I documentation describes how one might use MRTG to graph
the output of the "sl_status" view, which can be used to analyze, over
time, how far behind the Slony-I subscriber nodes are.

You could correspondingly set up a query against sl_status to warn of
problems.

In practice, we've often found it more useful to look at this from an
application standpoint, that is, to find some table that is always
getting updated, such as an ongoing transaction log, and raise alerts
if the age of the last entry grows higher than expected.
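
Both checks are plain SQL.  As a sketch (the cluster schema
"_mycluster" and the transaction_log table are illustrative):

  -- the Slony-I view; lag should stay small and bounded:
  select st_origin, st_received, st_lag_num_events, st_lag_time
    from _mycluster.sl_status;

  -- the application-level check described above:
  select case when now() - max(created_on) > interval '10 minutes'
              then 'ALARM' else 'OK' end as status
    from transaction_log;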

Unfortunately, that may not work for everyone - we have, in our online
systems, some mandated monitoring that introduces transactions every
5-10 minutes, whether anything "real" is going on or not.  If you
don't have that, and there is a quiet period when the system sees zero
activity for 8 hours, alerts are likely to go off :-(.

vis-a-vis grabbing backups from slaves...

You can NOT grab schema dumps there, because they're hacked up
(pg_catalog gets modified a fair bit), but you could grab the schema
from the master node, and do just a dump of data from a subscriber
node.

I'd want to test the reload process before signing off on any promises
that it's working, but it should all be fine...
-- 
(format nil "[EMAIL PROTECTED]" "cbbrowne" "acm.org")
http://www3.sympatico.ca/cbbrowne/lisp.html
"NT 5.0 is the last nail in the Unix coffin. Interestingly, Unix isn't
in the coffin... It's wondering what the heck is sealing itself into a
wooden box 6 feet underground..." -- Jason McMullan



Re: [GENERAL] slony question

2007-12-10 Thread Chris Browne
[EMAIL PROTECTED] ("Josh Harrison") writes:
> Does slony support postgres major version upgrades? I.e., will it
> replicate between different major versions?

Yes, that's one of the major "use cases" for Slony-I.

Versions 1.0 and 1.1 support (loosely) PostgreSQL 7.3 through 8.1-ish.

Version 1.2 dropped 7.3 support, but supports versions up to 8.3 (just
about; a patch not yet in any 1.2 release is needed for late-breaking
8.3 changes).

You may have to be careful about which version of Slony-I you select
if you have a really old PG database.
-- 
select 'cbbrowne' || '@' || 'acm.org';
http://www3.sympatico.ca/cbbrowne/lisp.html
"What you  said you   want to do  is  roughly  equivalent to   nailing
horseshoes to the tires of your Buick."  -- [EMAIL PROTECTED] on
the question "Why can't Linux use Windows Drivers?"

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [GENERAL] slony question

2007-12-13 Thread Chris Browne
[EMAIL PROTECTED] ("Scott Marlowe") writes:
> On Dec 10, 2007 2:19 PM, Chris Browne <[EMAIL PROTECTED]> wrote:
>> [EMAIL PROTECTED] ("Josh Harrison") writes:
>> > Does slony support postgres major version upgrade ? ie., will it
>> > replicate between different major versions?
>>
>> Yes, that's one of the major "use cases" for Slony-I.
>>
>> Version 1.0 and 1.1 support (loosely) from PostgreSQL 7.3 to 8.1-ish.
>>
>> Version 1.2 dropped 7.3 support, but supports versions up to 8.3 (just
>> about; a patch not yet in any 1.2 release is needed to cope with
>> late-breaking 8.3 changes).
>
> Didn't 1.2 drop support for pg 7.4 as well?  I thought I remembered
> reading that some time ago.

Well, that would mean that some of the testing I have been continuing
to do would be pretty useless ;-).

We still have one instance of 7.4 in production, where we'll want to
migrate to 8.something in the new year.  It still seems nice to have
the option of possibly upgrading to new Slony-I before upgrading to
8.x, rather than being forced to do otherwise ;-).
-- 
output = reverse("ofni.sesabatadxunil" "@" "enworbbc")
http://linuxfinances.info/info/sgml.html
Change is inevitable, except from a vending machine. 

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org/


Re: [GENERAL] Partitioned tables & Slony

2007-12-18 Thread Chris Browne
Goboxe <[EMAIL PROTECTED]> writes:
> What need to be taken care of when replicating data from partitioned
> tables?
>
> I have several master tables that are inherited by date either daily,
> weekly and monthly.
>
> How to automate addition of newly created child tables into Slony
> cluster?

There's an outline of how to do it in the partitioning test...

http://main.slony.info/viewcvs/viewvc.cgi/slony1-engine/tests/testpartition/
-- 
"cbbrowne","@","linuxdatabases.info"
http://linuxfinances.info/info/slony.html
"If we believe in data structures, we must believe in independent
(hence simultaneous) processing.  For why else would we collect items
within a structure?  Why do we tolerate languages that give us the one
without the other?" -- Alan J. Perlis

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [GENERAL] Partitioned tables & Slony

2007-12-19 Thread Chris Browne
Goboxe <[EMAIL PROTECTED]> writes:
> Thanks Chris.
>
> How do I use *.sh & *.ik files?

The "*.sh" files are shell scripts that should run with any Bourne
shell; the *specific* intent is that they be invoked by the test
control script, "run_test.sh".

The "*.ik" files are (mostly) bodies of Slonik scripts; they require a
preamble that indicates cluster name and node connection information.

What you might want to do is to run the partitioning test.  You'd do
this by making sure you have PGPORT pointing to a postmaster with
Slony-I installed, and PGUSER set to a superuser, then, from the test
directory, run the command:

$ ./run_test.sh testpartition

This will run various scripts, using the ones in the "testpartition"
directory to control the test.

The script "generate_dml.sh" shows how any of the special scripts get
used.

The point of this is that it represents a regression test that
verifies that various aspects of replication work in conjunction with
table partitioning.  You won't be able to take the code directly, but
the code does represent an automated test that builds a cluster that
involves partitioning, and adds in new partitions "on the fly."
-- 
let name="cbbrowne" and tld="linuxfinances.info" in name ^ "@" ^ tld;;
http://linuxfinances.info/info/sgml.html
"The only thing  better than TV with the  sound off is  Radio with the
sound off." -- Dave Moon

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [GENERAL] Hinting the planner

2008-01-02 Thread Chris Browne
[EMAIL PROTECTED] (Martin Gainty) writes:
> Not the planner but you can hint the query as in this example
> select /*+ ordered use_hash(code1)*/ * from table_1;

That might have some effect with Oracle; is there some reason why you
think this would be expected to have any effect on a PostgreSQL query?
-- 
select 'cbbrowne' || '@' || 'acm.org';
http://www3.sympatico.ca/cbbrowne/x.html
Rules of the Evil Overlord #178. "If  I have the  hero cornered and am
about to finish him off and he says "Look out behind you!!" I will not
laugh and  say "You  don't expect me  to fall  for that old  trick, do
you?" Instead I will take a step to the side and half turn. That way I
can still  keep my  weapon trained on  the hero,  I can scan  the area
behind me, and  if anything was heading for me it  will now be heading
for him." 

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [GENERAL] Feature request: NOTIFY enhancement

2008-01-03 Thread Chris Browne
[EMAIL PROTECTED] (Bruce Momjian) writes:
>> I am not sure if I am asking too much but does it make sense, and is it
>> possible, to enhance NOTIFY that process "name/value" pair? Like this:
>> 
>> NOTIFY "MyName=MyValue";
>> 
>> With the capability of name/value pair, the listener will not have to
>> check notification details by reading tables.
>
> TODO already has:
>
>   * Add optional textual message to NOTIFY
>   
> This would allow an informational message to be added to the notify
> message, perhaps indicating the row modified or other custom
> information.

I keep wondering if this isn't just going to grow into pgq...

  

In effect, once you start defining more semantics for this, I think it
rapidly transforms into a message queueing system, and you might as
well go straight to something full-fledged like pgq...
-- 
let name="cbbrowne" and tld="cbbrowne.com" in name ^ "@" ^ tld;;
http://www3.sympatico.ca/cbbrowne/lisp.html
STATED REASON DOES NOT COMPUTE WITH PROGRAMMED FACTS...

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [GENERAL] [OT] Slony (initial) Replication - Slow

2008-01-04 Thread Chris Browne
[EMAIL PROTECTED] (Andrew Sullivan) writes:
> On Thu, Jan 03, 2008 at 11:15:23AM +0800, Ow Mun Heng wrote:
>> I'm just wetting my hands with slony and during the setup of the slave,
>> I did and dump and restore of the master DB to the Slave DB.
>
> Nope, you don't need to do that.  You need a copy of the _schema_ on the
> target machine.  But slony will remove all the contents and build the
> replica anew.

Right.  The argument for doing so is that this approach (TRUNCATE +
COPY on the subscriber) is the only way that Slony-I can be certain
that it has all data on the subscriber that was on the provider.

That way, it doesn't need to trust any dodgy claims that "oh, I copied
all the data - honest!"

>> can someone confirm this? It _is_ taking long time (for slony) to do the
>> \copy (~60GB in multiple tables being replicated, including (on the fly)
>> index creation)
>
> It takes approximately the same time as it would to do a psql -h
> [remotehost] -f dumpfile.sql restore (i.e. copying the entire data
> contents across the network).

In 1.2.x, it should be a little bit quicker than the "pg_dump | psql"
approach as all index generation takes place together for each table.

When you do a restore of a pg_dump, the indexes are generated in a
somewhat arbitrary order, where there may be a separation in time
between when different indexes on a given table get created.

In contrast, Slony-I regenerates all the indexes on a given table in a
"one swell foop" fashion, which might be expected to allow cacheing to
provide a bit better performance than you could get with "pg_dump |
psql".
-- 
"cbbrowne","@","cbbrowne.com"
http://linuxdatabases.info/info/emacs.html
Microsoft Outlook: Deploying Viruses Has Never Been This Easy!

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [GENERAL] large objects,was: Restoring 8.0 db to 8.1

2008-01-08 Thread Chris Browne
[EMAIL PROTECTED] ("Harald Armin Massa") writes:
>> Not likely to change in the future, no.  Slony uses triggers to manage the
>> changed rows.  We can't fire triggers on large object events, so there's no
>> way for Slony to know what happened.
>
> that leads me to a question I often wanted to ask:
>
> is there any reason to create NEW PostgreSQL databases using Large
> Objects, now that there is bytea and TOAST? (besides of legacy needs)
>
> as much as I read, they take special care in dump/restore; force the
> use of some special APIs on creating, do not work with Slony 

They are useful if you really need to be able to efficiently access
portions of large objects.

For instance, if you find that you frequently need to modify large
objects, in place, that should be much more efficient using the LOB
interface than it would be using a bytea column.

It ought to be a lot more efficient to lo_lseek() to a position,
lo_read() a few bytes, and lo_write() a few bytes than it is to pull
the entire 42MB object out, read off a fragment, and then alter the
tuple.
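
For illustration, a rough sketch using the server-side large object
functions (the OID is made up; the descriptor returned by lo_open() is
assumed to come back as 0; and it must all run in one transaction):

  BEGIN;
  SELECT lo_open(16385, 393216);        -- INV_READ | INV_WRITE
  SELECT lo_lseek(0, 20000000, 0);      -- seek ~20MB in; whence 0 = SEEK_SET
  SELECT loread(0, 16);                 -- read 16 bytes at that position
  SELECT lowrite(0, 'patched'::bytea);  -- overwrite a few bytes in place
  SELECT lo_close(0);
  COMMIT;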

That being said, I generally prefer bytea because it doesn't force me
into using a pretty weird "captive interface" to access the data.

If I found myself needing to make wacky updates on a large object, I'd
wonder if it wouldn't be better to have it expressed as a set of
tuples so that I'd not have a large object in the first place...
-- 
(format nil "[EMAIL PROTECTED]" "cbbrowne" "linuxdatabases.info")
http://www3.sympatico.ca/cbbrowne/x.html
"...  They are  not ``end  users'' until  someone presupposes  them as
such, as witless cattle." -- <[EMAIL PROTECTED]>

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] count(*) and bad design was: Experiences with extensibility

2008-01-09 Thread Chris Browne
[EMAIL PROTECTED] (Zoltan Boszormenyi) writes:
> which will be fast and depending on the initial value of COUNT(*)
> it will be very close to the exact figure. You can extend the example
> with more columns if you know your SELECT COUNT(*) ... WHERE
> conditions in advance but this way you have to keep several administrative
> tables for different monitored tables. Again, this trades some disk space
> and INSERT/DELETE operation speed on the monitored tables for
> quicker count.

Actually, this approach will be Really Terrible for any cases where
multiple connections are adding/deleting tuples concurrently, as it
will force ALL updates to serialize behind the update to the central
table.

Occasionally, you'll have something even worse, namely a deadlock,
where two or more of the updates fighting over the single summary
tuple fall into a bad state, and one of them is forced to give up,
potentially rolling back its whole transaction.

[Waving hands for a moment]

What I would do *instead* would be for each INSERT to add a tuple with
a count of 1, and for each DELETE to add a tuple with count of -1, and
then to periodically have a single process walk through to summarize
the table.  There may be a further optimization to be had by doing a
per-statement trigger that counts the number of INSERTs/DELETEs done,
so that inserting 30 tuples (in the table being tracked) leads to
adding a single tuple with count of 30 in the summary table.

That changes the tradeoffs, again...

 - Since each INSERT/DELETE is simply doing an INSERT into the summary
   table, the ongoing activity is *never* blocking anything

 - You get the count by requesting 
     SELECT SUM(rec_cnt) AS record_count FROM record_count WHERE tablename = 'foo';

 - Note that the query is MVCC-consistent with the table!

 - Once in a while, you'll want to run a single procedure that, for
   each table, deletes all the existing records, and replaces them
   with a single one consisting of the sum of the individual values.

 - You can re-sync a table by running the query:
     begin;
       delete from record_count where tablename = 'foo';
       insert into record_count(tablename, rec_cnt)
         select 'foo', (select count(*) from foo);
     commit;
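
A minimal sketch of the insert-a-delta machinery, using the same
record_count table as above (the tracked table "foo" is hypothetical,
and this is the row-level version, leaving aside the per-statement
optimization):

  create table record_count (tablename text, rec_cnt bigint);

  create function track_count() returns trigger as $$
  begin
    if TG_OP = 'INSERT' then
      insert into record_count values (TG_TABLE_NAME, 1);
    else
      insert into record_count values (TG_TABLE_NAME, -1);
    end if;
    return NULL;  -- AFTER trigger, so the return value is ignored
  end;
  $$ language plpgsql;

  create trigger foo_count after insert or delete on foo
    for each row execute procedure track_count();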
-- 
output = reverse("ofni.sesabatadxunil" "@" "enworbbc")
http://www3.sympatico.ca/cbbrowne/languages.html
Rules of  the Evil Overlord #174.  "If I am dangling  over a precipice
and the hero reaches  his hand down to me, I will  not attempt to pull
him down with  me. I will allow him to rescue  me, thank him properly,
then return to the safety of my fortress and order his execution."


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org/


Re: [GENERAL] LIKE and REGEX optimization

2008-01-15 Thread Chris Browne
Kico Zaninetti <[EMAIL PROTECTED]> writes:
> Hi all.
>
> I have a database with 62 million registers and I have to make a
> SELECT using LIKE.
>
> This is my select:
> SELECT * FROM phone WHERE name LIKE = '%ZANINETTI%' AND city = 'SAO
> PAULO' AND state = 'SP'
>
> I have an index created like this:
> CREATE INDEX "telefones_idx2" ON "public"."phone"
>   USING btree ("name" varchar_pattern_ops, "city", "state");
>
> When I explain the query I have this:
> QUERY PLAN
> Bitmap Heap Scan on telefones  (cost=1031528.27..2726942.75 rows=4
> width=145)
>   Recheck Cond: (((city)::text = 'SAO PAULO'::text) AND ((state)::text
> = 'SP'::text))
>   Filter: ((name)::text ~~ '%ZANINETTI%'::text)
>   ->  Bitmap Index Scan on telefones_idx2  (cost=0.00..1031528.27
> rows=1712760 width=0)
> Index Cond: (((city)::text = 'SAO PAULO'::text) AND
> ((state)::text = 'SP'::text))
>
>
> The cost is over 1 million! It's too high and I have to reduce it.
> Does someone know how can I make it?

You may be misreading what the cost is...

It looks to me like the planner is estimating that there are 1712760
rows where city = 'SAO PAULO' and state = 'SP'; is that estimate way
off?

If it is, then maybe you need to ANALYZE the table, perhaps combined
with altering the histogram size for city/state.

(e.g. - alter table phone alter column city set statistics 100;
alter table phone alter column state set statistics 100;)

If altering statistics, then re-ANALYZEing the table helps, that may
resolve things.

Unfortunately, the LIKE clause isn't going to take advantage of
ordinary indexes: with '%' at both the beginning and the end of the
pattern, a btree index cannot be used at all.

I wouldn't be too surprised if that is the case; Sao Paulo is about
the most populous South American city, with ~11M people, so having a
database with 1.7M phone numbers in that city does not seem "out
there."

The only idea that comes to mind to follow that is to look into
tsearch.  On PostgreSQL 8.3, it's built in; on 8.2 and earlier, it's a
"contrib" module.  It allows indexing on words inside columns, which
would seem to fit your requirement.
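
A sketch of what that looks like on 8.3, using the same table and
columns as above (note that this matches whole words, such as a
surname, rather than arbitrary substrings):

  CREATE INDEX phone_name_fts ON phone
      USING gin (to_tsvector('simple', name));

  SELECT * FROM phone
   WHERE to_tsvector('simple', name) @@ to_tsquery('simple', 'zaninetti')
     AND city = 'SAO PAULO' AND state = 'SP';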
-- 
select 'cbbrowne' || '@' || 'linuxfinances.info';
http://cbbrowne.com/info/sap.html
Signs  of  a  Klingon  Programmer #3:  "By   filing this  TPR you have
challenged the honor of my family. Prepare to die!"

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [GENERAL] [OT] Slony Triggers pulling down performance?

2008-01-18 Thread Chris Browne
[EMAIL PROTECTED] (Ow Mun Heng) writes:
> Just wondering if my 'Perceived' feeling that since implementing slony
> for master/slave replication of select tables, my master database
> performance is getting slower.
>
> I'm constantly seeing a very high amount of IO wait. ~40-80 according to
> vmstat 1
>
> and according to atop. (hdb/hdc = raid1 mirror)
> DSK | hdb | busy 83% | read 1052 | write 50 | avio 7 ms |
> DSK | hdc | busy 81% | read 1248 | write 49 | avio 6 ms |

The triggers generate some extra I/O, as they go off and write tuples
into sl_log_1/sl_log_2, so there's certainly a cost, there.

When you pull data from sl_log_1/sl_log_2, that will have a cost, too.

Replication does not come at zero cost...
-- 
(format nil "[EMAIL PROTECTED]" "cbbrowne" "linuxdatabases.info")
http://www3.sympatico.ca/cbbrowne/finances.html
"Power tends  to corrupt and absolute power  corrupts absolutely."  
-- First Baron Acton, 1834 - 1902

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [GENERAL] NoSQL -vs- SQL

2010-10-12 Thread Chris Browne
carlos.menn...@gmail.com (Carlos Mennens) writes:
> Just wondering how you guys feel about NoSQL and I just wanted to
> share the following article...
>
> http://www.linuxjournal.com/article/10770
>
> Looking to read your feedback and / or opinions.

There's a lot there to agree with, and a lot to "not disagree" with.

A couple of places I'd dispute a bit...

  "So, what is the opposite of ACID? BASE (Basically Available,
   Soft-state, Eventual consistency), of course."

No, that's not the "opposite," by any means.  

BASE represents a particular model for relaxation of the ACID model.
Indeed, it's more or less an academic codifying of the consequences of
systems like MQ-Series, which use message queues to allow several ACID
databases to talk to each other, relaxing the "total consistency" of
cross-DB transactions.

  "On the NoSQL side, the only thing in common is that there is nothing
  in common. Each NoSQL database has its own set of APIs, libraries and
  preferred languages for interacting with the data they contain. With
  an RDBMS, it is trivial to get data out in whatever format you need
  using whatever programming language you like best. Your choice of a
  NoSQL database might limit you to one or a handful of programming
  languages and access methods."

This is true, but is further, essentially, a step backwards into the way
databases were handled before SQL.  Back in the Old Days, your database
probably used IMS or CODASYL or some B-Tree library (hmm...  anything
similar here to SleepycatDB versus MongoDB versus Cassandra versus Tokyo
Cabinet?  :-)).

And the only way to extract that data was by writing a program to
traverse the data structure.  At best, you could choose between COBOL
and PL/1, and you'd have to write a separate program for each custom
retrieval.

The move from that to SQL was considered a big step *AHEAD*, because
SQL, being a declarative representation, was inherently a whole lot more
language/OS/framework-independent.  Whoops, "MongoDB is web scale"[1]
breaks all that.

"No schema" means there's no declarative representation, so the notion
of the same data being reusable without having to duplicate the
traversal logic goes away.

What I consider *particularly* bad is that this fails to recognize that
"NoSQL" represents, in considerable part, a throwback to the database
strategies of the 1960s that were consciously rejected in the adoption
of SQL.

[1]  http://andrewjstevens.com/2010/09/mongodb-is-web-scale/
-- 
(reverse (concatenate 'string "moc.liamg" "@" "enworbbc"))
"Any sufficiently complicated C or Fortran program contains an ad hoc
informally-specified bug-ridden slow implementation of half of Common
Lisp."  -- Philip Greenspun 

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] NoSQL -vs- SQL

2010-10-19 Thread Chris Browne
dp...@pgadmin.org (Dave Page) writes:
> On Tue, Oct 12, 2010 at 2:58 AM, Peter C. Lai  wrote:
>> On 2010-10-11 05:57:37PM -0600, David Boreham wrote:
>>>   On 10/11/2010 5:46 PM, Carlos Mennens wrote:
>>> > Just wondering how you guys feel about NoSQL and I just wanted to
>>> > share the following article...
>>> >
>>> > http://www.linuxjournal.com/article/10770
>>> >
>>> > Looking to read your feedback and / or opinions.
>>> http://www.xtranormal.com/watch/6995033/
>>> (warning: may not be sfw).
>>
>> Someone should (or probably has) made one that sounds exactly the same,
>> except for replacing the Mongo guy with MySQL and the MySQL guy with
>> PostgreSQL. That might be more apropos all around ;)
>
> Someone did indeed do that:
> http://nigel.mcnie.name/blog/mysql-is-a-database (also nsfw, iirc)

Alas, while it's somewhat funny, it's mighty clear that it's a
second-degree derivation, which rather diminishes its power.  

It gets confused as to who's the questioner, which yanks some sense out
of it.  At the start, the MySQL aficionado is the speaker, taking
questions, but at the end, somehow the Postgres guy ends up thanking
everyone for their questions.

To make this work requires that it be rather carefully done; sadly,
sufficient care doesn't seem to have been taken :-(.
-- 
output = ("cbbrowne" "@" "gmail.com")
http://linuxfinances.info/info/x.html
In case you weren't aware, "ad homineum" is not latin for "the user of
this technique is a fine debater." -- Thomas F. Burdick

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Database INNOVATION

2010-10-19 Thread Chris Browne
mcham...@gmail.com (Mauricio Chamati) writes:
> Postgree is the most amazing DB, even more it will be the only one that will
> remain free (the good ones) as MySQL has been taken. In order to move on with
> this project, as an Java Senior Architect, I am recomming to you guys to have
> an special "Section" for Audit tables. As we separate sequence in another seq,
> we should do the same for Audit tables. They have their own behavior, are less
> used, and so on. I got in the mailing list and am sending the email just to
> give this idea for you guys, keep in mind and implement as you want.

There is value to there being a tutorial (or such) available to explain
strategies for doing this.

I'm not so sure that this is something that properly belongs in the
Postgres reference documentation, as:

  - There are various strategies in the handling of auditing data, and
no forcibly normative/universal solutions;

  - This isn't an explanation of how to use Postgres (ergo part of the
"user guide" aspect of the documentation);

  - This isn't an aspect of the standard behaviour of Postgres (ergo
part of the "reference manual" aspect of the documentation).

It might be sort of neat to have another "Part" to the documentation
that consists of stuff like:
  - Design patterns
  - Design antipatterns (to avoid!)
  - Cookbook
  - Implementation strategies

(and there is overlap between those 4 things, to be sure!)

But that's not a portion of the Official Documentation that presently
exists, so I'm not sure it properly fits.

People have written articles on this sort of thing in the past,
including:

 - Elein Mustain, with material at varlena.com
 - Postgres Online Journal 
 - Articles blogged at Planet PostgreSQL
-- 
select 'cbbrowne' || '@' || 'gmail.com';
http://www3.sympatico.ca/cbbrowne/postgresql.html
"There are no threads in a.b.p.erotica, so there's no  gain in using a
threaded news reader."  -- (Unknown source)

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Linux

2010-11-04 Thread Chris Browne
mgo...@intermodalsoftwaresolutions.net (Michael Gould) writes:
> What and why should I look at certain distributions?  It appears from
> what I read, Ubuntu is a good desktop but not a server.

There are Ubuntu versions that don't promise support (e.g. - ongoing bug
& security fixes, and such) for nearly as long as one might like.

The sorts of distributions that do promise such things for longer
include:
  - Red Hat RHAS and such;
  - OpenSuSE;
  - CentOS;
  - Debian

You'll find people that are fans of each of these.  Not knowing any
particular basis to infer your preferences (and you mayn't be aware of
such, either!), it's tough to give any strong suggestions.

I don't think you'd be steered woefully wrong with any of them.
-- 
"I have  traveled the  length and breadth  of this country  and talked
with the best people, and can assure you that data processing is a fad
that won't  last out  the year".  --  Business books  editor, Prentice
Hall 1957

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Views - Under the Hood

2010-11-04 Thread Chris Browne
te...@chosen-ones.org (Terry Lee Tucker) writes:
> Lately, I've begun using views quite often especially when queries for 
> various 
> reports, etc. become complicated. I am now wondering if there is a price to 
> pay in terms of overhead for this. In truth, I don't really understand how a 
> view works. I know that it takes on many of the attributes of a table, but is 
> it a table? Is the data pulled together when one selects from the view or is 
> it maintained as a table all along. Guidance to the ignorant appreciated...

Under the hood, views represent a rewriting of the query.

   http://www.postgresql.org/docs/8.4/static/rules-views.html

If you have two tables that are joined together, in a view, then when
you query the view, you're really running a more complex query than
you're seeing, namely one that joins together the two tables, and does
whatever else you put into your query.

It *looks* like a table, for almost all intents and purposes, but what
it is, really, is a structure that leads to your queries being rewritten
to access the *real* tables that underlie the view.

So the data is, as you suggest, "pulled together when one selects from
the view."
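
To illustrate (table and column names made up):

  CREATE VIEW order_summary AS
      SELECT o.id, o.placed_on, c.name
        FROM orders o JOIN customers c ON c.id = o.customer_id;

  -- Querying the view...
  SELECT * FROM order_summary WHERE name = 'Smith';

  -- ...really runs something like:
  SELECT o.id, o.placed_on, c.name
    FROM orders o JOIN customers c ON c.id = o.customer_id
   WHERE c.name = 'Smith';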
-- 
output = reverse("moc.liamg" "@" "enworbbc")
http://www3.sympatico.ca/cbbrowne/slony.html
"People  are more  vocally opposed  to fur  than leather  because it's
easier to harass rich women than motorcycle gangs." [bumper sticker]

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] experience with tags schemas

2010-11-05 Thread Chris Browne
matthieu.h...@wallix.com (Matthieu Huin) writes:
> Greetings,
>
> Does anyone here have any experience with tags schemas on postgresql ?
>
> I am struggling with a schema inspired by "scuttle" described here :
>
> http://www.pui.ch/phred/archives/2005/04/tags-database-schemas.html
>
> Performances drop dramatically when the data set increases. Is it
> "normal" and I should look at other data structures ? Or am I doing
> something wrong ?
>
> If you need more information, I'll be happy to give it to you.

I'd expect the "Toxi" solution (where do they get the names!??!?!) to be
the fastest one of the litter, as most of the "burden" of work goes into
the narrow table, "tagmap."  (It's also easily extended to cope with the
way del.icio.us does tagging, as you can attach user IDs to "tagmap")
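
For reference, the "Toxi" layout amounts to something like this (the
column names are my guesses at it):

  create table item   (id serial primary key, title text);
  create table tag    (id serial primary key, name text unique);
  create table tagmap (item_id integer references item,
                       tag_id  integer references tag,
                       primary key (item_id, tag_id));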

In any case, it would make a lot of sense to run EXPLAIN ANALYZE on some
of the queries; that ought to provide some insight as to what parts are
behaving notably badly.

The article provides some good grist for the argument that proper
normalization frequently *IMPROVES* performance.  (Contrary to the all
too common "NoSQL is Web Scale" arguments that we must "denormalize for
performance.")
-- 
If this was helpful,  rate me
http://www3.sympatico.ca/cbbrowne/lsf.html
"Like I've always  said, if you don't have anything  nice to say, come
sit by me." -- Steel Magnolias

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Why facebook used mysql ?

2010-11-09 Thread Chris Browne
kamaual...@gmail.com (Allan Kamau) writes:
> I agree with Merlin, There is a surprising big number of "good"
> technology companies (including Google) out there using MySQL. For
> sometime I have been wondering why and have come up with a few
> (possibly wrong) theories. Such as: these companies are started by
> application developers not database experts, the cost (effort) of
> changing to other database engine is substantial given that that
> probably there is already so much inconsistencies in their current
> data setup coupled with considerable amount of inconsistency cover-up
> code at the application programs, and maybe the IT team is doubling up
> as a fire fighting department constantly putting out the data driven
> fires. This is then compounded by the rapid increase in data.

This wasn't a good explanation for what happened when Sabre announced
they were using MySQL:

   http://www.mysql.com/news-and-events/generate-article.php?id=2003_33

I used to work at Sabre, and what I saw was *mostly* an Oracle shop, but
with significant bastions of IMS, DB2, Teradata, and Informix.  Your
theory might fit with "dumb startups," but certainly not with Sabre,
which still has significant deployments of IMS!  :-)

I actually am inclined to go with "less rational" explanations; a lot of
decisions get made for reasons that do not connect materially (if at
all) with the technical issues.

One such would be that the lawyers and marketing folk that tend to be at
the executive layer do *their* thing of making deals, and when they're
busy "making deals," the only people interfacing with them are:

 - Salescritters from the Big O buying them lunch

 - Other Political Animals that Made The Decision to go with MySQL (or
   such) and are happy to explain, over golf, that "it went fine for us"
   (even if it didn't go entirely so fine; they didn't hear about it)

Lunch and golf can have material effects.
-- 
"cbbrowne","@","acm.org"
Rules of the Evil Overlord #67.  "No matter how many shorts we have in
the system, my  guards will be instructed to  treat every surveillance
camera malfunction as a full-scale emergency."

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] pgadmin story...

2010-11-18 Thread Chris Browne
andrew  writes:
> hi all,
> I've just read that pgadmin team was hired by some company and the
> project will be closed...  It was a post from 2009.
> I'm just curious what's the story behind that. Did anyone leave?
> thanks.

Are you thinking about this announcement?
  http://pgsnake.blogspot.com/2009/04/goodbye.html

You might want to check the comments on it...  They'll shed some
additional light on the matter...
-- 
http://linuxfinances.info/info/finances.html
What do you mean "Why's it got to be built?"  It is a bypass.  You've got
to build bypasses.

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Tool for data modeling and ER diagram

2010-12-07 Thread Chris Browne
gwch...@gmail.com (Gary Chambers) writes:
>> What is the best tool of data modeling and ER diagram for PostgreSQL.
>> http://wiki.postgresql.org/wiki/GUI_Database_Design_Tools
>
> Although it may not be considered a tool for use in the design (per
> se) of a database, I would highly recommend that SchemaSpy
> (http://schemaspy.sourceforge.net/) be added to the list.

I have added TCM to the list...
   http://wwwhome.cs.utwente.nl/~tcm/

I use it quite a lot when designing things.  It makes no attempt to do
automagic mapping of diagrams onto physical implementation, but when
trying to deal with reasonably deep modelling issues (e.g. - trying to
figure out what are the entities and what are the relationships between
them), the full detail of table schemas seems like a distraction.
-- 
let name="cbbrowne" and tld="gmail.com" in name ^ "@" ^ tld;;
http://linuxdatabases.info/info/slony.html
Rules of the Evil Overlord #2. "My ventilation ducts will be too small
to crawl through. 

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] 2 versions of an entity worth distinct table?

2010-12-30 Thread Chris Browne
gvi...@gmail.com (gvim) writes:
> If a table representing contact details can have 2 but no more than 2 email 
> addresses is it really worth factoring-out email addresses to a separate 
> table. Technically it's a 1-to-many relationship so should be done this way 
> but what is the general practice out there in such "max. 2" situations? 
> Keeping them as:
>
> Primary Email
> Secondary Email
>
>  also preserves priority though not strictly normalised, I know.

I'd be inclined to normalize this, as it's:

a) difficult to guarantee that it will only ever be 2.

b) mighty nice to be able to attach validation rules to ONE simple email
   table, rather than having to put them on several columns possibly
   spread across more tables.
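
For instance, a sketch of (b), assuming a contact table keyed by id
(the names and the deliberately simplistic validation rule are
hypothetical):

  create table contact_email (
      contact_id integer not null references contact,
      priority   integer not null check (priority in (1, 2)),
      email      text not null check (email ~ '^[^@]+@[^@]+$'),
      primary key (contact_id, priority)
  );

Relaxing the "maximum of 2" later is then a one-constraint change
rather than a restructuring of several tables.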
-- 
wm(X,Y):-write(X),write('@'),write(Y). wm('cbbrowne','gmail.com').
http://linuxfinances.info/info/internet.html
If nothing ever sticks to Teflon, how do they make Teflon stick to the
pan?

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Standard schemas for common features?

2010-12-30 Thread Chris Browne
gvi...@gmail.com (gvim) writes:
> I'm putting together a database for a web project which has a lot of
> HR components which I imagine are fairly standardised so, to avoid
> re-inventing the wheel, is there a source of standard, downloadable
> schemas for common projects?

One might imagine so...

The last time I was looking for such (albeit not specifically in an HR
context), what I found was that there were enough local complications
resulting from the vagaries of local company structure that the bits
that could be treated as "standard" were minor in comparison to the
specialized structures.

There are things that get pretty ugly when you try to go "standardised":

 - You're sure to use addresses...  Will they be:
   a) Something localized to your country, complete with constraints
  encoded in the schema?  (e.g. - as with telephone number formats,
  zip codes, states, and such)
   b) Something intended to be generic across multiple countries?
  In that case, the schema generally CANNOT enforce country-specific
  rules, thereby increasing the risk of the data turning into crud

 - What normalization do you do concerning employees and their
   attributes?

   It's attractive to keep the model simple, but if there are legal
   requirements to track information about applicants (who may never
   actually become employees), simple mayn't be an option.

 - What genders do you store?  Male/Female actually *isn't* specific
   enough, with the simple counterexample where someone fails to specify
   a value.  

   (Which oughtn't be controversial even to folk that might be
   uncomfortable with more, erm, "outre" gender options...)

 - You operate in some legal jurisdiction.  (or perhaps several?)

   Different jurisdictions assortedly require or forbid the collection
   of particular pieces of information, and may require that to be
   encoded into your schema.  What is allowed/forbidden will vary, and
   if you operate in multiple jurisdictions, there's some need to follow
   *all* of the legal impositions.

A generic schema leads to considerable risk of allowing in nonsense
data, whilst *my* non-generic schema is quite likely to be terrible for
*you* when you have somewhat different business requirements.

There are enough conflicts available across these issues to mean that
you shouldn't just take the first sample schema you might happen across.
-- 
wm(X,Y):-write(X),write('@'),write(Y). wm('cbbrowne','linuxfinances.info').
http://linuxfinances.info/info/slony.html
Always remember that you're unique, just like everyone else. 

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] UUID column as primary key?

2011-01-04 Thread Chris Browne
d.w...@computer.org (David Wall) writes:
> We're using UUID for primary keys in PG 8.4 without any issues.  I
> have no real insights into the details or performance issues, but
> always figured it was stored as a binary 128-bit value, but with added
> benefits of being able to enter and view them using a standard string
> format.  We don't sort them as they have no real meaning for us.

In principle, this might be a reason to want to do the long-outstanding
work on hash indexes; with UUIDs, it mayn't be useful to sort the
values, but you *do* want to be able to validate that they're unique.
-- 
output = ("cbbrowne" "@" "gmail.com")
http://www3.sympatico.ca/cbbrowne/
"Computers are  like air conditioners:  They stop working  properly if
you open windows."

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] UUID column as primary key?

2011-01-05 Thread Chris Browne
dal...@solfertje.student.utwente.nl (Alban Hertroys) writes:
>> From wikipedia, "only after generating 1 billion UUIDs every second for
>> the next 100 years, the probability of creating just one duplicate would
>> be about 50%. The probability of one duplicate would be about 50% if
>> every person on earth owns 600 million UUIDs."
>
>
> Even if the chances of one person encountering a UUID collision are
> about once in say 100 billion years, that could be tomorrow. It could
> also not happen at all in that time span. That's how chance works. You
> can never assume it won't happen. If the costs incurred by a collision
> are lower than the costs of preventing it, you can choose to just take
> the hit, but that doesn't go for _every_ problem domain. The medical
> systems mentioned up-thread are an example of that.
>
> Next to that, UUID's are generated by computers. I have no doubts that
> the numeric space that makes up a UUID allows for collision chances as
> low as described, but are computers capable of generating those
> numbers sufficiently random that they actually achieve that low a
> chance? I think that's pushing it.

I think I'd rather analyze this by:
a) Reading the RFC, and
b) Looking at some of the common implementations likely to get used

rather than to merely have "a doubt."

RFC 4122 does NOT point to randomness as the only criterion to
discourage collisions, and treating UUIDs as if they were merely about
being "sufficiently random to achieve low chance of collision" is
insulting to the drafters of the standard, because they were certainly
NOT so naive as to think that was sufficient.
-- 
"cbbrowne","@","acm.org"
"What I find most amusing  about com and .NET  is that they are trying
to solve a problem I only had when programming using MS tools."
-- Max M  (on comp.lang.python)

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] UUID column as primary key?

2011-01-05 Thread Chris Browne
a...@crankycanuck.ca (Andrew Sullivan) writes:
> On Wed, Jan 05, 2011 at 12:41:43PM -0700, Scott Ribe wrote:
>> I'm not sidestepping the point at all.
>
> You may be missing it, however, because. . .
>
>> The point is that the finiteness of the space is a red herring. The
>> space is large enough that there's no chance of collision in any
>> realistic scenario.
>> In order to get to a point where the probability
>> of collision is high enough to worry about, you have to generate
>> (and collect) UUIDs at a rate that is simply not realistic--as in
>> your second example quoted above.
>
> . . .the example was not that UUIDs are being generated and collected
> in one place at that rate, but that they're being generated in several
> independent places at a time, and if the cost of the collision is
> extremely high, there might be reasons not to use the UUID strategy
> but instead to use something else that is generated algorithmically by
> the database.  There's a trade-off in having distributed systems
> acting completely independently, and while I have lots of confidence
> in my colleagues at the IETF (and agree with you that for the
> overwhelming majority of cases UUIDs are guaranteed-unique enough),
> correctly making these trade-offs still requires thought and
> analysis.  It's exactly the kind of of analysis that professional
> paranoids like DBAs are for.

But it seems to me that some of the analytics are getting a little *too*
paranoid, on the "perhaps UUIDs are the wrong answer" side of the
column.

There's no panaceas, here; if the process that is using IDs is fragile,
then things can break down whether one is using UUID or SERIAL.

I prefer the "probably unique enough" side of the fence, myself.

And the process that uses the IDs needs to be robust enough that things
won't just fall apart in tatters if it runs into non-uniqueness.

I'd expect that to not need to be a terribly big deal - if there's a
UNIQUE index on a UUID-based column, then an insert will fail, and the
process can pick between things like:
 - Responding that it had a problem, or
 - Retrying.
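
A sketch of that retry loop in PL/pgSQL, assuming a table
widgets(id uuid primary key, payload text) and the contrib
uuid_generate_v4() function:

  create function add_widget(p_payload text) returns uuid as $$
  declare
      new_id uuid;
  begin
      loop
          new_id := uuid_generate_v4();
          begin
              insert into widgets (id, payload) values (new_id, p_payload);
              return new_id;
          exception when unique_violation then
              null;  -- astronomically unlikely; just grab a fresh UUID
          end;
      end loop;
  end;
  $$ language plpgsql;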

And if the system isn't prepared for that sort of condition, then it's
also not prepared for some seemingly more likely error conditions such
as:
 - The DB connection timed out because something fuzzed out on the
   network
 - The DB server fell over and is restarting because (power failed,
   someone kicked the switch, disk ran out, ...)

It seems rather silly to be *totally* paranoid about the
not-infinite-uniqueness of UUIDs when there are plenty of other risks
lurking around that also need error checking.
-- 
"cbbrowne","@","gmail.com"
http://linuxdatabases.info/info/slony.html
"How can you dream the impossible dream when you can't get any sleep?"
-- Sam Robb

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] UUID column as primary key?

2011-01-06 Thread Chris Browne
wmo...@potentialtech.com (Bill Moran) writes:
> If the chance of a duplicate is 1 in a hundred gazillion, then I can
> still hit a dupe the VERY FIRST TIME I USE IT.
>
> I'm writing software that is intended to be used to save lives in the
> event of an earthquake or flood or cosmic ray flipping bits or any
> other massive disaster.  The last thing I need while a bunch of EMTs
> are digging bodies out of the rubble trying to save their lives is to
> need to resolve a key conflict with a bunch of mobile devices, most of
> which are not reachable because the cell network is down because of
> the disaster.  There's going to be enough other shit going wrong ...
> my software is supposed to be part of the solution.
>
> I don't give a fuck how small the chance of conflict is, the only
> viable option for that chance is 0.  Period.  Any argument to the
> contrary is stupid, asinine and outright negligent.

If your system is sufficiently negligently designed that this particular
conflict causes it to kill people, then I wouldn't be too inclined to
point at this issue with UUIDs being the Real Problem with the system.

This is NOT the only risk that the system faces; you can't get *nearly*
as low probabilities attached to hardware and network issues such as:
 - Disks failing
 - Cosmic rays twiddling bits in memory
 - Network connections failing part way through the work
 - Dumb techs blindly cloning the same "host key" onto every one of the
   EMTs' data collection devices

That last item is an argument in *FAVOR* of using UUIDs for the "EMTs
digging bodies out of rubble" scenario...  The typical alternative to
UUIDs would be to have (Sequence + Host Data), and I'd expect there to
be a considerable risk of Dumb Techs making mistakes there.  It wouldn't
be difficult for such a scenario to induce systematic key collisions.

It's never just about one risk; it's about *all the risks.*

> And that's been my point all along, despite people trying to dilute it
> with nonsense numbers that they don't understand: UUIDs are sufficiently
> unique for 99.999% of the applications out there.  However, you
> should always take 5 or 10 minutes to consider whether your application
> is one of the .001% that can't tolerate the tiny risk.

If your application is so fragile that it cannot tolerate *that* tiny
risk, then I have to wonder if your system isn't Just Plain Too Fragile,
because there are other unavoidable risks likely to be of much greater
probability.
-- 
"Have  you noticed   that,  when we were   young,  we were   told that
`everybody else   is doing  it' was   a  really  stupid reason  to  do
something,  but now it's the standard  reason for picking a particular
software package?" -- Barry Gehm 

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] UUID column as primary key?

2011-01-06 Thread Chris Browne
dennis.jenkins...@gmail.com (dennis jenkins) writes:
> The UUID itself is 128 bits.  Some of those bits are pre-determined.
> I don't recall, but I think that a "normal" UUID has 121 bits of
> randomness.

That doesn't match RFC 4122 very well...

It indicates 5 forms of UUIDs:

1) Time-based, where about 1/2 the data comes from local timestamp data,
   and 48 bits come from MAC address (or similar)

2) "DCE Security" (about which it says little)

3) Name-based, using MD5 hashing

4) Randomly generated UUIDs (which are quite likely what you're thinking
   about) have 122 bits of random data

5) Name-based, using SHA-1 hashing

The reasonable choices for a would-be artificial primary key seem to be
1 and 4; in a distributed system, I'd expect to prefer 1, as the time +
host data are likely to eliminate the "oh, it might just randomly match"
problem.

Note the set of functions in the uuid-ossp contrib module fit this, down
to omitting Version 2 :-).

t...@localhost->  \df+ public.uuid_generate*
                                                          List of functions
 Schema |        Name        | Result data type |    Argument data types    |  Type  | Volatility |  Owner   | Language |    Source code     | Description
--------+--------------------+------------------+---------------------------+--------+------------+----------+----------+--------------------+-------------
 public | uuid_generate_v1   | uuid             |                           | normal | volatile   | postgres | c        | uuid_generate_v1   | 
 public | uuid_generate_v1mc | uuid             |                           | normal | volatile   | postgres | c        | uuid_generate_v1mc | 
 public | uuid_generate_v3   | uuid             | namespace uuid, name text | normal | immutable  | postgres | c        | uuid_generate_v3   | 
 public | uuid_generate_v4   | uuid             |                           | normal | volatile   | postgres | c        | uuid_generate_v4   | 
 public | uuid_generate_v5   | uuid             | namespace uuid, name text | normal | immutable  | postgres | c        | uuid_generate_v5   | 
(5 rows)
-- 
"I'm all for advancement.  However  rich text on an ephemeral media is
a totally brain-dead idea.  Nobody  in their right mind would take the
effort to prettyfy documents that are going to be gone in a few days."
-- Jay Denebeim 

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] UUID column as primary key?

2011-01-06 Thread Chris Browne
dal...@solfertje.student.utwente.nl (Alban Hertroys) writes:
> On 6 Jan 2011, at 17:51, Chris Browne wrote:
>
>> wmo...@potentialtech.com (Bill Moran) writes:
>> If your system is sufficiently negligently designed that this particular
>> conflict causes it to kill people, then I wouldn't be too inclined to
>> point at this issue with UUIDs being the Real Problem with the system.
>> 
>> This is NOT the only risk that the system faces; you can't get *nearly*
>> as low probabilities attached to hardware and network issues such as:
>> - Disks failing
>> - Cosmic rays twiddling bits in memory
>> - Network connections failing part way through the work
>> - Dumb techs blindly cloning the same "host key" onto every one of the
>>   EMTs' data collection devices
>
> Let's say that you actually build a mission critical system for which
> you'd need to evacuate the country if it fails. You pick the best ECC
> RAM you can find, the most reliable type of disk storage available,
> your fallback network has a fallback network of it's own,
> etc. Basically you have done everything you could to ensure that the
> chances of the system failing are as small as technically possible.
>
> All those little failure chances add up to a certain number. Using
> UUID's for your ID's is not required for the design of the system, yet
> you chose to do so. You added a nearly infinite chance of UUID
> collisions to the accumulated chance of the system failing.

Infinite?  The probability can't conceivably exceed 1.

It's scarcely likely to exceed "infinitesimal."

I've built clustered systems, and frequently, the resulting Rube
Goldberg apparatus that tries to protect against failures of the other
apparatus trying to protect against failures of further apparatus that
tries to protect against failures introduces a tall and unwieldy stack
of intricately interwoven components such that operators need to be
*mighty* careful not to tip anything over lest the protective apparatus
collapse, knocking over the system it was supposed to protect.

> Now the system miraculously fails and the country needs evacuating. A
> committee is going to investigate why it failed. If the dumb techy
> above is responsible, they just found themselves a scape-goat. If they
> didn't, but stumble upon your unnecessary usage of UUID's
> instead... Let's just say I don't want to be that person.

If the system is that mission critical, then it well and truly warrants
doing enough proper analysis of the risks to *know* the risks of the
various expectable failure conditions, and to do so in rather more
detail than the oversimplification of characterizing them as
"infinitesimal" or "infinite."

> I have to agree with Bill here, if lives depend on your system then
> anything that adds to the failure chances is very hard to defend. In
> the end it often boils down to responsibility in case of failure, not
> to mention what it does to your own peace of mind.

It seems to me that using serially assigned values, along with manually
assigned server IDs, to construct a would-be-unique value, is likely to
introduce quite a lot *more risk* of system failure than would the use
of UUIDs.

So someone that rules out UUIDs based on some fallacious imagined
"infinite chance of collisions" is jumping away from a small risk, and
accepting one much more likely to take lives.

We haven't seen any indication that would distinguish between "infinite"
and "infinitesimal," beyond the fact that "infinite" is infinitely
larger than the largest probability that one can find for an event,
which is 1.
-- 
(format nil "~...@~s" "cbbrowne" "gmail.com")
"But   life  wasn't yes-no,   on-off.   Life was shades   of gray, and
rainbows not in the order of the spectrum."
-- L. E. Modesitt, Jr., _Adiamante_

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] How to generate unique invoice numbers for each day

2011-01-17 Thread Chris Browne
m...@smogura.eu (Radosław Smogura) writes:
> In any approach preventing gaps, locking is required. This is real life 
> situation; imagine you have two coworkers and then they need to create 
> invoices, so they looks in ledger (or a last day copy of ledger in their 
> offices; international company, but no Internet, only fax and telephone) and 
> checks last number used, what should be done next?

In this case, I don't imagine it's necessary to do terribly much
explicit locking.

The crucial thing to 'serialize' is the "ownership" of the unique
invoice number, and a PRIMARY KEY constraint does that perfectly nicely,
and with downsides that are pretty much unavoidable.

- In the simple, no-conflict case, the value is generated by querying
  the highest value for the specified day, adding 1 to it, and in the
  face of no conflicting requests going in at the same time, having
  everything succeed.

- On the other hand, if two workers are trying to generate invoices
  concurrently, both compute the highest value, adding 1, both getting
  the same value.

  The first worker clicks on whatever does the COMMIT, the work
  succeeds, and their invoice is successfully stored.

  The second worker selects whatever does the COMMIT, and, since an
  invoice is already committed with the same invoice number, the request
  fails based on the violation of the primary key.

  The solution to that problem is to resubmit the request, querying for
  a new invoice number, which (hopefully!) succeeds.  And there's not
  much possible as an alternative to this resubmission.

If this all is turning out badly, then I suggest a third possibility,
namely to:
  - Initially use a sequence (or similar) to generate an invoice number
    that is sure to be unique, but which doesn't conform to
    expectations.

Suppose...  We set up a sequence, and initially assign invoice ID
values based purely on that sequence.  As it contains no '-'
characters, it never conflicts with the IDs that we *want*.  And as
increased values are assigned automatically, two workers entering
data never fight over specific values.

  - Then, have a single process that comes in afterwards and rewrites
the invoice IDs in accordance with policy.

In effect, this process looks for all invoices where the ID is
purely numeric.  And it renumbers them into the format desired.
Since this renumbering takes place within a single process, it
doesn't "fight" with other processes that might be holding onto
those not-yet-properly-formatted IDs.

The "magic" still needed is to run that rewrite process.  Something
needs to invoke it periodically to fix up the IDs.  Left as exercise
to the reader ;-). 
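
To make that concrete, a sketch with invented names (the renumbering
query leans on row_number(), so it wants 8.4 or later):

  create sequence invoice_seq;
  create table invoice (
      invoice_id text primary key default nextval('invoice_seq')::text,
      created_on date not null default current_date,
      total      numeric
  );

  -- The periodic job: rewrite purely numeric IDs into 'YYYYMMDD-n',
  -- continuing each day's numbering where it left off.
  update invoice
     set invoice_id = v.new_id
    from (select i.invoice_id as old_id,
                 to_char(i.created_on, 'YYYYMMDD') || '-' ||
                 (coalesce(m.max_n, 0)
                  + row_number() over (partition by i.created_on
                                       order by i.invoice_id::bigint))
                 as new_id
            from invoice i
            left join (select created_on,
                              max(split_part(invoice_id, '-', 2)::int) as max_n
                         from invoice
                        where invoice_id like '%-%'
                        group by created_on) m using (created_on)
           where i.invoice_id not like '%-%') v
   where invoice.invoice_id = v.old_id;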
-- 
let name="cbbrowne" and tld="gmail.com" in name ^ "@" ^ tld;;
"Seriously, I think you really need to find out who took your sense of
humor, beat the crap out of  him, and take it back. It's getting old."
-- Michael J Ash , on comp.lang.objective-c

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Moving from SQL Anywhere to PostGres - First Time

2011-01-20 Thread Chris Browne
"Robert Paresi"  writes:
> Hello,
>
> We have 700 user install base using Sybase SQL Anywhere 9.02
>
> We are looking at migrating these installations over to PostGres
>
> 1.  Very Very Short Answer Please - why should we?

Cheaper?  Perhaps faster?  It's tough to guess, absent metrics as to
why you might prefer Postgres.

> 2.  Does anyone have a utility or migration application to read SQL
> Anywhere to go to PostGres

Usually, what's done here is to use whatever tools are built-in to:
 a) Pull out the DDL defining the schema;
 b) Dump the data as a series of INSERT statements.

I suggest you take a peek at the "conversion" page on the development
wiki:
  


> 3.  Does PostGres handle column descriptions (ie: you can give each
> column a 50 character description) and then access it via SQL Result
> Set (like I can do in Sybase)

Sure.

You can attach comments to virtually any sort of object:
  

That page describes how comments can be pulled out.

> 4.  Is there any Date/TimeStamp issues and conversions I need to know
> about. I use simply a DATE field and a TIME field - but do not use
> DATE/TIME stamp fields together.

TIMESTAMP and TIMESTAMPTZ are the notable "date and time" types in
Postgres.  They have the fairly considerable merit over DATE + TIME that
you can do rather simpler comparisons using "full" timestamps.  

Thus, you can replace stuff like:

  select * from txn_table a, other_thing b
  where
 a.date_col > b.date_col or
 (a.date_col = b.date_col and a.time_col > b.time_col)

with
  select * from txn_table a, other_thing b
 where a.timestamp_col > b.timestamp_col

Probably the latter is faster, but the *important* part is that it's
enormously simpler, and your developers won't be prone to make mistakes
by missing comparisons, getting them wrong, or by having the queries so
much more complex that they just don't understand them.
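
Bridging from the old representation is also cheap, since in Postgres
a date plus a time yields a timestamp; a sketch, using the
hypothetical column names above:

  alter table txn_table add column timestamp_col timestamp;
  update txn_table set timestamp_col = date_col + time_col;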

> 5.  What UI/Developer tools (GUI) are available to manage the database
> as well as add/change columns rather than doing it via SQL commands.

There's a barrel-load of graphical tools:
  

Of the "free software" tools, PGAdmin III is the one that is most
strongly maintained alongside the Postgres project, so is quite well
commendable.
-- 
http://linuxdatabases.info/info/multiplexor.html
Rules of the  Evil Overlord #87. "My vats  of hazardous chemicals will
be covered when not in use.  Also, I will not construct walkways above
them." 

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Adding ddl audit trigger

2011-01-27 Thread Chris Browne
guilla...@lelarge.info (Guillaume Lelarge) writes:
> Le 26/01/2011 23:13, Tom Lane a écrit :
>> Guillaume Lelarge  writes:
>>> Le 26/01/2011 22:29, Lawrence Cohan a écrit :
 All I need is to at least be able and save a userid(current_user),
>>> timestamp, action, and the name of the object and this could be
>>> done easily by adding triggers to these pg catalogs.
>> 
>>> Nope, sorry. You can't add triggers on system catalogs.
>> 
>> This has been discussed, and I think it's on the TODO list.  It's not
>> "easily done".
>
> Yes, AFAIR, it was discussed during pgcon dev meeting last year. Jan
> Wieck was interested to work on this (I suppose to help Slony work
> better with DDL). Still hope to see some progress on this :)

The notion there is that it would be Nice To Have DDL triggers where
you'd be able to get at stuff like (as I "wave hands frantically"):

  NEW.ddl_statement

which would capture the change that was requested.

This is further harder than it seems, because what you'd want isn't
simply the DDL that was submitted, but rather a somewhat post-processed
"canonical" form so that you've got:
  - consistent encoding of strings
  - consistent encoding of datestamps
  - fully qualified names

So that's not the query that was submitted, but rather some re-expansion
of the parse tree.

It would also be nice if one of the return results was some sort of...
  NEW.parsed_statement
so that you could have a function that walks the parse tree, grabbing
whatever bits it wants to grab.  It is surely unattractive to require
that drawing data out of the DDL requires re-parsing it.
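
To make the wish concrete, here is a sketch of the audit table such a
trigger might populate, covering the userid/timestamp/action/object
fields asked for above.  This is purely speculative; no such trigger
mechanism exists today:

  create table ddl_audit (
      logged_at   timestamptz not null default now(),
      usename     name        not null default current_user,
      command_tag text        not null,   -- e.g. 'ALTER TABLE'
      object_name text        not null,   -- fully qualified
      ddl_text    text        not null    -- the canonicalized statement
  );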

See also: 

The latest "conclusion" seems to be that if someone really wants to
contribute that effort, they're free to do so.

-- 
Rules of  the Evil Overlord  #149. "Ropes supporting  various fixtures
will not be  tied next to open windows  or staircases, and chandeliers
will be hung way at the top of the ceiling."


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Book recommendation?

2011-02-01 Thread Chris Browne
hero...@unicell.co.il (Herouth Maoz) writes:
> As a result of my recent encounter with table bloat and other tuning
> issues I've been running into, I'm looking for a good resource for
> improving my tuning skills.
>
> My sysadmin ran into the following book:
>
> PostgreSQL 9.0 High Performance, by Gregory Smith, ISBN 184951030X
> http://amzn.com/184951030X
>
> Which covers versions 8.1 through 9.
>
> Any opinions on this book? Other suggestions?

It's the best thing in that vein that is available, and is rather good.

The only thing that's *somewhat* comparable is _PostgreSQL_ by Korry
Douglas & Susan Douglas, which, alas, is now 7 years old, and thus
rather dated.  It was the one reference that had substantive material on
query planning, though that's likely somewhat less relevant to you.

You should certainly take a look at Greg Smith's book.
-- 
(reverse (concatenate 'string "moc.liamg" "@" "enworbbc"))
http://linuxdatabases.info/info/finances.html
"what would  we do without C?  we   would have PASAL,  BASI, OBOL, and
Ommon Lisp." -- #Erik

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Why does my DB size differ between Production and DR? (Postgres 8.4)

2011-02-02 Thread Chris Browne
atsaloli.t...@gmail.com (Aleksey Tsalolikhin) writes:
> Situation:  Disk usage on production server root filesystem is at 68%
> utilization (80 GB used), on DR is at 51% (56 GB used).   We use
> SlonyII-1.2.x to keep the DR up to date.  I would like to account for
> the 24 GB difference.

It's more than likely a result of transactions failing on the origin,
leaving dead space around, where replication doesn't bother trying to do
any work for the "failed stuff," with the consequence that there's no
corresponding "clutter" on the replica.

I'm talking here about cases of failures that are expected.

Look to what activities you have that tend to lead to transactions that
ROLLBACK.  Slony-I makes no attempt to replicate activity that is
terminated by ROLLBACK (explicit or implicit), so all that activity
won't be processed on replicas.

For instance, in our applications, operating domain registries,
intentionally failed database transactions are *common*
whenever customers are 'fighting' over domain names - one and only one
customer can win the name, while all others lose, and each losing
request leaves a certain amount of mess in its wake.  Common patterns of
this sort include transactions that fail because:

 - Customer has insufficient funds on account to pay for the transaction

 - Inventory request fails because there are insufficient items in stock

 - Attempt to insert a second instance of an object that is required to
   be unique

 - Rejection of partially processed transaction due to violation of some
   business policy (which is mighty open-ended!)

It's likely, as well, that there is some set of tables that you are not
vacuuming heavily enough.  Probably a table or three needs to have
CLUSTER run on it to bring them down to size, and you may need to fiddle
with autovacuum parameters to vacuum more frequently.
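
One quick way to find the tables most in need of attention is the
dead-tuple statistics (these columns are available as of 8.3):

  select relname, n_live_tup, n_dead_tup
    from pg_stat_user_tables
   order by n_dead_tup desc
   limit 10;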
-- 
(reverse (concatenate 'string "moc.liamg" "@" "enworbbc"))
http://linuxdatabases.info/info/lisp.html
"Microsoft has world class quality control" -- Arthur Norman

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Why does my DB size differ between Production and DR? (Postgres 8.4)

2011-02-02 Thread Chris Browne
peter.geoghega...@gmail.com (Peter Geoghegan) writes:
> On 1 February 2011 03:52, Scott Marlowe  wrote:
>> You can reclaim that space by doing a cluster or vacuum full on the
>> subject table.
>
> Yes, but this is a fairly bad idea, particularly prior to PG 9.0.  9.0
> has a new vacuum full implementation that makes it not so bad - it
> just rewrites the entire table.
>
> VACUUM FULL will take exclusive locks on tables being vacuumed. It
> also causes index bloat. You should be very careful about using it on
> a production system.
>
> I'm not sure why you'd advocate CLUSTER as a way to reclaim disk space.

Because it works pretty well; it reorganizes the table on the basis of
the order indicated by one index, and simultaneously:
 a) Shortens the table, removing all dead space;
 b) Regenerates all indices, so they too have no dead space.

Traditional VACUUM FULL tends to worsen the dead space problem on
indices, so adds the "insult to injury" problem that after running
VACUUM FULL, you might need to reindex, and that aftermath is nearly as
expensive as CLUSTER.

CLUSTER is likely to be quicker than VACUUM FULL, and it gives nice,
squeaky-tight indexes.
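
For reference, a minimal example, using the 8.3-and-later syntax; the
table and index names are hypothetical:

  -- rewrites the table in index order and rebuilds all of its indexes;
  -- note it takes an exclusive lock on the table for the duration
  cluster big_table using big_table_pkey;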

The new form of VACUUM FULL in 9.0 changes things, but it wasn't obvious
that the original poster was on 9.0.

> I wouldn't increase index fill factor as an optimisation, unless you
> had the unusual situation of having very static data in the table.

-- 
output = reverse("gro.mca" "@" "enworbbc")
http://linuxfinances.info/info/wp.html
"The world needs more people like us and fewer like them."  -- Unknown

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Database Design Question

2011-02-02 Thread Chris Browne
carlos.menn...@gmail.com (Carlos Mennens) writes:
> I was sitting down thinking the other day about when is it good to
> generate a new database or just use an existing one. For example, lets
> say my company name is called 'databasedummy.org' and I have a
> database called 'dbdummy'. Now I need PostgreSQL to manage several
> applications for my company:
>
> - webmail
> - software
> - mediawiki
> - phpbb forum
>
> Now what I've been doing is just creating multiple tables in the
> 'dbdummy' database but each table is owned by different users
> depending on their role. Is this bad? Should I be creating new
> databases for each application above rather than one single company
> database?
>
> Just trying to understand good DBA design practice. This is obviously
> a very general question but any feedback on what good or bad issues
> would come from me dumping all my tables for applications in one
> database or spread out across multiple databases on PostgreSQL.
>
> Thank you!

I think it's likely that these would properly have separate databases,
as...

 - There isn't *that* much data that is likely to be shared between
   these applications, so it probably doesn't add a lot of value to
   force them together.

 - If you integrate the databases together, then any maintenance on "the
   database" represents an outage for *ALL* those systems, whereas if
   they're separate, there's at least the possibility of outages being
   independent.

You'll have to think about the expected kinds of failure cases to
determine in which direction to go.
-- 
(format nil "~S@~S" "cbbrowne" "acm.org")
http://www3.sympatico.ca/cbbrowne/rdbms.html
Make sure your code does nothing gracefully.

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] database instance creation

2011-02-18 Thread Chris Browne
prabakara...@zohocorp.com ("prabakaran.a") writes:
> Dear All,
>
>  In Mysql, If we create a directory under mysql/data, mysql treated as
> database instance without starting mysql server.
>  Is there any similar provision in postgres where we can create a
> database instance without starting postgresql server.

Well, the Postgres program initdb is used to create a database instance;
it does not automatically start a server against it.  That sounds like
that might be a similar provision.
-- 
"Programming today  is a race  between software engineers  striving to
build bigger and better  idiot-proof programs, and the Universe trying
to  produce  bigger  and  better  idiots.  So  far,  the  Universe  is
winning."  -- Rich Cook

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] disable triggers using psql

2011-02-18 Thread Chris Browne
li...@serioustechnology.com (Geoffrey Myers) writes:
> Greg Sabino Mullane wrote:
>> -BEGIN PGP SIGNED MESSAGE-
>> Hash: RIPEMD160
>>
>>
>>> cut and paste:
>>>
>>> set ON_ERROR_ROLLBACK;
>>
>> Should be
>>
>> \set ON_ERROR_ROLLBACK on
>>
>> You can also set this when calling psql like so:
>>
>> psql --set ON_ERROR_ROLLBACK=on
>>
>> But that's getting off-topic now, as we've got the problem narrowed:
>>
>>> INSERT 0 1
>>
>> This shows the session_replication_role is working as it
>> should. Double check where and how you are setting it; your foreign
>> key problems will go away once it is set correctly.
>
> I'm not sure how to address this.  I'm not exactly sure where to place
> session_replication_role.  It's very close to the top of the file:
>
> --
> -- PostgreSQL database dump
> --
>
> SET client_encoding = 'UTF-8';
> SET standard_conforming_strings = off;
> SET check_function_bodies = false;
> SET client_min_messages = warning;
> SET escape_string_warning = off;
>
> set session_replication_role = replica;
>
> I'm still getting the errors.  If it doesn't belong at the beginning
> of this process, I'm not exactly sure where it should go.

Hmm.  Are you sure 'replica' is the right value to set for
session_replication_role?  I'd expect that when pulling in data from
pg_dump, that 'local' might be the right value, since pg_dump isn't
acting as a replication manager.  

Don't trust me blindly on this - I could be wrong - but you should
certainly validate that you're setting that role GUC appropriately.
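
One cheap way to validate whichever value you settle on is a scratch
transaction; this assumes a hypothetical "child" table with a foreign
key, and requires superuser to set the GUC:

  begin;
  set session_replication_role = replica;
  -- under 'replica', ordinary triggers - including the ones enforcing
  -- foreign keys - do not fire, so this succeeds with no parent row:
  insert into child (parent_id) values (42);
  rollback;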
-- 
select 'cbbrowne' || '@' || 'gmail.com';
http://linuxfinances.info/info/slony.html
You shouldn't anthropomorphize computers; they don't like it.

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] updating all records of a table

2011-03-04 Thread Chris Browne
robjsarg...@gmail.com (Rob Sargent) writes:
> On 03/04/2011 04:54 AM, Vibhor Kumar wrote:
>> 
>> On Mar 4, 2011, at 5:17 PM, Andrew Sullivan wrote:
>> 
>>> On Thu, Mar 03, 2011 at 08:03:59PM -0700, Gauthier, Dave wrote:
 Hi:

 I have to update all the records of a table.  I'm worried about
 what the table will look like in terms of fragmentation when this
 is finished.  Is there some sort of table healing/reorg/rebuild
 measure I should take if I want the resulting table to operate at
 optimal efficiency?  What about indexes, should I drop/recreate
 those?
>>>
>>> Is it really important that it happen in one transaction?
>>>
>>> In the past when I've had to do this on large numbers of rows, I
>>> always tried to do it in batches.  You can run vacuums in between
>>> groups, so that the table doesn't get too bloated.
>>>
>>> Otherwise, yeah, you're better off to do some of the cleanup Joshua
>>> suggested.
>>>
>>> A
>> +1 
>> 
>> If UPDATE is for all rows, then 
>> 1. CTAS with change value in SELECT
>> 2. Rename the tables. -- This will give zero Bloats.
>> 
>
> Elegant, but of course, this doubles the disk space consumed.  Not
> always tenable.

... But if you needed to do it in one Swell Foop, there really wasn't
any other choice.

The only way *not* to double (or more) space consumption is to do
incremental updates, vacuuming around each increment.
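
A sketch of the incremental form; batch boundaries are arbitrary, and
VACUUM cannot run inside a transaction block, so these are separate
statements:

  update t set flag = 0 where id >= 0      and id < 100000;
  vacuum t;
  update t set flag = 0 where id >= 100000 and id < 200000;
  vacuum t;
  -- ... and so on across the key space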
-- 
select 'cbbrowne' || '@' || 'linuxdatabases.info';
http://www3.sympatico.ca/cbbrowne/lisp.html
We  are MICROS~1.   You will  be assimilated.   Resistance  is futile.
(Attributed to B.G., Gill Bates)

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Is PostGreSql's Data storage mechanism "inferior"?

2008-02-01 Thread Chris Browne
[EMAIL PROTECTED] ("Andrej Ricnik-Bay") writes:
> On 01/02/2008, Tony Caduto <[EMAIL PROTECTED]> wrote:
>> The part about the BSD license is bogus. A BSD license is the most
>> desirable of any Open Source license and gives you the right to use
>> PostgreSQL in your commercial apps without worry.

> While I'm a big fan of the BSD license (for varied reasons) I think that
> OpenSource hardliners like RMS would argue that the BSD license is *NOT*
> in the true spirit of OpenSource *BECAUSE* of what you list as a bonus
> of it ... the locking down of benefits reaped from OpenSource not getting
> back into the stream.

RMS wouldn't argue that, because he is uninterested in "OpenSource."
That's a term created/popularized by Eric Raymond, who is no friend of
RMS.

RMS is interested in "free software," and considers various
"BSD-related" licenses to be reasonable choices for free software.

   

He'd prefer that people use the GPL, but I don't think there's
anything overly "hard line" about the notion of the author of a
license preferring it to others.  It would seem mighty odd if he said
something like "I wrote the GPL, but think you should use something else instead."
-- 
"cbbrowne","@","cbbrowne.com"
http://cbbrowne.com/info/linuxxian.html
Jumping  off a  cliff doesn't  kill you!  It's only  when you  hit the
ground...

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [GENERAL] PostreSQL Single File ( Like M$ Access)

2008-03-10 Thread Chris Browne
[EMAIL PROTECTED] ("Christian Teguh") writes:
> Hi guys, i just wondering, i there a way to make PostgreSQL database
> like M$ Access. That have only single or several files without
> installing it.?? Thanks for your attention.

No, there isn't, just as there isn't a "make it faster" flag.

If you truly require that your database be stored in one file, then
you will need to consider some other database system.  I believe
SQLite and Firebird both are able to function as you evidently desire.
-- 
(reverse (concatenate 'string "gro.mca" "@" "enworbbc"))
http://linuxdatabases.info/info/oses.html
VERITAS AETERNA -- DON'T SETQ T.

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] postgre vs MySQL

2008-03-13 Thread Chris Browne
[EMAIL PROTECTED] (David Wall) writes:
>> What then?  Could it be marketing or the sad results of a avalanche
>> effect? Geee, there's a thought.
>>
> What a wide variety of topics.  One big difference for me is that
> MySQL used to be open source, but it no longer is.  It's an odd hybrid
> OSS that barely makes sense to me since they claim to be open source
> under the GPL, and while you can contribute code to them (I did so in
> their JDBC driver many years ago before switching to Postgresql), they
> then own the code (fine!), but if you want to use it in any system
> that's not itself open source, you have to pay to get a license.  Pay
> for GPL software?
>
> But they proudly state they are part of LAMP, yet only the "M" charges
> to use their software.  The real leaders in these open source camps
> are Linux and Apache, neither of which have such absurd pseudo-open
> licensing terms.

Indeed.  If Linux had had the same sorts of obligations as MySQL(tm),
it would have wound up a mere curiosity, because there were plenty of
other OS projects around of fairly comparable functionality
(particularly if we step back to 1993!), and no one would have put up
with such in that context.
-- 
let name="cbbrowne" and tld="linuxfinances.info" in name ^ "@" ^ tld;;
http://linuxfinances.info/info/multiplexor.html
:FATAL ERROR -- VECTOR OUT OF HILBERT SPACE

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] postgre vs MySQL

2008-03-14 Thread Chris Browne
[EMAIL PROTECTED] (Craig Ringer) writes:
> Erik Jones wrote:
>> They've gotten around that by making MySQL "dual-licensed".  If
>> you're going to be using MySQL in a commercial application then you
>> can not use the GPL'd version, you have to use their paid,
>> commercial license.
>>
> My understanding is that's not quite true. The client libraries are
> GPL, so you can't use them directly, but I don't see what would stop
> you using their ODBC/JDBC drivers with your non-GPL application
> (especially if you support other ODBC databases as well). The server
> can't be bundled in your application, but you can still get the user
> to install it and use it with your application.

Well, there's a certain amount of distance between "expectations" and
"legal requirements," and lots of room for weasel wording...




According to the above things that MySQL AB has said/continues to say,
it is quite clear that the owners of the code *intend* that
"commercial users" should pay them a licensing fee,
-- 
(format nil "[EMAIL PROTECTED]" "cbbrowne" "cbbrowne.com")
http://linuxdatabases.info/info/x.html
"Blessed are those who backup daily; for they shall be restored."
-- Seen in .signature of Veritas rep

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] PostgreSQL Replication with read-only access to standby DB

2008-03-26 Thread Chris Browne
[EMAIL PROTECTED] (Keaton Adams) writes:
> That is an interesting question.  If our organization were to help fund the
> development of such a feature, would that be something taken into
> consideration by the development team?

I seem to recall there being a relevant Google Summer of Code project
about this, last year.

http://code.google.com/soc/2007/postgres/appinfo.html?csaid=6545828A8197EBC6

http://archives.postgresql.org/pgsql-hackers/2007-03/msg00050.php

I do not recall how far it got.  It obviously didn't make it into 8.3
;-)!
-- 
(format nil "[EMAIL PROTECTED]" "cbbrowne" "acm.org")
http://cbbrowne.com/info/rdbms.html
Why are they called apartments, when they're all stuck together? 

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Survey: renaming/removing script binaries (createdb, createuser...)

2008-03-26 Thread Chris Browne
[EMAIL PROTECTED] (Steve Atkins) writes:
> There are no existing clashes with system tools that I'm aware of. Are
> there any? Most of the clashes are with other installations of
> postgresql installed on the same machine, so if name clashes is the
> real reason for the change, then the version number or port number of
> the installation should be part of the command name - pg_8.3.1_psql,
> and so on.
>
> The normal way of dealing with multiple installations and name clashes
> would be to set your shell path appropriately, though, surely? It's a
> more normal way of dealing with that than renaming the actual binaries.

pg_8.3.1_psql??  That seems pretty awful to me...

My practice is to use PATH to control this.

Debian has some "more or less idiosyncratic" tools to handle having
multiple clusters around, with pg_wrapper controlling which of these
binaries you wind up pointing to.  I keep finding that I hate that,
though I haven't gotten so mad about it that I have done anything
about it ;-).

> If, on the other hand, the main reason behind name changes were to
> make the naming more intuitive for new users then changing the names
> to something more appropriate might be useful. Perhaps something like
> changing "postmaster" to "postgresqld", "pg_ctl" to
> "safe_postgresqld", change "psql" to "postgresql", replace "createuser
> -s monty" with "postgresql --user=root", followed by  "GRANT ALL
> PRIVILEGES ON *.* TO 'monty'@'localhost';" and so on.

pg_ctl is really more like the scripts in /etc/init.d; whatever it
"ought" to be called instead, I don't think "safe_postgresqld" is
it...
-- 
select 'cbbrowne' || '@' || 'linuxfinances.info';
http://cbbrowne.com/info/wp.html
All ITS machines now have hardware for a new machine instruction --
PFLT  Prove Fermat's Last Theorem.
Please update your programs.

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Way to shutdown/freeze/stop an individual database without taking postmaster down?

2008-04-10 Thread Chris Browne
[EMAIL PROTECTED] ("W S") writes:
> I was asked this question, and I wasn't sure if it is possible:
>
>
> do you know of a way to stop just one database (not delete/drop) on
> our PostgreSQL 8.1 server?
>
>
> And, while I know how to shut down postmaster, and/or put in rules to
> pg_hba.conf to limit access to a certain database, is there any way to
> freeze or stop just one database and not others?  I'm attempting to
> RTFM it, but so far I've had no luck.

You could put in rules to pg_hba.conf to shut users out of a
particular database, and then run "pg_ctl reload" to signal the
postmaster to start applying the new rules.

That will prevent any new connections from coming into the database in
question.

But that does not get rid of existing connections.  You could kill the
backends associated with the existing connections...  You can query
pg_catalog.pg_stat_activity to find the relevant list.
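
For instance, with the 8.1-era column names, and a hypothetical
database name:

  select procpid, usename, current_query
    from pg_catalog.pg_stat_activity
   where datname = 'frozen_db';
  -- then, as the postgres OS user, kill each listed procpid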
-- 
let name="cbbrowne" and tld="linuxdatabases.info" in String.concat "@" 
[name;tld];;
http://cbbrowne.com/info/advocacy.html
Whatever is  contradictory or paradoxical is called  the  back of God.
His face, where all exists in perfect harmony, cannot be seen by man.

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] pgcrypto and dblink

2008-04-10 Thread Chris Browne
[EMAIL PROTECTED] ("Roberts, Jon") writes:
> I am moving from Windows to Solaris and I need pgcrypto and dblink.
> Where are these?  I don't see anything in the configure that suggests it
> is even an option.

They are part of the set of "contrib" functions.

You head to directory "contrib", and, if those are the only ones you
need, head assortedly to:

a) contrib/pgcrypto, and run "make install" to install that, then

b) contrib/dblink, and (surprise!) run "make install" to install it.

That compiles anything that needs to be compiled, and stows the object
code in the installation's "lib" area, and stows scripts to activate
the respective services in the installation's "share" area.

So step c) and d) would be...

c) Load, into whichever databases you want to use these functions in,
the script
   share/contrib/pgcrypto.sql

d) Load, into whichever databases you want to use these functions in,
the script
   share/contrib/dblink.sql
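
Concretely, steps c) and d) can be run from psql; the share path below
is a typical default and varies by installation ("pg_config --sharedir"
reports the right one):

  \i /usr/local/pgsql/share/contrib/pgcrypto.sql
  \i /usr/local/pgsql/share/contrib/dblink.sql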
-- 
(reverse (concatenate 'string "ofni.secnanifxunil" "@" "enworbbc"))
http://cbbrowne.com/info/languages.html
Monday is an awful way to spend one seventh of your life.  -- Unknown

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] SQL injection, php and queueing multiple statement

2008-04-11 Thread Chris Browne
[EMAIL PROTECTED] (Ivan Sergio Borgonovo) writes:
> Is there a switch (php side or pg side) to avoid things like:
>
> pg_query("select id from table1 where a=$i");
>
> into becoming
>
> pg_query("select id from table1 where a=1 and 1=1; do something
> nasty; -- ");
>
> So that every
> pg_query(...) can contain no more than one statement?

The conventional approach to this sort of thing is to use prepared
statements:

http://ca3.php.net/manual/en/function.pg-prepare.php

In effect, you set up the query beforehand, pre-parameterizing.



Assuming that PHP is actually using PostgreSQL prepared statements
(and not just faking things behind your back), this should nicely
address the problem of injection attacks.
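
Behind pg_prepare()/pg_execute(), the same idea at the SQL level looks
like this; the parameter travels as data, and is never spliced into
the query text:

  prepare fetch_by_a (int) as
      select id from table1 where a = $1;
  execute fetch_by_a (1);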
-- 
(reverse (concatenate 'string "ofni.sesabatadxunil" "@" "enworbbc"))
http://linuxfinances.info/info/linuxdistributions.html
The  average woman would rather   have beauty than  brains because the
average man can see better than he can think.

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Unacceptable postgres performance vs. Microsoft sqlserver

2008-04-14 Thread Chris Browne
"[EMAIL PROTECTED]" <[EMAIL PROTECTED]> writes:
> Let me just start off by saying that I *want* to use postgresql.
> That's my goal.  I do not want to use SQLServer.  I'm posting this
> message not to slam postgres, but to ask for someone to help me figure
> out what I'm doing wrong.
>
> I've used postgres for several years as the backend to web
> applications. I'm not entirely new to it, but I've never needed to
> bother with performance tuning.  Netflix is running a contest in which
> they will give you a set of movie ratings and you try to develop an
> algorithm to predict future ratings.  I've imported this data into a
> postgresql database, and the performance is abysmal.  Here's my setup:
>
> Core 2 Quad with 4GB RAM and two SATAII hard drives in RAID 0
>
> I install debian linux and postgresql 8.1 (this is the latest version
> that is available through the debian package manager).  I import the
> Netflix data into a table with the following characteristics:
>
> Create table rating (movieid int, userid int, rating int4, ratingdate
> date)
>
> There are 180 million rows.  I've done the import 3 times, it takes on
> average 815 seconds.  I'm not too worried about that.  Now for a test,
> I run the query, select count(*) from rating;
>
> This takes an average of 172 seconds.  Call it three minutes.
>
> Now, on the same machine, I install windows XP and Microsoft SQL
> server 2000.  The import took 742 seconds.  The count(*) query took 22
> seconds.
>
> 22 seconds.  What's gong on?
>
> Another test.  In postgres I added an index to the userid column and
> then counted distinct userids.  The average run time over three
> queries was 4666 seconds, or 78 minutes.  Unbelievable.
>
> On SQL Server, with *no* index, the same query takes on average 414
> seconds, or about 7 minutes.  Ten times faster!
>
> I'm sure someone will flame me and say that I'm a worthless noob and
> if only I was smart I would know what I'm doing wrong.  So let me just
> repeat: I want to use postgres.  That's my goal.  Please tell me what
> I can do to make the performance of this acceptable.  It's not a
> complicated setup.  One table.  A couple of count(*) queries.

This is a well-understood issue...

Some database systems are able to do some optimization where either:
a) They collect some statistics to answer such queries in O(1) time, or
b) They can use index accesses and only look at an index.

The MVCC strategy in PostgreSQL, which allows it to avoid the need for
readers to block writers, and vice-versa, has the result that running
"count(*)" without any WHERE clause requires a scan across the entire
table.
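
If an approximate count would do, the statistics that VACUUM and
ANALYZE maintain come for free, e.g.:

  select reltuples::bigint as approx_rows
    from pg_class
   where relname = 'rating';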

If the entire purpose of your application is to run COUNT(*) against
the entireties of a table with 180M rows, then PostgreSQL may not be
the right database for your application.

If, on the other hand, this is a poor benchmark of your actual
requirements, then it would be a bad idea to draw any conclusions
based on the performance of "select count(*) from some_table;"
-- 
(format nil "[EMAIL PROTECTED]" "cbbrowne" "linuxfinances.info")
http://www3.sympatico.ca/cbbrowne/oses.html
"If a word in the dictionary  were misspelled, how would we know?"  
-- Steven Wright

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] session_replication_role

2008-04-15 Thread Chris Browne
[EMAIL PROTECTED] (Terry Lee Tucker) writes:
> Is there a distinction between "ORIGIN" and "LOCAL" as related to 
> session_replication_role, and if so, what is it? I am unable to understand 
> from the documentation any distinction between the two settings.

The intent is that a system that is the "origin" for replication
changes (e.g. - a database where you'll be collecting
INSERT/UPDATE/DELETE information to replicate elsewhere) would run in
the "origin" role, normally.

The distinction from a practical perspective will take place when
stored functions that implement "replication stuff" detect what role
the system is in, and may behave differently.
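
Note that as far as firing ordinary triggers goes, "origin" and
"local" behave identically; the difference is only visible to code
that inspects the setting, along the lines of:

  select current_setting('session_replication_role');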
-- 
(format nil "[EMAIL PROTECTED]" "cbbrowne" "acm.org")
http://linuxdatabases.info/info/multiplexor.html
Rules of the Evil Overlord #7. "When I've captured my adversary and he
says, "Look, before  you kill me, will you at least  tell me what this
is all  about?" I'll say, "No."  and shoot him. No,  on second thought
I'll shoot him then say "No."" 

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] ALTER TABLE DDL Triggers?

2008-04-16 Thread Chris Browne
[EMAIL PROTECTED] ("Richard Broersma") writes:
> I don't believe that DDL Triggers exist, correct?  

That is correct.[1]

The usual point is that you cannot attach triggers to pg_catalog
tables, which would be the "obvious" way of trying to notice DDL
changes.  (e.g. - by having triggers that would fire when pg_class or
pg_attribute are changed)

It would presumably not be _completely_ implausible to run a trigger
when a table was ALTERed; the trouble would be in evaluating the
semantics of what OLD.* and NEW.* ought to contain, _if anything_.

If you took the [1] TRUNCATE approach, there isn't _anything_ (in
terms of OLD.*/NEW.* values) passed to the function; in that case, it
is sufficient to have something (like a function parameter) to
identify the table, and thus pass that info downstream (e.g. - with
replication, passing the TRUNCATE on to downstream nodes).  TRUNCATE
is pretty simple; we know well that all it's doing is to get rid of
all the contents of the table at a point in time.

In contrast, the effect of ALTER TABLE is to make near-arbitrary
alterations to pg_class, pg_attribute, and such, and there is, on the
one hand, no obvious semantic of what data to even imagine passing on,
and, on the other, a grand problem of reconstructing the change if you
*did* have access to those underlying tables.  That's usually where
the discussion peters out when people propose DDL triggers.

[1] Or about 99% so.  There is a change committed for 8.4 where
TRUNCATE can fire a trigger.  But it's somewhat disputable whether
TRUNCATE should properly be considered DDL or not.
-- 
let name="cbbrowne" and tld="linuxfinances.info" in name ^ "@" ^ tld;;
http://www3.sympatico.ca/cbbrowne/advocacy.html
When aiming for the common denominator, be prepared for the occasional
division by zero.

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] table as log (multiple writers and readers)

2008-04-17 Thread Chris Browne
[EMAIL PROTECTED] (Andrew Sullivan) writes:
> Oh, one other thing
>
> On Thu, Apr 17, 2008 at 12:44:51PM +0800, Craig Ringer wrote:
>  
>> > One way I can think of doing it is to write a seen_log that notes what the
>> > client has already seen with a timestamp of (say) 1 minute.  Then you can
>> > say "go forward from this time excluding ids (ids here)".
>> 
>> It won't work with multiple concurrent writers. There is no guarantee
>> that an INSERT with a timestamp older than the one you just saw isn't
>> waiting to commit.
>
> Yeah, I spoke imprecisely.  The idea is, "Start at timestamp _t_, but don't
> re-process these ones, which I've seen."  The trick is to set your start _t_
> far enough back in time that it's incompatible with your business logic that
> anything could still be pending from then.  This is nasty and prone to bugs,
> but it can be coded up.

The alternative pattern is to put a trigger on the table that collects
the primary key ID in a "queue" table.

If some updates were "still in flight" when the queue processor runs,
then it can catch them next time.
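
A minimal sketch of that pattern; table, column, and function names
are hypothetical:

  create table log_queue (
      id        bigint not null,
      queued_at timestamptz not null default now()
  );

  create or replace function enqueue_log_entry() returns trigger
  language plpgsql as $$
  begin
      insert into log_queue (id) values (new.id);
      return new;
  end;
  $$;

  create trigger log_enqueue
      after insert on log_table
      for each row execute procedure enqueue_log_entry();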

Curiously enough, I have had *two* meetings today where this was germaine ;-).
-- 
select 'cbbrowne' || '@' || 'linuxfinances.info';
http://cbbrowne.com/info/wp.html
Do not worry  about the bullet that  has got your name on  it. It will
hit you and it will kill  you, no questions asked. The rounds to worry
about are the ones marked: TO WHOM IT MAY CONCERN.

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] In the belly of the beast (MySQLCon)

2008-04-18 Thread Chris Browne
[EMAIL PROTECTED] ("Joshua D. Drake") writes:
> On Thu, 17 Apr 2008 22:19:23 -0500
> Josh Trutwin <[EMAIL PROTECTED]> wrote:
>  
>> Is your presentation available online at all?
>
> Blogging the bad boy up right now Will be available soon.

The presentation seems pretty good...

... But what is more interesting is what was the reaction?

I note in the blog that the "in place upgrade" issue came up.
(Interesting to observe that it *also* came up pretty prominently in
the intro session at PG East...  This is obviously a matter of Not
Inconsiderable Interest...)  It would be nice to hear what other
things got a reaction.  Of course, some of that may be yet to arrive,
as there are doubtless conversations yet to happen!
-- 
"cbbrowne","@","cbbrowne.com"
http://www3.sympatico.ca/cbbrowne/spiritual.html
FLORIDA: We've been Gored by the bull of politics and we're Bushed.

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Slony-I for upgrades - was Re: In the belly of the beast (MySQLCon)

2008-04-18 Thread Chris Browne
[EMAIL PROTECTED] (Alvaro Herrera) writes:
> Geoffrey wrote:
>> Joshua D. Drake wrote:
>>> On Fri, 18 Apr 2008 14:30:18 -0400
>>> Geoffrey <[EMAIL PROTECTED]> wrote:
>>>
 I don't know that slony is the answer.  It was more of a question
 then an answer.  We are hoping to use that solution to migrate from
 7.4.19 to 8.3.1.  The primary reason is to reduce downtime.
>>>
>>> Well :) That is why I asked the questions I did. It very well may be
>>> the answer. It might not. For example, what size is your database? How
>>> many transactions per hour are you doing?
>>
>> We have 13 separate databases, not terribly large, my WAG would be that  
>> our largest database might have 30 million rows.  I couldn't honestly  
>> tell you how many trans/hour we are doing without doing a little 
>> research.
>
> I find it pretty unlikely that Slony would be unable to help you upgrade
> here.  Obviously you can upgrade one database at a time.
>
> Also, mostly it's not the database size what's a concern, but rather the
> size of the largest table.

Indeed.

There is the "big bang" approach where you subscribe to the Whole
Thang in One Swell Foop, and, if it takes 8h to dump data, 12h to load
it into the new DB (which will take place essentially in parallel with
the 8), and 16h to regenerate indices, then that means you've got a
transaction open against the source DB for something like 28h (e.g. -
12h + 16h), with the obvious attendant adverse effects on vacuuming
and such.

However, it is unusual for a database to consist of just one table of
that sort.  If you have a case like this, it will make plenty of sense
to split this set of tables into pieces, and add them in at least
somewhat incrementally.  

In a 28h subscription, there are likely to be 10 tables that are of
"significant size" that could be broken into independent
subscriptions, so that there's not the huge hit of one giant 28h
transaction.
-- 
output = ("cbbrowne" "@" "cbbrowne.com")
http://linuxdatabases.info/info/wp.html
Why isn't phonetic spelled the way it sounds?

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] In the belly of the beast (MySQLCon)

2008-04-21 Thread Chris Browne
[EMAIL PROTECTED] (Csaba Nagy) writes:
> On Sun, 2008-04-20 at 11:32 -0600, Scott Marlowe wrote:
>> On Sun, Apr 20, 2008 at 11:12 AM, Scott Ribe <[EMAIL PROTECTED]> wrote:
>> > > I am going to play with this and see where it breaks, but it's going to 
>> > > be
>> >  > an enormous time investment to babysit it.
>
> One more suggestion: if you happen to use the alt_perl tools, be sure to
> uncomment/change the line:
>
> $SYNC_CHECK_INTERVAL = 1000;
>
> In my experience it made a big difference to set that to 60000 (meaning
> sync events created once per minute instead of once per second) for the
> synchronizing after copy phase. The number of events generated while the
> copy over is running can be so big that it will never get in sync again
> otherwise...

+1

I'm adding a note about that to the "best practices" in the admin guide.
-- 
select 'cbbrowne' || '@' || 'linuxfinances.info';
http://cbbrowne.com/info/lsf.html
Disco is to music what Etch-A-Sketch is to art.

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] pg_bulkloader (was Re: In the belly of the beast (MySQLCon))

2008-04-21 Thread Chris Browne
[EMAIL PROTECTED] ("Scott Marlowe") writes:
> What I keep dreaming of is a process that lets slony use pg_bulkloader
> or something like it to do the initial load...

Does there seem to be some likelihood of some portion of pg_bulkloader
getting added to core?  It sounds like it's worth looking at
pg_bulkloader...
-- 
select 'cbbrowne' || '@' || 'linuxdatabases.info';
http://www3.sympatico.ca/cbbrowne/wp.html
"It has every known bug fix to everything." -- KLH (out of context)

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Need to update all my 60 million rows at once without transactional integrity

2008-04-23 Thread Chris Browne
[EMAIL PROTECTED] writes:
> How can I make a Update of a column in a very large table for all
> rows without using the double amount of disc space and without any
> need for atomic operation?

You may need to redefine the problem.

> I have a very large table with about 60 million rows. I sometimes
> need to do a simple update to ALL rows that resets a status-flag to
> zero.

That seems like a pretty useless update to me...

Why not instead redefine the "status-we-need-to-have" to be zero?
[e.g. - don't try to turn off the Sun; instead, define "brightness" as
the standard ;-)]

> I don't need to have transactional integrity (but of course if the system 
> crashes, there should be no data corruption. A separate flag in the file 
> system can well save the fact that that bulk update was in progress) for 
> this, I don't care or want a abort or "all or nothing" like SQL mandates. The 
> idea is basically that either this update succeeds or it succeeds or - there 
> is no "not". It must succeed. It must be tried until it works. If the update 
> is halfway finished, that's okay.
>
> If I just do an 
> UPDATE table SET flag=0;
> then Pg will make a copy of every row which must be cleaned up by vaccuum. I 
> understand - and don't like during this specific problem - that PG is a MVCC 
> database and will keep my old versions of the rows before the update alive. 
> This is very bad.
>
> If I do a batched loop like this:
> UPDATE table SET flag=0 where id>=0 and id <200;
> UPDATE table SET flag=0 where id>=200 and id <400;
> UPDATE table SET flag=0 where id>=400 and id <600;
> ...
>
> then PG will seek all over my harddrive I think.

If I *were* going to do this, I'd want to do:

   update table set flag=0 where flag <> 0
       and id in (select id from table where flag <> 0 limit 50000);

which will go through the table 50K rows at a time.

> It would be much better if it could just start in the beginning of
> the table and work it's way towards the end. But which sort-criteria
> would satisfy this? I don't think that there is any SQL that does
> something like that.

You couldn't guarantee physical ordering anyways, so that seems
futile.

> Another ideas (and I think it's a quite good idea) would be to
> drop the column and recreate it with a new default value.

The "column drop" would be virtually instantaneous; it would
essentially be hidden from view.

But the addition of the column would rewrite the table, doubling its
size :-(.

> And I don't think I should create new columns (the old one would be
> hidden and their internal column ids lost I think) all the time,
> that might have a limit.

That's correct.  It will eventually cause a problem.

> Is there any other way to go?

Step back to what value you're putting into that column, and why.

Perhaps you never need to change this value.

In the Slony-I replication system, we have something sort of similar;
we collect a table of entries that need to get applied to another
database.  (Table called sl_log_1.)  We *NEVER* alter its contents;
that is not necessary to indicate that data has been replicated (which
is the status of interest).

Instead of updating tuples to mark that they are processed, we instead
store information in another table that indicates up to which point in
time we have finished replicating.  (It's more complex and indirect
than that, but nonetheless, it's still a fair characterization...)

So, perhaps you should be storing a per-day value in the table, and
store, somewhere else, what point you're "done up to."
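
In other words, something like this sketch (names hypothetical):

  create table processing_state (
      done_up_to timestamptz not null
  );
  insert into processing_state values ('-infinity');

  -- "reset all the flags" collapses into a single-row update:
  update processing_state set done_up_to = now();

  -- and "what still needs processing?" becomes a comparison
  -- against the watermark rather than a flag scan:
  select r.*
    from big_table r, processing_state s
   where r.created_at > s.done_up_to;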
-- 
let name="cbbrowne" and tld="linuxdatabases.info" in name ^ "@" ^ tld;;
http://linuxdatabases.info/info/emacs.html
Last night  I played a  blank tape at  full blast. The mime  next door
went nuts.

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Adding notes against fields

2008-04-24 Thread Chris Browne
[EMAIL PROTECTED] (Oliver Helm) writes:

> Hello,
>
> I was wondering if it is possible to add a note against a field on a
> postgresql table?
>
> For example when running "\d tablename" i would like to have an
> additional column called 'notes' which i could add to by altering the
> table.  As the note would be field specific, not row specific i would
> rather not have to do it with an additional column in the table!
>
> I have tried searching through the documentation, but so far have not
> found anything.  Can anyone suggest whether this is in fact possible,
> and point me in the right direction?

Have you taken a look at the COMMENT statement?

[EMAIL PROTECTED]:5432=# \h comment   
Command: COMMENT
Description: define or change the comment of an object
Syntax:
COMMENT ON
{
  TABLE object_name |
  COLUMN table_name.column_name |
  AGGREGATE agg_name (agg_type [, ...] ) |
  CAST (sourcetype AS targettype) |
  CONSTRAINT constraint_name ON table_name |
  CONVERSION object_name |
  DATABASE object_name |
  DOMAIN object_name |
  FUNCTION func_name ( [ [ argmode ] [ argname ] argtype [, ...] ] ) |
  INDEX object_name |
  LARGE OBJECT large_object_oid |
  OPERATOR op (leftoperand_type, rightoperand_type) |
  OPERATOR CLASS object_name USING index_method |
  [ PROCEDURAL ] LANGUAGE object_name |
  ROLE object_name |
  RULE rule_name ON table_name |
  SCHEMA object_name |
  SEQUENCE object_name |
  TABLESPACE object_name |
  TRIGGER trigger_name ON table_name |
  TYPE object_name |
  VIEW object_name
} IS 'text'
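
For the column case specifically, a quick sketch (names hypothetical);
col_description() reads the note back in an ordinary result set:

  comment on column invoice.amount is 'gross amount, in cents';

  select a.attname,
         col_description(a.attrelid, a.attnum) as notes
    from pg_attribute a
   where a.attrelid = 'invoice'::regclass
     and a.attnum > 0
     and not a.attisdropped;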

-- 
let name="cbbrowne" and tld="acm.org" in name ^ "@" ^ tld;;
http://www3.sympatico.ca/cbbrowne/sap.html
"Much of this software was user-friendly, meaning that it was intended
for users who did not know anything about computers, and furthermore
had absolutely no intention whatsoever of learning."
-- A. S. Tanenbaum, "Modern Operating Systems, ch 1.2.4"

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] How to modify ENUM datatypes?

2008-04-24 Thread Chris Browne
[EMAIL PROTECTED] (Andrew Sullivan) writes:
> On Wed, Apr 23, 2008 at 07:46:07PM -0400, brian wrote:
>> 
>> Absolutely true. Which is odd, because this example is trotted out 
>> whenever there's a thread about ENUMs.
>
> I don't think it's odd at all.  In my view, the people who think enums are a
> good datatype for databases are exactly the sorts who'd think that their
> data is as static as this poor understanding of the vagaries of individuals'
> sex (gender is a different problem, given its association with social roles)
> would suggest.
>
> The world moves around in unexpected ways.  Your data model needs to
> accommodate itself to the world, because the alternative is not going to
> happen.

By the same token, a limited model, at least in this area, frequently
is sufficient to cover the set of ways in which legal systems
recognize and consider gender when constructing legislation.

And it's not evident that the simplification is so dramatic an
oversimplification, or causes legal failures on such a scale, as to
mandate that every system *needs* to track sex in a more detailed
fashion than (male, female, unknown, n/a).

I'm not sure, for instance, that I actually know what the word
"phenotype" means, and there's reason to imagine I might be somewhat
more "generally literate" than average.  If I'm not sure, there seems
little reason to expect that people with varying levels of
comprehension necessarily be able to choose from a more elaborate set
of options with accuracy.

After spending literal billions of dollars on security efforts, it
doesn't appear that security infrastructures that have *enormous*
incentive have had much luck successfully identifying who is a
terrorist and who is not; they consistently have extraordinary levels
of "Type II" (false positive) reporting errors.
-- 
let name="cbbrowne" and tld="acm.org" in name ^ "@" ^ tld;;
http://cbbrowne.com/info/linuxdistributions.html
"There is no  reason anyone would want a computer  in their home".  
-- Ken Olson, Pres. and founder of Digital Equipment Corp.  1977

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] How to modify ENUM datatypes?

2008-04-28 Thread Chris Browne
[EMAIL PROTECTED] (Robert Treat) writes:
> I feel like some type of counter-argument is that this is probably longer 
> than one would expect their database software to last. :-) 

That has the counterargument that if the database software works, it's
likely to get used for longer than one would expect.

I don't think I have ever seen a case where DB-based software got
replaced *earlier* than planned.
-- 
output = reverse("ofni.sesabatadxunil" "@" "enworbbc")
http://cbbrowne.com/info/emacs.html
Rules of the Evil Overlord #172. "I will allow guards to operate under
a flexible  work schedule. That way  if one is feeling  sleepy, he can
call for a replacement, punch out, take a nap, and come back refreshed
and alert to finish out his shift. 

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] How to modify ENUM datatypes?

2008-05-01 Thread Chris Browne
[EMAIL PROTECTED] (Robert Treat) writes:
> On Thursday 01 May 2008 13:40, Tom Lane wrote:
>> 7.4 was released 2003-11-17, so I think that it will very likely get
>> obsoleted at the end of 2008.
>>
> If that's the case, it'd be nice to get an official statement of that now. :-)

People have been making noises suggesting the idea already; I expect
that the flurry of counterarguments will start diminishing at that
point.
-- 
let name="cbbrowne" and tld="linuxdatabases.info" in String.concat "@" 
[name;tld];;
http://www3.sympatico.ca/cbbrowne/sap.html
"My dog appears to require more PM than my car, although he also seems
to be cheaper to service." -- GSB

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] choiche of function language was: Re: dynamic procedure call

2008-05-12 Thread Chris Browne
[EMAIL PROTECTED] (Ivan Sergio Borgonovo) writes:
> On Sat, 10 May 2008 07:35:36 +0200
> "Pavel Stehule" <[EMAIL PROTECTED]> wrote:
>
>> your application different execution paths. Generally I can say, so
>> plpgsql isn't well language for this games, and better is using
>> plperl, plpython or other external language.
>
> It is clear what is at least one of the advantage of plpython or
> plperl over plpgsql, but then what are the advantages of plpgsql over
> the rest of the crowd other than resembling the language used in
> Oracle?

Well, plpgsql has the merit that its operations and control structures
are directly oriented towards "database stuff," so there's no need to
(for instance) invoke functions (e.g. - running queries via
spi_exec(), spi_query(), ...)  in order to perform database
operations.

One of the salutory effects is that there is a pl/pgsql debugger that
can automatically handle things like single-stepping, and it does not
need to be aware of any Perl/Python internals.
-- 
"cbbrowne","@","linuxfinances.info"
http://linuxfinances.info/info/sap.html
"...you  might  as well  skip  the  Xmas  celebration completely,  and
instead  sit  in  front  of  your  linux  computer  playing  with  the
all-new-and-improved linux kernel version." -- Linus Torvalds

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] PG -v- MySQL

2008-05-13 Thread Chris Browne
[EMAIL PROTECTED] ("Joshua D. Drake") writes:
> Gauthier, Dave wrote:
>> I’ve been invited to participate in a forum which discusses
>> technical issues for relational DBs used in the corporation.  The 3
>> DBs they are discussing are Oracle, SQL-Server and MySQL.  I’d like
>> to introduce PG, but want to be able to address the “Why not use
>> MySQL” questions when they arise.  I know in the past there have
>> been many comparisons between the 2 DBs but would like to know if
>> there are any good, recent ones.  Also, any “off-the-cuff” comments
>> would be welcome, especially if you have used both DBs.
>
> I suggest you take a look at:
>
> http://www.scribd.com/doc/2575733/The-future-of-MySQL-The-Project
> http://sql-info.de/mysql/gotchas.html
> http://www.bytebot.net/blog/archives/2008/04/17/what-mysql-can-learn-from-postgresql
> http://www.commandprompt.com/files/mysql_learn.pdf
> http://people.planetpostgresql.org/mha/index.php?/archives/168-guid.html
> http://archives.postgresql.org/pgsql-hackers/2004-04/msg00992.php

I would add to that document list the following...

http://wiki.postgresql.org/wiki/Why_PostgreSQL_Instead_of_MySQL:_Comparing_Reliability_and_Speed_in_2007
-- 
let name="cbbrowne" and tld="linuxdatabases.info" in name ^ "@" ^ tld;;
http://linuxdatabases.info/info/internet.html
If anyone ever  markets  a really  well-documented Unix that   doesn't
require  babysitting by a phalanx of  provincial Unix clones, there'll
be a  lot of unemployable,  twinky-braindamaged misfits out deservedly
pounding the pavements.

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Open Source CRM - Options?

2008-05-28 Thread Chris Browne
[EMAIL PROTECTED] (Kaloyan Iliev) writes:
> And what about RT (Request Tracker - http://bestpractical.com/rt/)
> .
> AFAIK it is free and open-source, uses Postgres and is easy to setup.

RT has a very different purpose; it was designed to track work (e.g. -
"work tickets"), as opposed to managing web site content.

It *might* be used as a bug tracker, though with a considerably
different flavour from (say) Bugzilla; as a CRM, it would be pretty
unsuitable :-(.
-- 
output = reverse("moc.enworbbc" "@" "enworbbc")
http://cbbrowne.com/info/lisp.html
"Listen,  strange women, lyin'  in ponds,  distributin' swords,  is no
basis  for a  system of  government. Supreme  executive  power derives
itself from a mandate from  the masses, not from some farcical aquatic
ceremony."  -- Monty Python and the Holy Grail

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] New MS patent: sounds like PG db rules

2008-05-29 Thread Chris Browne
[EMAIL PROTECTED] (Martijn van Oosterhout) writes:
> On Fri, May 30, 2008 at 03:07:17AM +0930, Shane Ambler wrote:
>> Exactly. The real problem is that the first one to apply for a patent
>> gets it. It really doesn't matter who invents it. If we have patents
>> that cover our work then we can control who uses it and for what
>> purpose, also preventing others from patenting our ideas and stopping us
>> from using them.
>
> There are places that offer cheap alternatives which are not patents
> but more "declarations of prior art". The point being not so much that
> you get a patent but that you prevent others from getting one on the
> same thing. As in the patent office will actually use it when
> determining prior art, rather than just ignoring anything on the internet.
>
> Cheaper, but still not cheap.

My understanding is that this is one of the reasons for the existence
of the _IBM Systems Journal_; IBM occasionally discovers things that,
for one reason or another, they do not wish to patent, but publishing
them in a journal provides a well-documented source of "prior art."

http://www.research.ibm.com/journal/sj/
-- 
"cbbrowne","@","acm.org"
http://cbbrowne.com/info/rdbms.html
Hail to the sun god, he sure is a fun god, Ra, Ra, Ra!! 

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] PostgreSQL and AMD?

2008-06-17 Thread Chris Browne
[EMAIL PROTECTED] ("John Tregea") writes:
> I have been asked to specify a pair of HP PC's to set up a
> PostGreSQL server (and backup) for a client. The HP model we are
> looking at has an AMD Phenom™ Quad Core Processor (9600B). The
> machines would be running Windows XP Pro (our clients
> requirement). Can anyone tell me if PostgreSQL runs fine on the AMD
> platform and specifically does anyone have experience with the AMD
> Phenom™ Quad Core Processors 9600B.

Get thee to the Build Farm.
http://pgbuildfarm.org/

In particular, PostgreSQL most certainly "runs fine on the AMD
platform;" the following nodes each ran full regression tests within
the last week.

http://pgbuildfarm.org/cgi-bin/show_status.pl?member=beaver&member=beluga&member=boodie&member=bustard&member=caribou&member=chamois&member=chinchilla&member=clownfish&member=codlin_moth&member=dove&member=dugong&member=dungbeetle&member=eland&member=emperor_moth&member=ermine&member=fennec&member=finch&member=ghost_moth&member=heron&member=impala&member=kite&member=leveret&member=luna_moth&member=ostrich&member=panda&member=platypus&member=rhea&member=rosella&member=Shad&member=shark&member=tapir&member=viper&member=viscacha&member=wasp&member=wolf&member=zebra

Mind you, not one of them is running Windows XP Pro.
-- 
output = ("cbbrowne" "@" "cbbrowne.com")
http://www3.sympatico.ca/cbbrowne/finances.html
Rules of  the Evil Overlord #76.  "If the hero  runs up to my  roof, I
will not run up after him and  struggle with him in an attempt to push
him  over the  edge. I  will also  not  engage him  at the  edge of  a
cliff. (In the middle of a  rope-bridge over a river of molten lava is
not even worth considering.)" 

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Probably been asked a hundred times before.

2008-06-25 Thread Chris Browne
[EMAIL PROTECTED] (David Siebert) writes:
> Well I am kind of stuck using OpenSuse. Not a bad distro and is the one
> we use in our office for production work.
> I like CentOS myself for database work and tend to use that for test
> systems here since I manage them myself.
> I was more wondering if someone had made a Postgres-centric distro yet.
> Sort of like FreeNAS, OpenFiler, or whatever the Asterisk distro is called
> these days.
> Seems like you could build a nice little distro that was database-centric.
> Maybe use FreeBSD, Solaris, or CentOS as the base.
> Sort of a plug and play solution.

A "pretty minimalist" approach would be...

- Install Debian base (~20MB of "install")

- Figure out packages needed for PostgreSQL
  PKGS="postgresql-client-8.3 postgresql-8.3"

- Libraries, and such
  PKGS="${PKGS} libpq5 libdbd-pg-perl"

- Some tools
  PKGS="${PKGS} pgadmin3 pgadmin3-data"

- Some useful 3rd party bits
  PKGS="${PKGS} cfengine2 ntp ssh vim"

Then install that...

$ apt-get install ${PKGS}

That's going to draw in some dependencies, but is still quite, quite
minimal, more so than anything that wasn't *expressly* customized for
the purpose.  That will, for instance, be *way* smaller than CentOS.

You could do much the same using ports/openpkg on FreeBSD.
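
A rough FreeBSD equivalent might look like the following; the package
names are from memory and vary with the ports tree, so treat this as a
sketch:

  # pkg_add -r postgresql83-server postgresql83-client pgadmin3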
-- 
output = reverse("gro.mca" "@" "enworbbc")
http://cbbrowne.com/info/finances.html
"Computers are  like air conditioners:  They stop working  properly if
you open windows."

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Schema, databse, or tables in different system folder

2009-06-02 Thread Chris Browne
"Carlos Oliva"  writes:
> Is there a way to create a database or a table of a database in its own 
> folder?  We are looking for ways to back up the system files of the database 
> to tape and to exclude some tables from this backup.  We can selectively 
> back up folders of the file system, so we figure that if we can create a 
> schema or database or table in its own folder, we can back up our database 
> and exclude the tables selectively.  We are using Red Hat Linux.  Thank you.

What you are describing is the use of tablespaces, which are documented here:
  http://www.postgresql.org/docs/8.3/static/manage-ag-tablespaces.html
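
For instance (the path and names here are purely illustrative; the
directory must already exist and be owned by the postgres user):

  CREATE TABLESPACE archivespace LOCATION '/mnt/archive/pgdata';
  CREATE TABLE invoices (id integer, total numeric) TABLESPACE archivespace;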

There is, however, a severe problem with your backup plans, namely
that an attempt to selectively include/exclude tables in a physical
"file backup" is more or less certain to result in a totally
unrecoverable database.

Quoting from the section on filesystem level backup:


  "If you have dug into the details of the file system layout of the
  database, you might be tempted to try to back up or restore only
  certain individual tables or databases from their respective files
  or directories. This will not work because the information contained
  in these files contains only half the truth. The other half is in
  the commit log files pg_clog/*, which contain the commit status of
  all transactions. A table file is only usable with this
  information. Of course it is also impossible to restore only a table
  and the associated pg_clog data because that would render all other
  tables in the database cluster useless. So file system backups only
  work for complete backup and restoration of an entire database
  cluster."

Let me reiterate that last sentence:

  So file system backups only work for complete backup and restoration
  of an entire database cluster.

Your attempt to selectively back up specific directories will render
the backup effectively useless.
-- 
"cbbrowne","@","linuxdatabases.info"
http://linuxdatabases.info/info/unix.html
"Microsoft   builds   product  loyalty   on   the   part  of   network
administrators and consultants, [these are] the only people who really
count  in the  Microsoft scheme  of  things. Users  are an  expendable
commodity."  -- Mitch Stone 1997

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Schema, databse, or tables in different system folder

2009-06-02 Thread Chris Browne
"Carlos Oliva"  writes:
> Is there a way to create a database or a table of a database in its own 
> folder?  We are looking for ways to back up the system files of the database 
> to tape and to exclude some tables from this backup.  We can selectively 
> back up folders of the file system, so we figure that if we can create a 
> schema or database or table in its own folder, we can back up our database 
> and exclude the tables selectively.  We are using Red Hat Linux.  Thank you.

If you want to back up specific tables, then you should make use of
pg_dump, which, as of 8.3, has the ability to:
 a) dump named tables, via the -t option, or
 b) exclude named tables, via the -T option
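
For example (database and table names here are hypothetical):

  $ pg_dump -t invoices mydb > invoices.sql   # dump only "invoices"
  $ pg_dump -T audit_log mydb > the_rest.sql  # dump all but "audit_log"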

That is the appropriate way to selectively back up portions of the
database.
-- 
let name="cbbrowne" and tld="linuxfinances.info" in String.concat "@" 
[name;tld];;
http://www3.sympatico.ca/cbbrowne/wp.html
--Despite Pending :Alarm--

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Schema, databse, or tables in different system folder

2009-06-03 Thread Chris Browne
"Carlos Oliva"  writes:
> Would the backup be unrecoverable if I shut down the database first?

If the backup includes pg_xlog and pg_clog, as well as all of the
database metadata files, then whatever portions *are* included are
likely to be somewhat usable.

The portions not included in the backup will obviously not be usable.

But supposing your backup does not include *all* metadata (e.g. -
pg_catalog), pg_xlog and pg_clog, then it is unlikely that you'll be
able to recover *anything at all* from it.

The intent is that filesystem backups only work for complete backup
and restoration of an entire database cluster.
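
If you do want a filesystem-level backup, it must cover the whole
cluster, with the server stopped; a minimal sketch, assuming a data
directory of /var/lib/pgsql/data:

  $ pg_ctl stop -D /var/lib/pgsql/data
  $ tar czf cluster-backup.tar.gz /var/lib/pgsql/data
  $ pg_ctl start -D /var/lib/pgsql/data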

Backing up *part* of your filesystem is Not Recommended in the
documentation.  If you set a policy of doing so, you have to accept
that you are doing this contrary to documented recommendations, and at
considerable peril to your data.
-- 
(reverse (concatenate 'string "ofni.secnanifxunil" "@" "enworbbc"))
http://cbbrowne.com/info/multiplexor.html
Rules of the Evil Overlord #25.  "No matter how well it would perform,
I  will never  construct any  sort  of machinery  which is  completely
indestructible  except  for   one  small  and  virtually  inaccessible
vulnerable spot." 

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Can postgresql store its data on raw device now?

2009-06-09 Thread Chris Browne
lizz...@gmail.com (Lizzy M) writes:
>   I have a problem: can postgresql store its data on raw disks now?
>
>   I have checked the mailing list and manual, but haven’t found the
> answer. In some early mails, they mentioned pg didn’t support this
> feature. But how about now? Raw disks may reduce the risks introduced
> by the file system, so I want to use them to store my data.

No, this has not changed, and if anything, it is *less* likely to
change than ever.

Twenty years ago, filesystems were sufficiently fragile, unreliable
and slow that DBMS vendors found it worthwhile to avoid the risks by,
in effect, implementing their own filesystems as part of the DBMS.

- Since then, OS vendors have found it to be in their interests to
  improve both the robustness and performance of their filesystems,
  which has substantially diminished the would-be benefits.

- Compound this with the fact that implementing your own filesystem
  adds substantially to the complexity of managing and deploying the
  DBMS.

The two really interesting filesystems these days are ZFS and OCFS:
  http://en.wikipedia.org/wiki/ZFS
  http://en.wikipedia.org/wiki/OCFS

It seems much more useful to allow the OS vendors to improve their
filesystems, which helps us, than to use raw partitions, which
requires:

 a) Creating a portable framework for OS-agnostic access to raw
partitions

(because Linux != FreeBSD != Solaris != Windows NT != AIX != ...)

 b) Implementing our own filesystem atop raw partitions

 c) Implementing our own filesystem management tools (e.g. - like cp,
rm, ..., since our OS tools won't work)

In the aggregate, this would add an extraordinary amount of work, and
with the improvements that have taken place lately, it is not evident
that this would buy us *any* of:
 a) Improved performance
 b) Improved reliability

Indeed, I think we could reasonably expect for there to be a
significant period of time during which performance and reliability
would be *degraded* by such an effort.
-- 
let name="cbbrowne" and tld="linuxdatabases.info" in name ^ "@" ^ tld;;
http://cbbrowne.com/info/rdbms.html
"I'm guilty of a lot of things, but I didn't ever do that.''  
-- Bill Clinton, on why he had astroturf lining the back of his pickup
   truck
[In fact, he _DID_ do this, thus making life creepier than fiction...]

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

