Re: [GENERAL] Integrity on large sites
Tom Lane wrote:
> Dave Page <[EMAIL PROTECTED]> writes:
>> I can't imagine Flickr or Slashdot ...
>
> I'm pretty sure I remember reading that Slashdot had to put enormous
> amounts of cacheing in front of their DB to keep it from falling over
> on a regular basis.

Yes, slashdot and flickr both use memcached:

http://lists.danga.com/pipermail/memcached/2005-November/001726.html
  "...we use both memcached and squid at flickr. ..."

http://lists.danga.com/pipermail/memcached/2003-July/17.html
  "...Slashdot's been using memcached to serve its comment text"
Re: [GENERAL] Vacuum DB in Postgres Vs similar concept in other RDBMS
Tom Lane wrote:
> Exactly. VACUUM sucks (ahem) in all ways but one: it pushes the
> maintenance costs associated with MVCC out of the foreground query code
> paths and into an asynchronous cleanup task. AFAIK we are the only DBMS
> that does it that way. Personally I believe it's a fundamentally
> superior approach --- because when you are under peak load you can defer
> the cleanup work --- but you do need to pay attention to make sure that
> the async cleanup isn't postponed too long. We're still fooling around
> with autovacuum and related tuning issues to make it work painlessly...

Should this paragraph be added to the FAQ here?
http://www.postgresql.org/docs/faqs.FAQ.html
Re: [GENERAL] Faster data type for one-length values
Alvaro Herrera wrote:
> André Volpato wrote:
>> The ammount of space saved seems pretty clear to me.
>
> Yeah, zero most of the time due to alignment.

So trading off more I/O for less CPU? I wonder if for any I/O bound
database servers it might be worth packing tightly rather than aligning
indexes on one-byte data.
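For what it's worth, here's a minimal sketch of watching that padding
happen - pg_column_size() on row values that differ only in field order
(the values are made up):

  -- An int4 placed after a 1-byte "char" is aligned to a 4-byte
  -- boundary, costing 3 padding bytes; int4-first avoids that.
  SELECT pg_column_size(ROW('y'::"char", 1::int4)) AS char_then_int,
         pg_column_size(ROW(1::int4, 'y'::"char")) AS int_then_char;

The two sizes should differ by the padding bytes.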
Re: [GENERAL] Linux distro
Chris Browne wrote:
> The server does not need the overhead of having *any* of the "X
> desktop" things running; it doesn't even need an X server.
>
> You don't need X running on the server in order use those "enterprise
> management" tools; indeed, in a "lights out" environment, that server
> hasn't even got a graphics card, which means that an X server *can't*
> be running on it.

Well, sure it can. Nothing says an X server has to write directly to a
graphics card or anything. It could write to some frame buffer in memory
and allow access to it through VNC, for example.
http://www.cl.cam.ac.uk/research/dtg/attarchive/vnc/xvnc.html

But of course I agree it makes no sense for a database server to also be
running a GUI server; and ideally not even wasting electricity and space
in the box for a graphics chip or monitor connector.
Re: [GENERAL] tsearch2: plainto_tsquery() with OR?
Tom Lane wrote:
> Oleg Bartunov <[EMAIL PROTECTED]> writes:
>> On Wed, 8 Aug 2007, cluster wrote:
>>> Does anyone know where I can request an OR-version of plainto_tsquery()?
>
>> plainto_tsquery expects plain text, use to_tsquery for boolean operators.
>
> Are either of these definitions really right? If I type "foo bar baz"
> into Google, for instance, it seems to produce some sort of weighted
> result, neither a strict AND nor a strict OR. Google didn't get where
> they are by misjudging what the simplest search behavior should be like.

For what it's worth, Google states [1]

  Automatic "and" queries
  By default, Google only returns pages that include all of your search
  terms. There is no need to include "and" between terms. Keep in mind
  that the order in which the terms are typed will affect the search
  results. To restrict a search further, just include more terms. For
  example, to plan a vacation to Hawaii, simply type vacation hawaii.

and also describes "OPERATOR EXAMPLE...vacation hawaii" as "FINDS PAGES
CONTAINING...the words vacation and Hawaii" [2]. If I'm not mistaken, it
sounds the same as what tsearch describes.

[1] http://www.google.com/intl/en/help/basics.html#and
[2] http://www.google.com/intl/en/help/cheatsheet.html
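A quick illustration of the two behaviors being contrasted (using the
8.3 built-in function names; the tsearch2 contrib spellings were similar):

  -- plainto_tsquery ANDs the words together, like Google's default:
  SELECT plainto_tsquery('english', 'foo bar baz');
  --  'foo' & 'bar' & 'baz'

  -- to_tsquery lets the caller spell out booleans, including OR:
  SELECT to_tsquery('english', 'foo | bar | baz');
  --  'foo' | 'bar' | 'baz'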
Re: [GENERAL] Transactional DDL
Scott Marlowe wrote:
> On 8/14/07, Harpreet Dhaliwal <[EMAIL PROTECTED]> wrote:
>> Hi,
>> I read a few lines about SP compilation in postgres
>> http://searchoracle.techtarget.com/originalContent/0,289142,sid41_gci1179016,00.html
>>
>> 1. stored procedure compilation is transactional.
>> "You can recompile a stored procedure on a live system, and only
>> transactions starting after that compilation will see the changes," he
>> said. "Transactions in process can complete with the old version.
>> Oracle just blocks on the busy procedure."

Really? When I tried it [1] - changing a function definition in the
middle of a long-running query that used the function - I got the
surprising result that some rows were processed using the old definition
of the function and some with the new one. The explanation from Tom [2]
was that there was some good reason function lookups used SnapshotNow.

   Ron

[1] http://archives.postgresql.org/pgsql-bugs/2005-04/msg00163.php
    I have a long query something like
      select slow_function(col) from large_table;
    and half way through the query, in a separate connection, I
    CREATE OR REPLACE slow_function
    I was surprised to see that some of the rows in my select were
    processed by the old definition and some by the new.

[2] http://archives.postgresql.org/pgsql-bugs/2005-04/msg00179.php
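A minimal sketch of how to reproduce [1] (made-up names; whether you
actually see the mix depends on timing and server version):

  -- Session 1: a deliberately slow function, then a long query using it.
  CREATE FUNCTION slow_function(i int) RETURNS int AS $$
  BEGIN
    PERFORM pg_sleep(0.01);   -- make each row take a while
    RETURN 1;                 -- "old" definition tags rows with 1
  END $$ LANGUAGE plpgsql;

  SELECT slow_function(g) FROM generate_series(1, 10000) g;

  -- Session 2, while session 1 is still running:
  CREATE OR REPLACE FUNCTION slow_function(i int) RETURNS int AS $$
  BEGIN
    PERFORM pg_sleep(0.01);
    RETURN 2;                 -- "new" definition tags rows with 2
  END $$ LANGUAGE plpgsql;

  -- Session 1's single result set could then contain a mix of 1s and 2s.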
Re: [GENERAL] Interpreting statistics collector output
Decibel! wrote:
> On Aug 15, 2007, at 2:11 PM, Gregory Stark wrote:
>> "Decibel!" <[EMAIL PROTECTED]> writes:
>>> On Wed, Aug 15, 2007 at 01:26:02PM -0400, Steve Madsen wrote:
>>>> On Aug 15, 2007, at 11:52 AM, Decibel! wrote:
>>>>> I can't really think of a case where a seqscan wouldn't return all
>>>>> the rows in the table... that's what it's meant to do.
>>
>> LIMIT
>
> Ok, you got me. :P But normally you wouldn't do a LIMIT without some
> kind of an ORDER BY, which would mean scanning the whole table.

Seems "where exists" does it too, no?

test=# explain analyze select 1 where exists (select * from bigtbl);
                                  QUERY PLAN
------------------------------------------------------------------------------
 Result  (cost=9681.61..9681.62 rows=1 width=0) (actual time=0.033..0.034 rows=1 loops=1)
   One-Time Filter: $0
   InitPlan
     ->  Seq Scan on bigtbl  (cost=0.00..9681.61 rows=140461 width=443) (actual time=0.027..0.027 rows=1 loops=1)
 Total runtime: 0.177 ms
(5 rows)
Re: [GENERAL] language interface in postgresql
David Fetter wrote:
>> Dollar-quoting is a cute technical solution to that, but you can't
>> deny that it's simpler if you just restrict the function language to
>> be SQL-ish so that CREATE FUNCTION can parse it without any
>> interesting quoting rules. So sayeth Oracle and the SQL standards
>> committee, anyway.
>
> I think the aforementioned entities got it wrong where we got it right :)

Though Oracle says [1] 'Java-based stored procedures are a standard (ANSI
SQL-1999 SQLJ-Part-1 a.k.a. "SQL routines using Java") database-independent
alternative to proprietary procedural extensions to SQL'; so I guess they
did both think of some flexibility in that regard, no?

[1] http://www.oracle.com/technology/sample_code/tech/java/jsp/samples/jwcache/Abstract.html
Re: [GENERAL] Enterprise Wide Deployment
Scott Marlowe wrote:
> On 8/14/07, john_sm <[EMAIL PROTECTED]> wrote:
>> Hey guys, for an enterprise wide deployment, what will you suggest and
>> why among - Red Hat Linux, Suse Linux and Ubuntu Linux, also, do you
>> think, we can negotiate the support pricing down?
>
> It's more about your skill set and familiarity than whether any one of
> those will work well enough.

One more consideration - consider using one of the distros supported by
your hardware vendor for your model of server. There's nothing more
frustrating than having hardware tech support point fingers at a software
vendor and software tech support point fingers at the hardware vendor.

HP supports Debian, Red Hat, and Novell, at least on servers. I'm not
sure what they'd do if you call them about Ubuntu or CentOS.
Re: [GENERAL] Autovacuum not vacuuming pg_largeobject
Denis Gasparin wrote:
>> Yeah, you're wrong. The difference is that plain vacuum does not try
>> very hard to reduce the length of a table file --- it just frees up
>> space within the file for reuse. vacuum full will actually move things
>> from the end of the file to free space nearer the head of the file,
>> so that it can shorten the file.
>>
>> What I suspect the above observations really prove is you don't have
>> max_fsm_pages set high enough, and so pg_largeobject was bloating
>> because the free space was being forgotten instead of reused.
>
> I tried to issue the vacuum command on one of my database and i got the
> following log messages:
>
>   LOG: max_fsm_relations(1000) equals the number of relations checked
>   HINT: You have at least 1000 relations. Consider increasing the
>   configuration parameter
>
> I suspect I must increase max_fsm_relations.
> The value of max_fsm_pages is 2 (the default value).
> I suspect I must change this to a higher value... but how high should
> this value be?

First increase max_fsm_relations to more than the number of tables, etc.
in your system. Then, after it's running a while, do the same vacuum
command and see if it gives you a hint to increase that parameter too.
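For what it's worth, a rough recipe for picking the values (the numbers
below are placeholders, not recommendations):

  -- How many relations does the free space map need to track?
  SELECT count(*) FROM pg_class WHERE relkind IN ('r', 'i', 't');

  -- Check the current settings...
  SHOW max_fsm_relations;   -- should exceed the count above
  SHOW max_fsm_pages;       -- raise if VACUUM VERBOSE keeps hinting

  -- ...then set larger values in postgresql.conf and restart, e.g.:
  --   max_fsm_relations = 2000
  --   max_fsm_pages     = 200000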
Re: [GENERAL] an other provokative question??
Dann Corbit wrote:
> All of the database systems that I know of that use this
> column-oriented scheme are in-memory database systems. I don't know if
> Mr. Stonebraker's is also.

KDB+ (http://kx.com/) is column-oriented and has both on-disk and
in-memory capabilities: http://kx.com/faq/#6

It's been around since 1998; and both column- and row-oriented databases
are still around, so I think it'd be more fair to say both are mature
technologies.
Re: [GENERAL] an other provokative question??
Greg Sabino Mullane wrote:
>> ...in favor of renaming the database "Horizontica".
>
> ...should definitely be "HorizonticaSQL"

Surely that should be capitalized "HorizonticASQL", no?
Re: [GENERAL] select random order by random
Chris Browne wrote:
> If I replicate your query, with extra columns, AND NAMES, I get the
> following:
>
> [EMAIL PROTECTED]:5433=# select random() as r1, random() as r2,
>                          random() as r3
>                          from generate_series(1,10) order by random();
>          r1         |        r2         |        r3
> --------------------+-------------------+-------------------
>  0.0288224648684263 | 0.904462072532624 |  0.27792159980163
>   0.144174488261342 | 0.406729203648865 | 0.452183415647596
> ...
> It is indeed somewhat curious that the query parser chose to interpret
> that the "order by random()" was referring to column #1.

And even more curiously, IMHO, even specifying column names isn't enough.
Note that this:

li=# select * from (select (random()*10)::int as a, (random()*10)::int as b
     from generate_series(1,10) order by a) as x order by b;
 a | b
---+----
 0 |  8
 1 | 10
 3 |  4
 4 |  8
 5 |  1
 5 |  9
 6 |  4
 6 |  5
 8 |  4
 9 |  0
(10 rows)

is sorted by "a" even though the outermost "order by" clause explicitly
said to order by "b". Seems like it's a known odd behavior ...

http://archives.postgresql.org/pgsql-general/2006-11/msg01523.php
http://archives.postgresql.org/pgsql-general/2006-11/msg01539.php
http://archives.postgresql.org/pgsql-general/2006-11/msg01544.php
Re: [GENERAL] Primary Key
Joshua D. Drake wrote:
> On Mon, 26 Nov 2007 10:28:03 -0800 (PST)
> Richard Broersma Jr <[EMAIL PROTECTED]> wrote:
>> --- On Mon, 11/26/07, Joshua D. Drake <[EMAIL PROTECTED]> wrote:
>>> In "theory" the item that would be a natural key
>>> in this instance is the VIN.

And you then need to deal with cars that have mismatched body VIN numbers
and engine VIN numbers? It's not uncommon on older cars or crash-damaged
cars.
Re: [GENERAL] top posting
Gregory Stark wrote:
> We're not goldfish, we can remember the topic of discussion for at
> least a few hours.

So can goldfish. Apparently they have a 3-month+ memory.
http://nootropics.com/intelligence/smartfish.html
http://en.wikipedia.org/wiki/MythBusters_(season_1)#Goldfish_Memory

With a memory like that, perhaps a goldfish should replace the elephant
for the mascot - or did the mysql guys already take that one?
Re: [GENERAL] Your opinion about PostgreSQL?
Cindy Parker wrote:
> ...choice of Oracle or PostgreSQL for the back-end...
> Can you help me compare PostgreSQL to SQL Server and/or Oracle? Do
> you know of any websites or blogs that discuss these issues? ...
> I did look at http://sql-info.de/postgresql/postgres-gotchas.html, an
> excellent page. Any others like this that you could recommend would
> be helpful.

I think this is one of the more useful database comparison pages:
http://troels.arvin.dk/db/rdbms/

It compares all the major players (Oracle, MS, My, PostgreSQL) not just
against each other but against the SQL standards as well.
Re: [GENERAL] count(*) and bad design was: Experiences with extensibility
Chris Browne wrote:
> [EMAIL PROTECTED] (Zoltan Boszormenyi) writes:
>> SELECT COUNT(*)
> [Waving hands for a moment]

Would what Chris describes below be a good candidate for a pgfoundry
project that has functions that'll create the triggers for you? (Yeah, I
might be volunteering, but would undoubtedly need help.) Then when people
ask it again and again the response could be "install
http://pgfoundry.org/fast_count_star" rather than "go read the mailing
list archives and roll your own - but remember to worry about deadlock
and contention on the table containing the counts". A rough sketch of the
triggers such a project might create appears after the quote below.

> What I would do *instead* would be for each INSERT to add a tuple with
> a count of 1, and for each DELETE to add a tuple with count of -1, and
> then to periodically have a single process walk through to summarize
> the table. There may be a further optimization to be had by doing a
> per-statement trigger that counts the number of INSERTs/DELETEs done,
> so that inserting 30 tuples (in the table being tracked) leads to
> adding a single tuple with count of 30 in the summary table.
>
> That changes the tradeoffs, again...
>
> - Since each INSERT/DELETE is simply doing an INSERT into the summary
>   table, the ongoing activity is *never* blocking anything
>
> - You get the count by requesting
>     SELECT SUM(rec_cnt) as record_count
>       from rec_count where tablename = 'foo';
>
> - Note that the query is MVCC-consistent with the table!
>
> - Once in a while, you'll want to run a single procedure that, for
>   each table, deletes all the existing records, and replaces them
>   with a single one consisting of the sum of the individual values.
>
> - You can re-sync a table by running the query:
>     begin;
>     delete from record_count where tablename = 'foo';
>     insert into record_count(tablename, rec_cnt)
>       select 'foo', (select count(*) from foo);
>     commit;
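Something like this, perhaps - entirely hypothetical names, and ignoring
the per-statement optimization Chris mentions:

  CREATE TABLE record_count (tablename text, rec_cnt bigint);

  CREATE FUNCTION track_count() RETURNS trigger AS $$
  BEGIN
    IF TG_OP = 'INSERT' THEN
      INSERT INTO record_count VALUES (TG_TABLE_NAME, 1);
    ELSE  -- DELETE
      INSERT INTO record_count VALUES (TG_TABLE_NAME, -1);
    END IF;
    RETURN NULL;  -- AFTER trigger, the return value is ignored
  END $$ LANGUAGE plpgsql;

  CREATE TRIGGER foo_track_count AFTER INSERT OR DELETE ON foo
    FOR EACH ROW EXECUTE PROCEDURE track_count();

  -- The MVCC-consistent, non-blocking count:
  SELECT sum(rec_cnt) FROM record_count WHERE tablename = 'foo';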
Re: [GENERAL] Postgres and MySQL Rosetta stone??
Scott Marlowe wrote:
>> In particular, MySQl seems to have richer string functions to parse
>> out sub strings and als trim a string for automatic table insertion
>> from long multifield strings.
>
> Have you read the postgresql manual on string functions? Seriously,
> it's easily a match for MySQL in that arena, and beats it hands down
> in a few.

And if you need more - like extracting noun clauses from sentences - you
can do that with PostgreSQL and PL/Perl, using natural language
processing libraries that are callable from perl. I'd be interested to
know how MySQL would handle that kind of string processing function.
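For instance, something along these lines - a hypothetical sketch that
assumes the Lingua::EN::Tagger CPAN module is installed on the server
(and plperlu, since it loads an external module):

  CREATE OR REPLACE FUNCTION noun_phrases(txt text) RETURNS text AS $$
    use Lingua::EN::Tagger;
    my $tagger = Lingua::EN::Tagger->new;
    my $tagged = $tagger->add_tags($_[0]);
    # get_noun_phrases returns a hash of phrase => occurrence count
    my %np = $tagger->get_noun_phrases($tagged);
    return join(', ', keys %np);
  $$ LANGUAGE plperlu;

  SELECT noun_phrases('The quick brown fox jumped over the lazy dog.');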
Re: [GENERAL] Understanding PostgreSQL Storage Engines
Pavel Stehule wrote:
> 2010/10/8 Carlos Mennens :
>> I know that MySQL uses MyISAM storage engine by default... what
>> storage engine does PostgreSQL use by default ...
>
> PostgreSQL supports and uses just only one storage engine - PostgreSQL.

That said, ISTM one of Postgres's bigger strengths commercially seems to
be that vendors can reasonably easily plug in different storage engines.

Isn't the biggest SQL database in the world basically postgres using a
non-default storage engine [1]? Heck, companies have even made
FPGA/hardware-accelerated storage engines for postgres [2]. Bigger IT
companies than Oracle have sold PostgreSQL using different storage
engines [3].

Couldn't one almost say that one of the big differences between MySQL and
Postgres is that MySQL only offers a couple storage engines, while
Postgres has many vendors offering engines? [ 1/2 :-) ]

  Ron

[1] http://www.computerworld.com/s/article/9087918/Size_matters_Yahoo_claims_2_petabyte_database_is_world_s_biggest_busiest
    "Yahoo brought the database in-house and continued to enhance it,
    including tighter data compression, more parallel data processing
    and more optimized queries. The top layer remains PostgreSQL"

[2] http://www.dbms2.com/2007/09/27/the-netezza-developer-network/
    "My understanding is that they started with PostgreSQL and then
    rewrote the back-end to embed in the FPGA. Query processing on a SPU
    is split between the general purpose CPU and the FPGA, with the
    latter mostly responsible for restricting rows and projecting
    columns."

[3] http://www.fujitsu.com/downloads/MAG/vol40-1/paper15.pdf
    "Fujitsu loaded the storage management mechanism of Symfoware Server
    into PostgreSQL."
Re: [GENERAL] Why facebook used mysql ?
Lincoln Yeoh wrote:
> What's more important to such companies is the ability to scale over
> multiple machines.

That question - how much work it is to administer thousands of database
servers - seems to have been largely missing from this conversation.

Apparently back in 2008, Facebook had 1800 MySQL servers with 2 DBAs. [1]
I wonder how that compares with large-scale Postgres deployments.

  Ron

[1] http://perspectives.mvdirona.com/2008/04/22/1800MySQLServersWithTwoDBAs.aspx
Re: [GENERAL] Survey on backing up unlogged tables: help us with PostgreSQL development!
Glen Parker wrote:
> As was already mentioned, application logs. Unlogged tables would be
> perfect for that, provided they don't go *poof* every now and then for
> no good reason. Nobody's going to be too heart broken if a handful of
> log records go missing, or get garbled, after a server crash or power
> outage. Delete 'em all after every restart though, and that's a problem.

How often are you doing unintentional restarts? I'd guess for many people
it's "whenever I've had so many backend crashes that I get motivated to
check if I'm running the latest minor release".

And if it's an intentional restart - surely you could archive your
application logs before doing the restart, no?
Re: [GENERAL] Survey on backing up unlogged tables: help us with PostgreSQL development!
Josh Berkus wrote:
>> With the current patches, the data survives a restart just fine.
>
> Per -hackers, that's not guarenteed.

"Not guaranteed" is fine. What people are asking for is "often survives".

AFAIK we don't truncate the log file created by the log_filename GUC on
every unclean crash and every clean shutdown. Should we? :-) Why not?
For people who intend to use these tables to log application data, they'd
have the exact same reasons for not wanting them truncated when they
don't need to be.
Re: [GENERAL] Postgres DOD Certification Common Criteria Level
Eric McDonald wrote:
> Greetings All:
>
> Does anyone here have any insight on to what EAL level Postgres is at
> for DOD/Military installations? I see that there's an SE-Linux
> fortified version on the Wiki, but no certifications are listed in the
> contents.
>
> Any direction to certifications, STIG, or otherwise would be greatly
> appreciated--

Well, there's an (ancient) 8.1.5 which NTT got certified at EAL1 back in
'07. You can go here:
http://www.commoncriteriaportal.org/products/
and expand "Databases" to see it.

It seems like there are some proprietary forks on the list as well, at
much higher levels (EAL4+); but I guess those forks have diverged quite
a bit.

I guess I'd be somewhat surprised to see the community version on the
list, since Wikipedia claims that getting such certifications cost
millions even back in the 90's.
http://en.wikipedia.org/wiki/Evaluation_Assurance_Level
Re: [GENERAL] arrays of floating point numbers / linear algebra operations into the DB
Webb Sprague wrote:
> On Feb 1, 2008 2:31 AM, Enrico Sirola <[EMAIL PROTECTED]> wrote:
>> I'd like to perform linear algebra operations on float4/8 arrays...
>
> If there were a coherently designed, simple, and fast LAPACK/ MATLAB
> style library and set of datatypes for matrices and vectors in
> Postgres, I think that would be a HUGE plus for the project!

I'd also be very excited about this project. Especially if some GIST or
similar index could efficiently search for vectors "close" to other
vectors. I assume something like "within an n-dimensional bounding box"
would be possible with GIST.

I'd be eager to help, test, debug, etc; but probably am not qualified to
take the lead on such a project.
Re: [GENERAL] arrays of floating point numbers / linear algebra operations into the DB
Webb Sprague wrote:
> On Feb 1, 2008 12:19 PM, Ron Mayer <[EMAIL PROTECTED]> wrote:
>> Webb Sprague wrote:
>>> On Feb 1, 2008 2:31 AM, Enrico Sirola <[EMAIL PROTECTED]> wrote:
>>>> ...linear algebra ...
>>> ... matrices and vectors .
>> ...Especially if some GIST or similar index could efficiently search
>> for vectors "close" to other vectors...
>
> Hmm. If I get some more interest on this list (I need just one LAPACK
> / BLAS hacker...), I will apply for a pgFoundry project and appoint
> myself head of the peanut gallery...

I think you should start one. I'd be happy to help.

I'm rather proficient in C; somewhat literate about postgres' GIST stuff
(I think a couple of my bugfix patches were accepted in postgis); and
deal with a big database doing lots of similarity-based searches (a 6'2"
guy with light brown hair being similar to a 6'1" guy with dark blond
hair) - and am experimenting with modeling some of the data as vectors
in postgres.
Re: [GENERAL] arrays of floating point numbers / linear algebra operations into the DB
Ted Byers wrote:
> --- Webb Sprague <[EMAIL PROTECTED]> wrote:
>>> ...linear algebra ...
>> ... matrices and vectors .
>> ...Especially if some GIST or similar index could efficiently search
>> for vectors "close" to other vectors...
>
> I see a potential problem here, in terms of how one defines "close" or
> similitude. I think, though, practical answers can be found in examples
> of applying quantitative methods in some subdisciplines of biology.

Even if the best GIST can give is selecting vectors constrained by an
n-dimensional bounding box (in much the same way it does for postgis) it
can help a lot. Then your application can select everything in a
conservatively large box, and use whatever its favorite metric is to
narrow the data further (see the sketch at the end of this message).

> Someone pointed to the potential utility of pl/R. I would be interested
> at least in learning about your assessment of the two (postgis and pl/r).

I think they'd be complementary. IMHO if a native postgresql datatype
could allow indexes to narrow the amount of data that needs to be
processed, it'd be great to do the rest of the work using R (though
we're perhaps foolishly using something else in our application).

> If you're looking at similarity, and some practicality in the USE of
> quantitative procedures, you may want to look into the biogeography and
> numerical taxonomy literature, and to a lesser extent quantitative
> plant ecology.

Indeed. Though the current literature I wade through is crime analysis.
Ideally our software tries to match witness descriptions of "black SUV"
as substantially similar to "dark green 4runner" - especially when seen
at night; and understand that a "Glock 19" is a "9mm handgun" and that a
"9mm handgun" might be but isn't necessarily a "Glock 19".

Also - two person records with a tattoo of a cross might contribute a
little similarity - but two records with tattoos of Darth Maul / Maori
face art contribute much more to the similarity scores because they're
so much more distinctive. And of course there are entire companies
focused on similarity metrics for fingerprints, DNA, and names (Bill is
arguably more similar to William than to Bell). Any magic indexes to
help such queries would be very cool; but so far we do most of it in our
application logic.

> A good linear algebra library would be useful, but there are a lot of
> nonlinear analyses that would be of interest; and there are
> nonparametric, yet quantitative approaches that are of considerable
> interest in assessing similarity.

True, many things don't map cleanly to linear algebra; but it would
still be quite useful.

> If I can manage the time, I hope to start a project where I can store
> description data for specimens of plants and animals, use analyses
> including but not limited to ordination, clustering, discriminant
> functions, cannonical correlation, to create a structure for comparing
> them, and for identifying new specimens, or at a minimum, if the
> specimen is truly something unknown, learn what known specimens or
> groups thereof it is most similar to, and how it is different.

Very cool. Sounds like a somewhat similar issue to mine.
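Here's the sketch mentioned above. contrib/cube already has a GiST
opclass that could do the bounding-box pass (hypothetical table and
feature encoding; operator spellings per recent releases):

  -- Encode each record as a point in n-space, e.g. (height, hair code):
  CREATE TABLE people (id serial PRIMARY KEY, features cube);
  CREATE INDEX people_features_idx ON people USING gist (features);

  -- Index-assisted "conservatively large box": 6'1"-6'3", hair codes 1-3
  SELECT id FROM people
   WHERE features <@ cube(array[73, 1], array[75, 3]);
  -- ...then re-rank the survivors with the application's real metric.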
Re: [GENERAL] [pgsql-advocacy] PostgreSQL Certification
Josh Berkus wrote:
> Id really prefer my company be certified by the community rather than
> by a company, despite the full respect I have in SRA's engagement in
> PostgreSQL and that we all know their contributions.

What would it mean for a company to be certified? I'd hope it'd mean
that I can have some degree of confidence hiring that organization for
Postgresql support. No? It seems to have very similar benefits as
certifying individuals.

Microsoft seems to have something like that for their partners in their
"Database Management competency":
https://partner.microsoft.com/global/40012911
[GENERAL] Dump format for long term archiving.
If one wanted to dump some postgres databases for long term archival
storage (maybe decades), what's the recommended dump format? Is tar or
plain text preferred, or is there some other approach (xml? csv?) I
should be looking at instead?

Or should we just leave these in some postgres database and keep
upgrading it every few years?
Re: [GENERAL] postgre vs MySQL
Greg Smith wrote:
> On Fri, 14 Mar 2008, Andrej Ricnik-Bay wrote:
>> A silly question in this context: If we know of a company that does
>> use PostgreSQL but doesn't list it anywhere ... can we take the
>> liberty to publicise this somewhere anyway?

I notice Oracle (and sleepycat before them) had a lot of fun pointing
out when Microsoft uses BDB:
http://www.oracle.com/technology/oramag/oracle/07-jan/o17opensource.html
  "You'll find Oracle Berkeley DB 'under the hood' in everything from
  Motorola cell phones, Microsoft/Groove's collaboration suite"
and it seems unlikely Microsoft gave them their blessing.

> Bad idea. There are companies who consider being listed as a user of a
> product a sort of recommendation of that technology, and accordingly

Other reasons a company might get offended by this:

 * They might consider it a trade secret and a competitive advantage
   over competitors; and internally enjoy giggling when they see their
   competitors sign deals with expensive databases.

 * They might have a close business partnership with Microsoft or Oracle
   that could be strained if they support other databases.

I suspect my employer would not like it announced, for both reasons.

> they will get really annoyed...asked to be removed from the list of
> those using PostgreSQL. ... PostgreSQL inside, it's best not to publish
> the results unless you like to collect cease & desist letters.

While I agree companies are likely to get annoyed - just like fast food
companies do when you say how much trans-fat their products contain -
I'm rather curious what such a cease & desist letter would say.
Re: [GENERAL] Survey: renaming/removing script binaries (createdb, createuser...)
Zdeněk Kotala wrote:
> 1) What type of names do you prefer?

I'd prefer a "pg" program that took as arguments the command. So you'd
have "pg createdb" instead of "pg_createdb". There are many precedents:
"cvs update", "git pull", "apt-get install". Anyone else like this
approach? Of the choices, though, I prefer (d).

> 2) How often do you use these tools?
>    a) every day (e.g. in my cron)
>    b) one per week
>    c) one time
>    d) never

About monthly. I guess (b) is closest.

> 3) What name of initdb do you prefer?
>    a) initdb
>    b) pg_initdb
>    c) pg_init
>    d) pg_ctl -d init (replace initdb with pg_ctl new functionality)
>    e) What is initdb? My start/stop script does it automatically.

pg initdb
Of the choices, though, I guess (d).

> 4) How do you perform VACUUM?
>    a) vacuumdb - shell command
>    b) VACUUM - SQL command
>    c) autovacuum
>    d) What is vacuum?

(c) on most databases
(b) on certain databases
Re: [GENERAL] IBM investing in EnterpriseDB
Clodoaldo wrote:
> ...IBM is investing...What does it mean for Postgresql?

One cool thing it means is that there are now *two* companies (thanks
again Fujitsu) bigger than Oracle backing (to some extent) Postgres. And
now one company bigger than Microsoft.

Yeah, this doesn't affect the community much. But it sure comes in
useful when your CFO calls you into a meeting and says "Hey, I just had
lunch with our Microsoft rep and he asked why we're running some
unsupported freeware database." Your CFO wouldn't want to run your
company on databases - like Oracle 10g, MySQL, and SQL Server - that are
only backed by little (under $50B revenue) guys, would he? :-)
Re: [GENERAL] Survey: renaming/removing script binaries (createdb, createuser...)
Tom Lane wrote:
> "Leif B. Kristensen" <[EMAIL PROTECTED]> writes:
>> On Wednesday 26. March 2008, Ron Mayer wrote:
>>> I'd prefer a "pg" program that took as arguments the command. So
>>> you'd have "pg createdb" instead of "pg_createdb".
>
> I like this too. It'd be considerably more work than the currently
> proposed patch, though, since we'd have to meld the currently separate
> programs into one executable.

If it'd help make it happen, I could work on a patch as a strawman.

> One fairly serious objection is that doing so would eliminate the
> current distinction between client-side and server-side applications,
> at least if we wanted to fold both sets into one "pg" executable. So a
> client-only install would be carrying some baggage in the form of code
> that's useless if the server isn't local. If we are OK with restricting
> the scope of the "pg" program to client-side functionality, then
> there's no problem.

Could it detect if the server-side components aren't installed, and give
a "server components aren't installed" error message if they aren't
available? This could probably be handled reasonably portably if the
"pg" program called a separate executable for the server functionality
behind the scenes.

From a user point of view, I think the single executable's nice. From an
installer point of view, I think keeping them separate is nice. Seems
both would be possible this way, though.
Re: [GENERAL] Survey: renaming/removing script binaries (createdb, createuser...)
Tom Lane wrote:
> "Leif B. Kristensen" <[EMAIL PROTECTED]> writes:
>> On Wednesday 26. March 2008, Ron Mayer wrote:
>>> ...a "pg" program that took as arguments the command. So you'd have
>>> "pg createdb" instead of "pg_createdb".
>>
>> I'll second that. ...
>
> I like this too.

Though I guess we might need to find a different name if we want to go
down that path. "man pg" tells me "browse pagewise through text files".

IMHO "postgres createdb" is still better than the whole bunch of
separate commands. Or even "pgSQL createdb" just so people know it's not
a pre-SQL postgres database :-)
Re: [GENERAL] Schema design question
Ben wrote:
> I'm working on a project which requires me to keep track of objects,
> each of which can have an arbitrary number of attributes. Although
> there will be many attributes that an object can have,...
>
> Anyway, this seems like a common problem without a perfect solution,
> and I'm sure people must have hindsight opinions on how they solved it.
> Your thoughts?

If each attribute can only occur once for any object, and if your
queries are based more on exact matches of the values rather than range
queries, you might want to look into the hstore module:
http://www.postgresql.org/docs/current/static/hstore.html

If attributes can occur more than once, you might even want to serialize
the whole object as some text format (yaml, xml, etc) instead.
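For example, a minimal hstore sketch (hypothetical names; load contrib's
hstore into the database first):

  CREATE TABLE objects (id serial PRIMARY KEY, attrs hstore);

  INSERT INTO objects (attrs)
    VALUES ('color => red, weight => 10'),
           ('color => blue, shape => round');

  -- Exact-match attribute lookups, index-assisted with GiST (or GIN):
  CREATE INDEX objects_attrs_idx ON objects USING gist (attrs);
  SELECT id, attrs -> 'weight' FROM objects WHERE attrs @> 'color => red';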
Re: [GENERAL] How to recovery data from folder data installation?
Magnus Hagander wrote:
> Craig Ringer wrote:
>> with a version of PostgreSQL with the same minor version as the one
>> you were using on the server, eg if you were using 8.1.4 you should
>> get the latest PostgreSQL in the 8.1 series (NOT 8.2 or 8.3) to try to
>> read the data.
>
> What you mean here is of course that you should be using the same
> *major* version, but the latest available *minor* version. Other than
> that, it's correct - and the version numbers in the example is correct :-)

If we want to stop people from being confused about this terminology, we
would need to make our definitions of these terms more visible on the
main web site. Many places talk about the "latest release" - but
unfortunately with differing definitions.

For example, "latest release" is defined as 8.3 in the english language
FAQ; defined as 8.3.1 and 8.2.7, etc. on the english language home page;
defined as 8.2.3 on the traditional chinese FAQ, etc. Some places (the
english FAQ) talk about "major releases" happening every year and "minor
releases" happening every few months, but without defining what that
means to the numbering.

Perhaps there should be a FAQ that says that our numbering system
defines the digits as "marketing"."major"."minor". It's no surprise to
me that people don't use those terms consistently.
Re: [GENERAL] Probably been asked a hundred times before.
Lincoln Yeoh wrote:
> At 10:30 PM 6/24/2008, David Siebert wrote:
>> Which disto is best for running a Postgres server?

Just to add one more slightly different philosophy.

For servers I manage, I run the most conservative and slow-changing
distros that only update security releases (Debian Stable and RHEL are
good choices; no doubt Solaris would be too; Ubuntu updates too
frequently for my tastes).

For the components less core to our business (ssh, munin, etc) we trust
the distro provider to provide security updates and to do the very
minimum of other changes that might have compatibility issues.

For the components that are more core to our business, though, we get
the source from the projects themselves (like postgresql.org) and
compile from source. This gives us the advantages of being totally in
control of when updates occur, and of having developers be able to
attach debuggers if need be.
Re: [GENERAL] INTERVAL SECOND limited to 59 seconds?
Finally got around to looking at this thread. Looks like the original
questions from the thread got resolved, but I found this behaviour
surprising:

regression=# select interval '1' day to second;
 interval
----------
 @ 1 hour
(1 row)

Should this be 1 second? If so I can send a patch.

regression=# select version();
                                version
-----------------------------------------------------------------------
 PostgreSQL 8.4beta2 on i686-pc-linux-gnu, compiled by GCC gcc (GCC)
 4.2.4 (Ubuntu 4.2.4-1ubuntu3), 32-bit
(1 row)
Re: [GENERAL] INTERVAL SECOND limited to 59 seconds?
Tom Lane wrote:
> Ron Mayer writes:
>> Looks like the original questions from the thread got resolved, but I
>> found this behaviour surprising:
>>
>> regression=# select interval '1' day to second;
>>  interval
>> ----------
>>  @ 1 hour
>> (1 row)
>>
>> Should this be 1 second?
>
> That is a bit odd, especially seeing that eg. '1' hour to second comes
> out as 1 second. What's making it do that?

From a design point of view? Seems like it's a side effect of the logic
that makes:
  select interval '1 2';
know that the 2 means hours rather than seconds.

Code-wise, it seems to happen because around line 2906 in DecodeInterval:

  switch (range)
  ...
  case INTERVAL_MASK(DAY) | INTERVAL_MASK(HOUR) |
       INTERVAL_MASK(MINUTE) | INTERVAL_MASK(SECOND):
      type = DTK_HOUR;

But if I naively change that by making it DTK_SECOND, I'd break
"select interval '1 2' day to second;". I guess I'd need to tweak it to
say: if it follows a days field it means hours; but by itself it means
seconds?

There's a bit of other odd stuff around there. It seems CVS head accepts
"select interval '1 2' hour;" but not "select interval '1 2' hour to
minute;":

  regression=# select interval '1 2' hour;
      interval
  ----------------
   1 day 02:00:00
  (1 row)

and I would have guessed that either both should succeed or both should
fail. And if both succeed I wouldn't have expected 1 day 2 hours.

I'd still be happy to send a patch, but am still trying to figure out
what the desired behavior is. My current impression of the open
questions - what's the desired behavior for each of these?

select interval '1' day to second;
  - should it be 1 second, to be consistent with "select interval '1';"?
  - or an error, as Sebastien argued in a different part of the thread?

select interval '1 2' hour;
  - should it be an error, as "select interval '1 2' hour to minute" is?
  - should it be "1 day 2 hours", as CVS head treats
    "select interval '1 day 2 hours' hour to minute;"?
  - or should it be 2 hours?

select interval '1 2' hour to minute;
  - should it be an error, as it is today?
  - should it be "1 day 2 hours", as CVS head treats
    "select interval '1 day 2 hours' hour to minute;"?
  - or should it be 2 hours?
Re: [GENERAL] INTERVAL SECOND limited to 59 seconds?
Tom Lane wrote:
> I wrote:
>> I'm inclined to say that these two cases are out of line with what the
>> rest of the code does and we should change them.
> ...
> Now, all three of these cases throw "invalid input syntax" in 8.3, so
> this is not a regression from released behavior. The question is does
> anyone think that these syntaxes should be valid? They're not legal
> per spec, for sure, and they seem pretty ambiguous to me.

Seems to do a sane thing for all sane inputs I threw at it. It still
accepts one odd input that 8.3 rejected:

  regression=# select interval '1 1' hour;

Perhaps the additional patch below fixes that?

*** 3022,3028 **** DecodeInterval(char **field, int *ftype, int nf, int range,
  				tm->tm_hour += val;
  				AdjustFractSeconds(fval, tm, fsec, SECS_PER_HOUR);
  				tmask = DTK_M(HOUR);
! 				type = DTK_DAY;		/* set for next field */
  				break;
  			case DTK_DAY:
--- 3022,3029 ----
  				tm->tm_hour += val;
  				AdjustFractSeconds(fval, tm, fsec, SECS_PER_HOUR);
  				tmask = DTK_M(HOUR);
! 				if (range == (INTERVAL_MASK(DAY) | INTERVAL_MASK(HOUR)))
! 					type = DTK_DAY;	/* set for next field */
  				break;
  			case DTK_DAY:

It also gives different answers than 8.3 for "select interval '1 1:'
hour" but I guess that's intended, right?
Re: [GENERAL] INTERVAL SECOND limited to 59 seconds?
Tom Lane wrote:
> Ron Mayer writes:
>> regression=# select interval '1 1' hour;
>
> Hmm, not sure about that one. We decided a week or two back that we
> don't want the thing discarding higher-order field values, and this
> seems pretty close to that. As the code is set up (plus my patch) I
> think it's the case that only the rightmost field specification of the
> interval qualifier makes any difference for parsing the value; the
> leftmost field doesn't really affect what we think the constant means.
> That seems like a nice simple consistency property ...

Sounds good. I'm happy with it then.
Re: [GENERAL] generic modelling of data models; enforcing constraints dynamically...
Sam Mason wrote:
> It all depends on the problem domain of course, but this seems to work
> OK for us! I really want to hack Samba around so that the users can
> view the files directly from inside the database, but I'm not sure how
> good an idea this really is.

"Hack Samba"? Wouldn't it be easier to use one of the
database-as-a-filesystem FUSE bindings and run stock samba over that?

The perl Fuse::DBI module's example sounds pretty similar to the system
you described, where the "file" seems to be a column in a table:
http://www.rot13.org/~dpavlin/fuse_dbi.html

If that doesn't suit your needs there are more elaborate ones
(libferris) that seem more complex and more flexible, and simpler ones
(dumbofs) that seem to be meant more as example code you could hack for
your purposes:
http://lwn.net/Articles/306860/
http://yiannnos.com/dumbofs

And then you could use unmodified samba out-of-the-box exporting that to
whatever the heck speaks SMB/CIFS these days.
Re: [GENERAL] generic modelling of data models; enforcing constraints dynamically...
Drifting off topic, so I'm no longer cc'ing the lists.

Sam Mason wrote:
>> The perl Fuse::DBI module's example sounds pretty similar to the
>> system you described, where the "file" seems to be a column in a table:
>> http://www.rot13.org/~dpavlin/fuse_dbi.html
>
> FUSE looks pretty easy to get going and I think I'd want more control
> over how files were presented than this gives so I'd probably end up
> rolling my own code. Thanks for pointing out that FUSE though, not sure

FUSE is really more the framework that wraps around your code. There are
applications using fuse that expose gmail as a filesystem. Here's a
simple example that uses FUSE to expose a perl HASH and a few
hello-world-like perl functions:
http://cpansearch.perl.org/src/NOSEYNICK/Fuse-Simple-1.00/README

> why I'd not thought of it before. I'll probably still never get around
> to it, but maybe I will!

It's actually easy enough that I wouldn't be surprised if you try it,
and get it working just for fun even if no one uses it.
Re: [GENERAL] Absolute value of intervals
On Mon, Nov 02, 2009 at 10:52:40AM +0000, Jasen Betts wrote:
> what's the absolute value of '1month -30 days'::interval

Curious what a use case for taking the absolute value of such mixed
intervals might be.

I could imagine such intervals being used for stuff like "XXX is due in
Y months but needs to be submitted Z days earlier". But I can't really
imagine worrying about whether it's positive or negative until it's
evaluated at a particular date (i.e. reporting "XXX has to be submitted
in N days" or "XXX was supposed to be submitted N days ago").
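For example, the sign of that particular interval only materializes once
it's applied to a date:

  -- In February, adding '1 month -30 days' lands *before* the start...
  SELECT '2010-02-01'::timestamp + interval '1 month -30 days';
  --  2010-01-30

  -- ...but in March the same interval lands after it:
  SELECT '2010-03-01'::timestamp + interval '1 month -30 days';
  --  2010-03-02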
Re: [GENERAL] Postgresql 8.3 statistic stadistinct value dead wrong even with default_statistics_target=1000
Tom Lane wrote:
> Maxim Boguk <[EMAIL PROTECTED]> writes:
>> [ ndistinct estimates way off ]
>
> Estimating the number of distinct values in a distribution with a long
> tail is just a really hard problem :-(

If I have a table where I know it has this property, is there any way I
can tell autovacuum's ANALYZE to go ahead and scan the whole table? ISTM
a vacuum_cost_delay-throttled full table scan might be less harmful than
a bad ndistinct guess.
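(The closest knob I know of isn't a full-table scan, but cranking the
per-column sample size up to 8.3's maximum - table and column names here
are made up:)

  ALTER TABLE mytable ALTER COLUMN mycol SET STATISTICS 1000;
  ANALYZE VERBOSE mytable;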
Re: [GENERAL] PostgreSQL TPC-H test result?
Tom Lane wrote:
> interval '1' year. ...is SQL spec syntax, but it's not fully
> implemented in Postgres... Or someone could try to make it work, but
> given that no one has taken the slightest interest since Tom Lockhart
> left the project, I wouldn't hold my breath waiting for that.

I have interest. For 5 years I've been maintaining a patch for a client
that allows the input of ISO-8601 intervals (like 'P1YT1M') rather than
the nonstandard shorthand ('1Y1M') that postgresql supports [1]. I'd be
interested in working on this.

Especially if supporting SQL standard interval syntax could improve the
chances of getting my ISO-8601-interval-syntax patch (replacing the
nonstandard postgres shorthand intervals) accepted again, I'd be quite
happy to work on it. Tom in 2003 said my code looked cleaner than the
current code [2], and the patch was accepted [3] for a while before
being rejected - I believe because Peter said he'd like to see the SQL
standard intervals first. I see it's still a TODO, though.

> the grammar supports it but the info doesn't get propagated to
> interval_in, and interval_in wouldn't know what to do even if it did
> have the information that there was a YEAR qualifier after the literal.

Any hints on how best to propagate the needed info from the grammar? Or
should it be obvious to me from reading the code?

[1] http://archives.postgresql.org/pgsql-patches/2003-09/msg00119.php
[2] http://archives.postgresql.org/pgsql-patches/2003-09/msg00121.php
[3] http://archives.postgresql.org/pgsql-patches/2003-12/msg00253.php

   Ron Mayer
   (formerly [EMAIL PROTECTED] who posted those ISO-8601 interval patches)
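For reference, a side-by-side of the interval input spellings in play
here (the ISO 8601 and SQL standard forms are what the 8.4 work ended up
enabling):

  SELECT interval '1 year 1 minute';    -- traditional PostgreSQL shorthand
  SELECT interval 'P1YT1M';             -- ISO 8601 format with designators
  SELECT interval '1-2' YEAR TO MONTH;  -- SQL standard literal (1 yr 2 mon)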
Re: [GENERAL] PostgreSQL TPC-H test result?
Ron Mayer wrote:
> Tom Lane wrote:
>> Or someone could try to make it work, but given that no one has taken
>> the slightest interest since Tom Lockhart left the project, I wouldn't
>> hold my breath waiting for that.
>
> I have interest. For 5 years I've been maintaining a patch for a client

Doh. Now that I catch up on emails I see Tom has a patch in a different
thread. I'll follow up there...
Re: [GENERAL] Are there plans to add data compression feature to postgresql?
You might want to try using a file system (ZFS, NTFS) that does
compression, depending on what you're trying to compress.
Re: [GENERAL] Are there plans to add data compression feature to postgresql?
Grant Allen wrote:
> ...warehouse...DB2...IBM is seeing typical storage savings in the
> 40-60% range

Sounds about the same as what compressing file systems claim:
http://opensolaris.org/os/community/zfs/whatis/
  "ZFS provides built-in compression. In addition to reducing space
  usage by 2-3x, compression also reduces the amount of I/O by 2-3x.
  For this reason, enabling compression actually makes some workloads
  go faster."

I do note that Netezza got a lot of PR around their compression release,
claiming it doubled performance. Wonder if they added that at the file
system level or higher in the DB.
Re: [GENERAL] Are there plans to add data compression feature to postgresql?
Chris Browne wrote:
> There's a way that compressed filesystems might *help* with a risk
> factor, here... By reducing the number of disk drives required to hold
> the data, you may be reducing the risk of enough of them failing to
> invalidate the RAID array.

And one more way. If neither your database nor filesystem do checksums
on blocks (the compressing filesystems mostly do checksums, tho), a
one-bit error may go undetected, corrupting your data without you
knowing it. With filesystem compression, that one-bit error is likely to
grow into something big enough to be detected immediately.
Re: [GENERAL] Postgres mail list traffic over time
Bruce Momjian wrote:
> Tom Lane wrote:
>> Bruce Momjian <[EMAIL PROTECTED]> writes:
>>> I also was confused by its flatness. I am finding the email traffic
>>> almost impossible to continue tracking, so something different is
>>> happening, but it seems it is not volume-related.
>>
>> Yes, my perception also is that it's getting harder and harder to keep
>> up with the list traffic; so something is happening that a simple
>> volume count doesn't capture.

If measured in "bytes of the gzipped mbox" it looks like there's a
*huge* increase of volume on Hackers in the past 3 months - well over
twice the historical levels; and maybe 4X 2002-2006. Graphs of this
metric can be seen here:
http://0ape.com/postgres_mailinglist_size/

In some ways I think compressed mbox sizes are a more fair way of
measuring the bandwidth for these lists, since it (correctly) counts a
large gzipped patch as requiring more mental effort than people
top-posting brief messages on top of old threads.

(Data from commands like
  HEAD http://archives.postgresql.org/pgsql-hackers/mbox/pgsql-hackers.2008-09.gz | grep Content-Length
)
Re: [GENERAL] Postgres mail list traffic over time
Joshua D. Drake wrote:
> On Fri, 2008-11-21 at 08:18 -0800, Ron Mayer wrote:
>> Bruce Momjian wrote:
>>> Tom Lane wrote:
>>>> ... harder to keep up with the list traffic; so something is
>>>> happening that a simple volume count doesn't capture.
>> If measured in "bytes of the gzipped mbox" it ...
>
> Its because we eliminated the -patches mailing list.

That's part of it. I've added -patches to the graph at
http://0ape.com/postgres_mailinglist_size/
as well as a graph of hackers+patches combined; and it still looks like
hackers+patches is quite high in the past 3 months.

With hackers+patches it looks like 2002-08 was the biggest month; but
the past 3 months still look roughly twice late 2007's numbers.
Re: [GENERAL] Postgres mail list traffic over time
Tom Lane wrote:
> Ron Mayer <[EMAIL PROTECTED]> writes:
>> Joshua D. Drake wrote:
>>> Its because we eliminated the -patches mailing list.
>> That's part of it. I've added -patches to the graph at
>> http://0ape.com/postgres_mailinglist_size/
>> as well as a graph of hackers+patches combined; and it still looks
>> like hackers+patches is quite high in the past 3 months.
>
> One of the reasons we got rid of -patches was the frequency of
> cross-posting to both -hackers and -patches. Are you double-counting
> cross-posted messages?

For the combined graph I just summed the output of:
  HEAD http://archives.postgresql.org/pgsql-hackers/mbox/pgsql-hackers.2008-09.gz | grep Content-Length
  HEAD http://archives.postgresql.org/pgsql-hackers/mbox/pgsql-patches.2008-09.gz | grep Content-Length
so I didn't look to see if the downloadable mboxes had duplicate
messages.

If people want the raw data, here's the script I used to get it:

#!/usr/bin/env ruby
%W{rubygems hpricot open-uri gruff}.each{|l| require l}

def chart(url)
  h = Hpricot.parse(open(url){|f| f.read})
  mboxes = (h / "//a").map{|x| x.attributes['href']}.
                       select{|x| x =~ /\.gz/}
  mboxes.sort.each{|x|
    y = `HEAD #{url}/#{x}` =~ /Content-Length: (\d+)/ && $1
    puts "#{x} #{y}"
  }
end

patches = chart('http://archives.postgresql.org/pgsql-patches')
general = chart('http://archives.postgresql.org/pgsql-general')
hackers = chart('http://archives.postgresql.org/pgsql-hackers')

Perhaps some of the extra burden on the experienced hackers is a larger
volume of newer people trying to contribute that are needing more
handholding (and thus more re-posted updated patches, etc)?
Re: [GENERAL] Interval Format
Zagato wrote:
> I have som SQL that in 8.0.3 do:
>
> # SELECT '32 hours'::INTERVAL;
>     interval
> -----------------
>  @ 1 day 8 hours
>
> And in 8.3.5 do:
>
> seg_veh2=# SELECT '@ 32 hours'::INTERVAL;
>  interval
> ------------
>  @ 32 hours
>
> Why i unable to get my old style of interval, i really need to see the
> day part im using datestyle german in both im try with "SET
> IntervalStyle TO..." but i get this error:

Even IntervalStyle won't help you here, since recent releases consider
the value "1 day 8 hours" as distinct from "32 hours" (because they are
distinct at certain times of the year in some timezones where days are
25 hours, etc).

> So.. how can i recover my "days" part...

If you tell it what time of year you are thinking of, this trick can
work:

dish=# set datestyle to german;
SET
dish=# select '2001-01-01'::timestamp + (interval '32 hours')
              - '2001-01-01'::timestamp;
    ?column?
-----------------
 @ 1 day 8 hours
(1 row)
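Alternatively, if all you want is 24-hour chunks folded back into days
(ignoring the timezone subtleties above), justify_hours() does that:

dish=# select justify_hours(interval '32 hours');
  justify_hours
-----------------
 @ 1 day 8 hours
(1 row)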
Re: [GENERAL] Very large tables
Grzegorz Jaśkiewicz wrote:
> true, if you don't want to search on values too much, or at all - use
> float[]. But otherwise, keep stuff in a tables as such. It might be
> humongous in size, but at the end of the day - prime thing when
> designing a db is speed of queries.

If he's worried about speed, he might look into porting contrib/intarray
to arrays of floats.  I've had good luck improving performance by
converting big tables of ints into intarray with indexes, along the
lines of the sketch below.
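In case it helps, the int flavor of that conversion looks roughly like
this (untested sketch - the table and column names are made up, and it
assumes both contrib/intagg and contrib/intarray are installed):

  -- before: one row per (document, tag) pair
  create table doc_tags (doc_id int, tag int);

  -- after: one row per document, with the tags packed into an int[]
  create table doc_tags_packed as
    select doc_id, int_array_aggregate(tag) as tags
      from doc_tags
     group by doc_id;

  create index doc_tags_packed_idx
      on doc_tags_packed using gist (tags gist__int_ops);

  -- "which documents have tag 42" now hits a single index
  select doc_id from doc_tags_packed where tags @@ '42'::query_int;

The float version would need an intarray-like opclass for float4[] or
float8[], which is the porting work mentioned above.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general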
Re: [GENERAL] Excessive (and slow) fsync() within single transaction
Scott Marlowe wrote:
> Actually, it's usually the drives that lie about fsync, especially
> consumer grade (and some server grade) SATA / PATA drives are known
> for this.

I'm still looking for any evidence of any drive that lies.  Is there
actually a drive which claims to support the IDE "FLUSH_CACHE_EXT"
feature (IDE command 0xEA) that doesn't?!?

You can see if your drive claims to - most drives made this century do -
in Linux with "hdparm -I /dev/sdX", which should show FLUSH_CACHE_EXT
if the disk supports this extension.  And if your disk is using a
last-century IDE spec that predated that feature, I think your kernel
probably told you in the startup messages that your drive is not so
much lying as obsolete and missing a feature.

Software, on the other hand, sucks at sending these commands.  Ext3
will only send the right IDE commands if the inode was touched[1].
VirtualBox doesn't send them correctly[2].  I hear LVM and MD raid can
lose those events too.

Since so many layers don't send the right IDE commands to flush,
userspace utilities exist to send those commands[3].  I sometimes
wonder if postgres should add something like that.

But I still haven't found any drive that lies -- and only a couple
whose firmware openly admits to not supporting the feature.

[1] http://www.mail-archive.com/linux-ker...@vger.kernel.org/msg272253.html
[2] http://openquery.com/blog/host-flush
[3] http://forum.soft32.com/linux/flush-disk-write-cache-userspace-ftopict338477.html

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Possible causes for database corruption and solutions
Craig Ringer wrote:
> On 16/12/2009 9:07 AM, Scott Marlowe wrote:
>> I'd also recommend moving off of OSX as you're using a minority OS as
>> far as databases are concerned, and you won't have a very large
>> community to help out when things do go wrong
>
> It sounds like PostgreSQL is being used as a DB bundled with an app -
> not quite embedded, but as close as Pg gets. ...
> For what its worth, there are clearly a fair few Mac OS X users of Pg
> out there...the things they ask about don't suggest to me that
> Mac OS X is a particularly untrustworthy platform for running Pg on.

Not to mention that Apple seems to use postgres bundled in some of
their high-end software:
  http://support.apple.com/kb/TS1648
  http://www.apple.com/finalcutserver/
  http://www.devworld.apple.com/appleapplications/ardsql.html

If postgres had any significant trouble running on OSX, I'm confident
Apple would love to hear about it and do something about it very
quickly.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Justifying a PG over MySQL approach to a project
Lincoln Yeoh wrote:
> Ten or so years ago MySQL was better than Postgres95, and it would have
> been easy to justify using MySQL over Postgres95 (which was really slow
> and had a fair number of bugs). But Postgresql is much better than MySQL
> now. That's just my opinion of course.

Really?!?

MySQL development started in '94, and their first internal release was
May '95.[1]  At that time Postgres's SQL language support was new, but
didn't the underlying database already have a half decade of history
behind it, and wasn't it surely more mature than MySQL at the time?

I thought the main justification for MySQL back then was that they had
better Win95 support (and a quality control philosophy that better
matched the old pre-NT windows, favoring time-to-market over
correctness).

[1] http://en.wikipedia.org/wiki/MySQL#cite_note-21

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Justifying a PG over MySQL approach to a project
Gauthier, Dave wrote:
> The arguments against PG are not technical.

A few more points that I didn't see in this thread yet that might help
answer the non-technical questions:

* There seem to be more commercial vendors providing support for
  Postgres than MySQL - because most mysql support came from that one
  company.  http://www.postgresql.org/support/professional_support

* There are bigger companies supporting Postgres than mysql.  And yes,
  that'll still be true even if Oracle supports MySQL.
  http://postgresql.fastware.com/

* There are a number of extremely scalable commercial solutions based
  on postgres forks (greenplum, enterprisedb, aster, whatever yahoo
  uses, etc).  These run many of the largest databases in the world.
  If you expect your app to grow to that scale, it might make your
  migration easier.

* There are specialty commercial companies that support specific
  postgres features very well - such as Refractions
  (http://www.refractions.net/), who provide great postgis support.

* There are enough large companies that depend entirely on each of the
  databases (Skype, for example) to make either one a safe choice from
  that point of view.  And the way Apple and Cisco use it for a number
  of their programs (google "cisco postgresql" or "apple final cut
  postgresql" for links) are other nice datapoints of companies most
  managers would have heard of.

> Dear Santa, All I want for Christmas is to be able to keep my DB.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] postgres vs mysql conventional wisdom
On this first day of the month, I thought it might be interesting to
re-visit the conventional wisdom about postgres vs mysql.  Do these
seem like fair observations?

Storage engines - Advantage Postgres for having far more available.
Postgres has such a wide range of storage engines to choose from --
ranging from whatever proprietary backend Yahoo uses, to Fujitsu's
proprietary storage engine; to PowerGres Plus's proprietary storage
engine; to Netezza's hardware accelerated storage engine.  In contrast,
MySQL only has a few to choose from.

Broad Developer base - Advantage MySQL for having a more distributed
team.  Postgres developers seem to be concentrated in a handful of
companies.  After various acquisitions, MySQL developers are scattered
to the four winds.

Large company support - Advantage Postgres for having larger companies
support it.  Fujitsu ($46 billion 2009 revenue; 185,000 employees) is
much bigger than that little company that bought whomever bought MySQL
($23 billion revenue; 73,000 employees).

Any more?

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Pet Peeves?
Gregory Stark wrote:
> I'm putting together a talk on "PostgreSQL Pet Peeves" for discussion
> at FOSDEM 2009 this year.  I have a pretty good idea what some of them
> are of course,

* The capitalization that makes everyone (customers, execs, etc) I
  introduce it to parse the name as Postgre-SQL.

* Last I checked, postgres keeps my laptop's hard drive spinning all
  the time by doing some I/O even when "totally idle".  For that
  matter, waking up the CPUs a few times a second too.

* pgfoundry - which was once described on the mailinglists as the
  "kiss of death" for projects.  Yes, I understand why it is what it
  is; but it's still a pet-peeve.  I can't even put my finger on
  what's annoying about it; but it is.

* Upgrades involving databases with extra modules like postgis.
    $ sh utils/postgis_restore.pl lwpostgis.sql newdb olddb.dump -E=UNICODE

* Finding extensions I might want.  For example:
  - Surely someone wrote a good count(*)-replacement-trigger before
    (something like the sketch below).  Now where can I find one?
    Searching for "count" on pgfoundry doesn't help me.  Searching
    varlena's General Bits shows me a simple one, but IIRC it's lacking
    when it comes to concurrency.  Googling just gets me lots of "read
    the archives" postings in the archives.
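For concreteness, the simple flavor of that trigger looks roughly like
this (untested sketch; the table names are made up) - and note that it
has exactly the concurrency problem mentioned above, since every writer
contends on one counter row:

  create table row_counts (tablename text primary key, cnt bigint not null);

  create or replace function count_trig() returns trigger as $$
  begin
      if TG_OP = 'INSERT' then
          update row_counts set cnt = cnt + 1 where tablename = TG_TABLE_NAME;
      else  -- DELETE
          update row_counts set cnt = cnt - 1 where tablename = TG_TABLE_NAME;
      end if;
      return null;  -- AFTER trigger, so the return value is ignored
  end
  $$ language plpgsql;

  -- per table: seed the counter, then attach the trigger
  insert into row_counts select 'mytable', count(*) from mytable;
  create trigger mytable_count after insert or delete on mytable
      for each row execute procedure count_trig();

  -- cheap replacement for "select count(*) from mytable":
  select cnt from row_counts where tablename = 'mytable';

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general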
Re: [GENERAL] Pet Peeves?
Gregory Stark wrote:
> One thing which has *not* been mentioned which i find positively
> shocking is VACUUM. This was once our single biggest source of user
> complaints. Between Autovacuum improvements and HOT previously and the
> free space map in 8.4 the situation will be much improved.

The other 2 features that made VACUUM much less painful for me were
vacuum_cost_delay (8.0?) and allowing concurrent GiST indexes (8.1?).
Before those features were in, VACUUM could choke even a very lightly
loaded, moderately large database for arbitrarily long times.

I guess I'd still like some more convenient tuning of autovacuum
(perhaps specifying X mbps of disk I/O, instead of the translation
sketched below); but I'd say vacuum fell off my pet-peeve list around
the 8.1 timeframe.
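For anyone curious, here's that translation with the default cost
parameters (a rough back-of-the-envelope sketch - it assumes every page
vacuum touches is a cache miss):

  -- with vacuum_cost_limit = 200 and vacuum_cost_page_miss = 10,
  -- vacuum sleeps after every 200/10 = 20 pages read
  SET vacuum_cost_delay = 20;  -- sleep 20ms each time that budget is hit
  -- => ~20 pages per 20ms = ~1000 pages/s = ~8 MB/s with 8kB pages

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general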
Re: [GENERAL] where to divide application and database
Ivan Sergio Borgonovo wrote:
> On Fri, 20 Feb 2009 06:50:22 -0800
> David Fetter wrote:
>>> ... moving some of the checks
>>> into the database and away from the application.
>> Since a useful database has *many* applications instead of "the"
>> application, I think this is an excellent move.
> I was wondering if "checks" may have an impact
> on performances and if pg does some optimisation over them.

Are you suggesting there would be a positive or a negative impact on
performance?

Moving some checks into the database should *improve* performance by
giving the planner improved information.  For example, unique
constraints indicate when only 0-1 rows may come out of a query; and
range constraints can let the database know when a partition doesn't
even need to be visited (see the sketch below).  No doubt other checks
(say, spellchecking a column) would have performance costs.

I'm with David Fetter's perspective of considering multiple applications
that can run on top of a database.  If a particular check should apply
to all conceivable applications that run on a database (say, foreign
key constraints), it seems to me it belongs in the database.  If a
particular check should apply to just one application, but other
applications might have reasons not to enforce such a check (say, one
app might do spell checking in English; another in a different
language) - that belongs in the app.
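Here's a sketch of the range-constraint case (made-up table names;
constraint exclusion over inherited partitions works in 8.1+):

  create table events (id int, ts timestamp);
  create table events_2009q1
      (check (ts >= '2009-01-01' and ts < '2009-04-01'))
      inherits (events);
  create table events_2009q2
      (check (ts >= '2009-04-01' and ts < '2009-07-01'))
      inherits (events);

  set constraint_exclusion = on;

  -- the plan should only scan events_2009q1; the q2 partition is
  -- skipped using nothing but its CHECK constraint
  explain select * from events where ts = '2009-02-14';

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general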
Re: [GENERAL] High cpu usage after many inserts
Joshua D. Drake wrote:
> On Wed, 2009-02-25 at 09:21 +0900, Jordan Tomkinson wrote:
>> On Wed, Feb 25, 2009 at 12:05 AM, Aidan Van Dyk wrote:
>>> * Greg Smith [090201 00:00]:
>>>> Shouldn't someone have ranted about RAID-5 by this point in the thread?
>>> You mean someone's actually still using RAID-5?
>>> ;-)
>>
>> What exactly is wrong with RAID5 and what should we have gone with?

On top of the stuff Joshua wrote, there's also the "RAID 5 Write Hole".
Quoting Wikipedia:

  "In the event of a system failure while there are active writes, the
  parity of a stripe may become inconsistent with the data. If this is
  not detected and repaired before a disk or block fails, data loss may
  ensue as incorrect parity will be used to reconstruct the missing
  block in that stripe. This potential vulnerability is sometimes known
  as the write hole. Battery-backed cache and similar techniques are
  commonly used to reduce the window of opportunity for this to occur."

And in more detail from http://blogs.sun.com/bonwick/entry/raid_z
"RAID-5 write hole... What's worse, it will do so silently -- it has no
idea that it's giving you corrupt data."

I sometimes wonder if postgres should refuse to start up on RAID-5 in
the same way it does on VFAT or when running as root. :-)

> RAID5 outside of RAID 0 is the worst possible RAID level to run with a
> database (of the commonly used raid levels, that is).
>
> It is very, very slow on random writes which is what databases do.
> Switch to RAID 10.
>
> Sincerely,
>
> Joshua D. Drake

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] speaking of 8.4...
Joshua D. Drake wrote:
> On Fri, 2009-02-27 at 10:19 +1300, Tim Uckun wrote:
>> [according to some page on the web site...]
>> 8.4 was scheduled to be released march 1. Do we know what the
> All schedules are subject to change within the community :)
>> tentative date of release is?
> When it is done of course.

Perhaps that should be the official position communicated in the
various places on the web site.

I do notice that the Press FAQ, with its Q4 2008 guess, is even more
optimistic than the other page on the website people seem to be finding
that implies March.

http://www.postgresql.org/about/press/faq
  Q: When will 8.4 come out?
  A: Historically, PostgreSQL has released approximately every 12
     months and there is no desire in the community to change from
     that pattern. So expect 8.4 sometime in the fourth quarter of
     2008.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] speaking of 8.4...
Joshua D. Drake wrote:
> On Thu, 2009-02-26 at 15:27 -0800, Ron Mayer wrote:
>> Joshua D. Drake wrote:
>>> On Fri, 2009-02-27 at 10:19 +1300, Tim Uckun wrote:
>>>> 8.4 was scheduled to be released march 1. ...
>> I do notice that the Press FAQ with its Q4 2008 guess
>> is even more optimistic ...
> Wow that must have been written some time ago I don't think anyone
> thought we would hit that date in any recent (say last 6-8 months).

That press FAQ page is often sadly amusing.  Back in Jan 2008 that same
page stated that 8.3 "would" come out in July 2007:
http://archives.postgresql.org/pgsql-advocacy/2008-01/msg00235.php

But that's not nearly as sad as the Chinese FAQs that state that the
latest version of Postgres is 8.2.1 or 8.2.3, depending on whether you
prefer traditional or simplified writing styles.
http://www.postgresql.org/docs/faqs.FAQ_chinese_simp.html
http://www.postgresql.org/docs/faqs.FAQ_chinese_trad.html

And for german speakers, their newest is 8.2.5:
http://www.postgresql.org/docs/faqs.FAQ_german.html

Basically for each language you get a different "latest" release.

Surely these FAQ entries are doing more harm than good.  Can we please
just update all of these to link to some page where the actual latest
version is listed instead, and replace the forward-looking expectations
with something that is less misleading?

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Maximum transaction rate
Greg Smith wrote:
> There are some known limitations to Linux fsync that I remain somewhat
> concerned about, independantly of LVM, like "ext3 fsync() only does a
> journal commit when the inode has changed" (see
> http://kerneltrap.org/mailarchive/linux-kernel/2008/2/26/990504 ). The
> way files are preallocated, the PostgreSQL WAL is supposed to function
> just fine even if you're using fdatasync after WAL writes, which also
> wouldn't touch the journal (last time I checked fdatasync was
> implemented as a full fsync on Linux). Since the new ext4 is more

Indeed it does.  I wonder if there should be an optional fsync mode
in postgres that turns fsync() into
   fchmod (fd, 0644); fchmod (fd, 0664);
to work around this issue.

For example, the program below will show one write per disk revolution
if you leave the fchmod() in there, and run many times faster
(i.e. lying) if you remove it.  This with ext3 on a standard IDE drive
with the write cache enabled, and no LVM or anything between them.

==========================================================
/*
** based on http://article.gmane.org/gmane.linux.file-systems/21373
**          http://thread.gmane.org/gmane.linux.kernel/646040
*/
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>

int main(int argc,char *argv[])
{
  if (argc<2) {
     printf("usage: fs <filename>\n");
     exit(1);
  }
  int fd = open (argv[1], O_RDWR | O_CREAT | O_TRUNC, 0666);
  int i;
  for (i=0;i<100;i++) {
     char byte;
     pwrite (fd, &byte, 1, 0);
     fchmod (fd, 0644); fchmod (fd, 0664);
     fsync (fd);
  }
}
==========================================================

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Maximum transaction rate
Marco Colombo wrote:
> Ron Mayer wrote:
>> Greg Smith wrote:
>>> There are some known limitations to Linux fsync that I remain somewhat
>>> concerned about, independantly of LVM, like "ext3 fsync() only does a
>>> journal commit when the inode has changed" (see
>>> http://kerneltrap.org/mailarchive/linux-kernel/2008/2/26/990504 )
>> I wonder if there should be an optional fsync mode
>> in postgres that turns fsync() into
>> fchmod (fd, 0644); fchmod (fd, 0664);

'course I meant: "fchmod (fd, 0644); fchmod (fd, 0664); fsync(fd);"

>> to work around this issue.
> Question is... why do you care if the journal is not flushed on fsync?
> Only the file data blocks need to be, if the inode is unchanged.

You don't - but ext3 fsync won't even push the file data blocks through
a disk cache unless the inode was changed.  The point is that ext3 only
does the "write barrier" processing that issues the FLUSH CACHE (IDE)
or SYNCHRONIZE CACHE (SCSI) commands on inode changes, not data
changes.  And with no FLUSH CACHE or SYNCHRONIZE CACHE, the data blocks
may sit in the disk's cache after the fsync() as well.

PS: not sure if this is still true - last time I tested it was Nov 2006.

   Ron

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] postgreSQL & amazon ec2 cloud
Tom Lane wrote:
> Adrian Klaver writes:
>> Nothing. I have created a Postgres instance on an EC2 virtual machine
>> with attached EBS (Elastic Block Storage)..[...]
> ... I wonder whether you have any guarantees about database consistency
> in that situation?  PG has some pretty strong requirements about fsync

While I agree it shouldn't be taken on faith, their documentation does
take the time to point out that syncing of I/O is a pretty expensive
operation (even literally - they charge extra for individual I/O
operations, and point out that those increase with syncs).
http://aws.amazon.com/ebs/

I have a couple databases there, and based on the rather high I/O-wait
times on their cheapest ($0.10/hr) instances, I'm wildly guessing that
they're doing something reasonable for sync :-).  Their higher priced
instances supposedly have better I/O performance.

> behavior etc, and I'd not want to take it on faith that a cloud
> environment will meet those requirements.

That said, even apart from any SLA of fsync itself, I imagine there's a
concern that an entire hosted cloud might vanish for any number of
reasons.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Maximum transaction rate
Marco Colombo wrote:
> Yes, but we knew it already, didn't we? It's always been like
> that, with IDE disks and write-back cache enabled, fsync just
> waits for the disk reporting completion and disks lie about

I've looked hard, and I have yet to see a disk that lies.  ext3, OTOH,
seems to lie.

IDE drives happily report whether they support write barriers or not,
which you can see with the command:

  %hdparm -I /dev/hdf | grep FLUSH_CACHE_EXT

I've tested about a dozen drives, and I've never seen one that claims
to support flushing that doesn't.  And I haven't seen one that doesn't
support it that was made less than half a decade ago.  IIRC, the ATA-5
specs from 2000 made supporting this mandatory.

Linux kernels since 2005 or so check for this feature.  It'll happily
tell you which of your devices don't support it:

  %dmesg | grep 'disabling barriers'
  JBD: barrier-based sync failed on md1 - disabling barriers

And for devices that do, it will happily send IDE FLUSH CACHE commands
to IDE drives that support the feature.  At the same time, Linux
kernels started sending the very similar SCSI SYNCHRONIZE CACHE
commands.

> Anyway, it's the block device job to control disk caches. A
> filesystem is just a client to the block device, it posts a
> flush request, what happens depends on the block device code.
> The FS doesn't talk to disks directly. And a write barrier is
> not a flush request, is a "please do not reorder" request.
> On fsync(), ext3 issues a flush request to the block device,
> that's all it's expected to do.

But AFAICT ext3 fsync() only tells the block device to flush disk
caches if the inode was changed.  Or, at least empirically, if I modify
a file and do fsync(fd); on ext3 it does not wait until the disk has
spun to where it's supposed to spin.  But if I put a couple fchmod()'s
right before the fsync() it does.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Are there performance advantages in storing bulky field in separate table?
Sam Mason wrote:
> On Wed, Apr 08, 2009 at 04:56:35PM +0100, Ian Mayo wrote:
>> One more thing: hey, did you hear? I just got some advice from Tom Lane!
> Statistically speaking; he's the person most likely to answer you by

Even so, this might be the #1 advantage of Postgres over Oracle (cost
being #2).  Unless you're one of their ten biggest customers, I imagine
it'd take quite some time to get similar support from the core team's
counterparts at the other big databases.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Are there performance advantages in storing bulky field in separate table?
Robert Treat wrote:
> You can be sure that discussion of this topic in this forum will soon be
> visited by religious zealots, but the short answer is "nulls are bad,
> mmkay".
>
> A slightly longer answer would be that, as a general rule, attributes
> of your relations that only apply to 1% of the rows are better
> represented as a one

To fulfill your prophecy of zealotry, I've got a number of tables with
columns that are mostly null that I can't think of that nice a way of
refactoring.  I'd love ideas to improve the design, though.

One example's an address table.  Most addresses have a few fields that
are typically present (building number, city, state, etc).  Others, as
described in various governments' address standards, are fields that
are typically absent.  For example, in US addressing rules, the
"Urbanization Name" line:
http://www.usps.com/ncsc/addressstds/addressformats.htm

  MRS MARIA SUAREZ          Name
  URB LAS GLADIOLAS         Urbanization name
  150 CALLE A               House no. and st. name
  SAN JUAN PR 00926-3232    City, state, and ZIP+4

Similarly sparse columns in my address tables are titles,
division/department names, and mailstop codes (described here:
http://pe.usps.gov/text/pub28/pub28c3_011.htm).

While I realize I could stick some string (an empty string, or some
other magic string like "urbanization name doesn't apply to this
address") into such a column, it sure is convenient to put nulls in
those columns.  A rough sketch of what I mean is below.  I'm quite
curious what you'd suggest a well-designed address table would look
like without nulls.
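Just to make the question concrete (obviously simplified; the column
names are made up):

  -- the convenient design: one table, sparse columns nullable
  create table address (
      address_id   serial primary key,
      name         text not null,
      urbanization text,   -- null for the vast majority of addresses
      street       text,
      city         text,
      state        text,
      zip          text
  );

  -- the null-free alternative: peel each sparse attribute off into
  -- its own table, and join whenever you need it
  create table address_urbanization (
      address_id   int primary key references address,
      urbanization text not null
  );

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general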
Re: [GENERAL] Full text search strategy for names
Rick Schumeyer wrote:
> I want to be able to search a list of articles for title words as well
> as author names.  I'm not sure the best strategy for the names.  The
> full text parser "parses" the names, giving undesirable results.
>
> For example,
>
> select to_tsvector('claude Jones');
>     to_tsvector
> --------------------
>  'jone':2 'claud':1
>
> Is there a way to tell the parser to index the words in a column
> without trying to determine word roots?
> Or what is the best way to index names for efficient searching?

I've got a similar question, but would love a dictionary that could
give me an efficient index that considers that Bill/William and
Bob/Rob/Robert and Khadaffi/Qaddafi might share the same roots.
Ideally it'd return exact matches first, followed by the similar terms.

I kludged up some many-step queries to try to do this; but wonder if
this would work better as a tsearch dictionary, and wonder even more if
I'm re-inventing something that's already out there.

Anyone know of such a tsearch dictionary that is aware of the roots of
names?
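The closest existing mechanism I know of is a synonym dictionary.  A
sketch, using 8.4-era syntax (the names_syn dictionary, the name_search
configuration, and the names.syn file contents are all made up):

  -- $SHAREDIR/tsearch_data/names.syn would contain lines like:
  --   bill  william
  --   bob   robert
  --   rob   robert
  CREATE TEXT SEARCH DICTIONARY names_syn
      (TEMPLATE = synonym, SYNONYMS = names);
  CREATE TEXT SEARCH CONFIGURATION name_search (COPY = simple);
  ALTER TEXT SEARCH CONFIGURATION name_search
      ALTER MAPPING FOR asciiword WITH names_syn, simple;

  -- 'Bill' and 'William' should now produce the same lexeme:
  SELECT to_tsvector('name_search', 'Bill Jones');

That handles nicknames from a hand-built word list, but not the
Khadaffi/Qaddafi transliteration case - for that, something like
soundex() or dmetaphone() from contrib/fuzzystrmatch seems closer.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general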
Re: [GENERAL] Difference between array column type and separate table
Alban Hertroys wrote:
> On May 2, 2009, at 9:33 AM, Mike Christensen wrote:
>> ...
>> create table Threads ( ... Tags int2[], ...);
>>
>> To me this seems cleaner, but I'm wondering about performance.  If I
>> had millions of threads, is a JOIN going to be faster? ...
> ...I don't think array values are indexable either. ...

Of course they are indexable, if you construct the right index:
http://www.postgresql.org/docs/current/static/intarray.html

For certain queries, the array will probably be *much* faster than the
extra tables.  For example, a query like this:

  SELECT * FROM Threads WHERE Threads.tags @@ '1&(2|3)'::query_int;

does a single indexscan, and will be far far faster than the
extra-table approach, where the query would look something like:

  select * from Threads T
   where id in (select threadid from threadtags where id = 1)
     and id in (select threadid from threadtags where id = 2 or id = 3);

requiring 3 indexscans on tables with many more rows, and some
potentially rather expensive joins.
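(For completeness, the index behind the first query would presumably be
something like the following; untested, it needs contrib/intarray
installed, and note that intarray works on int[] rather than int2[], so
the Tags column would need to be int[]:)

  CREATE INDEX threads_tags_idx ON Threads USING gist (Tags gist__int_ops);

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general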
Re: [GENERAL] INTERVAL data type and libpq - what format?
Sam Mason wrote:
> You get an error because " 123 11" isn't a valid literal of an
> (undecorated) INTERVAL type.

Hmm.  Should it be?  Skimming the spec makes me think it might be a
valid day-time interval.  Quoting the spec:

  <unquoted interval string> ::=
      [ <sign> ] { <year-month literal> | <day-time literal> }
  ...
  <day-time literal> ::=
      <day-time interval> | <time interval>
  <day-time interval> ::=
      <days value> [ <space> <hours value>
          [ <colon> <minutes value> [ <colon> <seconds value> ] ] ]

which would read " 123 11" as <days value> <space> <hours value>,
i.e. 123 days 11 hours.

I can send a patch if this interpretation is right...

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] A way to let Vacuum warn if FSM settings are low.
Short summary:

  I find this tiny (9-line) patch useful to help my clients
  know when FSM settings may need updating.

Some of the more frequently asked questions here are in regards to FSM
settings.  One hint I've seen is to run "vacuum verbose;".  At the end
of thousands of lines of INFO and DETAIL messages, vacuum verbose has 2
separate lines with some numbers to compare ("total pages needed" and
"FSM size...pages") that help indicate too-low fsm settings.

I've gotten into the habit of always installing the following patch
(below) that automatically does this comparison for me, and if
max_fsm_pages is too small, it logs a warning as shown here:

  patched=# vacuum;
  WARNING:  max_fsm_pages(1601) is smaller than total pages needed(2832)
  VACUUM

I find this much nicer than the existing output (

  clean=# vacuum verbose;
  [... thousands of lines of INFO and DETAIL messages ...]
  INFO:  free space map: 77 relations, 470 pages stored; 2832 total pages needed
  DETAIL:  Allocated FSM size: 100 relations + 1601 pages = 19 kB shared memory.

) for many reasons:

 * First, because it's a warning, lots of people will notice it before
   they ask the FAQ again.
 * Second, because all the information is on a single line and actually
   contains the string "max_fsm_relations", it gives people a clue what
   to do about it.  (note that vacuum verbose uses similar phrases, but
   from the number of questions here, it must not be obvious)
 * Third, I don't need the 'verbose' setting.
 * And most importantly, our clients let us know about WARNINGs, but
   not about INFOs or DETAILs in their log page; so it gives us a
   chance to respond before their system drags to a halt.

If a patch like this could get into the standard distro, that'd be
awesome - just let me know what additional work is needed (I didn't
look at docs or internationalization yet).  If not, I'd like to post it
here to patches just in case anyone else will benefit from the same
thing.

==========================================================
% diff -u postgresql-8.0.1/src/backend/storage/freespace/freespace.c postgresql-patched/src/backend/storage/freespace/freespace.c
--- postgresql-8.0.1/src/backend/storage/freespace/freespace.c 2004-12-31 14:00:54.0 -0800
+++ postgresql-patched/src/backend/storage/freespace/freespace.c 2005-02-23 14:58:50.638745744 -0800
@@ -704,6 +704,15 @@
 	/* Convert stats to actual number of page slots needed */
 	needed = (sumRequests + numRels) * CHUNKPAGES;
+
+	if (needed > MaxFSMPages)
+		ereport(WARNING,
+				(errmsg("max_fsm_pages(%d) is smaller than total pages needed(%.0f)",
+						MaxFSMPages, needed)));
+	if (numRels > MaxFSMRelations)
+		ereport(WARNING,
+				(errmsg("max_fsm_relations(%d) is smaller than the number of relations (%d)",
+						MaxFSMRelations, numRels)));
 	ereport(elevel,
 			(errmsg("free space map: %d relations, %d pages stored; %.0f total pages needed",
==========================================================

Thoughts?
   Ron

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
       subscribe-nomail command to [EMAIL PROTECTED] so that your
       message can get through to the mailing list cleanly
Re: [GENERAL] [PATCHES] A way to let Vacuum warn if FSM settings are low.
On Wed, 23 Feb 2005, Tom Lane wrote:
> Ron Mayer <[EMAIL PROTECTED]> writes:
>> +    if (needed > MaxFSMPages)... ereport(WARNING, ...
> An unconditional WARNING seems a bit strong to me for a case that is
> not necessarily wrong.

How about a HINT, not unlike the "checkpoints are occurring too
frequently" one?

> Depending on the needs of the installation, this might be a perfectly
> acceptable situation --- for example if you have lots of large
> read-mostly tables.

OTOH, for other installations, it's nice to have some quick way for our
customers to know when to call us.  For a system we install somewhere
and may not have access to, would this be a safe, overly-conservative
test that could be put in place?  Unfortunately we really don't know
exactly how fast their tables will be growing, and I'm hoping some sort
of monitoring like this - even if it's too conservative - can catch
things before they cause problems.

I'm basing the comparison on this older thread:
http://archives.postgresql.org/pgsql-www/2004-11/msg00078.php
If better comparisons can be made, I'd be interested in looking into
them if someone points me in the right direction.

> On the other side of the coin, the test could pass (ie no warning) in
> situations where in fact MaxFSMPages is too small ...

I was hoping that second message would have caught this - but I clearly
didn't understand what comparison to use there either. :(  Would the
fixed (with == instead of >) test on MaxFSMRelations catch most cases
where it gives a false negative?

>> +    if (numRels > MaxFSMRelations) ...
> This part is just plain dead code, since it's not possible for numRels
> to exceed MaxFSMRelations.

Oops.  Sorry.

> I think it might be useful to warn when numRels == MaxFSMRelations,
> since if you don't have even one spare fsmrel slot then you probably
> have too few (it's unlikely you got it on the nose).  But I don't know
> how to produce a warning about MaxFSMPages that's worth anything.

I'm not sure I understood the false positive with many large
read-mostly tables you mentioned above.  Even in that case, how would I
know that the limited FSM space I had was used on the tables that
needed it?

The database I have matches that description - 90 GB of GIS data that
changes rarely (when cities close streets, etc) - and a few tables of
quickly changing data (recent crimes in the cities being analyzed).  I
want to make sure the quickly changing tables don't get starved for fsm
space by the large tables.  Would this comparison insure against that
risk?  (or am I misunderstanding completely, and I'll drop the subject)

   Ron

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if
       your joining column's datatypes do not match
Re: [GENERAL] [PATCHES] A way to let Vacuum warn if FSM settings are low.
Thanks everyone for the feedback on my patch.

Objections I've heard (both online and in email) included:

 * WARNING is too strong for possibly OK behavior.
 * It's similar to "checkpoints occuring too frequently... consider
   increasing...checkpoint_segments", which is a LOG, not a WARNING.
 * The end user can't do anything anyway; so the LOG file would be a
   better place.
 * My comparison for numRels was broken.
 * If we're hinting the user to do something (change settings) I should
   make it put a HINT in the log file.

Praise I've heard included:

 * Even if it's too conservative, some admins want to know.
 * Unlike the current VACUUM VERBOSE info, all info is on one line, so
   automated log monitoring software can more easily catch it.
 * Unlike the current VACUUM VERBOSE info, this one points the user in
   the right direction.

Would the updated patch below address most of the concerns?

The output now looks like:

  LOG:  max_fsm_pages(1601) is smaller than the actual number of page slots needed(2832)
  HINT:  You may want to increase max_fsm_pages to be larger than 2832

and only goes in the log file (like the "checkpoints" hint).

I think Tom's outstanding comment that "Depending on the installation,
this might be a perfectly acceptable situation ... I don't know how to
produce a warning about MaxFSMPages that's worth anything" is the only
objection left unaddressed.  I guess my defense to that statement would
be that I think for some installations this does provide value, so by
making it a LOG instead of a WARNING are both needs met?

Thanks,
   Ron

% diff -U 10 postgresql-8.0.1/src/backend/storage/freespace/freespace.c postgresql-patched/src/backend/storage/freespace/freespace.c
--- postgresql-8.0.1/src/backend/storage/freespace/freespace.c 2004-12-31 14:00:54.0 -0800
+++ postgresql-patched/src/backend/storage/freespace/freespace.c 2005-02-24 13:44:52.361669928 -0800
@@ -704,20 +704,32 @@
 	/* Convert stats to actual number of page slots needed */
 	needed = (sumRequests + numRels) * CHUNKPAGES;

 	ereport(elevel,
 			(errmsg("free space map: %d relations, %d pages stored; %.0f total pages needed",
 					numRels, storedPages, needed),
 			 errdetail("Allocated FSM size: %d relations + %d pages = %.0f kB shared memory.",
 					   MaxFSMRelations, MaxFSMPages,
 					   (double) FreeSpaceShmemSize() / 1024.0)));
+
+	if (needed > MaxFSMPages)
+		ereport(LOG,
+				(errmsg("max_fsm_pages(%d) is smaller than the actual number of page slots needed(%.0f)",
+						MaxFSMPages, needed),
+				 errhint("You may want to increase max_fsm_pages to be larger than %.0f",needed)));
+	if (numRels == MaxFSMRelations)
+		ereport(LOG,
+				(errmsg("max_fsm_relations(%d) is equal than the number of relations vacuum checked (%d)",
+						MaxFSMRelations, numRels),
+				 errhint("You probably have more than %d relations. You should increase max_fsm_relations. Pages needed for max_fsm_pages may have been underestimated as well. ",numRels)));
 }

 /*
  * DumpFreeSpaceMap - dump contents of FSM into a disk file for later reload
  *
  * This is expected to be called during database shutdown, after updates to
  * the FSM have stopped.  We lock the FreeSpaceLock but that's purely pro
  * forma --- if anyone else is still accessing FSM, there's a problem.
  */
 void

---(end of broadcast)---
TIP 8: explain analyze is your friend
Re: [GENERAL] [PATCHES] A way to let Vacuum warn if FSM settings are low.
On Thu, 24 Feb 2005, Tom Lane wrote:
> I preferred Simon's idea of not trying to produce a warning for pages
> when we've detected relation overflow.

Sounds good.  I'll make that update.  Should the relation overflow be a
WARNING or a LOG?  It sounds like if you have that problem it's almost
certainly a problem, right?

> Making it a LOG rather than WARNING does address the issue of being
> too much in-your-face for an uncertain condition, though.

Great.

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
       (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [GENERAL] [PATCHES] A way to let Vacuum warn if FSM settings are low.
On Thu, 24 Feb 2005, Ron Mayer wrote:
> Should the relation overflow be a WARNING or a LOG?  It sounds like
> if you have that problem it's almost certainly a problem, right?

And while I'm at it... what's the convention for INFOs vs LOGs?
The "checkpoint...too frequent" one seemed similar, and is a LOG.

And do people think the HINTs I added add value or just noise?

---(end of broadcast)---
TIP 6: Have you searched our list archives?
       http://archives.postgresql.org
Re: [GENERAL] [PATCHES] A way to let Vacuum warn if FSM settings are low.
On Thu, 24 Feb 2005, Tom Lane wrote:
> I'd go for making them both LOG, I think.  More consistent.

Ok, here's another try :)  With a couple more questions...

1. If I read Simon's email correctly, it implied that he wanted to see
   the "free space map" message for a VACUUM even when VERBOSE is
   turned off.  I could just tweak it in PrintFreeSpaceMapStatistics()
   as shown here... but now elevel (which depended on VACUUM VERBOSE or
   not) is no longer needed by PrintFreeSpaceMapStatistics.

   1a. Is it desired to always show this line as an INFO instead of a
       DEBUG2 (which it currently is when VERBOSE is not selected)?
   1b. Should I tweak vacuum.c (feels cleaner) or just freespace.c
       (minimal changes)?

2. If I read Simon's email correctly, it implied that he wanted to see
   these new lines when you type VACUUM.  This would suggest making
   them INFOs.  Making them INFOs would slightly contradict another
   goal of wanting to see them in the LOG for automated log grepping
   scripts to find, since that would require turning on INFOs, which I
   think commonly aren't logged.  Also, making them LOGs is consistent
   with the checkpoint hint.  I suppose they could be made NOTICEs; but
   that isn't consistent with either.

diff -u postgresql-8.0.1/src/backend/storage/freespace/freespace.c postgresql-patched/src/backend/storage/freespace/freespace.c
--- postgresql-8.0.1/src/backend/storage/freespace/freespace.c 2004-12-31 14:00:54.0 -0800
+++ postgresql-patched/src/backend/storage/freespace/freespace.c 2005-02-24 14:54:36.619566040 -0800
@@ -705,12 +705,25 @@
 	/* Convert stats to actual number of page slots needed */
 	needed = (sumRequests + numRels) * CHUNKPAGES;

-	ereport(elevel,
+	ereport(INFO,
 			(errmsg("free space map: %d relations, %d pages stored; %.0f total pages needed",
 					numRels, storedPages, needed),
 			 errdetail("Allocated FSM size: %d relations + %d pages = %.0f kB shared memory.",
 					   MaxFSMRelations, MaxFSMPages,
 					   (double) FreeSpaceShmemSize() / 1024.0)));
+
+	if (numRels == MaxFSMRelations)
+		ereport(LOG,
+				(errmsg("max_fsm_relations(%d) is equal than the number of relations vacuum checked (%d)",
+						MaxFSMRelations, numRels),
+				 errhint("You probably have more than %d relations. You should increase max_fsm_relations. Pages needed for max_fsm_pages may have been underestimated. ",numRels)));
+	else
+	if (needed > MaxFSMPages)
+		ereport(LOG,
+				(errmsg("max_fsm_pages(%d) is smaller than the actual number of page slots needed(%.0f)",
+						MaxFSMPages, needed),
+				 errhint("You may want to increase max_fsm_pages to be larger than %.0f",needed)));
+
 }

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster
Re: [GENERAL] [PATCHES] A way to let Vacuum warn if FSM settings are low.
On Fri, 25 Feb 2005, Bruce Momjian wrote:
> Tom Lane wrote:
>> Ron Mayer <[EMAIL PROTECTED]> writes:
>>> Should the relation overflow be a WARNING or a LOG? ...
>> I'd go for making them both LOG, I think.  More consistent.
>
> Can we also update this wording:
>
>   INFO:  free space map: 52 relations, 61 pages stored; 848 total pages needed
>   DETAIL:  Allocated FSM size: 1000 relations + 2 pages = 182 kB shared memory.
>
> The "pages needed" is confusing.  In fact it is the total pages used or
> allocated.  I looked in the code and got confused.  It needs clarity.

Any preference?  To me, "allocated" has some risk of sounding like it
refers to the total free space map (memory allocated for fsm) instead
of just the used ones.  "Allocated" is actually used for that other
meaning on the next line.  I guess it's confusing there too, so that
line should be changed as well.

How about if I go for "used" in that first line, and simply remove the
word "Allocated" in the DETAIL line?  So instead of:

  INFO:  free space map: 52 relations, 61 pages stored; 848 total pages needed
  DETAIL:  Allocated FSM size: 1000 relations + 2 pages = 182 kB shared memory.

it'll say:

  INFO:  free space map: 52 relations, 61 pages stored; 848 total pages used
  DETAIL:  FSM size: 1000 relations + 2 pages = 182 kB shared memory.

With those changes, the patch now looks like this...

==========================================================
% diff -u postgresql-8.0.1/src/backend/storage/freespace/freespace.c postgresql-patched/src/backend/storage/freespace/freespace.c
--- postgresql-8.0.1/src/backend/storage/freespace/freespace.c 2004-12-31 14:00:54.0 -0800
+++ postgresql-patched/src/backend/storage/freespace/freespace.c 2005-02-25 16:45:26.773792440 -0800
@@ -705,12 +705,25 @@
 	/* Convert stats to actual number of page slots needed */
 	needed = (sumRequests + numRels) * CHUNKPAGES;

-	ereport(elevel,
-			(errmsg("free space map: %d relations, %d pages stored; %.0f total pages needed",
+	ereport(INFO,
+			(errmsg("free space map: %d relations, %d pages stored; %.0f total pages used",
 					numRels, storedPages, needed),
-			 errdetail("Allocated FSM size: %d relations + %d pages = %.0f kB shared memory.",
+			 errdetail("FSM size: %d relations + %d pages = %.0f kB shared memory.",
 					   MaxFSMRelations, MaxFSMPages,
 					   (double) FreeSpaceShmemSize() / 1024.0)));
+
+	if (numRels == MaxFSMRelations)
+		ereport(LOG,
+				(errmsg("max_fsm_relations(%d) is equal than the number of relations vacuum checked (%d)",
+						MaxFSMRelations, numRels),
+				 errhint("You probably have more than %d relations. You should increase max_fsm_relations. Pages needed for max_fsm_pages may have been underestimated. ",numRels)));
+	else
+	if (needed > MaxFSMPages)
+		ereport(LOG,
+				(errmsg("max_fsm_pages(%d) is smaller than the actual number of page slots needed(%.0f)",
+						MaxFSMPages, needed),
+				 errhint("You may want to increase max_fsm_pages to be larger than %.0f",needed)));
+
 }
==========================================================

Getting closer?

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
       (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
[GENERAL] multicolumn GIST index question
Did anyone get multi-column GIST indexes working using both the gist_btree and postgis modules? It seems to work fine for me on small test cases (shown at the bottom), but seems to crash my database for large ones. Any advice is welcome - including pointers to better lists to ask questions like this. Output showing a crash (on a large database) and a successful run on a tiny one follow. Ron fl=# -- sessionid is a text; the_geom is a GEOMETRY fl=# create index testidx2 on user_point_features using gist (sessionid,the_geom); server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. The connection to the server was lost. Attempting reset: Succeeded. fl=# fl=# fl=# fl=# create table test3(a text, b text, c text); CREATE TABLE fl=# select addgeometrycolumn ('','test3','the_geom','-1','POINT',2); addgeometrycolumn -- public.test3.the_geom SRID:-1 TYPE:POINT DIMS:2 geometry_column fixed:0 (1 row) fl=# create index idx_text_text on test3 using gist (a,b); CREATE INDEX fl=# create index idx_text_geom on test3 using gist (a,the_geom); CREATE INDEX fl=# ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [GENERAL] [PATCHES] A way to let Vacuum warn if FSM settings are low.
On Sun, 27 Feb 2005, Simon Riggs wrote:
> On Fri, 2005-02-25 at 16:48 -0800, Ron Mayer wrote:
>> Getting closer?
> For me, yes. [...]
> The not-warnings seem a little wordy for me, but they happen when and
> how I would hope for.
>
> So, for me, it looks like a polish of final wording and commit.

Thanks for the feedback.  How about I replace the grammatically poor:

  LOG:  max_fsm_relations(%d) is equal than the number of relations vacuum checked (%d)
  HINT:  You probably have more than %d relations. You should increase max_fsm_relations. Pages needed for max_fsm_pages may have been underestimated.

with this:

  LOG:  max_fsm_relations(100) equals the number of relations checked
  HINT:  You have >= 100 relations. You should increase max_fsm_relations.

and replace this:

  LOG:  max_fsm_pages(%d) is smaller than the actual number of page slots needed(%.0f)
  HINT:  You may want to increase max_fsm_pages to be larger than %.0f

with the slightly smaller:

  LOG:  the number of page slots needed (2832) exceeds max_fsm_pages (1601)
  HINT:  You may want to increase max_fsm_pages to a value over 2832.

These updated messages would fit on an 80-column display if the numbers
aren't too big.  Here's 80 characters for a quick reference:
01234567890123456789012345678901234567890123456789012345678901234567890123456789

The "pages needed...underestimated" note in the first message was no
longer useful anyway, since it's no longer logging fsm_pages stuff when
the max_fsm_relations condition occurred.

   Ron

The patch now looks like:

% diff -u postgresql-8.0.1/src/backend/storage/freespace/freespace.c postgresql-patched/src/backend/storage/freespace/freespace.c
--- postgresql-8.0.1/src/backend/storage/freespace/freespace.c 2004-12-31 14:00:54.0 -0800
+++ postgresql-patched/src/backend/storage/freespace/freespace.c 2005-02-27 11:54:39.776546200 -0800
@@ -705,12 +705,25 @@
 	/* Convert stats to actual number of page slots needed */
 	needed = (sumRequests + numRels) * CHUNKPAGES;

-	ereport(elevel,
-			(errmsg("free space map: %d relations, %d pages stored; %.0f total pages needed",
+	ereport(INFO,
+			(errmsg("free space map: %d relations, %d pages stored; %.0f total pages used",
 					numRels, storedPages, needed),
-			 errdetail("Allocated FSM size: %d relations + %d pages = %.0f kB shared memory.",
+			 errdetail("FSM size: %d relations + %d pages = %.0f kB shared memory.",
 					   MaxFSMRelations, MaxFSMPages,
 					   (double) FreeSpaceShmemSize() / 1024.0)));
+
+	if (numRels == MaxFSMRelations)
+		ereport(LOG,
+				(errmsg("max_fsm_relations(%d) equals the number of relations checked",
+						MaxFSMRelations),
+				 errhint("You have >= %d relations. You should increase max_fsm_relations.",numRels)));
+	else
+	if (needed > MaxFSMPages)
+		ereport(LOG,
+				(errmsg("the number of page slots needed (%.0f) exceeds max_fsm_pages (%d)",
+						needed,MaxFSMPages),
+				 errhint("You may want to increase max_fsm_pages to a value over %.0f.",needed)));
+
 }
%

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if
       your joining column's datatypes do not match
[GENERAL] vacuum_cost_delay & VACUUM holding locks on GIST indexes
When you VACUUM a table with postgis indexes (perhaps GIST indexes in
general?), it seems a lock is held on the table.  Setting
vacuum_cost_delay seems to make vacuum hold this lock much longer.

Is this true?  If so, I assume that's not desirable behavior, right?
It makes autovacuum harder to use on tables that have these indexes.
Any clever workarounds?

   Ron

fli-lin1 /home/pg> while (1)
while? echo " explain analyze SELECT * from lines2 WHERE the_geom && setSRID('BOX3D(-84.31043 30.44341,-84.2954 30.45372)'::BOX3D, -1 );" | psql fli fli | grep runtime
while? sleep 5
while? end
 Total runtime: 23.355 ms
 Total runtime: 32.276 ms        [ "vacuum verbose" starts ]
 Total runtime: 36.080 ms
 Total runtime: 28.373 ms
 Total runtime: 114679.281 ms    [ bad but not horrible ]
 Total runtime: 30.823 ms
 [...]
 Total runtime: 22.867 ms        [ "set vacuum_cost_delay=20" ]
 Total runtime: 22.808 ms
 Total runtime: 23.288 ms        [ "vacuum verbose" again ]
 Total runtime: 23.366 ms
 [ dozens of lines ]
 Total runtime: 23.337 ms
 Total runtime: 764133.163 ms    [ YIPES ]
 Total runtime: 23.722 ms

fli=# select * from pg_locks;
 relation | database | transaction |  pid  |           mode           | granted
----------+----------+-------------+-------+--------------------------+---------
 36677268 |    17230 |             | 29039 | AccessShareLock          | t
 36677268 |    17230 |             | 29039 | ShareUpdateExclusiveLock | t
 36677268 |    17230 |             | 29039 | AccessExclusiveLock      | t
 33620188 |    17230 |             | 29039 | ShareUpdateExclusiveLock | t
          |          |       66414 | 30758 | ExclusiveLock            | t
 36677268 |    17230 |             | 30731 | AccessShareLock          | f
 33620188 |    17230 |             | 29039 | ShareUpdateExclusiveLock | t
 36677269 |    17230 |             | 29039 | ShareUpdateExclusiveLock | t
    16839 |    17230 |             | 30758 | AccessShareLock          | t
          |          |       66412 | 30731 | ExclusiveLock            | t
 33620188 |    17230 |             | 30731 | AccessShareLock          | t
          |          |       66372 | 29039 | ExclusiveLock            | t
(12 rows)

fli=#
fli=# select * from pg_class where oid=36677268;
     relname      | relnamespace | reltype | relowner | relam | relfilenode | reltablespace | relpages | reltuples  | reltoastrelid | reltoastidxid | relhasindex | relisshared | relkind | relnatts | relchecks | reltriggers | relukeys | relfkeys | relrefs | relhasoids | relhaspkey | relhasrules | relhassubclass | relacl
------------------+--------------+---------+----------+-------+-------------+---------------+----------+------------+---------------+---------------+-------------+-------------+---------+----------+-----------+-------------+----------+----------+---------+------------+------------+-------------+----------------+--------
 tmp_lines2__gist |      2758256 |       0 |      100 |   783 |    36677268 |             0 |    34623 | 6.1128e+06 |             0 |             0 | f           | f           | i       |        1 |         0 |           0 |        0 |        0 |       0 | f          | f          | f           | f              |
(1 row)

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster
Re: [GENERAL] vacuum_cost_delay & VACUUM holding locks on GIST
On Mon, 28 Feb 2005, Tom Lane wrote:
> You could shorten the intervals for which the lock is held by reducing
> vacuum_mem, but this might be counterproductive overall.

Does this work?  I just tried:

  setting vacuum_mem=1024
  setting vacuum_cost_delay=10

and ran a while loop that repeatedly executes a simple select statement
that usually takes 0.03 seconds; and it still gave me a single
extremely slow (7 minutes long instead of 25 milliseconds) query that
lasted until after the ...GIST... line was shown on vacuum verbose
output.

It feels like even with the minimal vacuum_mem it spent a very long
time (430 seconds) without releasing the lock on the GIST index for my
pretty large (relpages=94371 pages) table.  Or do I have something else
broken?

Shown below is the output of a while loop of a simple query using this
index, and the \d output for the table.

   Ron

==========================================================
== Loop of small select()s
== with vacuum_mem=1024 and vacuum_cost_delay=10
==========================================================

%while (1)
while? echo " explain analyze SELECT * from lines2 WHERE the_geom && setSRID('BOX3D(-84.31043 30.44341,-84.2954 30.45372)'::BOX3D, -1 );" | psql fli fli | grep runtime
while? sleep 5
while? end
 Total runtime: 24.375 ms       /* set vacuum_mem=1024 */
 Total runtime: 24.303 ms
 Total runtime: 25.370 ms       /* vacuum verbose */
 Total runtime: 27.332 ms
 Total runtime: 26.628 ms
 Total runtime: 26.001 ms
 [many more like this]
 Total runtime: 27.437 ms
 Total runtime: 24.679 ms
 Total runtime: 26.628 ms
 Total runtime: 431265.868 ms
 Total runtime: 24.419 ms       /* INFO: index "tmp_lines2__gist" ... */
 Total runtime: 24.375 ms
 Total runtime: 24.303 ms
 Total runtime: 24.294 ms
 Total runtime: 24.235 ms

==========================================================
== \d for the table.
==========================================================

fli=# \d lines2;
           Table "tmp.lines2"
  Column   |         Type         | Modifiers
-----------+----------------------+-----------
 tigerfile | integer              |
 tlid      | integer              |
 cfcc      | character varying(3) |
 name      | text                 |
 the_geom  | geometry             |
Indexes:
    "lines2__tlid" btree (tlid)
    "tmp_lines2__gist2" gist (the_geom)
Check constraints:
    "enforce_geotype_the_geom" CHECK (geometrytype(the_geom) = 'LINESTRING'::text OR the_geom IS NULL)
    "enforce_dims_the_geom" CHECK (ndims(the_geom) = 2)
    "enforce_srid_the_geom" CHECK (srid(the_geom) = -1)

fli=#

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if
       your joining column's datatypes do not match
Re: [GENERAL] [PATCHES] A way to let Vacuum warn if FSM settings are low. [final?]
Well, I was really hoping something would end up in the log file.

The situation is that our clients - sometimes not that computer savvy -
go perhaps a year without us being involved (unless the log monitoring
scripts show something abnormal, or the system breaks).  The primary
motivation for tweaking this file in the first place was so that the
log file would catch the situation where their database outgrows the
FSM settings before it causes a problem.

What about at least sending the output to a log file if VERBOSE or some
GUC variable is set?

   Ron

Bruce Momjian wrote:
> I have applied your patch with minor modifications.  Applied version
> attached.  I think the "pages" message:
>
>   INFO:  free space map: 44 relations, 28 pages stored; 704 total pages used
>   DETAIL:  FSM size: 1000 relations + 2 pages = 182 kB shared memory.
>
> should remain DEBUG2 for non-VERBOSE, and INFO for VERBOSE.  The
> information is pretty complex and probably of little interest to a
> typical vacuum user.  In fact, the new messages make the information
> even less important because problems are now flagged.  I adjusted your
> output levels for the new messages.
>
> I realize the "checkpoint warning" is a LOG message, but it has to be
> because there is no active session when a checkpoint is being run.  In
> the case of VACUUM, there is an active session so I think the messages
> should be sent to that session.  Sending them to the logs and not to
> the user seems unusual because they are the ones who asked for the
> VACUUM.  I realize they might not be able to change the server
> settings.
>
> These new messages:
>
>   NOTICE:  max_fsm_relations(1000) equals the number of relations checked
>   HINT:  You have >= 44 relations.  Consider increasing the configuration
>   parameter "max_fsm_relations".
>
>   NOTICE:  the number of page slots needed (704) exceeds max_fsm_pages (2)
>   HINT:  Consider increasing the configuration parameter "max_fsm_pages"
>   to a value over 704.
>   VACUUM
>
> should be NOTICE.  NOTICE is for unusual events that are not warnings,
> and that fits these cases.  If the administrator wants those in the
> logs, he can set log_min_messages to NOTICE.  I also adjusted your
> output strings to more closely match our checkpoint warning message.
>
> Another idea would be to send the output to both the client and the
> logs by default.
Re: [GENERAL] [PATCHES] A way to let Vacuum warn if FSM settings are low. [final?]
Bruce Momjian wrote:
> You didn't like server_min_messages = 'notice'?

I merely don't have a feeling for how much additional stuff that would put in the log files. If it's good practice for production systems to log NOTICEs, I'm happy with the change.

My reasoning for thinking the log file was more useful was that only an admin with access to the log files could really do anything about the message anyway. Also, since the message happening occasionally is probably OK, yet if it happens a lot it's more likely worth looking into, I think the historical record of when it happened is more interesting than the one-time occurrence that is all you'd see in the active session.

   Ron

PS: I'm fine either way; and perhaps it's a good idea for me to be logging NOTICEs anyway -- I just thought I'd explain my reasoning above. I'm sure you guys know a lot better than me what best practices would be.

---(end of broadcast)---
TIP 8: explain analyze is your friend
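Since the discussion above settled on NOTICE-level messages, here is a minimal sketch of how an administrator could get those FSM warnings into the server log as well; the settings shown are illustrative, not taken from the thread:

  -- In postgresql.conf (or per-session, as a superuser):
  --   log_min_messages = notice
  SET log_min_messages = 'notice';
  -- Any max_fsm_relations / max_fsm_pages NOTICEs raised here are now
  -- written to the server log in addition to the client session:
  VACUUM VERBOSE;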
[GENERAL] contrib module intagg crashing the backend
If one of the contrib modules (int_array_aggregate in contrib/intagg) is crashing my backend, where's a good place to report a bug or get hints where to look further?

It seems to work fine on small arrays, but crashes on large ones. The second query below would have put about 500 rows in the aggregate.

Thanks,
   Ron

% psql fli fli
Welcome to psql 8.0.1, the PostgreSQL interactive terminal.
[...]
fli=# select str_name,int_array_aggregate(tlid) from tlid_streetnames where str_name='civic center' and tigerfile=6001 group by str_name;
   str_name   |            int_array_aggregate
--------------+--------------------------------------------
 civic center | {125030363,125030026,125030039,125050707}
(1 row)

fli=# select str_name,int_array_aggregate(tlid) from tlid_streetnames where str_name='main' group by str_name;
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!>

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings
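For what it's worth, a self-contained script along these lines could help narrow down whether array size alone triggers the crash; the table here is hypothetical (it only assumes contrib/intagg is installed), and 500 matches the size of the failing aggregate above:

  -- Build one group whose aggregate collects ~500 integers.
  CREATE TABLE intagg_test (grp text, val int);
  INSERT INTO intagg_test
      SELECT 'main', n FROM generate_series(1, 500) AS g(n);
  -- If int_array_aggregate() itself is at fault, this should also crash:
  SELECT grp, int_array_aggregate(val) FROM intagg_test GROUP BY grp;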
Re: [GENERAL] Pgpool questions
Tatsuo Ishii wrote:
>> Is there any way to load balance selects to more than 2 servers?
>> e.g. 1 master with 2 slaves? Run multiple instances?
>
> It's on my TODO but will not be a trivial change.

Could I set up 2 pairs of pgpool-balanced servers, and use a third pgpool to load balance across the two pools? Or is that just silly?

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [GENERAL] Performance difference between ANY and IN, also array syntax
Tom Lane wrote:
> Bart Grantham <[EMAIL PROTECTED]> writes:
>> # SELECT * FROM connections WHERE connectee_node_id = ANY (ARRAY[28543,28542] );
>> [ performance sucks ]
>
> Yeah :-(. The = ANY (array) notation is new in the last release or so,
> and is completely without optimization of any kind. Turn it into IN if
> you can.

You could also use the module in contrib/intagg to treat your array as a table, so you can JOIN against it like this:

  select *
    from connections
    join int_array_enum('{28543,28542}'::int[]) as iae(x)
      on (connectee_node_id = x);

This works very well for me. I show a real-life example with a table about the same size as Bart's (about 200K rows) below.

It has the advantage that the planner can choose among all the different join types whenever it guesses one would be appropriate. I've never seen "IN" clauses use some of those tricks (though it might be possible that IN clauses can do that as well).

The planner seems to always assume that int_array_enum() will return a 1000-row table (I wonder if there should eventually be some way to tweak that), so when joining against very large tables it happily does index scans, and when joining against smaller tables it'll do merge joins or whatever.

===
== an example going against a larger table:
== the int_array_enum trick reasonably chooses nested loops and index scans
===

fli=# explain select * from lines where tlid = ANY ('{125060436,125060437}'::int[]);
                           QUERY PLAN
----------------------------------------------------------------
 Seq Scan on lines  (cost=0.00..26636.03 rows=254229 width=102)
   Filter: (tlid = ANY ('{125060436,125060437}'::integer[]))
(2 rows)

fli=# explain select * from lines join int_array_enum('{125060436,125060437}'::int[]) as iae(x) on (tlid=x);
                                      QUERY PLAN
--------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..4228.99 rows=1000 width=106)
   ->  Function Scan on int_array_enum iae  (cost=0.00..12.50 rows=1000 width=4)
   ->  Index Scan using rtgr_lines__tlid on lines  (cost=0.00..4.20 rows=1 width=102)
         Index Cond: (lines.tlid = "outer".x)
(4 rows)

===
== an example going against a smaller table:
== the int_array_enum trick reasonably chooses a merge join
===

fli=# explain analyze select * from rt6 join int_array_enum('{125060436,125060437}'::int[]) as iae(x) on (tlid=x);
                                      QUERY PLAN
--------------------------------------------------------------------------------------
 Merge Join  (cost=62.33..1226.41 rows=2848 width=82) (actual time=36.669..36.669 rows=0 loops=1)
   Merge Cond: ("outer".tlid = "inner".x)
   ->  Index Scan using rt6__tlid on rt6  (cost=0.00..1066.65 rows=21881 width=78) (actual time=0.020..21.982 rows=10544 loops=1)
   ->  Sort  (cost=62.33..64.83 rows=1000 width=4) (actual time=0.042..0.046 rows=2 loops=1)
         Sort Key: iae.x
         ->  Function Scan on int_array_enum iae  (cost=0.00..12.50 rows=1000 width=4) (actual time=0.020..0.023 rows=2 loops=1)
 Total runtime: 36.770 ms
(7 rows)

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
    (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
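For completeness, Tom's "turn it into IN" suggestion applied to Bart's original query would look something like this (same table and IDs as the example above):

  -- An explicit IN list, which the planner can turn into an index scan,
  -- unlike the unoptimized = ANY (array) form:
  SELECT * FROM connections WHERE connectee_node_id IN (28543, 28542);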
Re: [GENERAL] PRIMARY KEY on a *group* of columns imply that each column is
Marco Colombo wrote:
> That is, given a variable A, you can always write a boolean expression
> that evaluates True or False to test if A is _equal_ to
> None/undef/NULL (in C):

And of course you can do this in SQL as well.

> ('a', NULL) is neither the same _nor different_ from ('a', 2).

Uh, I'm not sure what you mean by "the same"; but certainly ('a',NULL) is distinct from ('a',2).

> The result of comparison is NULL, no matter how you're testing it:

Unless you're testing with the IS DISTINCT FROM operator.

fli=# select ('a',NULL) IS DISTINCT FROM ('a',2);
 ?column?
----------
 t
(1 row)

> marco=# select ('a', 2) = ('a', NULL);
>  ?column?
> ----------
>
> (1 row)
>
> marco=# select ('a', 2) <> ('a', NULL);
>  ?column?
> ----------
>
> (1 row)

fli=# select ('a',NULL) IS DISTINCT FROM ('a',NULL);
 ?column?
----------
 f
(1 row)

> That's why NULLs are not allowed in primary keys...

But in general I agree with most of the rest of what you said.

---(end of broadcast)---
TIP 8: explain analyze is your friend
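On the primary key remark, a quick sketch of why NULLs and uniqueness don't mix (the table name is made up):

  -- Two rows with NULL in a uniquely-constrained column can coexist,
  -- because NULL = NULL is unknown rather than true:
  CREATE TABLE pk_demo (a text, b int, UNIQUE (a, b));
  INSERT INTO pk_demo VALUES ('a', NULL);
  INSERT INTO pk_demo VALUES ('a', NULL);  -- also succeeds
  -- which is why a primary key requires NOT NULL on every key column.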
Re: [GENERAL] Array manipulation/syntax question
Bart Grantham wrote:
> Hello again. I had a problem a few weeks ago with using IN ( some_array )
> having really rough performance. Special thanks to Ron Mayer for the
> suggestion of using int_array_enum(some_array) to join against. I had
> to upgrade to PG8 but that technique works really well. Now I have a
> question about array manipulation. Specifically, how can I build an
> array from the result of a query without resorting to array_append()?

Well, the same contrib module that has int_array_enum() also has an aggregate called int_array_aggregate() that turns a column into an array.

> -- my_array is an INT[]
> _my_array := select some_column from some_table;

So this:

  SELECT int_array_aggregate(some_column) FROM some_table;

might be what you want.

> ... in plpgsql?

Well... the function from the contrib module is written in C, so I guess it doesn't strictly meet what you asked, but it works very well for me. ;-)

   Ron Mayer

PS: the stuff in contrib/intarray may be useful too, for indexing, merging, and many other tricks on integer arrays.

---(end of broadcast)---
TIP 8: explain analyze is your friend
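To tie it back to the plpgsql part of the question, a minimal sketch of capturing the aggregate into a variable (every name here is hypothetical):

  CREATE OR REPLACE FUNCTION collect_ids() RETURNS int[] AS '
  DECLARE
      _my_array int[];
  BEGIN
      -- int_array_aggregate() collapses the column into one array value,
      -- which SELECT ... INTO can store in a plpgsql variable.
      SELECT int_array_aggregate(some_column) INTO _my_array
        FROM some_table;
      RETURN _my_array;
  END;
  ' LANGUAGE plpgsql;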
Re: [GENERAL] Postgres in government
Mark Steckel wrote:
> I have also scrounged the Internet looking for examples of Postgres
> being used in government, preferably in 24x7 capacities.

I find that googling for vendors known to use PostgreSQL, and searching people's resumes, is a good way to find descriptions of government projects along with contacts for references. Here are a couple that look interesting.

US Navy, Northrop Grumman: PostgreSQL is used in the Navy Enterprise Portal and the Fleet Numerical Meteorology and Oceanography Center (FNMOC) Portal:
http://kennethbowen.com/kbresume.html
  "Develop J2EE application to store user profiles for the Navy
  Enterprise Portal and the Fleet Numerical Meteorology and Oceanography
  Center(FNMOC) Portal using JBoss application server and PostgreSQL
  database."

DOD, USGS, USDA, Army Corp of Engineers, Navy, through a company called Sanz:
http://postgis.refractions.net/pipermail/postgis-users/2005-March/007399.html
  Sanz manages "tens of terrabyte datasets of raster and vector data for
  the DOD, USGS, USDA, Army Corp of Engineers, Navy, etc." using
  postgresql and postgis.

---(end of broadcast)---
TIP 6: Have you searched our list archives?
    http://archives.postgresql.org
Re: [GENERAL] Converting MySQL tinyint to PostgreSQL
Martijn van Oosterhout wrote:
> Well, you get another issue, alignment. If you squeeze your string
> down, the next field, if it is an int or string, will get padded to a
> multiple of 4, negating most of the gains. Like in C structures, there
> is padding to optimise access.

Anecdotally, I hear at least as many people say that their database is I/O bound as CPU bound; and adding padding bytes for alignment trades extra disk I/O for less CPU. Is unaligned access really so expensive that it makes up for the extra I/O?

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?
    http://www.postgresql.org/docs/faq
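One visible consequence of those alignment rules can be measured from SQL; this is a sketch with invented table names, and pg_relation_size() assumes version 8.1 or later:

  -- Same data, two column orders. int4 wants 4-byte alignment, so
  -- interleaving 1-byte "char" columns with int4 columns adds padding.
  CREATE TABLE padded (a "char", b int4, c "char", d int4, e "char", f int4);
  CREATE TABLE packed (b int4, d int4, f int4, a "char", c "char", e "char");
  INSERT INTO padded SELECT 'x', n, 'y', n, 'z', n
      FROM generate_series(1, 100000) AS g(n);
  INSERT INTO packed SELECT n, n, n, 'x', 'y', 'z'
      FROM generate_series(1, 100000) AS g(n);
  -- The interleaved layout should occupy measurably more disk:
  SELECT pg_relation_size('padded') AS padded_bytes,
         pg_relation_size('packed') AS packed_bytes;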
Re: [GENERAL] Converting MySQL tinyint to PostgreSQL
Jim C. Nasby wrote:
> On Thu, Jul 14, 2005 at 11:29:23PM +0200, Martijn van Oosterhout wrote:
>> On Thu, Jul 14, 2005 at 11:30:36AM -0500, Jim C. Nasby wrote:
>>> On Wed, Jul 13, 2005 at 07:52:04PM -0400, Bruce Momjian wrote:
>>>> This is a good point. We have always stored data on disk that
>>>> exactly matches its layout in memory. We could change that, but no
>>>> one has shown it would be a win. [...]
>
> My thought was to convert as pages were read and written. That should
> minimize the code impact.

If that were practical, even more radical I/O-saving tricks might be possible beyond removing alignment bytes, such as some compression algorithm.

Jim C. Nasby wrote:
> Or maybe as an alternative, would it be possible to determine how much
> space in a given relation was being wasted due to padding? That could
> be used to figure out how much IO could be saved on different
> transactions.

Well, I do notice that if I gzip my larger tables' data files they tend to compress by 80-95%, so it seems there's a fair amount of redundancy in at least some tables.

Has anyone tried running PostgreSQL on a filesystem that does compression? That seems like an easy way to estimate the ultimate potential I/O savings of separating the in-memory layout from the on-disk layout.

   Ron

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster
Re: [GENERAL] How to create unique constraint on NULL columns
Andrus wrote:
>> Then redesign this as a many to many relation. ...
>
> This means adding a separate row for each department into the
> permission table.

Not really. You could store an array of department IDs in that same table. That would probably also cause the minimum impact on your queries.

---(end of broadcast)---
TIP 4: Have you searched our list archives?
    http://archives.postgresql.org
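A minimal sketch of that array-column approach (all names invented for illustration):

  -- One permission row per user, with granted departments as an array.
  CREATE TABLE permission (
      user_id        int PRIMARY KEY,
      department_ids int[] NOT NULL
  );
  INSERT INTO permission VALUES (1, '{10,20,30}');
  -- Is user 1 permitted for department 20?
  SELECT * FROM permission
   WHERE user_id = 1 AND 20 = ANY (department_ids);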
Re: [GENERAL] GUID for postgreSQL
Dann Corbit wrote:
> There is a "privacy hole" from using the MAC address. (Read it in the
> WIKI article someone else posted). Probably, it would be better to use
> a one way hash of the MAC address.

The chance of MAC addresses colliding (through some low-end network card vendor's sloppy manufacturing process, or through the not uncommon ability to edit your MAC address) is surely much higher than the chance of big random numbers colliding.

Sure, you won't hurt anything if you add the MAC address to your entropy sources when generating a GUID; but thinking a MAC address is more unique than the bits you'd get from any less human-error-prone source is almost certainly wrong.

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?
    http://www.postgresql.org/docs/faq
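As a sketch of the one-way-hash idea quoted above, the built-in md5() function would do; the MAC address shown is made up:

  -- Fold the MAC through a hash before mixing it into GUID material,
  -- closing the privacy hole while keeping whatever entropy it adds:
  SELECT md5('00:0c:29:3e:5f:10' || now()::text);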
Re: [GENERAL] table clustering brings joy
Greg Stark wrote:
> clustering... That will only help if you're often retrieving moderately
> large result sets by one particular index. If you normally only
> retrieve one record at a time, or from lots of different indexes, then
> it probably won't really make much difference.

It'll also help for columns whose values are correlated in some way. For example, clustering a table of addresses on "zip code" will help lookups by city or county or state, presumably because all the disk pages for a given city end up grouped together within the disk pages for that city's zip codes.

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings
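For reference, a sketch of that setup using the 8.0-era CLUSTER syntax (the table and index names are invented):

  -- Physically reorder an address table along its zip code index:
  CREATE INDEX addresses_zip_idx ON addresses (zip);
  CLUSTER addresses_zip_idx ON addresses;
  -- CLUSTER is a one-time reordering; rows added later are not kept in
  -- order, so it needs re-running periodically.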
Re: [GENERAL] Finding nearest numeric value
Richard Huxton wrote:
> While the other answers all do their job, and in one go too, I'd be
> surprised if you found anything faster than:
>
>   SELECT myval FROM mytable WHERE myval > 1234 ORDER BY myval LIMIT 1

Really? Aren't most things with ORDER BY O(n*log(n))? Or is the optimizer smart enough to find an index on myval and stop after the first one (assuming the index returns things sequentially)?

If not, it seems this could do things in O(n) time:

  select min(abs(value - CONSTANT)) from tablename;

followed by

  select * from tablename where abs(value - CONSTANT) = [result];

though I'm sure someone could roll that up into a single statement.

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?
    http://www.postgresql.org/docs/faq
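One way to roll it into a single statement, while still letting each half use an index on the value column (a sketch reusing Richard's mytable/myval names and the constant 1234):

  -- Take the nearest value from above and from below the target,
  -- then keep whichever of the two candidates is closer:
  SELECT myval FROM (
      (SELECT myval FROM mytable WHERE myval >= 1234
        ORDER BY myval ASC LIMIT 1)
      UNION ALL
      (SELECT myval FROM mytable WHERE myval < 1234
        ORDER BY myval DESC LIMIT 1)
  ) AS candidates
  ORDER BY abs(myval - 1234)
  LIMIT 1;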
Re: [GENERAL] ctid access is slow
Tom Lane wrote:
> It wouldn't be hard that I can see (just build hash and btree opclasses
> for tid), but I'm pretty unclear on why bother. There's no use-case for
> cross-table joins involving ctid, since you couldn't usefully store a
> ctid referencing another table. The example Ilja showed was quite
> artificial and should not convince anyone to expend effort on this.
> Perhaps there are more convincing examples, but let's see one.

Would it be useful for extremely static (read-only) data? The largest tables in my database are read-only for many months at a time (geospatial data which the vendor updates annually). I've occasionally wondered if storing ctids in the tables that link to these tables, rather than the traditional id column, would help.

(I never really bothered, though, since normal index scans were fast enough; and any future performance optimization will probably cache this data in memcached instead.)

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings