[GENERAL] very large tables

2009-05-26 Thread Ramiro Diaz Trepat
Thank you all very much for your thorough replies. To be honest, many of the
things you mention I had not heard of before.
It seems that clustering by index must be the way to go.  I now have to
check whether most queries will benefit from it.
I do not directly control the installation parameters of the database; I do
not have root access to that server, but I may be able to find another one
where I can control these parameters.
The configuration parameters I should look at increasing are the ones related
to shared memory, then?
I know it is something of a worst-case combination, such a large database
with an inexperienced DBA as myself :)



r.





On Mon, May 25, 2009 at 12:17 PM, Alban Hertroys <
dal...@solfertje.student.utwente.nl> wrote:

> On May 25, 2009, at 10:58 AM, Ramiro Diaz Trepat wrote:
>
>> The table with the atmosphere pixels currently has about 140MM records,
>> and the one with the values about 1000MM records.  They should both grow to
>> about twice this size.
>>
>
> Did you tune postgres to use the available resources? By default it comes
> with rather modest settings so it will run on low-spec hardware without
> trouble. For a dataset this size modest hardware is obviously out of the
> question, and so the default config will likely not be suitable.
>
>
>> A simple query with no joins like this:
>>
>> select trajectory from atmospherepoint where moment='1979-05-02 11:45:00'
>> and height >= 12000 and height <= 14000 and longitude >= 180 and longitude
>> <= 190 and latitude >= 0 and latitude <= 10;
>>
>
> I think the BETWEEN operator would make that query a bit easier on the
> eyes, like so:
>
> select trajectory from atmospherepoint where moment='1979-05-02 11:45:00'
> and height between 12000 and 14000 and longitude between 180 and 190 and
> latitude between 0 and 10;
>
>> is taking almost 4 minutes on a decent multi-core server.  Also, the
>> moment equality test should yield no more than 2MM rows, so I thought that
>> should make things a lot faster.
>>
>> The explain returns the suspicious "Seq Scan" up front:
>>
>> Seq Scan on atmospherepoint  (cost=0.00..5461788.08 rows=134 width=8)
>>   Filter: ((moment = '1979-05-02 11:45:00'::timestamp without time zone)
>> AND (height >= 12000::double precision) AND (height <= 14000::double
>> precision) AND (longitude >= 180::double precision) AND (longitude <=
>> 190::double precision) AND (latitude >= 0::double precision) AND (latitude
>> <= 10::double precision))
>>
>
> That query is expecting 134 rows; how accurate is that? An explain analyse
> will show you that (and will take 4 minutes in this case).
>
> I think you'll need lots of memory if you want to keep those indexes in
> memory. If I do a rough byte-count of the datatypes involved times the
> number of records I think just the index data (w/o metadata) takes up 40B *
> 140M rows = 5.6GB. Scanning an index on disk is likely to be slow, which may
> well be one of the reasons PG decides on a sequential scan.
>
> What is the distribution of the data you're querying for? If it's all over
> the place then the planner made a pretty good guess a sequential scan is
> most optimal; random disk I/O isn't very optimal, sequential I/O is usually
> much faster.
>
> Before solving that you'll want to figure out whether this is a typical
> query, or that distributing the data differently will hurt other queries. If
> it's a typical query, then clustering your data on an index and/or
> partitioning will help. If it's not, it still will help this type of query
> but it may hurt other types of queries. You'll want to optimise in the
> right direction.
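>
> For the clustering approach, a minimal sketch (the index name and the
> choice of moment as the clustering column are hypothetical) would be:
>
> CREATE INDEX atm_moment_idx ON atmospherepoint (moment);
> CLUSTER atmospherepoint USING atm_moment_idx;
> -- on pre-8.3 versions the syntax is: CLUSTER atm_moment_idx ON atmospherepoint;
>
> Keep in mind that CLUSTER rewrites the whole table and takes an exclusive
> lock, so on a table this size it will take a while.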
>
> As other people said already, partitioning is likely to help here. Your
> data seems very suitable for partitioning too. I think I'd divide the
> coordinates in a number of latitude/longitude slices (a grid seems suitable
> to me, but I don't deal with atmospheric data). With 32 slices you'll have
> <500k records per slice on average, whether that's sufficiently small you'll
> have to test.
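>
> As a rough sketch (table and column names taken from your query; the
> boundaries are just an example), one such slice could look like:
>
> CREATE TABLE atmospherepoint_s0 (
>     CHECK (longitude >= 180 AND longitude < 190
>        AND latitude >= 0 AND latitude < 10)
> ) INHERITS (atmospherepoint);
>
> With constraint_exclusion enabled the planner can then skip slices whose
> CHECK constraint rules them out for a given query.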
>
> Alternatively, if your data is clustered (on disk) by say
> longitude/latitude it may already help to define partial indexes on
> longitude/latitude ranges, for example:
>
> CREATE INDEX atm_section1_idx ON atmospherepoint (longitude, latitude, height)
> WHERE longitude BETWEEN 180 AND 190 AND latitude BETWEEN 0 AND 10;
>
> This is very similar to partitioning your data, but it doesn't require
> splitting the data up across multiple partitions (tables). This could be
> faster if you would have queries spanning multiple table-partitions, but if
> your data isn't clustered appropriately finding it on disk will require more
> random I/O than a partitioned layout.
> Choices choices... The only way to find out what works best is to test, I'm
> afraid.
>
>> but I have created indices for every column in the table and I have also
>> run ANALYZE and VACUUM:
>>
>>   Table "public.atmospherepoint2"
>>   Column   |  Type  | Modifiers
>> -----------+--------+-----------
>> [remainder of the table definition truncated]

[GENERAL] Excel and postgresql functions

2009-05-26 Thread Ivan Sergio Borgonovo
I've built some functions, mostly for encapsulation and to avoid
passing the same parameters over and over.

I'd like to access the results from Excel, but it seems (and I'm not
sure this is the definitive answer) that Excel can only pull in data
coming from views and tables.

- Can someone confirm that Excel/Access 2003 can't return result sets
  coming from functions?
- If Excel/Access can't return results from a function, is there a way
  to masquerade the function (accepting parameters) behind a
  table/view?

-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Excel and postgresql functions

2009-05-26 Thread Ivan Sergio Borgonovo
On Tue, 26 May 2009 11:41:50 +0200
Ivan Sergio Borgonovo  wrote:

> I've built some functions, mostly for encapsulation and to avoid
> passing the same parameters over and over.
> 
> I'd like to access the results from Excel, but it seems (and I'm not
> sure this is the definitive answer) that Excel can only pull in data
> coming from views and tables.
> 
> - Can someone confirm that Excel/Access 2003 can't return result sets
>   coming from functions?
> - If Excel/Access can't return results from a function, is there a way
>   to masquerade the function (accepting parameters) behind a
>   table/view?

For all the people forced to fight with software that has to be
tricked rather than instructed...

The fault is Excel's.

- Data -> Import External Data -> New Database Query
- Choose the ODBC connection you created for PostgreSQL
- Don't add any query.
- Edit the "SQL" directly.
- Save an easily "greppable" query (e.g. select 'ermenegildo';)
- Ignore the complaint.
- Close Microsoft Query.
- Say OK to "Import data" (just select a good "top left corner" for
  your table)

- Tools -> Macro -> Visual Basic Script Editor
- Search for your query and replace it with any valid SQL; add ? if you
  need parameters taken from cells
- Save
- You'll be asked which cell contains the parameter (I didn't try to
  use more than one parameter)

Now you can have an arbitrary query returned in an Excel sheet.
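
For instance, a parameterised call to a (hypothetical) set-returning
function would be entered in the SQL box as:

  SELECT * FROM my_report(?);

with the ? bound to a cell as described above.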

So yes... PostgreSQL can happily work with Excel.

-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] How should I deal with disconnects during insert?

2009-05-26 Thread Dimitri Fontaine
Hi,

Sergey Samokhin  writes:
> The problem I'm talking about in this letter relates to how we deal
> with extremely large amounts of data being inserted into the DB. Recently I
> switched to using buffers.

You might appreciate this blog entry:
  
http://www.depesz.com/index.php/2007/07/05/how-to-insert-data-to-database-as-fast-as-possible/

Regards,
-- 
dim

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] question on serial key

2009-05-26 Thread Jasen Betts
On 2009-05-22, Brandon Metcalf  wrote:
> g == gryz...@gmail.com writes:
>
>  g> you should use it whenever you need the db to keep its own key internally.
>  g> An advantage of a sequence is also the fact that you can have the sequence
>  g> value used on different columns/tables.
>
>  g> My rule of thumb in that case is: as long as it is a short type (not
>  g> toastable and/or variable length), and as long as it won't change,
>  g> and is unique - I can use it. Otherwise, I use a sequence to connect
>  g> rows internally for the database.
>  g> The first rule is because of index access and the way btree works.
>  g> The second is because an update of the value will update other rows too -
>  g> and HOT won't help you here, so that's not efficient. And also, forcing
>  g> it to be unique is harder.
>
>  g> Hth.
>
> That does help.  So, in my example of a table consisting of rows for
> each periodic table element, the atomic number would suffice as a
> unique key since, well, it's unique and not going to change.  Right?

Well, until some fool* wants to insert a row for deuterium into your
table, and finds that spot taken by hydrogen. 

If you can guarantee that your chosen natural key is sufficient and is
not going to give trouble in the case of marriages, isomerism, twin
births, isotopes, centenarians, or some other condition that makes a
mockery of your chosen key, then go for it.

*assuming you want it to be fool-proof.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] quoting values magic

2009-05-26 Thread Jasen Betts
On 2009-05-22, Brandon Metcalf  wrote:
> Assume I have an UPDATE statement that looks like
>
>   UPDATE foo
> SET
>   pattern = '$pattern',
>   shape   = '$shape',
>   length  = $length,
>   comment = '$comment'
> WHERE foo_id = $foo_id
>
> and length is defined as NUMERIC.  Is there any kind of magic that
> would allow me to use the SQL above as is even if $length is not
> defined?

no, but you can get the same effect in a different way.

> In other words, I'd like to avoid having to modify the SQL
> to include or not include "length = $length" based on whether or not
> $length is defined as it's acceptable for it to be NULL in foo.

> I can't say "length = '$length'" as '' is not valid input for NUMERIC.
>
> Hope that makes sense?

option 1: $length = "length"

If your language of choice (it appears to be similar to shell, PHP, or Perl)
allows you to store the string value "length" in your $length variable
then the existing values of length will be retained in the update.

option 2: case when '$length' = '' ...

you can use case like this:

   UPDATE foo
 SET
   pattern = '$pattern',
   shape   = '$shape',
   length  = case when '$length'='' then length else '$length' end,
   comment = '$comment'
 WHERE foo_id = $foo_id

here you can substitute any value you choose for the empty string,
0 or NULL may (or may not) be more apropriate.

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Re: Re: Re: Can not decompress a compressed string under plpy!

2009-05-26 Thread Jasen Betts
On 2009-05-21, Timmy  wrote:
> Yes, most of your guesses are correct.
> I'm using PostgreSQL 8.3.x and MS Windows 2000.
> The compressed string is saved to the table in binary
> format using psycopg. I had set the table field to the bytea data type.
> I want to use plpythonu to decompress the stored string.
> Although I imported the zlib module, it still failed.
> I can successfully get back the decompressed string if
> I use psycopg either inside or outside of plpythonu.
> But without psycopg, plpythonu itself can not decompress the string.
> I can use psycopg inside plpythonu, but then I need to write security
> information (including user name, password, server IP, port number, ...)
> in the connection string inside plpythonu too.
> This is a problem.

try running stripslashes on the returned value.
bytea values are returned double-slashed.

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] quoting values magic

2009-05-26 Thread Brandon Metcalf
j == ja...@xnet.co.nz writes:

 j> On 2009-05-22, Brandon Metcalf  wrote:
 j> > Assume I have an UPDATE statement that looks like
 j> >
 j> >   UPDATE foo
 j> > SET
 j> >   pattern = '$pattern',
 j> >   shape   = '$shape',
 j> >   length  = $length,
 j> >   comment = '$comment'
 j> > WHERE foo_id = $foo_id
 j> >
 j> > and length is defined as NUMERIC.  Is there any kind of magic that
 j> > would allow me to use the SQL above as is even if $length is not
 j> > defined?

 j> no, but you can get the same effect in a different way.

 j> > In other words, I'd like to avoid having to modify the SQL
 j> > to include or not include "length = $length" based on whether or not
 j> > $length is defined as it's acceptable for it to be NULL in foo.

 j> > I can't say "length = '$length'" as '' is not valid input for NUMERIC.
 j> >
 j> > Hope that makes sense?

 j> option 1: $length = "length"

 j> If your language of choice (it appears to be similar to shell, PHP, or Perl)
 j> allows you to store the string value "length" in your $length variable
 j> then the existing values of length will be retained in the update.

 j> option 2: case when '$length' = '' ...

 j> you can use case like this:

 j>UPDATE foo
 j>  SET
 j>pattern = '$pattern',
 j>shape   = '$shape',
 j>length  = case when '$length'='' then length else '$length' end,
 j>comment = '$comment'
 j>  WHERE foo_id = $foo_id

 j> here you can substitute any value you choose for the empty string,
 j> 0 or NULL may (or may not) be more apropriate.


The issue here is that these reduce back to my original problem.  For
example, if I use a CASE statement and I fall through to the ELSE,
then the SQL is attempting to insert a "''" in a NUMERIC field which
is not valid.  That is, it's trying to do

  UPDATE foo SET length='' WHERE foo_id=$foo_id

This fails.  It's exactly this problem I'm trying to avoid by some
slick use of SQL.  I can always generate the SQL on the fly based on
the value of $length, but I'd like to avoid this if possible.

Maybe I'm misunderstanding your suggestion.


-- 
Brandon

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Need beginning and ending date value for a particular week in the year

2009-05-26 Thread Keaton Adams
PG 8.1.17

For a given week number (2009w22) I need to calculate the beginning and ending 
date that makes up that particular week in the year.  I want to use the 
beginning/ending date as part of a CHECK constraint on an inherited table, with 
each child table based on a week of the year.

This is a function in the system someone wrote to get the current week of the 
year, with our week starting on a Monday, not Sunday:

CREATE OR REPLACE FUNCTION get_weeknum (TIMESTAMP WITH TIME ZONE)
RETURNS VARCHAR AS $$
DECLARE
    tstamp ALIAS FOR $1;
    dow INTEGER;
BEGIN
    -- to_char('2005-01-01', 'YYYY"w"IW'), for example, returns 2005w53 but we
    -- need 2004w53, so we return 'YYYY"w"IW' for the Thursday of the week of
    -- the specified date
    SELECT INTO dow to_char(tstamp, 'D');
    -- "D" returns sun=1 - sat=7 but we need dow to be mon=1 - sun=7
    IF dow = 1 THEN
        dow = 7;
    ELSE
        dow = dow - 1;
    END IF;
    RETURN to_char(tstamp - interval '1 Day' * (dow - 4), 'YYYY"w"IW');
END;
$$ LANGUAGE 'plpgsql';
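
For example, with the Monday-based weeks above:

SELECT get_weeknum('2009-05-26'::timestamptz);  -- should return '2009w22'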

So for the calculated week value (i.e. 2009w22) I need to be able to calculate 
the first and last day of the week (05/25/2009 and 05/31/2009).  Is there a 
clean / fairly easy way to do this?  I can think of doing some string 
comparisons and walking through date values to figure it out, but was wondering
if there is a simpler way to do this that I am overlooking.



Re: [GENERAL] quoting values magic

2009-05-26 Thread Alban Hertroys

On May 26, 2009, at 6:37 PM, Brandon Metcalf wrote:

j> option 2: case when '$length' = '' ...

j> you can use case like this:

j>UPDATE foo
j>  SET
j>pattern = '$pattern',
j>shape   = '$shape',
j>length  = case when '$length'='' then length else  
'$length' end,

j>comment = '$comment'
j>  WHERE foo_id = $foo_id

j> here you can substitute any value you choose for the empty string,
j> 0 or NULL may (or may not) be more apropriate.


The issue here is that these reduce back to my original problem.  For
example, if I use a CASE statement and I fall through to the ELSE,
then the SQL is attempting to insert a "''" in a NUMERIC field which
is not valid.  That is, it's trying to do


No it doesn't, read that statement again ;)


If $length = 'foo' it reads (leaving out the extra fields):

UPDATE foo
   SET length = CASE WHEN 'foo'='' THEN length ELSE 'foo' END
 WHERE foo_id = $foo_id;

Which evaluates to:

UPDATE foo SET length = 'foo' WHERE foo_id = $foo_id;



Whereas if $length = '' it reads:

UPDATE foo
   SET length = CASE WHEN ''='' THEN length ELSE '' END
 WHERE foo_id = $foo_id

Which evaluates to:

UPDATE foo SET length = length WHERE foo_id = $foo_id


Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.





--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] quoting values magic

2009-05-26 Thread Brandon Metcalf
d == dal...@solfertje.student.utwente.nl writes:

 d> On May 26, 2009, at 6:37 PM, Brandon Metcalf wrote:
 d> > j> option 2: case when '$length' = '' ...
 d> >
 d> > j> you can use case like this:
 d> >
 d> > j>UPDATE foo
 d> > j>  SET
 d> > j>pattern = '$pattern',
 d> > j>shape   = '$shape',
 d> > j>length  = case when '$length'='' then length else
 d> > '$length' end,
 d> > j>comment = '$comment'
 d> > j>  WHERE foo_id = $foo_id
 d> >
 d> > j> here you can substitute any value you choose for the empty string,
 d> > j> 0 or NULL may (or may not) be more apropriate.
 d> >
 d> >
 d> > The issue here is that these reduce back to my original problem.  For
 d> > example, if I use a CASE statement and I fall through to the ELSE,
 d> > then the SQL is attempting to insert a "''" in a NUMERIC field which
 d> > is not valid.  That is, it's trying to do

 d> No it doesn't, read that statement again ;)


Oops.  Indeed, you are correct.

-- 
Brandon

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Need beginning and ending date value for a particular week in the year

2009-05-26 Thread John R Pierce


> So for the calculated week value (i.e. 2009w22) I need to be able to
> calculate the first and last day of the week (05/25/2009 and
> 05/31/2009).  Is there a clean / fairly easy way to do this?  I can
> think of doing some string comparisons and walking through date values
> to figure it out, but was wondering if there is a simpler way to do
> this that I am overlooking.



well, a few things come to mind.    Figure out how many days
are in the first partial week prior to week 1 of the year; that would
be something like 7 - day_of_week('yyyy-01-01').  Then take '1
day'::interval * (week-1)*7 + that offset, and add that to yyyy-01-01
... and add '6 days'::interval to get the last day of the week...




--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Need beginning and ending date value for a particular week in the year

2009-05-26 Thread Alban Hertroys

On May 26, 2009, at 8:03 PM, Keaton Adams wrote:


PG 8.1.17

For a given week number (2009w22) I need to calculate the beginning  
and ending date that makes up that particular week in the year.  I  
want to use the beginning/ending date as part of a CHECK constraint  
on an inherited table, with each child table based on a week of the  
year.



What I ended up doing was generating a table with:
week    int
year    int
start   date
end     date

The beauty is that you can constrain your data with foreign keys to  
that table. I'm not sure that would work for partitioning though (I  
think that's where you're heading?), now that I think of it.  
Nevertheless, you could copy that data into your table and put a check  
constraint on that instead.


I recall using generate_series() and EXTRACT(week FROM ...) to  
populate the table in one pass for several years, but I don't have the  
exact incantation at hand now. I'd have to experiment a bit to get  
that back again, I don't have access to it anymore.
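
Something along these lines should get you started (an untested sketch;
EXTRACT(isoyear ...) requires 8.3, on older versions to_char(d, 'IYYY')
can stand in for the year):

SELECT EXTRACT(isoyear FROM d)::int AS year,
       EXTRACT(week FROM d)::int    AS week,
       MIN(d)::date                 AS start,
       MAX(d)::date                 AS "end"
  FROM generate_series('2009-01-01'::date, '2010-12-31'::date,
                       '1 day'::interval) AS s(d)
 GROUP BY 1, 2
 ORDER BY 1, 2;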


You may like to know that Postgres 8.3 has native support for ISO8601  
week calculations (http://www.postgresql.org/docs/current/static/functions-datetime.html#FUNCTIONS-DATETIME-EXTRACT 
). Maybe you should upgrade.


Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.





--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] quoting values magic

2009-05-26 Thread Tom Lane
Brandon Metcalf  writes:
> d == dal...@solfertje.student.utwente.nl writes:
>  d> On May 26, 2009, at 6:37 PM, Brandon Metcalf wrote:
>  d> > The issue here is that these reduce back to my original problem.  For
>  d> > example, if I use a CASE statement and I fall through to the ELSE,
>  d> > then the SQL is attempting to insert a "''" in a NUMERIC field which
>  d> > is not valid.  That is, it's trying to do

>  d> No it doesn't, read that statement again ;)

> Oops.  Indeed, you are correct.

I think there is a problem though.  If you have

case when '$length'='' then length else '$length' end

then what the parser is going to see is a CASE expression with a
variable (known to be NUMERIC) in one arm and an unknown-type literal
constant in the other arm.  So it's going to decide that the literal
must be NUMERIC too, and that type coercion will fail if the literal
is really just ''.

Some experimentation suggests that you might get away with

case when '$length'='' then length else '$length'::text::numeric end

so that the text-to-numeric conversion is delayed to runtime.  However
this is a bit fragile (it's dependent on some undocumented details of
the constant-expression-folding behavior) and it also requires
hardwiring knowledge that length is indeed numeric into your SQL
command.

On the whole I'd suggest going with NULL, not empty string, as your
representation of a missing update value if at all possible.  Then
the previously-suggested COALESCE solution will work, and you aren't
relying on any shaky assumptions about when and how the parser will
try to enforce validity of the datatype value.
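
That would look something like this (assuming the application substitutes
a literal, unquoted NULL for $length when it is undefined):

UPDATE foo
   SET length = COALESCE($length, length)
 WHERE foo_id = $foo_id;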

regards, tom lane

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Code tables, conditional foreign keys?

2009-05-26 Thread Benjamin Smith
"A deep unwavering belief is a sure sign that you're missing something."
-- Unknown

I had no intention of sparking an ideological discussion. 

I read Joe's article referenced previously - a simple case for using a
normalized database. I would pretty much agree with his assessment from
beginning to end. However, he really doesn't address my scenario at all.
I'm not trying to mash my database together into a single table; I'm
trying to deal with the fact that we have hundreds of tables with nearly
identical structure, and further, deal with the concept of "code tables".

See, our product has to work in many venues, and each venue has its own
set of code-table data that they'd like to support. Worse, they often use
similar values for different things, so the "natural key" is just not
natural. Sometimes I've seen venues "re-using" old code table values
from previous years to mean new things in current/future years. Yes, this
is a bad, bad, bad idea, but it was still there and it's still my job to
deal with it.

Surrogate keys are used to make sure that a 15 meaning "BS College Degree"
in venue A isn't confused with a 15 meaning "No High School Education" in
another venue. They cover a similar value, e.g. the applicant's educational
level. Some values don't translate at all (e.g. differing representations
of vocational arts), so using our own code table set and then translating
doesn't work consistently, either.

So we have multiple, distinct sets of data to be used within a single
field. Either that, or we create massive data tables with every possible
different set of otherwise similar data, each of which has a foreign key
to a table with a slightly different name, which is, far and away, even
uglier. (E.g.: applicants.ca_edlevel, applicants.or_edlevel,
applicants.nv_edlevel, applicants.southca_edlevel...)

Educational level is just one example; there are hundreds that we have to
deal with!

So back to the first question: is there a way to have a conditional 
foreign key? 

On Saturday 23 May 2009 17:22:36 Lew wrote:
> Conrad Lender wrote:
> > I didn't intend any disrespect to Joe Celko. I have read a number of his
> > articles, which tend to be well written and informative. Last year, when
> > I posted to comp.databases asking for advice on whether to refactor that
> > table, he wrote "You will have to throw it all out and start over with a
> > relational design", "Throw away the idiot who did the EAV. This is not a
> > good design -- in fact, it is not a design at all", and "This is basic
> > stuff!!" Then he copied the same EAV example that was linked earlier by
> > Rodrigo, claiming that "someone like me" had suggested it. With all the
> > respect I have for Mr. Celko, that was hardly helpful, as that example
> > and the situation I had described were quite different. It also did not
> > encourage me to follow his advice and start from scratch (and fire my
> > boss, who was the mentioned "idiot").
>
> If we fired every boss who actually is an idiot there would be about half
> the number of bosses.
>
> All kidding aside, why is the boss specifying a database architecture? 
> That is not the boss's job.
>
> > I understand the problems that can arise from bad design choices, and I
> > know that Celko is vehemently opposed to anything that resembles EAV,
>
> For good reasons.
>
> > but I felt that in our case "throwing it all away" would be excessive.
>
> Perhaps not.  I had a situation some years ago where a supervisor would not
> let me normalize a database and consequently the project nearly failed.
> Fortunately, the company assigned a new team lead/project manager who did
> the normalization or it would have been a disaster.  Trying to make a bad
> approach work is often, if not always, more expensive than replacing it
> with a good approach.
>
> > We had safeguards to ensure referential integrity, and keeping the
> > values in the same table allowed us to let users manage them all with
> > the same form. So I guess it's like Stefan Keller said in a different
> > thread today: "Know when to break the rules."
>
> Managing all the values in the same form is not intrinsically connected to
> whether one stores the values in an EAV layout.
>
> Telling oneself that one should know when to break the rules is not the
> same as knowing when to break the rules.  They are the rules for good
> reason.
>
> All I'm saying is that EAV is a very problematic approach.  I've been on
> projects that tried to use it, and while that didn't make me an expert on
> the matter by any means, it gave me some cause to trust Mr. Celko's opinion
> on the matter.
>
> --
> Lew
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>

Re: [GENERAL] Need beginning and ending date value for a particular week in the year

2009-05-26 Thread Keaton Adams
This looks great and is a much easier solution to the problem than what I had 
planned.

Thanks!

Keaton


mydb=# select ( date('2009-01-01') + ('1 day'::interval * (21-1)*7)) - ('1 day'::interval * (select 7 - extract(dow from timestamp '2009-01-01')));
      ?column?
---------------------
 2009-05-18 00:00:00
(1 row)

mydb=# select ( date('2009-01-01') + ('1 day'::interval * (21-1)*7)) - ('1 day'::interval * (select 7 - extract(dow from timestamp '2009-01-01'))) + '6 days'::interval;
      ?column?
---------------------
 2009-05-24 00:00:00
(1 row)


mydb=# select ( date('2009-01-01') + ('1 day'::interval * (22-1)*7)) - ('1 day'::interval * (select 7 - extract(dow from timestamp '2009-01-01')));
      ?column?
---------------------
 2009-05-25 00:00:00
(1 row)

mydb=# select ( date('2009-01-01') + ('1 day'::interval * (22-1)*7)) - ('1 day'::interval * (select 7 - extract(dow from timestamp '2009-01-01'))) + '6 days'::interval;
      ?column?
---------------------
 2009-05-31 00:00:00
(1 row)


mydb=# select ( date('2009-01-01') + ('1 day'::interval * (1-1)*7)) - ('1 day'::interval * (select 7 - extract(dow from timestamp '2009-01-01')));
      ?column?
---------------------
 2008-12-29 00:00:00
(1 row)

mydb=# select ( date('2009-01-01') + ('1 day'::interval * (1-1)*7)) - ('1 day'::interval * (select 7 - extract(dow from timestamp '2009-01-01'))) + '6 days'::interval;
      ?column?
---------------------
 2009-01-04 00:00:00
(1 row)


On 5/26/09 12:18 PM, "John R Pierce"  wrote:



> So for the calculated week value (i.e. 2009w22) I need to be able to
> calculate the first and last day of the week (05/25/2009 and
> 05/31/2009).  Is there a clean / fairly easy way to do this?  I can
> think of doing some string comparisons and walking through date values
> to figure it out but was wondering if there was a rather simplistic
> way to do this that I am overlooking.


well, a few things come to mind.    Figure out how many days
are in the first partial week prior to week 1 of the year; that would
be something like 7 - day_of_week('yyyy-01-01').  Then take '1
day'::interval * (week-1)*7 + that offset, and add that to yyyy-01-01
... and add '6 days'::interval to get the last day of the week...





Re: [GENERAL] Code tables, conditional foreign keys?

2009-05-26 Thread Conrad Lender
On 26/05/09 20:48, Benjamin Smith wrote:
> "A deep unwavering belief is a sure sign that you're missing
> something." -- Unknown
> 
> I had no intention of sparking an ideological discussion.

I know, my apologies for going off-topic. I just had a deja-vu when I
saw Celko's article about EAV disasters mentioned again in a very
similar context.

> So back to the first question: is there a way to have a
> conditional foreign key?

I can only suggest what we've done in this situation. We had a table
setup similar to yours (tables like "customer" referencing many small
sets like customer type or education level, with the exact same
structure). All of the small sets were combined in one table (which is
what reminded people of EAV design). Using your original example ...

create table codetables
(
    id serial primary key,
    name varchar unique not null
);
create table codevalues
(
    id serial primary key,
    codetables_id integer not null references codetables(id),
    value varchar not null,
    unique(codetables_id, value)
);
create table customers
(
    customer_types_id integer not null references codevalues(id),
    customer_taxcode_id integer references codevalues(id)
);

... you need to make sure that customer_types_id references the correct
codetable set within codevalues. To do this, we added CHECK constraints
in our tables:

CREATE TABLE customer (
    ...
    customer_type_id  INT  NOT NULL,

    -- this is the standard FK to codevalues
    CONSTRAINT fk_customer_type_id
        FOREIGN KEY (customer_type_id)
        REFERENCES codevalues (id),

    -- this makes sure that the correct set is referenced
    CONSTRAINT check_customer_type
        CHECK (belongs_to_codetable('customer_type', customer_type_id))
);

CREATE FUNCTION belongs_to_codetable (VARCHAR(255), INT)
RETURNS BOOLEAN
AS '
    SELECT EXISTS (
        SELECT 1
          FROM codetables ct
          JOIN codevalues cv
            ON cv.codetables_id = ct.id
           AND ct.name = $1
           AND cv.id = $2
    )
' LANGUAGE 'SQL';

We used different names, so this is untested, but in principle it should
do what you require.
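
An (equally untested) illustration of the intended behaviour:

INSERT INTO codetables (name) VALUES ('customer_type');
INSERT INTO codevalues (codetables_id, value)
    SELECT id, 'retail' FROM codetables WHERE name = 'customer_type';
-- a customer row whose customer_type_id points at a value from some
-- other code table should now be rejected by the CHECK constraint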

Whether this is a good design or not... I'm still not sure. Joe Celko
would grill me for doing something like this.


  - Conrad

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] quoting values magic

2009-05-26 Thread Brandon Metcalf
t == t...@sss.pgh.pa.us writes:

 t> Brandon Metcalf  writes:
 t> > d == dal...@solfertje.student.utwente.nl writes:
 t> >  d> On May 26, 2009, at 6:37 PM, Brandon Metcalf wrote:
 t> >  d> > The issue here is that these reduce back to my original problem.  
For
 t> >  d> > example, if I use a CASE statement and I fall through to the ELSE,
 t> >  d> > then the SQL is attempting to insert a "''" in a NUMERIC field which
 t> >  d> > is not valid.  That is, it's trying to do

 t> >  d> No it doesn't, read that statement again ;)

 t> > Oops.  Indeed, you are correct.

 t> I think there is a problem though.  If you have

 t> case when '$length'='' then length else '$length' end

 t> then what the parser is going to see is a CASE expression with a
 t> variable (known to be NUMERIC) in one arm and an unknown-type literal
 t> constant in the other arm.  So it's going to decide that the literal
 t> must be NUMERIC too, and that type coercion will fail if the literal
 t> is really just ''.

 t> Some experimentation suggests that you might get away with

 t> case when '$length'='' then length else '$length'::text::numeric end


I think this is what I meant to say :)  If $length contains a number,
then the resulting assignment will be length = '5.8', for example, and
this will fail for type NUMERIC.


-- 
Brandon

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Need beginning and ending date value for a particular week in the year

2009-05-26 Thread Alvaro Herrera
Alban Hertroys wrote:

> I recall using generate_series() and EXTRACT(week FROM ...) to populate 
> the table in one pass for several years, but I don't have the exact 
> incantation at hand now. I'd have to experiment a bit to get that back 
> again, I don't have access to it anymore.

There's something similar in the Wiki:

http://wiki.postgresql.org/wiki/Date_and_Time_dimensions

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] How to speed up the first-time-searching in pgsql?

2009-05-26 Thread zxo102 ouyang
Hi all,
    I have a table with more than 10 million records in pgsql, which is
running on Windows 2003. During the night, nobody searches the database.
In the morning, when people start the application, it takes more than 30
seconds to get results back. After several runs of the same search, the
results come back in 8 seconds.
    Is this related to some kind of process priority setting in Windows
2003? If so, how do I give the pgsql processes the highest priority?

Thanks in advance.

ouyang


Re: [GENERAL] How to speed up the first-time-searching in pgsql?

2009-05-26 Thread John R Pierce

zxo102 ouyang wrote:

Hi all,
    I have a table with more than 10 million records in pgsql,
which is running on Windows 2003. During the night, nobody searches the
database. In the morning, when people start the application, it takes
more than 30 seconds to get results back. After several runs of the
same search, the results come back in 8 seconds.
    Is this related to some kind of process priority setting in
Windows 2003? If so, how do I give the pgsql processes the highest priority?


are there other applications running on this same server?   my first 
guess is that other programs are doing disk IO while the postgres database 
is idle, pushing the postgres data out of the in-memory cache.   
Windows is pretty aggressive about reclaiming idle memory, and there's 
not much you can do about it.


maybe schedule something to run in the AM, shortly before the regular 
database users show up, that does queries forcing a full table scan on 
each table; this will cause all the important data to swap back in. 
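
something along these lines (table name hypothetical), kicked off by the
windows task scheduler, would do it:

   SELECT count(*) FROM mybigtable;  -- forces a full scan without
                                     -- shipping every row to the client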


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] How to initiate a new log file?

2009-05-26 Thread ray
While debugging applications interacting with pg, the log file (under
pg_log) gets quite long.  The log file typically has a name such as
postgresql-2009-05-26_00.log; sometimes a new file will be
generated on the same day, and the portion of the name that is all
zeroes will change to a larger number for each change during that day.

I would like to know what triggers the name change and how I can force
a change.

Thanks,
Ray

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] How to initiate a new log file?

2009-05-26 Thread John R Pierce

ray wrote:

While debugging applications interacting with pg, the log file (under
pg_log) gets quite long.  The log file typically has a name such as
postgresql-2009-05-26_00.log; sometimes a new file will be
generated on the same day, and the portion of the name that is all
zeroes will change to a larger number for each change during that day.

I would like to know what triggers the name change and how I can force
a change.
  


by default, a new log is created at midnight (hence the _00) and 
when the server is restarted (which would have a timestamp related to 
when it was restarted)


you can force a new logfile with...

   SELECT pg_rotate_logfile();
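
the rotation itself is governed by these postgresql.conf settings (names
as of 8.x; the defaults vary by version):

   log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'
   log_rotation_age = 1440     # minutes between time-based rotations
   log_rotation_size = 10240   # kilobytes of output before a size-based rotation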



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] How to speed up the first-time-searching in pgsql?

2009-05-26 Thread Scott Marlowe
On Tue, May 26, 2009 at 5:57 PM, zxo102 ouyang  wrote:
> Hi all,
>     I have a table which has more than 10millions records in pgsql which is
> running on window 2003. During night, nobody search the database.
> In morning, when people start to the application, it will take more than 30
> seconds to get database back. After several times of same searching, the
> results can be returned in 8 seconds.
>     Is this related to some kinds of process priority setting in window
> 2003? If so, how do I set the pgsql processes in  highest priority?

There are two common types of "look how we've optimized our OS and now
your database server runs like crap" scenarios.  The most common is
the one where the OS has simply stopped caching your database
files because other things are happening.  Not much you can do about
that one.  The other is the VM in your OS slowly swapping out all of
pgsql's shared_buffers because they appear idle, making even more room
to cache files on the machine for processes that are not pgsql.  Which
is why the other poster has asked what other things this server does.

If the OS is busy swapping out idle chunks of memory for more cache,
you can stop it in linux by adjusting the vm.swappiness setting.  No
clue how to do that in windows, but googling on swappiness and windows
might help.

It's a good idea to put a database on its own server for these
reasons.  Also, in the morning, have a cron job crank up that does
"select * from mybigtable" for each big table to load it into cache.

This is possibly made worse if you've lowered your random_page_cost to
near 1, and have effective_cache_size cranked up.  Those settings are
likely right for your setup, but first thing in the morning they're
wrong.  Actual random page cost really is 10 or more, and the
effective cache size means nothing because the kernel cache is full of
stuff that's NOT pgsql files.  In which case the tendency towards
index access and not seq scan is really gonna cost you.  Hence the
need for the select * from bigtable queries to prime the pump.

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] How to speed up the first-time-searching in pgsql?

2009-05-26 Thread Greg Smith

On Tue, 26 May 2009, Scott Marlowe wrote:

Also, in the morning, have a cron job crank up that does "select * from 
mybigtable" for each big table to load it into cache.


Just to clarify:  on 8.3 and later versions, doing this doesn't do what 
some people expect.  Sequential scans like that will continuously re-use a 
256KB section of the PostgreSQL shared_buffers space, so this won't cause 
all of that to get paged back in if the problem is related to it being 
swapped out.  It will pass everything through the OS buffer cache though 
and prime it usefully, which might be all that's actually needed.


--
* Greg Smith gsm...@gregsmith.com http://www.gregsmith.com Baltimore, MD

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] How to speed up the first-time-searching in pgsql?

2009-05-26 Thread Tom Lane
Greg Smith  writes:
> On Tue, 26 May 2009, Scott Marlowe wrote:
>> Also, in the morning, have a cron job crank up that does "select * from 
>> mybigtable" for each big table to load it into cache.

> Just to clarify:  on 8.3 and later versions, doing this doesn't do what 
> some people expect.  Sequential scans like that will continuously re-use a 
> 256KB section of the PostgreSQL shared_buffers space, so this won't cause 
> all of that to get paged back in if the problem is related to it being 
> swapped out.  It will pass everything through the OS buffer cache though 
> and prime it usefully, which might be all that's actually needed.

Bearing in mind that this is a Windows server ... I seem to recall that
the conventional wisdom is still to keep shared_buffers relatively small
on Windows.  So priming the OS cache is exactly what it's about.
(Keeping that down should also help avoid the other scenario Scott was
worried about, where shared memory itself gets paged out.)

regards, tom lane

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] How to speed up the first-time-searching in pgsql?

2009-05-26 Thread Scott Marlowe
On Tue, May 26, 2009 at 7:43 PM, Tom Lane  wrote:
> Greg Smith  writes:
>> On Tue, 26 May 2009, Scott Marlowe wrote:
>>> Also, in the morning, have a cron job crank up that does "select * from
>>> mybigtable" for each big table to load it into cache.
>
>> Just to clarify:  on 8.3 and later versions, doing this doesn't do what
>> some people expect.  Sequential scans like that will continuously re-use a
>> 256KB section of the PostgreSQL shared_buffers space, so this won't cause
>> all of that to get paged back in if the problem is related to it being
>> swapped out.  It will pass everything through the OS buffer cache though
>> and prime it usefully, which might be all that's actually needed.
>
> Bearing in mind that this is a Windows server ... I seem to recall that
> the conventional wisdom is still to keep shared_buffers relatively small
> on Windows.  So priming the OS cache is exactly what it's about.
> (Keeping that down should also help avoid the other scenario Scott was
> worried about, where shared memory itself gets paged out.)

Yeah, I thought it was pretty obvious I was talking OS cache up there.

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Regular expression and array

2009-05-26 Thread Nick
I won't go into details about why I'm using this field as an array, but
how would I select all the rows that have the first name 'Tom' in
the 'names' field?

CREATE TABLE test (
    id integer,
    names character varying[]
);
INSERT INTO test VALUES (1, '{"''Josh Berkus''","''Peter Eisentraut''","''Marc Fournier''"}');
INSERT INTO test VALUES (2, '{"''Tom Lane''","''Bruce Momjian''","''Dave Page''"}');
INSERT INTO test VALUES (3, '{"''Jan Wieck''","''Oleg Bartunov''","''Joe Conway''"}');

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general