Re: [GENERAL] Postgres mail list traffic over time

2008-11-23 Thread Craig Ringer
Gregory Stark wrote:
> Tom Lane <[EMAIL PROTECTED]> writes:
> 
>> Bruce Momjian <[EMAIL PROTECTED]> writes:
>>> Tom Lane wrote:
>>>> So, to a first approximation, the PG list traffic has been constant
>>>> since 2000.  Not the result I expected.
>>> I also was confused by its flatness.  I am finding the email traffic
>>> almost impossible to continue tracking, so something different is
>>> happening, but it seems it is not volume-related.
>> Yes, my perception also is that it's getting harder and harder to keep
>> up with the list traffic; so something is happening that a simple
>> volume count doesn't capture.
> 
> I've noticed recently that the mailing list traffic seems very "bursty". We
> have days with hundreds of messages on lots of different in-depth topics and
> other days with hardly any messages at all. I wonder if it's hard to follow
> because we've been picking up more simultaneous threads instead of all being
> on one thread together before moving on to the next one.
> 
> Another idea, I wonder if the project has gone more international and
> therefore has more traffic at odd hours of the day for everyone. It would also
> mean more long-lived threads with large latencies between messages and 
> replies.

I wouldn't be at all surprised if that were the case. Alas, it's not
possible to analyze usefully because so many companies use .com
addresses instead of addresses under a ccTLD, and because so many people
use webmail services like gmail that provide no geographical information
in the domain.

Certainly the variety of languages seen in error messages, the variation
in English language skills, etc. would tend to suggest a pretty strong
user base outside the US/UK/AU.

--
Craig Ringer



Re: [GENERAL] [Q]updating multiple rows with Different values

2008-11-23 Thread Andreas Kretschmer
V S P <[EMAIL PROTECTED]> wrote:

> Hello,
> searched documentation, FAQ and mailing list archives
> (mailing list archive search is voluminous :-) )
> 
> but could not find an answer:
> 
> I would like to be able to update
> several rows to different values at the same time
> 
> In Oracle this used to be called an array update or 
> 'collect' update or 'bulk' update -- but those
> keywords did not turn up anything for PostgreSQL.

test=# create table foo (id int, val int);
CREATE TABLE
Time: 0.837 ms
test=*# insert into foo values (1,1);
INSERT 0 1
Time: 0.434 ms
test=*# insert into foo values (2,2);
INSERT 0 1
Time: 0.298 ms
test=*# update foo set val = case when id=1 then 10 when id=2 then 20
end;
UPDATE 2
Time: 0.424 ms
test=*# select * from foo;
 id | val
----+-----
  1 |  10
  2 |  20
(2 rows)


Is this okay for you?



Andreas
-- 
Really, I'm not out to destroy Microsoft. That will just be a completely
unintentional side effect.  (Linus Torvalds)
"If I was god, I would recompile penguin with --enable-fly."   (unknown)
Kaufbach, Saxony, Germany, Europe.  N 51.05082°, E 13.56889°



Re: [GENERAL] [Q]updating multiple rows with Different values

2008-11-23 Thread Gerhard Heift
On Sat, Nov 22, 2008 at 10:04:48PM -0500, V S P wrote:
> Hello,
> searched documentation, FAQ and mailing list archives
> (mailing list archive search is voluminous :-) )
> 
> but could not find an answer:
> 
> I would like to be able to update
> several rows to different values at the same time
> 
> In Oracle this used to be called an array update or 
> 'collect' update or 'bulk' update -- but those
> keywords did not turn up anything for PostgreSQL.
> 
> for example tbl_1 has two columns id and col1
> 
> 
> update tbl_1  set
>col1=3  where id=25,
>col1=5  where id=26

Something like this?

UPDATE tbl_1 SET col1 = t.col1 FROM (VALUES
  (25, 3),
  (26, 5)
) AS t(id, col1)
WHERE tbl_1.id = t.id;

> I am using PHP PDO (and hoping that if there is a mechanism
> within postgresql to do that PDO will support it as well).
> 
> Thank you in advance,
> VSP

Regards,
  Gerhard




Re: [GENERAL] date stamp on update?

2008-11-23 Thread hubert depesz lubaczewski
On Sat, Nov 22, 2008 at 05:34:26PM -0500, blackwater dev wrote:
> Is there a datatype in postgres that will automatically update the date when
> the row is updated?  I know I can do a timestamp and set the default to
> now() but once the row is inserted, and then edited, I want the column
> updated without editing my application code or adding a trigger.  Is this
> possible with Postgres?

http://www.depesz.com/index.php/2008/05/08/mysqls-timestamp-in-postgresql/
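
In short, it's done with a trigger -- a minimal sketch, assuming a
hypothetical table "t" with a "modified" column (see the link for the
full treatment):

CREATE OR REPLACE FUNCTION touch_modified() RETURNS trigger AS $$
BEGIN
    NEW.modified := now();  -- overwrite whatever the client supplied
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER t_touch_modified
    BEFORE UPDATE ON t
    FOR EACH ROW EXECUTE PROCEDURE touch_modified();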

depesz

-- 
Linkedin: http://www.linkedin.com/in/depesz  /  blog: http://www.depesz.com/
jid/gtalk: [EMAIL PROTECTED] / aim:depeszhdl / skype:depesz_hdl / gg:6749007



Re: [GENERAL] delete commands fails silently to delete primary key

2008-11-23 Thread Andrus

> There have been a number of index-corruption bugs fixed since 8.1.4 ...
>
> In particular, if it's possible that any of these clients abort before
> committing these insertions, the vacuum race condition bug fixed in
> 8.1.10 is a pretty likely candidate for your problem.


I changed the second statement to

INSERT INTO session (workplace, ipaddress, logintime, loggeduser)
  SELECT 'MYCOMPNAME',
    inet_client_addr()::CHAR(14), current_timestamp::CHAR(28), CURRENT_USER
  WHERE NOT EXISTS (SELECT 1 FROM session WHERE
    workplace='MYCOMPNAME')

where MYCOMPNAME is the name of the computer that is logging in.

Will this fix the issue, or is it better to wait 100 ms and retry the
insert?

Andrus. 





Re: [GENERAL] Postgres mail list traffic over time

2008-11-23 Thread Gregory Stark

Craig Ringer <[EMAIL PROTECTED]> writes:

> Gregory Stark wrote:
>> Another idea, I wonder if the project has gone more international and
>> therefore has more traffic at odd hours of the day for everyone. It would 
>> also
>> mean more long-lived threads with large latencies between messages and 
>> replies.
>
> I wouldn't be at all surprised if that were the case. Alas, it's not
> possible to analyze usefully because so many companies use .com
> addresses instead of addresses under a ccTLD, and because so many people
> use webmail services like gmail that provide no geographical information
> in the domain.

I would be curious to see the average lifespan of threads over time.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL 
training!



Re: [GENERAL] Using Postgres to store high volume streams of sensor readings

2008-11-23 Thread Stephen Frost
* Ciprian Dorin Craciun ([EMAIL PROTECTED]) wrote:
> > Even better might be partitioning on the timestamp.  IF all access is
> > in a certain timestamp range it's usually a big win, especially
> > because he can move to a new table every hour / day / week or whatever
> > and merge the old one into a big "old data" table.
> 
> Yes, it would speed the inserts tremendously... I've tested it
> and the insert speed is somewhere at 200k->100k.
> 
> But unfortunately the query speed is not good at all because most
> queries are for a specific client (and sensor) in a given time
> range...

Have you set up your partitions correctly (eg, with appropriate CHECK
constraints and with constraint_exclusion turned on)?  Also, you'd want
to keep your indexes on the individual partitions, of course..  That
should improve query time quite a bit since it should only be hitting
the partitions where the data might be.
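
A minimal sketch of such a setup, with made-up table and column names
(8.x inheritance-based partitioning):

CREATE TABLE readings (client_id int, sensor_id int, ts timestamp, val int);

-- one child table per day, with a CHECK constraint the planner can use
CREATE TABLE readings_2008_11_23 (
    CHECK (ts >= '2008-11-23' AND ts < '2008-11-24')
) INHERITS (readings);

-- index each partition individually
CREATE INDEX readings_2008_11_23_idx
    ON readings_2008_11_23 (client_id, sensor_id, ts);

-- let the planner skip partitions whose CHECK rules them out
SET constraint_exclusion = on;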

Stephen




Re: [GENERAL] Using Postgres to store high volume streams of sensor readings

2008-11-23 Thread Scara Maccai
> But unfortunately the query speed is not good at all because most
> queries are for a specific client (and sensor) in a given time
> range...

Maybe I'm wrong, I don't know a lot about these things; but wouldn't 
defining the index as (timestamp, clientid, sensor) instead of (clientid, 
sensor, timestamp) give you more "locality of access" during index creation?
I think it would make more sense too, since you are not going to query the db 
without the timestamp, but maybe you want to query it without the clientid or 
sensor (to get aggregates, for example).
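
For concreteness, the two alternatives I mean would be defined like this
(hypothetical table name):

CREATE INDEX readings_ts_first ON readings (timestamp, clientid, sensor);
CREATE INDEX readings_ts_last  ON readings (clientid, sensor, timestamp);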

Plus: could you give us some numbers about the difference in performance of the 
selects between the index defined as (timestamp) and defined as (clientid, 
sensor, timestamp)? 







Re: [GENERAL] [Q]updating multiple rows with Different values

2008-11-23 Thread V S P
Thank you very much --
this is exactly what I am looking for.

The example provided with
'  case when id=1 then 10  '

will work as well.


Now just one more question:
I will not have a lot of values to update (fewer than 1000
at a time) -- but the values for col1 will be text that is
up to 64K.  So I will not be able to construct SQL strings
and just send them (because they would probably exceed any character
limits for SQL statements).

Instead, what I plan to do is to generate an SQL string as a prepared
statement in PDO, and then bind values to it, so I will have

UPDATE tbl_1 SET col1 = t.col1 FROM (VALUES
  (:id1, :val1),
  (:id2, :val2),
  (:id3, :val3)
) AS t(id, col1)
WHERE tbl_1.id = t.id;

$count = 1;
foreach ($upd_arr as $upd_row)
{
  $stmt->bindValue(':val'.$count, $upd_row->val);
  $stmt->bindValue(':id'.$count, $upd_row->id);
  $count = $count + 1;
}


Is this, approximately, how I should be doing the update?
Is there a limit on the total size of the statement
when it gets out of PDO and into Postgres?

If so, what is it?
I will just split the loop into chunks;
I just wanted to know.


Thank you again for such quick help.





On Sun, 23 Nov 2008 10:11:56 +0100, "Gerhard Heift"
<[EMAIL PROTECTED]> said:
> On Sat, Nov 22, 2008 at 10:04:48PM -0500, V S P wrote:
> > Hello,
> > searched documentation, FAQ and mailing list archives
> > (mailing list archive search is voluminous :-) )
> > 
> > but could not find an answer:
> > 
> > I would like to be able to update
> > several rows to different values at the same time
> > 
> > In Oracle this used to be called an array update or 
> > 'collect' update or 'bulk' update -- but those
> > keywords did not turn up anything for PostgreSQL.
> > 
> > for example tbl_1 has two columns id and col1
> > 
> > 
> > update tbl_1  set
> >col1=3  where id=25,
> >col1=5  where id=26
> 
> Something like this?
> 
> UPDATE tbl_1 SET col1 = t.col1 FROM (VALUES
>   (25, 3),
>   (26, 5)
> ) AS t(id, col1)
> WHERE tbl_1.id = t.id;
> 
> > I am using PHP PDO (and hoping that if there is a mechanism
> > within postgresql to do that PDO will support it as well).
> > 
> > Thank you in advance,
> > VSP
> 
> Regards,
>   Gerhard
-- 
  V S P
  [EMAIL PROTECTED]

-- 
http://www.fastmail.fm - Or how I learned to stop worrying and
  love email again




Re: [GENERAL] Using Postgres to store high volume streams of sensor readings

2008-11-23 Thread V S P
While most of my experience is with Oracle/Informix,

I would also recommend:
a) partitioning at the DB level.
Put partitions on separate hard disks, have the system be
at least dual core, and attach the disks via a SCSI
controller (not IDE) for parallel performance.


b) partitioning at the application level (that is, having
the insert code dynamically figure out which DB and which table to
write to; this complicates the application for inserts as well as for reports)



c) maybe there is a chance to remove the index (if all you are doing
is inserts) -- and then recreate it later?

e) I did not see the type of index, but if the values of at least
some of the indexed fields repeat a lot -- Oracle had what's called a
'bitmap index'.
PostgreSQL might have something similar, where that type of index
is optimized for the fact that values are the same for the majority
of the rows (it becomes much smaller, and therefore quicker to update).
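
As far as I know, PostgreSQL has no persistent bitmap index type (it
builds bitmaps in memory at scan time), but a partial index can play a
similar space-saving role when one value dominates. A sketch with
made-up names:

-- index only the rare, interesting rows; skip the dominant value
CREATE INDEX readings_alarm_idx ON readings (ts) WHERE alarm <> 0;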

f) review that there are no insert triggers and
constraints (either field-level or foreign key) on those tables;
if there are -- validate why they are there and see if they can
be removed -- the application would then need to guarantee
correctness

VSP


On Sun, 23 Nov 2008 08:34:57 +0200, "Ciprian Dorin Craciun"
<[EMAIL PROTECTED]> said:
> On Sun, Nov 23, 2008 at 1:02 AM, Tom Lane <[EMAIL PROTECTED]> wrote:
> > Alvaro Herrera <[EMAIL PROTECTED]> writes:
> >> The problem is, most likely, on updating the indexes.  Heap inserts
> >> should always take more or less the same time, but index insertion
> >> requires walking down the index struct for each insert, and the path to
> >> walk gets larger the more data you have.
> >
> > It's worse than that: his test case inserts randomly ordered keys, which
> > means that there's no locality of access during the index updates.  Once
> > the indexes get bigger than RAM, update speed goes into the toilet,
> > because the working set of index pages that need to be touched also
> > is bigger than RAM.  That effect is going to be present in *any*
> > standard-design database, not just Postgres.
> >
> > It's possible that performance in a real-world situation would be
> > better, if the incoming data stream isn't so random; but it's
> > hard to tell about that with the given facts.
> >
> > One possibly useful trick is to partition the data by timestamp with
> > partition sizes chosen so that the indexes don't get out of hand.
> > But the partition management might be enough of a PITA to negate
> > any win.
> >
> >regards, tom lane
> 
> Thanks for your feedback! This is just as I supposed, but I didn't
> have the Postgres experience to be certain.
> I'll include your conclusion to my report.
> 
> Ciprian Craciun.
> 
-- 
  V S P
  [EMAIL PROTECTED]

-- 
http://www.fastmail.fm - Email service worth paying for. Try it for free




Re: [GENERAL] delete commands fails silently to delete primary key

2008-11-23 Thread Tom Lane
"Andrus" <[EMAIL PROTECTED]> writes:
>> There have been a number of index-corruption bugs fixed since 8.1.4 ...
>> 
>> In particular, if it's possible that any of these clients abort before
>> committing these insertions, the vacuum race condition bug fixed in
>> 8.1.10 is a pretty likely candidate for your problem.

> I changed the second statement to ...
> Will this fix the issue

No.  Look, as you've been told several times already you are running a
very old version with a lot of known bugs.  Just update to the latest
in that branch.  It's not hard.

regards, tom lane



Re: [GENERAL] Using Postgres to store high volume streams of sensor readings

2008-11-23 Thread Ciprian Dorin Craciun
On Sun, Nov 23, 2008 at 3:28 PM, Stephen Frost <[EMAIL PROTECTED]> wrote:
> * Ciprian Dorin Craciun ([EMAIL PROTECTED]) wrote:
>> > Even better might be partitioning on the timestamp.  IF all access is
>> > in a certain timestamp range it's usually a big win, especially
>> > because he can move to a new table every hour / day / week or whatever
>> > and merge the old one into a big "old data" table.
>>
>> Yes, it would speed the inserts tremendously... I've tested it
>> and the insert speed is somewhere at 200k->100k.
>>
>> But unfortunately the query speed is not good at all because most
>> queries are for a specific client (and sensor) in a given time
>> range...
>
> Have you set up your partitions correctly (eg, with appropriate CHECK
> constraints and with constraint_exclusion turned on)?  Also, you'd want
> to keep your indexes on the individual partitions, of course..  That
> should improve query time quite a bit since it should only be hitting
> the partitions where the data might be.
>
>Stephen

Well, now that I've read the previous two emails better, I
understand what Scott and Stephen are talking about...

So if I understood it correctly: I should build indexes only for
certain parts of the data (like the previous full hour and so on). But I
see a problem: wouldn't this lead to a lot of indices being created (24 /
day, ~150 / week, ...)?

Another question: wouldn't the index creation impact the insertion
and query speed while the indexes are being created?

In either case, I don't think this is a very easy solution to implement...

Ciprian Craciun.



Re: [GENERAL] Postgres mail list traffic over time

2008-11-23 Thread Tom Lane
Craig Ringer <[EMAIL PROTECTED]> writes:
> Gregory Stark wrote:
>> Another idea, I wonder if the project has gone more international and
>> therefore has more traffic at odd hours of the day for everyone.

> I wouldn't be at all surprised if that were the case. Alas, it's not
> possible to analyze usefully because so many companies use .com
> addresses instead of addresses under a ccTLD, and because so many people
> use webmail services like gmail that provide no geographical information
> in the domain.

You can often get a sense of where someone is by noting the timezone of
the Date: header in their messages.  That seems to get localized
correctly even by a lot of the big services like gmail.
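For instance, a header like "Date: Sun, 23 Nov 2008 10:11:56 +0100"
suggests somewhere in central Europe, even when the address itself is a
.com or a gmail.com.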

FWIW, this project has always been pretty diversified geographically;
we've had major contributors in Russia, Japan, and Australia for as far
back as I can remember, not just Europe and the Americas.  I think there
are more people now, but I'm not convinced that the distribution has
changed much.

regards, tom lane



Re: [GENERAL] Using Postgres to store high volume streams of sensor readings

2008-11-23 Thread Ciprian Dorin Craciun
Thanks for your info! Please see my observations below.

By the way, we are also planning to try Informix (the time series
extension?)... Do you have any other tips about Informix?

Ciprian Craciun.


On Sun, Nov 23, 2008 at 6:06 PM, V S P <[EMAIL PROTECTED]> wrote:
> While most of my experience is with Oracle/Informix,
>
> I would also recommend:
> a) partitioning at the DB level.
> Put partitions on separate hard disks, have the system be
> at least dual core, and attach the disks via a SCSI
> controller (not IDE) for parallel performance.

This I found out, but for now I'm not able to change the disk layout...


> b) partitioning at the application level (that is, having
> the insert code dynamically figure out which DB and which table to
> write to; this complicates the application for inserts as well as for reports)

We wanted to do this, and in this circumstance the winner for the
moment is BerkeleyDB, as it's super fast for reads. (This is what's
currently called sharding, right?)


> c) maybe there is a chance to remove the index (if all you are doing
> is inserts) -- and then recreate it later?

Not possible, as we would like to use the same database (and
table) for both inserts and real-time queries... Otherwise the
application would get a lot more complicated...


> e) I did not see the type of index, but if the values of at least
> some of the indexed fields repeat a lot -- Oracle had what's called a
> 'bitmap index'.
> PostgreSQL might have something similar, where that type of index
> is optimized for the fact that values are the same for the majority
> of the rows (it becomes much smaller, and therefore quicker to update).

For the moment the index type is the default one (btree), and in
the documentation I didn't see another one matching the current
usage.


> f) review that there are no insert triggers and
> constraints (either field-level or foreign key) on those tables;
> if there are -- validate why they are there and see if they can
> be removed -- the application would then need to guarantee
> correctness

Nope, no triggers or constraints (other than NOT NULL).


> VSP
>
>
> On Sun, 23 Nov 2008 08:34:57 +0200, "Ciprian Dorin Craciun"
> <[EMAIL PROTECTED]> said:
>> On Sun, Nov 23, 2008 at 1:02 AM, Tom Lane <[EMAIL PROTECTED]> wrote:
>> > Alvaro Herrera <[EMAIL PROTECTED]> writes:
>> >> The problem is, most likely, on updating the indexes.  Heap inserts
>> >> should always take more or less the same time, but index insertion
>> >> requires walking down the index struct for each insert, and the path to
>> >> walk gets larger the more data you have.
>> >
>> > It's worse than that: his test case inserts randomly ordered keys, which
>> > means that there's no locality of access during the index updates.  Once
>> > the indexes get bigger than RAM, update speed goes into the toilet,
>> > because the working set of index pages that need to be touched also
>> > is bigger than RAM.  That effect is going to be present in *any*
>> > standard-design database, not just Postgres.
>> >
>> > It's possible that performance in a real-world situation would be
>> > better, if the incoming data stream isn't so random; but it's
>> > hard to tell about that with the given facts.
>> >
>> > One possibly useful trick is to partition the data by timestamp with
>> > partition sizes chosen so that the indexes don't get out of hand.
>> > But the partition management might be enough of a PITA to negate
>> > any win.
>> >
>> >regards, tom lane
>>
>> Thanks for your feedback! This is just as I supposed, but I didn't
>> have the Postgres experience to be certain.
>> I'll include your conclusion to my report.
>>
>> Ciprian Craciun.
>>
> --
>  V S P
>  [EMAIL PROTECTED]
>
> --
> http://www.fastmail.fm - Email service worth paying for. Try it for free



Re: [GENERAL] Using Postgres to store high volume streams of sensor readings

2008-11-23 Thread Scara Maccai
> If you watch the speed, you'll see that the insert speed is the
> same, but the scan speed is worse (from 32k to 200).

As I said, I don't know a lot about these things.
But I would like someone to comment on this (so that maybe I will know 
something!):

1) I thought the poor insert performance was due to a lack of "locality of 
access" during the index updates, hence I thought that since the timestamp is 
always increasing, putting it as the first column in the index should give a 
better insert speed -- but it didn't. Why?

2) I thought that given a query like:

select * from taba where clientid=2 and sensor=4 and timestamp between 
'start_t' and 'end_t'

there shouldn't be a huge difference in speed between an index defined as 
(timestamp, clientid, sensorid) and another one defined as (clientid, sensor, 
timestamp), but I was VERY wrong: it's 1000 times worse. How is that possible???

It's obvious I don't know how multicolumn indexes work...
Can someone explain?






Re: [GENERAL] Using Postgres to store high volume streams of sensor readings

2008-11-23 Thread Scott Marlowe
On Sun, Nov 23, 2008 at 10:01 AM, Ciprian Dorin Craciun
<[EMAIL PROTECTED]> wrote:
> On Sun, Nov 23, 2008 at 3:28 PM, Stephen Frost <[EMAIL PROTECTED]> wrote:
>> * Ciprian Dorin Craciun ([EMAIL PROTECTED]) wrote:
>>> > Even better might be partitioning on the timestamp.  IF all access is
>>> > in a certain timestamp range it's usually a big win, especially
>>> > because he can move to a new table every hour / day / week or whatever
>>> > and merge the old one into a big "old data" table.
>>>
>>> Yes, it would speed the inserts tremendously... I've tested it
>>> and the insert speed is somewhere at 200k->100k.
>>>
>>> But unfortunately the query speed is not good at all because most
>>> queries are for a specific client (and sensor) in a given time
>>> range...
>>
>> Have you set up your partitions correctly (eg, with appropriate CHECK
>> constraints and with constraint_exclusion turned on)?  Also, you'd want
>> to keep your indexes on the individual partitions, of course..  That
>> should improve query time quite a bit since it should only be hitting
>> the partitions where the data might be.
>>
>>Stephen
>>
>
> Well, now that I've read the previous two emails better, I
> understand what Scott and Stephen are talking about...
>
> So if I understood it correctly: I should build indexes only for
> certain parts of the data (like the previous full hour and so on). But I
> see a problem: wouldn't this lead to a lot of indices being created (24 /
> day, ~150 / week, ...)?

No, that's not exactly what I'm talking about.  I'm talking about
pre-creating partitions that the data will soon go into (let's say a
new one every hour) with indexes in place, and having a trigger that
fires on insert to put the data into the right partition.  Once that
partition is no longer being inserted into, and we aren't running a
bunch of queries on it, we migrate it to a historical partition.

So, your table looks something like this all the time:

|**|^^|##|##|##|##|$$...|

Where:

** is a partition we have created in advance of needing it.
^^ is the partition we are currently writing to
## are the partitions we're still using in select queries a lot
$$$... are the old data stuffed into the monolithic history table.

When it's time to switch to writing to the new partition (i.e. **) we
make a new one ahead of that, and the trigger starts writing to what
was a ** partition but is now the new ^^, and the ^^ becomes a ##.  At
the end of the day / week whatever, we take all the old ## partitions
and move their data into the $$$ and drop the ## partitions.

Note that we only need to put data into an archive partition to keep
from having hundreds or thousands of partitions.  There's a limit of a
few hundred partitions where things start getting slow again due to
planner overhead.
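
For what it's worth, a rough sketch of the routing trigger such a scheme
implies -- names and the partition boundary are made up, and the current
partition is assumed to already exist:

CREATE OR REPLACE FUNCTION readings_route() RETURNS trigger AS $$
BEGIN
    -- send rows for the current hour to the pre-created partition;
    -- anything else falls through to the history table
    IF NEW.ts >= '2008-11-23 14:00' AND NEW.ts < '2008-11-23 15:00' THEN
        INSERT INTO readings_2008_11_23_14 VALUES (NEW.*);
    ELSE
        INSERT INTO readings_history VALUES (NEW.*);
    END IF;
    RETURN NULL;  -- the row has already been stored in a child table
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER readings_route_trg
    BEFORE INSERT ON readings
    FOR EACH ROW EXECUTE PROCEDURE readings_route();

-- later, fold a partition that is no longer queried into history
INSERT INTO readings_history SELECT * FROM readings_2008_11_23_14;
DROP TABLE readings_2008_11_23_14;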



Re: [GENERAL] [Q]updating multiple rows with Different values

2008-11-23 Thread Scott Marlowe
On Sun, Nov 23, 2008 at 8:43 AM, V S P <[EMAIL PROTECTED]> wrote:
> Thank you very much --
> this is exactly what I am looking for.
>
> The example provided with
> '  case when id=1 then 10  '
>
> will work as well.
>
>
> Now just one more question:
> I will not have a lot of values to update (fewer than 1000
> at a time) -- but the values for col1 will be text that is
> up to 64K.  So I will not be able to construct SQL strings
> and just send them (because they would probably exceed any character
> limits for SQL statements).

There are no character limits for SQL statements in PostgreSQL.



[GENERAL] mail list traffic

2008-11-23 Thread Daniel Verite
Gregory Stark wrote:

> I would be curious to see the average lifespan of threads over time.

I happen to have the mail archives stored in a database, so I've
expressed this in SQL and below are some results for hackers and
general, 2007-2008. count is the number of distinct threads whose
oldest message is in the specified month. A thread is started as
soon as a message has an In-Reply-To field pointing to an
existing Message-Id.
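
In outline, assuming a simplified mail(list, thread_id, msg_date) table
(the real schema is more involved), the per-month aggregation looks like:

SELECT to_char(started, 'YYYY-MM') AS month,
       avg(span)  AS "avg span",
       count(*)   AS count
FROM (SELECT thread_id,
             min(msg_date)                 AS started,
             max(msg_date) - min(msg_date) AS span
      FROM mail
      WHERE list = 'pgsql-hackers'
      GROUP BY thread_id) AS t
GROUP BY 1
ORDER BY 1;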

Results for pgsql-hackers:

  month  |     avg span     |   median span   | count
---------+------------------+-----------------+-------
 2007-01 | 7 days 10:00:00  | 1 day 04:18:00  |   211
 2007-02 | 7 days 10:00:00  | 1 day 00:23:48  |   186
 2007-03 | 16 days 30:00:00 | 1 day 05:45:37  |   171
 2007-04 | 13 days 26:00:00 | 19:07:00        |   142
 2007-05 | 19 days 30:00:00 | 1 day 04:46:36  |   122
 2007-06 | 15 days 19:00:00 | 23:38:13        |   111
 2007-07 | 19 days 25:00:00 | 21:04:04        |   106
 2007-08 | 13 days 30:00:00 | 20:26:39        |   133
 2007-09 | 21 days 32:00:00 | 1 day 16:43:10  |   121
 2007-10 | 13 days 19:00:00 | 17:23:24        |   148
 2007-11 | 16 days 15:00:00 | 16:23:00        |   140
 2007-12 | 17 days 16:00:00 | 1 day 07:28:05  |    81
 2008-01 | 13 days 12:00:00 | 23:02:33        |   127
 2008-02 | 9 days 11:00:00  | 12:44:28        |   130
 2008-03 | 10 days 14:00:00 | 22:57:18        |   140
 2008-04 | 10 days 14:00:00 | 1 day 00:32:34  |   132
 2008-05 | 13 days 09:00:00 | 1 day 20:57:57  |   113
 2008-06 | 7 days 27:00:00  | 1 day 05:42:46  |   102
 2008-07 | 13 days 26:00:00 | 2 days 07:43:34 |   133
 2008-08 | 9 days 33:00:00  | 1 day 07:47:09  |   121
 2008-09 | 7 days 25:00:00  | 1 day 19:00:50  |   125
 2008-10 | 6 days 14:00:00  | 1 day 10:31:01  |   178

Results for pgsql-general:

  month  |    avg span     | median span | count
---------+-----------------+-------------+-------
 2007-01 | 1 day 25:00:00  | 10:57:11    |   329
 2007-02 | 2 days 28:00:00 | 10:50:38    |   295
 2007-03 | 3 days 08:00:00 | 14:54:08    |   310
 2007-04 | 6 days 18:00:00 | 17:40:55    |   244
 2007-05 | 3 days 22:00:00 | 16:43:54    |   287
 2007-06 | 2 days 13:00:00 | 11:26:46    |   297
 2007-07 | 2 days 19:00:00 | 11:59:40    |   263
 2007-08 | 3 days 14:00:00 | 16:35:16    |   335
 2007-09 | 3 days 14:00:00 | 13:23:09    |   245
 2007-10 | 2 days 16:00:00 | 08:46:09    |   302
 2007-11 | 3 days 07:00:00 | 08:28:06    |   294
 2007-12 | 2 days 31:00:00 | 10:25:14    |   255
 2008-01 | 2 days 14:00:00 | 13:23:12    |   248
 2008-02 | 2 days 14:00:00 | 10:02:16    |   257
 2008-03 | 1 day 25:00:00  | 13:20:06    |   245
 2008-04 | 1 day 30:00:00  | 08:26:06    |   238
 2008-05 | 3 days 22:00:00 | 18:58:27    |   211
 2008-06 | 2 days 24:00:00 | 14:46:02    |   191
 2008-07 | 1 day 29:00:00  | 10:37:17    |   221
 2008-08 | 1 day 22:00:00  | 14:14:45    |   205
 2008-09 | 1 day 24:00:00  | 14:26:26    |   202
 2008-10 | 1 day 19:00:00  | 12:32:56    |   219

"median span" is the median computed with the pl/R median function
applied to intervals as a number of seconds and then cast back to
intervals for display. I believe the median is good to mitigate the
contribution of messages with wrong dates and posters that reply to
very old messages. And median span appears to differs a lot from the
average span.

If people feel like playing with the database to build other queries,
feel free to bug me off-list about it. I can arrange to make a dump
available or share the scripts to build it yourself from the
mailbox archives.

 Best regards,
--
 Daniel
 PostgreSQL-powered mail user agent and storage:
http://www.manitou-mail.org





Re: [GENERAL] Postgres mail list traffic over time

2008-11-23 Thread Alvaro Herrera
Tom Lane wrote:

> FWIW, this project has always been pretty diversified geographically;
> we've had major contributors in Russia, Japan, and Australia for as far
> back as I can remember, not just Europe and the Americas.  I think there
> are more people now, but I'm not convinced that the distribution has
> changed much.

How about getting a new version of the world map showing developers'
locations?

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.



Re: [GENERAL] Postgres mail list traffic over time

2008-11-23 Thread Raymond O'Donnell
On 23/11/2008 20:58, Alvaro Herrera wrote:

> How about getting a new version of the world map showing developers'
> locations?

Cool! Definitely +1 if we can show contributors to the list generally,
not just developers.

Ray.


--
Raymond O'Donnell, Director of Music, Galway Cathedral, Ireland
[EMAIL PROTECTED]
Galway Cathedral Recitals: http://www.galwaycathedral.org/recitals
--



Re: [GENERAL] mail list traffic

2008-11-23 Thread Alvaro Herrera
Daniel Verite wrote:
>   Gregory Stark wrote:
> 
> > I would be curious to see the average lifespan of threads over time.
> 
> I happen to have the mail archives stored in a database, [...]

When I saw the manitou-mail.org stuff some days ago I was curious -- how
feasible would it be to host our web archives using a database of some
sort, instead of the current mbox-based Mhonarc installation we use,
which is so full of problems and limitations?

I wondered about using Oryx some time ago, and got in contact with
Abhijit Menon-Sen to that end, but that never came to fruition.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.



[GENERAL] Running postgresql as a VMware ESx client

2008-11-23 Thread Glen Eustace

Hi all,

I was wondering whether anyone has had any experience running PostgreSQL 
in a VM under ESX.  VMware provides significant HA/DR opportunities and 
we would like to use it if we can.  The database would be on an EMC 
SAN-hosted LUN, and the ESX servers would be dual quad-CPU HP 
DL-380/G5s. At this stage we would use iSCSI for SAN connectivity, as 
our testing with MS-SQL has not indicated that FC is needed.


We are getting a bit of pushback from the external support agency, who 
seem more than a little bit nervous about the environment.  I would 
appreciate any comments.

--
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Glen and Rosanne Eustace
GodZone Internet Services, a division of AGRE Enterprises Ltd.
P.O. Box 8020, Palmerston North, New Zealand 4446.
Ph: +64 6 357 8168, Fax +64 6 357 8165, Mob: +64 21 424 015
http://www.godzone.net.nz

"A Ministry specialising in providing low-cost Internet Services
to NZ Christian Churches, Ministries and Organisations."



Re: [GENERAL] mail list traffic

2008-11-23 Thread Matthew T. O'Connor

Alvaro Herrera wrote:

> When I saw the manitou-mail.org stuff some days ago I was curious -- how
> feasible would it be to host our web archives using a database of some
> sort, instead of the current mbox-based Mhonarc installation we use,
> which is so full of problems and limitations?
>
> I wondered about using Oryx some time ago, and got in contact with
> Abhijit Menon-Sen to that end, but that never came to fruition.


We are a DB project after all, and hosting our own archives might make 
for a good example -- eating our own dogfood, so to speak.  I would 
imagine the biggest problem is just finding someone who wants to take 
this on and maintain it.






Re: [GENERAL] Running postgresql as a VMware ESx client

2008-11-23 Thread Scott Marlowe
On Sun, Nov 23, 2008 at 5:28 PM, Glen Eustace <[EMAIL PROTECTED]> wrote:
> Hi all,
>
> I was wondering whether anyone has had any experience running PostgreSQL in
> a VM under ESX.  VMware provides significant HA/DR opportunities and we
> would like to use it if we can.  The database would be on an EMC SAN-hosted
> LUN, and the ESX servers would be dual quad-CPU HP DL-380/G5s. At this stage
> we would use iSCSI for SAN connectivity, as our testing with MS-SQL has not
> indicated that FC is needed.
>
> We are getting a bit of pushback from the external support agency, who seem
> more than a little bit nervous about the environment.  I would appreciate
> any comments.

Generally speaking, virtualization allows you to take a bunch of
low-powered servers and make them live in one big box, saving money on
electricity and management.  Generally speaking, database servers are big
powerful boxes with lots of hard disks and gigs upon gigs of RAM to
handle terabytes of data.  Those two things seem at odds to me.

When I saw you post that the ESX servers would be dual quad-core
machines, I immediately thought how nice that would be to run pgsql on,
without VMware between it and the hardware.

On to the real issue.  PostgreSQL, like any database, requires certain
guarantees that the data it's writing to the disks is actually
written when the OS says it was.  With a SAN and VMware, you now have
a few extra layers that could lie to your database.  In order to see
if things are working properly, you need to do some semi-destructive
testing.  Meaning you need to start up a few hundred clients running
read/write transactions, force a checkpoint, and pull the plug on
various parts (or all) of your system and see if it can come back up.

Do not trust UPSes, generators, power conditioners etc.  They can and
do ALL fail sometimes.  I've seen it happen a few times myself, when
systems that could not possibly lose power lost power.  If your system
can't guarantee data integrity in case of power loss then I would push
back on it too.

What, exactly, are you looking to gain by running pgsql under VMware on
such hardware?



Re: [GENERAL] Using Postgres to store high volume streams of sensor readings

2008-11-23 Thread marcin mank
>Yes, the figures are like this:
>* average number of raw inserts / second (without any optimization
> or previous aggregation): #clients (~ 100 thousand) * #sensors (~ 10)
> / 6 seconds = 166 thousand inserts / second...

this is average?
166,000 * 20 bytes per record * 86,400 seconds per day ~ 287 GB / day,
not counting indices.

What is the time span you want to keep the data for?

Greetings
Marcin



Re: [GENERAL] Running postgresql as a VMware ESx client

2008-11-23 Thread Glen Eustace



> Generally speaking, virtualization allows you to take a bunch of
> low-powered servers and make them live in one big box, saving money on
> electricity and management.  Generally speaking, database servers are big
> powerful boxes with lots of hard disks and gigs upon gigs of RAM to
> handle terabytes of data.  Those two things seem at odds to me.


If one is handling databases with terabytes of data and 1000s of 
connections, I would agree. We will be looking at 100s of megabytes max 
and possibly several hundred connections. A much smaller workload.



> When I saw you post that the ESX servers would be dual quad-core
> machines, I immediately thought how nice that would be to run pgsql on,
> without VMware between it and the hardware.


To pilot this project we were going to run just the single VM on the 
server; if there are performance issues we can always rebuild and put 
the OS directly on the hardware rather than on the ESX hypervisor.



> What, exactly, are you looking to gain by running pgsql under VMware on
> such hardware?


Mobility, in the HA/DR sense.  Being able to VMotion the server while it 
is running to rectify hardware issues or perform upgrades.  To bring the 
server up at a DR site automagically using SRM.  And yes, we still have 
to investigate the whole crash-consistency stuff with SRM.  This 
database is heavily biased toward reads rather than writes.


--
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Glen and Rosanne Eustace
GodZone Internet Services, a division of AGRE Enterprises Ltd.
P.O. Box 8020, Palmerston North, New Zealand 4446.
Ph: +64 6 357 8168, Fax +64 6 357 8165, Mob: +64 21 424 015
http://www.godzone.net.nz

"A Ministry specialising in providing low-cost Internet Services
to NZ Christian Churches, Ministries and Organisations."



Re: [GENERAL] Running postgresql as a VMware ESx client

2008-11-23 Thread Scott Marlowe
On Sun, Nov 23, 2008 at 6:47 PM, Glen Eustace <[EMAIL PROTECTED]> wrote:
>
>> Generally speaking, virtualization allows you to take a bunch of
>> low-powered servers and make them live in one big box, saving money on
>> electricity and management.  Generally speaking, database servers are big
>> powerful boxes with lots of hard disks and gigs upon gigs of RAM to
>> handle terabytes of data.  Those two things seem at odds to me.
>
> If one is handling databases with terabytes of data and 1000s of
> connections, I would agree. We will be looking at 100s of megabytes max and
> possibly several hundred connections. A much smaller workload.

Yeah, you're not really looking at heavy lifting here then.  Should be fine.
>
>> What, exactly, are you looking to gain by running pgsql under VMware on
>> such hardware?
>
> Mobility, in the HA/DR sense.  Being able to VMotion the server while it is
> running to rectify hardware issues or perform upgrades.  To bring the server
> up at a DR site automagically using SRM.  And yes, we still have to
> investigate the whole crash-consistency stuff with SRM.  This database is
> heavily biased toward reads rather than writes.

Given the light load you're looking at running, you'll likely be ok as
long as there aren't any issues with crash recovery.



Re: [GENERAL] Using Postgres to store high volume streams of sensor readings

2008-11-23 Thread Ciprian Dorin Craciun
On Mon, Nov 24, 2008 at 3:42 AM, marcin mank <[EMAIL PROTECTED]> wrote:
>>Yes, the figures are like this:
>>* average number of raw inserts / second (without any optimization
>> or previous aggregation): #clients (~ 100 thousand) * #sensors (~ 10)
>> / 6 seconds = 166 thousand inserts / second...
>
> this is average?
> 166,000 * 20 bytes per record * 86,400 seconds per day ~ 287 GB / day,
> not counting indices.
>
> What is the time span you want to keep the data for?
>
> Greetings
> Marcin

Well, I'm not sure about the archival period... Maybe a day, maybe a
week... For the moment I'm just struggling with the insert speed.
(We could also use sharding -- horizontal partitioning across
different machines -- and this would reduce the load...)

Ciprian.



Re: [GENERAL] Returning schema name with table name

2008-11-23 Thread Thomas Markus

Hi,

my standard query (adapted to a 1 MB threshold) is:
select
    t.spcname as "tablespace"
    , pg_get_userbyid(c.relowner) as "owner"
    , n.nspname as "schema"
    , relname::text as "name"
    , pg_size_pretty(pg_total_relation_size(c.oid))::text as "total size"
    , case
        when c.relkind='i' then 'index'
        when c.relkind='t' then 'toast'
        when c.relkind='r' then 'table'
        when c.relkind='v' then 'view'
        when c.relkind='c' then 'composite type'
        when c.relkind='S' then 'sequence'
        else c.relkind::text
      end as "type"
from
    pg_class c
    left join pg_namespace n on n.oid = c.relnamespace
    left join pg_tablespace t on t.oid = c.reltablespace
where
    (pg_total_relation_size(c.oid)>>20)>0 and c.relkind!='t'
order by
    c.relkind desc, pg_total_relation_size(c.oid) desc


Andrus wrote:

> SELECT oid, relname::char(35) as Table_Name,
>   pg_size_pretty(pg_total_relation_size(oid))::VARCHAR(15) as
>     Total_Table_Size
> FROM pg_class
> where pg_total_relation_size(oid)/(1024*1024)>0
> ORDER BY pg_total_relation_size(oid) desc
>
> returns table names with size greater than 1 MB.
>
> How can I modify this so that the schema name is also returned?
> I have a lot of tables with the same name, and thus this output is
> difficult to understand.
>
> pg_class does not seem to contain schema names.
>
> Andrus.




--
Thomas Markus


proventis GmbH | Zimmerstr. 79-81 | D-10117 Berlin |
Tel +49 (0)30 2936399-22 | Fax -50 | [EMAIL PROTECTED]
-
Managing director: Norman Frischmuth | Registered office: Berlin
Commercial register: AG Berlin-Charlottenburg, HR 82917
-
Blue Ant web-based project management - upcoming events in 2008:
http://www.proventis.net/website/live/blueant/veranstaltungen.html






[GENERAL] PgAgent Job Scehduler is NOT running

2008-11-23 Thread Abdul Rahman
Dear all,

I installed PgAgent, started its service, successfully scheduled a
backup, and got a 100% result. Now the same job is not working: even
after reinstalling PgAgent, I still fail to get a result.

Regards,
Abdul Rehman.