Re: [GENERAL] Postgres mail list traffic over time
Gregory Stark wrote:
> Tom Lane <[EMAIL PROTECTED]> writes:
>> Bruce Momjian <[EMAIL PROTECTED]> writes:
>>> Tom Lane wrote:
>>>> So, to a first approximation, the PG list traffic has been constant
>>>> since 2000. Not the result I expected.
>>> I also was confused by its flatness. I am finding the email traffic
>>> almost impossible to continue tracking, so something different is
>>> happening, but it seems it is not volume-related.
>> Yes, my perception also is that it's getting harder and harder to keep
>> up with the list traffic; so something is happening that a simple
>> volume count doesn't capture.
>
> I've noticed recently that the mailing list traffic seems very "bursty". We
> have days with hundreds of messages on lots of different in-depth topics and
> other days with hardly any messages at all. I wonder if it's hard to follow
> because we've been picking up more simultaneous threads instead of all being
> on one thread together before moving on to the next one.
>
> Another idea: I wonder if the project has gone more international and
> therefore has more traffic at odd hours of the day for everyone. It would
> also mean more long-lived threads with large latencies between messages and
> replies.

I wouldn't be at all surprised if that were the case. Alas, it's not possible
to analyze usefully, because so many companies use .com addresses instead of
addresses under a ccTLD, and because so many people use webmail services like
Gmail that provide no geographical information in the domain.

Certainly the variety of languages seen in error messages, the variation in
English language skills, etc. would tend to suggest a pretty strong user base
outside the US/UK/AU.

--
Craig Ringer
Re: [GENERAL] [Q]updating multiple rows with Different values
V S P <[EMAIL PROTECTED]> wrote:
> Hello,
> searched documentation, FAQ and mailing list archives
> (mailing list archive search is voluminous :-) )
>
> but could not find an answer:
>
> I would like to be able to update
> several rows to different values at the same time
>
> In Oracle this used to be called Array update or
> 'collect' update or 'bulk' update -- but those
> keywords did not bring anything for PostgreSQL.

test=# create table foo (id int, val int);
CREATE TABLE
Time: 0.837 ms
test=*# insert into foo values (1,1);
INSERT 0 1
Time: 0.434 ms
test=*# insert into foo values (2,2);
INSERT 0 1
Time: 0.298 ms
test=*# update foo set val = case when id=1 then 10 when id=2 then 20 end;
UPDATE 2
Time: 0.424 ms
test=*# select * from foo;
 id | val
----+-----
  1 |  10
  2 |  20
(2 rows)

Is this okay for you?

Andreas
--
Really, I'm not out to destroy Microsoft. That will just be a completely
unintentional side effect. (Linus Torvalds)
"If I was god, I would recompile penguin with --enable-fly." (unknown)
Kaufbach, Saxony, Germany, Europe. N 51.05082°, E 13.56889°
Re: [GENERAL] [Q]updating multiple rows with Different values
On Sat, Nov 22, 2008 at 10:04:48PM -0500, V S P wrote:
> Hello,
> searched documentation, FAQ and mailing list archives
> (mailing list archive search is voluminous :-) )
>
> but could not find an answer:
>
> I would like to be able to update
> several rows to different values at the same time
>
> In Oracle this used to be called Array update or
> 'collect' update or 'bulk' update -- but those
> keywords did not bring anything for PostgreSQL.
>
> for example tbl_1 has two columns id and col1
>
> update tbl_1 set
>    col1=3 where id=25,
>    col1=5 where id=26

Something like this?

UPDATE tbl_1 SET col1 = t.col1
FROM (VALUES
    (25, 3),
    (26, 5)
) AS t(id, col1)
WHERE tbl_1.id = t.id;

> I am using PHP PDO (and hoping that if there is a mechanism
> within postgresql to do that PDO will support it as well).
>
> Thank you in advance,
> VSP

Regards,
Gerhard
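A self-contained sketch of the same VALUES-based technique (the original
table definition wasn't posted, so the setup here is assumed):

    -- hypothetical table matching the question's description
    CREATE TABLE tbl_1 (id integer PRIMARY KEY, col1 integer);
    INSERT INTO tbl_1 VALUES (25, 0), (26, 0);

    -- one statement updates each row to its own value
    UPDATE tbl_1 SET col1 = t.col1
    FROM (VALUES
        (25, 3),
        (26, 5)
    ) AS t(id, col1)
    WHERE tbl_1.id = t.id;

    SELECT * FROM tbl_1 ORDER BY id;
    --  id | col1
    -- ----+------
    --  25 |    3
    --  26 |    5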
Re: [GENERAL] date stamp on update?
On Sat, Nov 22, 2008 at 05:34:26PM -0500, blackwater dev wrote:
> Is there a datatype in postgres that will automatically update the date when
> the row is updated? I know I can do a timestamp and set the default to
> now() but once the row is inserted, and then edited, I want the column
> updated without editing my application code or adding a trigger. Is this
> possible with Postgres?

http://www.depesz.com/index.php/2008/05/08/mysqls-timestamp-in-postgresql/

depesz

--
Linkedin: http://www.linkedin.com/in/depesz / blog: http://www.depesz.com/
jid/gtalk: [EMAIL PROTECTED] / aim:depeszhdl / skype:depesz_hdl / gg:6749007
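PostgreSQL has no column type that updates itself on UPDATE; the usual
answer (and the approach the linked post discusses) is a BEFORE UPDATE
trigger. A minimal sketch, with table and column names invented for
illustration:

    CREATE TABLE orders (
        id       serial PRIMARY KEY,
        payload  text,
        modified timestamptz NOT NULL DEFAULT now()
    );

    -- force the column to the current time on every update,
    -- regardless of what the client sent
    CREATE OR REPLACE FUNCTION touch_modified() RETURNS trigger AS $$
    BEGIN
        NEW.modified := now();
        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER orders_touch_modified
        BEFORE UPDATE ON orders
        FOR EACH ROW EXECUTE PROCEDURE touch_modified();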
Re: [GENERAL] delete commands fails silently to delete primary key
> There have been a number of index-corruption bugs fixed since 8.1.4 ...
>
> In particular, if it's possible that any of these clients abort before
> committing these insertions, the vacuum race condition bug fixed in
> 8.1.10 is a pretty likely candidate for your problem.

I changed the second statement to

INSERT INTO session (workplace, ipaddress, logintime, loggeduser)
SELECT 'MYCOMPNAME',
       inet_client_addr()::CHAR(14),
       current_timestamp::CHAR(28),
       CURRENT_USER
WHERE NOT EXISTS (SELECT 1 FROM session WHERE workplace='MYCOMPNAME')

where MYCOMPNAME is the logging-in computer's name.

Will this fix the issue, or is it better to wait 100 ms and re-try the
insert?

Andrus.
Re: [GENERAL] Postgres mail list traffic over time
Craig Ringer <[EMAIL PROTECTED]> writes:
> Gregory Stark wrote:
>> Another idea, I wonder if the project has gone more international and
>> therefore has more traffic at odd hours of the day for everyone. It would
>> also mean more long-lived threads with large latencies between messages
>> and replies.
>
> I wouldn't be at all surprised if that were the case. Alas, it's not
> possible to analyze usefully because so many companies use .com
> addresses instead of addresses under a ccTLD, and because so many people
> use webmail services like Gmail that provide no geographical information
> in the domain.

I would be curious to see the average lifespan of threads over time.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL training!
Re: [GENERAL] Using Postgres to store high volume streams of sensor readings
* Ciprian Dorin Craciun ([EMAIL PROTECTED]) wrote:
> > Even better might be partitioning on the timestamp. IF all access is
> > in a certain timestamp range it's usually a big win, especially
> > because he can move to a new table every hour / day / week or whatever
> > and merge the old one into a big "old data" table.
>
> Yes, it would speed the inserts tremendously... I've tested it
> and the insert speed is somewhere at 200k->100k.
>
> But unfortunately the query speed is not good at all because most
> queries are for a specific client (and sensor) in a given time
> range...

Have you set up your partitions correctly (eg, with appropriate CHECK
constraints and with constraint_exclusion turned on)? Also, you'd want
to keep your indexes on the individual partitions, of course. That
should improve query time quite a bit, since it should only be hitting
the partitions where the data might be.

Stephen
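A sketch of the setup Stephen is describing, using an assumed readings
table (the poster's actual schema wasn't shown; the client/sensor/timestamp
columns are inferred from the thread):

    CREATE TABLE readings (
        client integer   NOT NULL,
        sensor integer   NOT NULL,
        ts     timestamp NOT NULL,
        value  integer   NOT NULL
    );

    -- one child table per day, with a CHECK constraint the planner can use
    CREATE TABLE readings_2008_11_23 (
        CHECK (ts >= '2008-11-23' AND ts < '2008-11-24')
    ) INHERITS (readings);

    -- keep an index on each partition individually
    CREATE INDEX readings_2008_11_23_idx
        ON readings_2008_11_23 (client, sensor, ts);

    -- with this on, a query that filters on ts only scans the partitions
    -- whose CHECK constraints can match
    SET constraint_exclusion = on;
    SELECT * FROM readings
    WHERE client = 2 AND sensor = 4
      AND ts >= '2008-11-23 10:00' AND ts < '2008-11-23 11:00';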
Re: [GENERAL] Using Postgres to store high volume streams of sensor readings
> But unfortunately the query speed is not good at all because most
> queries are for a specific client (and sensor) in a given time
> range...

Maybe I'm wrong, I don't know a lot of these things; but wouldn't defining
the index as (timestamp, clientid, sensor) instead of (clientid, sensor,
timestamp) give you more "locality of access" in the index creation?

I think it would make more sense too, since you are not going to query the
db without the timestamp, but maybe you want to query it without the
clientid or probe (to get aggregates, for example).

Plus: could you give us some numbers about the difference in performance of
the selects between the index defined as (timestamp) and the one defined as
(clientid, sensor, timestamp)?
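The two layouts being compared, as they might be written (names are
illustrative; readings stands in for the poster's table):

    -- timestamp first: new keys always land at the right edge of the index
    CREATE INDEX readings_ts_first ON readings (ts, client, sensor);

    -- timestamp last: one client+sensor's time range is a single
    -- contiguous run of index entries
    CREATE INDEX readings_ts_last ON readings (client, sensor, ts);

    -- the typical query from the thread
    SELECT * FROM readings
    WHERE client = 2 AND sensor = 4
      AND ts BETWEEN '2008-11-23 10:00' AND '2008-11-23 11:00';

With (ts, client, sensor) the query has to walk every index entry in the
time range and filter out the other clients; with (client, sensor, ts) it
descends straight to one narrow range.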
Re: [GENERAL] [Q]updating multiple rows with Different values
Thank you very much, this is exactly what I am looking for.
The other example provided, 'case when id=1 then 10', will work as well.

Now just one more question: I will not have a lot of values to update (less
than 1000 at a time) -- but the values for col1 will be text that is up to
64K. So I will not be able to construct SQL strings and just send them
(because it will probably exceed the character limits for SQL statements).

Instead, what I plan to do is to generate an SQL string as a prepared
statement in PDO, and then bind values to it, so I will have

UPDATE tbl_1 SET col1 = t.col1
FROM (VALUES
    (:id1, :val1),
    (:id2, :val2),
    (:id3, :val3)
) AS t(id, col1)
WHERE tbl_1.id = t.id

$count = 1;
foreach ($upd_arr as $upd_row) {
    $stmt->bindValue(':id'.$count, $upd_row->id);
    $stmt->bindValue(':val'.$count, $upd_row->val);
    $count = $count + 1;
}

Is this, approximately, how I should be doing the update?

Is there a limit on the total size of the statement when it gets out of PDO
and into Postgres? If yes, what is it? I will just split the loop into
chunks; I just wanted to know.

Thank you again for such a quick help.

On Sun, 23 Nov 2008 10:11:56 +0100, "Gerhard Heift"
<[EMAIL PROTECTED]> said:
> On Sat, Nov 22, 2008 at 10:04:48PM -0500, V S P wrote:
> > Hello,
> > searched documentation, FAQ and mailing list archives
> > (mailing list archive search is voluminous :-) )
> >
> > but could not find an answer:
> >
> > I would like to be able to update
> > several rows to different values at the same time
> >
> > In Oracle this used to be called Array update or
> > 'collect' update or 'bulk' update -- but those
> > keywords did not bring anything for PostgreSQL.
> >
> > for example tbl_1 has two columns id and col1
> >
> > update tbl_1 set
> >    col1=3 where id=25,
> >    col1=5 where id=26
>
> Something like this?
>
> UPDATE tbl_1 SET col1 = t.col1
> FROM (VALUES
>     (25, 3),
>     (26, 5)
> ) AS t(id, col1)
> WHERE tbl_1.id = t.id;
>
> > I am using PHP PDO (and hoping that if there is a mechanism
> > within postgresql to do that PDO will support it as well).
> >
> > Thank you in advance,
> > VSP
>
> Regards,
> Gerhard

--
V S P
[EMAIL PROTECTED]

--
http://www.fastmail.fm - Or how I learned to stop worrying and
love email again
Re: [GENERAL] Using Postgres to store high volume streams of sensor readings
While most of my experience is with Oracle/Informix, I would also recommend:

a) Partitioning on the DB level.
   Put partitions on separate hard disks, have the system be at least dual
   core, and attach the disks via a SCSI controller (not IDE) for parallel
   performance.

b) Partitioning on the application level, that is, having the insert code
   dynamically figure out what DB and what table to go to (this complicates
   the application for inserts as well as for reports).

c) Maybe there is a chance to remove the index (if all you are doing is
   inserts) -- and then recreate it later? (A sketch of this follows below.)

e) I did not see the type of index, but if the values of at least some of
   the indexed fields repeat a lot, Oracle had what's called a 'bitmap
   index'. PostgreSQL might have something similar, where that type of index
   is optimized for the fact that values are the same for the majority of
   the rows (it becomes much smaller, and therefore quicker to update).

f) Review that there are no insert triggers and constraints (either field or
   foreign) on those tables; if there are, validate why they are there and
   see if they can be removed -- the application would then need to
   guarantee correctness.

VSP

On Sun, 23 Nov 2008 08:34:57 +0200, "Ciprian Dorin Craciun"
<[EMAIL PROTECTED]> said:
> On Sun, Nov 23, 2008 at 1:02 AM, Tom Lane <[EMAIL PROTECTED]> wrote:
> > Alvaro Herrera <[EMAIL PROTECTED]> writes:
> > > The problem is, most likely, on updating the indexes. Heap inserts
> > > should always take more or less the same time, but index insertion
> > > requires walking down the index struct for each insert, and the path
> > > to walk gets larger the more data you have.
> >
> > It's worse than that: his test case inserts randomly ordered keys, which
> > means that there's no locality of access during the index updates. Once
> > the indexes get bigger than RAM, update speed goes into the toilet,
> > because the working set of index pages that need to be touched also
> > is bigger than RAM. That effect is going to be present in *any*
> > standard-design database, not just Postgres.
> >
> > It's possible that performance in a real-world situation would be
> > better, if the incoming data stream isn't so random; but it's
> > hard to tell about that with the given facts.
> >
> > One possibly useful trick is to partition the data by timestamp with
> > partition sizes chosen so that the indexes don't get out of hand.
> > But the partition management might be enough of a PITA to negate
> > any win.
> >
> > regards, tom lane
>
> Thanks for your feedback! This is just as I supposed, but I didn't
> have the Postgres experience to be certain.
> I'll include your conclusion in my report.
>
> Ciprian Craciun.

--
V S P
[EMAIL PROTECTED]

--
http://www.fastmail.fm - Email service worth paying for. Try it for free
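Point (c) -- dropping the index during a bulk load and rebuilding it
afterwards -- could look roughly like this in PostgreSQL; a sketch with
illustrative names:

    -- drop the index so inserts don't pay per-row index maintenance
    DROP INDEX readings_idx;

    -- bulk-load; COPY is the fastest path (the file path is hypothetical)
    COPY readings FROM '/tmp/readings.csv' WITH CSV;

    -- rebuild once, in a single pass, which is far cheaper than
    -- maintaining the index row by row during the load
    CREATE INDEX readings_idx ON readings (client, sensor, ts);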
Re: [GENERAL] delete commands fails silently to delete primary key
"Andrus" <[EMAIL PROTECTED]> writes: >> There have been a number of index-corruption bugs fixed since 8.1.4 ... >> >> In particular, if it's possible that any of these clients abort before >> committing these insertions, the vacuum race condition bug fixed in >> 8.1.10 is a pretty likely candidate for your problem. > I changed second statement to ... > Will this fix the isse No. Look, as you've been told several times already you are running a very old version with a lot of known bugs. Just update to the latest in that branch. It's not hard. regards, tom lane -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Using Postgres to store high volume streams of sensor readings
On Sun, Nov 23, 2008 at 3:28 PM, Stephen Frost <[EMAIL PROTECTED]> wrote:
> * Ciprian Dorin Craciun ([EMAIL PROTECTED]) wrote:
>> > Even better might be partitioning on the timestamp. IF all access is
>> > in a certain timestamp range it's usually a big win, especially
>> > because he can move to a new table every hour / day / week or whatever
>> > and merge the old one into a big "old data" table.
>>
>> Yes, it would speed the inserts tremendously... I've tested it
>> and the insert speed is somewhere at 200k->100k.
>>
>> But unfortunately the query speed is not good at all because most
>> queries are for a specific client (and sensor) in a given time
>> range...
>
> Have you set up your partitions correctly (eg, with appropriate CHECK
> constraints and with constraint_exclusion turned on)? Also, you'd want
> to keep your indexes on the individual partitions, of course. That
> should improve query time quite a bit since it should only be hitting
> the partitions where the data might be.
>
> Stephen

Well, now that I've read the previous two emails better, I understand what
Scott and Stephen are talking about...

So if I understood it correctly: I should build indexes only for certain
parts of the data (like the previous full hour and so on). But I see a
problem: wouldn't this lead to a lot of indices being created (24 / day,
~150 / week, ...)?

Another question: wouldn't the index creation impact the insertion and
query speed while the indexes are being created?

Either way, I don't think this is a very easy solution to implement...

Ciprian Craciun.
Re: [GENERAL] Postgres mail list traffic over time
Craig Ringer <[EMAIL PROTECTED]> writes:
> Gregory Stark wrote:
>> Another idea, I wonder if the project has gone more international and
>> therefore has more traffic at odd hours of the day for everyone.
> I wouldn't be at all surprised if that were the case. Alas, it's not
> possible to analyze usefully because so many companies use .com
> addresses instead of addresses under a ccTLD, and because so many people
> use webmail services like Gmail that provide no geographical information
> in the domain.

You can often get a sense of where someone is by noting the timezone of the
Date: header in their messages. That seems to get localized correctly even
by a lot of the big services like gmail.

FWIW, this project has always been pretty diversified geographically; we've
had major contributors in Russia, Japan, and Australia for as far back as I
can remember, not just Europe and the Americas. I think there are more
people now, but I'm not convinced that the distribution has changed much.

regards, tom lane
Re: [GENERAL] Using Postgres to store high volume streams of sensor readings
Thanks for your info! Please see my observations below.

By the way, we are planning to also try Informix (the time series
extension?)... Do you have some other tips about Informix?

Ciprian Craciun.

On Sun, Nov 23, 2008 at 6:06 PM, V S P <[EMAIL PROTECTED]> wrote:
> While most of my experience is with Oracle/Informix, I would also
> recommend:
>
> a) Partitioning on the DB level.
>    Put partitions on separate hard disks, have the system be at least
>    dual core, and attach the disks via a SCSI controller (not IDE) for
>    parallel performance.

This I found out, but for now I'm not able to change the disk layout...

> b) Partitioning on the application level, that is, having the insert code
>    dynamically figure out what DB and what table to go to (this
>    complicates the application for inserts as well as for reports).

We wanted to do this, and in this circumstance the winner for the moment is
BerkeleyDB, as it's super fast for readings. (This is what's currently
called sharding, right?)

> c) Maybe there is a chance to remove the index (if all you are doing is
>    inserts) -- and then recreate it later?

Not possible, as we would like to use the same database (and table) for
both inserts and real-time queries... Otherwise the application would get a
lot more complicated...

> e) I did not see the type of index, but if the values of at least some of
>    the indexed fields repeat a lot, Oracle had what's called a 'bitmap
>    index'. PostgreSQL might have something similar, where that type of
>    index is optimized for the fact that values are the same for the
>    majority of the rows (it becomes much smaller, and therefore quicker
>    to update).

For the moment the index type is the default one (btree), and from the
documentation I didn't see another one matching the current usage.

> f) Review that there are no insert triggers and constraints (either field
>    or foreign) on those tables; if there are, validate why they are there
>    and see if they can be removed -- the application would then need to
>    guarantee correctness.

Nope, no triggers or constraints (other than NOT NULL).

> VSP
This is just as I supposed, but i didn't >> had the Postgres experience to be certain. >> I'll include your conclusion to my report. >> >> Ciprian Craciun. >> >> -- >> Sent via pgsql-general mailing list (pgsql-general@postgresql.org) >> To make changes to your subscription: >> http://www.postgresql.org/mailpref/pgsql-general > -- > V S P > [EMAIL PROTECTED] > > -- > http://www.fastmail.fm - Email service worth paying for. Try it for free -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Using Postgres to store high volume streams of sensor readings
> If you watch the speed, you'll see that the insert speed is the
> same, but the scan speed is worse (from 32k to 200).

As I said, I don't know a lot about these things. But I would like someone
to comment on this (so that maybe I will know something!):

1) I thought the poor insert performance was due to a "locality of access"
in the index creation; hence I thought that since the timestamp is always
increasing, putting it as the first column in the index should give a
better insert speed. But it didn't: why?

2) I thought that, given a query like

   select * from taba
   where clientid=2 and sensor=4
     and timestamp between 'start_t' and 'end_t'

there shouldn't be a huge difference in speed between an index defined as
(timestamp, clientid, sensorid) and another one defined as (clientid,
sensor, timestamp). But I was VERY wrong: it's 1000 times worse. How is it
possible??? It's obvious I don't know how multicolumn indexes work... Can
someone explain?
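One way to actually produce those numbers is to time the query against each
index variant with EXPLAIN ANALYZE; a sketch, reusing the query from point
2 above (the 'start_t'/'end_t' placeholders stand for real timestamps):

    -- create one index variant, run this and note "Total runtime",
    -- then drop it, create the other variant, and run it again
    EXPLAIN ANALYZE
    SELECT * FROM taba
    WHERE clientid = 2 AND sensor = 4
      AND timestamp BETWEEN 'start_t' AND 'end_t';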
Re: [GENERAL] Using Postgres to store high volume streams of sensor readings
On Sun, Nov 23, 2008 at 10:01 AM, Ciprian Dorin Craciun
<[EMAIL PROTECTED]> wrote:
> On Sun, Nov 23, 2008 at 3:28 PM, Stephen Frost <[EMAIL PROTECTED]> wrote:
>> * Ciprian Dorin Craciun ([EMAIL PROTECTED]) wrote:
>>> > Even better might be partitioning on the timestamp. IF all access is
>>> > in a certain timestamp range it's usually a big win, especially
>>> > because he can move to a new table every hour / day / week or whatever
>>> > and merge the old one into a big "old data" table.
>>>
>>> Yes, it would speed the inserts tremendously... I've tested it
>>> and the insert speed is somewhere at 200k->100k.
>>>
>>> But unfortunately the query speed is not good at all because most
>>> queries are for a specific client (and sensor) in a given time
>>> range...
>>
>> Have you set up your partitions correctly (eg, with appropriate CHECK
>> constraints and with constraint_exclusion turned on)? Also, you'd want
>> to keep your indexes on the individual partitions, of course. That
>> should improve query time quite a bit since it should only be hitting
>> the partitions where the data might be.
>>
>> Stephen
>
> Well, now that I've read the previous two emails better, I understand
> what Scott and Stephen are talking about...
>
> So if I understood it correctly: I should build indexes only for certain
> parts of the data (like the previous full hour and so on). But I see a
> problem: wouldn't this lead to a lot of indices being created (24 / day,
> ~150 / week, ...)?

No, not exactly what I'm talking about. I'm talking about pre-creating
partitions that the data will soon go into (let's say a new one every hour)
with indexes in place, and having a trigger that fires on insert to put the
data into the right partition. Once that partition is no longer being
inserted into, and we aren't running a bunch of queries on it, we migrate
it to a historical partition. So, your table looks something like this all
the time:

|**|^^|##|##|##|##|$$$...|

Where:
** is a partition we have created in advance of needing it.
^^ is the partition we are currently writing to.
## are the partitions we're still using in select queries a lot.
$$$... is the old data stuffed into the monolithic history table.

When it's time to switch to writing to the new partition (i.e. **), we make
a new one ahead of that, and the trigger starts writing to what was a **
partition but is now the new ^^, and the ^^ becomes a ##. At the end of the
day / week / whatever, we take all the old ## partitions and move their
data into the $$$ and drop the ## partitions.

Note that we only need to put data into an archive partition to keep from
having hundreds or thousands of partitions. There's a limit of a few
hundred partitions where things start getting slow again due to planner
overhead.
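In SQL, the rotation Scott describes might be sketched like this (the
table, column, and partition names are assumed, following the illustrative
readings schema used earlier in the thread; readings_history stands for the
big archive table):

    -- pre-create the next hour's partition, with its index in place
    CREATE TABLE readings_h14 (
        CHECK (ts >= '2008-11-23 14:00' AND ts < '2008-11-23 15:00')
    ) INHERITS (readings);
    CREATE INDEX readings_h14_idx ON readings_h14 (client, sensor, ts);

    -- routing trigger: send each row to the partition covering its ts
    CREATE OR REPLACE FUNCTION readings_route() RETURNS trigger AS $$
    BEGIN
        IF NEW.ts >= '2008-11-23 14:00' AND NEW.ts < '2008-11-23 15:00' THEN
            INSERT INTO readings_h14 VALUES (NEW.*);
        ELSE
            INSERT INTO readings_history VALUES (NEW.*);
        END IF;
        RETURN NULL;  -- the row has already been stored in a child table
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER readings_insert
        BEFORE INSERT ON readings
        FOR EACH ROW EXECUTE PROCEDURE readings_route();

    -- later, fold a no-longer-queried partition into the history table
    INSERT INTO readings_history SELECT * FROM readings_h14;
    DROP TABLE readings_h14;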
Re: [GENERAL] [Q]updating multiple rows with Different values
On Sun, Nov 23, 2008 at 8:43 AM, V S P <[EMAIL PROTECTED]> wrote:
> Thank you very much, this is exactly what I am looking for.
> The other example provided, 'case when id=1 then 10', will work as well.
>
> Now just one more question: I will not have a lot of values to update
> (less than 1000 at a time) -- but the values for col1 will be text that
> is up to 64K. So I will not be able to construct SQL strings and just
> send them (because it will probably exceed the character limits for SQL
> statements).

There are no character limits for SQL statements in pgsql.
[GENERAL] mail list traffic
Gregory Stark wrote:
> I would be curious to see the average lifespan of threads over time.

I happen to have the mail archives stored in a database, so I've expressed
this in SQL; below are some results for hackers and general, 2007-2008.
count is the number of distinct threads whose oldest message is in the
specified month. A thread is started as soon as a message has an
In-Reply-To field pointing to an existing Message-Id.

Results for pgsql-hackers:

  month  |     avg span     |   median span   | count
---------+------------------+-----------------+-------
 2007-01 | 7 days 10:00:00  | 1 day 04:18:00  |   211
 2007-02 | 7 days 10:00:00  | 1 day 00:23:48  |   186
 2007-03 | 16 days 30:00:00 | 1 day 05:45:37  |   171
 2007-04 | 13 days 26:00:00 | 19:07:00        |   142
 2007-05 | 19 days 30:00:00 | 1 day 04:46:36  |   122
 2007-06 | 15 days 19:00:00 | 23:38:13        |   111
 2007-07 | 19 days 25:00:00 | 21:04:04        |   106
 2007-08 | 13 days 30:00:00 | 20:26:39        |   133
 2007-09 | 21 days 32:00:00 | 1 day 16:43:10  |   121
 2007-10 | 13 days 19:00:00 | 17:23:24        |   148
 2007-11 | 16 days 15:00:00 | 16:23:00        |   140
 2007-12 | 17 days 16:00:00 | 1 day 07:28:05  |    81
 2008-01 | 13 days 12:00:00 | 23:02:33        |   127
 2008-02 | 9 days 11:00:00  | 12:44:28        |   130
 2008-03 | 10 days 14:00:00 | 22:57:18        |   140
 2008-04 | 10 days 14:00:00 | 1 day 00:32:34  |   132
 2008-05 | 13 days 09:00:00 | 1 day 20:57:57  |   113
 2008-06 | 7 days 27:00:00  | 1 day 05:42:46  |   102
 2008-07 | 13 days 26:00:00 | 2 days 07:43:34 |   133
 2008-08 | 9 days 33:00:00  | 1 day 07:47:09  |   121
 2008-09 | 7 days 25:00:00  | 1 day 19:00:50  |   125
 2008-10 | 6 days 14:00:00  | 1 day 10:31:01  |   178

Results for pgsql-general:

  month  |    avg span     | median span | count
---------+-----------------+-------------+-------
 2007-01 | 1 day 25:00:00  | 10:57:11    |   329
 2007-02 | 2 days 28:00:00 | 10:50:38    |   295
 2007-03 | 3 days 08:00:00 | 14:54:08    |   310
 2007-04 | 6 days 18:00:00 | 17:40:55    |   244
 2007-05 | 3 days 22:00:00 | 16:43:54    |   287
 2007-06 | 2 days 13:00:00 | 11:26:46    |   297
 2007-07 | 2 days 19:00:00 | 11:59:40    |   263
 2007-08 | 3 days 14:00:00 | 16:35:16    |   335
 2007-09 | 3 days 14:00:00 | 13:23:09    |   245
 2007-10 | 2 days 16:00:00 | 08:46:09    |   302
 2007-11 | 3 days 07:00:00 | 08:28:06    |   294
 2007-12 | 2 days 31:00:00 | 10:25:14    |   255
 2008-01 | 2 days 14:00:00 | 13:23:12    |   248
 2008-02 | 2 days 14:00:00 | 10:02:16    |   257
 2008-03 | 1 day 25:00:00  | 13:20:06    |   245
 2008-04 | 1 day 30:00:00  | 08:26:06    |   238
 2008-05 | 3 days 22:00:00 | 18:58:27    |   211
 2008-06 | 2 days 24:00:00 | 14:46:02    |   191
 2008-07 | 1 day 29:00:00  | 10:37:17    |   221
 2008-08 | 1 day 22:00:00  | 14:14:45    |   205
 2008-09 | 1 day 24:00:00  | 14:26:26    |   202
 2008-10 | 1 day 19:00:00  | 12:32:56    |   219

"median span" is the median computed with the pl/R median function, applied
to the intervals as a number of seconds and then cast back to intervals for
display. I believe the median is good to mitigate the contribution of
messages with wrong dates and of posters who reply to very old messages.
And the median span appears to differ a lot from the average span.

If people feel like playing with the database to build other queries, feel
free to bug me off-list about it. I can arrange to make a dump available,
or share the scripts to build it yourself from the mailbox archives.

Best regards,
--
Daniel
PostgreSQL-powered mail user agent and storage: http://www.manitou-mail.org
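The span query might be shaped roughly like this -- a sketch assuming a
messages table with a sender_date column and a thread_id assigned per the
In-Reply-To rule described above (the actual schema wasn't posted):

    SELECT to_char(min_d, 'YYYY-MM') AS month,
           avg(max_d - min_d)        AS avg_span,
           count(*)                  AS count
    FROM (
        -- span of each thread: oldest to newest message date
        SELECT thread_id,
               min(sender_date) AS min_d,
               max(sender_date) AS max_d
        FROM messages
        GROUP BY thread_id
    ) AS spans
    GROUP BY 1
    ORDER BY 1;

The median column would additionally need the pl/R median function
mentioned above, applied to extract(epoch from max_d - min_d).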
Re: [GENERAL] Postgres mail list traffic over time
Tom Lane wrote:
> FWIW, this project has always been pretty diversified geographically;
> we've had major contributors in Russia, Japan, and Australia for as far
> back as I can remember, not just Europe and the Americas. I think there
> are more people now, but I'm not convinced that the distribution has
> changed much.

How about getting a new version of the world map showing developers'
locations?

--
Alvaro Herrera                       http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
Re: [GENERAL] Postgres mail list traffic over time
On 23/11/2008 20:58, Alvaro Herrera wrote:
> How about getting a new version of the world map showing developers'
> locations?

Cool! Definitely +1, if we can show contributors to the list generally, not
just developers.

Ray.

--
Raymond O'Donnell, Director of Music, Galway Cathedral, Ireland
[EMAIL PROTECTED]
Galway Cathedral Recitals: http://www.galwaycathedral.org/recitals
Re: [GENERAL] mail list traffic
Daniel Verite wrote:
> Gregory Stark wrote:
>> I would be curious to see the average lifespan of threads over time.
>
> I happen to have the mail archives stored in a database, [...]

When I saw the manitou-mail.org stuff some days ago I was curious -- how
feasible would it be to host our web archives using a database of some
sort, instead of the current mbox-based MHonArc installation we use, which
is so full of problems and limitations?

I wondered about using Oryx some time ago, and got in contact with Abhijit
Menon-Sen to that end, but that never fructified.

--
Alvaro Herrera                       http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
[GENERAL] Running postgresql as a VMware ESx client
Hi all,

I was wondering whether anyone has had any experience running PostgreSQL in
a VM under ESX. VMware provides significant HA/DR opportunities, and we
would like to use it if we can. The database would be on an EMC SAN-hosted
LUN, and the ESX servers would be dual quad-CPU HP DL-380/G5s. At this
stage we would use iSCSI for SAN connectivity, as our testing with MS-SQL
has not indicated that FC is needed.

We are getting a bit of push-back from the external support agency, who
seem more than a little bit nervous about the environment. I would
appreciate any comments.

--
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Glen and Rosanne Eustace
GodZone Internet Services, a division of AGRE Enterprises Ltd.
P.O. Box 8020, Palmerston North, New Zealand 4446.
Ph: +64 6 357 8168, Fax +64 6 357 8165, Mob: +64 21 424 015
http://www.godzone.net.nz
"A Ministry specialising in providing low-cost Internet Services to NZ
Christian Churches, Ministries and Organisations."
Re: [GENERAL] mail list traffic
Alvaro Herrera wrote:
> When I saw the manitou-mail.org stuff some days ago I was curious -- how
> feasible would it be to host our web archives using a database of some
> sort, instead of the current mbox-based MHonArc installation we use,
> which is so full of problems and limitations?
>
> I wondered about using Oryx some time ago, and got in contact with
> Abhijit Menon-Sen to that end, but that never fructified.

We are a DB project, after all, and hosting our own archives might make for
a good example -- eating our own dogfood, so to speak. I would imagine the
biggest problem is just finding someone who wants to take this on and
maintain it.
Re: [GENERAL] Running postgresql as a VMware ESx client
On Sun, Nov 23, 2008 at 5:28 PM, Glen Eustace <[EMAIL PROTECTED]> wrote:
> Hi all,
>
> I was wondering whether anyone has had any experience running PostgreSQL
> in a VM under ESX. VMware provides significant HA/DR opportunities, and
> we would like to use it if we can. The database would be on an EMC
> SAN-hosted LUN, and the ESX servers would be dual quad-CPU HP DL-380/G5s.
> At this stage we would use iSCSI for SAN connectivity, as our testing
> with MS-SQL has not indicated that FC is needed.
>
> We are getting a bit of push-back from the external support agency, who
> seem more than a little bit nervous about the environment. I would
> appreciate any comments.

Generally speaking, virtualization allows you to take a bunch of low-powered
servers and make them live in one big box, saving money on electricity and
management. Generally speaking, database servers are big, powerful boxes
with lots of hard disks and gigs upon gigs of RAM to handle terabytes of
data. Those two things seem at odds to me. When I saw you post that the ESX
servers would be dual quad-core machines, I immediately thought how nice
that would be to run pgsql on, without vmware between it and the hardware.

On to the real issue. PostgreSQL, like any database, requires certain
guarantees that the data it's writing to the disks is actually written when
the OS says it was. With a SAN and vmware, you now have a few extra layers
that could lie to your database. In order to see if things are working
properly, you need to do some semi-destructive testing. Meaning you need to
start up a few hundred clients running read/write transactions, force a
checkpoint, and pull the plug on various parts (or all) of your system and
see if it can come back up. Do not trust UPSes, generators, power
conditioners etc. They can and do ALL fail sometimes. I've seen it happen a
few times myself, when systems that could not possibly lose power lost
power.

If your system can't guarantee data integrity in case of power loss, then I
would push back on it too.

What, exactly, are you looking to gain by running pgsql under vmware on
such hardware?
Re: [GENERAL] Using Postgres to store high volume streams of sensor readings
> Yes, the figures are like this:
> * average number of raw inserts / second (without any optimization
>   or previous aggregation): #clients (~ 100 thousand) * #sensors (~ 10)
>   / 6 seconds = 166 thousand inserts / second...

This is average? 166,000 * 20 bytes per record * 86,400 seconds per day =
~280 GB / day, not counting indices.

What is the time span you want to keep the data from?

Greetings
Marcin
Re: [GENERAL] Running postgresql as a VMware ESx client
> Generally speaking, virtualization allows you to take a bunch of
> low-powered servers and make them live in one big box, saving money on
> electricity and management. Generally speaking, database servers are big,
> powerful boxes with lots of hard disks and gigs upon gigs of RAM to
> handle terabytes of data. Those two things seem at odds to me.

If one is handling databases with terabytes of data and 1000s of
connections, I would agree. We will be looking at 100s of megabytes max and
possibly several hundred connections. A much smaller workload.

> When I saw you post that the ESX servers would be dual quad-core
> machines, I immediately thought how nice that would be to run pgsql on,
> without vmware between it and the hardware.

To pilot this project we were going to run just the single VM on the
server; if there are performance issues we can always rebuild and put the
OS directly on the hardware rather than the ESX hypervisor.

> What, exactly, are you looking to gain by running pgsql under vmware on
> such hardware?

Mobility, in the HA/DR sense. Being able to vMotion the server while it is
running to rectify hardware issues or perform upgrades. Being able to bring
the server up at a DR site automagically using SRM. And yes, we have still
to investigate the whole crash-consistency stuff with SRM. This database is
heavily biased toward reads rather than writes.

--
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Glen and Rosanne Eustace
GodZone Internet Services, a division of AGRE Enterprises Ltd.
P.O. Box 8020, Palmerston North, New Zealand 4446.
Ph: +64 6 357 8168, Fax +64 6 357 8165, Mob: +64 21 424 015
http://www.godzone.net.nz
"A Ministry specialising in providing low-cost Internet Services to NZ
Christian Churches, Ministries and Organisations."
Re: [GENERAL] Running postgresql as a VMware ESx client
On Sun, Nov 23, 2008 at 6:47 PM, Glen Eustace <[EMAIL PROTECTED]> wrote:
>> Generally speaking, virtualization allows you to take a bunch of
>> low-powered servers and make them live in one big box, saving money on
>> electricity and management. Generally speaking, database servers are
>> big, powerful boxes with lots of hard disks and gigs upon gigs of RAM
>> to handle terabytes of data. Those two things seem at odds to me.
>
> If one is handling databases with terabytes of data and 1000s of
> connections, I would agree. We will be looking at 100s of megabytes max
> and possibly several hundred connections. A much smaller workload.

Yeah, you're not really looking at heavy lifting here then. Should be fine.

>> What, exactly, are you looking to gain by running pgsql under vmware on
>> such hardware?
>
> Mobility, in the HA/DR sense. Being able to vMotion the server while it
> is running to rectify hardware issues or perform upgrades. Being able to
> bring the server up at a DR site automagically using SRM. And yes, we
> have still to investigate the whole crash-consistency stuff with SRM.
> This database is heavily biased toward reads rather than writes.

Given the light load you're looking at running, you'll likely be OK as long
as there aren't any issues with crash recovery.
Re: [GENERAL] Using Postgres to store high volume streams of sensor readings
On Mon, Nov 24, 2008 at 3:42 AM, marcin mank <[EMAIL PROTECTED]> wrote:
>> Yes, the figures are like this:
>> * average number of raw inserts / second (without any optimization
>>   or previous aggregation): #clients (~ 100 thousand) * #sensors (~ 10)
>>   / 6 seconds = 166 thousand inserts / second...
>
> This is average? 166,000 * 20 bytes per record * 86,400 seconds per day
> = ~280 GB / day, not counting indices.
>
> What is the time span you want to keep the data from?
>
> Greetings
> Marcin

Well, I'm not sure about the archival period... Maybe a day, maybe a
week... For the moment I'm just struggling with the insert speed. (We could
also use sharding -- horizontal partitioning on different machines -- and
this would reduce the load...)

Ciprian.
Re: [GENERAL] Returning schema name with table name
Hi,

my standard query (adapted to a 1 MB threshold) is:

select t.spcname as "tablespace"
     , pg_get_userbyid(c.relowner) as "owner"
     , n.nspname as "schema"
     , relname::text as "name"
     , pg_size_pretty(pg_total_relation_size(c.oid))::text as "total size"
     , case when c.relkind='i' then 'index'
            when c.relkind='t' then 'toast'
            when c.relkind='r' then 'table'
            when c.relkind='v' then 'view'
            when c.relkind='c' then 'composite type'
            when c.relkind='S' then 'sequence'
            else c.relkind::text
       end as "type"
  from pg_class c
  left join pg_namespace n on n.oid = c.relnamespace
  left join pg_tablespace t on t.oid = c.reltablespace
 where (pg_total_relation_size(c.oid)>>20) > 0
   and c.relkind != 't'
 order by c.relkind desc, pg_total_relation_size(c.oid) desc

Andrus wrote:
> SELECT oid, relname::char(35) as Table_Name,
>   pg_size_pretty(pg_total_relation_size(oid))::VARCHAR(15) as
>   Total_Table_Size
> FROM pg_class
> WHERE pg_total_relation_size(oid)/(1024*1024) > 0
> ORDER BY pg_total_relation_size(oid) desc
>
> returns table names with size greater than 1 MB.
>
> How to modify this so that the schema name is also returned? I have lots
> of tables with the same name, and thus this output is difficult to
> understand. pg_class seems not to contain schema names.
>
> Andrus.

--
Thomas Markus
proventis GmbH | Zimmerstr. 79-81 | D-10117 Berlin |
Tel +49 (0)30 2936399-22 | Fax -50 | [EMAIL PROTECTED]
-
Geschäftsführer: Norman Frischmuth | Sitz: Berlin
Handelsregister: AG Berlin-Charlottenburg, HR 82917
-
Blue Ant - webbasiertes Projektmanagement - aktuelle Termine 2008:
http://www.proventis.net/website/live/blueant/veranstaltungen.html
[GENERAL] PgAgent Job Scheduler is NOT running
Dear all,

I installed PgAgent, started its service, and successfully scheduled a
backup, which ran with 100% success. Now the same job is not working; even
after reinstalling PgAgent, I still get no result.

Regards,
Abdul Rehman.