Re: [GENERAL] Re: 4 billion record limit?
Mitch Vincent wrote:
>
> There is something else that many aren't considering. In every application
> I've ever written to use any database I use ID numbers of my own making,
> always they're integer. 4 billion is the limit on any integer field, not
> just the OID so there are limitations everyone should realize when using any
> integer for any kind of record identification purposes..
>

That's an excellent point, especially considering that *sequences* use
an integer to hold their max_value, which is by default 2,147,483,647.
You cannot go larger than that, either.  I guess it's constrained to be
positive.  So OIDs give you more potential unique values than sequences,
far as I can tell.
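A quick way to see that ceiling for yourself, in case it helps -- the
sequence names below are made up, and this assumes the int4-backed
sequences of the sources being discussed here, so results may differ on
other versions:

create sequence demo_seq;
select max_value from demo_seq;               -- shows 2147483647
create sequence big_seq maxvalue 2147483648;  -- I'd expect this to be refused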
Re: [GENERAL] 4 billion record limit?
Thomas Lockhart wrote:
>
> > FWIW, I checked into MySQL, and as far as I can tell, they have nothing
> > like this implicit 4 billion transactional "limit". So maybe competitive
> > spirit will drive the postgres hackers to fix this problem sooner than
> > later. ;-)
>
> We have *never* had a report of someone pushing this 4GB limit, and
> theoretical problems usually go into the long-term development plan, not
> in the "OHMYGODITSBROKEN" list.

That's absolutely true, and I'm sorry for pushing your MySQL button.  I'm
not in some sort of panic to get this fixed.  I just raised this issue to
(1) see if I was the only one who noticed/cared and (2) learn some details
to see if/what limits were really there.

> Tom Lane wrote:
>
> Hmph.  Has anyone ever come close to shoving 4G rows into MySQL?  Maybe
> they just haven't documented their limits...

The gauntlet has been dropped!  Looks like I have my work cut out for me,
this weekend.  :-)

Again, just to clarify, I'm not concerned about a database holding 4
billion records at once.  I'm concerned about performing 4,000,000,000
inserts and 3,999,999,000 deletes over the course of a few years.  I just
ran the numbers and that's about 50 transactions/second on an
international web site active 24 hours/day for 3 years.  Put 5 such sites
on one server and you're down to 10 trans/sec per site.  Ramp up to 30
trans/sec/site and you're down to 1 year.  Juggle these numbers however
you want; it's not that far-fetched.

Sure, it has a lot to do with the application.  Most databases are read
much more than they're written, and of course a SELECT doesn't chew up an
OID.  But every INSERT does.  Any application that updates information
frequently can be vulnerable to this problem.  With my table structure, it
happens to be much easier to DELETE and INSERT inside a transaction than
to UPDATE.  And every "row" of customer-supplied data could turn into 100
INSERTs, so I'm accelerating toward that 4G limit 100 times faster.

Bottom line: I can recycle my own IDs if I have to.  But I need to know
whether I should change my table structure and/or code to conserve OIDs,
an internal system identifier particular to the RDBMS I chose.
Considering the main reason I chose Postgres was for scalability, this
seems a little ironic.

--
Paul Caskey		[EMAIL PROTECTED]
Software Engineer
New Mexico Software
5041 Indian School NE
Albuquerque, NM  87110

--
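For what it's worth, a back-of-the-envelope check of that rate, using only
the figures quoted above (nothing measured):

select 4000000000.0 / (3 * 365 * 24 * 3600);
-- a bit over 40 sustained inserts per second for three years straight,
-- i.e. the same ballpark as the "about 50 transactions/second" above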
Re: [GENERAL] Re: 4 billion record limit?
On Fri, Jul 28, 2000 at 11:48:10AM -0500, Keith G. Murphy wrote:
> Mitch Vincent wrote:
> >
> > There is something else that many aren't considering. In every application
> > I've ever written to use any database I use ID numbers of my own making,
> > always they're integer. 4 billion is the limit on any integer field, not
> > just the OID so there are limitations everyone should realize when using any
> > integer for any kind of record identification purposes..
> >
> That's an excellent point, especially considering that *sequences* use
> an integer to hold their max_value, which is by default 2,147,483,647.
> You cannot go larger than that, either. I guess it's constrained to be
> positive. So OIDs give you more potential unique values than sequences,
> far as I can tell.

However, in each case, you've got a per-table (per-field, really) limit,
not a per-database one.  Not to mention that there are workarounds: two
int fields form a nice 64-bit compound key, without any need for a 64-bit
int custom type.  Admittedly cumbersome to use, but standard SQL.  The
sequence is a bit more of a problem, but since it's non-standard SQL
anyway, writing your own sequence that uses a numeric counter gives you
potentially infinite serials.

Ross

--
Ross J. Reedstrom, Ph.D., <[EMAIL PROTECTED]>
NSBRI Research Scientist/Programmer
Computer and Information Technology Institute
Rice University, 6100 S. Main St., Houston, TX 77005
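For what it's worth, a rough sketch of both ideas -- all table and column
names here are mine, and the homemade sequence glosses over the locking
you would need for concurrent callers:

-- two int4 columns acting together as one 64-bit key
create table widgets (
    id_hi  int4 not null,   -- high half of the identifier
    id_lo  int4 not null,   -- low half; bump id_hi when this wraps
    name   text,
    primary key (id_hi, id_lo)
);

-- a homemade "sequence" backed by a numeric counter, for more headroom
create table my_counters (name text primary key, last_value numeric);
insert into my_counters values ('widget_id', 0);
update my_counters set last_value = last_value + 1 where name = 'widget_id';
select last_value from my_counters where name = 'widget_id';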
[GENERAL] PG vs. Oracle for larger databases
How suitable is PG for doing larger databases?  The need I am considering
would be a financial database that does maybe up to 100k transactions/day.
Obviously, it needs to be very reliable, and have minimal scheduled, and
no unscheduled downtime.  Should this project be on Oracle or Postgres?

thanks,
e
Re: [GENERAL] performance help
Ernie <[EMAIL PROTECTED]> writes:
> This query is very fast.
>
> cw=# SELECT distinct n.news_id, headline, link, to_char(created, 'mm-dd-hh24:mi'),
> cw-# created FROM news_article_summary n, news_cat nc WHERE n.news_id = nc.news_id AND
> cw-# created > CURRENT_TIMESTAMP-30 AND nc.code_id
> cw-# in(14,227,326,321,327,4301,5179,5100,585,5175,567,5185,568,5381,5109,554,5621,5462,
> cw(# 597,5324,5117,569,5142,570,5327,571,5167,5481,5145) ORDER BY created desc;
>
> Unique  (cost=60322.14..60559.66 rows=1900 width=48)
>   ->  Sort  (cost=60322.14..60322.14 rows=19001 width=48)
>         ->  Nested Loop  (cost=0.00..58651.80 rows=19001 width=48)
>               ->  Seq Scan on news_article_summary n  (cost=0.00..416.14 rows=1898 width=36)
>               ->  Index Scan using news_cat_news_id on news_cat nc  (cost=0.00..30.53 rows=10 width=12)
>
> And here is the query I'm having problems with (slow)
>
> cw=# SELECT distinct n.news_id, headline, link,
> cw-# to_char(created, 'mm-dd-hh24:mi'),
> cw-# created FROM news_article_summary n, news_cat nc
> cw-# WHERE n.news_id = nc.news_id AND created > CURRENT_TIMESTAMP-30 AND
> cw-# nc.code_id in(4261,4182) ORDER BY created desc;
>
> Unique  (cost=35162.58..35181.27 rows=150 width=48)
>   ->  Sort  (cost=35162.58..35162.58 rows=1496 width=48)
>         ->  Nested Loop  (cost=0.00..35083.71 rows=1496 width=48)
>               ->  Index Scan using news_cat_code_id, news_cat_code_id on news_cat nc  (cost=0.00..55.31 rows=79 width=12)
>               ->  Seq Scan on news_article_summary n  (cost=0.00..416.14 rows=1898 width=36)

The difference evidently is which table is scanned as the outside of the
nested loop.  The first plan says "scan news_article_summary sequentially,
and for each row that passes the WHERE clauses that mention only that
table, probe into news_cat_news_id for the row(s) that match by news_id;
then check the remaining WHERE clauses on the combined row(s)."  The
second plan says "probe into news_cat_code_id for the row(s) that have the
requested code_id values, and for each one scan news_article_summary
sequentially to locate matching rows."

If there are a lot of matches for nc.code_id in(4261,4182) then the second
plan would scan news_article_summary many times, which'd account for it
being slow.

Are news_id and code_id unique columns?  I am guessing from context that
at least news_id might be.  The planner does not seem to know that,
judging from its row-count guesses.  Perhaps all you need is a VACUUM
ANALYZE so that the planner becomes aware the column is unique.

			regards, tom lane
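Concretely, that statistics refresh is just the following, using the table
names from the query above; afterwards, rerun the slow query under EXPLAIN
and see whether the row estimates (and the table chosen for the outer side
of the nested loop) change:

vacuum analyze news_cat;
vacuum analyze news_article_summary;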
[GENERAL] Database Corruption
Looks like one of my tables got corrupted.  Can someone explain how to
recover from this??  Trying to drop the table is not working... Postgres
hangs.

Any help is appreciated.

Arthur
Re: [GENERAL] PG vs. Oracle for larger databases
> How suitable is PG for doing larger databases?  The need I am
> considering would be a financial database that does maybe up to 100k
> transactions/day.

In a day?  I think a lot of us do that much in an hour.

> Obviously, it needs to be very reliable, and have
> minimal scheduled, and no unscheduled downtime.  Should this project
> be on Oracle or Postgres?

The only "unscheduled" downtime we've had with PostgreSQL was when one of
our programmers ran the disk out of space, which will give any database
server fits.  ; )

steve
[GENERAL] Thanks and questions...
I'm new to postgres, but I've quickly become a big fan.  Thank you for
such a great project and I hope in the future to be able to contribute to
the effort.  I'm a newbie to the list and as such, I haven't had much of a
chance to lurk.  I hope my explanation isn't too long and my questions
aren't too forward.

I am currently working on a project where I need to create a program that
will create a table, add rows to that table, and update another table with
info about the newly created table.  For those who may know, the
information is generated by something that is very much like cflowd.
Essentially, every hour (or whatever you specify), a compressed,
pipe-delimited flat file is written to disk with information about network
usage.  This is happening on a beefy computer - ultra sparc, 4x400MHz
processors, 1G of RAM.  My goal is to import these files into the
database.  During the busiest times of the day, I expect there will be
100,000 records/hr in a month or so.

The two issues are epoch time and performance.  I've read through the
mailing list archives and found a similar thread: "Importing data w/ Unix
timestamp."  In a previous project, I had a flat file, I imported it into
a temporary table, and then moved it to its real table to add timestamp
info.  I did this using Perl DBI.  While the performance on that was
great, it would be unacceptable for 100,000 inserts an hour.

What approach would be the most efficient way to accomplish this goal?
With what language or tools would you recommend?  If I were to leave the
time as an int4, epoch time, what would the select look like to return
other time formats?

Thanks for your help!!

-g2

Oh... a couple of off-topic questions.  Where can I read more about
libpgeasy?  (Bruce: thanks for writing this and your book!)  Does anyone
know of any web sites that would have more examples of postgresql tasks,
projects, etc.?
Re: [GENERAL] Thanks and questions...
George Robinson II <[EMAIL PROTECTED]> writes:
> What approach would be the most efficient way to accomplish this goal?
> With what language or tools would you recommend? If I were to leave the
> time as a int4, epoch time, what would the select look like to return
> other time formats?

Presently the easiest way to get from Unix time to a stored timestamp
datum is to coerce to abstime first.

regression=# create table foo (f1 timestamp);
CREATE

-- this doesn't work:

regression=# insert into foo values(964824656);
ERROR:  Attribute 'f1' is of type 'timestamp' but expression is of type 'int4'
	You will need to rewrite or cast the expression

-- but this does:

regression=# insert into foo values(abstime(964824656));
INSERT 308042 1

regression=# select * from foo;
           f1
------------------------
 2000-07-28 18:50:56-04
(1 row)

I don't think this'd work in the context of a COPY command, unfortunately,
but it works fine in an INSERT.

			regards, tom lane
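One possible way around the COPY limitation, if it matters: load the raw
epoch seconds into a staging table and convert in a single
INSERT ... SELECT.  The table, column, and file names below are invented,
and the DELIMITERS spelling is the old-style one, so adjust as needed:

-- staging table holds the epoch seconds as plain int4
create table flow_raw (stamp int4, bytes int4);
copy flow_raw from '/tmp/flows.txt' using delimiters '|';

-- real table stores a timestamp; abstime() converts the whole batch
create table flow (stamp timestamp, bytes int4);
insert into flow select abstime(stamp), bytes from flow_raw;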
[GENERAL] Hopefully simple date conversion question
I have a legacy database that I am porting to PostgreSQL.  One of the
fields is an integer column that actually represents a date.  It is
represented as the number of days since July 1, 1867.

What I am wondering is: Is there a function or other means of getting this
integer converted into a real date field?

I hope the answer isn't too obvious.  I did do some looking into this, but
I'm fairly new to Postgres.

Thanks in advance for the help.

Jason
Re: [GENERAL] Thanks and questions...
On Fri, Jul 28, 2000 at 06:53:41PM -0400, Tom Lane wrote:
> George Robinson II <[EMAIL PROTECTED]> writes:
> > What approach would be the most efficient way to accomplish this goal?
> > With what language or tools would you recommend? If I were to leave the
> > time as a int4, epoch time, what would the select look like to return
> > other time formats?
>
> Presently the easiest way to get from Unix time to a stored timestamp
> datum is to coerce to abstime first.
>
> regression=# insert into foo values(abstime(964824656));
>
> I don't think this'd work in the context of a COPY command,
> unfortunately, but it works fine in an INSERT.
>
> 			regards, tom lane

Would it maybe be easier to store it in the table as int4, then coerce it
on the way out?  That way you could use COPY.
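If you do keep the column as int4, the conversion on the way out would be
a sketch along these lines (column and table names are made up; they match
the hypothetical staging table sketched earlier in the thread):

select abstime(stamp) as created, bytes from flow_raw;
-- to_char() can then be layered on top for other display formats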
Re: [GENERAL] Hopefully simple date conversion question
"Jason C. Pion" <[EMAIL PROTECTED]> writes: > I have a legacy database that I am porting to PostgreSQL. One of the > fields is an integer column that actually represents a date. It is > represented as the number of days since July 1, 1867. > What I am wondering is: Is there a function or other means of getting this > integer converted into a real date field? There might be an easier way, but the first thought that comes to mind is * convert the value to Unix timestamp notation (subtract off enough days to make 0 correspond to 1/1/1970, then multiply by 24*60*60). * coerce the resulting integer to abstime and then to whatever datatype you want to use (timestamp or date, likely). This will not work if you have any dates before about 1900, however, since the Unix timestamp value would underflow. regards, tom lane
RE: [GENERAL] Re: 4 billion record limit?
> > That's an excellent point, especially considering that *sequences* use
> > an integer to hold their max_value, which is by default 2,147,483,647.
> > You cannot go larger than that, either. I guess it's constrained to be
> > positive. So OIDs give you more potential unique values than sequences,
> > far as I can tell.

What about postgres' int8 type, which holds up to +/- 9 billion billion or
something?

- Andrew
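A minimal demonstration, with made-up names; on older sources an int8
literal may need to be written as a quoted string, hence the quotes below:

create table big_ids (id int8, note text);
insert into big_ids values ('5000000000', 'already past the 4 billion int4 ceiling');
select id from big_ids;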
[GENERAL] Hmm ... shouldn't path_distance be MIN distance not MAX distance?
I just noticed that path_distance() in geo_ops.c (the "<->" operator for
the path datatype) claims to be computing the minimum distance between any
two line segments of the two paths, but actually it's computing the
maximum such distance.  Isn't this broken?

			regards, tom lane
[GENERAL] Manual pages added to book
I have added the PostgreSQL manual pages to the appendix of my book.  I
will be adding an index once the publisher is done proofreading it.  All
reports I get are that it looks good.

Addison-Wesley will be printing this book directly from a LaTeX-generated
Postscript file that I provide.  I have also sent a PDF with crop marks to
the publisher for them to test.

I continue to get suggestions from readers, and I appreciate it.  These
comments have made the book writing process much more pleasurable.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  [EMAIL PROTECTED]                    |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026