Re: [GENERAL] Re: 4 billion record limit?

2000-07-28 Thread Keith G. Murphy

Mitch Vincent wrote:
>  
> There is something else that many aren't considering. In every application
> I've ever written against any database, I use ID numbers of my own making,
> and they're always integers. 4 billion is the limit on any integer field, not
> just the OID, so there are limitations everyone should realize when using any
> integer for any kind of record identification purposes.
> 
That's an excellent point, especially considering that *sequences* use
an integer to hold their max_value, which is by default 2,147,483,647. 
You cannot go larger than that, either.  I guess it's constrained to be
positive.  So OIDs give you more potential unique values than sequences,
as far as I can tell.
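
A quick way to check that on your own system (throwaway sequence, so the
name doesn't matter; this works here, at least):

CREATE SEQUENCE some_seq;
SELECT max_value FROM some_seq;   -- 2147483647 by default
DROP SEQUENCE some_seq;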



Re: [GENERAL] 4 billion record limit?

2000-07-28 Thread Paul Caskey

Thomas Lockhart wrote:
> 
> > FWIW, I checked into MySQL, and as far as I can tell, they have nothing
> > like this implicit 4 billion transactional "limit".  So maybe competitive
> > spirit will drive the postgres hackers to fix this problem sooner than
> > later.  ;-)
> 
> We have *never* had a report of someone pushing this 4GB limit, and
> theoretical problems usually go into the long-term development plan, not
> in the "OHMYGODITSBROKEN" list.

That's absolutely true, and I'm sorry for pushing your MySQL button.  I'm
not in some sort of panic to get this fixed.  I just raised this issue to
(1) see if I was the only one who noticed/cared and (2) learn some details
to see if/what limits were really there.

> Tom Lane wrote:
>
> Hmph.  Has anyone ever come close to shoving 4G rows into MySQL?  Maybe
> they just haven't documented their limits...

The gauntlet has been thrown down!  Looks like I have my work cut out for me
this weekend.  :-)

Again, just to clarify, I'm not concerned about a database holding 4
billion records at once.  I'm concerned about performing 4,000,000,000
inserts and 3,999,999,000 deletes over the course of a few years.  I just
ran the numbers and that's about 50 transactions/second on an
international web site active 24 hours/day for 3 years.  Put 5 such sites
on one server and you're down to 10 trans/sec per site.  Ramp up to 30
trans/sec/site and you're down to 1 year.  Juggle these numbers however
you want; it's not that far-fetched.

Sure, it has a lot to do with the application.  Most databases are read
much more than they're written, and of course a SELECT doesn't chew up an
OID.  But every INSERT does.  Any application that updates information
frequently can be vulnerable to this problem.  With my table structure, it
happens to be much easier to DELETE and INSERT inside a transaction than
to UPDATE.  And every "row" of customer-supplied data could turn into 100
INSERTs, so I'm accelerating toward that 4G limit 100 times faster.
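
To make that concrete, here's roughly what one customer "row" turns into
on my end (table and column names simplified for illustration):

BEGIN;
DELETE FROM detail WHERE customer_id = 42;
INSERT INTO detail (customer_id, attr, val) VALUES (42, 'field_1', 'new value');
-- ...one of these for each of the ~100 attributes, each eating a new OID...
INSERT INTO detail (customer_id, attr, val) VALUES (42, 'field_100', 'new value');
COMMIT;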

Bottom line: I can recycle my own IDs if I have to.  But I need to know
whether I should change my table structure and/or code to conserve OIDs,
an internal system identifier particular to the RDBMS I chose. 
Considering the main reason I chose Postgres was for scalability, this
seems a little ironic.

-- 
Paul Caskey [EMAIL PROTECTED]   Software Engineer
New Mexico Software 5041 Indian School NE   Albuquerque, NM 87110
--



Re: [GENERAL] Re: 4 billion record limit?

2000-07-28 Thread Ross J. Reedstrom

On Fri, Jul 28, 2000 at 11:48:10AM -0500, Keith G. Murphy wrote:
> Mitch Vincent wrote:
> >  
> > There is something else that many aren't considering. In every application
> > I've ever written against any database, I use ID numbers of my own making,
> > and they're always integers. 4 billion is the limit on any integer field, not
> > just the OID, so there are limitations everyone should realize when using any
> > integer for any kind of record identification purposes.
> > 
> That's an excellent point, especially considering that *sequences* use
> an integer to hold their max_value, which is by default 2,147,483,647. 
> You cannot go larger than that, either.  I guess it's constrained to be
> positive.  So OIDs give you more potential unique values than sequences,
> as far as I can tell.

However, in each case, you've got a per-table (per-field, really) limit,
not a per-database one. Not to mention that there are workarounds: two int
fields form a nice 64-bit compound key, without any need for a 64-bit
int custom type. Admittedly cumbersome to use, but standard SQL. The
sequence is a bit more of a problem, but since it's non-standard SQL
anyway, writing your own sequence that uses a numeric counter gives you
potentially infinite serials.
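
Rough sketches of both ideas (untested, names invented):

-- two int4 columns as a compound key: 2^32 * 2^32 = 2^64 combinations
CREATE TABLE widgets (
    id_hi   int4 NOT NULL,
    id_lo   int4 NOT NULL,
    payload text,
    PRIMARY KEY (id_hi, id_lo)
);

-- a home-grown "sequence" on a numeric counter, with no 2^31 ceiling
-- (do the UPDATE and SELECT in one transaction so writers serialize)
CREATE TABLE my_counter (last_value numeric NOT NULL);
INSERT INTO my_counter VALUES (0);

UPDATE my_counter SET last_value = last_value + 1;
SELECT last_value FROM my_counter;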

Ross
-- 
Ross J. Reedstrom, Ph.D., <[EMAIL PROTECTED]> 
NSBRI Research Scientist/Programmer
Computer and Information Technology Institute
Rice University, 6100 S. Main St.,  Houston, TX 77005



[GENERAL] PG vs. Oracle for larger databases

2000-07-28 Thread Erich


How suitable is PG for doing larger databases?  The need I am
considering would be a financial database that does maybe up to 100k
transactions/day.  Obviously, it needs to be very reliable, and have
minimal scheduled downtime and no unscheduled downtime.  Should this project
be on Oracle or Postgres?

thanks,

e



Re: [GENERAL] performance help

2000-07-28 Thread Tom Lane

Ernie <[EMAIL PROTECTED]> writes:
> This query is very fast.
> 
> cw=# SELECT distinct n.news_id, headline, link, to_char(created, 'mm-dd-hh24:mi'),
> cw-# created FROM news_article_summary n, news_cat nc WHERE n.news_id = nc.news_id AND
> cw-# created > CURRENT_TIMESTAMP-30 AND nc.code_id
> cw-# in(14,227,326,321,327,4301,5179,5100,585,5175,567,5185,568,5381,5109,554,5621,5462,
> cw(# 597,5324,5117,569,5142,570,5327,571,5167,5481,5145) ORDER BY created desc;
> 
> Unique  (cost=60322.14..60559.66 rows=1900 width=48)
>   ->  Sort  (cost=60322.14..60322.14 rows=19001 width=48)
>         ->  Nested Loop  (cost=0.00..58651.80 rows=19001 width=48)
>               ->  Seq Scan on news_article_summary n  (cost=0.00..416.14 rows=1898 width=36)
>               ->  Index Scan using news_cat_news_id on news_cat nc  (cost=0.00..30.53 rows=10 width=12)
> 
> And here is the query I'm having problems with (slow)
> cw=# SELECT distinct n.news_id, headline, link,
> cw-# to_char(created, 'mm-dd- hh24:mi'),
> cw-# created FROM news_article_summary n, news_cat nc
> cw-# WHERE n.news_id = nc.news_id AND created > CURRENT_TIMESTAMP-30 AND
> cw-# nc.code_id in(4261,4182) ORDER BY created desc;
> 
> Unique  (cost=35162.58..35181.27 rows=150 width=48)
>   ->  Sort  (cost=35162.58..35162.58 rows=1496 width=48)
>         ->  Nested Loop  (cost=0.00..35083.71 rows=1496 width=48)
>               ->  Index Scan using news_cat_code_id, news_cat_code_id on news_cat nc  (cost=0.00..55.31 rows=79 width=12)
>               ->  Seq Scan on news_article_summary n  (cost=0.00..416.14 rows=1898 width=36)

The difference evidently is which table is scanned as the outside of the
nested loop.  The first plan says "scan news_article_summary
sequentially, and for each row that passes the WHERE clauses that
mention only that table, probe into news_cat_news_id for the row(s)
that match by news_id; then check the remaining WHERE clauses on the
combined row(s)."  The second plan says "probe into news_cat_code_id
for the row(s) that have the requested code_id values, and for each
one scan news_article_summary sequentially to locate matching rows."

If there are a lot of matches for nc.code_id in(4261,4182) then the
second plan would scan news_article_summary many times, which'd account
for it being slow.

Are news_id and code_id unique columns?  I am guessing from context
that at least news_id might be.  The planner does not seem to know
that, judging from its row-count guesses.  Perhaps all you need is
a VACUUM ANALYZE so that the planner becomes aware the column is
unique.
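
That is, roughly:

VACUUM ANALYZE news_article_summary;
VACUUM ANALYZE news_cat;

-- and, only if news_id really is unique, declaring it wouldn't hurt
-- (index name is arbitrary):
CREATE UNIQUE INDEX news_article_summary_news_id_key
    ON news_article_summary (news_id);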

regards, tom lane



[GENERAL] Database Corruption

2000-07-28 Thread Arthur M. Kang

Looks like one of my tables got corrupted.  Can someone explain how to
recover from this??  Trying to drop the table is not working...Postgres
hangs.

Any help is appreciated.

Arthur




Re: [GENERAL] PG vs. Oracle for larger databases

2000-07-28 Thread Steve Wolfe

> How suitable is PG for doing larger databases?  The need I am
> considering would be a financial database that does maybe up to 100k
> transactions/day.

  In a day?  I think a lot of us do that much in an hour.

> Obviously, it needs to be very reliable, and have
> minimal scheduled, and no unscheduled downtime.  Should this project
> be on Oracle or Postgres?

  The only "unscheduled" downtime we've had with PostgreSQL was when one of
our programmers ran the disk out of space, which will give any database
server fits. ; )

steve





[GENERAL] Thanks and questions...

2000-07-28 Thread George Robinson II

I'm new to postgres, but I've quickly become a big fan.  Thank you for
such a great project and I hope in the future to be able to contribute
to the effort.
I'm a newbie to the list and as such, I haven't had much of a chance
to lurk.  I hope my explanation isn't too long and my questions aren't
too forward.

I am currently working on a project where I need to create a program
that will create a table, add rows to that table, and update another
table with info about the newly created table.  For those who may know,
the information is generated by something that is very much like
cflowd.  Essentially, every hour (or whatever you specify), a
compressed, pipe-delimited flat file is written to disk with information
about network usage.  This is happening on a beefy computer: an
UltraSPARC with 4x400MHz processors and 1GB of RAM.

My goal is to import these files into the database.  During the busiest
times of the day, I expect there will be about 100,000 records/hr within
a month or so.

The two issues are epoch time and performance.  I've read through the
mailing list archives and found a similar thread: "Importing data w/ Unix
timestamp."  In a previous project, I had a flat file that I imported
into a temporary table and then moved to its real table to add
timestamp info.  I did this using Perl DBI.  While the performance of
that was fine for that project, it would be unacceptable for 100,000
inserts an hour.

What approach would be the most efficient way to accomplish this goal?
What language or tools would you recommend?  If I were to leave the
time as an int4 (epoch time), what would the select look like to return
other time formats?

Thanks for your help!!

-g2

Oh... a couple of off-topic questions.  Where can I read more about
libpgeasy?  (Bruce: thanks for writing this and your book!)  Does anyone
know of any web sites that would have more examples of PostgreSQL
tasks, projects, etc.?



Re: [GENERAL] Thanks and questions...

2000-07-28 Thread Tom Lane

George Robinson II <[EMAIL PROTECTED]> writes:
>   What approach would be the most efficient way to accomplish this goal?
> What language or tools would you recommend?  If I were to leave the
> time as an int4 (epoch time), what would the select look like to return
> other time formats?

Presently the easiest way to get from Unix time to a stored timestamp
datum is to coerce to abstime first.

regression=# create table foo (f1 timestamp);
CREATE
-- this doesn't work:
regression=# insert into foo values(964824656);
ERROR:  Attribute 'f1' is of type 'timestamp' but expression is of type 'int4'
You will need to rewrite or cast the expression
-- but this does:
regression=# insert into foo values(abstime(964824656));
INSERT 308042 1
regression=# select * from foo;
           f1
------------------------
 2000-07-28 18:50:56-04
(1 row)

I don't think this'd work in the context of a COPY command,
unfortunately, but it works fine in an INSERT.

regards, tom lane



[GENERAL] Hopefully simple date conversion question

2000-07-28 Thread Jason C. Pion

I have a legacy database that I am porting to PostgreSQL.  One of the
fields is an integer column that actually represents a date.  It is
represented as the number of days since July 1, 1867.

What I am wondering is: Is there a function or other means of getting this
integer converted into a real date field?  I hope the answer isn't too
obvious.  I did do some looking into this, but I'm fairly new to Postgres.

Thanks in advance for the help.

Jason




Re: [GENERAL] Thanks and questions...

2000-07-28 Thread Samuel Sieb

On Fri, Jul 28, 2000 at 06:53:41PM -0400, Tom Lane wrote:
> George Robinson II <[EMAIL PROTECTED]> writes:
> > What approach would be the most efficient way to accomplish this goal?
> > What language or tools would you recommend?  If I were to leave the
> > time as an int4 (epoch time), what would the select look like to return
> > other time formats?
> 
> Presently the easiest way to get from Unix time to a stored timestamp
> datum is to coerce to abstime first.
> 
> regression=# insert into foo values(abstime(964824656));
> 
> I don't think this'd work in the context of a COPY command,
> unfortunately, but it works fine in an INSERT.
> 
>   regards, tom lane

Would it maybe be easier to store it in the table as int4, then coerce it on
the way out?  That way you could use COPY.
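
Something like this, say (untested; the file path and column names are
made up):

CREATE TABLE flows (
    flow_time int4,     -- raw Unix epoch seconds, straight from the flat file
    bytes     int4
);

-- COPY can load the pipe-delimited file as-is, since flow_time is just an int
COPY flows FROM '/tmp/flows.dat' USING DELIMITERS '|';

-- coerce on the way out
SELECT abstime(flow_time) AS flow_timestamp, bytes FROM flows;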




Re: [GENERAL] Hopefully simple date conversion question

2000-07-28 Thread Tom Lane

"Jason C. Pion" <[EMAIL PROTECTED]> writes:
> I have a legacy database that I am porting to PostgreSQL.  One of the
> fields is an integer column that actually represents a date.  It is
> represented as the number of days since July 1, 1867.

> What I am wondering is: Is there a function or other means of getting this
> integer converted into a real date field?

There might be an easier way, but the first thought that comes to mind
is
  * convert the value to Unix timestamp notation (subtract off enough
days to make 0 correspond to 1/1/1970, then multiply by 24*60*60).
  * coerce the resulting integer to abstime and then to whatever
datatype you want to use (timestamp or date, likely).

This will not work if you have any dates before about 1900, however,
since the Unix timestamp value would underflow.
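
In SQL it would look something like this (untested; "legacy" and
"day_count" are stand-ins for your real names, and the offset of 37439
days from 1867-07-01 to 1970-01-01 is worth double-checking against a
few known dates in your data):

-- only safe for dates abstime can represent (roughly 1901..2038)
SELECT abstime((day_count - 37439) * 86400)::date AS real_date   -- or ::timestamp
  FROM legacy;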

regards, tom lane



RE: [GENERAL] Re: 4 billion record limit?

2000-07-28 Thread Andrew Snow


> > That's an excellent point, especially considering that *sequences* use
> > an integer to hold their max_value, which is by default 2,147,483,647.
> > You cannot go larger than that, either.  I guess it's constrained to be
> > positive.  So OIDs give you more potential unique values than sequences,
> > as far as I can tell.

What about Postgres' int8 type, which holds up to +/- 9.2 billion billion
or so?
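
e.g. (sketch):

CREATE TABLE widget (
    id   int8 PRIMARY KEY,   -- range is +/- 9223372036854775807
    name text
);

(You'd still have to generate the id values yourself, though, since a
plain sequence tops out at 2147483647.)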


- Andrew





[GENERAL] Hmm ... shouldn't path_distance be MIN distance not MAX distance?

2000-07-28 Thread Tom Lane

I just noticed that path_distance() in geo_ops.c (the "<->" operator
for path datatype) claims to be computing the minimum distance between
any two line segments of the two paths, but actually it's computing the
maximum such distance.

Isn't this broken?

regards, tom lane



[GENERAL] Manual pages added to book

2000-07-28 Thread Bruce Momjian

I have added the PostgreSQL manual pages to the appendix of my book.  I
will be adding an index once the publisher is done proofreading it.  All
reports I get are that it looks good.

Addison-Wesley will be printing this book directly from a
LaTeX-generated PostScript file that I provide. I have also sent a PDF
with crop marks to the publisher for them to test.

I continue to get suggestions from readers, and I appreciate it.  These
comments have made the book-writing process much more pleasurable.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 853-3000
  +  If your life is a hard drive, |  830 Blythe Avenue
  +  Christ can be your backup.|  Drexel Hill, Pennsylvania 19026