Re: [HACKERS] [PERFORM] Bad n_distinct estimation; hacks suggested?

2005-04-24 Thread Tom Lane
Josh Berkus writes:
> Tom, how does our heuristic sampling work? Is it pure random sampling, or
> page sampling?

Manfred probably remembers better than I do, but I think the idea is to approximate pure random sampling as best we can without actually examining every page of the table.
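For illustration, a minimal two-stage sketch of that idea in Python (pick random pages first, then random rows within them); this is only a toy picture of block-level sampling, not the actual ANALYZE implementation:

    import random

    def sample_rows(pages, target):
        """Approximate a uniform random sample of rows without reading
        every page: visit pages in random order and take one random row
        from each until the target sample size is reached.
        `pages` is a list of lists of rows (a toy stand-in for heap pages)."""
        sample = []
        page_ids = list(range(len(pages)))
        random.shuffle(page_ids)
        for pid in page_ids:
            if len(sample) >= target:
                break
            if pages[pid]:                       # skip empty pages
                sample.append(random.choice(pages[pid]))
        return sample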

Re: [PERFORM] Disk filling, CPU filling, renegade inserts and deletes?

2005-04-24 Thread Richard Plotkin
Hi Tom,

Thanks! That's exactly what it was. There was a discrepancy in the data that turned this into an endless loop. Everything has been running smoothly since I made a change.

Thanks so much,
Richard

On Apr 23, 2005, at 12:50 PM, Tom Lane wrote:
> Richard Plotkin <[EMAIL PROTECTED]> writes:

Re: [PERFORM] Sort and index

2005-04-24 Thread Jim C. Nasby
On Sat, Apr 23, 2005 at 01:00:40AM -0400, Tom Lane wrote:
> "Jim C. Nasby" <[EMAIL PROTECTED]> writes:
> >> Feel free to propose better cost equations.
> >
> > Where would I look in code to see what's used now?
>
> All the gold is hidden in src/backend/optimizer/path/costsize.c.
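For a rough sense of the kind of equations that file contains, here is a deliberately simplified Python sketch of a sequential-scan estimate (the authoritative terms and parameter names are the ones in src/backend/optimizer/path/costsize.c; the numbers below are just illustrative defaults):

    def seq_scan_cost(pages, tuples,
                      page_cost=1.0,         # assumed cost of one sequential page read
                      cpu_tuple_cost=0.01):  # assumed cost of processing one tuple
        """Simplified shape of a sequential-scan cost estimate:
        I/O cost proportional to pages, CPU cost proportional to tuples."""
        return pages * page_cost + tuples * cpu_tuple_cost

    # Example: a 10,000-page table holding 1,000,000 tuples
    # seq_scan_cost(10000, 1000000) == 20000.0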

Re: [HACKERS] [PERFORM] Bad n_distinct estimation; hacks suggested?

2005-04-24 Thread Josh Berkus
Folks,

> I wonder if this paper has anything that might help:
> http://www.stat.washington.edu/www/research/reports/1999/tr355.ps - if I
> were more of a statistician I might be able to answer :-)

Actually, that paper looks *really* promising. Does anyone here have enough math to solve for D(s
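For anyone who wants to experiment, here is one distinct-value estimator from that literature, the Haas & Stokes "Duj1" estimator, as a short Python sketch (it may or may not be the estimator derived in the paper above):

    from collections import Counter

    def estimate_distinct(sample_values, total_rows):
        """Haas & Stokes "Duj1" estimate of the number of distinct values
        in a table of total_rows rows, from a random sample of its values:
        D = n*d / (n - f1 + f1*n/N), where d is the number of distinct
        values in the sample and f1 the number seen exactly once."""
        n = len(sample_values)
        counts = Counter(sample_values)
        d = len(counts)
        f1 = sum(1 for c in counts.values() if c == 1)
        return (n * d) / (n - f1 + f1 * n / total_rows)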

Re: [HACKERS] [PERFORM] Bad n_distinct estimation; hacks suggested?

2005-04-24 Thread Josh Berkus
Andrew,

> The math in the paper does not seem to look at very low levels of q (=
> sample to pop ratio).

Yes, I think that's the failing. Mind you, I did more testing and found out that for D/N ratios of 0.1 to 0.3, the formula only works within 5x accuracy (which I would consider acceptable)
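This sort of accuracy check is easy to reproduce with a small simulation: build a population with a known distinct count, sample a fraction q of it, run an estimator, and compare. A sketch (the estimator is the Haas & Stokes Duj1 formula shown in the previous sketch, which may differ from the paper's):

    import random
    from collections import Counter

    def duj1(sample, total_rows):
        n, counts = len(sample), Counter(sample)
        d = len(counts)
        f1 = sum(1 for c in counts.values() if c == 1)
        return (n * d) / (n - f1 + f1 * n / total_rows)

    def error_ratio(pop_size=100000, d_over_n=0.2, q=0.01):
        """Generate a population whose D/N ratio is roughly d_over_n,
        sample a fraction q of it, and report estimate / true distinct
        (1.0 would be a perfect estimate)."""
        population = [random.randrange(int(pop_size * d_over_n))
                      for _ in range(pop_size)]
        true_d = len(set(population))
        sample = random.sample(population, int(pop_size * q))
        return duj1(sample, pop_size) / true_d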

Re: [HACKERS] [PERFORM] Bad n_distinct estimation; hacks suggested?

2005-04-24 Thread Marko Ristola
Here is my opinion. I hope this helps. Maybe there is no single good formula:

- For the boolean type, there are at most 3 distinct values.
- There is an upper bound on forenames in one country.
- There is an upper bound on last names in one country.
- There is a fixed number of states and postal codes in one country.
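One way to act on that kind of domain knowledge is to treat it as a cap applied on top of whatever statistical estimate is produced. A purely illustrative sketch, with hypothetical column names and cap values:

    # Hypothetical caps derived from domain knowledge; the numbers are
    # illustrative, not authoritative.
    DISTINCT_CAPS = {
        "is_active":   3,       # true, false, NULL
        "us_state":    51,      # 50 states plus DC, say
        "postal_code": 100000,  # five-digit codes
    }

    def capped_estimate(column, raw_estimate):
        """Clamp a statistical n_distinct estimate to a known upper bound
        for the column's domain, when one is available."""
        cap = DISTINCT_CAPS.get(column)
        return min(raw_estimate, cap) if cap is not None else raw_estimate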

Re: [ODBC] [PERFORM] Joel's Performance Issues WAS : Opteron vs Xeon

2005-04-24 Thread Marko Ristola
Here is how you can receive all one billion rows in pieces of 2048 rows. This changes PostgreSQL and ODBC behaviour.

Change the ODBC data source configuration in the following way:

Fetch = 2048
UseDeclareFetch = 1

It does not create core dumps on 32-bit computers with billions of rows! This is a b
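On the client side, the result can then be consumed batch by batch. A sketch of what that might look like with pyodbc and a hypothetical data source named "pgbig" (the Fetch and UseDeclareFetch settings themselves live in the ODBC data source, as described above):

    import pyodbc

    # Assumes an ODBC data source "pgbig" configured with Fetch = 2048 and
    # UseDeclareFetch = 1, so the driver declares a cursor and pulls rows
    # from the server in 2048-row batches instead of materializing the
    # whole result set in client memory.
    conn = pyodbc.connect("DSN=pgbig;UID=myuser;PWD=mypassword")
    cur = conn.cursor()
    cur.execute("SELECT * FROM huge_table")

    while True:
        rows = cur.fetchmany(2048)   # fetch in pieces matching the Fetch setting
        if not rows:
            break
        for row in rows:
            pass  # process each row here

    conn.close()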