> Probably off-topic, but I think it's worth seeing what astronomers are
> doing with their very big spatial databases. For example, we are working
> with a catalog of more than 500,000,000 rows, and we use a special
> transformation of coordinates to integer numbers that preserves object
> closeness. I hope we can show that PostgreSQL is good enough to be used
> in astronomy for very big catalogs; currently, MS SQL is in use.
> See http://www.sdss.jhu.edu/htm/ for details. We use another technique.
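Neat. For list readers who haven't seen this kind of scheme, the core trick is
just mapping two sky coordinates onto one integer key so that nearby objects
get nearby keys. Here is a crude toy sketch, not HTM and not the technique
Oleg describes, with made-up table and column names:

    -- Toy illustration only: bucket (RA, Dec) in degrees onto a coarse
    -- 0.1-degree grid and pack the two cell numbers into one bigint, so
    -- objects that are close on the sky usually land in the same or a
    -- neighboring cell.
    CREATE OR REPLACE FUNCTION grid_cell(ra_deg double precision,
                                         dec_deg double precision)
    RETURNS bigint AS $$
        SELECT floor(($2 + 90.0) * 10)::bigint * 3600
             + floor($1 * 10)::bigint;
    $$ LANGUAGE sql IMMUTABLE;

    -- A functional index on the cell id then lets a small sky region be
    -- fetched by scanning only a handful of cell values:
    -- CREATE INDEX objects_cell_idx ON objects (grid_cell(ra, dec));

Real schemes (HTM and friends) are hierarchical and far smarter about the
sphere, but the "integer key that preserves closeness" idea is the same.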
You know, I don't think a lot of people "get" the issues I was describing,
or maybe they don't believe it, I don't know. But I think it would be a
useful contrib project to create an analyze_special('table', 'column',
'method') function that does a better job of calculating the stats for
tables that contain multiple trend waveforms. A separate function would
probably work well, as the trends within the data probably only apply to
specific rows. Interestingly, I don't think it needs to calculate a perfect
representation of the data so much as give the optimizer a better clue to
its nature. When I get the time (or can get someone to pay me to do it),
I'm going to try it.

>
>       Oleg
>
> On Wed, 9 Feb 2005 [EMAIL PROTECTED] wrote:
>
>> I wrote a message called "One Big trend vs multiple smaller trends in
>> table statistics" that, I think, explains what we've been seeing.
>>
>>> [EMAIL PROTECTED] wrote:
>>>>
>>>> In this case, the behavior observed could be changed by altering the
>>>> sample size for a table. I submit that an arbitrary fixed sample size
>>>> is not a good base for the analyzer, but that the sample size should
>>>> be based on the size of the table or some calculation of its
>>>> deviation.
>>>>
>>> Mark,
>>>
>>> Do you have any evidence that the sample size had anything to do
>>> with the performance problem you're seeing?
>>
>> Sample size is only a bandaid for the issue; however, more samples
>> always provide more information.
>>
>>> I also do a lot with the complete Census/TIGER database.
>>>
>>> Every problem I have with the optimizer comes down to the fact that
>>> the data is loaded (and ordered on disk) by State/County FIPS codes,
>>> and then queried by zip code or by city name. Like this:
>>>
>>>   Alabama   36101   [hundreds of pages with zips in 36***]
>>>   Alaska    99686   [hundreds of pages with zips in 9****]
>>>   Arizona   85701   [hundreds of pages with zips in 855**]
>>>
>>> Note that the zip codes are *NOT* sequential.
>>
>> Again, read "One Big Trend..." and let me know what you think. I think
>> it describes exactly the problem that we see.
>>
>> For now, the solution that works for me is to seriously up the value of
>> "targrows" in analyze.c. It makes ANALYZE take longer, and while the
>> stats are still not "correct" (they are not designed to detect these
>> sorts of patterns), a larger sample allows them to be "less wrong"
>> enough to give a better hint to the planner.
>
> Regards,
>       Oleg
> _____________________________________________________________
> Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
> Sternberg Astronomical Institute, Moscow University (Russia)
> Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
> phone: +007(095)939-16-83, +007(095)939-23-83
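Two follow-ups below the quote.

First, the Census/TIGER effect described above is visible in the planner's
own statistics: the column the data was loaded by shows physical correlation
near 1.0, while the zip column, which is ordered within each state but jumps
between states, shows a much weaker value. That is exactly the "multiple
smaller trends" situation. A quick way to look, with table and column names
invented for illustration:

    -- The load-order column (state/county FIPS) shows correlation near 1.0,
    -- while zip, which is only locally ordered, shows a much lower value,
    -- so the planner over-prices index scans on tight zip ranges.
    SELECT attname, n_distinct, correlation
    FROM pg_stats
    WHERE tablename = 'tiger_lines'
      AND attname IN ('state_fips', 'zip');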
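Second, on the analyze_special() idea and the targrows hack: until something
smarter exists, the same lever can be pulled from SQL, since ANALYZE's sample
size scales with the column's statistics target (roughly 300 sampled rows per
target point, if I remember analyze.c correctly). Here is a purely
hypothetical stub with the proposed signature; it does nothing clever, it
just enlarges the sample and re-analyzes, and every name in it is made up:

    -- Hypothetical placeholder only; no analyze_special() exists in contrib.
    -- A real version would plug method-specific estimators into the stats.
    CREATE OR REPLACE FUNCTION analyze_special(tbl text, col text, method text)
    RETURNS void AS $$
    BEGIN
        IF method <> 'bigger_sample' THEN
            RAISE EXCEPTION 'unknown method: %', method;
        END IF;
        -- Raise the per-column statistics target, which also raises the
        -- number of rows ANALYZE samples for this column.
        EXECUTE 'ALTER TABLE ' || quote_ident(tbl)
             || ' ALTER COLUMN ' || quote_ident(col)
             || ' SET STATISTICS 1000';
        EXECUTE 'ANALYZE ' || quote_ident(tbl) || ' (' || quote_ident(col) || ')';
    END;
    $$ LANGUAGE plpgsql;

    -- Usage, with made-up names:
    -- SELECT analyze_special('tiger_lines', 'zip', 'bigger_sample');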