> > This has been proposed before, and always rejected. While you're
> > always welcome to provide a patch, I'm very doubtful it would be
> > accepted into the main product.
>
> The example given in this thread certainly isn't going to change anybody's mind.
> "Hi, I propose reducing everybody'
> > Hmm. pqStrerror is defined in libpgport (which is linked into the
> > backend) as well as libpq. ISTM that libpq should not be linked with
> > -Wl,-bI:../../../src/backend/postgres.imp, since it's not intended to
> > be loaded into the backend. Without having looked at the code, I'm
> Now I've asked for the quickest path to detailed
> understanding of the pg IO subsystem. The goal being to get
> more up to speed on its coding details. Certainly not to
> annoy you or anyone else.
Basically pg does random 8k (compile time blocksize) reads/writes only.
Bitmap and sequentia
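The fixed-block-size access pattern described here can be sketched as a toy in Python (purely illustrative, not PostgreSQL's smgr code; `BLCKSZ` stands in for the compile-time block size, which defaults to 8k):

```python
import tempfile

BLCKSZ = 8192  # stand-in for the compile-time block size

def read_block(f, blockno):
    """Fetch one fixed-size block by block number."""
    f.seek(blockno * BLCKSZ)
    return f.read(BLCKSZ)

def write_block(f, blockno, data):
    """Write one block; all I/O happens in BLCKSZ units."""
    assert len(data) == BLCKSZ
    f.seek(blockno * BLCKSZ)
    f.write(data)

with tempfile.TemporaryFile() as f:
    write_block(f, 0, b"a" * BLCKSZ)
    write_block(f, 2, b"c" * BLCKSZ)      # random write: block 1 skipped
    assert read_block(f, 2) == b"c" * BLCKSZ
```

Sequential scans are then just reads of consecutive block numbers, while index probes land on arbitrary ones.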
> > Is it reasonable to cancel and restart the vacuum process periodically
> > (say every 12 hours) until it manages to complete the work? It takes
> > about 2 hours to do the table scan, and should get in about 10 hours
> > of index work each round.
If we started the vacuum with the indexes,
> Another possibility is optimizing for the special case of
> indexing on a partitioning key. In this case, index values
> would be very localized to one table, so just storing the
> table info on each index page (or something similar) would work well.
If you have the partitioning key in the i
> In my original example, a sequential scan of the 1TB of 2KB
> or 4KB records, => 250M or 500M records of data, being sorted
> on a binary value key will take ~1000x more time than reading
> in the ~1GB Btree I described that used a Key+RID (plus node
> pointers) representation of the data.
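The arithmetic behind the ~1000x claim checks out directly from the sizes given (this ignores the random-vs-sequential access difference, which only widens the gap):

```python
TB = 1024**4
GB = 1024**3

data_bytes = 1 * TB      # full 1TB table scanned and sorted
index_bytes = 1 * GB     # ~1GB Key+RID btree representation

ratio = data_bytes / index_bytes
print(ratio)  # 1024.0 -- roughly the "~1000x" in the text

# Record counts from the same figures:
records_2k = data_bytes // 2048   # ~537M, the "500M records" case
records_4k = data_bytes // 4096   # ~268M, the "250M records" case
```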
> I think the main problem with switching to visual studio
> project files is maintainability. (It's not easy to get all
I think the target should be a way to auto-create those files with gmake
(maybe with mingw for configure).
The format of VS6 project and workspace files is pretty simple.
It
> > The bgwriter could "update" all columns of dead heap tuples in heap
> > pages to NULL and thus also gain free space without the need to touch
> > the indexes.
> > The slot would stay used but it would need less space.
>
> Not unless it's running a transaction (consider TOAST updates).
Ok,
> > My wild guess is that deleting all index pointers for a removed index
> > is more-or-less the same cost as creating new ones for
> > inserted/updated page.
>
> Only if you are willing to make the removal process
> recalculate the index keys from looking at the deleted tuple.
The bgwriter
> I ran a wal_buffer test series. It appears that increasing the
> wal_buffers is indeed very important for OLTP applications,
> potentially resulting in as much as a 15% average increase in
> transaction processing.
> What's interesting is that this is not just true for 8.1, it's true for
>>> The point here is that fsync-off is only realistic for development or
>>> playpen installations. You don't turn it off in a production
>>> machine, and I can't see that you'd turn off the full-page-write
>>> option either. So we have not solved anyone's performance problem.
>
>> Yes, thi
>> Only workable solution would imho be to write the LSN to each 512 byte
>> block (not that I am advocating that idea).
"Only workable" was a stupid formulation; I meant a solution that works with an LSN.
> We're not doing anything like that, as it would create an
> impossible space-managemen
> Here's an idea:
>
> We read the page that we would have backed up, calc the CRC and
> write a short WAL record with just the CRC, not the block. When
> we recover we re-read the database page, calc its CRC and
> compare it with the CRC from the transaction log. If they
> differ, we know tha
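The proposed CRC-instead-of-full-page-image scheme can be sketched as follows (a simplified illustration assuming a plain CRC over the whole 8k page; the actual WAL record layout and CRC details are not specified here):

```python
import zlib

BLCKSZ = 8192

def wal_crc_record(page: bytes) -> int:
    """At write time: log only the page's CRC instead of the full block."""
    return zlib.crc32(page)

def page_is_torn(page: bytes, logged_crc: int) -> bool:
    """At recovery: re-read the page, recompute, and compare.
    A mismatch indicates a torn (partially written) page."""
    return zlib.crc32(page) != logged_crc

page = bytes(BLCKSZ)
crc = wal_crc_record(page)
assert not page_is_torn(page, crc)

# Simulate a torn write: one 512-byte sector garbled.
torn = b"\xff" * 512 + page[512:]
assert page_is_torn(torn, crc)
```

This detects the torn page but, unlike a full-page image, cannot repair it, which is the trade-off under discussion.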
>> Are you sure about that? That would probably be the normal case, but
>> are you promised that the hardware will write all of the sectors of a
>> block in order?
>
> I don't think you can possibly assume that. If the block
> crosses a cylinder boundary then it's certainly an unsafe
> assum
> Escape processing would proceed as before, but the semantics would change to
> allow the use of different characters as the escape character, in addition
> to the special characters for delimiter and newline.
If you mean syntax to specify escape and delimiter (and newline ?),
that is a great a
> The problem I see with this proposal is that the buffer manager knows
> how to handle only equally-sized pages. And the shared memory stuff
> gets sized according to size * num_pages. So what happens if a certain
> tablespace A with pagesize=X gets to have a lot of its pages cached,
> evicti
> I have not researched any deeper, but currently it fails with
>
> [EMAIL PROTECTED] postgresql-8.0.3]$
> CC=/opt/ibmcmp/vac/7.0/bin/xlc ./configure
> ..A..
> checking for int timezone... yes
> checking types of arguments for accept()... configure: error:
> could not determine argument types
T
> But to get the estimated cost ratio to match up with the actual cost
> ratio, we'd have to raise random_page_cost to nearly 70, which is a bit
> hard to credit. What was the platform being tested here?
Why? Numbers for modern single disks are 1-2 MB/s for 8k random I/O and
50-120 MB/s sequential.
An
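Working through those disk figures shows how a ratio near 70 falls out (the specific values chosen within the quoted ranges are my assumption):

```python
BLCKSZ = 8192

random_mb_s = 1.0   # low end of the 1-2 MB/s figure for 8k random reads
seq_mb_s = 70.0     # within the 50-120 MB/s sequential range

# Pages per second under each access pattern:
random_pages = random_mb_s * 1024**2 / BLCKSZ
seq_pages = seq_mb_s * 1024**2 / BLCKSZ

# Relative cost of a random page fetch vs. a sequential one.
# The page size cancels, so this is just the throughput ratio.
ratio = seq_pages / random_pages
print(ratio)  # 70.0 -- about the random_page_cost the measured costs imply
```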
> >Incrementing random_page_cost from 4 (the default) to 5 causes the
> >planner to make a better decision.
>
> We have such a low default random_page_cost primarily to mask other
> problems in the optimizer, two of which are
>
> . multi-column index correlation
>
> . interpolation between min_
> > If we did not define
> > it that way, I think your example would have to error out --- how
> > would you choose which INSTEAD rule wins?
>
> The documentation says that they evaluate in alphabetical order by
> name. So I would expect that the first one to have its WHERE statement
> evaluate
> > It would keep the old table around while building the new, then grab
> > an exclusive lock to swap the two.
>
> Lock upgrading is right out.
It would need a whole new family of "intent" locks, with different rules.
Andreas
---(end of broadcast)
> What that means is that neither the HAVING clause nor the targetlist
> can use any ungrouped columns except within aggregate calls; that is,
>
> select col from tab having 2>1
Informix:
select tabname from systables having 2 > 1;
294: The column (tabname) must be in the GROUP BY list.
> >Are you verifying that all the data that was committed was actually stored?
> >Or
> >just verifying that the database works properly after rebooting?
>
> I verified the data.
Does pg startup increase the xid by some amount (say 1000 xids) after a crash?
Else I think you would also need to rol
> >> One point that I no longer recall the reasoning behind is that xlog.c
> >> doesn't think O_SYNC is a preferable default over fsync.
> >
> >For larger (>8k) transactions O_SYNC|O_DIRECT is only good with the recent
> >pending patch to group WAL writes together. The fsync method gives the OS
> One point that I no longer recall the reasoning behind is that xlog.c
> doesn't think O_SYNC is a preferable default over fsync.
For larger (>8k) transactions O_SYNC|O_DIRECT is only good with the recent
pending patch to group WAL writes together. The fsync method gives the OS a
chance to do
> Would there be any value in incrementing by 2 for index accesses and 1
> for seq-scans/vacuums? Actually, it should probably be a ratio based on
> random_page_cost shouldn't it?
What happens with very small hot tables that are only a few pages and thus have
no index defined?
I think it woul
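The suggestion above can be sketched as a clock-sweep cache whose usage-count increment depends on how the page was touched (the weights, names, and cap here are hypothetical illustrations, not PostgreSQL's actual buffer-manager policy):

```python
INDEX_WEIGHT = 2   # hypothetical: index accesses count for more
SCAN_WEIGHT = 1    # hypothetical: seq scans / vacuum count for less
MAX_USAGE = 5      # cap so a hot page cannot pin itself forever

class Buffer:
    def __init__(self, page):
        self.page = page
        self.usage = 0

def touch(buf, weight):
    """Record an access, weighted by access type."""
    buf.usage = min(MAX_USAGE, buf.usage + weight)

def evict(buffers, hand=0):
    """Advance the clock hand, decrementing usage counts,
    until a buffer with usage 0 is found; return its index."""
    while True:
        buf = buffers[hand]
        if buf.usage == 0:
            return hand
        buf.usage -= 1
        hand = (hand + 1) % len(buffers)

bufs = [Buffer("a"), Buffer("b")]
touch(bufs[0], INDEX_WEIGHT)   # page "a" was hit via an index
assert evict(bufs) == 1        # page "b" is the eviction victim
```

Making the weight a function of random_page_cost, as the mail suggests, would just replace the constants above with values derived from that setting.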
> And the user maintenance of updating those hints for every release of
> PostgreSQL as we improve the database engine.
I don't think so. Basically an optimizer hint simply raises or lowers the cost
of an index, mandates a certain join order, allows or disallows a seq scan ...
Imho it is not so
> I asked the question how do you get a record without going through an
> index, the answer was CTID, which unfortunately changes when the row is
> updated.
The ctid is a physical location of the row. On update a new tuple is written
in a new location, which is why the ctid changes. The old tuple
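A toy model of this non-overwriting update behavior (purely illustrative; a real ctid is a (block, offset) pair, not a list index):

```python
# Append-only heap: an update writes a *new* tuple version at a new
# position, so the new version gets a new ctid; the old stays in place.
heap = []

def heap_insert(row):
    heap.append(row)
    return len(heap) - 1          # "ctid": physical position in the heap

def heap_update(ctid, new_row):
    return heap_insert(new_row)   # new version, new ctid

old = heap_insert({"id": 1, "v": "a"})
new = heap_update(old, {"id": 1, "v": "b"})
assert new != old                 # ctid changed on update
assert heap[old]["v"] == "a"      # old version still physically present
```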
> > There's a very recent paper at
> > http://carmen.cs.uiuc.edu/~zchen9/paper/TPDS-final.ps on an alternative
> > to ARC which claims superior performance ...
>
> From a quick glance, this doesn't look applicable. The authors are
> discussing buffer replacement strategies for a multi-level cac
> >> FYI, IBM has applied for a patent on ARC (AFAICS the patent application
> >> is still pending, although the USPTO site is a little hard to grok):
> >
> >>
> http://appft1.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&Sect2=HITOFF&d=PG01&p=1&u=%2Fnetahtml%2FPTO%2Fsrchnum.html&r=1&f=G&l=50&s1=%2220
> If we don't start where we left off, I am thinking if you do a lot of
> writes then do nothing, the next checkpoint would be huge because a lot
> of the LRU will be dirty because the bgwriter never got to it.
I think the problem is that we don't see whether a "read hot"
page is also "write hot
> In general, I agree with Tom: I haven't seen many programs that use
> extended SELECT FOR UPDATE logic. However, the ones I have seen have
> been batch style programs written using a whole-table cursor - these
> latter ones have been designed for the cursor stability approach.
I think if we add
> Hmmm, I've not seen this. For example, with people who are having trouble
> with checkpoint spikes on Linux, I've taken to recommending that they call
> sync() (via cron) every 5-10 seconds (thanks, Bruce, for the suggestion!).
> Believe it or not, this does help smooth out the spikes and give
> > Only if you redefine the meaning of bgwriter_percent. At present it's
> > defined by reference to the total number of dirty pages, and that can't
> > be known without collecting them all.
> >
> > If it were, say, a percentage of the total length of the T1/T2 lists,
> > then we'd have some ch
> The two alternative algorithms are similar, but have these
> differences:
> The former (option (2)) finds a constant number of dirty pages, though
> has varying search time.
This has the disadvantage of converging toward 0 dirty pages.
A system that has less than maxpages dirty will write eve
> > > and stops early when either maxpages is reached or bgwriter_percent
> > > pages are scanned ?
> >
> > Only if you redefine the meaning of bgwriter_percent. At present it's
> > defined by reference to the total number of dirty pages, and that can't
> > be known without collecting them all.
> (2) Remove bgwriter_percent. I have yet to hear anyone argue that
> there's an actual need for bgwriter_percent in tuning
> bgwriter behavior,
One argument for it is to avoid writing very hot pages.
> (3) Change the meaning of bgwriter_percent, per Simon's proposal. Make
> it mean "the perce
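The "stop early when either maxpages is reached or bgwriter_percent pages are scanned" variant under discussion can be sketched like this (a simplification: `buffers` is just a list of dirty flags ordered from the LRU end):

```python
def bgwriter_scan(buffers, maxpages, percent):
    """Scan from the LRU end, writing dirty pages, until either
    maxpages dirty pages have been written or `percent` of the
    pool has been scanned -- whichever comes first."""
    limit = len(buffers) * percent // 100
    written = 0
    for scanned, dirty in enumerate(buffers, start=1):
        if dirty:
            written += 1          # "write" the dirty page
            if written >= maxpages:
                break             # constant-work bound hit
        if scanned >= limit:
            break                 # scan bound hit
    return written

# Many dirty pages: maxpages caps the work.
assert bgwriter_scan([True] * 10 + [False] * 90, maxpages=5, percent=100) == 5
# Tight percent: the scan bound stops us first.
assert bgwriter_scan([True] * 100, maxpages=1000, percent=5) == 5
```

This makes concrete why the meaning of bgwriter_percent changes: it bounds pages *scanned*, not the fraction of all *dirty* pages, so it no longer requires collecting the full dirty-page count up front.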
>> However, one thing we can do is to try this in Makefile.aix:
>> # AIX needs threads for everything that uses libpq
>> LIBS += $(PTHREAD_LIBS)
>> That is going to enable thread libs for all linking including the
>> backend, but it might work.
> That is certainly wrong. The correct th
> OK, so does someone want to suggest why a library used to link libpq
> would also be needed to link binaries that use libpq? And with no cc_r
> it seems I have no idea how to get this working.
AIX does not pull in any libraries that a shared lib depends upon, not even
libc.
You only specify t
> I am going to discard these emails. We haven't solved the Win32 terminal
> server problem and I think it needs to be moved to the TODO list instead.
Yes, please do that. I do not think there is a problem on TS other than some
missing permissions. The patch was only intended to avoid starting 2
>> I think I recall that lseek may have a negative effect on some OS's
>> readahead calculations (probably only systems that cannot handle an
>> lseek to the next page either)? Do you think we should cache the
>> last value to avoid the syscall ?
>
> We really can't, since the point of doing it
>> This is not true in my case, since I only "update statistics"/analyze
>> when the tables have representative content (i.e. not empty).
>
> I'm unsure why you feel you need a knob to defeat this. The only time
> when the plan would change from what you think of as the hand-tuned
> case is when
>> One possibility: vacuum already knows how many tuples it removed. We
>> could set reltuples equal to, say, the mean of the number-of-tuples-
>> after-vacuuming and the number-of-tuples-before. In a steady state
>> situation this would represent a fairly reasonable choice. In cases
>> where t
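As a worked example of that proposal (just the averaging rule as stated, nothing more):

```python
def reltuples_estimate(tuples_before, tuples_after):
    """Mean of the tuple counts before and after vacuuming,
    per the proposal: a reasonable steady-state estimate."""
    return (tuples_before + tuples_after) / 2

# A table vacuumed down from 1.2M to 1.0M live tuples:
print(reltuples_estimate(1_200_000, 1_000_000))  # 1100000.0
```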
> rel->pages = RelationGetNumberOfBlocks(relation);
Is RelationGetNumberOfBlocks cheap enough that you can easily use it for the
optimizer?
I myself have always preferred more stable estimates that only change
when told to. I never liked that vacuum (without analyze) and create index
chan
>> I am running of postgresql database servers with generally 30-50 users
>> at a time per server. I have noticed one thing for web based databases
>> that they fail to initialise a pg_connection connection every now and
>> again and return no error message at all.
I am thinking of the PG_SOM
>> Some other time maybe. Meanwhile, this patch ought to make it compile
>> more cleanly on Windows - not sure why I get errors there but not
>> Linux.
>
> Because getopt() is normally declared in unistd.h, not getopt.h (Windows
> being an exception?).
getopt is not in any standard Windows head
>> I like Kevin Brown's suggestion of writing out a temporary .txt file and
>> 'executing' it. It will follow the principle of least surprise for Windows
>> users.
>
> Note that the current default behavior (assuming you've not set EDITOR)
> is "vi foo.txt" which is quite unlikely to be helpful to a
>> So I thought I'd try a few scenarios, since I have it installed:
>> [ none of which work ]
>> So to answer your question, at least in part, there is no current good
>> behavior to emulate. At least on this version of CVS:
> I think this is fairly graphic proof that (1) a straight port without
> We could maybe compromise with (again, for Windows only) a policy like
> "double-quote unless the value already contains at least one
> double-quote". This should work conveniently for everybody except the
> user who wants EDITOR='gvim.exe -y' on Windows; he'll have to add
> some useless-lookin
>> The EDITOR variable seems to have a fairly standard meaning on Unix
>> systems. I've been using that EDITOR value for years without problems,
>> only when I use psql's \e once in a while it breaks. I don't think we
>> should deviate from what seems to be a standard practice.
>
> Agreed, no
> How are you planning to represent the association between MIN/MAX and
> particular index orderings in the system catalogs?
Don't we already have that info to decide whether an index handles
an "ORDER BY" without a sort node ?
Andreas
> Just one question about the actual implementation of the patch - why are
> you setting the OS version *before* you call GetVersionEx()?
The Microsoft example did a memset on the structure before calling GetVersionEx().
Setting it to a version that needs the Global\ is only a safeguard aga
> > > o fix shared memory on Win2k terminal server
> > >
> > > We might be able to just mark this as not supported.
The shmem code works in a terminal server session with or without the patch.
Magnus had a different problem, probably permissions. Since I do not have a
non admin user (on a TS
> It makes no difference on any of my systems, so at least it doesn't
> completely solve the problem. I haven't heard any confirmation on whether
> it partially solves it.
It certainly does not solve any part of your problem. I think your problem
is a permissions problem.
It does however make
> >>> 300 secs (!) fs timeout is really broken.
> >>> Looks more like a locking or network timeout issue.
> >>> What error codes does unlink(3) return?
> >>
> >
> >> success.
> >
> >
> > Oops! 5min timeout for success is certainly problematic.
> >
> >
>
> You misunderstood. The 300 secs is not in
> >>o fix shared memory on Win2k terminal server
> >>
> >>We might be able to just mark this as not supported.
> >
> >I have attached a patch that I think fixes this. The problem I saw
>and fixed is that the shmem created in a terminal services client is not
> >visible to the console
> o fix shared memory on Win2k terminal server
>
> We might be able to just mark this as not supported.
I have attached a patch that I think fixes this. The problem I saw
and fixed is that the shmem created in a terminal services client is not
visible to the console (or services.m
> > create index people_male_gay_ix on people (city) where gender = 'male' and
> > orientation = 'gay';
>
> You've forgotten part of my premise (based on a real case I discussed on IRC)
> that there are EIGHTEEN criteria columns.
That is why I said maybe :-) Whether it helps depends on the numb
> > The most nearly comparable thing is be the notion of "partial
> > indexes," where, supposing you had 60 region codes (e.g. - 50 US
> > states, 10 Canadian provinces), you might set up indices thus:
> For example, imagine you have a table on a dating website with 18 columns
> representing 18
> > If they are using tablespaces is it OK that anyone can see their
> > location?
>
> Good point. Should we obscure pg_tablespace similarly to
> what we do for pg_shadow?
Hmm, I cannot see how a person with file access would fail to easily find the
file for a specific table even without pg_tablespac
> Well, the question is how long must the individual databases retain
> state with which to answer "recover" requests. I don't like "forever",
> so I'm proposing that there should be an explicit command to say "you
> can forget about this gid".
I think it would be ok to forget the gid after: