On Tue, Dec 1, 2015 at 3:06 PM, Kyotaro HORIGUCHI <horiguchi.kyot...@lab.ntt.co.jp> wrote:
> At Tue, 1 Dec 2015 11:53:35 +0900, Michael Paquier
> <michael.paqu...@gmail.com> wrote in
> <cab7npqsmenek7nqmwgsiyltsbrznjkx80tbx3qf6cqss49s...@mail.gmail.com>
>> On Tue, Dec 1, 2015 at 11:11 AM, Kyotaro HORIGUCHI
>> <horiguchi.kyot...@lab.ntt.co.jp> wrote:
>> > Hello, I studied your latest patch.
>>
>> Thanks!
>>
>> > I feel quite uncomfortable that it solves the problem from a kind
>> > of nature of unlogged object by arbitrary flagging which does not
>> > fully correspond to the nature. If we can deduce the necessity
>> > of fsync from some nature, it would be preferable.
>>
>> INIT_FORKNUM is not something only related to unlogged relations,
>> indexes use them as well. And that's actually
>> If you look at for example BRIN indexes that do not sync immediately
>> their INIT_FORKNUM when index is created, I think that we still are
>> going to need a new flag to control the sync at WAL replay because
>> startup process cannot know a relation's persistence, thing that we
>> can know when the XLOG_FPI record is created. For BRIN indexes, we
>> want particularly to not sync the INIT_FORKNUM when the relation is
>> not an unlogged one.
>
> (The comment added in brinbuildempty looks wrong since it
> actually doesn't fsync it immediately..)
>
> Hmm, I've already seen that, and having your explanation I wonder
> why brinbuildempty issues WAL for what is not necessary to be
> persistent at the moment. Isn't it breaking agreements about
> Write Ahead Log? INIT_FORKNUM and unconditionally fsync'ing would
> be equally tied excluding the anomaly about WAL. (Except for
> succeeding newpages.)

Alvaro, your thoughts regarding those lines? When building an empty
INIT_FORKNUM for a brin index its data is saved into a shared buffer
and not immediately synced to disk. Shouldn't that be necessary for
at least unlogged relations?

>> > In short, it seems to me that the reason to choose using
>> > XLOG_FPI_FOR_SYNC here is only performance of processing
>> > successive FPIs for INIT_FORKNUM.
>>
>> Yeah, there is a one-way link between this WAL record and INIT_FORKNUM.
>> However please note that having an INIT_FORKNUM does not always imply
>> that a sync is wanted. copy_relation_data is an example of that.
>
> As I wrote above, I suppose we should fix(?) the irregular
> relationship between WAL and init fork of brin and so.

Yep.

>> > INIT_FORKNUM is generated only for unlogged tables and their
>> > belongings. I suppose such successive fsyncs don't cause an
>> > observable performance drop assuming that the number of unlogged
>> > tables and belongings is not so high, especially with smarter
>> > storages. All we should do is that just fsync only for
>> > INIT_FORKNUM's FPIs for the case. If the performance does matter
>> > even so, we still can fsync the last md-file when any wal record
>> > other than FPI for INIT_FORK comes. (But this would be a bit
>> > complex..)
>>
>> Hm. If a system uses a bunch of temporary relations with brin index or
>> other included I would not say so. For back branches we may have to do
>> it unconditionally using INIT_FORKNUM, but having a control flag to
>> have it only done for unlogged relations would leverage that.
>
> It could, and should do so. And if we take such systems with
> bunch of temp relations as significant (I agree with this),
> XLogRegisterBlock() looks to be able to register multiple blocks
> into single wal record and we could eliminate arbitrary flagging
> on individual FPI records using it. Is it possible?

I thought about using a BKPBLOCK flag but all of them are already
taken if that's what you meant.
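
To illustrate what a record-level flag means at replay, here is a toy
model in Python. It is only a sketch and not PostgreSQL code: Block,
Record, ToyStorage, replay() and the "sync" field are all made-up names.
The only thing taken from this thread is the idea that the decision to
sync is made when the record is built, where the relation's persistence
is known, and that replay simply obeys it.

```python
from dataclasses import dataclass

INIT_FORKNUM = "init"  # toy stand-in, not the real PostgreSQL constant


@dataclass(frozen=True)
class Block:
    relation: str
    fork: str
    blkno: int


@dataclass
class Record:
    blocks: list        # several blocks may be registered in one record
    sync: bool = False  # hypothetical record-level "fsync at replay" flag


class ToyStorage:
    """Remembers restored pages and every fsync request it receives."""

    def __init__(self):
        self.pages = {}
        self.fsyncs = []

    def write(self, block):
        self.pages[(block.relation, block.fork, block.blkno)] = "restored"

    def fsync(self, relation, fork):
        self.fsyncs.append((relation, fork))


def replay(records, storage):
    """Restore every block; fsync each touched file once per flagged record."""
    for rec in records:
        touched = []
        for blk in rec.blocks:
            storage.write(blk)
            key = (blk.relation, blk.fork)
            if key not in touched:
                touched.append(key)
        if rec.sync:
            for relation, fork in touched:
                storage.fsync(relation, fork)


records = [
    # init fork of an unlogged relation: flagged when the record is built
    Record([Block("unlogged_idx", INIT_FORKNUM, 0),
            Block("unlogged_idx", INIT_FORKNUM, 1)], sync=True),
    # init fork written for a relation that is not unlogged: no flag
    Record([Block("permanent_brin_idx", INIT_FORKNUM, 0)]),
]
storage = ToyStorage()
replay(records, storage)
print(storage.fsyncs)  # [('unlogged_idx', 'init')]
```

In this model the flag is carried once per record, so two blocks in the
same record cost one fsync, and the unflagged record costs none.
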
It seems cheaper to do that at the record level...
--
Michael