On Wed 15-01-14 21:37:16, Robert Haas wrote:
> On Wed, Jan 15, 2014 at 8:41 PM, Jan Kara <j...@suse.cz> wrote:
> > On Wed 15-01-14 10:12:38, Robert Haas wrote:
> >> On Wed, Jan 15, 2014 at 4:35 AM, Jan Kara <j...@suse.cz> wrote:
> >> > Filesystems could in theory provide facility like atomic write (at
> >> > least up to a certain size say in MB range) but it's not so easy and
> >> > when there are no strong usecases fs people are reluctant to make
> >> > their code more complex unnecessarily. OTOH without widespread atomic
> >> > write support I understand application developers have similar
> >> > stance. So it's kind of chicken and egg problem. BTW, e.g. ext3/4 has
> >> > quite a bit of the infrastructure in place due to its data=journal
> >> > mode so if someone on the PostgreSQL side wanted to research on this,
> >> > knitting some experimental ext4 patches should be doable.
> >>
> >> Atomic 8kB writes would improve performance for us quite a lot. Full
> >> page writes to WAL are very expensive. I don't remember what
> >> percentage of write-ahead log traffic that accounts for, but it's not
> >> small.
> > OK, and do you need atomic writes on per-IO basis or per-file is enough?
> > It basically boils down to - is all or most of IO to a file going to be
> > atomic or it's a smaller fraction?
>
> The write-ahead log wouldn't need it, but data files writes would. So
> we'd need it a lot, but not for absolutely everything.
>
> For any given file, we'd either care about writes being atomic, or we
> wouldn't.

OK, when you say that either all writes to a file should be atomic or none
of them should be, then can you try the following:

  chattr +j <file>
will turn on data journalling for <file> on an ext3/ext4 filesystem.
Currently it *won't* guarantee atomicity in all cases, but the performance
will be very similar to what it would be if it did. You might also want to
increase the filesystem journal size with 'tune2fs -J size=XXX /dev/yyy',
where XXX is the desired journal size in MB. The default is 128 MB I think,
but with intensive data journalling you might want to have it in the GB
range. I'd be interested in hearing what impact turning on 'atomic write'
support in PostgreSQL and using data journalling on ext4 has.

								Honza
-- 
Jan Kara <j...@suse.cz>
SUSE Labs, CR