On Wed, 15 Jan 2014 21:37:16 -0500 Robert Haas <robertmh...@gmail.com> wrote:
> On Wed, Jan 15, 2014 at 8:41 PM, Jan Kara <j...@suse.cz> wrote:
> > On Wed 15-01-14 10:12:38, Robert Haas wrote:
> >> On Wed, Jan 15, 2014 at 4:35 AM, Jan Kara <j...@suse.cz> wrote:
> >> > Filesystems could in theory provide facility like atomic write (at
> >> > least up to a certain size say in MB range) but it's not so easy and
> >> > when there are no strong usecases fs people are reluctant to make
> >> > their code more complex unnecessarily. OTOH without widespread atomic
> >> > write support I understand application developers have similar
> >> > stance. So it's kind of chicken and egg problem. BTW, e.g. ext3/4 has
> >> > quite a bit of the infrastructure in place due to its data=journal
> >> > mode so if someone on the PostgreSQL side wanted to research on this,
> >> > knitting some experimental ext4 patches should be doable.
> >>
> >> Atomic 8kB writes would improve performance for us quite a lot. Full
> >> page writes to WAL are very expensive. I don't remember what
> >> percentage of write-ahead log traffic that accounts for, but it's not
> >> small.
> > OK, and do you need atomic writes on per-IO basis or per-file is enough?
> > It basically boils down to - is all or most of IO to a file going to be
> > atomic or it's a smaller fraction?
>
> The write-ahead log wouldn't need it, but data files writes would. So
> we'd need it a lot, but not for absolutely everything.
>
> For any given file, we'd either care about writes being atomic, or we
> wouldn't.
>

Just getting caught up on this thread. One thing that you're just now
getting to here is that the different types of files in the DB have
different needs. It might be good to outline each type of file (WAL,
data files, tmp files), what sort of I/O patterns are typically done to
them, and what sort of "special needs" they have (atomicity or
whatever). Then we could treat each file type as a separate problem,
which may make some of these problems easier to solve.
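
To make the idea concrete, here is a rough sketch of what such an
outline might look like, written in Python purely for illustration.
The class, field, and function names are made up for this example and
are not anything PostgreSQL or the kernel defines. The entries reflect
only what has come up in this thread; anything nobody has stated yet is
left as unknown rather than guessed.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Pattern(Enum):
    """Coarse I/O pattern of a file type (hypothetical categories)."""
    SEQUENTIAL = "sequential"
    RANDOM = "random"
    UNKNOWN = "unknown"


@dataclass(frozen=True)
class FileClass:
    """One row of the outline: a file type and what it needs."""
    name: str
    pattern: Pattern
    # True/False where the thread states it; None where nobody has said.
    needs_atomic_writes: Optional[bool]
    notes: str = ""


# Filled in from the discussion only; unknowns are deliberately left open.
FILE_CLASSES = [
    FileClass("wal", Pattern.SEQUENTIAL, False,
              "typically fairly sequential I/O"),
    FileClass("data", Pattern.RANDOM, True,
              "full page writes to WAL are expensive; atomic 8kB writes would help"),
    FileClass("temp", Pattern.UNKNOWN, None,
              "possible candidate for tmpfs"),
]


def needing_atomic_writes(classes: List[FileClass]) -> List[str]:
    """Names of file types known to care about atomic writes."""
    return [c.name for c in classes if c.needs_atomic_writes is True]


def unresolved(classes: List[FileClass]) -> List[str]:
    """Names of file types where some property is still unknown."""
    return [c.name for c in classes
            if c.needs_atomic_writes is None or c.pattern is Pattern.UNKNOWN]


if __name__ == "__main__":
    print("atomic writes wanted for:", needing_atomic_writes(FILE_CLASSES))
    print("still to be characterized:", unresolved(FILE_CLASSES))
```

Keeping the unknowns explicit is the point of the exercise: each row
that still has a gap is a question for the PostgreSQL folks, and each
completed row can be discussed as its own problem.
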

For instance, typically a WAL would be fairly sequential I/O, whereas
the data files are almost certainly random. It may make sense to
consider DIO for some of these use-cases, even if it's not suitable
everywhere.

For tempfiles, it may make sense to consider housing those on tmpfs.
They wouldn't go to disk at all that way, but if there is mem pressure
they could get swapped out (maybe this is standard practice already --
I don't know).

> > As Dave notes, unless there is HW support (which is coming with newest
> > solid state drives), ext4/xfs will have to implement this by writing data
> > to a filesystem journal and after transaction commit checkpointing them to
> > a final location. Which is exactly what you do with your WAL logs so
> > it's not clear it will be a performance win. But it is easy enough to code
> > for ext4 that I'm willing to try...
>
> Yeah, hardware support would be great.
>

--
Jeff Layton <jlay...@redhat.com>

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers