On Sun, Mar 29, 2015 at 11:56 AM, Andres Freund <and...@2ndquadrant.com> wrote:
> I'm not sure whether the above is the best solution however. For one I
> think it's not necessarily a good idea to opencode this in hio.c - I've
> not observed it, but this probably can happen for btrees and such as
> well. For another, this is still a exclusive lock while we're doing IO:
> smgrextend() wants a page to write out, so we have to be careful not to
> overwrite things.
>

I think relaxing a global lock will mostly fix the contention. However, several people have suggested that extending by many pages at a time has other benefits.

This hints at a more fundamental change in our storage model. Currently we map one file per relation. While that is simple and robust, the model needs further thought once we consider partitioned tables and the possibility that columnar storage is later integrated into core. Think about a table with 1000 partitions and 100 columns: that is 100K files, not to speak of the other forks. Surely we can keep pushing the file system's limits or playing around with vfds, but we have a chance now to think ahead.

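To make the file-count arithmetic above concrete, here is a back-of-the-envelope sketch in Python. It is an illustration only, not PostgreSQL code: the function name `relation_files` is hypothetical, it assumes a columnar layout where every (partition, column) pair becomes its own relation, and the three-fork figure assumes each relation carries a main fork, a free space map and a visibility map, with one segment file per fork.

```python
# Rough count of on-disk files under a one-file-per-relation layout.
# Hypothetical helper for illustration; not part of PostgreSQL.

def relation_files(partitions: int, columns: int,
                   forks: int = 1, segments: int = 1) -> int:
    """Files needed if every (partition, column) pair is its own relation.

    forks:    files per relation (assumption: 3 = main + FSM + VM)
    segments: segment files per fork (assumption: 1, i.e. small relations)
    """
    return partitions * columns * forks * segments


# Main fork only: the 100K files mentioned above.
main_only = relation_files(1000, 100)

# Assuming three forks per relation, the count triples.
with_forks = relation_files(1000, 100, forks=3)

print(main_only)   # 100000
print(with_forks)  # 300000
```

The exact multiplier depends on which forks actually exist for each relation and on how many segments large relations need, so the second figure is an assumption-laden estimate rather than a measurement.
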
Most commercial databases employ a DMS storage model, in which the database manages object mapping and free space itself, so different objects share storage within several files. Surely this has historic reasons, but it has several advantages over the current model:

- removes fd pressure
- removes double buffering (by introducing ADIO)
- controlled layout and access pattern (sequential access and read-ahead)
- better quota management
- potentially better performance

Considering the platforms we support and the stabilization period needed, we should support both the current storage model and the DMS model.

I will stop here to see if this deserves further discussion.

Regards,
Qingqing

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers