Thomas Kellerer wrote:
> Albe Laurenz wrote on 17.12.2014 at 11:07:
>> and the performance will be worse than reading files from the file system.
>
> There is a Microsoft research paper [1] (from 2006) which tested this "myth"
> using SQL Server. It showed that the database might actually be faster than
> the file system.
>
> As this topic comes up at my workplace every now and then as well, I created
> a little web application (Java/JDBC) to test this on Postgres and possibly
> other DBMS.
>
> It turns out that Postgres isn't really slower at this than the file system
> either.
>
> For small files of around 50k both perform similarly: the average time to
> read the blob from a bytea column was around 2ms, whereas the average time
> to read the blob from the filesystem was around 1ms. The test uses 50 threads
> to read the blobs using the PK of the table.
>
> "Reading from the filesystem" means looking up the path for the file in the
> database table and then reading the file from the filesystem.
>
> For larger files of around 250k Postgres was actually faster in my tests:
> 130ms reading the bytea column vs. 260ms reading the file from disk.
>
> The tests were done locally on my Windows laptop. I didn't have time yet to
> do this on a Linux server. I expect the filesystem to have some impact on
> the figures, and NTFS is not known for being blazingly fast. So maybe those
> figures will change.
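[For reference, the two read paths being compared could look roughly like the
following JDBC sketch. This is only an illustration, not the actual test code;
it assumes a hypothetical table images(id integer primary key, path text,
data bytea), and the Connection is supplied by the caller.]

import java.nio.file.Files;
import java.nio.file.Paths;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class BlobReadPaths {

    // Read path 1: the blob lives in a bytea column and is fetched by primary key.
    static byte[] readFromBytea(Connection conn, int id) throws Exception {
        try (PreparedStatement ps =
                 conn.prepareStatement("SELECT data FROM images WHERE id = ?")) {
            ps.setInt(1, id);
            try (ResultSet rs = ps.executeQuery()) {
                rs.next();
                return rs.getBytes(1);   // bytea is returned as byte[]
            }
        }
    }

    // Read path 2: only the file path is stored in the table;
    // the blob itself is then read from the file system.
    static byte[] readFromFilesystem(Connection conn, int id) throws Exception {
        try (PreparedStatement ps =
                 conn.prepareStatement("SELECT path FROM images WHERE id = ?")) {
            ps.setInt(1, id);
            try (ResultSet rs = ps.executeQuery()) {
                rs.next();
                return Files.readAllBytes(Paths.get(rs.getString(1)));
            }
        }
    }
}

[Timing many calls to each method from several threads, as described above,
would give the reported comparison; all table and column names here are made
up for the example.]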
That must be some strangeness of the web application, que no?
PostgreSQL must do everything that direct file access does, right?
Plus some extra processing (load the data into shared_buffers, ...).
Given that, do you have any explanation for what you observed?

> My tests however do not take into account the actual time it takes to send
> the binary data from the server to the client (= browser). It might well be
> possible that serving the file through an Apache Web Server directly is
> faster than serving the file through a JEE Servlet. My intention was to
> measure the raw read speed of the binary data from the medium where it is
> stored.

Why not compare a 'SELECT ...' via psql (or libpq) with a 'cat ...' to measure
the difference?

>> The downside is that you might end up with a huge database
>> that you will have to back up and maintain
>
> I don't really buy the argument with the backup: the amount of data to be
> backed up is essentially the same.
> With both solutions you can have incremental backups.

The amount to back up will stay roughly the same, granted.
But isn't backup/restore of a large database more cumbersome than
backup/restore of a file system?
And a major upgrade of a large database is more painful, right?

> Another downside you didn't mention is the fact that you have to distribute
> the files in the filesystem properly.
> Having thousands or even millions of files in a single directory is not
> going to be maintenance friendly either.

That's right, you have to spend some thought on how to store the files as well.

Yours,
Laurenz Albe