Thomas Kellerer wrote:
> Albe Laurenz wrote on 17.12.2014 at 11:07:
>> and the performance will be worse than reading files from the file system.
> 
> There is a Microsoft research paper [1] (from 2006) which tested this "myth"
> using SQL Server.
> It showed that the database might actually be faster than the file system.
> 
> As this topic comes up at my workplace every now and then as well, I created
> a little web application (Java/JDBC) to test this on Postgres and possibly
> other DBMSs.
> 
> It turns out that Postgres isn't really slower at this than the file system
> either.
> 
> For small files of around 50k, both perform similarly: the average time to
> read the blob from a bytea column was around 2ms, whereas the average time to
> read the blob from the filesystem was around 1ms. The test uses 50 threads to
> read the blobs using the PK of the table.
> 
> "Reading from the filesystem" means looking up the path for the file in the 
> database table and then
> reading the file from the filesystem.
> 
> For larger files of around 250k, Postgres was actually faster in my tests:
> 130ms reading the bytea column vs. 260ms reading the file from disk.
> 
> The tests were done locally on my Windows laptop.
> I haven't had time yet to do this on a Linux server. I expect the filesystem
> to have some impact on the figures, and NTFS is not known for being blazingly
> fast, so maybe those figures will change.

That must be some quirk of the web application, no?
PostgreSQL must do everything that a direct file access does, right?
Plus some extra processing (load the data into shared_buffers, ...).
Given that, do you have any explanation for what you observed?
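
Just so we are comparing the same things, this is roughly how I picture the two
read paths in your test (only a sketch; the table and column names are my
guesses, not your actual schema):

import java.nio.file.Files;
import java.nio.file.Paths;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

// Assumed schema (my guess):
//   CREATE TABLE images (id integer PRIMARY KEY, filename text, data bytea);
public class BlobReadPaths
{
    // Variant 1: read the blob directly from the bytea column.
    static byte[] readFromBytea(Connection con, int id) throws Exception
    {
        try (PreparedStatement ps =
                 con.prepareStatement("SELECT data FROM images WHERE id = ?"))
        {
            ps.setInt(1, id);
            try (ResultSet rs = ps.executeQuery())
            {
                rs.next();
                return rs.getBytes(1);
            }
        }
    }

    // Variant 2: look up the path in the same table, then read the file from disk.
    static byte[] readFromFile(Connection con, int id) throws Exception
    {
        try (PreparedStatement ps =
                 con.prepareStatement("SELECT filename FROM images WHERE id = ?"))
        {
            ps.setInt(1, id);
            try (ResultSet rs = ps.executeQuery())
            {
                rs.next();
                return Files.readAllBytes(Paths.get(rs.getString(1)));
            }
        }
    }
}

If that is roughly what your code does, the bytea variant still has to read the
same amount of data from PostgreSQL's own files, plus the extra processing
mentioned above, which is why your figures surprise me.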

> My tests, however, do not take into account the actual time it takes to send
> the binary data from the server to the client (= browser). It may well be that
> serving the file directly through an Apache web server is faster than serving
> it through a JEE servlet. My intention was to measure the raw read speed of
> the binary data from the medium where it is stored.

Why not compare a 'SELECT ...' via psql (or libpq) against a 'cat ...' to
measure the difference?

>> The downside is that you might end up with a huge database
>> that you will have to back up and maintain
> 
> I don't really buy the backup argument: the amount of data to be backed up
> is essentially the same.
> With both solutions you can have incremental backups.

The amount to back up will stay roughly the same, granted.
But isn't backup/restore of a large database more cumbersome than 
backup/restore of a file system?
And a major upgrade of a large database is more painful, right?

> Another downside you didn't mention is the fact that you have to distribute
> the files properly in the filesystem.
> Having thousands or even millions of files in a single directory is not going
> to be maintenance-friendly either.

That's right, you have to spend some thought on how to store the files as well.
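
A common approach is to derive one or two directory levels from a hash of the
primary key, so that no single directory gets too big. Roughly like this (just
a sketch; the layout and names are made up):

import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;

public class ShardedPath
{
    // Map a primary key to something like <base>/3f/a2/12345.bin, so the files
    // are spread over 256 * 256 subdirectories instead of one big one.
    // (The layout is only an example, not a recommendation of MD5 in particular.)
    static Path pathFor(Path base, long id) throws Exception
    {
        byte[] digest = MessageDigest.getInstance("MD5")
                .digest(Long.toString(id).getBytes(StandardCharsets.UTF_8));
        String level1 = String.format("%02x", digest[0] & 0xff);
        String level2 = String.format("%02x", digest[1] & 0xff);
        return base.resolve(level1).resolve(level2).resolve(id + ".bin");
    }

    public static void main(String[] args) throws Exception
    {
        System.out.println(pathFor(Paths.get("/var/files"), 1234567L));
    }
}

With two levels of 256 directories each you get 65536 directories, so even a
few million files come out at only a few dozen files per directory on average.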

Yours,
Laurenz Albe

