Robert Haas wrote:
>On Fri, Jan 2, 2009 at 3:23 PM, Stephen R. van den Berg <s...@cuci.nl> wrote:
>> Three things:
>> a. Shouldn't it in theory be possible to have a decompression algorithm
>>    which is IO-bound because it decompresses faster than the disk can
>>    supply the data?  (On common current hardware.)
>> b. Has the current algorithm been carefully benchmarked and/or optimised
>>    and/or chosen to fit the IO-bound target as closely as possible?
>> c. Are there any well-known pitfalls/objections which would prevent me
>>    from changing the algorithm to something more efficient (read: IO-bound)?
>Any compression algorithm is going to require you to decompress the
>entire string before extracting a substring at a given offset.  When
>the data is uncompressed, you can jump directly to the offset you want
>to read.  Even if the compression algorithm requires no overhead at
>all, it's going to make the location of the data nondeterministic, and
>therefore force additional disk reads.

That shouldn't be insurmountable:

- I currently have difficulty imagining applications that actually do
  lots of substring extractions from large compressible fields.  The most
  likely case would be a table containing large tsearch-indexed text
  fields, but those are unlikely to participate in many substring
  extractions.

- Even if substring operations were common, I could envision a compressed
  format that compresses in independent chunks of, say, 64KB, each of
  which can be located and decompressed on its own, so a substring only
  pays for the chunks it overlaps (a rough sketch follows below my sig).
-- 
Sincerely,
           Stephen R. van den Berg.

"Always remember that you are unique.  Just like everyone else."
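P.S.: To make the chunked format a bit more concrete, below is a rough
sketch using plain zlib for illustration (not pglz, and nothing that
exists in the backend today; the names, the layout and the 64KB slice
size are all made up, and error checking is omitted).  The point is only
that a substring at a given offset needs to decompress just the slices
it overlaps, not the whole datum:

#include <stdlib.h>
#include <string.h>
#include <zlib.h>

#define CHUNK_RAW 65536                 /* uncompressed slice size: 64KB */

/* Toy layout: a directory of compressed slice sizes plus the slices
 * themselves, each compressed independently of the others. */
typedef struct
{
    size_t          nchunks;
    size_t         *csize;              /* compressed size of each slice */
    unsigned char **cdata;              /* compressed bytes of each slice */
} chunked_blob;

/* Compress 'src' into independently decompressible 64KB slices. */
static chunked_blob *
chunked_compress(const unsigned char *src, size_t len)
{
    chunked_blob *blob = malloc(sizeof *blob);
    size_t        i;

    blob->nchunks = (len + CHUNK_RAW - 1) / CHUNK_RAW;
    blob->csize = malloc(blob->nchunks * sizeof *blob->csize);
    blob->cdata = malloc(blob->nchunks * sizeof *blob->cdata);

    for (i = 0; i < blob->nchunks; i++)
    {
        size_t raw = (i + 1) * CHUNK_RAW <= len ? CHUNK_RAW
                                                : len - i * CHUNK_RAW;
        uLongf clen = compressBound(raw);

        blob->cdata[i] = malloc(clen);
        compress2(blob->cdata[i], &clen, src + i * CHUNK_RAW, raw,
                  Z_BEST_SPEED);
        blob->csize[i] = clen;
    }
    return blob;
}

/* Extract 'count' bytes (count > 0, within range) starting at 'offset'
 * by decompressing only the slices that overlap the requested range. */
static void
chunked_substr(const chunked_blob *blob, size_t offset, size_t count,
               unsigned char *out)
{
    size_t first = offset / CHUNK_RAW;
    size_t last = (offset + count - 1) / CHUNK_RAW;
    size_t i;

    for (i = first; i <= last; i++)
    {
        unsigned char raw[CHUNK_RAW];
        uLongf        rawlen = CHUNK_RAW;
        size_t        chunk_start = i * CHUNK_RAW;
        size_t        from;
        size_t        upto;

        uncompress(raw, &rawlen, blob->cdata[i], blob->csize[i]);

        /* copy the overlap between this slice and [offset, offset+count) */
        from = (offset > chunk_start) ? offset - chunk_start : 0;
        upto = offset + count - chunk_start;
        if (upto > rawlen)
            upto = rawlen;
        memcpy(out + (chunk_start + from - offset), raw + from, upto - from);
    }
}

The directory costs a few bytes per 64KB of raw data, and in the worst
case a substring decompresses one partially-needed slice at each end,
which still seems a lot cheaper than decompressing the whole datum.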