Data Compression
The new data compression feature in SQL Server 2008 reduces the size of tables, 
indexes, or a subset of their partitions by storing fixed-length data types in 
a variable-length storage format and by reducing redundant data. The space 
savings achieved depend on the schema and the data distribution. Based on our 
testing with various data warehouse databases, we have seen reductions in the 
size of real user databases of up to 87% (a 7 to 1 compression ratio), but more 
commonly you should expect a reduction in the range of 50-70% (a compression 
ratio between roughly 2 to 1 and 3 to 1). 
SQL Server provides two types of compression (a DDL sketch for enabling each 
follows the list):
- ROW compression stores fixed-length types in a variable-length storage 
format. For example, a column of data type BIGINT takes 8 bytes of storage in 
fixed format; when compressed, it takes a variable number of bytes, anywhere 
from 0 to 8. Since column values are stored as variable length, an additional 
4-bit length code is stored for each field within the row. Zero and NULL 
values take no storage at all beyond the 4-bit code. 
- PAGE compression is built on top of ROW compression. It minimizes the 
storage of redundant data on the page by storing commonly occurring byte 
patterns on the page once and then referencing those values from the 
respective columns. The byte-pattern recognition is type-independent. Under 
PAGE compression, SQL Server optimizes space on a page using two techniques.
The first technique is column prefix. The system looks for a byte pattern 
common to the start of a specific column's values across the rows on the page; 
for example, values such as 'Seattle01', 'Seattle02', and 'Seattle03' share 
the prefix 'Seattle0'. This process is repeated for all the columns in the 
table or index. The computed column-prefix values are stored as an anchor 
record on the page, and the data or index rows refer to the anchor record for 
the common prefix, where one is available, for each column.
The second technique is a page-level dictionary. Values that occur repeatedly 
across columns and rows on the page are stored once in the dictionary, and the 
columns are then modified to refer to the dictionary entry.
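To make the DDL concrete, here is a minimal sketch of enabling each mode; the 
table, column, and index names are illustrative, not taken from the article:

    -- ROW compression: fixed-length types stored in variable-length format.
    CREATE TABLE dbo.Orders
    (
        OrderID   BIGINT   NOT NULL,
        Quantity  INT      NOT NULL,
        OrderDate DATETIME NOT NULL
    )
    WITH (DATA_COMPRESSION = ROW);

    -- PAGE compression (prefix and dictionary on top of ROW) can be applied
    -- later by rebuilding the table.
    ALTER TABLE dbo.Orders REBUILD WITH (DATA_COMPRESSION = PAGE);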
Compression comes with an additional CPU cost. This overhead is paid whenever 
you query or execute DML operations on compressed data. The relative CPU 
overhead of ROW is less than that of PAGE, but PAGE compression provides 
better compression. Since there are many kinds of workloads and data patterns, 
SQL Server exposes compression granularity at the partition level. You can 
choose to compress the whole table or index, or only a subset of its 
partitions. For example, in a DW workload where CPU is the dominant cost but 
you still want to save some disk space, you may want to enable PAGE 
compression on partitions that are not accessed frequently while leaving 
uncompressed the current partition(s) that are accessed and manipulated more 
frequently. This reduces the total CPU cost at a small increase in disk space 
requirements. If I/O cost is dominant for your workload, or you need to reduce 
disk space costs, compressing all data using PAGE compression may be the best 
choice. Compression can give many-fold speedups if it causes your working set 
of frequently touched pages to fit in the main-memory buffer pool when it 
otherwise would not. 
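A sketch of that per-partition control might look like the following; the 
table name and partition numbers are hypothetical:

    -- Compress an infrequently accessed historical partition.
    ALTER TABLE dbo.FactSales
    REBUILD PARTITION = 1
    WITH (DATA_COMPRESSION = PAGE);

    -- Leave the current, heavily manipulated partition uncompressed.
    ALTER TABLE dbo.FactSales
    REBUILD PARTITION = 4
    WITH (DATA_COMPRESSION = NONE);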
Preliminary performance results on one large-scale internal DW query 
performance benchmark used to test SQL Server 2008 show a 58% disk savings, an 
average 15% reduction in query runtime, and an average 20% increase in CPU 
cost. Some queries sped up by a factor of up to seven. Your results will 
depend on your workload, database, and hardware.
The commands to compress data are exposed as options in CREATE/ALTER DDL 
statements and support both ONLINE and OFFLINE modes. Additionally, a stored 
procedure is provided to help you estimate the space savings prior to actual 
compression. 
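The stored procedure in question is sp_estimate_data_compression_savings. A 
typical call, with illustrative object and index names, might be:

    -- Estimate PAGE-compression savings for all indexes and partitions
    -- of dbo.FactSales before compressing anything.
    EXEC sp_estimate_data_compression_savings
        @schema_name      = 'dbo',
        @object_name      = 'FactSales',
        @index_id         = NULL,   -- all indexes
        @partition_number = NULL,   -- all partitions
        @data_compression = 'PAGE';

    -- An ONLINE rebuild keeps the index available while it is compressed.
    ALTER INDEX IX_FactSales_Date ON dbo.FactSales
    REBUILD WITH (DATA_COMPRESSION = PAGE, ONLINE = ON);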
Backup Compression
Backup compression helps you save in multiple ways.
By reducing the size of your SQL backups, you save significantly on the disk 
media that stores them. While all compression results depend on the nature of 
the data being compressed, savings of 50% are not uncommon, and greater 
compression is possible. This enables you to use less storage to keep your 
backups online, or to keep more cycles of backups online using the same 
storage.
Backup compression also saves you time. Traditional SQL backups are almost 
entirely limited by I/O performance. By reducing the I/O load of the backup 
process, compression actually speeds up both backups and restores.
Of course, nothing is entirely free, and this reduction in space and time 
comes at the expense of CPU cycles. The good news is that the savings in I/O 
time offset the increased use of CPU time, and you can control how much CPU 
your backups consume, at the expense of the rest of your workload, by taking 
advantage of the Resource Governor.
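As an illustration (the database name and file path are placeholders), 
compression is requested per backup with the COMPRESSION option, or enabled 
server-wide as a default:

    -- Compress a single backup explicitly.
    BACKUP DATABASE AdventureWorks
    TO DISK = 'D:\Backups\AdventureWorks.bak'
    WITH COMPRESSION;

    -- Or make compression the server-wide default for all backups.
    EXEC sp_configure 'backup compression default', 1;
    RECONFIGURE;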
 
URL: http://msdn.microsoft.com/en-us/library/cc278097.aspx

Date: Wed, 29 Oct 2008 15:35:44 +0000
From: [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Subject: Re: [GENERAL] Are there plans to add data compression feature to postgresql?
CC: [EMAIL PROTECTED]; pgsql-general@postgresql.org
2008/10/29 小波 顾 <[EMAIL PROTECTED]>

> 1. Integers of types that took 8 bytes in the past now take only 4 or 2 
bytes if the values are not so large.
So what actually happens if I have a table with a few million values that fit 
in 2 bytes, but all of a sudden I am going to add another column with 
something that requires 8 bytes? An update on all columns? I am actually even 
against varchars in my databases, so something like that sounds at least 
creepy.

 -- GJ
_________________________________________________________________
Invite your mail contacts to join your friends list with Windows Live Spaces. 
It's easy!
http://spaces.live.com/spacesapi.aspx?wx_action=create&wx_url=/friends.aspx&mkt=en-us

Reply via email to