Re: [zfs-discuss] Debunking the dedup memory myth

2010-07-20 Thread Robert Milkowski
On 20/07/2010 04:41, Edward Ned Harvey wrote: From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- boun...@opensolaris.org] On Behalf Of Richard L. Hamilton I would imagine that if it's read-mostly, it's a win, but otherwise it costs more than it saves. Even more conventional compress

Re: [zfs-discuss] Debunking the dedup memory myth

2010-07-19 Thread Edward Ned Harvey
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- > boun...@opensolaris.org] On Behalf Of Richard L. Hamilton > > I would imagine that if it's read-mostly, it's a win, but > otherwise it costs more than it saves. Even more conventional > compression tends to be more resource intens

Re: [zfs-discuss] Debunking the dedup memory myth

2010-07-18 Thread Haudy Kazemi
Brandon High wrote: On Fri, Jul 9, 2010 at 5:18 PM, Brandon High wrote: I think that DDT entries are a little bigger than what you're using. The size seems to range between 150 and 250 bytes depending on how it's calculated, call it 200b each. Your 128G dat

Re: [zfs-discuss] Debunking the dedup memory myth

2010-07-18 Thread Garrett D'Amore
On Sun, 2010-07-18 at 16:18 -0700, Richard L. Hamilton wrote: > > I would imagine that if it's read-mostly, it's a win, but > otherwise it costs more than it saves. Even more conventional > compression tends to be more resource intensive than decompression... > > What I'm wondering is when dedu

Re: [zfs-discuss] Debunking the dedup memory myth

2010-07-18 Thread Erik Trimble
On 7/18/2010 4:18 PM, Richard L. Hamilton wrote: Even the most expensive decompression algorithms generally run significantly faster than I/O to disk -- at least when real disks are involved. So, as long as you don't run out of CPU and have to wait for CPU to be available for decompression, the

Re: [zfs-discuss] Debunking the dedup memory myth

2010-07-18 Thread Richard L. Hamilton
> Even the most expensive decompression algorithms > generally run > significantly faster than I/O to disk -- at least > when real disks are > involved. So, as long as you don't run out of CPU > and have to wait for > CPU to be available for decompression, the > decompression will win. The > same

Re: [zfs-discuss] Debunking the dedup memory myth

2010-07-10 Thread Erik Trimble
On 7/10/2010 10:14 AM, Brandon High wrote: On Sat, Jul 10, 2010 at 5:33 AM, Erik Trimble wrote: Which brings up an interesting idea: if I have a pool with good random I/O (perhaps made from SSDs, or even one of those nifty Oracle F5100 things), I

Re: [zfs-discuss] Debunking the dedup memory myth

2010-07-10 Thread Garrett D'Amore
Even the most expensive decompression algorithms generally run significantly faster than I/O to disk -- at least when real disks are involved. So, as long as you don't run out of CPU and have to wait for CPU to be available for decompression, the decompression will win. The same concept is true f

Re: [zfs-discuss] Debunking the dedup memory myth

2010-07-10 Thread Edward Ned Harvey
> From: Roy Sigurd Karlsbakk [mailto:r...@karlsbakk.net] > > increases the probability of arc/ram cache hit. So dedup allows you > to > > stretch your disk, and also stretch your ram cache. Which also > > benefits performance. > > Theoretically, yes, but there will be an overhead in cpu/memory tha

Re: [zfs-discuss] Debunking the dedup memory myth

2010-07-10 Thread Roy Sigurd Karlsbakk
> > 4% seems to be a pretty good SWAG. > > Is the above "4%" wrong, or am I wrong? > > Suppose 200bytes to 400bytes, per 128Kbyte block ... > 200/131072 = 0.0015 = 0.15% > 400/131072 = 0.003 = 0.3% > which would mean for 100G unique data = 153M to 312M ram. > > Around 3G ram for 1Tb unique data,
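[Editorial note: a minimal Python sketch, not from the thread, that reproduces the arithmetic quoted above. It assumes one DDT entry per unique 128 KiB block and 200-400 bytes of RAM per entry; both figures come from this discussion and the true per-entry size is debated in the other replies.]

```python
# Rough check of the per-block DDT overhead quoted above.
# Assumptions (from this thread, not measured): one DDT entry per unique
# block, 128 KiB average block size, 200-400 bytes of RAM per entry.

BLOCK = 128 * 1024  # default ZFS recordsize, in bytes

def ddt_ram_bytes(unique_data_bytes, entry_bytes):
    """Estimated RAM consumed by the dedup table for a given amount of unique data."""
    return (unique_data_bytes / BLOCK) * entry_bytes

for entry in (200, 400):
    print(f"{entry} B/entry -> {entry / BLOCK:.3%} of unique data")
    print(f"  100 GiB unique -> ~{ddt_ram_bytes(100 * 2**30, entry) / 2**20:.0f} MiB")
    print(f"  1 TiB unique   -> ~{ddt_ram_bytes(2**40, entry) / 2**30:.2f} GiB")
```

This lands in the same range as the figures quoted above (roughly 0.15-0.3% of unique data, about 150-310 MB per 100 GB, and about 1.5-3 GB per TB), which suggests that any gap between this estimate and the "4%" SWAG lies in the assumed per-entry size and average block size, not in the arithmetic.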

Re: [zfs-discuss] Debunking the dedup memory myth

2010-07-10 Thread Edward Ned Harvey
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- > boun...@opensolaris.org] On Behalf Of Brandon High > > Dedup is to > save space, not accelerate i/o. I'm going to have to disagree with you there. Dedup is a type of compression. Compression can be used for storage savings, and

Re: [zfs-discuss] Debunking the dedup memory myth

2010-07-10 Thread Edward Ned Harvey
> From: Richard Elling [mailto:rich...@nexenta.com] > > 4% seems to be a pretty good SWAG. Is the above "4%" wrong, or am I wrong? Suppose 200bytes to 400bytes, per 128Kbyte block ... 200/131072 = 0.0015 = 0.15% 400/131072 = 0.003 = 0.3% which would mean for 100G unique data = 153M to 312M ram.

Re: [zfs-discuss] Debunking the dedup memory myth

2010-07-10 Thread Brandon High
On Sat, Jul 10, 2010 at 5:33 AM, Erik Trimble wrote: > Which brings up an interesting idea: if I have a pool with good random > I/O (perhaps made from SSDs, or even one of those nifty Oracle F5100 > things), I would probably not want to have a DDT created, or at least have > one that was very

Re: [zfs-discuss] Debunking the dedup memory myth

2010-07-10 Thread Richard Elling
On Jul 10, 2010, at 5:33 AM, Erik Trimble wrote: > On 7/10/2010 5:24 AM, Richard Elling wrote: >> On Jul 9, 2010, at 11:10 PM, Brandon High wrote: >> >> >>> On Fri, Jul 9, 2010 at 5:18 PM, Brandon High wrote: >>> I think that DDT entries are a little bigger than what you're using. The >>> si

Re: [zfs-discuss] Debunking the dedup memory myth

2010-07-10 Thread Erik Trimble
On 7/10/2010 5:24 AM, Richard Elling wrote: On Jul 9, 2010, at 11:10 PM, Brandon High wrote: On Fri, Jul 9, 2010 at 5:18 PM, Brandon High wrote: I think that DDT entries are a little bigger than what you're using. The size seems to range between 150 and 250 bytes depending on how it's cal

Re: [zfs-discuss] Debunking the dedup memory myth

2010-07-10 Thread Richard Elling
On Jul 9, 2010, at 11:10 PM, Brandon High wrote: > On Fri, Jul 9, 2010 at 5:18 PM, Brandon High wrote: > I think that DDT entries are a little bigger than what you're using. The size > seems to range between 150 and 250 bytes depending on how it's calculated, > call it 200b each. Your 128G data

Re: [zfs-discuss] Debunking the dedup memory myth

2010-07-09 Thread Brandon High
On Fri, Jul 9, 2010 at 5:18 PM, Brandon High wrote: > I think that DDT entries are a little bigger than what you're using. The > size seems to range between 150 and 250 bytes depending on how it's > calculated, call it 200b each. Your 128G dataset would require closer to > 200M (+/- 25%) for the
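[Editorial note: to make the "200M (+/- 25%)" figure concrete, a small sketch of my own that applies the 150-250 byte per-entry range quoted above to a 128 GiB dataset at the default 128 KiB recordsize.]

```python
# Apply the 150-250 bytes-per-DDT-entry range quoted above to a 128 GiB
# dataset made of full 128 KiB blocks (1,048,576 blocks in total).

DATASET = 128 * 2**30      # 128 GiB of unique data
BLOCK = 128 * 1024         # 128 KiB recordsize
blocks = DATASET // BLOCK  # 1,048,576 entries, one per unique block

for entry_bytes in (150, 200, 250):
    print(f"{entry_bytes} B/entry: ~{blocks * entry_bytes / 2**20:.0f} MiB")
# -> roughly 150, 200, and 250 MiB, i.e. "200M +/- 25%"
```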

Re: [zfs-discuss] Debunking the dedup memory myth

2010-07-09 Thread Neil Perrin
On 07/09/10 19:40, Erik Trimble wrote: On 7/9/2010 5:18 PM, Brandon High wrote: On Fri, Jul 9, 2010 at 5:00 PM, Edward Ned Harvey <solar...@nedharvey.com> wrote: The default ZFS block size is 128K. If you have a filesystem with 128G used, that means you are consuming 1,048,576

Re: [zfs-discuss] Debunking the dedup memory myth

2010-07-09 Thread Erik Trimble
On 7/9/2010 5:18 PM, Brandon High wrote: On Fri, Jul 9, 2010 at 5:00 PM, Edward Ned Harvey <solar...@nedharvey.com> wrote: The default ZFS block size is 128K. If you have a filesystem with 128G used, that means you are consuming 1,048,576 blocks, each of which must be checks

Re: [zfs-discuss] Debunking the dedup memory myth

2010-07-09 Thread Brandon High
On Fri, Jul 9, 2010 at 5:00 PM, Edward Ned Harvey wrote: > The default ZFS block size is 128K. If you have a filesystem with 128G > used, that means you are consuming 1,048,576 blocks, each of which must be > checksummed. ZFS uses adler32 and sha256, which means 4bytes and 32bytes > ... 36 byt
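[Editorial note: the estimate being quoted works out as in the sketch below. It counts only the checksum bytes named in the quoted message (4 bytes adler32 plus 32 bytes sha256 per 128 KiB block), which is why the replies in this thread revise the number upward to the full in-core DDT entry size.]

```python
# Reproduce the quoted estimate: 128 GiB of data in 128 KiB blocks, with
# 4 bytes + 32 bytes of checksum per block, as stated in the message above.
# The full in-core DDT entry is larger (see the other replies), so this is
# a lower bound, not the real dedup memory footprint.

DATASET = 128 * 2**30              # 128 GiB used
BLOCK = 128 * 1024                 # 128 KiB default block size
CHECKSUM_BYTES = 4 + 32            # 36 bytes per block in the quoted estimate

blocks = DATASET // BLOCK
print(blocks)                           # 1,048,576 blocks
print(blocks * CHECKSUM_BYTES / 2**20)  # ~36 MiB if only checksums are counted
```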

[zfs-discuss] Debunking the dedup memory myth

2010-07-09 Thread Edward Ned Harvey
Whenever somebody asks the question, "How much memory do I need to dedup X terabytes filesystem," the standard answer is "as much as you can afford to buy." This is true and correct, but I don't believe it's the best we can do. Because "as much as you can buy" is a true assessment for memory in *
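[Editorial note: as a strawman for the "better than as-much-as-you-can-afford" answer this post asks for, here is a hedged rule-of-thumb estimator of my own, built only from the numbers argued over in the replies above; the per-entry size and the pool's true average block size are the uncertain inputs.]

```python
# Rule-of-thumb DDT memory estimator, parameterized by the two numbers this
# thread argues about: bytes of RAM per DDT entry and the pool's average
# block size.  Both defaults are assumptions taken from the discussion above.

def dedup_ram_estimate(unique_tb, avg_block=128 * 1024, entry_bytes=250):
    """Rough RAM (GiB) needed to hold the DDT for unique_tb TiB of unique data."""
    blocks = unique_tb * 2**40 / avg_block
    return blocks * entry_bytes / 2**30

# Large 128 KiB blocks (e.g. mostly big files):
print(f"{dedup_ram_estimate(1):.1f} GiB per TiB")                       # ~2 GiB
# Small 8 KiB average blocks (many small files or a small recordsize):
print(f"{dedup_ram_estimate(1, avg_block=8 * 1024):.0f} GiB per TiB")   # ~31 GiB
```

Under this model the DDT scales with the number of unique blocks rather than with bytes of data, which is one way to reconcile the small percentages computed earlier in the thread with much larger rules of thumb: a pool full of small blocks needs far more RAM per terabyte than one full of 128 KiB blocks.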