Interesting ideas, and a simulator would be fun for this purpose. You could be right, and your example does make sense in a way, but still, I do wonder whether it works out in the real world.
Let's say you have normal data that expires (user files etc.) and large databases, some of which you keep for many months or even years. If you use 200G volumes and a database fills a volume to 60%+, that volume might not expire for a long time even after the rest of its data has expired; that leaves a waste of 80G. If you use 18G volumes and the database fills six volumes to 100% and a seventh to 60%, the waste is only 7.2GB (the 40% of that last volume holding expired data).

Also, I don't have a clue what the downside of small volumes could be. Is there a disadvantage to having a few hundred volumes instead of 30 large ones? I can't think of a problem; maybe filesystem performance if the number of volumes becomes insane, or a slight TSM performance impact when you start the reclaim process or do a query nodedata or something like that.

This remark:

"Do a gedankenexperiment: Split 100TB into 100G vols, and into 10G vols. Then randomly expire data from them."

is not how I think real-world data works. Data doesn't expire randomly: parts of it do, but large chunks of it don't (databases).

Please prove me wrong, I'd love to learn new stuff! :)

-----Original Message-----
From: ADSM: Dist Stor Manager [mailto:ads...@vm.marist.edu] On Behalf Of Allen S. Rout
Sent: Sunday, 30 August 2009 1:57
To: ADSM-L@VM.MARIST.EDU
Subject: Re: [ADSM-L] Seeking wisdom on dedupe..filepool file size client compression and reclaims

>> On Sat, 29 Aug 2009 09:24:11 +0200, Stefan Folkerts <stefan.folke...@itaa.nl> said:

> Now I am thinking, dedupe only occurs when you move data off the volumes
> or reclaim them, but 10G volumes might not get reclaimed for a LONG
> time since they contain so little data; the chance of that getting
> reclaimed and thus deduplicated is relatively smaller than that
> happening on a 100G volume.

I think that, to a first approximation, the size of the volume is irrelevant to the issues you're discussing here.

Do a gedankenexperiment: Split 100TB into 100G vols, and into 10G vols. Then randomly expire data from them. What you'll have is a bunch of volumes ranging from (say) 0% to 49% reclaimable. You will reclaim your _first_ volume a skewtch sooner in the 10G case. But on the average, you'll reclaim 500G of space in about the same number of days. Or said differently: in a week you'll reclaim about the same amount of space in each case.

I need to publish a simulator.

So pick volume sizes that avoid being silly in any direction.

- Allen S. Rout
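Since the thread keeps wishing for a simulator, here is a minimal sketch of the gedankenexperiment in Python. Everything in it is an illustrative assumption, not anything from the thread: data is modeled as 1GB files that each expire independently per day with probability `expire_prob`, and a volume is "reclaimed" once its expired fraction reaches `reclaim_pct` (surviving data is simply dropped from the model rather than repacked). It only tests Allen's random-expiry claim, not the database-pinning scenario.

```python
import random

def simulate(total_gb, vol_gb, expire_prob=0.02, reclaim_pct=50,
             days=365, seed=42):
    """Toy model: total_gb of 1GB files packed into vol_gb-sized volumes.

    Each day, every live 1GB file expires independently with probability
    expire_prob.  When a volume's expired fraction reaches reclaim_pct,
    the volume is reclaimed: its expired space is freed and, for
    simplicity, its surviving data is moved off and no longer tracked.
    Returns (total GB reclaimed, day of first reclaim).
    """
    rng = random.Random(seed)
    n_vols = total_gb // vol_gb
    live = [vol_gb] * n_vols    # live 1GB files per volume
    dead = [0] * n_vols         # expired 1GB files per volume
    reclaimed, first_day = 0, None
    for day in range(1, days + 1):
        for v in range(n_vols):
            if live[v] == 0:                      # already reclaimed
                continue
            died = sum(rng.random() < expire_prob for _ in range(live[v]))
            live[v] -= died
            dead[v] += died
            if dead[v] * 100 >= reclaim_pct * vol_gb:
                reclaimed += dead[v]              # expired space freed
                live[v] = dead[v] = 0             # survivors moved off
                if first_day is None:
                    first_day = day
    return reclaimed, first_day

# 10TB carved into 100G volumes versus 10G volumes:
big = simulate(10_000, 100)
small = simulate(10_000, 10)
print("100G vols:", big)
print(" 10G vols:", small)
```

Under purely random expiry the reclaimed totals come out roughly equal for both volume sizes, which is Allen's point; to model Stefan's objection you would instead pin some fraction of each volume as long-lived database data and watch the 40% remainder sit unreclaimable.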