Tim Spriggs wrote:
> Does anyone know a tool that can look over a dataset and give
> duplication statistics? I'm not looking for something incredibly
> efficient, but I'd like to know how much it would actually benefit our
> dataset: HiRISE has a large set of spacecraft data (images) that could
> potentially have large amounts of redundancy, or not. Also, other up-and-coming
> missions have a large data volume with a lot of duplicate image info
> and a small budget; with dedup in OpenSolaris there is a good business
> case to invest in Sun/OpenSolaris rather than buy the cheaper storage
> (+ Linux?) that can simply hold everything as is.
>
> If someone feels like coding up a tool that basically makes a file of
> checksums and counts how many times a particular checksum gets hit over
> a dataset, I would be willing to run it and provide feedback. :)
>
> -Tim
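Something along these lines is only a few dozen lines of Python. The sketch below is a hypothetical starting point, not a polished tool: it hashes fixed-size blocks (128 KiB here, matching ZFS's default recordsize, though ZFS dedup actually operates on whatever block sizes the pool wrote), counts how often each checksum repeats, and prints an estimated dedup ratio. The block size and the choice of SHA-256 are assumptions; real results will vary with recordsize and file alignment.

```python
#!/usr/bin/env python
"""Rough block-level dedup estimator (hypothetical sketch).

Walks a directory tree, hashes each 128 KiB block with SHA-256, and
reports total vs. unique blocks. 128 KiB is assumed to approximate
ZFS's default recordsize; adjust BLOCK_SIZE to taste.
"""
import hashlib
import os
import sys
from collections import Counter

BLOCK_SIZE = 128 * 1024  # assumed to match ZFS default recordsize


def block_hashes(root):
    """Yield a SHA-256 digest for every fixed-size block under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as f:
                    while True:
                        block = f.read(BLOCK_SIZE)
                        if not block:
                            break
                        yield hashlib.sha256(block).digest()
            except OSError:
                pass  # skip unreadable files rather than abort the scan


def dedup_stats(root):
    """Return (total_blocks, unique_blocks) for the tree under root."""
    counts = Counter(block_hashes(root))
    total = sum(counts.values())
    return total, len(counts)


if __name__ == "__main__" and len(sys.argv) > 1:
    total, unique = dedup_stats(sys.argv[1])
    if total:
        print("blocks: %d  unique: %d  est. dedup ratio: %.2fx"
              % (total, unique, total / float(unique)))
```

Run it as `./dedupstat.py /path/to/dataset`. Note the caveat: whole-file offsets matter, so two images that share content at different byte offsets won't be detected, which is also roughly how block-level dedup in ZFS behaves.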
Me too. Our data profile is just like Tim's: terabytes of satellite data. I'm going to guess that the dedup ratio won't be fantastic for us. I sure would like to measure it, though.

Jon

--
Jonathan Loran
IT Manager
Space Sciences Laboratory, UC Berkeley
(510) 643-5146  [EMAIL PROTECTED]
AST:7731^29u18e3

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss