On Mon, Jun 16, 2014 at 12:36 PM, Kevin A. McGrail <kmcgr...@pccc.com> wrote:
> On 6/16/2014 9:49 AM, Joe Quinn wrote:
>
>> On 6/16/2014 9:42 AM, Dave Pooser wrote:
>>
>>> On 5/30/14 11:11 AM, "Kevin A. McGrail" <kmcgr...@pccc.com> wrote:
>>>
>>>> Good time for an update to the users list about the issue. The box that
>>>> processed the updates at the ASF colo failed catastrophically during a
>>>> power surge that took down some other boxes as well. Unfortunately, while
>>>> the project requested backups in 2009, they were not implemented.
>>>>
>>> Now that the update box is back online (and thanks for all your hard work
>>> on that! Systems archaeology is no fun at all), is there anything useful
>>> the community can do to help prevent another such catastrophe? I'd be
>>> willing to contribute hardware and/or VM space at $WORKPLACE for an
>>> offsite replica as long as we wouldn't need to sync more than 2-4GB/day
>>> after the initial setup completed.
>>>
>>
>> If you have access to any SA boxes, make sure they have a scheduled
>> backup (and make sure the backup works and has all important data!). If any
>> systems do not have backups, report it to the appropriate list.
>>
>> Also make sure every task the box is designed to handle is appropriately
>> documented, including user accounts required, libraries required and their
>> versions, what crontabs should be, etc.
>>
> I think Joe's answer is correct but at the same time doesn't answer the
> question of what the community at large can do to help.
>
> First, the overall takeaway for me is that documentation is important.
> This was a hard task when it was just a box failure. When it became a box
> failure with missing backups and large documentation issues, it effectively
> became a personal mission to get the box working. A little documentation
> on things went a long way for me and especially in the OS world, people
> burn out or go on to other projects so it's helpful if you try and document.
>
> Second, to answer your question less philosophically, from the community
> we always need:
>
> - Masscheckers - You run code nightly and automated against hand-sorted
> spam/ham corpora to improve our rule scoring. Once you get it set up and
> get good, sorted email, the system is very automated.
>
> - Rule writers - Spam evolves and we need people to write rules. If you
> like balancing your checkbook, doing Sudoku, or seeing patterns in
> gibberish, this might be perfect for you. And what I love doing is evolving
> the rule writing from manual to automated processing. It's quite a science
> really!
>
> - Coders - The life blood of a project really. If you want to help write
> code, become a committer and help drive this project on the PMC, speak up!
>
> - Testers - People who will use trunk on production systems and give
> constructive feedback on real-world mail flow. NOTE: trunk is usually in
> good shape and runs on many of the committers' systems. Because of the way
> the system is plugin based, the experimental stuff is usually not enabled
> by default.
>
> - RBL Stuff - I'm also still working with the ASF to see if we can run a
> distributed RBL under the project's umbrella so stay tuned on that...
>

If this is open source, why not take advantage of all of the repositories
available for this? Git? SourceForge? Mirrors? The problem isn't merely
the lack of a backup, but a single point of failure waiting to happen.
When something goes wrong with kernel.org or the like, it isn't backup
tapes that get them online again quickly.

I don't know enough about it, but I know the general principle that if you
have a problem you're likely not the first to encounter it, and you usually
don't need to invent a solution, just look into how it has been solved
elsewhere.