On Mon, Jun 16, 2014 at 12:36 PM, Kevin A. McGrail <kmcgr...@pccc.com>
wrote:

> On 6/16/2014 9:49 AM, Joe Quinn wrote:
>
>> On 6/16/2014 9:42 AM, Dave Pooser wrote:
>>
>>> On 5/30/14 11:11 AM, "Kevin A. McGrail" <kmcgr...@pccc.com> wrote:
>>>
>>>> Good time for an update to the users list about the issue.  The box that
>>>> processed the updates at the ASF colo failed catastrophically during a
>>>> power surge that took down some other boxes as well. Unfortunately, while
>>>> the project requested backups in 2009, they were not implemented.
>>>>
>>> Now that the update box is back online (and thanks for all your hard work
>>> on that! Systems archaeology is no fun at all), is there anything useful
>>> the community can do to help prevent another such catastrophe? I'd be
>>> willing to contribute hardware and/or VM space at $WORKPLACE for an
>>> offsite replica as long as we wouldn't need to sync more than 2-4GB/day
>>> after the initial setup completed.
>>>
>>
>> If you have access to any SA boxes, make sure they have a scheduled
>> backup (and make sure the backup works and has all important data!). If any
>> systems do not have backups, report it to the appropriate list.
>>
>> Also make sure every task the box is designed to handle is appropriately
>> documented, including user accounts required, libraries required and their
>> versions, what crontabs should be, etc.
>>
> I think Joe's answer is correct but at the same time doesn't answer the
> question of what the community at large can do to help.
>
> First, the overall takeaway for me is that documentation is important.
> This was a hard task when it was just a box failure. When it became a box
> failure with missing backups and large documentation issues, it effectively
> became a personal mission to get the box working.  A little documentation
> went a long way for me, and especially in the open source world, people
> burn out or go on to other projects, so it helps if you try to document.
>
> Second, to answer your question less philosophically, from the community
> we always need:
>
> - Masscheckers - You run automated nightly code against a hand-sorted
> spam/ham corpus to improve our rule scoring.  Once you get it set up and
> have good, sorted email, the system is very automated.
>
> - Rule writers - Spam evolves and we need people to write rules.  If you
> like balancing your checkbook, doing Sudoku, or seeing patterns in gibberish,
> this might be perfect for you.  And what I love doing is evolving the rule
> writing from manual to automated processing.  It's quite a science really!
>
> - Coders - The lifeblood of a project really.  If you want to help write
> code, become a committer and help drive this project on the PMC, speak up!
>
> - Testers - People who will use trunk on production systems and give
> constructive feedback on real-world mail flow.  NOTE: trunk is usually in
> good shape and runs on many of the committers' systems. Because the
> system is plugin-based, the experimental stuff is usually not enabled
> by default.
>
> - RBL Stuff - I'm also still working with the ASF to see if we can run a
> distributed RBL under the project's umbrella, so stay tuned on that...
>
>
If this is open source, why not take advantage of all of the repositories
available for this?  Git?  Sourceforge?  Mirrors?  The problem isn't merely
the lack of a backup, but a single point of failure waiting to happen.
When something goes wrong with kernel.org or the like, it isn't backup
tapes that get them online again quickly.  I don't know enough about it,
but I know the general principle that if you have a problem you're likely
not the first to encounter it, and you usually don't need to invent a
solution, just look into how it has been solved elsewhere.
