Joseph Myers <jos...@codesourcery.com>:
> To me, that indicates that using a conversion tool that is conservative in 
> its heuristics, and then selectively applying improvements to the extent 
> they can be done safely with manual review in a reasonable time, is better 
> than applying a conversion tool with more aggressive heuristics.

There's a more general point here, which I'm developing in my
book-in-progress.

Clean data-conversion problems can be done algorithmically without a
human in the loop.  Messy data-conversion problems need judgment
amplifiers.

Maxim's scripts try to treat a messy conversion problem as though it
were a clean one. Maxim is pretty sharp, so this almost works. Almost.
But the failure mode is predictable - overinterpreting badly-formed
input leads to plausible garbage on output.  

When this happens, it's the Goddess Eris's way of telling you that
there needs to be human judgment in the loop.  Instead of trying to
automate it out, you should be building tools that partition the process
into things a computer does well, driven by choices a human makes well.
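
A minimal sketch of that kind of partition, in Python, using author
attribution as the example.  Every name in it (Commit, partition_commits,
apply_decisions, the author map) is invented for illustration; none of it
is reposurgeon's actual API.  The machine handles only the cases it can
match exactly and queues everything else for a person.

```python
from dataclasses import dataclass


@dataclass
class Commit:
    rev: int        # hypothetical: SVN revision number
    svn_user: str   # hypothetical: raw SVN username on the commit


def partition_commits(commits, known_authors):
    """Split commits into those the machine can attribute safely and
    those that need a human decision.  No guessing happens here."""
    resolved, needs_review = {}, []
    for c in commits:
        if c.svn_user in known_authors:
            # Exact match in a reviewed map: safe to automate.
            resolved[c.rev] = known_authors[c.svn_user]
        else:
            # Anything else is a judgment call: queue it for a person.
            needs_review.append(c)
    return resolved, needs_review


def apply_decisions(resolved, needs_review, decisions):
    """Merge human choices (rev -> author) into the result.  Commits the
    human has not ruled on keep their raw SVN username unchanged."""
    final = dict(resolved)
    for c in needs_review:
        final[c.rev] = decisions.get(c.rev, c.svn_user)
    return final
```

The conservative default matters here: a commit nobody has ruled on keeps
its original metadata rather than acquiring plausible garbage.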

This is a point that needs making because programmers thrown at messy
conversion problems tend to be more fixated on achieving full
automation than they perhaps ought to be.

Elsewhere I have written of Zeno tarpits:
http://esr.ibiblio.org/?p=6772
Subversion dump streams are not quite a Zeno tarpit - they actually
obey something that has the effect of a formal specification - but
ChangeLog parsing is.

> The issues with the reposurgeon conversion listed in Maxim's last comments 
> were of the form "reposurgeon is being conservative in how it generates 
> metadata from SVN information".  I think that's a very good basis for 
> adding on a limited set of safe improvements to authors and commit 
> messages that can be done reasonably soon and then doing the final 
> conversion with reposurgeon.

The flip side of this is that Joseph has been making intelligent and
realistic suggestions for how to improve reposurgeon.  That is
*invaluable* - it captures knowledge that will make future conversions
easier and better.

Software engineers (outside of a few AI specialists) don't ordinarily
think of themselves as being in the knowledge-capture business. But
it's a useful perspective to cultivate.
-- 
                <a href="http://www.catb.org/~esr/">Eric S. Raymond</a>

