On Fri, Jun 13, 2014 at 2:54 AM, Niels Basjes <ni...@basjes.nl> wrote:
> Hmmm, people only look at logs when they have a problem. So I don't think
> this would be enough.

This change to the framework would disrupt users in order to make
debugging easier for InputFormat authors. The latter are a much smaller
population, and better equipped to handle this complexity.

A log statement would print during submission, so it would be visible
to users. If a user's job is producing garbage but submission was
non-interactive, a log statement would be sufficient to debug the
issue. If the naming conflict is common in some contexts, the warning
can be disabled using the log configuration.
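
As a rough illustration of the kind of submission-time warning meant
here, consider the sketch below. It is not framework code: the class
name, the method, and the extension list are all hypothetical, and
java.util.logging stands in for whatever logger the framework actually
uses.

```java
import java.util.Arrays;
import java.util.List;
import java.util.logging.Logger;

// Hypothetical sketch; none of these names exist in Hadoop.
class SplittableWarning {
    private static final Logger LOG =
        Logger.getLogger(SplittableWarning.class.getName());

    // Assumed, illustrative list of extensions that usually indicate
    // a file that cannot be split at arbitrary byte offsets.
    private static final List<String> SUSPECT_EXTENSIONS =
        Arrays.asList(".gz");

    /**
     * Logs a warning and returns true when the file name looks
     * unsplittable but the format reports it as splittable.
     */
    public static boolean warnIfSuspicious(String fileName,
                                           boolean formatSaysSplittable) {
        if (!formatSaysSplittable) {
            return false; // nothing to warn about
        }
        for (String ext : SUSPECT_EXTENSIONS) {
            if (fileName.endsWith(ext)) {
                LOG.warning("Input " + fileName + " ends with " + ext
                    + " but the InputFormat reports it as splittable");
                return true;
            }
        }
        return false;
    }
}
```

Because the message goes through a named logger, raising that logger's
level in the log configuration would silence it where the naming
conflict is expected.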

Beyond that, input validation is the responsibility of the InputFormat author.

> Perhaps this makes sense:
> - For 3.0: Shout at the developer who does it wrong (i.e. make it abstract
> and force them to think about this) i.e. Create new abstract method
> isSplittable (tt) in FileInputFormat, remove isSplitable (one t).
>
> To avoid needless code duplication (which we already have in the codebase)
> create a helper method something like 'fileNameIndicatesSplittableFile' (
> returns enum:  Splittable/NonSplittable/Unknown ).
>
> - For 2.x: Keep the enduser safe: Avoid "silently producing garbage" in all
> situations where the developer already did it wrong. (i.e. change
> isSplitable ==> return false) This costs performance only in those
> situations where the developer actually did it wrong (i.e. they didn't
> think this through)
>
> How about that?
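
For reference, the helper proposed in the quoted text might look
something like the sketch below. Only the method name and the three
enum outcomes come from the proposal; the enclosing class, the enum
type name, and the extension rules are assumptions made for
illustration, and a real implementation would more likely consult the
configured compression codecs.

```java
import java.util.Locale;

// Hypothetical sketch of the proposed helper; not existing Hadoop API.
class SplitHints {
    enum Splittability { SPLITTABLE, NON_SPLITTABLE, UNKNOWN }

    /**
     * Guesses from the file name alone whether the file can be split.
     * The extension rules below are illustrative assumptions.
     */
    public static Splittability fileNameIndicatesSplittableFile(String fileName) {
        String name = fileName.toLowerCase(Locale.ROOT);
        if (name.endsWith(".gz")) {
            // A gzip stream cannot be decoded from an arbitrary offset.
            return Splittability.NON_SPLITTABLE;
        }
        if (name.endsWith(".bz2")) {
            // bzip2 is block-oriented, so splitting is possible in principle.
            return Splittability.SPLITTABLE;
        }
        // The name alone says nothing; the InputFormat has to decide.
        return Splittability.UNKNOWN;
    }
}
```

The Unknown outcome is what would force an InputFormat author to make
an explicit decision instead of inheriting a default.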

-1 on the 2.x change for compatibility reasons.

While we can break compatibility in the 3.x line, the tradeoff is
still not very compelling, frankly. -C
