I think breaking backwards compat is sensible since It's easily caught by the 
compiler and  in this case where the alternative is a 
Runtime error that can result in terabytes of mucked up output.

> On May 29, 2014, at 6:11 AM, Matt Fellows <matt.fell...@bespokesoftware.com> 
> wrote:
> 
> As someone who doesn't really contribute, just lurks, I could well be 
> misinformed or under-informed, but I don't see why we can't deprecate a 
> method which could cause dangerous side effects?  
> People can still use the deprecated methods for backwards compatibility, but 
> are discouraged by compiler warnings, and any changes they write to their 
> code can start to use the new functionality?
> 
> *Apologies if I'm stepping into a Hadoop holy war here
> 
> 
>> On Thu, May 29, 2014 at 10:47 AM, Niels Basjes <ni...@basjes.nl> wrote:
>> My original proposal (from about 3 years ago) was to change the isSplitable
>> method to return a safe default ( you can see that in the patch that is
>> still attached to that Jira issue).
>> For arguments I still do not fully understand this was rejected by Todd and
>> Doug.
>> 
>> So that is why my new proposal is to deprecate (remove!) the old method
>> with the typo in Hadoop 3.0 and replace it with something correct and less
>> error prone.
>> Given the fact that this would happen in a major version jump I thought
>> that would be the right time to do that.
>> 
>> Niels
>> 
>> 
>> On Thu, May 29, 2014 at 11:34 AM, Steve Loughran 
>> <ste...@hortonworks.com>wrote:
>> 
>> > On 28 May 2014 20:50, Niels Basjes <ni...@basjes.nl> wrote:
>> >
>> > > Hi,
>> > >
>> > > Last week I ran into this problem again
>> > > https://issues.apache.org/jira/browse/MAPREDUCE-2094
>> > >
>> > > What happens here is that the default implementation of the isSplitable
>> > > method in FileInputFormat is so unsafe that just about everyone who
>> > > implements a new subclass is likely to get this wrong. The effect of
>> > > getting this wrong is that all unit tests succeed and running it against
>> > > 'large' input files (>>64MiB) that are compressed using a non-splittable
>> > > compression (often Gzip) will cause the input to be fed into the mappers
>> > > multiple time (i.e. you get garbage results without ever seeing any
>> > > errors).
>> > >
>> > > Last few days I was at Berlin buzzwords talking to someone about this bug
>> > >
>> >
>> > that was me, I recall.
>> >
>> >
>> > > and this resulted in the following proposal which I would like your
>> > > feedback on.
>> > >
>> > > 1) This is a change that will break backwards compatibility (deliberate
>> > > choice).
>> > > 2) The FileInputFormat will get 3 methods (the old isSplitable with the
>> > > typo of one 't' in the name will disappear):
>> > >     (protected) isSplittableContainer --> true unless compressed with
>> > > non-splittable compression.
>> > >     (protected) isSplittableContent --> abstract, MUST be implemented by
>> > > the subclass
>> > >     (public)      isSplittable --> isSplittableContainer &&
>> > > isSplittableContent
>> > >
>> > > The idea is that only the isSplittable is used by other classes to know
>> > if
>> > > this is a splittable file.
>> > > The effect I hope to get is that a developer writing their own
>> > > fileinputformat (which I alone have done twice so far) is 'forced' and
>> > > 'helped' getting this right.
>> > >
>> >
>> > I could see making the attributes more explicit would be good -but stopping
>> > everything that exists from working isn't going to fly.
>> >
>> > what about some subclass, AbstractSplittableFileInputFormat that implements
>> > the container properly, requires that content one -and then calculates
>> > IsSplitable() from the results? Existing code: no change, new formats can
>> > descend from this (and built in ones retrofitted).
>> >
>> >
>> >
>> > > The reason for me to propose this as an incompatible change is that this
>> > > way I hope to eradicate some of the existing bugs in custom
>> > implementations
>> > > 'out there'.
>> > >
>> > > P.S. If you agree to this change then I'm willing to put my back into it
>> > > and submit a patch.
>> > >
>> > > --
>> > > Best regards,
>> > >
>> > > Niels Basjes
>> > >
>> >
>> > --
>> > CONFIDENTIALITY NOTICE
>> > NOTICE: This message is intended for the use of the individual or entity to
>> > which it is addressed and may contain information that is confidential,
>> > privileged and exempt from disclosure under applicable law. If the reader
>> > of this message is not the intended recipient, you are hereby notified that
>> > any printing, copying, dissemination, distribution, disclosure or
>> > forwarding of this communication is strictly prohibited. If you have
>> > received this communication in error, please contact the sender immediately
>> > and delete it from your system. Thank You.
>> >
>> 
>> 
>> 
>> --
>> Best regards / Met vriendelijke groeten,
>> 
>> Niels Basjes
> 
> 
> 
> -- 
> 
> First Option Software Ltd
> Signal House
> Jacklyns Lane
> Alresford
> SO24 9JJ
> Tel: +44 (0)1962 738232
> Mob: +44 (0)7710 160458 
> Fax: +44 (0)1962 600112
> Web: www.bespokesoftware.com
> 
> ____________________________________________________
> 
> This is confidential, non-binding and not company endorsed - see full terms 
> at www.fosolutions.co.uk/emailpolicy.html 
> First Option Software Ltd Registered No. 06340261
> Signal House, Jacklyns Lane, Alresford, Hampshire, SO24 9JJ, U.K.
> ____________________________________________________
> 

Reply via email to