It sounds like what we're coming around to is that we're comfortable with this 
proposal at the Java API layer, but the potential for impact at the shell 
interface makes it overall uncomfortable to declare anything in the 
compatibility guidelines.  I think that's a fair assessment.

Steve's example of AclEntry#toString is very relevant here.  Here I've been 
arguing that we should -1 patches that rely on toString output for any 
externalized format (serialization, UI, shell, etc.).  I wrote or code reviewed 
all of the HDFS ACL code, and yet this got past me.  Diligence during code 
review is not infallible.  I have assigned HADOOP-13150 to myself to change the 
HDFS ACL commands to avoid dependencies on toString.

Let's try to avoid this in new bits of code and update existing code as we find 
unfortunate dependencies on toString.

--Chris Nauroth

From: Steve Loughran <ste...@hortonworks.com<mailto:ste...@hortonworks.com>>
Date: Saturday, May 14, 2016 at 1:01 PM
To: Allen Wittenauer <a...@apache.org<mailto:a...@apache.org>>
Cc: Chris Nauroth <cnaur...@hortonworks.com<mailto:cnaur...@hortonworks.com>>, 
"common-dev@hadoop.apache.org<mailto:common-dev@hadoop.apache.org>" 
<common-dev@hadoop.apache.org<mailto:common-dev@hadoop.apache.org>>
Subject: Re: Compatibility guidelines for toString overrides


On 14 May 2016, at 18:39, Allen Wittenauer 
<a...@apache.org<mailto:a...@apache.org>> wrote:


On May 12, 2016, at 12:20 PM, Chris Nauroth 
<cnaur...@hortonworks.com<mailto:cnaur...@hortonworks.com>> wrote:

Hello Allen,

The intent is not for this rule to override other compatibility rules,
such as the important CLI output rule.  It's also not intended to give us
free reign to change existing toString() implementations without due
diligence.  If a patch changes an existing toString() implementation that
already goes out to the shell or any other form of external serialization,
then the patch needs to be declined.  (I concur that relying on toString()
like this should never be done, and I'd encourage us to fix any
occurrences we find.)


The problem is that we as a community do a terrible job on following the 
compatibility guidelines when it comes to CLI output.  Because while this maybe 
true:

Instead, the intent is to advertise to Java API consumers that toString()
output may evolve freely, and therefore we recommend against writing Java
code that depends on that output format.

... what's going to happen is people are going to assume that because toString 
is now considered Unstable, all output will be considered Unstable by way of 
mental short cut.  There have been many, many instances over the past year or 
two where even long time PMC members have claimed CLI output was already 
Unstable/outside the scope of the compatibility guidelines and I just see this 
re-inforcing that (incorrect) world view.

HDFS-9732 is a good example of how to handle this.  I didn't explicitly -1
it, but I did call out the CLI compatibility problem and recommend how to
change the patch so that it preserves compatibility.

Does this help address your concerns, or is the full code audit the only
thing that would convince you to lift your -1?

No, because "perception is reality".  If we want to make toString Unstable, 
then we need to be very clear as to which toStrings are Unstable and which 
aren't.  This isn't a universal rule without doing some legwork.


I am painfully coming round to this point of view *on existing classes*


  1.  We need some interface "StableString" which we can tag to say "this is 
stable"; new CLI-ready code can use it.
  2.  We need to see where we use stuff today., by backchaining from uses of 
System.out & System.err, and from TableListing uses especially (which should 
implement StableString itself obviously).
  3.  Or: look at where toString() is subclassed and go from there...there's 
less than 800 such overrides. But here you can't tell where the use of a string 
is going to be. There are obvious ones (DU, DF, others which appear to be 
useful, AclEntry.toString(), but many which look like IDE-generated patterns 
for debug.


There's another tactic, which would be to add some explicit log diagnostics 
method which we offered no guarantees on ... but the challenge then becomes one 
of "how to use efficiently in logging". That could be addressed, equally 
painfully, by having our own log class (extending/mimicing slfj4j and which 
handles that message specially. That would be an absolute PITA to work with. I 
think I'd rather look at our command line use of toString values

Now, what about new bits of code?

Reply via email to