I just spotted one: HADOOP-10027. A field was removed from the Java
layer, but it could still be referenced by an older version of the
native layer. A backwards-compatible version of that patch would have
preserved the old field in the Java layer.
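For illustration, a minimal sketch of how that class of break surfaces.
The class and field names below are hypothetical stand-ins, not the
actual ones from HADOOP-10027; the point is that the native side looks
the field up by name, so removing it from the Java class only fails at
run time, when the lookup executes.

    /* Older native code, compiled and shipped before the Java field
     * was removed. "org.example.NativeCodec" and its "buf" field are
     * hypothetical. */
    #include <jni.h>

    JNIEXPORT void JNICALL
    Java_org_example_NativeCodec_process(JNIEnv *env, jobject self)
    {
        jclass cls = (*env)->GetObjectClass(env, self);

        /* If the newer Java class no longer declares "buf", GetFieldID
         * returns NULL and posts a NoSuchFieldError. Nothing catches
         * this at build or load time; it fires on first use. */
        jfieldID buf_id = (*env)->GetFieldID(env, cls, "buf", "[B");
        if (buf_id == NULL) {
            return; /* pending NoSuchFieldError is thrown in Java */
        }

        jobject buf = (*env)->GetObjectField(env, self, buf_id);
        (void)buf; /* ... real code would use the buffer here ... */
    }

Keeping the old field in the Java class, even if nothing in the
rewritten Java code reads it, keeps that lookup working for an older
libhadoop.so.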
Full disclosure: I was the one who committed that patch, so this was a
miss by me during the code review.

--Chris Nauroth

On 10/6/15, 9:03 AM, "Chris Nauroth" <cnaur...@hortonworks.com> wrote:

>Alan, would you please list the specific patches/JIRA issues that broke
>compatibility? I have not been reviewing the native code lately, so it
>would help me catch up quickly if you already know which specific
>patches have introduced problems. If those patches currently reside
>only on trunk and branch-2, then they have not yet shipped in an Apache
>release. We'd still have an opportunity to fix them and avoid "dropping
>the match" before shipping 2.8.0.
>
>Yes, we are aware that binary compatibility goes beyond the function
>signatures and into data layout and semantics.
>
>--Chris Nauroth
>
>On 10/6/15, 8:25 AM, "Alan Burlison" <alan.burli...@oracle.com> wrote:
>
>>On 06/10/2015 10:52, Steve Loughran wrote:
>>
>>>> That's not achievable, as the method signatures need to change.
>>>> Even though they are private, they need to change from static to
>>>> normal methods, and the signatures need to change as well, as I
>>>> said.
>>>
>>> We've done it before, simply by retaining the older method entry
>>> points. Moving from static to instance-specific is a bigger change.
>>> If the old entry points are there and retained, even if all uses
>>> have been ripped out of the Hadoop code, then the new methods will
>>> get used. It's just that old stuff will still link.
>>
>>As I explained in my last email, converting the old static JNI
>>functions into wrappers around new instance JNI functions requires a
>>jobject reference to be passed into the new function that the old one
>>wraps. The static methods can't magic one up. An instance pointer *is*
>>available: the current code flow is Java object method -> static JNI
>>function, so if we could change the JNI functions from static to
>>instance, we'd have what we needed. But if you are considering the JNI
>>layer to be a public interface (which I think is a big mistake, no
>>matter how convenient it might be), then you are simply screwed, both
>>here and in other places. As I've said, I have a suspicion that
>>changes we've already made have broken that compatibility anyway.
>>
>>>> JNI code is intimately intertwined with the Java code it runs with.
>>>> Running mismatched Java & JNI versions is going to be a recipe for
>>>> eventual disaster, as the JVM explicitly does *not* do any error
>>>> checking between Java and JNI.
>>>
>>> You mean JNI code built for Java 7 isn't guaranteed to work on Java
>>> 8? If so, that's not something we knew of, and something to worry
>>> about.
>>
>>Actually, I think that particular scenario is going to be OK. I wasn't
>>clear - sorry - what I was musing about was the fact that the Hadoop
>>JNI IO code delves into the innards of the platform Java classes and
>>pulls out bits of private data. That's explicitly not-an-interface and
>>could break at any time; the likelihood may be low, but the JVM
>>developers could change it and you'd just be SOL. The same goes for
>>all the other private Java interfaces that Hadoop consumes - all the
>>ones you get warnings about when you build it. There are already plans
>>to make significant changes to sun.misc.Unsafe, for example. That will
>>affect Hadoop.
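For illustration, this is roughly what "pulling out bits of private
data" looks like in JNI: native code reading the private "fd" slot of
java.io.FileDescriptor, much as Hadoop's NativeIO does. The field name
and type are JDK implementation details, not a supported interface, so
a JVM change here breaks the lookup with no warning at build time.

    /* Cached once at library load. "fd" is a private field of
     * java.io.FileDescriptor, a JDK implementation detail that the
     * JDK is free to rename or retype in any release. */
    #include <jni.h>

    static jfieldID fd_field;

    static void cache_fd_field(JNIEnv *env)
    {
        jclass cls = (*env)->FindClass(env, "java/io/FileDescriptor");
        /* If a future JDK drops or renames the field, GetFieldID
         * returns NULL and posts NoSuchFieldError at run time. */
        fd_field = (*env)->GetFieldID(env, cls, "fd", "I");
    }

    /* Extract the raw OS file descriptor from a FileDescriptor
     * object. */
    static int get_raw_fd(JNIEnv *env, jobject fd_object)
    {
        return (int)(*env)->GetIntField(env, fd_object, fd_field);
    }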
>>
>>>> At some point some innocuous change will be made that will just
>>>> cause undefined behaviour.
>>>>
>>>> I don't actually know how you'd get a JAR/JNI mismatch, as they are
>>>> built and packaged together, so I'm struggling to understand what
>>>> the potential issue is here.
>>>
>>> It arises whenever you try to deploy to YARN any application
>>> containing, directly or indirectly (e.g. inside the spark-assembly
>>> JAR), the Hadoop Java classes of a previous Hadoop version.
>>> libhadoop is on the PATH of the far end, your app uploads its Hadoop
>>> JARs, and the moment something tries to use the JNI-backed method
>>> you get to see a stack trace.
>>>
>>> https://issues.apache.org/jira/browse/HADOOP-11064
>>>
>>> If you look at the patch there, that's the kind of thing I'd like to
>>> see to address your Solaris issues.
>>
>>Hmm, yes. That appears to be a short-term hack-around to keep things
>>running, not a fix. At very best, it's extremely fragile.
>>
>> From the bug:
>>
>>"We don't have any way of enforcing C API stability. Jenkins doesn't
>>check for it, most Java programmers don't know how to achieve it."
>>
>>In which case I think reading this will be helpful:
>>http://docs.oracle.com/cd/E19253-01/817-1984/chapter5-84101/index.html
>>
>>The assumption seems to be that as long as libhadoop.so keeps the same
>>list of functions with the same arguments, then it will be
>>backwards-compatible. Unfortunately that's just flat-out wrong. Binary
>>compatibility requires more than that. It also requires that there are
>>no changes to any data structures, and that the semantics of all the
>>functions remain completely unchanged. I'd put money on that not being
>>the case already. The errors you saw in HADOOP-11064 are the easy ones
>>because you got a run-time linker error. The others will cause
>>mysterious behaviour, memory corruption and general WTFness.
>>
>>>> In any case, the constraint you are requesting would flat-out
>>>> preclude this change, and would also mean that most of the other
>>>> JNI changes that have been committed recently would have to be
>>>> ripped out as well. In summary, the bridge is already burned.
>>>
>>> We've covered the bridge in petrol but not quite dropped a match on
>>> it.
>>
>>No, I'm reasonably certain you've already dropped the match, and if
>>you haven't, it's just good fortune.
>>
>>--
>>Alan Burlison
>>--
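To make the semantics point above concrete, a hypothetical example, not
taken from the Hadoop codebase: the prototype is byte-for-byte
identical across two library versions, so every signature-based
compatibility check passes, yet mixing callers built against one
version with a library of the other corrupts memory.

    /* v1: returns a pointer into an internal static buffer.
     * The caller must NOT free it. */
    const char *hx_get_path(int id);

    /* v2: same prototype, but now returns freshly malloc'd memory
     * that the caller must free. An old caller running against v2
     * leaks on every call; worse, a new caller running against v1
     * calls free() on a static buffer and corrupts the heap. No
     * linker error, no stack trace at the point of the bug, just the
     * "mysterious behaviour" described above. */
    const char *hx_get_path(int id);

This is the class of break that keeping function names and argument
lists stable cannot catch.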