On 06/10/2015 10:52, Steve Loughran wrote:

>> That's not achievable as the method signatures need to change. Even
>> though they are private they need to change from static to normal
>> methods and the signatures need to change as well, as I said.

> We've done it before, simply by retaining the older method entry
> points. Moving from static to instance-specific is a bigger change.
> If the old entry points are there and retained, even if all uses have
> been ripped out of the hadoop code, then the new methods will get
> used. It's just that old stuff will still link.

As I explained in my last email, converting the old static JNI functions into wrappers around new instance JNI functions requires a jobject reference to be passed into the new function that the old one wraps. The static methods can't magic one up. An instance pointer *is* available - the current code flow is Java object method -> static JNI function - so if we could change the JNI functions from static to instance then we'd have what we need.

But if you consider the JNI layer to be a public interface (which I think is a big mistake, no matter how convenient it might be), then you are simply screwed, both here and in other places. As I've said, I suspect that changes we've already made have broken that compatibility anyway.

>> JNI code is intimately intertwined with the Java code it runs
>> with. Running mismatching Java & JNI versions is going to be a
>> recipe for eventual disaster as the JVM explicitly does *not* do
>> any error checking between Java and JNI.

> You mean jni code built for java7 isn't guaranteed to work on Java 8?
> If so, that's not something we knew of - and something to worry
> about.

Actually I think that particular scenario is going to be OK. I wasn't clear - sorry - what I was musing about was the fact that the Hadoop JNI IO code delves into the innards of the platform Java classes and pulls out bits of private data. That's explicitly not-an-interface and could break at any time; the likelihood may be low, but the JVM developers could change it and you'd just be SOL. The same goes for all the other private Java interfaces that Hadoop consumes - all the ones you get warnings about when you build it. For example, there are already plans to make significant changes to sun.misc.Unsafe, and that will affect Hadoop.

>> At some point some innocuous change will be made that will just
>> cause undefined behaviour.
>>
>> I don't actually know how you'd get a JAR/JNI mismatch as they are
>> built and packaged together, so I'm struggling to understand what
>> the potential issue is here.

> it arises whenever you try to deploy to YARN any application
> containing directly or indirectly (e.g. inside the spark-assembly
> JAR) the Hadoop Java classes of a previous Hadoop version. libhadoop
> is on the PATH at the far end, your app uploads its Hadoop JARs, and
> the moment something tries to use a JNI-backed method you get to see
> a stack trace.
>
> https://issues.apache.org/jira/browse/HADOOP-11064
>
> if you look at the patch there, that's the kind of thing I'd like to
> see to address your Solaris issues.

Hmm, yes. That appears to be a short-term hack-around to keep things running, not a fix. At very best, it's extremely fragile.

From the bug:

"We don't have any way of enforcing C API stability. Jenkins doesn't check for it, most Java programmers don't know how to achieve it."

In which case I think reading this will be helpful: http://docs.oracle.com/cd/E19253-01/817-1984/chapter5-84101/index.html

The assumption seems to be that as long as libhadoop.so keeps the same list of functions with the same arguments, it will be backwards-compatible. Unfortunately that's just flat-out wrong: binary compatibility requires more than that. It also requires that there are no changes to any data structures and that the semantics of all the functions remain completely unchanged. I'd put money on that not being the case already. The errors you saw in HADOOP-11064 are the easy ones, because you got a run-time linker error. The others will cause mysterious behaviour, memory corruption and general WTFness.

>> In any case the constraint you are requesting would flat-out
>> preclude this change, and would also mean that most of the other
>> JNI changes that have been committed recently would have to be
>> ripped out as well. In summary, the bridge is already burned.

> We've covered the bridge in petrol but not quite dropped a match on
> it.

No, I'm reasonably certain you've already dropped the match, and if you haven't, it's just good fortune.

--
Alan Burlison