On 05/10/15 18:30, Colin P. McCabe wrote:
> 1. Don't get DomainSocket working on Solaris. Rely on the legacy short-circuit read instead. It has poorer security guarantees, but doesn't require domain sockets. You can add a line of code to the failing JUnit tests to skip them on Solaris.
I really don't want to do that, as it relegates Solaris to only ever being a second-class citizen.
> 2. Use a separate "timer wheel" thread which implements coarse-grained timeouts by calling shutdown() on domain sockets that have been active for too long. This thread could be global (one per JVM).
From what I can tell, that won't stop all the test failures, because the tests are written on the assumption that per-socket timeouts are available and that they time out exactly when expected.
> 3. Implement the poll/select loop you discussed earlier. As Steve commented, it would be easier to do this by adding new functions, rather than by changing existing ones. I don't think "ifdef skid marks" are necessary since poll and select are supported on Linux and so forth as well as Solaris. You would just need some code in DomainSocket.java to select the appropriate implementation at runtime based on the OS.
I could switch the implementation over to use poll() everywhere, but I haven't done that: Linux still uses socket timeouts. The issue is that to make poll() work I need to maintain the read/write timeouts alongside the filehandle, because I can't store the timeout 'inside' the filehandle using setsockopt(). That means the filehandle and the timeouts have to be stored together somewhere, and the logical place for the timeouts is the same DomainSocket instance that holds the filehandle. If the DomainSocket JNI methods were all instance methods there wouldn't be a problem, but they aren't: they are static methods, and the integer filehandle is passed in as a parameter. Adding the timeouts to the native methods' parameter lists wouldn't work either, because the timeouts need to be read/write rather than just passed in by value. The only non-vile way I can come up with of doing this is to convert the JNI methods from static into instance methods. Even if that's the only change I make and I still pass in the filehandle as a parameter, the native signatures will change, because the second parameter becomes an object reference (jobject) rather than a class reference (jclass).
The other option is to effectively write a complete Solaris-only replacement for DomainSocket. Whether the switch between that and the current one happens at compile time or at run time isn't really the point. DomainSocket is split fairly evenly between its Java and JNI components, so whichever way it's done there will be significant duplication of the overall logic, and most likely of code as well. That means bug fixes in one place have to be mirrored exactly in the other, and that's unlikely to be sustainable.
My goal has been to keep the current logic as unchanged as possible. My prototype does that by literally prefixing each libc socket operation with a poll() call to check that the filehandle is ready; the rest of the logic in DomainSocket is completely unchanged. That means the behaviour on Linux and Solaris should be as close to identical as possible.
> Since you commented that Solaris is implementing timeout support in the future, approaches #1 or #2 could be placeholders until that's finished.
Unfortunately I can't predict when that will happen. My prototype probes for working timeouts at configure time, so when they do become available they'll be used automatically.
> I agree that there is no formal libhadoop.so compatibility policy and that is frustrating. This has been an issue for those who want to run jars compiled against multiple different versions of hadoop through the same YARN instance. We've discussed it in the past, but never really come up with a great solution. The best approach really would be to bundle libhadoop.so inside the hadoop jar files, so that it could be integral to the Hadoop version itself. However, nobody has done the work to make that happen. The second-best approach would be to include the Hadoop version in the libhadoop name itself (so we'd have libhadoop28.so for hadoop 2.8, and so forth.) Anyway, I think we can solve this particular issue without going down that rathole...
As I said, I believe that ship has long since sailed. Changes that have already been let in have, I believe, broken backwards binary compatibility of the Java/JNI interface. Broken is broken; arguing that this proposal shouldn't be allowed in because it adds more brokenness to the existing brokenness really misses the point. As far as I can tell, there is already no backwards compatibility.
-- Alan Burlison --