Hi Alan,

As Chris commented earlier, the main use of DomainSocket is to transfer file descriptors from the DataNode to the DFSClient. As you know, this is something that can only be done through domain sockets, not through inet sockets. We do support passing data over domain sockets, but in practice we rarely turn it on, since we haven't seen a performance advantage.
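For concreteness, the mechanism behind this is the standard POSIX SCM_RIGHTS idiom: the descriptor travels as ancillary data on a sendmsg() call, and the kernel duplicates it into the receiving process. Here's a rough sketch of the sending side (this is the generic idiom, not the actual libhadoop code):

/* Sketch of fd-passing over a Unix domain socket via SCM_RIGHTS.
 * Illustrative only, not the actual libhadoop implementation. */
#include <string.h>
#include <sys/socket.h>

int send_fd(int sock, int fd_to_send)
{
    char dummy = '*';                  /* must transmit at least one byte */
    struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;          /* forces correct alignment */
    } u;
    struct msghdr msg;

    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;      /* kernel dups the fd for the peer */
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd_to_send, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

The receiving side does the matching recvmsg() and pulls the duplicated descriptor out of the control message. There is no inet-socket equivalent of this, which is why DomainSocket exists at all.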
As I see it, you have a few different options here for getting this working on Solaris:

1. Don't get DomainSocket working on Solaris. Rely on the legacy short-circuit read instead. It has weaker security guarantees, but doesn't require domain sockets. You can add a line of code to the failing junit tests to skip them on Solaris.

2. Use a separate "timer wheel" thread which implements coarse-grained timeouts by calling shutdown() on domain sockets that have been active for too long. This thread could be global (one per JVM). (There's a rough sketch of this at the bottom of this mail.)

3. Implement the poll/select loop you discussed earlier. As Steve commented, it would be easier to do this by adding new functions rather than by changing existing ones. I don't think "ifdef skid marks" are necessary, since poll and select are supported on Linux and the other platforms as well as on Solaris. You would just need some code in DomainSocket.java to select the appropriate implementation at runtime based on the OS. (Also sketched at the bottom of this mail.)

Since you commented that Solaris will be adding timeout support in the future, approaches #1 or #2 could serve as placeholders until that's finished.

I agree that there is no formal libhadoop.so compatibility policy, and that this is frustrating. This has been an issue for those who want to run jars compiled against multiple different versions of Hadoop through the same YARN instance. We've discussed it in the past, but never really come up with a great solution. The best approach really would be to bundle libhadoop.so inside the Hadoop jar files, so that it could be integral to the Hadoop version itself. However, nobody has done the work to make that happen. The second-best approach would be to include the Hadoop version in the libhadoop name itself (so we'd have libhadoop28.so for Hadoop 2.8, and so forth). Anyway, I think we can solve this particular issue without going down that rathole...

best,
Colin

On Mon, Oct 5, 2015 at 7:56 AM, Alan Burlison <alan.burli...@oracle.com> wrote:
> On 05/10/2015 15:14, Steve Loughran wrote:
>
>> I don't think anyone would object to the changes, except for one big
>> caveat: a lot of us would like that binary file to be backwards
>> compatible; a Hadoop 2.6 JAR should be able to link to the 2.8+
>> libhadoop. So whatever gets changed, the old methods are still going
>> to hang around.
>
> That's not achievable, as the method signatures need to change. Even though
> they are private, they need to change from static to normal methods, and
> the signatures need to change as well, as I said.
>
> JNI code is intimately intertwined with the Java code it runs with. Running
> mismatched Java & JNI versions is going to be a recipe for eventual
> disaster, as the JVM explicitly does *not* do any error checking between
> Java and JNI. At some point some innocuous change will be made that will
> just cause undefined behaviour.
>
> I don't actually know how you'd get a JAR/JNI mismatch, as they are built
> and packaged together, so I'm struggling to understand what the potential
> issue is here.
>
> In any case, the constraint you are requesting would flat-out preclude this
> change, and would also mean that most of the other JNI changes that have
> been committed recently would have to be ripped out as well. In summary,
> the bridge is already burned.
>
> --
> Alan Burlison
> --
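P.S. The sketches promised above. For option #2, the idea is a single watchdog thread that sweeps a registry of active sockets and calls shutdown() on any that have outlived their deadline; shutdown() wakes whichever thread is blocked in read() or write() on that socket, and the blocked call fails. In Hadoop this would be a per-JVM Java daemon thread, but the mechanics are easiest to show in C, and the registry bookkeeping here is invented purely for illustration:

/* Sketch of option #2: one global watchdog thread that shutdown()s
 * sockets which have been active too long.  The registry is invented
 * for illustration; in Hadoop this would be a Java daemon thread. */
#include <pthread.h>
#include <sys/socket.h>
#include <time.h>
#include <unistd.h>

#define MAX_SOCKS 1024

struct entry { int fd; time_t deadline; };   /* deadline 0 = free slot */
static struct entry registry[MAX_SOCKS];
static pthread_mutex_t reg_lock = PTHREAD_MUTEX_INITIALIZER;

/* Called before a potentially blocking operation on fd. */
static void arm_timeout(int fd, int timeout_secs)
{
    pthread_mutex_lock(&reg_lock);
    for (int i = 0; i < MAX_SOCKS; i++) {
        if (registry[i].deadline == 0) {
            registry[i].fd = fd;
            registry[i].deadline = time(NULL) + timeout_secs;
            break;
        }
    }
    pthread_mutex_unlock(&reg_lock);
}

static void *watchdog(void *arg)
{
    (void)arg;
    for (;;) {
        sleep(1);                            /* coarse-grained by design */
        pthread_mutex_lock(&reg_lock);
        for (int i = 0; i < MAX_SOCKS; i++) {
            if (registry[i].deadline != 0 &&
                time(NULL) > registry[i].deadline) {
                /* Wakes any thread blocked in read()/write() on the fd;
                 * the blocked call returns and the caller sees a timeout. */
                shutdown(registry[i].fd, SHUT_RDWR);
                registry[i].deadline = 0;
            }
        }
        pthread_mutex_unlock(&reg_lock);
    }
    return NULL;
}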
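For option #3, the portable way to get a read timeout on a domain socket is to poll() before each blocking read: poll() takes a millisecond timeout and behaves the same on Solaris as it does on Linux and the BSDs. Again just a sketch, with a function name of my own invention rather than an existing libhadoop entry point:

/* Sketch of option #3: emulate a socket read timeout with poll().
 * The name read_with_timeout is mine, not an existing libhadoop
 * function. */
#include <errno.h>
#include <poll.h>
#include <unistd.h>

ssize_t read_with_timeout(int fd, void *buf, size_t len, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };

    int rc = poll(&pfd, 1, timeout_ms);
    if (rc == 0) {                    /* deadline passed, nothing readable */
        errno = ETIMEDOUT;
        return -1;
    }
    if (rc < 0)                       /* poll() itself failed */
        return -1;

    return read(fd, buf, len);        /* data, EOF or error is ready now */
}

A matching write_with_timeout() would poll for POLLOUT instead, and DomainSocket.java would pick between this implementation and the existing one at runtime based on the OS.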