Alan, I also meant to say that I didn't understand the comment about "in production it seems that DomainSocket is less commonly used". The current implementation of short-circuit read definitely utilizes DomainSocket, and it's very common to enable this in production clusters. The documentation page you mentioned includes discussion of a legacy short-circuit read implementation, which did not utilize UNIX domain sockets, but the legacy implementation is rarely used in practice now.
--Chris Nauroth

On 9/30/15, 8:46 AM, "Chris Nauroth" <cnaur...@hortonworks.com> wrote:

>Hello Alan,
>
>I think this sounds like a reasonable approach. I recommend that you file
>a JIRA with the proposal (copy-paste the content of your email into a
>comment) and then wait a few days before starting work in earnest to see
>if anyone else wants to discuss it first. I also recommend notifying
>Colin Patrick McCabe on that JIRA. It would be good to get a second
>opinion from him, since he is the original author of much of this code.
>
>--Chris Nauroth
>
>On 9/30/15, 1:14 AM, "Alan Burlison" <alan.burli...@oracle.com> wrote:
>
>>Now that the Hadoop native code builds on Solaris I've been chipping
>>away at all the test failures. About 50% of the failures involve
>>DomainSocket, either directly or indirectly. That seems to be mainly
>>because the tests use DomainSocket to do single-node testing, whereas in
>>production it seems that DomainSocket is less commonly used
>>(https://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-hdfs/ShortCircuitLocalReads.html).
>>
>>The particular problem on Solaris is that socket read/write timeouts
>>(the SO_SNDTIMEO and SO_RCVTIMEO socket options) are not supported for
>>UNIX domain (PF_UNIX) sockets. Those options are, however, supported for
>>PF_INET sockets. That's because the socket implementation on Solaris is
>>split roughly into two parts, for inet sockets and for STREAMS sockets,
>>and the STREAMS implementation lacks support for SO_SNDTIMEO and
>>SO_RCVTIMEO. As an aside, performance of sockets that use loopback or
>>the host's own IP is slightly better than that of UNIX domain sockets on
>>Solaris.
>>
>>I'm investigating getting timeouts supported for PF_UNIX sockets added
>>to Solaris, but in the meantime I'm also looking at how this might be
>>worked around in Hadoop. One way would be to implement timeouts by
>>wrapping all the read/write/send/recv etc. calls in DomainSocket.c with
>>either poll() or select().
>>
>>The basic idea is to add two new fields to DomainSocket.c to hold the
>>read/write timeouts. On platforms that support SO_SNDTIMEO and
>>SO_RCVTIMEO these would be unused, as setsockopt() would be used to set
>>the socket timeouts. On platforms such as Solaris the JNI code would use
>>the values to implement the timeouts appropriately.
>>
>>To prevent the code in DomainSocket.c becoming a #ifdef hairball, the
>>current socket IO function calls such as accept(), send(), read() etc.
>>would be replaced with macros such as HD_ACCEPT. On platforms that
>>provide timeouts these would just expand to the normal socket functions;
>>on platforms that don't support timeouts they would expand to wrappers
>>that implement the timeouts.
>>
>>The only caveat is that any code that does anything to a PF_UNIX
>>socket would *always* have to do so via DomainSocket. As far as I can
>>tell that's not an issue, but it would have to be borne in mind if any
>>changes were made in this area.
>>
>>Before I set about doing this, does the approach seem reasonable?
>>
>>Thanks,
>>
>>--
>>Alan Burlison
>>--
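
For anyone following the thread, the poll()-based workaround Alan describes could look roughly like the sketch below. It is only an illustration under stated assumptions, not code from Hadoop: hd_recv_timeout and HD_RECV are hypothetical names (the thread only mentions HD_ACCEPT as an example macro), the timeout is assumed to be passed in milliseconds, and `__sun` is assumed to be the right preprocessor test for Solaris.

```c
#include <errno.h>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Hypothetical helper, not part of Hadoop: wait up to timeout_ms for fd to
 * become readable, then call recv(). A negative timeout_ms blocks forever.
 * On timeout it returns -1 with errno set to EAGAIN, which is roughly what
 * a blocking recv() reports when SO_RCVTIMEO expires.
 */
static ssize_t hd_recv_timeout(int fd, void *buf, size_t len, int flags,
                               int timeout_ms)
{
    struct pollfd pfd;
    int rc;

    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;

    /* Simplification: an interrupted poll() restarts with the full timeout. */
    do {
        rc = poll(&pfd, 1, timeout_ms);
    } while (rc < 0 && errno == EINTR);

    if (rc < 0) {
        return -1;          /* poll() failed; errno is already set */
    }
    if (rc == 0) {
        errno = EAGAIN;     /* nothing arrived within timeout_ms */
        return -1;
    }
    /* Data, EOF, or an error is pending; recv() reports which one. */
    return recv(fd, buf, len, flags);
}

/*
 * Hypothetical macro in the style of the proposed HD_ACCEPT: platforms with
 * working SO_RCVTIMEO call recv() directly, others go through the wrapper.
 */
#if defined(__sun)
#define HD_RECV(fd, buf, len, flags, timeout_ms) \
    hd_recv_timeout((fd), (buf), (len), (flags), (timeout_ms))
#else
#define HD_RECV(fd, buf, len, flags, timeout_ms) \
    ((void)(timeout_ms), recv((fd), (buf), (len), (flags)))
#endif
```

The caller would pass the stored read timeout on every call, e.g. HD_RECV(fd, buf, len, 0, read_timeout_ms), where read_timeout_ms stands in for one of the two new fields mentioned above. Write-side calls would need a matching wrapper that polls for POLLOUT.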