That's an interesting find, though I don't think we'd be able to swap in
INET sockets in this part of the code.  We use Unix domain sockets to
share an open file descriptor from the DataNode process to the HDFS client
process, and then the client reads directly from that open file
descriptor.  I think file descriptor sharing is a capability of Unix
domain sockets only (via SCM_RIGHTS ancillary data), not of INET
sockets.  As you said, I wouldn't expect
throughput on the Unix domain socket to be a bottleneck, because there is
very little data transferred.
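To illustrate the mechanism (this is just a sketch, not Hadoop's actual DomainSocket code, which is implemented in native C via JNI): an open file descriptor can be passed between processes over a Unix domain socket as SCM_RIGHTS ancillary data, after which the receiver reads from the shared open file directly.  In Python 3.9+ this looks roughly like:

```python
import os
import socket
import tempfile

# A socketpair stands in for the DataNode <-> client connection;
# fd passing works the same over a connected AF_UNIX socket.
server, client = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)

# "DataNode" side: open a file and send its descriptor over the socket.
with tempfile.TemporaryFile() as f:
    f.write(b"block data")
    f.flush()
    f.seek(0)
    # send_fds attaches the fd as SCM_RIGHTS ancillary data alongside
    # a one-byte payload -- only AF_UNIX sockets support this.
    socket.send_fds(server, [b"x"], [f.fileno()])

# "Client" side: receive the descriptor and read from it directly.
msg, fds, flags, addr = socket.recv_fds(client, 16, maxfds=1)
data = os.read(fds[0], 1024)  # reads the file via the shared descriptor
os.close(fds[0])
server.close()
client.close()
print(data)  # b'block data'
```

The received descriptor refers to the same open file description as the sender's (including the file offset), which is why the client can keep reading even after the DataNode-side handle is closed.  An INET socket has no equivalent of SCM_RIGHTS, which is why it can't be swapped in here.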

--Chris Nauroth

On 9/30/15, 9:12 AM, "Alan Burlison" <alan.burli...@oracle.com> wrote:

>On 30/09/2015 16:56, Chris Nauroth wrote:
>
>> Alan, I also meant to say that I didn't understand the comment about
>> "in production it seems that DomainSocket is less commonly used".  The
>> current implementation of short-circuit read definitely utilizes
>> DomainSocket, and it's very common to enable this in production
>> clusters.  The documentation page you mentioned includes discussion of
>> a legacy short-circuit read implementation, which did not utilize UNIX
>> domain sockets, but the legacy implementation is rarely used in
>> practice now.
>
>Oh, OK - thanks for the clarification. I couldn't find much about
>DomainSocket other than the link I posted and that didn't make it sound
>like it was used all that much. I'll make sure the JIRA reflects what
>you said above.
>
>Interestingly, INET sockets are faster than UNIX sockets on Linux as
>well as on Solaris. There's not much in it, around 10% in both cases,
>and I suspect socket throughput isn't the rate-limiting step anyway.
>
>-- 
>Alan Burlison
>--
>