Hi Demai,

Nearly all input and output stream operations talk directly to the
DataNode (DN) without involving the NameNode (NN).  The NameNode handles
metadata operations such as renaming or opening files, not reading data.
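
As a rough illustration of what that means for NN load, here is a
back-of-envelope sketch in Java.  It assumes a deliberately simplified
model in which each file open costs one NameNode RPC and reads cost
none; real clients issue other metadata calls too, so treat the result
as a lower bound.  The class name, the method names, and the value of 3
opens per client are hypothetical; the client count and RPC rates are
the figures quoted in this thread.

```java
// Back-of-envelope estimate of NameNode RPC load.
// Simplifying assumption: one NameNode RPC per file open, zero per read.
class NameNodeLoadEstimate {

    // Total NameNode RPCs generated if every client opens the same number of files.
    static long totalNameNodeRpcs(long clients, long opensPerClient) {
        return clients * opensPerClient;
    }

    // Seconds the NameNode needs to serve that many RPCs at a sustained rate.
    static double secondsToServe(long totalRpcs, long rpcsPerSecond) {
        return (double) totalRpcs / rpcsPerSecond;
    }

    public static void main(String[] args) {
        long clients = 100_000;    // upper end of the client count in this thread
        long opensPerClient = 3;   // hypothetical stand-in for "a few" files

        long total = totalNameNodeRpcs(clients, opensPerClient);

        // 30,000 and 50,000 RPCs/s are the rough NN limits mentioned in this thread.
        System.out.println("Total NN RPCs: " + total);
        System.out.println("Seconds at 30,000 RPCs/s: " + secondsToServe(total, 30_000));
        System.out.println("Seconds at 50,000 RPCs/s: " + secondsToServe(total, 50_000));
    }
}
```

Under these assumptions the opens alone keep the NN busy for several
seconds if all clients arrive at once, while the reads add nothing,
since they go to the DNs.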

Hope this helps.

best,
Colin


On Thu, Feb 12, 2015 at 4:21 PM, Demai Ni <nid...@gmail.com> wrote:
> Colin,
>
> Thanks. 30~50K is smaller than I thought, though I understand that I
> shouldn't generate unnecessary NN traffic.
>
> If I put my client (Java/C) on a datanode and only read local HDFS
> files, that is, files that have a replica on that datanode, is there an
> API I can use to talk directly to the DN without stressing the NN?  Thanks
>
> Demai
>
> On Thu, Feb 12, 2015 at 2:05 PM, Colin McCabe <cmcc...@alumni.cmu.edu>
> wrote:
>
>> The NN can do somewhere around 30,000 - 50,000 RPCs per second
>> currently, depending on configuration.  In general you do not want to
>> have extremely high NN RPC traffic, because it will slow things down.
>> You might consider re-architecting your application to do more DN
>> traffic and less NN traffic, if possible.  Hope that helps.
>>
>> best,
>> Colin
>>
>> On Tue, Feb 10, 2015 at 4:29 PM, Demai Ni <nid...@gmail.com> wrote:
>> > hi, folks,
>> >
>> > Is there a maximum number of concurrent connections to a name node? Or
>> > is there a best practice?
>> >
>> > My scenario is simple. A client (Java/C++) program will open a connection
>> > through an HDFS API call, then open a few HDFS files, maybe read a bit
>> > of data, then close the connection. In some cases, the number of clients
>> > may be 50,000~100,000 concurrently. Is that number of connections acceptable?
>> >
>> > Thanks.
>> >
>> > Demai
>>
