Re: Read or Save specific blocks of a file

2018-04-24 Thread Thodoris Zois
Jim Clampffer [mailto:james.clampf...@gmail.com] Sent: Tuesday, April 24, 2018 10:42 AM To: Arpit Agarwal Cc: hdfs-dev@hadoop.apache.org Subject: Re: Read or Save specific blocks of a file If you want to read replicas from a specific DN after determining the block b

RE: Read or Save specific blocks of a file

2018-04-23 Thread Takanobu Asanuma
Sent: Tuesday, April 24, 2018 10:42 AM To: Arpit Agarwal Cc: hdfs-dev@hadoop.apache.org Subject: Re: Read or Save specific blocks of a file If you want to read replicas from a specific DN after determining the block bounds via getFileBlockLocations you could abuse the rack locality infrastructu

Re: Read or Save specific blocks of a file

2018-04-23 Thread Jim Clampffer
If you want to read replicas from a specific DN after determining the block bounds via getFileBlockLocations you could abuse the rack locality infrastructure by generating a dummy topology script to get the NN to order replicas such that the client tries to read from the DNs you prefer first. It's

Re: Read or Save specific blocks of a file

2018-04-23 Thread Arpit Agarwal
Hi, Perhaps I missed something in the question. FileSystem#getFileBlockLocations followed by open, seek to start of target block, read. This will let you read the contents of a specific block using public APIs. On 4/23/18, 5:26 PM, "Daniel Templeton" wrote: I'm not aware of a way to wo
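
The open/seek/read sequence described above can be sketched as follows. This is a local-file analogue only, assuming fixed-size blocks and using java.io.RandomAccessFile so that it runs without a cluster; the class and method names (BlockReadSketch, blockBounds, readBlock, saveBlock) are hypothetical. On HDFS the offset and length of each block would instead come from the BlockLocation entries returned by FileSystem#getFileBlockLocations, and the seek and read would go through the stream returned by FileSystem#open.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper: reads one fixed-size "block" of a local file. */
final class BlockReadSketch {

    /**
     * Returns {offset, length} of the given block. Assumption: blocks are
     * laid out back to back with a fixed size and only the last one may be
     * shorter. On HDFS these values would be taken from
     * BlockLocation#getOffset and BlockLocation#getLength instead.
     */
    static long[] blockBounds(long fileLength, long blockSize, int blockIndex) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        long offset = (long) blockIndex * blockSize;
        if (blockIndex < 0 || offset >= fileLength) {
            throw new IllegalArgumentException("no such block: " + blockIndex);
        }
        long length = Math.min(blockSize, fileLength - offset);
        return new long[] {offset, length};
    }

    /** Opens the file, seeks to the start of the target block, reads it. */
    static byte[] readBlock(Path file, long blockSize, int blockIndex)
            throws IOException {
        try (RandomAccessFile in = new RandomAccessFile(file.toFile(), "r")) {
            long[] bounds = blockBounds(in.length(), blockSize, blockIndex);
            byte[] buf = new byte[(int) bounds[1]]; // assumes a block fits in an int-sized array
            in.seek(bounds[0]);
            in.readFully(buf);
            return buf;
        }
    }

    /** Writes the contents of one block into a separate file. */
    static void saveBlock(Path file, long blockSize, int blockIndex, Path out)
            throws IOException {
        Files.write(out, readBlock(file, blockSize, blockIndex));
    }
}
```

Calling saveBlock once per block index would give one output file per block, which is the second half of the original question. Buffering a whole block in memory is only reasonable for small blocks; for typical HDFS block sizes a streaming copy would be the safer choice.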

Re: Read or Save specific blocks of a file

2018-04-23 Thread Daniel Templeton
I'm not aware of a way to work with blocks using the public APIs. The easiest way to do it is probably to retrieve the block IDs and then go grab those blocks from the data nodes' local file systems directly. Daniel On 4/23/18 9:05 AM, Thodoris Zois wrote: Hello list, I have a file on HDFS t

Read or Save specific blocks of a file

2018-04-23 Thread Thodoris Zois
Hello list, I have a file on HDFS that is divided into 10 blocks (partitions). Is there any way to retrieve data from a specific block (e.g., using the block ID)? Besides that, is there any option to write the contents of each block (or of one block) into separate files? Thank you very much, Th