Thank you for the response, but I still have a doubt. Simply put,
the file is not in HDFS; it is in local storage. If I run the Flink
program with, say, 5 parallel tasks, what I would like to do is read a
block of rows in each task, as shown below. I looked at the simple CSV
reader and was thinking of creating a custom one like that, but I would
need to know the task number to read the relevant block. Is this possible?

[image: Inline image 2]
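For concreteness, the block arithmetic such a custom reader would need (which row range and byte offset belong to each of the P parallel tasks) can be sketched in plain Java. Everything below is an illustrative assumption rather than Flink API, and the file is assumed to store the shorts in big-endian order, as DataOutputStream writes them:

```java
import java.io.IOException;
import java.io.RandomAccessFile;

/** Illustrative sketch (not Flink API): which rows and bytes of an
 *  n x n binary matrix of Java shorts belong to split `splitNumber`
 *  out of `totalSplits` parallel splits. */
public class RowBlockDemo {

    /** First row (inclusive) of this split's block. */
    static long blockStartRow(int splitNumber, int totalSplits, long n) {
        return splitNumber * (n / totalSplits);
    }

    /** Number of rows in this split's block; the last split takes any remainder. */
    static long blockRowCount(int splitNumber, int totalSplits, long n) {
        long base = n / totalSplits;
        return (splitNumber == totalSplits - 1) ? n - base * (totalSplits - 1) : base;
    }

    /** Byte offset of the block in the file: 2 bytes per short, n shorts per row. */
    static long blockByteOffset(int splitNumber, int totalSplits, long n) {
        return blockStartRow(splitNumber, totalSplits, n) * n * 2L;
    }

    /** Read this split's row block from a local binary file of big-endian shorts. */
    static short[][] readBlock(String path, int splitNumber, int totalSplits, int n)
            throws IOException {
        int rows = (int) blockRowCount(splitNumber, totalSplits, n);
        short[][] block = new short[rows][n];
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            raf.seek(blockByteOffset(splitNumber, totalSplits, n));
            for (int i = 0; i < rows; i++) {
                for (int j = 0; j < n; j++) {
                    block[i][j] = raf.readShort(); // RandomAccessFile reads big-endian
                }
            }
        }
        return block;
    }
}
```

If I understand Flink's input format contract correctly, the splitNumber and totalSplits values would come from the input split handed to the format's open() method (e.g. GenericInputSplit exposes getSplitNumber() and getTotalNumberOfSplits()), so the split index plays the role of the task number.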

Thank you,
Saliya

On Wed, Jan 20, 2016 at 12:47 PM, Till Rohrmann <trohrm...@apache.org>
wrote:

> With readHadoopFile you can use all of Hadoop's FileInputFormats, and thus
> everything you can do with Hadoop, you can also do with Flink. Simply
> take the same Hadoop FileInputFormat that you would use for your
> MapReduce job.
>
> Cheers,
> Till
>
> On Wed, Jan 20, 2016 at 3:16 PM, Saliya Ekanayake <esal...@gmail.com>
> wrote:
>
>> Thank you. I saw readHadoopFile, but I was not sure how it could be used
>> for the following, which is what I need. The logic of the code requires
>> an entire row to operate on, so in our current implementation with P tasks,
>> each of them reads a rectangular block of (N/P) x N from the matrix. Is
>> this possible with readHadoopFile? Also, the file may not be in HDFS, so is
>> it possible to refer to the local disk when doing this?
>>
>> Thank you
>>
>> On Wed, Jan 20, 2016 at 1:31 AM, Chiwan Park <chiwanp...@apache.org>
>> wrote:
>>
>>> Hi Saliya,
>>>
>>> You can use Hadoop input formats in Flink via the readHadoopFile
>>> method. The method returns a dataset of type Tuple2<Key, Value>. Note
>>> that the MapReduce-equivalent transformation in Flink is composed of
>>> map, groupBy, and reduceGroup.
>>>
>>> > On Jan 20, 2016, at 3:04 PM, Suneel Marthi <smar...@apache.org> wrote:
>>> >
>>> > I guess you are looking for Flink's BinaryInputFormat to be able to read
>>> blocks of data from HDFS:
>>> >
>>> >
>>> https://ci.apache.org/projects/flink/flink-docs-release-0.10/api/java/org/apache/flink/api/common/io/BinaryInputFormat.html
>>> >
>>> > On Wed, Jan 20, 2016 at 12:45 AM, Saliya Ekanayake <esal...@gmail.com>
>>> wrote:
>>> > Hi,
>>> >
>>> > I am trying to use Flink to perform a parallel batch operation on an NxN
>>> matrix represented as a binary file. Each (i,j) element is stored as a Java
>>> Short value. In typical MapReduce programming with Hadoop, each map task
>>> will read a block of rows of this matrix, perform computation on that
>>> block, and emit the result to the reducer.
>>> >
>>> > How is this done in Flink? I am new to Flink and couldn't find a
>>> binary reader so far. Any help is greatly appreciated.
>>> >
>>> > Thank you,
>>> > Saliya
>>> >
>>> > --
>>> > Saliya Ekanayake
>>> > Ph.D. Candidate | Research Assistant
>>> > School of Informatics and Computing | Digital Science Center
>>> > Indiana University, Bloomington
>>> > Cell 812-391-4914
>>> > http://saliya.org
>>> >
>>>
>>> Regards,
>>> Chiwan Park
>>>
>>>
>>
>>
>> --
>> Saliya Ekanayake
>> Ph.D. Candidate | Research Assistant
>> School of Informatics and Computing | Digital Science Center
>> Indiana University, Bloomington
>> Cell 812-391-4914
>> http://saliya.org
>>
>
>


-- 
Saliya Ekanayake
Ph.D. Candidate | Research Assistant
School of Informatics and Computing | Digital Science Center
Indiana University, Bloomington
Cell 812-391-4914
http://saliya.org
