Thank you for the response, but I still have a question. Simply put, the file is not in HDFS; it is in local storage. In Flink, if I run the program with, say, 5 parallel tasks, what I would like to do is read a block of rows in each task, as shown below. I looked at the simple CSV reader and was thinking of creating a custom reader like that, but I would need to know the task number to read the relevant block. Is this possible?
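To make the idea concrete, here is a minimal plain-Java sketch (not Flink API code) of what each task's read could look like, assuming the matrix is stored row-major as big-endian 2-byte values (the format DataOutputStream.writeShort produces) and that the task already knows its row range. In Flink itself, the task number can be obtained from getRuntimeContext().getIndexOfThisSubtask() inside a rich function; the file layout and class below are illustrative assumptions, not a definitive implementation:

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Path;

public class BlockReader {
    /**
     * Reads rows [startRow, startRow + numRows) of an n x n short matrix
     * stored row-major as big-endian 2-byte values.
     */
    static short[][] readBlock(Path file, int n, int startRow, int numRows) throws IOException {
        short[][] block = new short[numRows][n];
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            // Skip the rows owned by earlier tasks: each row is n shorts of 2 bytes.
            raf.seek((long) startRow * n * Short.BYTES);
            for (int i = 0; i < numRows; i++) {
                for (int j = 0; j < n; j++) {
                    block[i][j] = raf.readShort(); // big-endian, matching writeShort
                }
            }
        }
        return block;
    }
}
```

For real matrix sizes you would read each row in one bulk call (e.g. into a byte[] wrapped by a ByteBuffer) rather than short-by-short, but the seek-then-read structure stays the same.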
[image: Inline image 2]

Thank you,
Saliya

On Wed, Jan 20, 2016 at 12:47 PM, Till Rohrmann <trohrm...@apache.org> wrote:

> With readHadoopFile you can use all of Hadoop's FileInputFormats, and thus
> you can do everything with Flink that you can do with Hadoop. Simply take
> the same Hadoop FileInputFormat which you would use for your MapReduce job.
>
> Cheers,
> Till
>
> On Wed, Jan 20, 2016 at 3:16 PM, Saliya Ekanayake <esal...@gmail.com> wrote:
>
>> Thank you. I saw readHadoopFile, but I was not sure how it can be used
>> for the following, which is what I need. The logic of the code requires an
>> entire row to operate on, so in our current implementation with P tasks,
>> each of them will read a rectangular block of (N/P) x N from the matrix.
>> Is this possible with readHadoopFile? Also, the file may not be in HDFS,
>> so is it possible to refer to the local disk in doing this?
>>
>> Thank you
>>
>> On Wed, Jan 20, 2016 at 1:31 AM, Chiwan Park <chiwanp...@apache.org> wrote:
>>
>>> Hi Saliya,
>>>
>>> You can use an input format from Hadoop in Flink via the readHadoopFile
>>> method. The method returns a DataSet of type Tuple2<Key, Value>. Note that
>>> the MapReduce-equivalent transformation in Flink is composed of map,
>>> groupBy, and reduceGroup.
>>>
>>> > On Jan 20, 2016, at 3:04 PM, Suneel Marthi <smar...@apache.org> wrote:
>>> >
>>> > Guess u r looking for Flink's BinaryInputFormat to be able to read
>>> > blocks of data from HDFS:
>>> >
>>> > https://ci.apache.org/projects/flink/flink-docs-release-0.10/api/java/org/apache/flink/api/common/io/BinaryInputFormat.html
>>> >
>>> > On Wed, Jan 20, 2016 at 12:45 AM, Saliya Ekanayake <esal...@gmail.com> wrote:
>>> >
>>> > Hi,
>>> >
>>> > I am trying to use Flink to perform a parallel batch operation on an
>>> > N x N matrix represented as a binary file. Each (i, j) element is stored
>>> > as a Java Short value. In a typical MapReduce program with Hadoop, each
>>> > map task would read a block of rows of this matrix, perform the
>>> > computation on that block, and emit the result to the reducer.
>>> >
>>> > How is this done in Flink? I am new to Flink and couldn't find a
>>> > binary reader so far. Any help is greatly appreciated.
>>> >
>>> > Thank you,
>>> > Saliya
>>> >
>>> > --
>>> > Saliya Ekanayake
>>> > Ph.D. Candidate | Research Assistant
>>> > School of Informatics and Computing | Digital Science Center
>>> > Indiana University, Bloomington
>>> > Cell 812-391-4914
>>> > http://saliya.org
>>>
>>> Regards,
>>> Chiwan Park
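The (N/P) x N row decomposition described in the quoted message can be sketched with plain arithmetic. The helper below is hypothetical (not part of Flink or Hadoop); it gives the first (N mod P) tasks one extra row, so the P blocks tile all N rows even when N is not evenly divisible by P:

```java
public class RowPartition {
    /**
     * First row owned by parallel task t (0-based) out of p tasks
     * over n total rows. The first (n % p) tasks each get one extra row.
     */
    static int startRow(int t, int p, int n) {
        return t * (n / p) + Math.min(t, n % p);
    }

    /** Number of rows owned by task t out of p tasks over n rows. */
    static int blockSize(int t, int p, int n) {
        return n / p + (t < n % p ? 1 : 0);
    }
}
```

A custom input format could use exactly this arithmetic in createInputSplits to build one split per parallel task, with each split carrying its (startRow, blockSize) pair.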