There should be an env.readBinaryFile() IIRC, check that.

Sent from my iPhone
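For what it's worth, I don't see a readBinaryFile() on ExecutionEnvironment in the 0.10 Javadoc, but a BinaryInputFormat subclass plugged into env.readFile() gets close. Below is a rough, untested sketch; ShortRowInputFormat and n are placeholder names. One caveat, as far as I can tell: BinaryInputFormat expects the block metadata that Flink's own BinaryOutputFormat writes, so it suits files Flink wrote itself rather than an arbitrary raw binary file.

    import org.apache.flink.api.common.io.BinaryInputFormat;
    import org.apache.flink.core.memory.DataInputView;

    import java.io.IOException;

    // Rough, untested sketch; ShortRowInputFormat and n are placeholders.
    // Deserializes one matrix row of N Java shorts per record.
    public class ShortRowInputFormat extends BinaryInputFormat<short[]> {

        private final int n; // row length N

        public ShortRowInputFormat(int n) {
            this.n = n;
        }

        @Override
        protected short[] deserialize(short[] reuse, DataInputView in) throws IOException {
            if (reuse == null || reuse.length != n) {
                reuse = new short[n];
            }
            for (int i = 0; i < n; i++) {
                reuse[i] = in.readShort();
            }
            return reuse;
        }
    }

Used roughly like this (the path is a placeholder):

    DataSet<short[]> rows = env.readFile(new ShortRowInputFormat(n), "file:///data/matrix.bin");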
> On Jan 24, 2016, at 12:44 PM, Saliya Ekanayake <esal...@gmail.com> wrote:
>
> Thank you for the response on this, but I still have some doubt. Simply, the
> file is not in HDFS; it's in local storage. In Flink, if I run the program
> with, say, 5 parallel tasks, what I would like to do is to read a block of
> rows in each task, as shown below. I looked at the simple CSV reader and was
> thinking of creating a custom one like that, but I would need to know the
> task number to read the relevant block. Is this possible?
>
> <image.png>
>
> Thank you,
> Saliya
>
>> On Wed, Jan 20, 2016 at 12:47 PM, Till Rohrmann <trohrm...@apache.org> wrote:
>> With readHadoopFile you can use all of Hadoop’s FileInputFormats, and thus
>> you can do everything with Flink that you can do with Hadoop. Simply take
>> the same Hadoop FileInputFormat that you would take for your MapReduce job.
>>
>> Cheers,
>> Till
>>
>>> On Wed, Jan 20, 2016 at 3:16 PM, Saliya Ekanayake <esal...@gmail.com> wrote:
>>> Thank you. I saw readHadoopFile, but I was not sure how it can be used for
>>> the following, which is what I need. The logic of the code requires an
>>> entire row to operate on, so in our current implementation with P tasks,
>>> each of them will read a rectangular block of (N/P) x N from the matrix.
>>> Is this possible with readHadoopFile? Also, the file may not be in HDFS,
>>> so is it possible to refer to the local disk in doing this?
>>>
>>> Thank you
>>>
>>>> On Wed, Jan 20, 2016 at 1:31 AM, Chiwan Park <chiwanp...@apache.org> wrote:
>>>> Hi Saliya,
>>>>
>>>> You can use the input format from Hadoop in Flink by using the
>>>> readHadoopFile method. The method returns a dataset whose type is
>>>> Tuple2<Key, Value>. Note that the MapReduce-equivalent transformation in
>>>> Flink is composed of map, groupBy, and reduceGroup.
>>>>
>>>> > On Jan 20, 2016, at 3:04 PM, Suneel Marthi <smar...@apache.org> wrote:
>>>> >
>>>> > Guess you are looking for Flink's BinaryInputFormat to be able to read
>>>> > blocks of data from HDFS
>>>> >
>>>> > https://ci.apache.org/projects/flink/flink-docs-release-0.10/api/java/org/apache/flink/api/common/io/BinaryInputFormat.html
>>>> >
>>>> > On Wed, Jan 20, 2016 at 12:45 AM, Saliya Ekanayake <esal...@gmail.com>
>>>> > wrote:
>>>> > Hi,
>>>> >
>>>> > I am trying to use Flink to perform a parallel batch operation on an
>>>> > NxN matrix represented as a binary file. Each (i,j) element is stored
>>>> > as a Java Short value. In typical MapReduce programming with Hadoop,
>>>> > each map task will read a block of rows of this matrix, perform
>>>> > computation on that block, and emit the result to the reducer.
>>>> >
>>>> > How is this done in Flink? I am new to Flink and couldn't find a
>>>> > binary reader so far. Any help is greatly appreciated.
>>>> >
>>>> > Thank you,
>>>> > Saliya
>>>> >
>>>> > --
>>>> > Saliya Ekanayake
>>>> > Ph.D. Candidate | Research Assistant
>>>> > School of Informatics and Computing | Digital Science Center
>>>> > Indiana University, Bloomington
>>>> > Cell 812-391-4914
>>>> > http://saliya.org
>>>>
>>>> Regards,
>>>> Chiwan Park
>>>
>>> --
>>> Saliya Ekanayake
>>> Ph.D. Candidate | Research Assistant
>>> School of Informatics and Computing | Digital Science Center
>>> Indiana University, Bloomington
>>> Cell 812-391-4914
>>> http://saliya.org
>
> --
> Saliya Ekanayake
> Ph.D. Candidate | Research Assistant
> School of Informatics and Computing | Digital Science Center
> Indiana University, Bloomington
> Cell 812-391-4914
> http://saliya.org
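On the task-number question above: yes, any rich function can get it at runtime from getRuntimeContext().getIndexOfThisSubtask() and getNumberOfParallelSubtasks(). Below is a rough, untested sketch of that route, assuming the matrix file sits at the same local path on every worker, elements are big-endian Java shorts, and the parallelism P divides N evenly; BlockRowReader and the path are placeholder names.

    import org.apache.flink.api.common.functions.RichMapPartitionFunction;
    import org.apache.flink.util.Collector;

    import java.io.BufferedInputStream;
    import java.io.DataInputStream;
    import java.io.FileInputStream;

    // Untested sketch: each parallel instance derives its own (N/P) x N row
    // block from its subtask index and reads it straight off local disk.
    public class BlockRowReader extends RichMapPartitionFunction<Long, short[]> {

        private final String path;
        private final int n; // matrix dimension N

        public BlockRowReader(String path, int n) {
            this.path = path;
            this.n = n;
        }

        @Override
        public void mapPartition(Iterable<Long> ignored, Collector<short[]> out) throws Exception {
            int task  = getRuntimeContext().getIndexOfThisSubtask();       // this task's number
            int tasks = getRuntimeContext().getNumberOfParallelSubtasks(); // P

            int rowsPerTask = n / tasks;                          // assumes P divides N
            long byteOffset = (long) task * rowsPerTask * n * 2L; // 2 bytes per short

            try (DataInputStream in = new DataInputStream(
                    new BufferedInputStream(new FileInputStream(path)))) {
                long toSkip = byteOffset;
                while (toSkip > 0) {
                    toSkip -= in.skip(toSkip); // seek to this task's first row
                }
                for (int r = 0; r < rowsPerTask; r++) {
                    short[] row = new short[n];
                    for (int c = 0; c < n; c++) {
                        row[c] = in.readShort();
                    }
                    out.collect(row); // emit one full matrix row per record
                }
            }
        }
    }

Wired up like this, the generateSequence input is just dummy data to get one mapPartition call per parallel instance; the reader ignores it and seeks to its own block:

    int p = 5, n = 10000; // placeholder values
    ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
    DataSet<short[]> rows = env
            .generateSequence(0, p - 1)
            .mapPartition(new BlockRowReader("/data/matrix.bin", n))
            .setParallelism(p);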