There should be an env.readBinaryFile() IIRC, check that.
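For the block-partitioned read discussed further down the thread, the arithmetic each parallel task needs is independent of Flink: given N total rows, P parallel subtasks, and a subtask index t (Flink exposes this via getRuntimeContext().getIndexOfThisSubtask() inside any Rich* function), the row range and byte offset follow from the 2-byte Short encoding. A rough sketch in plain Java; class and method names are illustrative, not Flink API:

```java
public class RowBlock {
    // First row of the block assigned to subtask t of p, for n total rows.
    // The n % p remainder rows are spread over the first subtasks.
    public static long blockStart(int t, int p, long n) {
        long base = n / p;
        long rem = n % p;
        return t * base + Math.min(t, rem);
    }

    // Number of rows in subtask t's block.
    public static long blockLength(int t, int p, long n) {
        return n / p + (t < n % p ? 1 : 0);
    }

    // Byte offset into the binary file: each row holds n Java shorts (2 bytes each),
    // so a task can seek here and read blockLength(t, p, n) * n * 2 bytes.
    public static long byteOffset(int t, int p, long n) {
        return blockStart(t, p, n) * n * Short.BYTES;
    }

    public static void main(String[] args) {
        long n = 10;
        int p = 3;
        for (int t = 0; t < p; t++) {
            System.out.println("task " + t + ": rows " + blockStart(t, p, n)
                    + ".." + (blockStart(t, p, n) + blockLength(t, p, n) - 1)
                    + ", byte offset " + byteOffset(t, p, n));
        }
    }
}
```

Each task would open the local file itself (e.g. with a RandomAccessFile), seek to byteOffset, and read its rows, so no HDFS is required.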

Sent from my iPhone

> On Jan 24, 2016, at 12:44 PM, Saliya Ekanayake <esal...@gmail.com> wrote:
> 
> Thank you for the response on this, but I still have some doubts. Simply put, 
> the file is not in HDFS; it's in local storage. In Flink, if I run the program 
> with, say, 5 parallel tasks, what I would like to do is read a block of 
> rows in each task as shown below. I looked at the simple CSV reader and was 
> thinking of creating a custom one like that, but I would need to know the task 
> number to read the relevant block. Is this possible?
> 
> <image.png>
> 
> Thank you,
> Saliya
> 
>> On Wed, Jan 20, 2016 at 12:47 PM, Till Rohrmann <trohrm...@apache.org> wrote:
>> With readHadoopFile you can use all of Hadoop's FileInputFormats, and thus 
>> you can do everything with Flink that you can do with Hadoop. Simply 
>> take the same Hadoop FileInputFormat that you would use for your MapReduce 
>> job.
>> 
>> Cheers,
>> Till
>> 
>> 
>>> On Wed, Jan 20, 2016 at 3:16 PM, Saliya Ekanayake <esal...@gmail.com> wrote:
>>> Thank you. I saw readHadoopFile, but I was not sure how it can be used 
>>> for the following, which is what I need. The logic of the code requires an 
>>> entire row to operate on, so in our current implementation with P tasks, 
>>> each of them reads a rectangular block of (N/P) x N from the matrix. Is 
>>> this possible with readHadoopFile? Also, the file may not be in HDFS, so is 
>>> it possible to refer to the local disk when doing this?
>>> 
>>> Thank you
>>> 
>>>> On Wed, Jan 20, 2016 at 1:31 AM, Chiwan Park <chiwanp...@apache.org> wrote:
>>>> Hi Saliya,
>>>> 
>>>> You can use an input format from Hadoop in Flink via the readHadoopFile 
>>>> method. The method returns a DataSet whose element type is Tuple2<Key, Value>. 
>>>> Note that the MapReduce-equivalent transformation in Flink is composed of map, 
>>>> groupBy, and reduceGroup.
>>>> 
>>>> > On Jan 20, 2016, at 3:04 PM, Suneel Marthi <smar...@apache.org> wrote:
>>>> >
>>>> > Guess you are looking for Flink's BinaryInputFormat to be able to read 
>>>> > blocks of data from HDFS.
>>>> >
>>>> > https://ci.apache.org/projects/flink/flink-docs-release-0.10/api/java/org/apache/flink/api/common/io/BinaryInputFormat.html
>>>> >
>>>> > On Wed, Jan 20, 2016 at 12:45 AM, Saliya Ekanayake <esal...@gmail.com> 
>>>> > wrote:
>>>> > Hi,
>>>> >
>>>> > I am trying to use Flink to perform a parallel batch operation on an NxN 
>>>> > matrix represented as a binary file. Each (i,j) element is stored as a 
>>>> > Java Short value. In typical MapReduce programming with Hadoop, each 
>>>> > map task would read a block of rows of this matrix, perform 
>>>> > computation on that block, and emit the result to the reducer.
>>>> >
>>>> > How is this done in Flink? I am new to Flink and couldn't find a binary 
>>>> > reader so far. Any help is greatly appreciated.
>>>> >
>>>> > Thank you,
>>>> > Saliya
>>>> >
>>>> > --
>>>> > Saliya Ekanayake
>>>> > Ph.D. Candidate | Research Assistant
>>>> > School of Informatics and Computing | Digital Science Center
>>>> > Indiana University, Bloomington
>>>> > Cell 812-391-4914
>>>> > http://saliya.org
>>>> >
>>>> 
>>>> Regards,
>>>> Chiwan Park
>>> 
>>> 
>>> 
> 
> 
> 
