Hey Saliya,

I recommend using DataSetUtils.zipWithIndex [1] for this task. It ships
with flink-java.

[1]
https://github.com/apache/flink/blob/master/flink-java/src/main/java/org/apache/flink/api/java/utils/DataSetUtils.java#L77
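
To illustrate the idea, here is a minimal, Flink-free sketch in plain Java
of what "index both sets, then join on the index" does. On the Flink side
this corresponds to calling DataSetUtils.zipWithIndex on each DataSet (it
yields a DataSet<Tuple2<Long, T>>) and joining the two results with
where(0).equalTo(0). The class and method names below (IndexJoinSketch,
mapPairs) are hypothetical and only model the semantics on in-memory lists.
Note that in Flink the assigned indices depend on how each DataSet is
partitioned and ordered, so the i-th elements only line up if both sets
really have the same decomposition, as in your case.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;

// Hypothetical, Flink-free model of "zipWithIndex + join on index".
public class IndexJoinSketch {

    // Pairs each element with its position, mimicking the (index, value)
    // shape that DataSetUtils.zipWithIndex produces.
    public static <T> List<Map.Entry<Long, T>> zipWithIndex(List<T> data) {
        List<Map.Entry<Long, T>> out = new ArrayList<>();
        long i = 0;
        for (T t : data) {
            out.add(new AbstractMap.SimpleImmutableEntry<>(i++, t));
        }
        return out;
    }

    // Applies op to the i-th element of a and the i-th element of b.
    // Behaves like an inner join on the index: unmatched indices are dropped.
    public static <T, R> List<R> mapPairs(List<T> a, List<T> b,
                                          BiFunction<T, T, R> op) {
        Map<Long, T> bByIndex = new HashMap<>();
        for (Map.Entry<Long, T> e : zipWithIndex(b)) {
            bByIndex.put(e.getKey(), e.getValue());
        }
        List<R> result = new ArrayList<>();
        for (Map.Entry<Long, T> e : zipWithIndex(a)) {
            if (bByIndex.containsKey(e.getKey())) {
                result.add(op.apply(e.getValue(), bByIndex.get(e.getKey())));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> a = Arrays.asList(1, 2, 3);
        List<Integer> b = Arrays.asList(10, 20, 30);
        // OP here is simply "add the two corresponding elements".
        System.out.println(mapPairs(a, b, Integer::sum)); // prints [11, 22, 33]
    }
}
```

In a real job the join is distributed and shuffles data, which is the cost
you mentioned wanting to avoid, so treat this only as a model of the result
and not of the execution.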

On Thu, Feb 25, 2016 at 5:52 PM, Saliya Ekanayake <esal...@gmail.com> wrote:

> Thank you, Marton. That seems doable.
>
> However, is there a way I can create a dummy indexed data set? Like a way
> to partition the index range without data across parallel tasks. For
> example, if I could have something like,
>
> DataSet<IndexedSet> ds = ...
>
> then I can implement a custom method to load required data for a split
> within a map operation, which will be less expensive than a join for my
> case.
>
> Thank you,
> Saliya
>
> On Thu, Feb 25, 2016 at 11:45 AM, Márton Balassi <balassi.mar...@gmail.com
> > wrote:
>
>> Hey Saliya,
>>
>> I would add a unique ID to both the DataSets, the variable you referred to
>> as 'i'. Then you can join the two DataSets on the field containing 'i' and
>> do the mapping on the joined result.
>>
>> Hope this helps,
>>
>> Marton
>>
>> On Thu, Feb 25, 2016 at 5:38 PM, Saliya Ekanayake <esal...@gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> I have two data sets like,
>>>
>>> DataSet<T> a = ...
>>> DataSet<T> b = ...
>>>
>>> They have the same type and the same decomposition. I want to apply a map
>>> operator that needs both *a* and *b*. For example,
>>>
>>> a.map( i -> OP)
>>>
>>> within this OP I need the corresponding (*i*-th) element of *b* as
>>> well. Is there a way to do this?
>>>
>>> Thank you,
>>> Saliya
>>>
>>> --
>>> Saliya Ekanayake
>>> Ph.D. Candidate | Research Assistant
>>> School of Informatics and Computing | Digital Science Center
>>> Indiana University, Bloomington
>>> Cell 812-391-4914
>>> http://saliya.org
>>>
>>
>>
>
>
> --
> Saliya Ekanayake
> Ph.D. Candidate | Research Assistant
> School of Informatics and Computing | Digital Science Center
> Indiana University, Bloomington
> Cell 812-391-4914
> http://saliya.org
>
