Sorry, I hit send too soon. After updating the priority, I would execute
a group-by DoFn and then use an external sort (by this priority key) to
guarantee that the next DoFn receives a sorted Iterable.
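A minimal sketch of that group-then-sort step in plain Python (no Beam
APIs; the row shape, the "jobs" key, and the `priority` field are just
assumptions for illustration), showing what the downstream DoFn would
receive after grouping and sorting:

```python
# Sketch: group rows by key, then sort each group's values by priority,
# mimicking GroupByKey followed by an external sort before the next DoFn.
from collections import defaultdict

rows = [
    ("jobs", {"id": "a", "priority": 5}),
    ("jobs", {"id": "b", "priority": 1}),
    ("jobs", {"id": "c", "priority": 3}),
]

# GroupByKey analogue: collect the values for each key.
grouped = defaultdict(list)
for key, row in rows:
    grouped[key].append(row)

# External-sort analogue: sort each group's iterable by priority so the
# next DoFn sees rows in priority order.
sorted_groups = {
    key: sorted(values, key=lambda r: r["priority"])
    for key, values in grouped.items()
}

print([r["id"] for r in sorted_groups["jobs"]])  # → ['b', 'c', 'a']
```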


JC

Juan Carlos Garcia <jcgarc...@gmail.com> schrieb am Sa., 24. Aug. 2019,
11:08:

> Hi,
>
> The main puzzle here is how to deliver the priority change of a row
> during a given window. My best shot would be to have a side input
> (PCollectionView) containing the priority changes; then, in the slow
> worker Beam transform, extract this side input and update the
> corresponding row with the new priority.
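> A rough pseudocode sketch of that idea (the names `priority_updates`
> and `emit` are hypothetical, not an existing API):
>
>     priorities = AsDict(priority_updates)   # side input view: id -> new priority
>
>     def process(row, priorities):
>         # look up a late priority change for this row, if any
>         row.priority = priorities.get(row.id, row.priority)
>         emit(row)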
>
> Again, it's just an idea.
>
> JC
>
> Chad Dombrova <chad...@gmail.com> schrieb am Sa., 24. Aug. 2019, 03:32:
>
>> Hi all,
>> Our team is brainstorming how to solve a particular type of problem with
>> Beam, and it's a bit beyond our experience level, so I thought I'd turn to
>> the experts for some advice.
>>
>> Here are the pieces of our puzzle:
>>
>>    - a data source with the following properties:
>>       - rows represent work to do
>>       - each row has an integer priority
>>       - rows can be added or deleted
>>       - the priority of a row can be changed
>>       - <10k rows
>>    - a slow worker Beam transform (Map/ParDo) that consumes a row at a
>>    time
>>
>>
>> We want a streaming pipeline that delivers rows from our data store to
>> the worker transform, re-sorting the source by priority each time a new
>> row is delivered. The goal is that last-second changes in priority can
>> affect the order of the slowly yielding read. Throughput is not a major
>> concern, since the worker is the bottleneck.
>>
>> I have a few questions:
>>
>>    - is this the sort of problem that BeamSQL can solve? I'm not sure
>>    how sorting and re-sorting are handled there in a streaming context...
>>    - I'm unclear on how back-pressure in Flink affects streaming reads.
>>    It's my hope that data/messages are left in the data source until
>>    back-pressure subsides, rather than being read eagerly into memory.
>>    Can someone clarify this for me?
>>    - is there a combination of windowing and triggering that can solve
>>    this continual re-sorting plus slow-yielding problem? It's not
>>    unreasonable to keep all of our rows in memory on Flink, as long as
>>    we're snapshotting state.
>>
>>
>> Any advice on how an expert Beamer would solve this is greatly
>> appreciated!
>>
>> Thanks,
>> chad
>>
>>
