Sorry I hit too soon, then after updating the priority I would execute group by DoFn and then use an external sorting (by this priority key) to guarantee that on the next DoFn you have a sorted Iterable.
JC Juan Carlos Garcia <jcgarc...@gmail.com> schrieb am Sa., 24. Aug. 2019, 11:08: > Hi, > > The main puzzle here is how to deliver the priority change of a row during > a given window, my best shot would be to have a side input > (PCollectionView) containing the change of priority, then in the slow > worker beam transform extract this side input and update the corresponding > row with the new priority. > > Again is just an idea. > > JC > > Chad Dombrova <chad...@gmail.com> schrieb am Sa., 24. Aug. 2019, 03:32: > >> Hi all, >> Our team is brainstorming how to solve a particular type of problem with >> Beam, and it's a bit beyond our experience level, so I thought I'd turn to >> the experts for some advice. >> >> Here are the pieces of our puzzle: >> >> - a data source with the following properties: >> - rows represent work to do >> - each row has an integer priority >> - rows can be added or deleted >> - priorities of a row can be changed >> - <10k rows >> - a slow worker Beam transform (Map/ParDo) that consumes a row at a >> time >> >> >> We want a streaming pipeline that delivers rows from our data store to >> the worker transform, resorting the source based on priority each time a >> new row is delivered. The goal is that last second changes in priority can >> affect the order of the slowly yielding read. Throughput is not a major >> concern since the worker is the bottleneck. >> >> I have a few questions: >> >> - is the sort of problem that BeamSQL can solve? I'm not sure how >> sorting and resorting are handled there in a streaming context... >> - I'm unclear on how back-pressure in Flink affects streaming reads. >> It's my hope that data/messages are left in the data source until >> back-pressure subsides, rather than read eagerly into memory. Can someone >> clarify this for me? >> - is there a combination of windowing and triggering that can solve >> this continual resorting plus slow yielding problem? It's not >> unreasonable >> to keep all of our rows in memory on Flink, as long as we're snapshotting >> state. >> >> >> Any advice on how an expert Beamer would solve this is greatly >> appreciated! >> >> Thanks, >> chad >> >>