Hey Theodore,
Thanks for pointing us to TensorFlow; I didn't think it would be useful
for this.
From what you cite they seem to have a very similar state-handling
mechanism to Flink (as well as fault tolerance).
I'm sure we can get ideas from how they do model-parallel training. I'll
definitel
Hello all,
I would also be really interested in how a PS-like architecture would work
in Flink. Note that we are not necessarily talking about PS, but more generally
about how QueryableState can be used for ML tasks with, I guess, a focus on
model-parallel training.
One suggestion I would make is to take a look a
Hey Ufuk,
I'm happy to contribute. At least I'll get a bit more understanding of
the details.
Breaking the assumption that only a single thread updates state would
bring us from strong isolation guarantees (i.e. serializability at the
updates and read committed at the external queries) to n
Hey Gabor,
great ideas here. It's only slightly related, but I'm currently working on a
proposal to improve the queryable state APIs for lookups (partly along the
lines of what you suggested with higher level accessors). Maybe you are
interested in contributing there?
I really like your ideas
Hi Gyula, Jinkui Shi,
Thanks for your thoughts!
@Gyula: I'll try to explain in a bit more detail.
The API could be almost like the QueryableState's. It could be
higher-level though: returning Java objects instead of serialized data
(because there would not be issues with class loading). Also, i
Hi Gábor Hermann,
The online parameter server is a good proposal.
The PS paper [1] has an early implementation [2], and now it's MXNet [3].
I have some thoughts about an online PS in Flink:
1. Should it support a flexible and configurable update strategy?
For example, in one iteration, computing serv
Hi Gábor,
I think the general idea is very nice, but it would be nice to see more
clearly what benefit this brings from the developer's perspective. Maybe a
rough API sketch and 1-2 examples.
I am wondering what sort of consistency guarantees you imagine for such
operations, or why the fault toleranc
Hi all,
TL;DR: Is it worth implementing a special QueryableState for querying
state from another part of a Flink streaming job and aligning it with
fault tolerance?
I've been thinking about implementing a Parameter Server with/within
Flink. A Parameter Server is basically a specialized key-v