Hi Kaan,
explicitly mapping to physical nodes is currently not supported and would
need some workarounds. I have re-added the user mailing list (please always
include it in your responses); maybe someone there can help you with that.
Best,
Arvid
On Thu, Apr 30, 2020 at 10:12 AM Kaan Sancak wrote:
> One q
Hi Kaan,
not entirely sure I understand your solution. I gathered that you create a
dataset of TCP addresses and then use flatMap to fetch and output the data?
If so, then I think it's a good solution for batch processing (DataSet). It
doesn't work in DataStream because it doesn't play well with
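The batch pattern sketched above (a dataset of TCP addresses expanded via flatMap into the fetched records) can be illustrated with plain Java streams. The addresses and the fetch helper below are hypothetical stand-ins for the real TCP fetch, not Flink API:

```java
import java.util.*;
import java.util.stream.*;

public class FlatMapFetch {
    // Hypothetical stand-in for fetching records over TCP from one
    // generator address; a real implementation would open a socket
    // to `address` and stream the lines it receives.
    static Stream<String> fetch(String address) {
        return Stream.of(address + ":edge1", address + ":edge2");
    }

    public static void main(String[] args) {
        List<String> addresses = Arrays.asList("node1:9000", "node2:9000");
        // The flatMap pattern from the thread: each address in the
        // dataset is expanded into the records fetched from it.
        List<String> records = addresses.stream()
                .flatMap(FlatMapFetch::fetch)
                .collect(Collectors.toList());
        System.out.println(records);
        // prints [node1:9000:edge1, node1:9000:edge2, node2:9000:edge1, node2:9000:edge2]
    }
}
```

In Flink's DataSet API the same shape would be a DataSet of addresses with a flatMap function that performs the fetch and collects each record.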
Hi Kaan,
seems like ZMQ is using TCP and not HTTP. So I guess the easiest way would
be to use a ZMQ Java binding to access it [1].
But of course, it's much more complicated to write iterator logic for
that. I'm not sure how ZMQ signals the end of such a graph? Maybe it closes the
socket and you ca
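Assuming the generator simply closes the connection at the end of the graph (the ZMQ specifics are not confirmed here), the iterator logic could look like the following plain-TCP sketch, where readLine() returning null marks end-of-data. The edge data and port are made up for illustration:

```java
import java.io.*;
import java.net.*;
import java.util.*;

public class SocketEdgeReader {
    // Hypothetical generator: serve a few edges, then close the socket,
    // which is the end-of-data signal assumed in this sketch.
    static Thread serveOnce(ServerSocket server, List<String> edges) {
        Thread t = new Thread(() -> {
            try (Socket client = server.accept();
                 PrintWriter out = new PrintWriter(client.getOutputStream(), true)) {
                for (String e : edges) out.println(e);
            } catch (IOException ignored) {}
        });
        t.start();
        return t;
    }

    // Iterator-style consumption: readLine() returns null once the
    // peer closes the connection, ending the loop naturally.
    static List<String> readAll(String host, int port) throws IOException {
        List<String> lines = new ArrayList<>();
        try (Socket s = new Socket(host, port);
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(s.getInputStream()))) {
            String line;
            while ((line = in.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws Exception {
        ServerSocket server = new ServerSocket(0); // ephemeral port
        Thread t = serveOnce(server, Arrays.asList("0 1", "1 2", "2 0"));
        List<String> edges = readAll("localhost", server.getLocalPort());
        t.join();
        server.close();
        System.out.println(edges); // prints [0 1, 1 2, 2 0]
    }
}
```

With a ZMQ Java binding the reading side would use the binding's receive calls instead of a raw BufferedReader, but the end-of-stream handling would follow the same shape.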
Hm, I mistakenly assumed sockets worked the other way around (pulling, like a
URLInputStream, instead of listening). I'd go with providing the data on a
port on each generator node and then reading from that in multiple sources.
I think the best solution is to implement a custom InputFormat and then use
readInp
Yes, that sounds like a great idea and actually that's what I am trying to do.
> Then you configure your analysis job to read from each of these sockets with
> a separate source and union them before feeding them to the actual job?
Before trying to open the sockets on the slave nodes, first I h
Hi Kaan,
sorry, I hadn't considered I/O as the bottleneck. I thought a bit more
about your issue and came to a rather simple solution.
How about you open a socket on each of your generator nodes? Then you
configure your analysis job to read from each of these sockets with a
separate source and union them before feeding them to the actual job?
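A minimal sketch of this idea in plain Java, with two in-process "generator" sockets whose output is merged sequentially. In an actual Flink job, each socket would become a source (e.g. via socketTextStream) and the streams would be combined with union; the ports and edge data below are made up:

```java
import java.io.*;
import java.net.*;
import java.util.*;

public class GeneratorSockets {
    // Hypothetical generator node: serve locally generated edges on an
    // ephemeral port, then close so the reader sees end-of-stream.
    static ServerSocket serve(List<String> edges) throws IOException {
        ServerSocket server = new ServerSocket(0);
        new Thread(() -> {
            try (Socket c = server.accept();
                 PrintWriter out = new PrintWriter(c.getOutputStream(), true)) {
                edges.forEach(out::println);
            } catch (IOException ignored) {}
        }).start();
        return server;
    }

    // "Union" of sources: read each generator's socket and merge the
    // results into one collection (sequentially, so the order is fixed).
    public static void main(String[] args) throws Exception {
        ServerSocket g1 = serve(Arrays.asList("0 1", "1 2"));
        ServerSocket g2 = serve(Arrays.asList("2 3", "3 0"));
        List<String> union = new ArrayList<>();
        for (ServerSocket g : Arrays.asList(g1, g2)) {
            try (Socket s = new Socket("localhost", g.getLocalPort());
                 BufferedReader in = new BufferedReader(
                         new InputStreamReader(s.getInputStream()))) {
                String line;
                while ((line = in.readLine()) != null) union.add(line);
            }
            g.close();
        }
        System.out.println(union); // prints [0 1, 1 2, 2 3, 3 0]
    }
}
```

The Flink version would read the sockets in parallel rather than sequentially, so no ordering guarantee would apply across sources.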
Thanks for the answer! Also thanks for raising some concerns about my question.
Some of the graphs I have been using are larger than 1.5 TB, and I am currently
in the experimentation stage of a project, making modifications to my code and
re-running the experiments. Currently, on some of the
Hi Kaan,
afaik there is no (easy) way to switch from the streaming API back to the batch
API while retaining all data in memory (correct me if I misunderstood).
However, I'm also having some trouble understanding parts of your
description. Why can't you dump the data to some file? Do you really have more
ma