Re: Reading from sockets using dataset api

2020-05-05 Arvid Heise
Hi Kaan, explicitly mapping to physical nodes is currently not supported and would need some workarounds. I have re-added the user mailing list (please always include it in your replies as well); maybe someone there can help you with that. Best, Arvid On Thu, Apr 30, 2020 at 10:12 AM Kaan Sancak wrote: > One q…

Re: Reading from sockets using dataset api

2020-04-29 Arvid Heise
Hi Kaan, I'm not entirely sure I understand your solution. I gather that you create a DataSet of TCP addresses and then use flatMap to fetch and output the data? If so, I think it's a good solution for batch processing (DataSet). It doesn't work in DataStream because it doesn't play well with…
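
If that reading is right, each DataSet element is one generator's address and the flatMap expands it into that generator's records. Below is a minimal, hypothetical sketch of that per-element work in plain Java (no Flink dependency), assuming the generators serve newline-separated text over plain TCP and close the connection when they are done. `fetchLines`, `fetchAll` and `serveOnce` are illustrative names; inside Flink, the body of `fetchLines` would live in a `FlatMapFunction` and emit each line through its `Collector` instead of returning a list.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch (not Flink code): the per-element work of a flatMap over
// TCP addresses, written against plain java.net so it runs stand-alone.
class AddressFlatMap {

    // Connects to one generator and collects lines until it closes the connection.
    static List<String> fetchLines(InetSocketAddress address) {
        List<String> lines = new ArrayList<>();
        try (Socket socket = new Socket()) {
            socket.connect(address);
            // Closing the socket also closes its input stream, so the reader needs no separate close.
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(socket.getInputStream(), StandardCharsets.UTF_8));
            String line;
            while ((line = in.readLine()) != null) { // null: the generator closed the stream
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    // The flatMap itself: every address expands into all lines served at that address.
    static List<String> fetchAll(List<InetSocketAddress> addresses) {
        return addresses.stream()
                .flatMap(address -> fetchLines(address).stream())
                .collect(Collectors.toList());
    }

    // Stand-in for a generator node: serves the lines to one client on a free loopback
    // port, then closes the connection to mark the end of the data.
    static InetSocketAddress serveOnce(List<String> lines) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try {
            ServerSocket server = new ServerSocket(0, 1, loopback);
            int port = server.getLocalPort();
            Thread generator = new Thread(() -> {
                try (ServerSocket s = server;
                     Socket client = s.accept();
                     PrintWriter out = new PrintWriter(new OutputStreamWriter(
                             client.getOutputStream(), StandardCharsets.UTF_8))) {
                    lines.forEach(out::println);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            generator.setDaemon(true);
            generator.start();
            return new InetSocketAddress(loopback, port);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Here `fetchAll` runs sequentially, so the first generator's lines come before the second's; in a parallel Flink job different addresses would be handled by different subtasks, so no such global order should be assumed.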

Re: Reading from sockets using dataset api

2020-04-29 Arvid Heise
Hi Kaan, it seems like ZMQ uses TCP and not HTTP. So I guess the easiest way would be to use a ZMQ Java binding to access it [1]. But of course, it's much more complicated to write the iterator logic for that. I'm not sure how ZMQ signals the end of such a graph. Maybe it closes the socket and you ca…
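
How the end of a graph is signalled determines the iterator logic. The sketch below is an illustration under stated assumptions: it uses a plain TCP line stream, not ZMQ (no ZMQ binding API is used), and treats either the peer closing the socket or an agreed-upon sentinel line (a hypothetical convention such as `EOF`) as the end of the data. `SocketLineIterator` and `serveOnce` are illustrative names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical sketch: iterate over lines from a TCP socket (plain java.net, not ZMQ).
// The data ends when the peer closes the socket or sends the optional sentinel line.
class SocketLineIterator implements Iterator<String>, AutoCloseable {
    private final Socket socket;
    private final BufferedReader in;
    private final String sentinel; // null: rely only on the peer closing the socket
    private String buffered;       // line read ahead by hasNext(), not yet returned
    private boolean finished;      // set once the end of the data has been seen

    SocketLineIterator(InetAddress host, int port, String sentinel) {
        this.sentinel = sentinel;
        try {
            this.socket = new Socket(host, port);
            this.in = new BufferedReader(
                    new InputStreamReader(socket.getInputStream(), StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public boolean hasNext() {
        if (buffered == null && !finished) {
            try {
                String line = in.readLine();
                if (line == null || line.equals(sentinel)) {
                    finished = true; // peer closed the socket, or sent the sentinel
                } else {
                    buffered = line;
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return buffered != null;
    }

    @Override
    public String next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        String line = buffered;
        buffered = null;
        return line;
    }

    @Override
    public void close() {
        try {
            socket.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Test stand-in for a generator: serves the lines to one client, then closes.
    static int serveOnce(List<String> lines) {
        try {
            ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
            int port = server.getLocalPort();
            Thread generator = new Thread(() -> {
                try (ServerSocket s = server;
                     Socket client = s.accept();
                     PrintWriter out = new PrintWriter(new OutputStreamWriter(
                             client.getOutputStream(), StandardCharsets.UTF_8))) {
                    lines.forEach(out::println);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            generator.setDaemon(true);
            generator.start();
            return port;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With `sentinel` set to `null`, the iterator relies purely on the connection being closed, which is the case guessed at above; whether ZMQ offers either signal for this use case is left open in the thread.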

Re: Reading from sockets using dataset api

2020-04-24 Arvid Heise
Hm, I had it backwards and thought the sockets worked the other way around (pulling, like a URLInputStream, instead of listening). I'd go with providing the data on a port on each generator node and then reading from those ports in multiple sources. I think the best solution is to implement a custom InputFormat and then use readInp…
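
Flink's `InputFormat` is driven per split: splits come from `createInputSplits`, and a source task calls `open(split)`, then `nextRecord(reuse)` until `reachedEnd()` returns true, then `close()`. The sketch below only mirrors that lifecycle in plain Java so it runs without Flink, with one hypothetical `GeneratorSplit` per generator node's host and port. It does not implement Flink's actual interface (which also requires `configure`, `getStatistics` and `getInputSplitAssigner`); all class and method bodies here are illustrative assumptions.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, NOT Flink's InputFormat interface: it only mirrors the
// createInputSplits/open/reachedEnd/nextRecord/close lifecycle, one split per generator.
class GeneratorSocketFormat {
    // One split per generator node that serves its data on a port.
    static final class GeneratorSplit {
        final int splitNumber;
        final InetSocketAddress address;
        GeneratorSplit(int splitNumber, InetSocketAddress address) {
            this.splitNumber = splitNumber;
            this.address = address;
        }
    }

    private Socket socket;
    private BufferedReader in;
    private String lookahead; // next record; null once the generator has closed the stream

    static List<GeneratorSplit> createInputSplits(List<InetSocketAddress> generators) {
        List<GeneratorSplit> splits = new ArrayList<>();
        for (int i = 0; i < generators.size(); i++) {
            splits.add(new GeneratorSplit(i, generators.get(i)));
        }
        return splits;
    }

    void open(GeneratorSplit split) throws IOException {
        socket = new Socket();
        socket.connect(split.address);
        in = new BufferedReader(new InputStreamReader(socket.getInputStream(), StandardCharsets.UTF_8));
        lookahead = in.readLine(); // read one line ahead so reachedEnd() is a simple check
    }

    boolean reachedEnd() { return lookahead == null; }

    String nextRecord() throws IOException {
        String record = lookahead;
        lookahead = in.readLine();
        return record;
    }

    void close() throws IOException {
        if (socket != null) socket.close();
    }

    // Roughly how a source task drives a format: open each split, read until reachedEnd(), close.
    static List<String> readAll(List<InetSocketAddress> generators) {
        GeneratorSocketFormat format = new GeneratorSocketFormat();
        List<String> records = new ArrayList<>();
        try {
            for (GeneratorSplit split : createInputSplits(generators)) {
                try {
                    format.open(split);
                    while (!format.reachedEnd()) {
                        records.add(format.nextRecord());
                    }
                } finally {
                    format.close();
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return records;
    }

    // Test stand-in for a generator node: serves the lines to one client, then closes.
    static InetSocketAddress serveOnce(List<String> lines) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try {
            ServerSocket server = new ServerSocket(0, 1, loopback);
            int port = server.getLocalPort();
            Thread generator = new Thread(() -> {
                try (ServerSocket s = server;
                     Socket client = s.accept();
                     PrintWriter out = new PrintWriter(new OutputStreamWriter(
                             client.getOutputStream(), StandardCharsets.UTF_8))) {
                    lines.forEach(out::println);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            generator.setDaemon(true);
            generator.start();
            return new InetSocketAddress(loopback, port);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

One split per generator keeps each socket read by exactly one reader; how splits are assigned to parallel source tasks would be up to the real format's split assigner.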

Re: Reading from sockets using dataset api

2020-04-24 Kaan Sancak
Yes, that sounds like a great idea, and that's actually what I am trying to do. > Then you configure your analysis job to read from each of these sockets with a separate source and union them before feeding them to the actual job? Before trying to open the sockets on the slave nodes, first I h…

Re: Reading from sockets using dataset api

2020-04-24 Arvid Heise
Hi Kaan, sorry, I hadn't considered I/O to be the bottleneck. I thought a bit more about your issue and came up with a rather simple solution. How about you open a socket on each of your generator nodes? Then you configure your analysis job to read from each of these sockets with a separate source and u…
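
In Flink's DataStream API the closest built-in pieces are probably one `StreamExecutionEnvironment#socketTextStream(host, port)` per generator node, combined with `DataStream#union(...)`; whether that fits the rest of this job is not settled in the thread. As a stand-alone illustration of the semantics only, the sketch below runs one reader per generator socket concurrently and unions their lines in plain Java. `readSource`, `unionAll` and `serveOnce` are hypothetical names, and the generators are assumed to send newline-separated text and close the connection at the end.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch (plain Java, not Flink): one independent "source" per generator
// socket, all running concurrently, with their outputs unioned into one collection.
class UnionOfSocketSources {

    // A single source: reads lines from one generator until it closes the connection.
    static void readSource(InetSocketAddress generator, Queue<String> out) throws IOException {
        try (Socket socket = new Socket()) {
            socket.connect(generator);
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(socket.getInputStream(), StandardCharsets.UTF_8));
            for (String line = in.readLine(); line != null; line = in.readLine()) {
                out.add(line);
            }
        }
    }

    // Union: lines from different sources may interleave arbitrarily; in this sketch
    // only the order within a single source is preserved.
    static List<String> unionAll(List<InetSocketAddress> generators) {
        Queue<String> union = new ConcurrentLinkedQueue<>();
        List<Callable<Void>> sources = new ArrayList<>();
        for (InetSocketAddress generator : generators) {
            sources.add(() -> {
                readSource(generator, union);
                return null;
            });
        }
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, generators.size()));
        try {
            for (Future<Void> source : pool.invokeAll(sources)) {
                source.get(); // surfaces a failure in any source
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
        return new ArrayList<>(union);
    }

    // Test stand-in for a generator node: serves the lines to one client, then closes.
    static InetSocketAddress serveOnce(List<String> lines) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try {
            ServerSocket server = new ServerSocket(0, 1, loopback);
            int port = server.getLocalPort();
            Thread generator = new Thread(() -> {
                try (ServerSocket s = server;
                     Socket client = s.accept();
                     PrintWriter out = new PrintWriter(new OutputStreamWriter(
                             client.getOutputStream(), StandardCharsets.UTF_8))) {
                    lines.forEach(out::println);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            generator.setDaemon(true);
            generator.start();
            return new InetSocketAddress(loopback, port);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Reading each generator in its own task is what lets I/O scale with the number of generator nodes, which is the point of using separate sources rather than a single reader.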

Re: Reading from sockets using dataset api

2020-04-23 Kaan Sancak
Thanks for the answer, and thanks for raising some concerns about my question. Some of the graphs I have been using are larger than 1.5 TB, and I am currently in the experimental stage of a project, so I keep modifying my code and re-running the experiments. Currently, on some of the…

Re: Reading from sockets using dataset api

2020-04-23 Arvid Heise
Hi Kaan, AFAIK there is no (easy) way to switch from the streaming API back to the batch API while retaining all data in memory (correct me if I misunderstood). However, I also have some serious trouble understanding your description. Why can't you dump the data to a file? Do you really have more ma…