Hi,

I have been running experiments on large graph data; the smallest graph I use has about 70 billion edges. I have a graph generator that generates the graph in parallel and feeds it to the running system. However, reading the edges takes a long time: even though graph generation is parallel, in Flink I can only listen for the data on the master node (correct me if I am wrong). Another option is to dump the generated data to a file and read it with readCsvFile, but that is not feasible in terms of storage management.
What I want to do is invoke my graph generator over IPC/TCP and read the generated data from sockets. Since the graph data is generated in parallel on each node, I want to use IPC to read the data in parallel on each node. I did some digging online but couldn't find anything similar using the DataSet API. I would be glad to hear about similar use cases or examples. Is it possible to use the streaming environment to create the data in parallel and then switch to the DataSet API?

Thanks in advance!

Best,
Kaan