Re: Distribute DataSet to subset of nodes

2015-09-15 Thread Fabian Hueske
Hi Stefan, the problem is that you cannot directly influence the scheduling of tasks to nodes to ensure that you can read the data that you put in the local filesystems of your nodes. HDFS provides a shared file system, which means that each node can read data from anywhere in the cluster. I assumed t

Re: Distribute DataSet to subset of nodes

2015-09-15 Thread Stefan Bunk
Hi Fabian, I think we might have a misunderstanding here. I have already copied the first file to five nodes, and the second file to five other nodes, outside of Flink. In the open() method of the operator, I just read that file via normal Java means. I do not see why this is tricky or how HDFS s

Re: Flink Streaming and Google Cloud Pub/Sub?

2015-09-15 Thread Robert Metzger
Hey Martin, I don't think anybody has used Google Cloud Pub/Sub with Flink yet. There are no tutorials for implementing streaming sources and sinks, but Flink has a few connectors that you can use as a reference. For the sources, you basically have to extend RichSourceFunction (or RichParallelSourceFu

Re: flink with kafka 0.7

2015-09-15 Thread Robert Metzger
I don't think it's working. According to the Kafka documentation (https://kafka.apache.org/documentation.html#upgrade): 0.8, the release in which added replication, was our first backwards-incompatible release: major changes were made to the API, ZooKeeper data structures, and protocol, and co