Thanks for the overview. I think I will use akka streams and pipe the result to kafka, then move on with flink. Tzu-Li (Gordon) Tai <tzuli...@apache.org> schrieb am Do. 27. Apr. 2017 um 18:37:
> Hi Georg, > > Simply from the aspect of a Flink source that listens to a REST endpoint > for input data, there should be quite a variety of options to do that. The > Akka streaming source from Bahir should also serve this purpose well. It > would also be quite straightforward to implement one yourself. > > On the other hand, what Jörn was suggesting was that you would want to > first persist the incoming data from the REST endpoint to a repayable > storage / queue, and your Flink job reads from that replayable storage / > queue. > The reason for this is that Flink’s checkpointing mechanism for > exactly-once guarantee relies on a replayable source (see [1]), and since a > REST endpoint is not replayable, you’ll not be able to benefit from the > fault-tolerance guarantees provided by Flink. The most popular source used > with Flink for exactly-once, currently, is Kafka [2]. The only extra > latency compared to just fetching REST endpoint, in this setup, is writing > to the intermediate Kafka topic. > > Of course, if you’re just testing around and just getting to know Flink, > this setup isn’t necessary. > You can just start off with a source such as the Flink Akka connector in > Bahir, and start writing your first Flink job right away :) > > Cheers, > Gordon > > [1] > https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/connectors/guarantees.html > [2] > https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/connectors/kafka.html > > On 24 April 2017 at 4:02:14 PM, Georg Heiler (georg.kf.hei...@gmail.com) > wrote: > > Wouldn't adding flume -> Kafka -> flink also introduce additional latency? > > Georg Heiler <georg.kf.hei...@gmail.com> schrieb am So., 23. Apr. 2017 um > 20:23 Uhr: > >> So you would suggest flume over a custom akka-source from bahir? >> >> Jörn Franke <jornfra...@gmail.com> schrieb am So., 23. Apr. 2017 um >> 18:59 Uhr: >> >>> I would use flume to import these sources to HDFS and then use flink or >>> Hadoop or whatever to process them. While it is possible to do it in flink, >>> you do not want that your processing fails because the web service is not >>> available etc. >>> Via flume which is suitable for this kind of tasks it is more controlled >>> and reliable. >>> >>> On 23. Apr 2017, at 18:02, Georg Heiler <georg.kf.hei...@gmail.com> >>> wrote: >>> >>> New to flink I would like to do a small project to get a better feeling >>> for flink. I am thinking of getting some stats from several REST api (i.e. >>> Bitcoin course values from different exchanges) and comparing prices over >>> different exchanges in real time. >>> >>> Are there already some REST api sources for flink as a sample to get >>> started implementing a custom REST source? >>> >>> I was thinking about using https://github.com/timmolter/XChange to >>> connect to several exchanges. E.g. to make a single api call by hand would >>> look similar to >>> >>> val currencyPair = new CurrencyPair(Currency.XMR, Currency.BTC) >>> CertHelper.trustAllCerts() >>> val poloniex = >>> ExchangeFactory.INSTANCE.createExchange(classOf[PoloniexExchange].getName) >>> val dataService = poloniex.getMarketDataService >>> >>> generic(dataService) >>> raw(dataService.asInstanceOf[PoloniexMarketDataServiceRaw]) >>> System.out.println(dataService.getTicker(currencyPair)) >>> >>> How would be a proper way to make this available as a flink source? I >>> have seen >>> https://github.com/apache/flink/blob/master/flink-contrib/flink-connector-wikiedits/src/main/java/org/apache/flink/streaming/connectors/wikiedits/WikipediaEditsSource.java >>> but >>> new to flink am a bit unsure how to proceed. >>> >>> Regards, >>> Georg >>> >>>