Re: Flink first project

Georg Heiler Thu, 27 Apr 2017 11:00:08 -0700

Thanks for the overview. I think I will use Akka Streams and pipe the
result to Kafka, then continue from there with Flink.
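
For the Flink side of that plan, reading the Kafka topic could look roughly
like the sketch below. The topic name, broker address, group id, and job name
are placeholders, and it assumes Flink 1.2 with the Kafka 0.10 connector
(flink-connector-kafka-0.10) on the classpath.

```scala
import java.util.Properties

import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer010
import org.apache.flink.streaming.util.serialization.SimpleStringSchema

object TickerJob {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    // Checkpointing is what lets the replayable Kafka source give exactly-once state.
    env.enableCheckpointing(5000L)

    val props = new Properties()
    props.setProperty("bootstrap.servers", "localhost:9092") // placeholder broker
    props.setProperty("group.id", "ticker-demo")             // placeholder group id

    // "tickers" is a placeholder topic that the Akka Streams side would write to.
    val tickers: DataStream[String] = env.addSource(
      new FlinkKafkaConsumer010[String]("tickers", new SimpleStringSchema(), props))

    tickers.print()
    env.execute("ticker-demo")
  }
}
```
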
Tzu-Li (Gordon) Tai <tzuli...@apache.org> wrote on Thu, 27 Apr 2017 at
18:37:


> Hi Georg,
>
> Simply from the aspect of a Flink source that listens to a REST endpoint
> for input data, there should be quite a variety of options to do that. The
> Akka streaming source from Bahir should also serve this purpose well. It
> would also be quite straightforward to implement one yourself.
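>
> As a rough illustration of the "implement one yourself" option, here is a
> minimal sketch of a polling source built on Flink's SourceFunction
> interface. The class name PollingSource, the fetch parameter, and the
> polling interval are hypothetical; fetch stands in for whatever REST call
> is made, and it has to be serializable because Flink ships the source
> instance to the workers.
>
> ```scala
> import org.apache.flink.streaming.api.functions.source.SourceFunction
> import org.apache.flink.streaming.api.functions.source.SourceFunction.SourceContext
>
> // Hypothetical polling source: calls `fetch` every `intervalMs` milliseconds
> // and emits the result as a String record.
> class PollingSource(fetch: () => String, intervalMs: Long)
>     extends SourceFunction[String] {
>
>   // Flipped by cancel() so that the loop in run() terminates.
>   @volatile private var running = true
>
>   override def run(ctx: SourceContext[String]): Unit = {
>     while (running) {
>       val value = fetch()
>       // Emit while holding the checkpoint lock so that emission and
>       // checkpointing do not interleave.
>       ctx.getCheckpointLock.synchronized {
>         ctx.collect(value)
>       }
>       Thread.sleep(intervalMs)
>     }
>   }
>
>   override def cancel(): Unit = running = false
> }
> ```
>
> With org.apache.flink.streaming.api.scala._ imported, such a source could
> be attached via env.addSource(new PollingSource(() => "...", 1000L)).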
>
> On the other hand, what Jörn was suggesting was that you would want to
> first persist the incoming data from the REST endpoint to a replayable
> storage / queue, and have your Flink job read from that replayable
> storage / queue.
> The reason for this is that Flink’s checkpointing mechanism for
> exactly-once guarantee relies on a replayable source (see [1]), and since a
> REST endpoint is not replayable, you’ll not be able to benefit from the
> fault-tolerance guarantees provided by Flink. The most popular source used
> with Flink for exactly-once, currently, is Kafka [2]. The only extra
> latency in this setup, compared to fetching directly from the REST
> endpoint, is the write to the intermediate Kafka topic.
>
> Of course, if you’re just experimenting and getting to know Flink,
> this setup isn’t necessary.
> You can just start off with a source such as the Flink Akka connector in
> Bahir, and start writing your first Flink job right away :)
>
> Cheers,
> Gordon
>
> [1]
> https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/connectors/guarantees.html
> [2]
> https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/connectors/kafka.html
>
> On 24 April 2017 at 4:02:14 PM, Georg Heiler (georg.kf.hei...@gmail.com)
> wrote:
>
> Wouldn't adding flume -> Kafka -> flink also introduce additional latency?
>
> Georg Heiler <georg.kf.hei...@gmail.com> wrote on Sun, 23 Apr 2017 at
> 20:23:
>
>> So you would suggest flume over a custom akka-source from bahir?
>>
>> Jörn Franke <jornfra...@gmail.com> wrote on Sun, 23 Apr 2017 at
>> 18:59:
>>
>>> I would use Flume to import these sources to HDFS and then use Flink,
>>> Hadoop, or whatever else to process them. While it is possible to do it in
>>> Flink, you do not want your processing to fail because the web service is
>>> not available, etc.
>>> Flume is suited to this kind of task, so going through it is more
>>> controlled and reliable.
>>>
>>> On 23. Apr 2017, at 18:02, Georg Heiler <georg.kf.hei...@gmail.com>
>>> wrote:
>>>
>>> I am new to Flink and would like to do a small project to get a better
>>> feeling for it. I am thinking of getting some stats from several REST APIs
>>> (e.g. Bitcoin exchange rates from different exchanges) and comparing prices
>>> across exchanges in real time.
>>>
>>> Are there already some REST API sources for Flink that could serve as a
>>> sample for implementing a custom REST source?
>>>
>>> I was thinking about using https://github.com/timmolter/XChange to
>>> connect to several exchanges. For example, making a single API call by
>>> hand would look similar to
>>>
>>> val currencyPair = new CurrencyPair(Currency.XMR, Currency.BTC)
>>> CertHelper.trustAllCerts()
>>> val poloniex =
>>>   ExchangeFactory.INSTANCE.createExchange(classOf[PoloniexExchange].getName)
>>> val dataService = poloniex.getMarketDataService
>>>
>>> generic(dataService)
>>> raw(dataService.asInstanceOf[PoloniexMarketDataServiceRaw])
>>> System.out.println(dataService.getTicker(currencyPair))
>>>
>>> What would be a proper way to make this available as a Flink source? I
>>> have seen
>>> https://github.com/apache/flink/blob/master/flink-contrib/flink-connector-wikiedits/src/main/java/org/apache/flink/streaming/connectors/wikiedits/WikipediaEditsSource.java
>>> but, being new to Flink, I am a bit unsure how to proceed.
>>>
>>> Regards,
>>> Georg
>>>
>>>
