Re: Calling external services/databases from DataStream API

2017-01-31 Thread Fabian Hueske
ges and not just keys. > > > > Regards, > > > > Diego > > > > *From:* Stephan Ewen [mailto:se...@apache.org] > *Sent:* Monday, 30 January 2017 17:39 > *To:* user@flink.apache.org > *Subject:* Re: Calling external services/databases from DataStream API >

RE: Calling external services/databases from DataStream API

2017-01-30 Thread Diego Fustes Villadóniga
, 30 January 2017 17:39 To: user@flink.apache.org Subject: Re: Calling external services/databases from DataStream API Hi! The Distributed cache would actually indeed be nice to add to the DataStream API. Since the runtime parts for that are all in place, the code would be mainly on the

Re: Calling external services/databases from DataStream API

2017-01-30 Thread Stephan Ewen
Hi! The Distributed cache would actually indeed be nice to add to the DataStream API. Since the runtime parts for that are all in place, the code would be mainly on the "client" side that sets up the JobGraph to be submitted and executed. For the problem of scaling this, there are two solutions t

Re: Calling external services/databases from DataStream API

2017-01-30 Thread Jonas
I have a similar use case where I (for the purposes of this discussion) have a GeoIP database that is not fully available from the start but will eventually be "full". The GeoIP tuples are coming in one after another. After ~4M tuples the GeoIP database is complete. I also need to do the same query