Re: Best way to link static data to event data?

2019-09-27 Thread Sameer Wadkar
The main consideration in this type of scenario is not the type of source function you use. The key point is how the event operator gets the slow-moving master data and caches it, and then recovers it if the operator fails and restarts. It does not matter that the CSV file does not change ofte

Re: Best way to link static data to event data?

2019-09-27 Thread John Smith
It's a fairly small static file that may update once in a blue moon lol. But I'm hoping to use existing functions. Why can't I just use CSV to table source? Why should I have to now either write my own CSV parser or look for 3rd party, then what, put it in a Java Map and look up that map? I'm finding F

Re: Best way to link static data to event data?

2019-09-27 Thread Sameer W
Connected Streams is one option, but it may be overkill in your scenario if your CSV does not refresh. If your CSV is small enough (number of records wise), you could parse it and load it into a serializable object and pass it to the constructor of the operator where you will be streaming the da
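
The suggestion above (parse the small CSV once, keep it in a serializable object, hand that object to the operator's constructor) can be sketched in plain Java. This is only an illustration under stated assumptions: the class name AreaCodeLookup, the column order areaCode,city,latitude,longitude, and the 10-digit North American number format are all hypothetical and do not come from the thread. It uses only the JDK; in a Flink job the resulting object would be passed to the constructor of the user function that processes the Kafka events.

```java
import java.io.Serializable;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;

// Hypothetical lookup table; every name and the CSV layout are assumptions.
class AreaCodeLookup implements Serializable {
    private static final long serialVersionUID = 1L;

    /** One CSV row: a city and its coordinates. */
    static final class Location implements Serializable {
        private static final long serialVersionUID = 1L;
        final String city;
        final double latitude;
        final double longitude;

        Location(String city, double latitude, double longitude) {
            this.city = city;
            this.latitude = latitude;
            this.longitude = longitude;
        }
    }

    // HashMap is itself serializable, so the whole object can be shipped to an operator.
    private final HashMap<String, Location> byAreaCode = new HashMap<>();

    /** Parses lines of the assumed form "areaCode,city,latitude,longitude". */
    static AreaCodeLookup fromCsvLines(List<String> lines) {
        AreaCodeLookup lookup = new AreaCodeLookup();
        for (String line : lines) {
            String[] cols = line.split(",");
            if (cols.length != 4) {
                continue; // skip blank or malformed rows
            }
            try {
                lookup.byAreaCode.put(cols[0].trim(), new Location(
                        cols[1].trim(),
                        Double.parseDouble(cols[2].trim()),
                        Double.parseDouble(cols[3].trim())));
            } catch (NumberFormatException e) {
                // skip rows without numeric coordinates, e.g. a header row
            }
        }
        return lookup;
    }

    /** Returns the 3-digit area code of a 10-digit number (optional leading 1), else null. */
    static String areaCodeOf(String phoneNumber) {
        String digits = phoneNumber.replaceAll("\\D", "");
        if (digits.length() == 11 && digits.startsWith("1")) {
            digits = digits.substring(1);
        }
        return digits.length() == 10 ? digits.substring(0, 3) : null;
    }

    /** Looks up the location for a phone number, or null if the area code is unknown. */
    Location find(String phoneNumber) {
        String areaCode = areaCodeOf(phoneNumber);
        return areaCode == null ? null : byAreaCode.get(areaCode);
    }

    public static void main(String[] args) {
        AreaCodeLookup lookup = fromCsvLines(Arrays.asList(
                "212,New York,40.7128,-74.0060",
                "415,San Francisco,37.7749,-122.4194"));
        Location hit = lookup.find("+1 (212) 555-0100");
        System.out.println(hit == null ? "unknown" : hit.city); // prints New York
    }
}
```

The split on commas is deliberately naive and would break on quoted fields that contain commas. This approach also only fits the case described in the thread, where the file is small and effectively never changes while the job is running.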

Re: Best way to link static data to event data?

2019-09-27 Thread John Smith
I don't think I need state for this... I need to load a CSV. I'm guessing as a table, and then filter my events, parse the number, transform the event into geolocation data and sink that to the downstream data source. So I'm guessing I need a CSV source and my Kafka source and somehow join those transform

Re: Best way to link static data to event data?

2019-09-27 Thread Oytun Tez
Hi, You should look at the broadcast state pattern in the Flink docs.

Best way to link static data to event data?

2019-09-27 Thread John Smith
Using 1.8. I have a list of phone area codes, cities and their geo location in a CSV file. And my events from Kafka contain phone numbers. I want to parse the phone number, get its area code and then associate the phone number to a city, geo location and as well count how many numbers are in that ci

Re: Flink- Heap Space running out

2019-09-27 Thread Nishant Gupta
Apologies, correction done to previous email. Hi Fabian and Mike *flink-conf.yaml [In a 3 node cluster having 120 GB memory each and 3 TB hard disk ]* jobmanager.heap.size: 50120m taskmanager.heap.size: 50120m *With Idle state retention having below configuration (Same heap space issue)

Increasing trend for state size of keyed stream using ProcessWindowFunction with ProcessingTimeSessionWindows

2019-09-27 Thread Oliwer Kostera
Hi all, I'm using ProcessWindowFunction in a keyed stream with the following definition: final SingleOutputStreamOperator processWindowFunctionStream = keyedStream.window(ProcessingTimeSessionWindows.withGap(Time.milliseconds(100))) .process(new CustomProcessWindowFunction(

Flink ColumnStats

2019-09-27 Thread Flavio Pompermaier
Hi all, I've seen that there was a recent effort around Flink ColumnStats, but I can't find a Flink job that computes Flink table stats. I found only code that does the conversion from the Hive catalog. Is there any Flink utility I can call to compute them on a Table? We've tried to impleme

Re: Flink- Heap Space running out

2019-09-27 Thread Nishant Gupta
Hi Fabian and Mike *flink-conf.yaml [In a 3 node cluster having 120 GB memory each and 3 TB hard disk ]* jobmanager.heap.size: 50120m taskmanager.heap.size: 50120m *With Idle state retention having below configuration (Same heap space issue) * *execution:* planner: old type: streaming ti

ClassNotFoundException when submitting a job

2019-09-27 Thread 163
Hi Guys, Flink version is 1.9.0 and built against HDP. I got the following exceptions when submitting a job using Hadoop input to read sequence file in hdfs. Thanks for your help! Qi The program finished with the following exceptio