The main consideration in this type of scenario is not the type of source function you use. The key point is how the event operator gets the slow-moving master data and caches it, and how it recovers that data if it fails and restarts.
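For illustration, here is a minimal sketch of one way to do that using Flink's broadcast state pattern (also suggested further down this thread). The CityRecord, CityInfo, Event, and EnrichedEvent types and the areaCodeOf helper are made up for the example:

import org.apache.flink.api.common.state.MapStateDescriptor;
import org.apache.flink.streaming.api.datastream.BroadcastStream;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.functions.co.BroadcastProcessFunction;
import org.apache.flink.util.Collector;

// Descriptor for the broadcast (master data) state. Broadcast state is
// checkpointed and restored automatically, so the cached CSV records
// survive an operator restart.
MapStateDescriptor<String, CityInfo> cityDescriptor =
    new MapStateDescriptor<>("cityByAreaCode", String.class, CityInfo.class);

// csvStream carries the parsed CSV records; events carries the Kafka events.
BroadcastStream<CityRecord> cityBroadcast = csvStream.broadcast(cityDescriptor);

DataStream<EnrichedEvent> enriched = events
    .connect(cityBroadcast)
    .process(new BroadcastProcessFunction<Event, CityRecord, EnrichedEvent>() {

        @Override
        public void processElement(Event event, ReadOnlyContext ctx,
                                   Collector<EnrichedEvent> out) throws Exception {
            // Look up the cached master data for this event's area code.
            CityInfo city = ctx.getBroadcastState(cityDescriptor).get(areaCodeOf(event));
            if (city != null) {
                out.collect(new EnrichedEvent(event, city));
            }
        }

        @Override
        public void processBroadcastElement(CityRecord record, Context ctx,
                                            Collector<EnrichedEvent> out) throws Exception {
            // Cache (or refresh) one CSV record in broadcast state.
            // areaCode and toCityInfo() are hypothetical members of CityRecord.
            ctx.getBroadcastState(cityDescriptor).put(record.areaCode, record.toCityInfo());
        }
    });

Because the broadcast state is part of the checkpoint, the cached CSV records come back on their own after a restart, which addresses the recovery concern above.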
It does not matter that the CSV file does not change often. The event operator may still fail and restart, and the CSV data needs to be made available to it again. In that scenario the initial suggestion I made, to pass the CSV data in the constructor, is not adequate by itself. You need to store it in operator state, which lets the operator recover it when it restarts after a failure. As long as that is in place you have resiliency and you can use any suitable method or source. I have not used the Table source as much, but connected streams and operator state have worked out for me in similar scenarios (a rough sketch of a periodically re-reading source is at the very bottom of this message, below the quoted thread).

Sameer

Sent from my iPhone

> On Sep 27, 2019, at 4:38 PM, John Smith <java.dev....@gmail.com> wrote:
>
> It's a fairly small static file that may update once in a blue moon lol. But I'm hoping to use existing functions. Why can't I just use the CSV table source?
>
> Why should I have to now either write my own CSV parser or look for a 3rd-party one, then put it in a Java Map and look up that map? I'm finding Flink to be a bit of death by 1000 paper cuts lol
>
> If I put the CSV in a table I can then use it to join with the events, no?
>
>> On Fri, 27 Sep 2019 at 16:25, Sameer W <sam...@axiomine.com> wrote:
>> Connected streams are one option, but may be overkill in your scenario if your CSV does not refresh. If your CSV is small enough (number of records), you could parse it, load it into a serializable object, and pass it to the constructor of the operator where you will be streaming the data.
>>
>> If the CSV can be made available via a shared network folder (or S3 in the case of AWS), you could also read it in the open function (if you use the Rich versions of the operators).
>>
>> The real question is how frequently the CSV updates. If you want the updates to propagate in near real time (or on a schedule), option 1 (parse in the driver and send it via the constructor) does not work. With the second option, you are also responsible for refreshing the file read from the shared folder.
>>
>> In that case use connected streams, where one stream periodically re-reads the file and sends it down the stream while the other stream reads the events. The refresh interval is your tolerance for stale data in the CSV.
>>
>>> On Fri, Sep 27, 2019 at 3:49 PM John Smith <java.dev....@gmail.com> wrote:
>>> I don't think I need state for this...
>>>
>>> I need to load a CSV, I'm guessing as a table, and then filter my events, parse the number, transform the event into geolocation data, and sink that to the downstream data source.
>>>
>>> So I'm guessing I need a CSV source and my Kafka source, and somehow join those to transform the event...
>>>
>>>> On Fri, 27 Sep 2019 at 14:43, Oytun Tez <oy...@motaword.com> wrote:
>>>> Hi,
>>>>
>>>> You should look at the broadcast state pattern in the Flink docs.
>>>>
>>>> ---
>>>> Oytun Tez
>>>>
>>>> M O T A W O R D
>>>> The World's Fastest Human Translation Platform.
>>>> oy...@motaword.com — www.motaword.com
>>>>
>>>>
>>>>> On Fri, Sep 27, 2019 at 2:42 PM John Smith <java.dev....@gmail.com> wrote:
>>>>> Using 1.8.
>>>>>
>>>>> I have a list of phone area codes, cities, and their geo locations in a CSV file, and my events from Kafka contain phone numbers.
>>>>>
>>>>> I want to parse each phone number to get its area code, then associate the phone number with a city and geo location, as well as count how many numbers are in that city/geo location.
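And here is the rough sketch of the periodic re-reading source referenced above. Everything in it is illustrative: the CityRecord type and the CSV column layout are assumptions, so adjust them to your file.

import java.nio.file.Files;
import java.nio.file.Paths;
import org.apache.flink.streaming.api.functions.source.RichSourceFunction;

// Sketch of a source that re-reads a small CSV on an interval and
// re-emits every record; the broadcast state downstream treats each
// emission as a refresh. CityRecord and the columns are made up.
public class PeriodicCsvSource extends RichSourceFunction<CityRecord> {

    private final String path;
    private final long refreshIntervalMillis;
    private volatile boolean running = true;

    public PeriodicCsvSource(String path, long refreshIntervalMillis) {
        this.path = path;
        this.refreshIntervalMillis = refreshIntervalMillis;
    }

    @Override
    public void run(SourceContext<CityRecord> ctx) throws Exception {
        while (running) {
            for (String line : Files.readAllLines(Paths.get(path))) {
                String[] cols = line.split(",");
                // Hold the checkpoint lock while emitting so records and
                // checkpoints do not interleave.
                synchronized (ctx.getCheckpointLock()) {
                    ctx.collect(new CityRecord(cols[0], cols[1], cols[2]));
                }
            }
            Thread.sleep(refreshIntervalMillis); // your staleness tolerance
        }
    }

    @Override
    public void cancel() {
        running = false;
    }
}

Run it with parallelism 1 so the file is read once per interval, and broadcast its output as in the sketch near the top of this message. The interval is effectively your tolerance for stale CSV data.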