Flink and CVE

2019-04-12 Thread TechnoMage
In preparation for putting a flink project in production we wanted to find the CVE entries for flink applicable to the last few releases, but they appear to have all Apache projects in one big list. Any help in narrowing it down to just Flink entries would be appreciated. Michael

Table exception

2018-11-29 Thread TechnoMage
I have a simple test for looking at Flink SQL and hit an exception reported as a bug. I wonder though if it is a missing dependency. Michael Error in test org.apache.flink.runtime.client.JobExecutionException: Job execution failed. at org.apache.flink.runtime.jobmaster.JobResult.t

Flink SQL questions

2018-11-01 Thread TechnoMage
I am trying to get a minimal Flink SQL program going to test it out. I have a StreamExecutionEnvironment and from that created a StreamTableEnvironment. The docs indicate there should be a fromDataStream method to create a Table, but none appears to exist according to Eclipse. The method regi

Question about serialization and performance

2018-10-31 Thread TechnoMage
In running tests of flink jobs we are seeing some that yield really good performance (2.5M records in minutes) and others that are struggleing to get past 200k records processed. In the later case there are a large number of keys, and each key gets state in the form of 3 value states. One hold

Re: Kafka connector issue

2018-08-16 Thread TechnoMage
It looks like it is some issue with backpressure as the same behavior happens with the client library as a custom source. Michael > On Aug 16, 2018, at 6:59 PM, TechnoMage wrote: > > I have seen this in the past and running into it again. > > I have a kafka consumer that is

Kafka connector issue

2018-08-16 Thread TechnoMage
I have seen this in the past and running into it again. I have a kafka consumer that is not getting all the records from the topic. Kafka conforms there are 300k messages in each partition, and flink only sees a total of 8000 records in the source. Kafka is 2.0, flink is 1.4.2 connector is Fli

Windows support

2018-06-13 Thread TechnoMage
Has any work been done on support for Windows in 1.5? I tried the scripts in 1.4 with windows 10 with no luck. Michael

Re: Conceptual question

2018-06-08 Thread TechnoMage
6.0. >>>> >>>> Before that, you need to manually handle the state migration in your >>>> operator’s open method. Lets assume that your OperatorV1 has a state field >>>> “stateV1”. Your OperatorV2 defines field “stateV2”, which is incompatible >>&

Conceptual question

2018-06-06 Thread TechnoMage
We are still pretty new to Flink and I have a conceptual / DevOps question. When a job is modified and we want to deploy the new version, what is the preferred method? Our jobs have a lot of keyed state. If we use snapshots we have old state that may no longer apply to the new pipeline. If we

Re: MapWithState for two keyed stream

2018-05-09 Thread TechnoMage
CoRichFlatMap or union will work. If you need to know which is historical the flatmap will be better as you can tell which stream it cam from. But, be careful about reading historical data and trying to process it all before processing the new data. That can lead to buffering a lot of incomin

Re: Streaming and batch jobs together

2018-05-08 Thread TechnoMage
If you use a KeyedStream you can group records by key (city) and then use a RichFlatMap to aggregate state in a MapState or ListState per key. You can then have that operator publish the updated results as a new aggregated record, or send it to a database or such as you see fit. Michael > On

Re: Updating external service and then processing response

2018-04-29 Thread TechnoMage
If the external web service call does not modify the state of that external system all the approaches you list are probably ok. If there is external state modification then you want to ensure on restart the Flink job does not resend requests to that service or that it can handle duplicate reque

Re: Flink flatMap to pass a tuple and get multiple tuple

2018-04-27 Thread TechnoMage
Any itterable of Tuples will work for a for loop: List, Set, etc. Michael > On Apr 27, 2018, at 10:47 AM, Soheil Pourbafrani > wrote: > > Thanks, what did you consider the return type of parse method? Arraylist of > tuples? > > On Friday, April 27, 2018, Te

Re: Flink flatMap to pass a tuple and get multiple tuple

2018-04-27 Thread TechnoMage
it would look more like: for (Tuple2<> t2 : parse(t.f3) { collector.collect(t2); } Michael > On Apr 27, 2018, at 9:08 AM, Soheil Pourbafrani wrote: > > Hi, I want to use flatMap to pass to function namely 'parse' a tuple and it > will return multiple tuple, that each should be a recor

Re: Measure End-to-End latency/delay for each record

2018-04-26 Thread TechnoMage
Kafka for both? > > -- > Dhruv Kumar > PhD Candidate > Department of Computer Science and Engineering > University of Minnesota > www.dhruvkumar.me <http://www.dhruvkumar.me/> > >> On Apr 26, 2018, at 16:37, TechnoMage > <mailto:mla...@technomage.com>> wr

Re: Measure End-to-End latency/delay for each record

2018-04-26 Thread TechnoMage
e real world > streaming systems. > > -- > Dhruv Kumar > PhD Candidate > Department of Computer Science and Engineering > University of Minnesota > www.dhruvkumar.me <http://www.dhruvkumar.me/> > >> On Apr 26, 2018, at 13:26, Techno

Re: Measure End-to-End latency/delay for each record

2018-04-26 Thread TechnoMage
In a single machine system this may work ok. In a multi-machine system this is not as reliable as the time skew from one machine (source) to another (sink) can impact the measurements. This also does not account for back presure on the source. We are using an external process to in parallel r

Re: Setting the parallelism in a cluster of machines properly

2018-04-26 Thread TechnoMage
icious the par = 8 > > Jps? Meaning? > > Oh I should mention that the JobManager node is also a TaskManager. > > Best, > Max > >> On 27 Apr 2018, at 01:39, TechnoMage > <mailto:mla...@technomage.com>> wrote: >> >> Check that you have slaves

Re: Setting the parallelism in a cluster of machines properly

2018-04-26 Thread TechnoMage
> I have set numOfSlotsPerTaskManager = 8. Which is reasonable as each has 8 > cpus. > > Best, > Makis > > On Fri, 27 Apr 2018, 01:26 TechnoMage, <mailto:mla...@technomage.com>> wrote: > You need to verify your configs are correct. Check that the local machine > sees all the

Re: Setting the parallelism in a cluster of machines properly

2018-04-26 Thread TechnoMage
You need to verify your configs are correct. Check that the local machine sees all the task managers, that is the most likely reason it will reject a higher parallelism. I use a java program to submit to a 3 node 18 slot cluster without issue on a job with 18 parallelism. I have not used the

Re: Setting the parallelism in a cluster of machines properly

2018-04-26 Thread TechnoMage
Go to the web UI and verify all 136 TaskManagers are visible in the machine you are submitting the job from. I have encountered issues where not all TaskManagers start, or you may not have all 17 configured properly to be one cluster vs 17 clusters of 8. Michael > On Apr 26, 2018, at 10:48 AM

Re: Why assignTimestampsAndWatermarks parallelism same as map,it will not fired?

2018-04-25 Thread TechnoMage
If you are using keyed messages in Kafka, or keyed streams in flink, then only partitions that get hashed to the proper value will get data. If not keyed messages, then yes they should all get data. Michael > On Apr 25, 2018, at 8:25 PM, 潘 功森 wrote: > > The event is running all the time in o

Re: data enrichment with SQL use case

2018-04-25 Thread TechnoMage
I agree in the general case you need to operate on the stream data based on the metadata you have. The side input feature coming some day may help you, in that it would give you a means to receive inputs out of band. But, given changing metadata and changing stream data I am not sure this is a

Re: data enrichment with SQL use case

2018-04-25 Thread TechnoMage
Using a flat map function, you can always buffer the non-meta data stream in the operator state until the metadata is aggregated, and then process any collected data. It would require a RichFlatMap to hold data. Michael > On Apr 25, 2018, at 1:20 PM, Ken Krugler wrote: > > Hi Fabian, > >> O

Re: Weird Kafka Connector issue

2018-04-25 Thread TechnoMage
019/flink-kafka-consumer-groupid-not-working > > <https://stackoverflow.com/questions/38639019/flink-kafka-consumer-groupid-not-working> > > From: TechnoMage [mailto:mla...@technomage.com] > Sent: Wednesday, April 25, 2018 8:52 AM > To: Tzu-Li (Gordon) Tai > Cc: user

Re: Weird Kafka Connector issue

2018-04-25 Thread TechnoMage
Just in case it is a metrics bug, I will add a step to do my own counting in the Flink job. Michael > On Apr 25, 2018, at 9:52 AM, TechnoMage wrote: > > I have another java program reading the topic to monitor the test. It > receives 60,000 records on the “travel” topic, whi

Re: Weird Kafka Connector issue

2018-04-25 Thread TechnoMage
of your > test are incorrect? > Or are you assuming that it is incorrect based on the weird metric numbers > shown on the web ui? > > Cheers, > Gordon > > On 25 April 2018 at 6:13:07 AM, TechnoMage (mla...@technomage.com > <mailto:mla...@technomage.com>) wrot

Weird Kafka Connector issue

2018-04-23 Thread TechnoMage
I have been using the kafka connector sucessfully for a while now. But, am getting weird results in one case. I have a test that submits 3 streams to kafka topics, and monitors them on a separate process. The flink job has a source for each topic, and one such is fed to 3 separate map functio

Re: Kafka 0.11

2018-04-22 Thread TechnoMage
The different versions of the connector correspond to different versions of Kafka. If you are using Kafka 0.8 use 0.8 connector, etc. Versions of the connector after 0.10 support exactly once delivery, versions prior to that only offer at least once delivery. Kafka supports distributed proces

Re: Flink/Kafka POC performance issue

2018-04-17 Thread TechnoMage
the kafka messages in about 34 min while the flink job has yet to read all the kafka messages 1hr40min later. Michael > On Apr 17, 2018, at 12:58 PM, TechnoMage wrote: > > Also, I note some messages in the log about my java class not being a valid > POJO because it is missing acc

Re: Flink/Kafka POC performance issue

2018-04-17 Thread TechnoMage
Also, I note some messages in the log about my java class not being a valid POJO because it is missing accessors for a field. Would this impact performance significantly? Michael > On Apr 17, 2018, at 12:54 PM, TechnoMage wrote: > > No checkpoints are active. > I will try t

Re: Flink/Kafka POC performance issue

2018-04-17 Thread TechnoMage
eap and offheap (rocksdb). > - Do you have some expensive types (JSON, etc)? Try activating object reuse > (which avoids some extra defensive copies) > > On Tue, Apr 17, 2018 at 5:50 PM, TechnoMage <mailto:mla...@technomage.com>> wrote: > Memory use is steady through

Re: Flink/Kafka POC performance issue

2018-04-17 Thread TechnoMage
>> to disk like regular Linux, then that might be triggered if your JVM heap is >> bigger than can be handled within the available RAM. >> >> On Tue, Apr 17, 2018 at 9:26 AM, TechnoMage > <mailto:mla...@technomage.com>> wrote: >> I am doing a short Proo

Flink & Kafka multi-node config

2018-04-16 Thread TechnoMage
If I use defaults for the most part but configure flink to have parallelism 5 and kafka to have 5 brokers (one of each on 5 nodes) will the connector and kafka be smart enough to use the kafka partition on the same node as the flink task manager for the 5 partitions? Do I need to explicitly ass

Flink/Kafka POC performance issue

2018-04-16 Thread TechnoMage
I am doing a short Proof of Concept for using Flink and Kafka in our product. On my laptop I can process 10M inputs in about 90 min. On 2 different EC2 instances (m4.xlarge and m5.xlarge both 4core 16GB ram and ssd storage) I see the process hit a wall around 50min into the test and short of 7

Re: Question about parallelism

2018-04-16 Thread TechnoMage
ested the behavior on Flink 1.4.2 and setting the parallelism in the > flink-conf.yaml of the client was working correctly in a simple local setup. > > If this doesn't solve your problem, we'd need a bit more information about > the job submission and setup. > > Bes

Re: Question about parallelism

2018-04-16 Thread TechnoMage
ür the parallelism of a job or >> operator. The scheduler only cares about the number of slots. >> >> How did you set the default parallelism? In the config or in the program / >> StreamExecutionEnvironment? >> >> Best, Fabian >> >> >> T

Re: Any metrics to get the shuffled and intermediate data in flink

2018-04-13 Thread TechnoMage
If you look at the web UI for flink it will tell you the bytes received and sent for each stage of a job. I have not seen any similar metric for persisted state per stage, which would be nice to have as well. Michael > On Apr 13, 2018, at 6:37 AM, Darshan Singh wrote: > > Hi > > Is there an

Question about parallelism

2018-04-12 Thread TechnoMage
I am pretty new to flink. I have a flink job that has 10 transforms (mostly CoFlatMap with some simple filters and key extractrs as well. I have the config set for 6 slots and default parallelism of 6, but all my stages show paralellism of 1. Is that because there is only one task manager? S

Re: State management and heap usage

2018-04-12 Thread TechnoMage
ter/ops/state/state_backends.html#configuring-a-state-backend > > <https://ci.apache.org/projects/flink/flink-docs-master/ops/state/state_backends.html#configuring-a-state-backend> > > On Wed, Apr 11, 2018 at 11:04 PM, TechnoMage <mailto:mla...@technomage.com>> wrote: > I a

Re: Is Flink able to do real time stock market analysis?

2018-04-12 Thread TechnoMage
e code hard to read. I’m thinking about > watermark, but not sure how to do this. > > > -- > Thanks > Ivan > From: TechnoMage > Date: Thursday, 12 April 2018 at 3:21 AM > To: Ivan Wang > Cc: "user@flink.apache.org" > Subject: Re: Is Flink able to do

State management and heap usage

2018-04-11 Thread TechnoMage
I am pretty new to flink and have an initial streaming job working both locally and remotely. But, both ways if the data volume is too high it runs out of heap. I am using RichMapFunction to process multiple streams of data. I assumed Flink would manage keeping state in ram when possible, and

Re: Is Flink able to do real time stock market analysis?

2018-04-11 Thread TechnoMage
I am new to Flink so others may have more complete answer or correct me. If you are counting the events in a tumbling window you will get output at the end of each tumbling window, so a running count of events/window. It sounds like you want to compare the raw data to the smoothed data? You ca

Re: java.lang.Exception: TaskManager was lost/killed

2018-04-08 Thread TechnoMage
I have seen this when my task manager ran out of RAM. Increase the heap size. flink-conf.yaml: taskmanager.heap.mb jobmanager.heap.mb Michael > On Apr 8, 2018, at 2:36 AM, 王凯 wrote: > > > hi all, recently, i found a problem,it runs well when start. But after long > run,the exception displa

Volume question

2018-04-07 Thread TechnoMage
I have a use case that I wonder if Flink handles well: 1) 5M+ keys in a KeyedStream 2) Using RichFlatMap to track data on each key Will Flink spread one operator’s partitions over multiple machines/taskmanager/jobmanager? Michael

KeyedSream question

2018-04-04 Thread TechnoMage
I am new to Flink and trying to understand the keyBy and KeyedStream. From the short doc description I expected it to partition the data such that the following flatMap would only see elements with the same key. That events with different keys would be presented to different instances of FlatM