Re: Record has Long.MIN_VALUE timestamp (= no timestamp marker). Is the time characteristic set to 'ProcessingTime', or did you forget to call 'DataStream.assignTimestampsAndWatermarks(...)'?

2016-07-07 Thread David Olsen
After changing the TimeCharacteristic to EventTime, Flink still throws that runtime exception. Is `env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)` the correct way to set that feature? Thanks. java.lang.RuntimeException: Record has Long.MIN_VALUE timestamp (= no timestamp marker). Is
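
The exception text treats Long.MIN_VALUE as a "no timestamp" marker, which suggests that setting the time characteristic alone is not enough: each record also needs a timestamp assigned to it. The following is a plain-Java model of that idea, not Flink code. The class `TimestampModel`, the wrapper `Stamped`, and the methods `assign` and `requireTimestamp` are hypothetical names used only for illustration.

```java
import java.util.function.ToLongFunction;

// Plain-Java model (not Flink code) of the "no timestamp marker" idea.
final class TimestampModel {
    // The error message names Long.MIN_VALUE as the "no timestamp" marker.
    static final long NO_TIMESTAMP = Long.MIN_VALUE;

    // Hypothetical record wrapper: a value plus its event timestamp.
    static final class Stamped<T> {
        final T value;
        final long timestamp;

        Stamped(T value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    // Models timestamp assignment: read the event time out of the element itself.
    static <T> Stamped<T> assign(T value, ToLongFunction<T> extractor) {
        return new Stamped<>(value, extractor.applyAsLong(value));
    }

    // Models the kind of check the error message describes:
    // a record that still carries the marker cannot be used in event time.
    static long requireTimestamp(Stamped<?> record) {
        if (record.timestamp == NO_TIMESTAMP) {
            throw new IllegalStateException("Record has no timestamp; assign one first");
        }
        return record.timestamp;
    }

    public static void main(String[] args) {
        // Hypothetical event laid out as {payload, epoch millis}.
        long[] event = {42L, 1467849600000L};
        Stamped<long[]> stamped = assign(event, e -> e[1]);
        System.out.println(requireTimestamp(stamped));
    }
}
```

In a real job the assignment step corresponds to the `DataStream.assignTimestampsAndWatermarks(...)` call that the error message itself mentions.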

Flink is Unstable when TM > 1

2016-07-07 Thread Saliya Ekanayake
Hi, I've been trying to run the provided KMeans example on a 16-node cluster. I was testing with 2 Task Managers (TM) per node because each node has 2 sockets (CPUs). A socket contains 12 cores, so I've set the number of slots per TM to 12. The total parallelism is 384 (12 slots x 2 TMs x 16 nodes)
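
The parallelism figure quoted in the message is the product of slots per TM, TMs per node, and nodes. A minimal Java sketch of that arithmetic follows; `SlotMath` and `totalSlots` are hypothetical names, not part of Flink.

```java
// Hypothetical helper, not a Flink API: total task slots in a cluster.
final class SlotMath {
    static int totalSlots(int nodes, int taskManagersPerNode, int slotsPerTaskManager) {
        return nodes * taskManagersPerNode * slotsPerTaskManager;
    }

    public static void main(String[] args) {
        // Values from the message: 16 nodes, 2 TMs per node, 12 slots per TM.
        System.out.println(totalSlots(16, 2, 12)); // prints 384
    }
}
```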

Re: Flink stops deploying jobs on normal iteration

2016-07-07 Thread Nguyen Xuan Truong
Hi Vasia, You are right about the topDistance, it is the dataset which has only 1 double value. I already looked at the Aggregator and I can only get the value of an aggregator in the next iteration. However, my problem is a bit tricky because the topDistance controls how the newSeeds is calculate

Re: Adding and removing operations after execute

2016-07-07 Thread Jamie Grier
Hi Adam, Another way to do this, depending on your exact requirements, could be to consume a second stream that essentially "configures" the operators that make up the Flink job thus dynamically altering the behavior of the job at runtime. Whether or not this approach is feasible really depends o
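
The "second stream that configures the operators" idea can be pictured as an operator with two inputs: one carries data, the other carries configuration changes. The sketch below is plain Java under that assumption, not Flink code. `ConfigurableFilter`, `onControl`, and `onData` are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java sketch (not Flink code) of an operator reconfigured by a control input.
final class ConfigurableFilter {
    // Current configuration; it can change while the job is running.
    private int threshold;

    ConfigurableFilter(int initialThreshold) {
        this.threshold = initialThreshold;
    }

    // Called for each element of the hypothetical control stream.
    void onControl(int newThreshold) {
        threshold = newThreshold;
    }

    // Called for each element of the data stream; emits values above the threshold.
    void onData(int value, List<Integer> out) {
        if (value > threshold) {
            out.add(value);
        }
    }

    public static void main(String[] args) {
        ConfigurableFilter filter = new ConfigurableFilter(10);
        List<Integer> out = new ArrayList<>();
        filter.onData(5, out);   // dropped: 5 is not above 10
        filter.onData(15, out);  // emitted
        filter.onControl(3);     // a control message lowers the threshold
        filter.onData(5, out);   // emitted now: 5 is above 3
        System.out.println(out);
    }
}
```

The topology stays fixed in this design; only the behavior of an existing operator changes, which is why the approach does not cover every case of adding operations at runtime.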

Re: Parameters to Control Intra-node Parallelism

2016-07-07 Thread Saliya Ekanayake
I see two logs (attached), but there's only 1 TaskManager process. Also, the Web console says it can find only 1 TM. However, I see this part in JM log, which shows there was a second TM at one point, but it was unregistered. Any thoughts? -- - Registered TaskManager at j-

Re: Adding and removing operations after execute

2016-07-07 Thread Kostas Kloudas
Hi, The best way to do so is to use a Flink feature called savepoints. You can find more here: https://ci.apache.org/projects/flink/flink-docs-master/apis/streaming/savepoints.html In a nutshell, savepoint

Adding and removing operations after execute

2016-07-07 Thread adamlehenbauer
Hi, I'm exploring using Flink to replace an in-house micro-batch application. Many of the features and concepts are perfect for what I need, but the biggest gap is that there doesn't seem to be a way to add new operations at runtime after execute(). What is the preferred approach for adding new o

Re: Cep NFA formal declaration

2016-07-07 Thread Till Rohrmann
Hi Bence, Flink's CEP implementation is mainly based on this paper [1]. It mainly describes how the shared buffer is used to efficiently store the internal state of the NFA. The translation of the Pattern API into the NFA happens in the NFACompiler.compileFactory method. For every pattern entry,

Cep NFA formal declaration

2016-07-07 Thread Bence Ágocsi Kiss
Hi, I'm trying to use the CEP library, and for that, I've read the tutorial on the site and I tried the given examples. However, I would need a more accurate description to understand the behavior. I haven't found any formal description of how the NFA works, and how they are built from the pat

Re: Streaming Exception error message Explanation

2016-07-07 Thread Márton Balassi
Hi Subash, Unfortunately you cannot reference a DataStream (loop) within a Flink operator. To handle both casual and feedback data I suggest using CoOperators. Have a look at this mockup I did some time ago for a conceptually similar problem. [1] [1] https://github.com/streamline-eu/ML-Pipelines

Re: Accessing external DB inside RichFlatMap Function

2016-07-07 Thread Kostas Kloudas
Yes, Chesnay is right. You can open and close the connection in the open() and close() methods of your RichFlatMapFunction. Kostas > On Jul 7, 2016, at 11:03 AM, Chesnay Schepler wrote: > > Couldn't he do the same thing in his RichFlatMap? > > open the db connection in open(), close it in clo
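
The lifecycle described in this thread (acquire the connection in open(), use it per record, release it in close()) can be sketched in plain Java as follows. This is a model of the pattern, not Flink code. `LifecycleSketch`, `FakeConnection`, `EnrichingFlatMap`, and `run` are hypothetical names, and the fake connection stands in for a real DB client.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Plain-Java sketch of the open()/close() lifecycle; names are hypothetical, not Flink classes.
final class LifecycleSketch {
    // Stand-in for an external DB connection.
    static final class FakeConnection implements AutoCloseable {
        private boolean open = true;

        String lookup(String key) {
            if (!open) {
                throw new IllegalStateException("connection closed");
            }
            return key.toUpperCase(Locale.ROOT);
        }

        @Override
        public void close() {
            open = false;
        }
    }

    // Mirrors the shape of a rich flat-map function: open once, process many records, close once.
    static final class EnrichingFlatMap {
        private FakeConnection connection;

        void open() {
            connection = new FakeConnection();
        }

        void flatMap(String value, List<String> out) {
            out.add(connection.lookup(value));
        }

        void close() {
            if (connection != null) {
                connection.close();
            }
        }
    }

    // Drives the function the way a runtime would: open, feed records, always close.
    static List<String> run(List<String> input) {
        EnrichingFlatMap fn = new EnrichingFlatMap();
        List<String> out = new ArrayList<>();
        fn.open();
        try {
            for (String value : input) {
                fn.flatMap(value, out);
            }
        } finally {
            fn.close();
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList("a", "b")));
    }
}
```

Opening the connection once per function instance, rather than once per record, avoids paying the connection setup cost on every element.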

Re: Accessing external DB inside RichFlatMap Function

2016-07-07 Thread Chesnay Schepler
Couldn't he do the same thing in his RichFlatMap? open the db connection in open(), close it in close(), do stuff within these calls. On 07.07.2016 10:58, Kostas Kloudas wrote: Hi Simon, If your job reads or writes to a DB, I would suggest to use one of the already existing Flink sources or

Re: Accessing external DB inside RichFlatMap Function

2016-07-07 Thread Kostas Kloudas
Hi Simon, If your job reads or writes to a DB, I would suggest to use one of the already existing Flink sources or sinks, as this allows for efficient connection handling and managing. If you want to write intermediate data to a DB from an operator, then I suppose that you should implement you

Re: Flink and Calcite

2016-07-07 Thread Stephan Ewen
Here is also the original design doc: https://docs.google.com/document/d/1TLayJNOTBle_-m1rQfgA6Ouj1oYsfqRjPcp1h2TVqdI On Wed, Jul 6, 2016 at 4:00 PM, Márton Balassi wrote: > Hey Radu, > > It is in master, you find the related module under > flink-libraries/flink-table in the directory structure.

Accessing external DB inside RichFlatMap Function

2016-07-07 Thread simon peyer
Hi guys, is there an easy way to handle external DB connections inside a RichFlatMap Function? --Thanks Simon

Re: Late arriving events

2016-07-07 Thread Kostas Kloudas
Hi Chen, Just to add to the previous discussion: we are currently discussing possible improvements to the windowing mechanism/semantics here: https://docs.google.com/document/d/1Xp-YBf87vLTduYSivgqWVEMjYUmkA-hyb4muX3KRl08/edit#heading=h.psfzjlv68tp

Re: Parameters to Control Intra-node Parallelism

2016-07-07 Thread Ufuk Celebi
No, that should suffice. Can you check whether there are any task manager logs for the second TM on that machine (taskmanager-X-j-011.log where X is the TM number)? If yes, the task manager process does start up and there is another problem. If not, the task manager does not seem to start at all. – Ufuk

Re: Graph with stream of updates

2016-07-07 Thread Vasiliki Kalavri
Hi Milindu, as far as I know, there is currently no way to query the state from outside of Flink. That's a feature on the roadmap, but I'm not sure when it will be provided. Maybe someone else can give us an update. For now, you can either implement your queries inside your streaming job and output

Re: Flink stops deploying jobs on normal iteration

2016-07-07 Thread Vasiliki Kalavri
Hi Truong, I guess the problem is that you want to use topDistance as a broadcast set inside the iteration? If I understand correctly this is a dataset with a single value, right? Could you maybe compute it with an aggregator instead? -Vasia. On 5 July 2016 at 21:48, Nguyen Xuan Truong wrote: