Re: allowed lateness on windowed join?

2017-02-04 Thread Saiph Kappa
…reams (as the general case of a join) and JoinedStreams. Cheers, Aljoscha. On Mon, 30 Jan 2017 at 17:38 Saiph Kappa wrote: Hi all, Is it possible to specify allowed lateness for a window join like the following one: …

allowed lateness on windowed join?

2017-01-30 Thread Saiph Kappa
Hi all, Is it possible to specify allowed lateness for a window join like the following one: val tweetsAndWarning = warningsPerStock.join(tweetsPerStock).where(_.symbol).equalTo(_.symbol) .window(SlidingEventTimeWindows.of(Time.of(windowDurationSec, TimeUnit.SECONDS), Time.of(windowDurationS…
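In newer Flink releases (roughly 1.3+) the windowed-join builder exposes allowed lateness directly; a minimal sketch under that assumption, with hypothetical Warning/Tweet case classes and a made-up slide length standing in for the truncated snippet above:
«
import java.util.concurrent.TimeUnit
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.assigners.SlidingEventTimeWindows
import org.apache.flink.streaming.api.windowing.time.Time

case class Warning(symbol: String)
case class Tweet(symbol: String)

// hypothetical inputs standing in for the streams from the question
val warningsPerStock: DataStream[Warning] = ???
val tweetsPerStock: DataStream[Tweet] = ???
val windowDurationSec = 60L
val slideDurationSec = 10L // hypothetical; the original snippet is cut off here

val tweetsAndWarning = warningsPerStock.join(tweetsPerStock)
  .where(_.symbol).equalTo(_.symbol)
  .window(SlidingEventTimeWindows.of(
    Time.of(windowDurationSec, TimeUnit.SECONDS),
    Time.of(slideDurationSec, TimeUnit.SECONDS)))
  .allowedLateness(Time.seconds(5)) // lateness applied to the join window
  .apply((warning, tweet) => (warning.symbol, tweet))
»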

Order by which windows are processed on event time

2016-11-11 Thread Saiph Kappa
Hi, I have a streaming application based on event time. When I issue a watermark that will close more than 1 window (and trigger their processing), I can see that the windows are computed sequentially (at least on a local machine) and that the computation order is not defined. Can I change this behavior…

Re: Why tuples are not ignored after watermark?

2016-09-15 Thread Saiph Kappa
…stance of those operators emits the watermark because only one of those parallel instances sees the element with _3 == 9000. For the watermark to advance at an operator it needs to advance in all upstream operations. Cheers, Aljoscha. On Fri, 9 Sep 2016 at…
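A minimal sketch of that effect, assuming a hypothetical event type whose third field is a sequence number, as in the thread: with parallelism above 1, only the assigner instance that happens to see element 9000 ever emits a watermark, so pinning the assigner to a single instance is one way to make the watermark advance everywhere:
«
import org.apache.flink.streaming.api.functions.AssignerWithPunctuatedWatermarks
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.watermark.Watermark

type Event = (String, Long, Int) // (key, event-time timestamp, sequence number)
val events: DataStream[Event] = ??? // hypothetical input

val withWatermarks = events
  .assignTimestampsAndWatermarks(new AssignerWithPunctuatedWatermarks[Event] {
    override def extractTimestamp(e: Event, previousTimestamp: Long): Long = e._2
    // only the element with sequence number 9000 carries a watermark; a
    // parallel instance that never sees it stalls the downstream watermark
    override def checkAndGetNextWatermark(last: Event, ts: Long): Watermark =
      if (last._3 == 9000) new Watermark(ts + 6 * 60 * 1000) else null
  })
  .setParallelism(1) // one assigner instance sees every element
»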

Why tuples are not ignored after watermark?

2016-09-09 Thread Saiph Kappa
Hi, I have a streaming (event time) application where I am receiving events with the same assigned timestamp. I receive 1 events in total on a window of 5 minutes, but I emit a watermark when 9000 elements have been received. This watermark is 6 minutes after the assigned timestamps. My question…

How to count number of records received per second in processing time while using event time characteristic

2016-06-28 Thread Saiph Kappa
Hi, I have a Flink streaming application and I want to count records received per second (as a way of measuring the throughput of my application). However, I am using the EventTime time characteristic, as follows: val env = StreamExecutionEnvironment.getExecutionEnvironment env.setStreamTimeChara…
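One way that leaves the EventTime characteristic alone is to hang a meter off a pass-through operator. A minimal sketch, assuming a Flink version with the metrics API (1.1+); the class and metric names here are made up:
«
import org.apache.flink.api.common.functions.RichMapFunction
import org.apache.flink.configuration.Configuration
import org.apache.flink.metrics.{Meter, MeterView, SimpleCounter}

// pass-through map that meters records/second, independent of the time characteristic
class ThroughputMeter[T] extends RichMapFunction[T, T] {
  @transient private var meter: Meter = _

  override def open(parameters: Configuration): Unit = {
    meter = getRuntimeContext.getMetricGroup
      .meter("recordsPerSecond", new MeterView(new SimpleCounter, 60)) // 60 s average
  }

  override def map(value: T): T = {
    meter.markEvent() // one event per record; the rate shows up in the web UI / reporter
    value
  }
}

// usage: stream.map(new ThroughputMeter[String])
»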

Convert Scala DataStream to Java DataStream

2016-04-05 Thread Saiph Kappa
Hi, I'm programming in Scala and using some extra libraries written in Java. My question is: how can I easily convert "org.apache.flink.streaming.scala.DataStream" to "org.apache.flink.streaming.api.datastream.DataStream"? Thanks.
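The Scala DataStream is a thin wrapper around the Java one and exposes it directly; a one-line sketch, assuming the Scala class in question is org.apache.flink.streaming.api.scala.DataStream:
«
import org.apache.flink.streaming.api.datastream.{DataStream => JavaDataStream}
import org.apache.flink.streaming.api.scala.{DataStream => ScalaDataStream}

// the wrapped Java stream is exposed by the Scala API as javaStream
def toJava[T](stream: ScalaDataStream[T]): JavaDataStream[T] = stream.javaStream
»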

Re: Counting tuples within a window in Flink Stream

2016-02-26 Thread Saiph Kappa
…eyed-data-streams. On Fri, Feb 26, 2016 at 4:07 PM, Saiph Kappa wrote: Why the ".keyBy"? I don't want to count tuples by key. I simply want to count all tuples that are contained in a window. On Fri, Feb 26, 2016 at 9:14 A…

Re: Counting tuples within a window in Flink Stream

2016-02-26 Thread Saiph Kappa
…(Time.seconds(10)).fold(0, new FoldFunction, Integer>() { @Override public Integer fold(Integer integer, Tuple2 o) throws Exception { return integer + 1; } }); Cheers, Till. On Thu, Feb 25, 2016 at 7:58 PM, Saiph Kappa…

Counting tuples within a window in Flink Stream

2016-02-25 Thread Saiph Kappa
Hi, In Flink Stream what's the best way of counting the number of tuples within a window of 10 seconds? Using a map-reduce task? Asking because in Spark there is the method rawStream.countByWindow(Seconds(x)). Thanks.
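The fold from the reply above, as a sketch in Scala; windowAll keeps all tuples in one non-parallel window, which matches counting everything rather than counting per key (fold was later deprecated in favour of aggregate):
«
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.time.Time

val stream: DataStream[String] = ??? // hypothetical input

// one count per 10 s window, over all tuples regardless of key
val countsPerWindow: DataStream[Long] =
  stream.timeWindowAll(Time.seconds(10)).fold(0L)((count, _) => count + 1)
»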

Re: How to use all available task managers

2016-02-24 Thread Saiph Kappa
…s not working... you might have it set to 1 already? On Wed, Feb 24, 2016 at 3:41 PM, Saiph Kappa wrote: I set "parallelism.default: 6" on flink-conf.yaml of all 6 machines, and still, my job only uses 1 task manager. Why? …

Re: How to use all available task managers

2016-02-24 Thread Saiph Kappa
…te jobs which have no parallelism defined with a DOP of 6. Cheers, Till. On Wed, Feb 24, 2016 at 1:43 AM, Saiph Kappa wrote: Hi, I am running a Flink stream application on a cluster with 6 slaves/task managers. I have set…

How to use all available task managers

2016-02-23 Thread Saiph Kappa
Hi, I am running a Flink stream application on a cluster with 6 slaves/task managers. I have set in flink-conf.yaml of every machine "parallelization.degree.default: 6". However, when I run my application it just uses one task slot and not all of them. Am I missing something? Thanks.
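Besides the config key (later renamed parallelism.default), the parallelism can also be forced per job or per operator; a minimal sketch:
«
import org.apache.flink.streaming.api.scala._

val env = StreamExecutionEnvironment.getExecutionEnvironment
env.setParallelism(6) // job-wide, overrides the flink-conf.yaml default

// or per operator:
// stream.map(...).setParallelism(6)
»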

Re: How to increase akka heartbeat?

2016-02-20 Thread Saiph Kappa
Hi, can you maybe (if you want, also privately) send me the full logs of the jobmanager? The messages you've posted here are logged at DEBUG level. They don't indicate an erroneous behavior of the system. On Fri, Feb 19, 2016…

Re: How to increase akka heartbeat?

2016-02-19 Thread Saiph Kappa
These were the parameters that I set, btw: akka.watch.heartbeat.interval: 100, akka.transport.heartbeat.interval: 1000. On Fri, Feb 19, 2016 at 7:43 PM, Saiph Kappa wrote: I am not sure. For that particular machine I get messages like these: « myip:6123/user/jobmanager#…

Re: How to increase akka heartbeat?

2016-02-19 Thread Saiph Kappa
…use the client disconnects? For the different timeouts, check the configuration page: https://ci.apache.org/projects/flink/flink-docs-release-0.10/setup/config.html and search for "heartbeat". On Fri, Feb 19, 2016 at 8:04 PM, Saiph Kappa wrote: …

How to increase akka heartbeat?

2016-02-19 Thread Saiph Kappa
Hi, I have a Flink client application that submits jobs to remote clusters. However, I'm getting my jobs cancelled: "18:25:29,650 WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink@127.0.0.1:52929] has failed, address is now gated…
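For reference, these are the heartbeat-related keys from the configuration page linked above, as flink-conf.yaml entries; the values below are illustrative raises, not recommendations, and the pause settings usually matter more than the intervals:
«
akka.watch.heartbeat.interval: 10 s
akka.watch.heartbeat.pause: 100 s
akka.transport.heartbeat.interval: 1000 s
akka.transport.heartbeat.pause: 6000 s
akka.ask.timeout: 100 s
»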

Re: Flink Stream: How to ship results through socket server

2016-01-19 Thread Saiph Kappa
…t executes this SocketStream-source task. I guess it would be better not to use "localhost", but start your SocketWriter-Thread on a dedicated machine in the cluster, and connect your SocketStream-source to this machine via its host name. -Matthias
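A sketch of that suggestion, with a hypothetical host name in place of localhost:
«
import org.apache.flink.streaming.api.scala._

val env = StreamExecutionEnvironment.getExecutionEnvironment
// connect the source to the dedicated SocketWriter machine, not localhost
val lines: DataStream[String] = env.socketTextStream("socket-writer-host", 9999)
»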

Re: Flink Stream: collect in an array all records within a window

2016-01-19 Thread Saiph Kappa
…rray[Byte]], actual: SimpleStringSchema. Can you maybe share your code? -Matthias. On 01/19/2016 01:57 PM, Saiph Kappa wrote: It's DataStream[String]. So it seems that SimpleStringSchema cannot be used in writeToSocket regardless of the type of the…
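A minimal sketch of such a customized schema, assuming the 0.10-era two-parameter SerializationSchema that the type mismatch above points at:
«
import org.apache.flink.streaming.util.serialization.SerializationSchema

// turns each String into the Array[Byte] that writeToSocket expects
val byteSchema = new SerializationSchema[String, Array[Byte]] {
  override def serialize(element: String): Array[Byte] = element.getBytes("UTF-8")
}

// usage: s.writeToSocket(host, port.toInt, byteSchema)
»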

Flink Stream: How to ship results through socket server

2016-01-19 Thread Saiph Kappa
Hi, This is a simple example that I found using Flink Stream. I changed it so the Flink client can be executed on a remote cluster, and so that it can open a socket server to ship its results to any other consumer machine. It seems to me that the socket server is not being opened in the remote cluster…

Re: Flink Stream: collect in an array all records within a window

2016-01-19 Thread Saiph Kappa
…tringSchema. If you have a different type, just implement a customized SerializationSchema. -Matthias. On 01/19/2016 11:26 AM, Saiph Kappa wrote: When I use SimpleStringSchema I get the error: Type mismatch, expected: SerializationSchema[Stri…

Re: Flink Stream: collect in an array all records within a window

2016-01-19 Thread Saiph Kappa
…? s.writeToSocket(host, port.toInt, new SimpleStringSchema()) Thanks. On Tue, Jan 19, 2016 at 10:34 AM, Matthias J. Sax wrote: There is SimpleStringSchema. -Matthias. On 01/18/2016 11:21 PM, Saiph Kappa wrote: Hi Matthias, Thanks for your re…

Re: Flink Stream: collect in an array all records within a window

2016-01-18 Thread Saiph Kappa
…); } }); If you consume all those values via a sink, the sink will run on the cluster. You can use .writeToSocket(...) as sink: https://ci.apache.org/projects/flink/flink-docs-release-0.10/apis/streaming_guide.html#data-sinks -Matthias

Flink Stream: collect in an array all records within a window

2016-01-18 Thread Saiph Kappa
Hi, After performing a windowAll() on a DataStream[String], is there any method to collect and return an array with all Strings within a window (similar to .collect in Spark)? I basically want to ship all strings in a window to a remote server through a socket, and want to use the same socket con…
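A sketch of collecting each window into a single array with an apply on the all-window stream (the closest analogue to Spark's collect here), assuming a 1.x-style Scala API:
«
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.time.Time
import org.apache.flink.util.Collector

val stream: DataStream[String] = ??? // hypothetical input

// one Array[String] per 10 s window; windowAll is non-parallel, so every
// record of a pane reaches the same function call
val batches: DataStream[Array[String]] =
  stream.timeWindowAll(Time.seconds(10))
    .apply { (_, values: Iterable[String], out: Collector[Array[String]]) =>
      out.collect(values.toArray)
    }
»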

Flink DataStream and KeyBy

2016-01-13 Thread Saiph Kappa
Hi, This line «stream.keyBy(0)» only works if stream is of type DataStream[Tuple] - and this Tuple is not a Scala tuple but a Flink tuple (why not use the Scala Tuple?). Currently keyBy can be applied to anything (at least in Scala) like DataStream[String] and DataStream[Array[String]]. Can anyone…
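For Scala tuples (and case classes) the key can be given as a function instead of a field index; a quick sketch:
«
import org.apache.flink.streaming.api.scala._

val pairs: DataStream[(String, Int)] = ??? // hypothetical input

// a key-selector function works for Scala tuples, case classes,
// or any type a key can be derived from
val keyed = pairs.keyBy(_._1)
»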

How to sort tuples in DataStream

2016-01-11 Thread Saiph Kappa
Hi, I'm trying to do a simple application in Flink Stream to count the top N words on a per-window basis, but I don't know how to sort the words by their frequency in Flink. In Spark Streaming, I would do something like this: « val isAscending = true stream.reduceByKeyAndWindow(reduceFunc, Seconds(10…
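One way to approximate that in Flink: count per word in a keyed window, then sort within a non-parallel all-window over the same pane; a sketch, assuming words: DataStream[String] and descending frequency:
«
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.time.Time
import org.apache.flink.util.Collector

val words: DataStream[String] = ??? // hypothetical input
val n = 10

val topN: DataStream[(String, Int)] =
  words.map(w => (w, 1))
    .keyBy(_._1)
    .timeWindow(Time.seconds(10))
    .sum(1) // per-word counts for each 10 s window
    .timeWindowAll(Time.seconds(10)) // gather counts of the same pane
    .apply { (_, counts: Iterable[(String, Int)], out: Collector[(String, Int)]) =>
      counts.toSeq.sortBy(-_._2).take(n).foreach(out.collect)
    }
»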