Create window before the first event

2016-07-11 Thread Xiang Zhang
Hi, I am trying to have a trigger that fires every 5 minutes, even when no events arrive (just output a default for the empty window). The closest solution I got to work is this: datastream.windowAll(GlobalWindows.create()) .trigger(ContinuousProcessingTimeTrigger.of(Time.minutes(5)))
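[For illustration, a minimal self-contained sketch of the setup described above, assuming the Java DataStream API; the input source and the reduce function are placeholders. Note that ContinuousProcessingTimeTrigger only fires for windows that already contain at least one element, and it fires without purging.]

import org.apache.flink.api.common.functions.ReduceFunction;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.GlobalWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.api.windowing.triggers.ContinuousProcessingTimeTrigger;

public class PeriodicGlobalWindowJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Placeholder input; replace with the real (unbounded) source.
        DataStream<Long> events = env.fromElements(1L, 2L, 3L);

        events
            // A single global window over the whole non-keyed stream.
            .windowAll(GlobalWindows.create())
            // Fire every 5 minutes of processing time, but only for windows
            // that have received at least one element.
            .trigger(ContinuousProcessingTimeTrigger.of(Time.minutes(5)))
            // Placeholder aggregation; the window contents are never purged,
            // so state keeps growing unless an evictor or purging trigger is added.
            .reduce(new ReduceFunction<Long>() {
                @Override
                public Long reduce(Long a, Long b) {
                    return a + b;
                }
            })
            .print();

        env.execute("Periodic global window");
    }
}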

Re: Modifying start-cluster scripts to efficiently spawn multiple TMs

2016-07-11 Thread Saliya Ekanayake
Yes, I have password-less SSH to the job manager node. On Mon, Jul 11, 2016 at 4:53 PM, Greg Hogan wrote: > pdsh is only used for starting taskmanagers. How did you work around this? > Are you able to passwordless-ssh to the jobmanager? > > The error looks to be from config.sh:318 in rotateLogFile

Re: Modifying start-cluster scripts to efficiently spawn multiple TMs

2016-07-11 Thread Greg Hogan
pdsh is only used for starting taskmanagers. How did you work around this? Are you able to passwordless-ssh to the jobmanager? The error looks to be from config.sh:318 in rotateLogFile. The way we generate the taskmanager index assumes that taskmanagers are started sequentially (flink-daemon.sh:10

Re: ContinuousProcessingTimeTrigger on empty

2016-07-11 Thread Xiang Zhang
Hi Kostas, Yes, so I tried GlobalWindows. Is it possible to trigger every 5 minutes on GlobalWindows? The comment in the source for ContinuousProcessingTimeTrigger says: * A {@link Trigger} that continuously fires based on a given time interval as measured by * the clock of the machine on

Re: Modifying start-cluster scripts to efficiently spawn multiple TMs

2016-07-11 Thread Saliya Ekanayake
Looking at what happens with pdsh, there are two things that go wrong. 1. pdsh is installed on a node other than the one where the job manager would run, so invoking *start-cluster* from there does not spawn a job manager. Only if I run start-cluster from the node I specify as the job manager's node that i

Re: ContinuousProcessingTimeTrigger on empty

2016-07-11 Thread Kostas Kloudas
Hi Xiang, Currently this is not supported by the triggers provided by Flink, as a window with no data is a non-existing window for Flink. What you could do is periodically emit dummy elements from your source (so that all windows have at least one element) and make sure that your windowing func
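[A minimal sketch of the workaround Kostas describes, assuming the Java DataStream API: a source that periodically emits a dummy marker, to be unioned with the real stream so every window has at least one element. The element type, marker value, and class name are illustrative.]

import org.apache.flink.streaming.api.functions.source.SourceFunction;

// Emits a dummy marker every intervalMillis; union this with the real
// stream and have the window function ignore the markers.
public class HeartbeatSource implements SourceFunction<String> {

    private final long intervalMillis;
    private volatile boolean running = true;

    public HeartbeatSource(long intervalMillis) {
        this.intervalMillis = intervalMillis;
    }

    @Override
    public void run(SourceContext<String> ctx) throws Exception {
        while (running) {
            synchronized (ctx.getCheckpointLock()) {
                ctx.collect("HEARTBEAT");
            }
            Thread.sleep(intervalMillis);
        }
    }

    @Override
    public void cancel() {
        running = false;
    }
}

[It would be attached with something like realStream.union(env.addSource(new HeartbeatSource(300_000L))), with the window function filtering out the "HEARTBEAT" markers before producing its result.]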

ContinuousProcessingTimeTrigger on empty

2016-07-11 Thread Xiang Zhang
Hi, I want to have a trigger that fires every 5 seconds in processing time even when no events arrive. I tried datastream.windowAll(GlobalWindows.create()) .trigger(ContinuousProcessingTimeTrigger.of(Time.seconds(5))) .apply { MY_APPLY_FUNCTION} However, ContinuousProce

Re: Modifying start-cluster scripts to efficiently spawn multiple TMs

2016-07-11 Thread Saliya Ekanayake
I meant I'll check when the current jobs are done and will let you know. On Mon, Jul 11, 2016 at 12:19 PM, Saliya Ekanayake wrote: > I am running some jobs now. I'll stop and restart using pdsh to see what > the issue was. > > On Mon, Jul 11, 2016 at 12:15 PM, Greg Hogan wrote: > >> I'd defin

Re: Modifying start-cluster scripts to efficiently spawn multiple TMs

2016-07-11 Thread Saliya Ekanayake
I am running some jobs now. I'll stop and restart using pdsh to see what the issue was. On Mon, Jul 11, 2016 at 12:15 PM, Greg Hogan wrote: > I'd definitely be interested to hear any insight into what failed when > starting the taskmanagers with pdsh. Did the command fail, fall back to >

Re: Modifying start-cluster scripts to efficiently spawn multiple TMs

2016-07-11 Thread Greg Hogan
I'd definitely be interested to hear any insight into what failed when starting the taskmanagers with pdsh. Did the command fail, fall back to standard ssh, or hit a parse error on the slaves file? I'm wondering if we need to escape PDSH_SSH_ARGS_APPEND=$FLINK_SSH_OPTS as PDSH_SSH_ARGS_APPEND="${FL

Re: Parameters to Control Intra-node Parallelism

2016-07-11 Thread Saliya Ekanayake
Thank you, Greg. I'll check whether this was why my TMs disappeared. On Mon, Jul 11, 2016 at 11:34 AM, Greg Hogan wrote: > The OOM killer doesn't give any warning, so you'll need to call dmesg or look > in /var/log/messages or similar. The following reports that Debian flavors > may use /var/log

Re: Parameters to Control Intra-node Parallelism

2016-07-11 Thread Greg Hogan
The OOM killer doesn't give any warning, so you'll need to call dmesg or look in /var/log/messages or similar. The following reports that Debian flavors may use /var/log/syslog. http://stackoverflow.com/questions/624857/finding-which-process-was-killed-by-linux-oom-killer On Sun, Jul 10, 2016 at 11:55

Re: Dynamic partitioning for stream output

2016-07-11 Thread Josh
Hi guys, I've been working on this feature as I needed something similar. Have a look at my issue (https://issues.apache.org/jira/browse/FLINK-4190) and changes (https://github.com/joshfg/flink/tree/flink-4190). The changes follow Kostas's suggestion in this thread. Thanks, Josh On Thu, Ma

Re: Flink Completed Jobs only has 5 entries

2016-07-11 Thread Saliya Ekanayake
Thank you, Ufuk! On Mon, Jul 11, 2016 at 10:46 AM, Ufuk Celebi wrote: > Yes, via jobmanager.web.history > ( > https://ci.apache.org/projects/flink/flink-docs-release-1.0/setup/config.html#jobmanager-web-frontend > ) > > On Mon, Jul 11, 2016 at 4:45 PM, Saliya Ekanayake > wrote: > > Hi, > > > >

Re: Flink Completed Jobs only has 5 entries

2016-07-11 Thread Ufuk Celebi
Yes, via jobmanager.web.history (https://ci.apache.org/projects/flink/flink-docs-release-1.0/setup/config.html#jobmanager-web-frontend). On Mon, Jul 11, 2016 at 4:45 PM, Saliya Ekanayake wrote: > Hi, > > It seems by default the completed job list only shows 5 entries. Is there a > way to increase

Flink Completed Jobs only has 5 entries

2016-07-11 Thread Saliya Ekanayake
Hi, It seems by default the completed job list only shows 5 entries. Is there a way to increase this? Thank you, saliya -- Saliya Ekanayake Ph.D. Candidate | Research Assistant School of Informatics and Computing | Digital Science Center Indiana University, Bloomington

Re: sampling function

2016-07-11 Thread Le Quoc Do
Hi all, Thank you all for your answers. By the way, I also noticed that Flink doesn't support a "stratified sampling" function (only simple random sampling) for DataSet. It would be nice if someone could create a Jira for it and assign the task to me so that I can work on it. Thank you, Do On
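[For illustration, a minimal sketch of a per-stratum Bernoulli sampler on the DataSet API; this is not an exact-size stratified sample, and the stratum keys, fractions, and class name are assumptions.]

import java.util.HashMap;
import java.util.Map;
import java.util.Random;

import org.apache.flink.api.common.functions.RichFilterFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.configuration.Configuration;

public class StratifiedSampleSketch {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // Hypothetical input: (stratum key, value).
        DataSet<Tuple2<String, Integer>> data = env.fromElements(
                Tuple2.of("a", 1), Tuple2.of("a", 2), Tuple2.of("b", 3));

        // Assumed per-stratum sampling fractions.
        final Map<String, Double> fractions = new HashMap<>();
        fractions.put("a", 0.5);
        fractions.put("b", 0.1);

        // Keep each element independently with its stratum's fraction.
        DataSet<Tuple2<String, Integer>> sample = data.filter(
                new RichFilterFunction<Tuple2<String, Integer>>() {
                    private transient Random rnd;

                    @Override
                    public void open(Configuration parameters) {
                        rnd = new Random();
                    }

                    @Override
                    public boolean filter(Tuple2<String, Integer> value) {
                        Double f = fractions.get(value.f0);
                        return f != null && rnd.nextDouble() < f;
                    }
                });

        sample.print();
    }
}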

Re: The two inputs have different execution contexts.

2016-07-11 Thread Alieh Saeedi
Hi, I was joining two datasets which were from two different ExecutionEnvironments. It was my mistake. Thanks anyway. Best, Alieh On Monday, 11 July 2016, 11:33, Kostas Kloudas wrote: Hi Alieh, Could you share your code so that we can have a look? From the information you provide we cannot
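[For reference, a minimal sketch of the correct pattern, with both inputs created from a single ExecutionEnvironment; the data and key fields are placeholders.]

import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;

public class SameEnvironmentJoin {
    public static void main(String[] args) throws Exception {
        // Both inputs come from the same ExecutionEnvironment; creating them
        // from two different environments is what triggers the
        // "The two inputs have different execution contexts." error.
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        DataSet<Tuple2<Integer, String>> left =
                env.fromElements(Tuple2.of(1, "a"), Tuple2.of(2, "b"));
        DataSet<Tuple2<Integer, Double>> right =
                env.fromElements(Tuple2.of(1, 1.0), Tuple2.of(2, 2.0));

        left.join(right)
            .where(0)   // key field of the left input
            .equalTo(0) // key field of the right input
            .print();
    }
}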

Re: The two inputs have different execution contexts.

2016-07-11 Thread Kostas Kloudas
No problem, Alieh! Kostas > On Jul 11, 2016, at 11:46 AM, Alieh Saeedi wrote: > > Hi, > I was joining two datasets which were from two different > ExecutionEnvironments. It was my mistake. Thanks anyway. > > Best, > Alieh > > > On Monday, 11 July 2016, 11:33, Kostas Kloudas > wrote: > > >

Re: sampling function

2016-07-11 Thread Vasiliki Kalavri
Hi Do, Paris and Martha worked on sampling techniques for data streams on Flink last year. If you want to implement your own samplers, you might find Martha's master's thesis helpful [1]. -Vasia. [1]: http://kth.diva-portal.org/smash/get/diva2:910695/FULLTEXT01.pdf On 11 July 2016 at 11:31, Kosta

Re: The two inputs have different execution contexts.

2016-07-11 Thread Kostas Kloudas
Hi Alieh, Could you share your code so that we can have a look? From the information you provide we cannot help. Thanks, Kostas > On Jul 10, 2016, at 3:13 PM, Alieh Saeedi wrote: > > I cannot join or coGroup two Tuple2 datasets of the same type. The error is > java.lang.IllegalArgumentExcept

Re: sampling function

2016-07-11 Thread Kostas Kloudas
Hi Do, In DataStream you can always implement your own sampling function, hopefully without too much effort. Adding such functionality to the API could be a good idea. But given that in sampling there is no "one-size-fits-all" solution (as not every use case needs random sampling and not al
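[A minimal sketch of such a hand-rolled sampler, assuming simple Bernoulli sampling is sufficient; the class name and fraction are illustrative, not part of Flink's API.]

import java.util.Random;

import org.apache.flink.api.common.functions.RichFilterFunction;
import org.apache.flink.configuration.Configuration;

// Keeps each element independently with probability `fraction`;
// attach it with stream.filter(new BernoulliSampler<MyType>(0.01)).
public class BernoulliSampler<T> extends RichFilterFunction<T> {

    private final double fraction;
    private transient Random rnd;

    public BernoulliSampler(double fraction) {
        this.fraction = fraction;
    }

    @Override
    public void open(Configuration parameters) {
        rnd = new Random();
    }

    @Override
    public boolean filter(T value) {
        return rnd.nextDouble() < fraction;
    }
}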

dynamic streams and patterns

2016-07-11 Thread Claudia Wegmann
Hey everyone, I'm quite new to Apache Flink. I'm trying to build a system with Flink and wanted to hear your opinion on whether the proposed architecture is even possible with Flink. The environment for the system will be a microservice architecture handling messaging via async events. I want

Re: Extract type information from SortedMap

2016-07-11 Thread Timo Walther
Hi Yukun, I think the problem with the input type inference is that SortedMap is a GenericType and not a Flink native type (like Tuple or POJO). This case is not supported at the moment. You can create an issue if you like; maybe there is a way to support this special type inference case. Timo
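[As a hedged illustration of the usual workaround when the extractor falls back to a GenericType: declare the produced type explicitly on the operator via returns() with a TypeHint. This assumes a Flink version that provides TypeHint/TypeInformation.of (1.1+), and it may not cover the specific input-type-inference case Timo refers to; the job below is purely hypothetical.]

import java.util.SortedMap;
import java.util.TreeMap;

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.common.typeinfo.TypeHint;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class SortedMapTypeHintExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        DataStream<String> words = env.fromElements("a", "b", "a");

        DataStream<SortedMap<String, Integer>> maps = words
                .map(new MapFunction<String, SortedMap<String, Integer>>() {
                    @Override
                    public SortedMap<String, Integer> map(String value) {
                        SortedMap<String, Integer> m = new TreeMap<>();
                        m.put(value, 1);
                        return m;
                    }
                })
                // Declare the output type explicitly; Flink then treats it as a
                // GenericType (Kryo-serialized) instead of trying to infer it.
                .returns(TypeInformation.of(new TypeHint<SortedMap<String, Integer>>() {}));

        maps.print();
        env.execute("SortedMap type hint");
    }
}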