Need some help to understand the cause of the error

2016-02-25 Thread Nirmalya Sengupta
Hello Flinksters, I am trying to use Flinkspector in a Scala code snippet of mine and Flink is complaining. The code is here: --- case class Reading(field1:String,field2:String,field3:Int)

suggestion for Quickstart

2016-02-25 Thread Tara Athan
On https://ci.apache.org/projects/flink/flink-docs-master/quickstart/setup_quickstart.html some of the instructions have been updated to 1.0-SNAPSHOT, but not all. The download link goes to http://www.apache.org/dyn/closer.cgi/flink/flink-0.9.1/flink-0.9.1-bin-hadoop1.tgz and all the links t

Re: Watermarks with repartition

2016-02-25 Thread Zach Cox
I think I found the information I was looking for: RecordWriter broadcasts each emitted watermark to all outgoing channels [1]. StreamInputProcessor tracks the max watermark received on each incoming channel separately, and computes the task's watermark as the min of all incoming watermarks [2].

Graph with stream of updates

2016-02-25 Thread Ankur Sharma
Hello, Is it possible to create and update graph with streaming edge and vertex data in flink? Best, Ankur Sharma 3.15 E1.1 Universität des Saarlandes 66123, Saarbrücken Germany Email: ankur.sha...@mpi-inf.mpg.de an...@stud.uni-saarland.de

Watermarks with repartition

2016-02-25 Thread Zach Cox
Hi - how are watermarks passed along parallel tasks where there is a repartition? For example, say I have a simple streaming job computing hourly counts per key, something like this: val environment = StreamExecutionEnvironment.getExecutionEnvironment environment.setParallelism(2) environment.setS

Re: The way to itearte instances in AllWindowFunction in current Master branch

2016-02-25 Thread Aljoscha Krettek
Hi Hung, I see one thing that could explain the problem, the timestamp assigner should look like this: new AssignerWithPeriodicWatermarks() { long curTimeStamp; @Override public long extractTimestamp(BizEvent biz, long currentTimestamp) {

Re: schedule tasks `inside` Flink

2016-02-25 Thread Michal Fijolek
Thanks for help guys! Eventually I did implemented it as a RichFunction using open() and closed() methods. Michał 2016-02-25 19:00 GMT+01:00 Stephan Ewen : > Fabian's suggestion with the co-map is good. You can use a "broadcast()" > connect to make sure the dictionary gets to all nodes. > > If y

Counting tuples within a window in Flink Stream

2016-02-25 Thread Saiph Kappa
Hi, In Flink Stream what's the best way of counting the number of tuples within a window of 10 seconds? Using a map-reduce task? Asking because in spark there is the method rawStream.countByWindow(Seconds(x)). Thanks.

Re: Read every file in a directory at once

2016-02-25 Thread Stephan Ewen
Thanks for sharing this solution! On Thu, Feb 18, 2016 at 4:02 PM, Flavio Pompermaier wrote: > My current solution is: > > List paths = new ArrayList(); > File dir = new File(BASE_DIR); > for (File f : dir.listFiles()) { > paths.add(f.getName()); > } > DataSet mail = env.fromCollection(pa

Re: schedule tasks `inside` Flink

2016-02-25 Thread Stephan Ewen
Fabian's suggestion with the co-map is good. You can use a "broadcast()" connect to make sure the dictionary gets to all nodes. If you want full control about how and when to read the data, a scheduled task is not that bad even as a solution. Make sure you implement this as a "RichFunction", so y

Re: Mapping two datasets

2016-02-25 Thread Saliya Ekanayake
Thank you. Any thoughts on the ParallelIteratorInputFormat in Flink? On Thu, Feb 25, 2016 at 12:07 PM, Márton Balassi wrote: > Hey Saliya, > > I recommend using DataSetUtils.zipWithIndex for this task. [1] It comes > with flink-java. > > [1] > https://github.com/apache/flink/blob/master/flink-ja

Re: Mapping two datasets

2016-02-25 Thread Márton Balassi
Hey Saliya, I recommend using DataSetUtils.zipWithIndex for this task. [1] It comes with flink-java. [1] https://github.com/apache/flink/blob/master/flink-java/src/main/java/org/apache/flink/api/java/utils/DataSetUtils.java#L77 On Thu, Feb 25, 2016 at 5:52 PM, Saliya Ekanayake wrote: > Thank y

Re: Frequent exceptions killing streaming job

2016-02-25 Thread Nick Dimiduk
For what it's worth, I dug into the TM logs and found that this exception was not the root cause, merely a symptom of other backpressure building in the flow (actually, lock contention in another part of the stack). While Flink was helpful in finding and bubbling up this stack to the UI, it was ult

Re: The way to itearte instances in AllWindowFunction in current Master branch

2016-02-25 Thread HungChang
An update. The following situation works as expected. The data arrives after Flink job starts to execute. 1> (2016-02-25T17:46:25.00,13) 2> (2016-02-25T17:46:40.00,16) 3> (2016-02-25T17:46:50.00,11) 4> (2016-02-25T17:47:10.00,12) But for the data arrives long time before. Strange behavior appears.

Re: Mapping two datasets

2016-02-25 Thread Saliya Ekanayake
Thank you, Marton. That seems doable. However, is there a way I can create a dummy indexed data set? Like a way to partition the index range without data across parallel tasks. For example, if I could have something like, DataSet ds = ... then I can implement a custom method to load required dat

Re: Mapping two datasets

2016-02-25 Thread Márton Balassi
Hey Saliya, I would add a uniqe ID to both the DataSets, the variable you referred to as 'i'. Then you can join the two DataSets on the field containing 'i' and do the mapping on the joined result. Hope this helps, Marton On Thu, Feb 25, 2016 at 5:38 PM, Saliya Ekanayake wrote: > Hi, > > I've

Mapping two datasets

2016-02-25 Thread Saliya Ekanayake
Hi, I've two data sets like, DataSet a = ... DataSet b = ... They have the same type and same decomposition. I want to apply a map operator that need both *a* and *b. *For example, a.map( i -> OP) within this OP I need the corresponding (*i *th) element of *b* as well. Is there a way to do thi

Re: Master Thesis [Apache-flink paper references]

2016-02-25 Thread Stephan Ewen
Hi! What Flink implements on the Streaming side is not part of Stratosphere any more, so unfortunately not linked on that website. Here are some pointers: - Fault tolerance: http://arxiv.org/abs/1506.08603 - The streaming model (windows / event time) is based on the Dataflow model: http://p

Re: The way to itearte instances in AllWindowFunction in current Master branch

2016-02-25 Thread HungChang
Thank you for your reply. Please let me know if other classes o full code is needed. /** * Count how many total events */ StreamExecutionEnvironment env = StreamExecutionEnvironment.createLocalEnvironment(4, env_config); env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);

Re: The way to itearte instances in AllWindowFunction in current Master branch

2016-02-25 Thread Aljoscha Krettek
Hi Hung, could you maybe post a more complete snippet of your program? This would allow me to figure out why the output changes between versions 0.10 and 1.0. @Matthias: The signature was changed to also allow window functions that don’t take an Iterable. For example, when doing WindowedStream.a

Re: The way to itearte instances in AllWindowFunction in current Master branch

2016-02-25 Thread Matthias J. Sax
Just out of curiosity: Why was it changes like this. Specifying "Iterable<...>" as type in AllWindowFunction seems rather unintuitive... -Matthias On 02/25/2016 01:58 PM, Aljoscha Krettek wrote: > Hi, > yes that is true. The way you would now write such a function is this: > > private static cla

Re: The way to itearte instances in AllWindowFunction in current Master branch

2016-02-25 Thread HungChang
Thanks you. I can be sure this way is correct now. I have tried this but the windows are not aggregating as well. Instead, the AllWindowFunction only works as flatMap. Shouldn't it only output for one window range? The most strange part is the first output is aggregating while others are not. 1>

Re: loss of TaskManager

2016-02-25 Thread Till Rohrmann
Hi Christoph, have you tried setting the blocks parameter of the SVM algorithm? That basically decides how many features are grouped together in one block. The lower the value is the more feature vectors are grouped together and, thus, the size of the block is increased. Increasing this value migh

Re: The way to itearte instances in AllWindowFunction in current Master branch

2016-02-25 Thread Aljoscha Krettek
Hi, yes that is true. The way you would now write such a function is this: private static class MyIterableFunction implements AllWindowFunction>, Tuple2, TimeWindow> { private static final long serialVersionUID = 1L; @Override public void apply( TimeWindow window, Ite

Re: The way to itearte instances in AllWindowFunction in current Master branch

2016-02-25 Thread HungChang
Thank you for your reply. The following in the current master looks like not iterable? because the parameter is IN rather than Iterable So I still have problem to iterate,,, @Public public interface AllWindowFunction extends Function, Serializable { /** * Evaluates the window an

Re: Flink Streaming ContinuousTimeTriggers

2016-02-25 Thread Ankur Sharma
Hello, Thanks for the reply. Target: I have a stream of tuples(append-only, no updates) that are appended to a file. These records have priority that is changing over time and is calculated as a function of the tuple’s timestamp and current system time. Once any tuple’s priority reaches some M

AW: loss of TaskManager

2016-02-25 Thread Boden, Christoph
Hi Ufuk, thanks for the hint. Unfortunately I cannot access the system log on the remote machine. But i re-ran the job with slightly increased memory.fraction (0.3 -> 0.4) and got an OutOfMemory Exception again: cloud-25 Error: com.esotericsoftware.kryo.KryoException: java.io.IOException: Fail

Re: [VOTE] Release Apache Flink 1.0.0 (RC1)

2016-02-25 Thread Aljoscha Krettek
I’ll look at the usual testing stuff and also focus on testing savepoints on a cluster. Btw, we don’t yet have the usual “testing checklist” document, do we? > On 25 Feb 2016, at 12:11, Márton Balassi wrote: > > Thanks for creating the candidate Robert and for the heads-up, Slim. > > I would

Re: loss of TaskManager

2016-02-25 Thread Ufuk Celebi
Hey Chris! I think that the full amount of memory to Flink leads to the TM process being killed by the OS. Can you check the OS logs whether the OOM killer shut it down? You should be able to see this in the system logs. – Ufuk On Thu, Feb 25, 2016 at 11:24 AM, Boden, Christoph wrote: > Dear F

Re: The way to itearte instances in AllWindowFunction in current Master branch

2016-02-25 Thread Aljoscha Krettek
Hi Hung, you are right, the generic parameters of AllWindowFunction changed from Iterable to IN. However, in the apply function on AllWindowedStream the parameter changed from IN to Iterable. What this means is that you can still do: windowed.apply(new MyIterableWindowFunction()) and iterate o

Re: [VOTE] Release Apache Flink 1.0.0 (RC1)

2016-02-25 Thread Márton Balassi
Thanks for creating the candidate Robert and for the heads-up, Slim. I would like to get a PR [1] in before 1.0.0 as it breaks hashing behavior of DataStream.keyBy. The PR has the feature implemented and the java tests adopted, there is still a bit of outstanding fix for the scala tests. Gábor Hor

Re: [VOTE] Release Apache Flink 1.0.0 (RC1)

2016-02-25 Thread Slim Baltagi
Dear Flink community It is great news that the vote for the first release candidate (RC1) of Apache Flink 1.0.0 is starting today February 25th, 2016! As a community, we need to double our efforts and make sure that Flink 1.0.0 is GA before these 2 upcoming major events: Strata + Hadoop World

Re: Kafka partition alignment for event time

2016-02-25 Thread Erdem Agaoglu
Hi Robert, I switched to SNAPSHOT and confirm that it works. Thanks! On Thu, Feb 25, 2016 at 10:50 AM, Robert Metzger wrote: > Hi Erdem, > > FLINK-3440 has been resolved. The fix is merged to master. 1.0-SNAPSHOT > should already contain the fix and it'll be in 1.0.0 (for which I'll post a > re

[VOTE] Release Apache Flink 1.0.0 (RC1)

2016-02-25 Thread Robert Metzger
Dear Flink community, Please vote on releasing the following candidate as Apache Flink version 1.0 .0. I've set user@flink.apache.org on CC because users are encouraged to help testing Flink 1.0.0 for their specific use cases. Please report issues (and successful tests!) on d...@flink.apache.org.

loss of TaskManager

2016-02-25 Thread Boden, Christoph
Dear Flink Community, I am trying to fit a support vector machine classifier using the CoCoA implementation provided in flink/ml/classification/ on a data set of moderate size (400k data points, 2000 features, approx. 12GB) on a cluster of 25 nodes with 28 GB memory each - and each worker node

Re: Flink Streaming ContinuousTimeTriggers

2016-02-25 Thread Aljoscha Krettek
Hi, in addition to the video I would also like to point out that the Continuous triggers should only be used with the GlobalWindows assigner. For TimeWindows the right thing to do would be using the EventTimeTrigger (or ProcessingTimeTrigger in case you are doing processing time windows). Could

Re: Kafka partition alignment for event time

2016-02-25 Thread Robert Metzger
Hi Erdem, FLINK-3440 has been resolved. The fix is merged to master. 1.0-SNAPSHOT should already contain the fix and it'll be in 1.0.0 (for which I'll post a release candidate today) as well. On Thu, Feb 18, 2016 at 3:24 PM, Erdem Agaoglu wrote: > Thanks Stephan > > On Thu, Feb 18, 2016 at 3:00