Mixing Batch & Streaming

2016-01-27 Thread Don Frascuchon
Hi everyone, Is there any way of mixing DataStreams and DataSets? For example, enriching messages from a DataStream with precalculated info in a DataSet. Thanks in advance!

Re: Reading ORC format on Flink

2016-01-27 Thread Chiwan Park
Hi Phil, I think that you can read ORC files using OrcInputFormat [1] with the readHadoopFile method. There is an example for MapReduce [2] on Stack Overflow. The approach also works on Flink. You may have to use a RichMapFunction [3] to initialize the OrcSerde and StructObjectInspector objects. Regards,

Kafka+Flink

2016-01-27 Thread Vinaya M S
Hi, I have a 3-node Kafka cluster. In the server.properties file of each node I'm setting advertised.host.name: to its IP. I have a 4-node Flink cluster. In each of Kafka's nodes I'm setting host.name: to one of Flink's worker nodes, like: in kafka1: host.name: flink-data1IP in kafka2: host.name: fl

Re: Streaming left outer join

2016-01-27 Thread Stephan Ewen
Hi! I think this pull request may be implementing what you are looking for: https://github.com/apache/flink/pull/1527 Stephan On Wed, Jan 27, 2016 at 2:06 PM, Alexander Gryzlov wrote: > Hello Aljoscha, > > Indeed, it seems like I'd need a custom operator. I imagine this involves > implementin

Re: Imbalanced workload between workers

2016-01-27 Thread Stephan Ewen
Hi Pieter! Interesting, but good :-) I don't think we did much on the hash functions since 0.9.1. I am a bit surprised that it made such a difference. Well, as long as it improves with the newer version :-) Greetings, Stephan On Wed, Jan 27, 2016 at 9:42 PM, Pieter Hameete wrote: > Hi Till,

Re: Imbalanced workload between workers

2016-01-27 Thread Pieter Hameete
Hi Till, I've upgraded to Flink 0.10.1 and ran the job again without any changes to the code to see the bytes input and output of the operators and for the different workers. To my surprise it is very well balanced between all workers, and because of this the job completed much faster. Are there an

Flume source support

2016-01-27 Thread Alexandr Dzhagriev
Hello, At the moment the master branch contains the commented-out class FlumeSource. Unfortunately, I can't find any information regarding future support for it. Can anyone please shed some light on it? Thanks, Alex.

Re: Maven artifacts scala 2.11 bug?

2016-01-27 Thread Stephan Ewen
Good to hear! Sorry for the hassle you have to go through. There is a lot of restructuring to make it clean for 1.0. Greetings, Stephan On Wed, Jan 27, 2016 at 9:16 PM, David Kim wrote: > Hi Stephan, Robert, > > Yes, I found a solution. Turns out that I shouldn't specify a suffix for > flink-

Re: Maven artifacts scala 2.11 bug?

2016-01-27 Thread David Kim
Hi Stephan, Robert, Yes, I found a solution. Turns out that I shouldn't specify a suffix for flink-core. I changed flink-core to not have any suffix. "org.apache.flink" %% "flink-core" % flinkVersion % "it,test" classifier "tests", "org.apache.flink" % "flink-core" % flinkVersion % "it,test" cla

Re: Maven artifacts scala 2.11 bug?

2016-01-27 Thread Stephan Ewen
Hi David! The dependencies that SBT marks as wrong (org.apache.flink:flink-shaded-hadoop2, org.apache.flink:flink-core, org.apache.flink:flink-annotations) are actually those that are Scala-independent, and have no suffix at all. It is possible your SBT file does not like mixing dependencies with
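
A minimal sbt sketch of the distinction described above, assuming only the artifact names quoted in this thread: `%%` makes sbt append the Scala binary version (for example `_2.11`) to the artifact name, while `%` uses the name exactly as written. The `flinkVersion` value and the treatment of flink-scala as Scala-dependent are taken from David's own snippet; this is an illustration, not a confirmed fix for his build.

```scala
// build.sbt fragment (sketch, not a complete build definition)
val flinkVersion = "1.0-SNAPSHOT"

libraryDependencies ++= Seq(
  // Scala-dependent artifact: %% appends the Scala binary version suffix
  "org.apache.flink" %% "flink-scala" % flinkVersion,
  // Scala-independent artifact: plain % keeps the name without a suffix
  "org.apache.flink" % "flink-core" % flinkVersion
)
```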

Reading ORC format on Flink

2016-01-27 Thread Philip Lee
Hello, I have a question about reading the ORC format on Flink. I want to use a dataset after loadtesting CSV to ORC format with Hive. Does Flink support reading the ORC format? If so, please let me know how to use the dataset in Flink. Best, Phil

Re: Maven artifacts scala 2.11 bug?

2016-01-27 Thread David Kim
Hi Robert, Here's the relevant snippet for my sbt config. My dependencies are listed in a file called Dependencies.scala. object Dependencies { val flinkVersion = "1.0-SNAPSHOT" val flinkDependencies = Seq( "org.apache.flink" %% "flink-scala" % flinkVersion, "org.apache.flink" %%

Re: Maven artifacts scala 2.11 bug?

2016-01-27 Thread Robert Metzger
Hi David, can you post your SBT build file as well? On Wed, Jan 27, 2016 at 7:52 PM, David Kim wrote: > Hello again, > > I saw the recent change to flink 1.0-SNAPSHOT on explicitly adding the > scala version to the suffix. > > I have a sbt project that fails. I don't believe it's a misconfigura

Maven artifacts scala 2.11 bug?

2016-01-27 Thread David Kim
Hello again, I saw the recent change to flink 1.0-SNAPSHOT on explicitly adding the scala version to the suffix. I have a sbt project that fails. I don't believe it's a misconfiguration error on my end because I do see in the logs that it tries to resolve everything with _2.11. Could this possib

Re: continous time triger

2016-01-27 Thread Brian Chhun
Hi Aljoscha, No problem with the change. I think it's more what a user would expect as well. On Wed, Jan 27, 2016 at 3:27 AM, Aljoscha Krettek wrote: > Hi Brian, > you are right about changing the behavior of windows when closing. Would > this be a problem for you? > > Cheers, > Aljoscha > > On

about blob.storage.dir and .buffer files

2016-01-27 Thread Gwenhael Pasquiers
Hello, We have a question about blob.storage.dir and its .buffer files: What are they? And are they cleaned up, or is there a way to limit their size and to estimate the necessary space? We got a node's root volume disk filled by those files (~20GB) and it crashed. Well, the root was filled becaus

Re: Imbalanced workload between workers

2016-01-27 Thread Pieter Hameete
Cheers for the quick reply Till. That would be very useful information to have! I'll upgrade my project to Flink 0.10.1 tonight and let you know if I can find out if there's a skew in the data :-) - Pieter 2016-01-27 13:49 GMT+01:00 Till Rohrmann : > Could it be that your data is skewed? This c

Re: Streaming left outer join

2016-01-27 Thread Alexander Gryzlov
Hello Aljoscha, Indeed, it seems like I'd need a custom operator. I imagine this involves implementing org.apache.flink.streaming.api.operators.TwoInputStreamOperator? Could you provide those pointers please? Alex On Wed, Jan 27, 2016 at 12:03 PM, Aljoscha Krettek wrote: > Hi, > I’m afraid the

Re: Imbalanced workload between workers

2016-01-27 Thread Till Rohrmann
Could it be that your data is skewed? This could lead to different loads on different task managers. With the latest Flink version, the web interface should show you how many bytes each operator has written and received. There you could see if one operator receives more elements than the others.
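
To illustrate the kind of skew described above, here is a minimal, Flink-free Scala sketch. It assumes records are routed to workers by key hash modulo the parallelism, which is a simplification and not Flink's actual partitioner. `SkewDemo`, `loadPerWorker`, and the sample keys are hypothetical names chosen for illustration.

```scala
object SkewDemo {
  /** Counts how many records land on each worker when records are
    * assigned by non-negative key hash modulo the number of workers. */
  def loadPerWorker(keys: Seq[String], workers: Int): Map[Int, Int] =
    keys
      .groupBy(k => Math.floorMod(k.hashCode, workers))
      .map { case (worker, ks) => worker -> ks.size }

  def main(args: Array[String]): Unit = {
    // Skewed input: one "hot" key accounts for most of the records.
    val skewed = Seq.fill(900)("hot") ++ (1 to 100).map(i => s"key-$i")
    val load = loadPerWorker(skewed, workers = 4)
    // Print records per worker; the worker that owns "hot" dominates.
    load.toSeq.sortBy(_._1).foreach { case (w, n) =>
      println(s"worker $w: $n records")
    }
  }
}
```

Because all records with the same key go to the same worker, no amount of parallelism spreads the "hot" key's load; comparing per-operator byte counts, as suggested above, is one way to spot this.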

Imbalanced workload between workers

2016-01-27 Thread Pieter Hameete
Hi guys, Currently I am running a job in the GCloud in a configuration with 4 task managers that each have 4 CPUs (for a total parallelism of 16). However, I noticed my job is running much slower than expected and after some more investigation I found that one of the workers is doing a majority o

Re: [NOTICE] Maven artifacts names now suffixed with Scala version

2016-01-27 Thread Ufuk Celebi
Thanks for the notice. I’ve added a warning to the snapshot docs and created a Wiki page with the changes: https://cwiki.apache.org/confluence/display/FLINK/Maven+artifact+names+suffixed+with+Scala+version – Ufuk > On 27 Jan 2016, at 12:20, Maximilian Michels wrote: > > Dear users and develop

[NOTICE] Maven artifacts names now suffixed with Scala version

2016-01-27 Thread Maximilian Michels
Dear users and developers, We have merged changes [1] that will affect how you build Flink programs with the latest snapshot version of Flink and with future releases. Maven artifacts which depend on Scala are now suffixed with the Scala major version, e.g. "2.10" or "2.11". While some of the Mav

Re: Task Manager metrics per job on Flink 0.9.1

2016-01-27 Thread Pieter Hameete
Hi Fabian and Till, thanks for the tips. I'll see if I can work with the REST interface for now. I'll make a JIRA ticket as well. I might even be able to develop this feature but I won't have time to do that in the coming 2 months. It would be nice to be able to make a first contribution though. Kee

Re: Task Manager metrics per job on Flink 0.9.1

2016-01-27 Thread Till Rohrmann
Hi Pieter, you're right that it would be nice to record the metrics for a later analysis. However, at the moment this is not supported. You could use the REST interface to obtain the JSON representation of the shown data in the web interface. By doing this repeatedly and parsing the metric data yo

Re: Streaming left outer join

2016-01-27 Thread Aljoscha Krettek
Hi, I’m afraid there is currently no way to do what you want with the built-in window primitives. Each of the slices of the sliding windows is essentially evaluated independently. Therefore, there cannot be effects in one slice that influence processing of another slice. What you could do is sw

Re: Task Manager metrics per job on Flink 0.9.1

2016-01-27 Thread Fabian Hueske
Hi, it is correct that the metrics are collected from the task managers. In Flink 0.9.1 the metrics are visualized as charts in the web dashboard. This visualization was removed when the dashboard was redesigned and updated for 0.10, but will hopefully be added again. For Flink 0.9.1, the metr

Re: continous time triger

2016-01-27 Thread Aljoscha Krettek
Hi Brian, you are right about changing the behavior of windows when closing. Would this be a problem for you? Cheers, Aljoscha > On 26 Jan 2016, at 17:53, Radu Tudoran wrote: > > Hi, > > Thank you for sharing your experience and also to Till for the advice. > What I would like to do is to be

Re: rowmatrix equivalent

2016-01-27 Thread Chiwan Park
There is a JIRA issue (FLINK-1873, [1]) that covers the distributed matrix implementation. [1]: https://issues.apache.org/jira/browse/FLINK-1873 Regards, Chiwan Park > On Jan 27, 2016, at 5:21 PM, Chiwan Park wrote: > > I hope the distributed matrix and vector implementation on Flink. :) > >

Re: rowmatrix equivalent

2016-01-27 Thread Chiwan Park
I hope the distributed matrix and vector implementation on Flink. :) Regards, Chiwan Park > On Jan 27, 2016, at 2:29 AM, Lydia Ickler wrote: > > Hi Till, > > maybe I will do that :) > If I have some other questions I will let you know! > > Best regards, > Lydia > > >> Am 24.01.2016 um 17:3