org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl

2017-03-23 Thread rimin515
hi,i read file from hdfs,but there is error when run jon on yarn clutster,---val dataSeg = env.readTextFile("hdfs:///user/hadoop/text").filter(!_.startsWith("#")).map { x => val values = x.split("\t") (values.apply(0),values.appl

Re: Question Regarding a sink..

2017-03-23 Thread Tzu-Li (Gordon) Tai
Hi Steve, This normally shouldn’t happen, unless there simply is two copies of the data. What is the source of the topology? Also, this might be obvious, but if you have broadcasted your input stream to the sink, then each sink instance would then get all records in the input stream. Cheers, G

Question Regarding a sink..

2017-03-23 Thread Steve Jerman
Hi, I have a sink writing data to InfluxDB. I’ve noticed that the sink gets multiple copies of upstream records.. Why does this happen, and how can I avoid it… ? Below is a trace …showing 2 records (I have a parallelism of two) for each record in the ‘.printToError’ for the same stream. Any h

Re: deploying flink cluster in AWS - Containerized

2017-03-23 Thread Chakravarthy varaga
I'm looking forward to hearing some updates on this... Any help here is highly appreciated !! On Thu, Mar 23, 2017 at 4:20 PM, Chakravarthy varaga < chakravarth...@gmail.com> wrote: > Hi Team, > > We are doing a PoC to deploy Flink cluster on AWS. All runtime > components will be dockerized

Re: flink-1.2 and unit testing / flinkspector

2017-03-23 Thread Tarandeep Singh
Hi Nancy, I also get 1 test failed when I build/run tests on flink-spector: - should stop if all triggers fire Run completed in 3 seconds, 944 milliseconds. Total number of tests run: 19 Suites: completed 5, aborted 0 Tests: succeeded 18, failed 1, canceled 0, ignored 0, pending 0 *** 1 TEST FAIL

Re: flink-1.2 and unit testing / flinkspector

2017-03-23 Thread Ted Yu
Nancy: You can start another thread for the failed unit tests. You can pass "-DskipTests" to get over the install command. Cheers On Thu, Mar 23, 2017 at 11:06 AM, Nancy Estrada wrote: > Hi Tarandeep and Ted, > > I am in this route now. I am trying to use Flinkspector with Flink 1.2 > using >

Re: flink-1.2 and unit testing / flinkspector

2017-03-23 Thread Nancy Estrada
Hi Tarandeep and Ted, I am in this route now. I am trying to use Flinkspector with Flink 1.2 using your instructions but failing miserably. After applying the changes, when I try to run "mvn clean install", some Tests fail and therefore I am not able to build successfully. I am wondering if ther

Re: Alternatives to FLINK_CONF_DIR, HADOOP_CONF_DIR, YARN_CONF_DIR

2017-03-23 Thread Foster, Craig
Thanks! I looked in the config.sh file that sort of works with both the configuration file and these environment variables. After inspection, it doesn’t make sense to set FLINK_CONF_DIR in that config file since that location determines where we would look for that file. However, I thought that

Re: Alternatives to FLINK_CONF_DIR, HADOOP_CONF_DIR, YARN_CONF_DIR

2017-03-23 Thread Bajaj, Abhinav
I think the FLINK_CONF_DIR points to the conf directory. This is the place where the Flink CLI looks for the flink-conf.yaml file. I think there is an alternate option for HADOOP_CONF_DIR, YARN_CONF_DIR but I am not sure. Check this https://ci.apache.org/projects/flink/flink-docs-release-1.3/se

Re: Cogrouped Stream never triggers tumbling event time window

2017-03-23 Thread Andrea Spina
Sorry, I forgot to put the Flink version. 1.1.2 Thanks, Andrea -- View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Cogrouped-Stream-never-triggers-tumbling-event-time-window-tp12373p12374.html Sent from the Apache Flink User Mailing List archi

Cogrouped Stream never triggers tumbling event time window

2017-03-23 Thread Andrea Spina
Dear Community, I'm really struggling on a co-grouped stream. The workload is the following: * val firstStream: DataStream[FirstType] = firstRaw.assignTimestampsAndWatermarks(new MyCustomFirstExtractor(maxOutOfOrder)) val secondStream: DataStream[SecondType] = secondRaw .assi

Alternatives to FLINK_CONF_DIR, HADOOP_CONF_DIR, YARN_CONF_DIR

2017-03-23 Thread Foster, Craig
Can I set these in the configuration file? This would be ideal vs. environment variables for me but I’m not seeing it in the documentation. Thanks, Craig

deploying flink cluster in AWS - Containerized

2017-03-23 Thread Chakravarthy varaga
Hi Team, We are doing a PoC to deploy Flink cluster on AWS. All runtime components will be dockerized. I have few questions in relation to discover & security: 1. How does Job Manager discover task managers? Do they talk to over TCP ? 2. If the runtime components TM, JM a

Re: [E] Re: unable to add more servers in zookeeper quorum peers in flink 1.2

2017-03-23 Thread Greg Hogan
A PR and review may have noted that the same regex is in stop-zookeeper-quorum.sh and recommended ignoring whitespace both before and after the equals sign. > On Mar 23, 2017, at 10:12 AM, Robert Metzger wrote: > > Thank you for verifying. Fixed in master: > http://git-wip-us.apache.org/repo

Re: Benchmarking streaming frameworks

2017-03-23 Thread Dominik Safaric
Dear Giselle, Various stream processing engines benchmarks already exist. Here are only a few of them I believe are worthwhile mentioning: http://ieeexplore.ieee.org/document/7530084/ https://www.usenix.org/node/188989

keyBy doesn't evenly distribute keys

2017-03-23 Thread Dongwon Kim
Hi all, I want keyBy() to evenly distribute records over operator subtasks especially for a small number of keys. I execute a test code (see below if interested) with varying numbers of keys while setting parallelism to 5. The key assignment to subtasks is as follows: - 5 keys over 5 subtasks :

Re: Task manager number mismatch container number on mesos

2017-03-23 Thread Robert Metzger
Could you provide the logs of the task manager that still runs as a container but doesn't show up as a Taskmanager? On Thu, Mar 23, 2017 at 11:38 AM, Renjie Liu wrote: > Permanent. I've waited for several minutes and the task manager is still > lost. > > On Thu, Mar 23, 2017 at 6:34 PM Ufuk Cele

Re: Task manager number mismatch container number on mesos

2017-03-23 Thread Renjie Liu
I'm not sure how to reproduce this bug, and I'll post it next time it happens. On Thu, Mar 23, 2017 at 10:21 PM Robert Metzger wrote: > Could you provide the logs of the task manager that still runs as a > container but doesn't show up as a Taskmanager? > > On Thu, Mar 23, 2017 at 11:38 AM, Renj

Re: Odd error

2017-03-23 Thread Robert Metzger
Hi, I assume the flatMap(new RecordSplit()) is emitting a RawRecord. Is it possible that you've also added an empty constructor to it while adding the compareTo() method? I think the problem is that one of your types (probably the schema) is recognized as a nested POJO. Check out this documentati

Re: [E] Re: unable to add more servers in zookeeper quorum peers in flink 1.2

2017-03-23 Thread Robert Metzger
Thank you for verifying. Fixed in master: http://git-wip-us.apache.org/repos/asf/flink/commit/3e860b40 On Wed, Mar 22, 2017 at 9:25 PM, wrote: > That worked.. Thanks Chesnay, > > > > > [image: Verizon] > > Kanagaraj Vengidasamy > RTCI > > 7701 E Telecom PKWY > Temple Te

Re: Flink AUR package is available

2017-03-23 Thread Robert Metzger
Amazing, thanks a lot! On Thu, Mar 23, 2017 at 10:36 AM, Tao Meng wrote: > Hi all, > > For arch linux users, I have created flink AUR package. > We can use the package manager to install flink and use the systemd > manager flink as service. > If you have any questions or suggestions please fee

flink Broadcast

2017-03-23 Thread rimin515
Hi ,alll, i have a 36000 documents,and the document all transfer a vector , one doc is a vector,and dimension is the same,so have DataSet val data :DataSet[(String,SparseVector)]= //36000 record val toData = data.collect() val docSims = data.map{x=> val fromId=x._

Re: RocksDB segfaults

2017-03-23 Thread Robert Metzger
Florian, can you post the log of the Taskmanager where the segfault happened ? On Wed, Mar 22, 2017 at 6:19 PM, Stefan Richter wrote: > Hi, > > for the first checkpoint, from the stacktrace I assume that the backend is > not accessed as part of processing an element, but by another thread. Is >

Re: Threading issue

2017-03-23 Thread Robert Metzger
Hi, how many unique combinations of your key "partition","threadNumber","schemaId" exist? In my opinion, all sinks should receive data if there are enough different keys. On Wed, Mar 22, 2017 at 3:41 AM, Telco Phone wrote: > I am looking to get readers from kafka / keyBy and Sink working with

Re: [POLL] Who still uses Java 7 with Flink ?

2017-03-23 Thread Theodore Vasiloudis
Hello all, I'm sure you've considered this already, but what this data does not include is all the potential future users, i.e. slower moving organizations (banks etc.) which could be on Java 7 still. Whether those are relevant is up for debate. Cheers, Theo On Thu, Mar 23, 2017 at 12:14 PM, Ro

Re: Benchmarking streaming frameworks

2017-03-23 Thread Michael Noll
A recent one is "Analytics on Fast Data: Main-Memory Database Systems versus Modern Streaming Systems" ( http://db.in.tum.de/~kipf/papers/fastdata.pdf) For the record, the paper above doesn't yet cover/realize that, nowadays, the Kafka project includes native stream processing capabilities aka the

Re: Benchmarking streaming frameworks

2017-03-23 Thread Felix Neutatz
Hi, our team already created a benchmark framework for batch processing (including MR,Yarn, Spark, Flink), maybe you like to extend it for streaming: https://github.com/peelframework/peel Best regards, Felix On Mar 23, 2017 11:51, "Christophe Salperwyck" wrote: Good idea! You could test Akka s

Re: Windows emit results at the end of the stream

2017-03-23 Thread Sonex
Thank you for your response Yassine, I forgot to mention that I use the Scala API. In Scala the equivalent code is: val inputFormat = new TextInputFormat(new Path("file/to/read.txt")) env.readFile(inputFormat,"file/to/read.txt", FileProcessingMode.PROCESS_CONTINUOUSLY,1L) Am I correct? But

Re: A question about iterations and prioritizing "control" over "data" inputs

2017-03-23 Thread Paris Carbone
Unless I got this wrong, if he meant relaxing FIFO processing per channel/stream partition then Robert is absolutely right. On 23 Mar 2017, at 12:28, Paris Carbone mailto:par...@kth.se>> wrote: I think what Theo meant is to allow for different: high/low priority on different channels (or data

Re: A question about iterations and prioritizing "control" over "data" inputs

2017-03-23 Thread Paris Carbone
I think what Theo meant is to allow for different: high/low priority on different channels (or data streams per se) for n-ary operators such as ConnectedStream binary maps, loops etc.. not to change the sequence of events within channels I guess. This does not violate the FIFO channel assumptio

Re: [POLL] Who still uses Java 7 with Flink ?

2017-03-23 Thread Robert Metzger
Yeah, you are right :) I'll put something in my calendar for end of May. On Thu, Mar 23, 2017 at 12:12 PM, Greg Hogan wrote: > Robert, > > Thanks for the report. Shouldn’t we be revisiting this decision at the > beginning of the new release cycle rather than near the end? There is > currently li

Re: A question about iterations and prioritizing "control" over "data" inputs

2017-03-23 Thread Robert Metzger
To very quickly respond to Theo's question: No, it is not possible to have records overtake each other in the buffer. This could potentially void the exactly once processing guarantees, in the case when records overtake checkpoint barriers. On Wed, Mar 15, 2017 at 5:58 PM, Kathleen Sharp wrote:

Re: [POLL] Who still uses Java 7 with Flink ?

2017-03-23 Thread Greg Hogan
Robert, Thanks for the report. Shouldn’t we be revisiting this decision at the beginning of the new release cycle rather than near the end? There is currently little cost to staying with Java 7 since no Flink code or pull requests have been written for Java 8. Greg > On Mar 23, 2017, at 6:3

Re: Telling if a job has caught up with Kafka

2017-03-23 Thread Robert Metzger
Sorry for joining this discussion late, but there is already a metric for the offset lag in our 0.9+ consumers. Its called the "records-lag-max": https://kafka.apache.org/documentation/#new_consumer_fetch_monitoring and its exposed via Flink's metrics system. The only issue is that it only shows th

Re: Benchmarking streaming frameworks

2017-03-23 Thread Christophe Salperwyck
Good idea! You could test Akka streams too. Lots of documents exist: https://yahooeng.tumblr.com/post/135321837876/benchmarking-streaming-computation-engines-at https://github.com/yahoo/streaming-benchmarks Cheers, Christophe 2017-03-23 11:09 GMT+01:00 Giselle van Dongen : > Dear users of Strea

Re: Windows emit results at the end of the stream

2017-03-23 Thread Yassine MARZOUGUI
Hi Sonex, When using readTextFile(...) with event time, only one watermark with the value Long.MAX_VALUE is sent at the end of the stream, which explais why the windows are stored until the whole file is processed. In order to have periodic watermarks, you need to process the file continuousely as

Re: accessing flink HA cluster with scala shell/zeppelin notebook

2017-03-23 Thread Robert Metzger
Hi Alexis, did you set the Zookeeper configuration for Flink in Zeppelin? On Mon, Mar 20, 2017 at 11:37 AM, Alexis Gendronneau < a.gendronn...@gmail.com> wrote: > Hello users, > > As Maciek, I'm currently trying to make apache Zeppelin 0.7 working with > Flink. I have two versions of flink avail

Re: Task manager number mismatch container number on mesos

2017-03-23 Thread Renjie Liu
Permanent. I've waited for several minutes and the task manager is still lost. On Thu, Mar 23, 2017 at 6:34 PM Ufuk Celebi wrote: > When it happens, is it temporary or permanent? > > Looping in Till and Eron who worked on the Mesos runner. > > – Ufuk > > On Thu, Mar 23, 2017 at 11:09 AM, Renjie

Re: [POLL] Who still uses Java 7 with Flink ?

2017-03-23 Thread Robert Metzger
Looks like 9% on twitter and 24% on the mailing list are still using Java 7. I would vote to keep supporting Java 7 for Flink 1.3 and then revisit once we are approaching 1.4 in September. On Thu, Mar 16, 2017 at 8:00 AM, Bowen Li wrote: > There's always a tradeoff we need to make. I'm in favor

Re: Task manager number mismatch container number on mesos

2017-03-23 Thread Ufuk Celebi
When it happens, is it temporary or permanent? Looping in Till and Eron who worked on the Mesos runner. – Ufuk On Thu, Mar 23, 2017 at 11:09 AM, Renjie Liu wrote: > Hi, all: > We are using flink 1.2.0 on mesos. We found the number of task managers > mismatches with container number occasinally.

Task manager number mismatch container number on mesos

2017-03-23 Thread Renjie Liu
Hi, all: We are using flink 1.2.0 on mesos. We found the number of task managers mismatches with container number occasinally. That's the mesos container still exists but it can't be found on the monitor web page of flink master. This case doesn't happen frequently and it's hard to reproduce. -- L

Benchmarking streaming frameworks

2017-03-23 Thread Giselle van Dongen
Dear users of Streaming Technologies, As a PhD student in big data analytics, I am currently in the process of compiling a list of benchmarks (to test multiple streaming frameworks) in order to create an expanded benchmarking suite. The benchmark suite is being developed as a part of my current wo

Re: Cassandra Sink version

2017-03-23 Thread Kostas Kloudas
Glad I could help! > On Mar 23, 2017, at 10:42 AM, Nancy Estrada wrote: > > The documentation you mentioned says: "The Java client driver 3.0.7 (branch > 3.0.x) is compatible with Apache Cassandra 1.2, 2.0, 2.1, 2.2 and 3.0". > > Thank you Kostas! > > > > -- > View this message in context:

Re: Cassandra Sink version

2017-03-23 Thread Nancy Estrada
The documentation you mentioned says: "The Java client driver 3.0.7 (branch 3.0.x) is compatible with Apache Cassandra 1.2, 2.0, 2.1, 2.2 and 3.0". Thank you Kostas! -- View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Cassandra-Sink-version-tp12

Flink AUR package is available

2017-03-23 Thread Tao Meng
Hi all, For arch linux users, I have created flink AUR package. We can use the package manager to install flink and use the systemd manager flink as service. If you have any questions or suggestions please feel free to contact me. $ yaourt -S apache-flink $ systemctl start apache-flink-jobmana

Re: Netty issues while deploying Flink with Yarn on MapR

2017-03-23 Thread Robert Metzger
Just FYI: There is now a documentation page on how to use Flink on MapR (Thanks to Gordon): https://ci.apache.org/projects/flink/flink-docs- release-1.3/setup/mapr_setup.html On Tue, Feb 7, 2017 at 6:34 PM, Robert Metzger wrote: > Hi, > cool! > > Yes, creating a JIRA for the problem is a good i

Windows emit results at the end of the stream

2017-03-23 Thread Sonex
Hi everyone, I am using a simple window computation on a stream with event time. The code looks like this: streamData.readTextFile(...) .map(...) .assignAscendingTimestamps(_.timestamp) .keyBy(_.id) .timeWindow(Time.seconds(3600),Time.seconds(3600)) .apply(new MyWindowFunction

[ANNOUNCE] Apache Flink 1.1.5 Released

2017-03-23 Thread Tzu-Li (Gordon) Tai
The Apache Flink community is pleased to announce the availability of Flink 1.1.5, which is the next bugfix release for the 1.1 series. The official release announcement: https://flink.apache.org/news/2017/03/23/release-1.1.5.html Release binaries: http://apache.lauf-forum.at/flink/flink-1.1.5 Fo