Re: Subtask keeps on discovering new Kinesis shard when using Kinesalite

2016-11-16 Thread Tzu-Li (Gordon) Tai
Hi Phillip, Thanks for testing it. From your log and my own tests, I can confirm the problem is with Kinesalite not correctly mocking the official Kinesis behaviour for the `describeStream` API. There’s a PR for the fix here: https://github.com/apache/flink/pull/2822. With this change, shard di

Re: Error handling

2016-11-16 Thread Aljoscha Krettek
Hi, is that the complete stack trace or is there more to it? I cannot really see where the exception originates. Cheers, Aljoscha On Wed, 16 Nov 2016 at 10:38 criss wrote: > Hi, > > I have this, architecture: kafka topic -> flink kafka stream -> flink > custom > sink to save data in a Postgresq

Enforce an operation to run in an exact host in flink

2016-11-16 Thread Eranga Heshan
Hi all, I have 4 VMs. There is an input stream and an output stream in my job. To get the exact time interval for an event to be processed, I need to run both input and output bolts inside the same host/node. Is there a way to run a bolt inside a host we define? Thank you, Regards, Eranga Hesha

Re: Flink job restart at checkpoint interval

2016-11-16 Thread Satish Chandra Gupta
Hi Till, Thanks. Yes, that is what I have been doing. But accessing GUI over VPN of Flink running on a yarn cluster on EMR sometime becomes very slow (not even execution plan gets shown :-) sometime), that's why I thought of this. Thanks, +satish On Wed, Nov 16, 2016 at 6:46 PM, Till Rohrmann w

Re: A custom FileInputFormat

2016-11-16 Thread Niklas Semmler
Hello Fabian, thanks for the response and my apologies for the late reply. I tried extending the InputFormat, but it felt to hackish, so I simply loaded the files before running the job and fed it to Flink via the env.fromCollections(...) statement. Cheers, Niklas On 01.11.2016 21:59, Fabian H

spark vs flink batch performance

2016-11-16 Thread CPC
Hi all, I am trying to compare spark and flink batch performance. In my test i am using ratings.csv in http://files.grouplens.org/datasets/movielens/ml-latest.zip dataset. I also concatenated ratings.csv 16 times to increase dataset size(total of 390465536 records almost 10gb).I am reading from go

Running the JobManager and TaskManager on the same node in a cluster

2016-11-16 Thread Dominik Safaric
Hi, It is generally recommended for streaming engines, also including Flink to run a separate master node - in the case of Flink, the JobManager. However, why should one in Flink run the JobManager on a separate node? Performance wise, the JobManager isn’t intense unlike of course TaskManager

Re: Tame Flink UI?

2016-11-16 Thread Chesnay Schepler
Hello, The WebInterfaces first pulls a list of all available metrics for a specific taskmanager/job/task (which is reasonable since how else would you select them), and then requests the values for all metrics by supplying the name of every single metric it just received, which is where things

Re: Tame Flink UI?

2016-11-16 Thread Ufuk Celebi
Hey Cliff, yes this has been recently merged to the master branch. I think you are right that this is not feasible. I thought that the metrics are pulled in selectively when you select them via the metrics list. It seems to be not the case. If it is really the case that everything is always req

Re: Why use Kafka after all?

2016-11-16 Thread Philipp Bussche
Hi Dromit I started using Flink with Kafka but am currently looking into Kinesis to replace Kafka. The reason behind this is that eventually my application will run in somebody's cloud and if I go for AWS then I don't have to take care of operating Kafka and Zookeeper myself. I understand this can

Re: Subtask keeps on discovering new Kinesis shard when using Kinesalite

2016-11-16 Thread Philipp Bussche
Hello Gordon, thank you for your help. I have set the discovery interval to 30 seconds and just starting the job on a clean kinesalite service (I am running it inside docker so every time the container gets stopped and removed to start from scratch). This is the output without actually any data i

Re: Why use Kafka after all?

2016-11-16 Thread Alberto Ramón
I have several times the same thoughts that Dromit: (Kafka cluster is a expensiver over cost) Can someone check this Ideas? Case 1: You dont need Replay, Exact One: - All values have time-stamp - Data Source insert the WaterMark in the Source. Some code example ? Case 2: Your DataSource is o

Any way to increase sort buffer size?

2016-11-16 Thread Gábor Hermann
Hi all, Is there any way to increase the sort buffer size other than increasing the overall TaskManager memory? The following error comes up running a job with huge matrix block objects on a cluster: Error obtaining the sorted input: Thread 'SortMerger Reading Thread' terminated due to an ex

Re: 33 segments problem with configuration set

2016-11-16 Thread otherwise777
Hello Vasia, thank you for your fast reply, I am aware that determining the betweenness is very demanding, however i still want to give a try at it to a certain extent in Flink, not using Flink is currently not an option since my project is partly about Flink. I will rethink my login, i guess it

Re: 33 segments problem with configuration set

2016-11-16 Thread otherwise777
Some additional information i just realized, it crashes on this line of code: collectionDataSet.print(); I tried placing it inside of the loop, it crashes at the 7th iteration now -- View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/33-segments-p

Re: 33 segments problem with configuration set

2016-11-16 Thread Vasiliki Kalavri
Dear Wouter, first of all, as I noted in another thread already, betweenness centrality is an extremely demanding algorithm and a distributed data engine such as Flink is probably not the best system to implement it into. On top of that, the message-passing model for graph computations would gener

Re: Additional steps needed for the Java quickstart guide

2016-11-16 Thread Kostas Kloudas
Hi Theodore, Thanks a lot for reporting this. It is true that many people have encountered it also during the training sessions. Cheers, Kostas > On Nov 16, 2016, at 2:49 PM, Theodore Vasiloudis > wrote: > > Hello all, > > I was preparing an exercise for some Master students and I went thro

Additional steps needed for the Java quickstart guide

2016-11-16 Thread Theodore Vasiloudis
Hello all, I was preparing an exercise for some Master students and I went through running the Java quickstart setup [1] again to verify everything works as expected. I ran into a problem when running from within IDEA, we've encountered this in the past during trainings. While the quickstart gui

Re: Flink job restart at checkpoint interval

2016-11-16 Thread Till Rohrmann
Hi Satish, I'm afraid but I think there is no such way to configure the name of the checkpoint file for a task at the moment. For the latest checkpoint you can see the state sizes for the individual subtask in the web ui under checkpoints. Cheers, Till On Tue, Nov 15, 2016 at 10:52 PM, Satish Ch

Re: Retrieving values from a dataset of datasets

2016-11-16 Thread Gábor Gévay
The short answer is that because DataSet is not serializable. I think the main underlying problem is that Flink needs to see all DataSet operations before launching the job. However, if you have a DataSet>, then operations on the inner DataSets will end up being specified inside the UDFs of operat

33 segments problem with configuration set

2016-11-16 Thread otherwise777
Hello Community, I'm trying to make a function to determine the betweenness of the Vertices in a Graph. I'm using Gelly for this and a custom shortestpath function This is my input graph: http://prntscr.com/d7y51y What i've done is use collect() on the vertice value

Re: Is incremental checkpointing already supported?

2016-11-16 Thread Kostas Kloudas
Hello, No, incremental checkpointing is not yet supported. Best, Kostas > On Nov 16, 2016, at 12:05 PM, 魏偉哲 wrote: > > Hi, > > Reply for the question below said the incremental checkpoint was not > implemented yet. > https://mail-archives.apache.org/mod_mbox/flink-user/201604.mbox/%3CCANMXwW

Is incremental checkpointing already supported?

2016-11-16 Thread 魏偉哲
Hi, Reply for the question below said the incremental checkpoint was not implemented yet. https://mail-archives.apache.org/mod_mbox/flink-user/201604.mbox/%3CCANMXwW1gZwuxAa0N+jRjLkstP9jZ=i1efuab8e6zjqg6dzm...@mail.gmail.com%3E But I saw the document of latest release version mentioned the increm

Re: Error handling

2016-11-16 Thread criss
Hi, I have this, architecture: kafka topic -> flink kafka stream -> flink custom sink to save data in a Postgresql database. For testing how the system will behave if an error occurs, I've done the following test: Activate checkpoints on my DataStream and put on kafka topic one item with special

Re: Kafka Stream to Database batch inserts

2016-11-16 Thread criss
Hi, Thank you very much for you hit! It works pretty well. -- View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Kafka-Stream-to-Database-batch-inserts-tp10036p10140.html Sent from the Apache Flink User Mailing List archive. mailing list archive