Re: ClasNotFound when submitting job from command line

2016-04-20 Thread Robert Metzger
Hi Flavio, in which class are you calling Class.forName()? Is the class where the Class.forName() call is loaded from the user jar or is it a class from the Flink distribution? I'm asking because Class.forName() is using the classloader of the class where the call is located. So if the class has b

Re: Leader not found

2016-04-20 Thread Robert Metzger
Hi, I just tried it with Kafka 0.8.2.0 and 0.8.2.1 and for both versions everything worked fine. How many partitions does your topic have? Can you send me the full logs of the Kafka consumer? On Tue, Apr 19, 2016 at 6:05 PM, Balaji Rajagopalan < balaji.rajagopa...@olacabs.com> wrote: > Flink ver

Custom time window in Flink

2016-04-20 Thread Piyush Shrivastava
Hello,I wanted to enquire how a job I am trying to do with Flink can be done.I have also posted a question on StackOverflow. PFB the link:http://stackoverflow.com/questions/36720192/custom-windows-charging-in-flink I am using Flink's TimeWindow functionality to perform some computations. I am cr

Re: ClasNotFound when submitting job from command line

2016-04-20 Thread Flavio Pompermaier
At the moment my 2 PRs about jdbc batch connector and input format fix are pending so I have my customized implementation of the inputformat in my jar andas jdbc backend I need ijdbc6.jar as dependency (that I shade on the jar). Thus the class.forName() is called on the inputformat open(). On 20 Ap

Re: Compression - AvroOutputFormat and over network ?

2016-04-20 Thread Tarandeep Singh
Avro changes look easy. I think I can make those changes. To make changes to network data, I need some directions. @Ufuk please point me to corresponding code. thanks, Tarandeep On Mon, Apr 18, 2016 at 11:05 AM, Robert Metzger wrote: > Hi Tarandeep, > > I think for that you would need to set a

RE: Custom time window in Flink

2016-04-20 Thread Radu Tudoran
Hi, Tha way to do this is to create your own evictor. In the evictor you can than decide when the events are removed. I would suggest creating a symmetric trigger as well because I would assume that you also need to fire the computation first after 1 hour and then at each 5 minutes. The logic w

Re: Custom time window in Flink

2016-04-20 Thread Piyush Shrivastava
Hello,Thanks a lot for your reply. Can you share a sample code or example which I can refer while creating a custom evictor and trigger? Thanks and Regards,Piyush Shrivastava http://webograffiti.com On Wednesday, 20 April 2016 2:50 PM, Radu Tudoran wrote: #yiv9940457293 #yiv994045729

Re: Master (1.1-SNAPSHOT) Can't run on YARN

2016-04-20 Thread Stefano Baghino
Not exactly, I just wanted to let you know about it and know if someone else experimented this issue; perhaps it's more of a dev mailing list discussion, sorry for posting this here. Feel free to continue the discussion on the other list if you feel it's more appropriate. On Tue, Apr 19, 2016 at 6

Re: Checkpoint and restore states

2016-04-20 Thread Aljoscha Krettek
Hi, what seems to be the problem? Cheers, Aljoscha On Wed, 20 Apr 2016 at 03:52 Jack Huang wrote: > Hi all, > > I am doing a simple word count example and want to checkpoint the > accumulated word counts. I am not having any luck getting the counts saved > and restored. Can someone help? > > en

RE: Custom time window in Flink

2016-04-20 Thread Radu Tudoran
Hi, Easiest way is to just start from the code of an existing one https://github.com/apache/flink/tree/master/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/windowing/evictors If you take the example of Timeevictor you would just need to use the same code and modify the publi

Re: Sink Parallelism

2016-04-20 Thread Fabian Hueske
Hi Ravinder, your drawing is pretty much correct (Flink will inject a combiner between flat map and reduce which locally combines records with the same key). The partitioning between flat map and reduce is done with hash partitioning by default. However, you can also define a custom partitioner to

Re: Flink + S3

2016-04-20 Thread Ufuk Celebi
On Wed, Apr 20, 2016 at 1:35 AM, Michael-Keith Bernard wrote: > We're running on self-managed EC2 instances (and we'll eventually have a > mirror cluster in our colo). The provided documentation notes that for Hadoop > 2.6, we'd need such-and-such version of hadoop-aws and guice on the CP. If I

Re: Checkpoint and restore states

2016-04-20 Thread Stefano Baghino
Hi Jack, it seems you correctly enabled the checkpointing by calling `env.enableCheckpointing`. However, your UDFs have to either implement the Checkpointed interface or use the Key/Value State interface to make sure the state of the computation is snapshotted. The documentation explains how to d

Trying to detecting changes

2016-04-20 Thread toletum
Hi! I'm a beginner in Flink. I'm reading from a Kafka topic. In this topic, I receive a character each event, like that: Event.: 1 2 3 4 5 6 7 8 9... Data..: A A A B B B B C C... I would like to do a "trigger" when the character is different than before. For example: Event º1 fire because of A

Operation of Windows and Triggers

2016-04-20 Thread Piyush Shrivastava
I wanted to know how Windows and Triggers work in Flink. I am creating a time window of 20 seconds and a count trigger of 100. stream.keyBy(0) .timeWindow(Time.seconds(20)) .trigger(CountTrigger.of(100)) In this case, when will my window get triggered? When 20 seconds has

Re: Operation of Windows and Triggers

2016-04-20 Thread Fabian Hueske
Hi Piyush, if you explicitly set a trigger, the default trigger of the window is replaced. In your example, the time trigger is replaced by the count trigger, i.e., the window is only evaluated after the 100th element was received. This blog post discusses windows and triggers [1]. Best, Fabian

Re: Checkpoint and restore states

2016-04-20 Thread Aljoscha Krettek
Hi, the *withState() family of functions use the Key/Value state interface internally, so that should work. On Wed, 20 Apr 2016 at 12:33 Stefano Baghino wrote: > Hi Jack, > > it seems you correctly enabled the checkpointing by calling > `env.enableCheckpointing`. However, your UDFs have to eithe

Need to sleep the thread to let my Flink Zookeeper datasource with NodeDataChanged work

2016-04-20 Thread Sendoh
Hi, I'd like to implement a custom Zookeeper data source which reads zookeeper if NodeDataChange. Now it's not working perfectly because the thread needs to sleep otherwise it doesn't work. public static class ZKSource implements SourceFunction { private static final long serialVe

Re: Trying to detecting changes

2016-04-20 Thread Aljoscha Krettek
Hi, this could be done by implementing a user function that keeps state or by using windows with a custom Trigger. On only works, however, if you only have one Kafka partition and if your Flink job is executing with parallelism=1. Otherwise we don't have any ordering guarantees on streams. Cheers,

Re: Operation of Windows and Triggers

2016-04-20 Thread Piyush Shrivastava
Hi Fabian, Thanks for the information. I also quickly want to ask that if I implement a custom trigger that fires in one hour for the first time and then every five minutes, what all functions do I need to use?I am considering creating my own trigger referring the code here:https://github.com/a

Re: Checkpoint and restore states

2016-04-20 Thread Stefano Baghino
My bad, thanks for pointing that out. On Wed, Apr 20, 2016 at 1:49 PM, Aljoscha Krettek wrote: > Hi, > the *withState() family of functions use the Key/Value state interface > internally, so that should work. > > On Wed, 20 Apr 2016 at 12:33 Stefano Baghino < > stefano.bagh...@radicalbit.io> wro

Re: Trying to detecting changes

2016-04-20 Thread Stefano Baghino
Can the CEP library be used for this use case? On Wed, Apr 20, 2016 at 2:02 PM, Aljoscha Krettek wrote: > Hi, > this could be done by implementing a user function that keeps state or by > using windows with a custom Trigger. On only works, however, if you only > have one Kafka partition and if y

Re: Sink Parallelism

2016-04-20 Thread Ravinder Kaur
Hi Fabian, Thank you for the explanation. Could you also explain how keyBy() would work? I assume it should work same as groupBy(), but in streaming mode since the data is unbounded all elements that arrive in the first window are grouped/partitioned by keys and aggregated and so on until no more

Re: Flink + S3

2016-04-20 Thread Robert Metzger
Hi Michael-Keith, Welcome to the Flink community! Let me try answer your question regarding the "best" deployment options: >From what I see from the mailing list, most of our users are using one of the big hadoop distributions (including Amazon EMR) with YARN installed. Having YARN makes things q

Access Flink UI for jobs submitted using Eclipse

2016-04-20 Thread Ritesh Kumar Singh
Hi, Just a basic question, I am using flink via eclipse but when I execute my jobs, I can't access it via the web dashboard. It's basically a maven project and so I've added all the flink jars in my pom file and I'm executing my code by getting the execution environment as follows: ExecutionEnvir

Threads waiting on LocalBufferPool

2016-04-20 Thread Maciek Próchniak
Hi, I'm running my flink job on one rather large machine (20 cores with hyperthreading, 120GB RAM). Task manager has 20GB heap allocated. It does more or less: read csv from kafka -> keyBy one of the fields -> some custom state processing. Kafka topic has 24 partitions, so my parallelism is al

Re: Threads waiting on LocalBufferPool

2016-04-20 Thread Ufuk Celebi
Could be different things actually, including the parts of the network you mentioned. 1) Regarding the TM config: - It can help to increase the number of network buffers (you can go ahead and give it 4 GB, e.g. 134217728 buffers a 32 KB) - In general, you have way more memory available than you a

Re: Master (1.1-SNAPSHOT) Can't run on YARN

2016-04-20 Thread Ufuk Celebi
The user list is OK since you are reporting a bug here ;-) I'm confident that this will be fixed soon! :-) On Wed, Apr 20, 2016 at 11:28 AM, Stefano Baghino wrote: > Not exactly, I just wanted to let you know about it and know if someone else > experimented this issue; perhaps it's more of a dev

Re: Trying to detecting changes

2016-04-20 Thread Till Rohrmann
You could use CEP for that. First you would create a pattern of two states which matches everything. In the select function you could then check whether both elements are different. However, this would be a little bit of an overkill for this simple use case. You could for example simply use a flat

Re: Access Flink UI for jobs submitted using Eclipse

2016-04-20 Thread Till Rohrmann
Have you created a RemoteExecutionEnvironment to submit your job from within the IDE to the running cluster? See here [1] for more information. [1] https://ci.apache.org/projects/flink/flink-docs-master/apis/cluster_execution.html Cheers, Till ​ On Wed, Apr 20, 2016 at 3:41 PM, Ritesh Kumar Sing

Re: Trying to detecting changes

2016-04-20 Thread Stefano Baghino
Ok, thanks for the clarification Till. On Wed, Apr 20, 2016 at 4:46 PM, Till Rohrmann wrote: > You could use CEP for that. First you would create a pattern of two states > which matches everything. In the select function you could then check > whether both elements are different. > > However, th

Re: Sink Parallelism

2016-04-20 Thread Fabian Hueske
In batch / DataSet programs, groupBy() is execute by partitioning the data (usually hash partitioning) and sorting each partition to group all elements with the same key. keyBy() in DataStream programs also partitions the data and results in a KeyedStream. The KeyedStream has information about the

Re: Operation of Windows and Triggers

2016-04-20 Thread Fabian Hueske
Hi Piyush, that's not trivial to implement. You can only do that with a so-called GlobalWindow, i.e., a window which receives all elements of a partition, and a custom trigger which has state to decide whether it has triggered the first window or not. It won't work with a CountTrigger. Best, Fabi

Re: Operation of Windows and Triggers

2016-04-20 Thread Aljoscha Krettek
Just to clarify, the state of a Trigger on GlobalWindows is still local to the key of the element that is in the window(s). On Wed, 20 Apr 2016 at 18:11 Fabian Hueske wrote: > Hi Piyush, > > that's not trivial to implement. You can only do that with a so-called > GlobalWindow, i.e., a window whi

Re: Leader not found

2016-04-20 Thread Balaji Rajagopalan
/usr/share/kafka_2.11-0.8.2.1/bin$ ./kafka-topics.sh --describe --topic capi --zookeeper (someserver) Topic:capi PartitionCount:1 ReplicationFactor:1 Configs: Topic: capi Partition: 0 Leader: 0 Replicas: 0 Isr: 0 There are no events to consume from this topic, this I confirm by running the cons

Re: Leader not found

2016-04-20 Thread Balaji Rajagopalan
Robert, Sorry I gave the information about wrong topic. Here is the right one. balajirajagopalan@megatron-server02:/usr/share/kafka_2.11-0.8.2.1/bin$ ./kafka-topics.sh --describe --topic location --zookeeper (someserver) Topic:location PartitionCount:150 ReplicationFactor:1 Configs: Topic: loca

Re: How to perform this join operation?

2016-04-20 Thread Till Rohrmann
Hi Elias, sorry for the late reply. You're right that with the windowed join you would have to deal with pairs where the timestamp of (x,y) is not necessarily earlier than the timestamp of z. Moreover, by using sliding windows you would receive duplicates as you've described. Using tumbling window

Re: Checkpoint and restore states

2016-04-20 Thread Jack Huang
@Aljoscha: For this word count example I am using a kafka topic as the input stream. The problem is that when I cancel the task and restart it, the task loses the accumulated word counts so far and start counting from 1 again. Am I missing something basic here? @Stefano: I also tried to implements

Re: Checkpoint and restore states

2016-04-20 Thread Stefano Baghino
Hello again, thanks for giving a shot at my advice anyway but Aljoscha is far more knowledgeable then me regarding Flink. :) I hope I'm not getting mixed up again but I think gracefully canceling your job means you lose your job state. Am I right in saying that the state is preserved in case of a

Count windows missing last elements?

2016-04-20 Thread Kostya Kulagin
I have a pretty big but final stream and I need to be able to window it by number of elements. In this case from my observations flink can 'skip' the latest chunk of data if it has lower amount of elements than window size: StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecuti

Re: asm IllegalArgumentException with 1.0.0

2016-04-20 Thread David Kim
Hi Stephan! Following up on this issue, it seems the issue doesn't show itself when using version 1.0.1. I'm able to run our unit tests in IntelliJ now. Thanks! David On Wed, Apr 13, 2016 at 1:59 PM Stephan Ewen wrote: > Does this problem persist? (It might have been caused by maven caches wit

Values are missing, probably due parallelism?

2016-04-20 Thread Kostya Kulagin
I think it has smth to do with parallelism and I probably do not have clear understanding how parallelism works in flink but in this example: StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); DataStreamSource source = env.addSource(new SourceFunction()

Set up Flink Cluster on Windows machines

2016-04-20 Thread Yifei Li
Hi, Is it possible to set up cluster on Windows machines? I have git installed and when I use start-cluster.sh in git cmd, it asks me to type in password for other machines. After I did it, it showed nohup:command not found. I am just wondering if it is possible to set up clusters on Windows mac

Re: Set up Flink Cluster on Windows machines

2016-04-20 Thread Fabian Hueske
Hi Yifei, I think this has not been done before. At least I am not aware of anybody running Flink in cluster mode on Windows. In principle this should work. It is possible to start a local instance on Windows (start-local.bat) and to locally execute Flink programs on this instance using the flink.

Control triggering on empty window

2016-04-20 Thread Maxim
I have the following use case: Input stream of timestamped "on" and "off" events received out of order. I need to produce an event with time that system was "on" every 15 minutes. Events should be produced only for intervals that system was "on". When 15 minute window has at least one record it i

Join DataStream with dimension tables?

2016-04-20 Thread Srikanth
Hello, I have a fairly typical streaming use case but not able to figure how to implement it best in Flink. I want to join records read from a kafka stream with one(or more) dimension tables which are saved as flat files. As per this jira its not

Re: Set up Flink Cluster on Windows machines

2016-04-20 Thread Jamie Grier
I should also think that using Cygwin may work quite well and allow you to just use the current Unix-oriented shell scripts with maybe slight modifications. -Jamie On Wed, Apr 20, 2016 at 4:11 PM, Fabian Hueske wrote: > Hi Yifei, > > I think this has not been done before. At least I am not awa

Re: Threads waiting on LocalBufferPool

2016-04-20 Thread Maciek Próchniak
Hi Ufuk, thanks for quick reply. Actually I had a little time to try both things. 1) helped only temporarily - it just took a bit longer to saturate the pool. After few minutes, periodically all kafka threads were waiting for bufferPool. 2) This seemed to help. I also reduced checkpoint interva