checkpoint state keeps on increasing

2016-08-17 Thread Janardhan Reddy
Hi, I am noticing that the checkpointing state has been constantly growing for the below subtask. Only the current active window elements should be checkpointed ? why is it constantly growing ? finalStream.keyBy("<>").countWindow(2,1) .apply((_, _, input: scala.Iterable[], out: Collector[]) =>

problem running flink using remote environment

2016-08-17 Thread Baswaraj Kasture
I am using flink 1.1.1. I am trying to run flink streaming program (kafka as source). It works perfectly when I use StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); But, problem is when I use one of the following to create env. StreamExecutionEnvironment env

Re: applying where after group by in dataset

2016-08-17 Thread Jark Wu
Hi, You can only apply aggregate after groupBy. But you can apply filter before groupBy. So you can do like this: distancePoints.filter(filterFunction).groupBy(1) - Jark Wu > 在 2016年8月17日,下午5:29,subash basnet 写道: > > here

Compress DataSink Output

2016-08-17 Thread Wesley Kerr
Hello - Forgive me if this has been asked before, but I'm trying to determine the best way to add compression to DataSink Outputs (starting with TextOutputFormat). Realistically I would like each partition file (based on parallelism) to be compressed independently with gzip, but am open to other

Re: Cannot use WindowedStream.fold with EventTimeSessionWindows

2016-08-17 Thread Jack Huang
Hi Till, The session I am dealing with does not have a reliable "end-of-session" event. It could stop sending events all of sudden or it could keep sending events forever. I need to be able to determine when a session expire due to inactivity or to kill off a session if it lives longer than it sho

off heap memory deallocation

2016-08-17 Thread Janardhan Reddy
Hi, When does off heap memory gets deallocated ? Does it get deallocated only when gc is triggered ? When does the gc gets triggered other than when the direct memory reached -XX::MaxDirectMemory limit passed in jvm flag. Thanks

Re: Sorting in datastream

2016-08-17 Thread subash basnet
Hello Stephan, Okey, then it's the same reason why there is no *count()* function in Data streams as well I suppose. Regards, Subash On Wed, Aug 17, 2016 at 6:26 PM, Stephan Ewen wrote: > Hi! > > Data streams are inifnite. It's quite hard to sort something infinite ;-) > That's why the operat

Re: Sorting in datastream

2016-08-17 Thread Stephan Ewen
Hi! Data streams are inifnite. It's quite hard to sort something infinite ;-) That's why the operation does not exist on DataStream. Stephan On Wed, Aug 17, 2016 at 6:22 PM, subash basnet wrote: > Hello all, > > I found the *sortPartition()* function in dataset for ordering the > dataset elem

Sorting in datastream

2016-08-17 Thread subash basnet
Hello all, I found the *sortPartition()* function in dataset for ordering the dataset elements as below: DataSet> data; DataSet> partitionedData = data.sortPartition(0, Order.DESCENDING); But I couldn't find any methods to sort the elements in datastream. DataStream> data; DataStream> partitioned

Re: Programmatically Creating a Flink Cluster On YARN

2016-08-17 Thread Maximilian Michels
Hi Benjamin, Please apologize the late reply. In the latest code base and also Flink 1.1.1, the Flink configuration doesn't have to be loaded via a file location read from an environment variable and it doesn't throw an exception if it can't find the config upfront (phew). Instead, you can also se

Re: partial savepoints/combining savepoints

2016-08-17 Thread Till Rohrmann
Hi Claudia, On Wed, Aug 17, 2016 at 5:06 PM, Claudia Wegmann wrote: > Hi again, > > > > I was thinking this over and tried some stuff. Some questions remain, > though: > > > > To 2) > > a) I only have used standalone mode yet. What would be the upsides of > using Yarn? > The upside would be tha

Re: counting elements in datastream

2016-08-17 Thread subash basnet
Correction: dataset* On Wed, Aug 17, 2016 at 4:39 PM, subash basnet wrote: > Hello all, > > There is *count()* function in database to count the number of elements > in the dataset. But it's not there in case of datastream. What could be the > way to count the number of elements in case of datas

AW: partial savepoints/combining savepoints

2016-08-17 Thread Claudia Wegmann
Hi again, I was thinking this over and tried some stuff. Some questions remain, though: To 2) a) I only have used standalone mode yet. What would be the upsides of using Yarn? b) Does an easy way to start Flink (JobManager+TaskManagers) programmatically already exist? c) I tried to build my o

counting elements in datastream

2016-08-17 Thread subash basnet
Hello all, There is *count()* function in database to count the number of elements in the dataset. But it's not there in case of datastream. What could be the way to count the number of elements in case of datastream. Best Regards, Subash Basnet

Re: TumblingEventTimeWindows with time characteristic set to 'ProcessingTime'

2016-08-17 Thread Stephan Ewen
You can still mix processing time and event time. You only need to set the base environment to event time in order to activate the timestamp/watermark transport. On Wed, Aug 17, 2016 at 1:40 PM, Al-Isawi Rami wrote: > Hi Till, > > Yes, I understand that. I am just asking why. If I am assigning t

Re: TumblingEventTimeWindows with time characteristic set to 'ProcessingTime'

2016-08-17 Thread Al-Isawi Rami
Hi Till, Yes, I understand that. I am just asking why. If I am assigning the timestamps. why can’t flink deal with the window based on the event time? i.e TumblingEventTimeWindows. Why should I set the whole env to be ticking on event time? It is just that I have some windows that needs to be

Re: Error joining with Python API

2016-08-17 Thread Chesnay Schepler
Found the issue, there was a missing tab in the chaining method... On 16.08.2016 12:12, Chesnay Schepler wrote: looks like a bug, will look into it. :) On 16.08.2016 10:29, Ufuk Celebi wrote: I think that this is actually a bug in Flink. I'm cc'ing Chesnay who originally contributed the Python

applying where after group by in dataset

2016-08-17 Thread subash basnet
Hello all, In the following dataset: DataSet> distancePoints; I wanted to count the number of *distancePoints* where boolean value is either true of false. distancePoints.groupBy(1). didn't find how to apply there 'where' clause here. Best Regards, Subash Basnet

Re: Cannot use WindowedStream.fold with EventTimeSessionWindows

2016-08-17 Thread Till Rohrmann
Hi Jack, the problem with session windows and a fold operation, which is an incremental operation, is that you don't have a way to combine partial folds when merigng windows. As a workaround you have to specify a window function where you get an iterator over all your window elements and then perf

Re: Azure Blob Storage Connector

2016-08-17 Thread Lau Sennels
I have successfully connected Azure blob storage to Flink-1.1. Below are the steps necessary: - Add hadoop-azure-2.7.2.jar (assuming you are using a Hadoop 2.7 Flink binary) and azure-storage-4.3.0.jar to /lib, and set file permissions / ownership accordingly. - Add the following to a file 'core-s

Re: TumblingEventTimeWindows with time characteristic set to 'ProcessingTime'

2016-08-17 Thread Till Rohrmann
Hi Rami, have you set the event time characteristic to event time with env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);? Otherwise the event time windows won’t work. Cheers, Till ​ On Tue, Aug 16, 2016 at 2:42 PM, Al-Isawi Rami wrote: > Hi, > > Why this combination is not possibl