Hi,
I am noticing that the checkpointing state has been constantly growing for
the subtask below. Shouldn't only the elements of the currently active
window be checkpointed? Why is it constantly growing?
finalStream.keyBy("<>").countWindow(2,1)
.apply((_, _, input: scala.Iterable[], out: Collector[]) =>
I am using Flink 1.1.1.
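For context on why the state might grow: countWindow(2, 1) is a sliding count window that, per key, fires on every element but buffers only the last two elements, so the per-key buffer is bounded. One plausible explanation for constantly growing checkpoints, assuming high key cardinality, is that each new distinct key gets its own buffer that is kept around. A plain-Java sketch of the sliding-count semantics (class and method names are hypothetical, not Flink API):

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Simulates a sliding count window (size 2, slide 1) for a single key:
// every incoming element fires a window holding at most the last 2 elements.
public class CountWindowSim {
    public static List<List<Integer>> slide(List<Integer> input, int size, int slide) {
        Deque<Integer> buffer = new ArrayDeque<>();
        List<List<Integer>> fired = new ArrayList<>();
        int sinceLastFire = 0;
        for (int e : input) {
            buffer.addLast(e);
            if (buffer.size() > size) buffer.removeFirst(); // at most `size` elements retained
            sinceLastFire++;
            if (sinceLastFire == slide) {                    // fire on every `slide` elements
                fired.add(new ArrayList<>(buffer));
                sinceLastFire = 0;
            }
        }
        return fired;
    }
}
```

Note the buffer for one key never exceeds the window size; the total state is roughly (window size) x (number of distinct keys seen), which grows with the key space.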
I am trying to run a Flink streaming program (Kafka as source).
It works perfectly when I use
StreamExecutionEnvironment env =
StreamExecutionEnvironment.getExecutionEnvironment();
But the problem is when I use one of the following to create the env.
StreamExecutionEnvironment env
Hi,
You can only apply aggregate after groupBy, but you can apply filter before
groupBy.
So you can do it like this:
distancePoints.filter(filterFunction).groupBy(1)
- Jark Wu
> On 17 Aug 2016, at 17:29, subash basnet wrote:
>
> here
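The suggested filter-before-groupBy pattern can be sketched with plain Java collections; `FilterThenGroup` and `countByFlag` are hypothetical stand-ins, not Flink API:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Filter first, then group by the boolean field and count each group --
// the same shape as distancePoints.filter(...).groupBy(1) with an aggregate.
public class FilterThenGroup {
    public static Map<Boolean, Long> countByFlag(List<Boolean> flags) {
        return flags.stream()
                .filter(f -> f != null)  // the filter step happens before grouping
                .collect(Collectors.groupingBy(f -> f, Collectors.counting()));
    }
}
```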
Hello -
Forgive me if this has been asked before, but I'm trying to determine the
best way to add compression to DataSink Outputs (starting with
TextOutputFormat). Realistically I would like each partition file (based
on parallelism) to be compressed independently with gzip, but am open to
other
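The core of what a gzip-compressing text output would do per partition file can be sketched with the stdlib `GZIPOutputStream`; the class and method names below are hypothetical, not Flink's:

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Each partition file is independently gzipped: wrap the file stream in a
// GZIPOutputStream before writing text, and in a GZIPInputStream to read back.
public class GzipText {
    public static void write(File f, String text) throws IOException {
        try (Writer w = new OutputStreamWriter(
                new GZIPOutputStream(new FileOutputStream(f)), "UTF-8")) {
            w.write(text);
        }
    }

    public static String read(File f) throws IOException {
        try (BufferedReader r = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(new FileInputStream(f)), "UTF-8"))) {
            StringBuilder sb = new StringBuilder();
            int c;
            while ((c = r.read()) != -1) sb.append((char) c);
            return sb.toString();
        }
    }
}
```

In a custom output format the same wrapping would go around the stream the format opens per parallel subtask, which gives exactly one gzip member per partition file.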
Hi Till,
The session I am dealing with does not have a reliable "end-of-session"
event. It could stop sending events all of a sudden, or it could keep sending
events forever. I need to be able to determine when a session expires due to
inactivity, or to kill off a session if it lives longer than it sho
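The two expiry conditions described (an inactivity gap, and a maximum session lifetime) can be sketched as a pure function; all names and thresholds here are hypothetical:

```java
// A session expires either because no event arrived within the inactivity
// gap, or because it has been alive longer than the maximum lifetime.
public class SessionExpiry {
    public static boolean expired(long firstEventTs, long lastEventTs, long nowTs,
                                  long inactivityGapMs, long maxLifetimeMs) {
        boolean inactive = nowTs - lastEventTs > inactivityGapMs;   // gap rule
        boolean tooLong  = nowTs - firstEventTs > maxLifetimeMs;    // lifetime cap
        return inactive || tooLong;
    }
}
```

A timer checking this predicate per open session would close sessions that go quiet as well as sessions that never stop sending.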
Hi,
When does off-heap memory get deallocated? Does it get deallocated only
when GC is triggered? When does GC get triggered, other than when the
direct memory reaches the -XX:MaxDirectMemorySize limit passed as a JVM flag?
Thanks
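On the JDK side generally: the native memory behind a `ByteBuffer.allocateDirect` allocation is released when the buffer object itself is garbage-collected (its internal cleaner runs), and as far as I know the JDK may itself trigger a GC when a direct allocation would exceed the configured limit. A small stdlib sketch:

```java
import java.nio.ByteBuffer;

// Direct (off-heap) buffers have no explicit free in the public API; the
// native memory is released when the ByteBuffer object becomes unreachable
// and is collected.
public class DirectBufferDemo {
    public static int roundtrip(int v) {
        ByteBuffer buf = ByteBuffer.allocateDirect(4); // allocated outside the heap
        buf.putInt(v);
        buf.flip();
        return buf.getInt(); // once `buf` is unreachable, a GC cycle frees the native memory
    }
}
```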
Hello Stephan,
Okay, then it's the same reason why there is no *count()* function on
DataStreams as well, I suppose.
Regards,
Subash
On Wed, Aug 17, 2016 at 6:26 PM, Stephan Ewen wrote:
> Hi!
>
> Data streams are infinite. It's quite hard to sort something infinite ;-)
> That's why the operat
Hi!
Data streams are infinite. It's quite hard to sort something infinite ;-)
That's why the operation does not exist on DataStream.
Stephan
On Wed, Aug 17, 2016 at 6:22 PM, subash basnet wrote:
> Hello all,
>
> I found the *sortPartition()* function in dataset for ordering the
> dataset elem
Hello all,
I found the *sortPartition()* function in dataset for ordering the dataset
elements as below:
DataSet> data;
DataSet> partitionedData = data.sortPartition(0,
Order.DESCENDING);
But I couldn't find any methods to sort the elements in datastream.
DataStream> data;
DataStream> partitioned
Hi Benjamin,
Please excuse the late reply. In the latest code base and also
Flink 1.1.1, the Flink configuration doesn't have to be loaded via a
file location read from an environment variable and it doesn't throw
an exception if it can't find the config upfront (phew). Instead, you
can also se
Hi Claudia,
On Wed, Aug 17, 2016 at 5:06 PM, Claudia Wegmann
wrote:
> Hi again,
>
>
>
> I was thinking this over and tried some stuff. Some questions remain,
> though:
>
>
>
> To 2)
>
> a) I only have used standalone mode yet. What would be the upsides of
> using Yarn?
>
The upside would be tha
Correction: dataset*
On Wed, Aug 17, 2016 at 4:39 PM, subash basnet wrote:
> Hello all,
>
> There is *count()* function in database to count the number of elements
> in the dataset. But it's not there in case of datastream. What could be the
> way to count the number of elements in case of datas
Hi again,
I was thinking this over and tried some stuff. Some questions remain, though:
To 2)
a) I have only used standalone mode so far. What would be the upsides of
using Yarn?
b) Does an easy way to start Flink (JobManager+TaskManagers) programmatically
already exist?
c) I tried to build my o
Hello all,
There is a *count()* function in database to count the number of elements in
the dataset, but it's not there in the case of DataStream. What could be the
way to count the number of elements in the case of a DataStream?
Best Regards,
Subash Basnet
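Since a DataStream is unbounded, a total count never finalizes; one common alternative is a running count (map each element to 1 and keep a cumulative sum, emitting the count-so-far with every element). A plain-Java sketch of that idea, not Flink API:

```java
import java.util.ArrayList;
import java.util.List;

// Emits the count-so-far for each element of a (simulated) stream:
// the same effect as mapping every element to 1 and summing incrementally.
public class RunningCount {
    public static List<Long> runningCounts(List<String> events) {
        List<Long> out = new ArrayList<>();
        long count = 0;
        for (String e : events) {
            count += 1;      // the fold/sum step
            out.add(count);  // downstream sees the running total
        }
        return out;
    }
}
```

A windowed count (count per time window) is the other usual answer when a periodically resetting number is wanted instead of a running total.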
You can still mix processing time and event time. You only need to set the
base environment to event time in order to activate the timestamp/watermark
transport.
On Wed, Aug 17, 2016 at 1:40 PM, Al-Isawi Rami
wrote:
> Hi Till,
>
> Yes, I understand that. I am just asking why. If I am assigning t
Hi Till,
Yes, I understand that. I am just asking why. If I am assigning the timestamps,
why can’t Flink deal with the window based on the event time, i.e.
TumblingEventTimeWindows? Why should I set the whole env to be ticking on event
time?
It is just that I have some windows that need to be
Found the issue, there was a missing tab in the chaining method...
On 16.08.2016 12:12, Chesnay Schepler wrote:
looks like a bug, will look into it. :)
On 16.08.2016 10:29, Ufuk Celebi wrote:
I think that this is actually a bug in Flink. I'm cc'ing Chesnay who
originally contributed the Python
Hello all,
In the following dataset:
DataSet> distancePoints;
I wanted to count the number of *distancePoints* where the boolean value is
either true or false.
distancePoints.groupBy(1).
I didn't find how to apply a 'where' clause here.
Best Regards,
Subash Basnet
Hi Jack,
the problem with session windows and a fold operation, which is an
incremental operation, is that you don't have a way to combine partial
folds when merging windows. As a workaround, you have to specify a window
function where you get an iterator over all your window elements and then
perf
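The workaround Till describes (recompute the aggregate from the full window contents when the window fires, instead of folding incrementally) can be sketched in plain Java; `sumOnFire` is a hypothetical aggregate:

```java
import java.util.List;

// Instead of an incremental fold, aggregate over the full iterable of window
// elements at firing time. Recomputing from scratch is what makes this safe
// when two session windows are merged: no partial folds need combining.
public class WindowApply {
    public static long sumOnFire(Iterable<Long> windowElements) {
        long sum = 0;
        for (long v : windowElements) sum += v; // recompute per firing
        return sum;
    }
}
```

The cost is that all window elements must be buffered until the window fires, rather than a single accumulator.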
I have successfully connected Azure blob storage to Flink-1.1.
Below are the steps necessary:
- Add hadoop-azure-2.7.2.jar (assuming you are using a Hadoop 2.7 Flink
binary) and azure-storage-4.3.0.jar to /lib, and set file
permissions / ownership accordingly.
- Add the following to a file 'core-s
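The last step is cut off; presumably it refers to a Hadoop-style configuration file read by the hadoop-azure filesystem. As a hedged sketch only (the property name comes from hadoop-azure; the account name and key below are placeholders, not values from this thread):

```xml
<configuration>
  <!-- Hypothetical sketch: replace YOUR_ACCOUNT and YOUR_ACCOUNT_KEY. -->
  <property>
    <name>fs.azure.account.key.YOUR_ACCOUNT.blob.core.windows.net</name>
    <value>YOUR_ACCOUNT_KEY</value>
  </property>
</configuration>
```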
Hi Rami,
have you set the time characteristic to event time with
env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);? Otherwise
the event time windows won’t work.
Cheers,
Till
On Tue, Aug 16, 2016 at 2:42 PM, Al-Isawi Rami
wrote:
> Hi,
>
> Why this combination is not possibl