Thanks,
Is there any way to measure shuffle data (read and write) in Flink or its
Dashboard?
I did not find a network usage metric there.
Best,
Phil
On Mon, Jan 25, 2016 at 5:06 PM, Fabian Hueske wrote:
> You can start a job and then periodically request and store information
> about the running job …
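
A minimal sketch of that periodic-polling idea, assuming the JobManager's
monitoring REST API (the same backend the dashboard uses, typically on port
8081). The address and endpoint paths here are assumptions and vary by
version; the dashboard's backend reports per-vertex read/write byte counters:

import scala.io.Source

object ShuffleBytesPoller {
  // Assumed web monitor address; adjust to your JobManager host/port.
  val jobManager = "http://localhost:8081"

  def get(path: String): String =
    Source.fromURL(jobManager + path).mkString

  def main(args: Array[String]): Unit = {
    while (true) {
      // "/jobs" lists job IDs; "/jobs/<id>" returns the job's vertices,
      // including accumulated read/write byte and record counters.
      println(get("/jobs"))
      Thread.sleep(5000) // poll every 5 seconds and store as needed
    }
  }
}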
Nice,
you guys rock!
From: fhue...@gmail.com
Date: Thu, 28 Jan 2016 23:34:58 +0100
Subject: Re: Window stream using timestamp key for time
To: user@flink.apache.org
Hi Emmanuel,
the feature you are looking for is called event time processing in Flink.
These blog posts should help you to become familiar with the concepts: …
Hi Emmanuel,
the feature you are looking for is called event time processing in Flink.
These blog posts should help you to become familiar with the concepts:
1) Event-Time concepts:
http://data-artisans.com/how-apache-flink-enables-new-streaming-applications-part-1/
2) Windows in Flink:
http://fl…
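
A minimal sketch of event-time windows with the Scala API, assuming a record
type that carries its own timestamp (the Event case class and the values are
illustrative):

import org.apache.flink.streaming.api.TimeCharacteristic
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.time.Time

case class Event(key: String, timestamp: Long, value: Double)

object EventTimeWindows {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    // Window by the timestamps embedded in the data, not by arrival time.
    env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)

    val events: DataStream[Event] = env.fromElements(
      Event("a", 1000L, 1.0), Event("a", 61000L, 2.0))

    events
      .assignAscendingTimestamps(_.timestamp) // simplest assigner: in-order data
      .keyBy(_.key)
      .timeWindow(Time.minutes(1))
      .sum("value")
      .print()

    env.execute("event-time windows")
  }
}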
Hello,
I have used Flink to stream data and do analytics on the stream, using time
windows...
Now, this is assuming the data is effectively coming in real time. However, I
have a use case where the data is 'batched' upstream, and comes in bursts, but
has a timestamp. It obviously messes up the windows …
Hi,
I want to register a custom aggregation convergence criterion for a bulk
iteration, using the Scala API.
It appears to me that this is not possible at the moment, right?
The AggregatorRegistry is exposed by IterativeDataSet.java, which is
hidden by DataSet.scala:
def iterate …
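
One hedged workaround sketch: drop to the Java DataSet API, whose
IterativeDataSet does expose registerAggregationConvergenceCriterion. The
aggregator name and the convergence rule below are illustrative:

import org.apache.flink.api.common.aggregators.{ConvergenceCriterion, LongSumAggregator}
import org.apache.flink.api.common.functions.RichMapFunction
import org.apache.flink.api.java.ExecutionEnvironment
import org.apache.flink.types.LongValue

object IterationWithCriterion {
  // Converged once no element was updated in the last superstep.
  class NoUpdates extends ConvergenceCriterion[LongValue] {
    override def isConverged(iteration: Int, value: LongValue): Boolean =
      value.getValue == 0
  }

  def main(args: Array[String]): Unit = {
    val env = ExecutionEnvironment.getExecutionEnvironment
    val iteration = env.generateSequence(1, 3).iterate(100)
    iteration.registerAggregationConvergenceCriterion(
      "updates", new LongSumAggregator, new NoUpdates)

    val body = iteration.map(new RichMapFunction[java.lang.Long, java.lang.Long] {
      override def map(v: java.lang.Long): java.lang.Long = {
        val agg = getIterationRuntimeContext
          .getIterationAggregator[LongSumAggregator]("updates")
        // Count an update whenever an element changes in this superstep.
        val next: Long = if (v < 10) { agg.aggregate(1); v + 1 } else v.longValue()
        java.lang.Long.valueOf(next)
      }
    })

    iteration.closeWith(body).print()
  }
}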
If the dataset is too large for a file, you can put it behind a service and
have your stream operators query the service for enrichment. You can even
support updates to that dataset in a style very similar to the "lambda
architecture" discussed elsewhere.
On Thursday, January 28, 2016, Fabian Hueske …
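
A hedged sketch of that service-lookup pattern: the service URL and its query
format are hypothetical, and the small per-task cache is optional:

import org.apache.flink.api.common.functions.RichFlatMapFunction
import org.apache.flink.util.Collector
import scala.io.Source

class ServiceEnricher(serviceUrl: String)
    extends RichFlatMapFunction[String, (String, String)] {

  // Per-task cache so hot keys don't hit the service on every element.
  private val cache = scala.collection.mutable.Map.empty[String, String]

  override def flatMap(key: String, out: Collector[(String, String)]): Unit = {
    val value = cache.getOrElseUpdate(
      key, Source.fromURL(s"$serviceUrl/lookup?key=$key").mkString)
    out.collect((key, value))
  }
}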
Hi to all,
I was reading about optimal Parquet file size and HDFS block size.
The ideal situation for Parquet is when its block size (and thus the
maximum size of each row group) is equal to the HDFS block size. The
default behaviour of Flink is that the output file's size depends on the
output parallelism …
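
Since each parallel sink task writes its own file, one blunt way to control
output file count (and thus size) is to pin the sink's parallelism; a sketch
with an illustrative path:

import org.apache.flink.api.scala._

object SingleFileOutput {
  def main(args: Array[String]): Unit = {
    val env = ExecutionEnvironment.getExecutionEnvironment
    val data = env.fromElements("a", "b", "c")
    // One sink task -> one output file, at the cost of write parallelism.
    data.writeAsText("hdfs:///tmp/out").setParallelism(1)
    env.execute("single-file output")
  }
}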
Hi Gwenhael!
Let's look into this and fix anything we find. Can you briefly tell us:
- How much data is in the blob-store directory, versus in the buffer
files?
- How many buffer files do you have and how large are they in average?
Greetings,
Stephan
On Thu, Jan 28, 2016 at 10:18 AM, Ti…
Hi,
I think hacking the StreamJoinOperator could work for you. In Flink, a
StreamOperator is essentially a lower-level version of a FlatMap. It
receives elements via the processElement() method and can emit elements using
the Output object output, which is a souped-up Collector that also allows …
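
A hedged sketch of the shape of such an operator; these internals have
shifted between Flink versions, so treat the class and method names as the
general pattern rather than a stable recipe:

import org.apache.flink.streaming.api.operators.{AbstractStreamOperator, OneInputStreamOperator}
import org.apache.flink.streaming.api.watermark.Watermark
import org.apache.flink.streaming.runtime.streamrecord.StreamRecord

class UppercaseOperator extends AbstractStreamOperator[String]
    with OneInputStreamOperator[String, String] {

  override def processElement(element: StreamRecord[String]): Unit =
    // replace() swaps the value while keeping the record's timestamp.
    output.collect(element.replace(element.getValue.toUpperCase))

  // Forward watermarks downstream unchanged.
  override def processWatermark(mark: Watermark): Unit =
    output.emitWatermark(mark)
}

// Applied to a stream with: stream.transform("uppercase", new UppercaseOperator)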
Hello Stephan,
Yes, I've seen this one before, but AFAIU this is a different use-case:
they need an inner join with 2 different windows, whereas I'm ok with a
single window, but need an outer join with different semantics...
Their StreamJoinOperator, however, looks roughly fitting, so I'll probably …
Hi Robert,
Thanks for the quick reply. I'll try.
Best regards, Alex.
On Thu, Jan 28, 2016 at 11:09 AM, Robert Metzger
wrote:
> Hi Alex,
>
> I'm sorry that you found that class commented out. I think so far nobody was
> interested in using the Flume source. It was commented out during a
> refactoring …
Hi Alex,
I'm sorry that you found that class commented out. I think so far nobody was
interested in using the Flume source. It was commented out during a
refactoring of the stream sources:
https://github.com/apache/flink/pull/659#discussion-diff-29854032.
I just looked a bit through the (commented) code …
Hi Vinaya,
this blog post explains how to connect Flink and Kafka:
http://data-artisans.com/kafka-flink-a-practical-how-to/
On Thu, Jan 28, 2016 at 1:28 AM, Vinaya M S wrote:
> Hi,
>
> I have a 3 node Kafka cluster. In the server.properties file of each of them
> I'm setting
> advertised.host.name: …
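
A minimal consumer sketch in the spirit of that post; the connector class is
version-specific (FlinkKafkaConsumer082 for Kafka 0.8 in that era), and the
hosts, topic, and group are illustrative:

import java.util.Properties
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer082
import org.apache.flink.streaming.util.serialization.SimpleStringSchema

object KafkaReader {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.setProperty("bootstrap.servers", "broker1:9092") // Kafka brokers
    props.setProperty("zookeeper.connect", "zk1:2181")     // needed for 0.8
    props.setProperty("group.id", "test-group")

    val env = StreamExecutionEnvironment.getExecutionEnvironment
    env.addSource(new FlinkKafkaConsumer082[String](
        "my-topic", new SimpleStringSchema, props))
      .print()
    env.execute("kafka reader")
  }
}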
Hi Lydia,
what do you mean by master? Usually, when you submit a program to the
cluster and don’t specify the parallelism in your program, it will be
executed with the parallelism.default value as its parallelism. You can
specify that value in your cluster configuration, the flink-conf.yaml file.
Al…
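
A short sketch of the two programmatic levels that override the
parallelism.default setting from flink-conf.yaml:

import org.apache.flink.api.scala._

object ParallelismLevels {
  def main(args: Array[String]): Unit = {
    val env = ExecutionEnvironment.getExecutionEnvironment
    env.setParallelism(3) // job-wide default for this program

    env.fromElements("a", "b", "a")
      .map((_, 1)).setParallelism(3) // per-operator override
      .groupBy(0).sum(1)
      .print()
  }
}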
Hi Gwenhael,
in theory the blob storage files can be any binary data. At the moment,
however, they are only used to distribute the user code jars. The jars are
kept around as long as the job is running. Every
library-cache-manager.cleanup.interval, the files are checked and
those which are n…
Hi,
this is currently not supported yet. However, this feature is on our roadmap
and has been requested a few times.
So I hope somebody will pick it up soon.
If the static data set is small enough, you can read the full data set
(e.g., as a file) in the open method of a FlatMapFunction, build a
hash map …
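
A hedged sketch of that pattern with a RichFlatMapFunction (open() needs the
Rich variant); the CSV path and format are illustrative:

import org.apache.flink.api.common.functions.RichFlatMapFunction
import org.apache.flink.configuration.Configuration
import org.apache.flink.util.Collector
import scala.io.Source

class EnrichFromFile(path: String)
    extends RichFlatMapFunction[(String, Double), (String, String, Double)] {

  private var lookup: Map[String, String] = _

  override def open(parameters: Configuration): Unit = {
    // Build the hash map once per parallel task instance.
    val src = Source.fromFile(path)
    try {
      lookup = src.getLines()
        .map(_.split(','))
        .collect { case Array(id, name) => id -> name }
        .toMap
    } finally src.close()
  }

  override def flatMap(in: (String, Double),
                       out: Collector[(String, String, Double)]): Unit =
    lookup.get(in._1).foreach(name => out.collect((in._1, name, in._2)))
}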
Hi all,
I am doing some operations on a DataSet<…> (see
code below).
When I run my program on a cluster with 3 machines, I can see in the web
client that only my master is executing the program.
Do I have to specify somewhere that all machines have to participate? Usually
the cluster executes …