Is there any way to set akka.client.timeout (or other flink config) when
calling bin/flink run instead of editing flink-conf.yaml? I tried to add it
as a -yD flag but couldn't get it working.
Related: https://issues.apache.org/jira/browse/FLINK-3964
Related issue: https://issues.apache.org/jira/browse/FLINK-2672
On Wed, May 25, 2016 at 9:21 AM, Juho Autio wrote:
> Thanks, indeed the desired behavior is to flush if bucket size exceeds a
> limit but also if the bucket has been open long enough. Contrary to the
> current RollingSink we don't want to flush all the time if the bucket
> changes but have multiple buckets "open" as needed.
Maybe, I don't know, but that's with streaming. How about batch?
Srikanth wrote
> Isn't this related to -- https://issues.apache.org/jira/browse/FLINK-2672
> ??
>
> This can be achieved with a RollingSink[1] & custom Bucketer probably.
>
> [1]
> https://ci.apache.org/projects/flink/flink-docs-master/api/java/org/apache/flink/streaming/connectors/fs/RollingSink.html
RollingSink is part of the Flink Streaming API. Can it be used in Flink Batch
jobs, too?
As implied in FLINK-2672, RollingSink doesn't support dynamic bucket paths
based on the tuple fields. The path must be given when creating the
RollingSink instance, i.e. before deploying the job. Yes, a custom Bucketer
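For reference, the Bucketer of the RollingSink connector only ever sees paths, never the element, which is why the bucket cannot depend on tuple fields. Below is a minimal sketch of a custom, time-based Bucketer; the hourly logic and class name are made up, the method names follow the shipped DateTimeBucketer:

import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.connectors.fs.Bucketer;

/**
 * Sketch: roll to a new directory every hour. The interface never hands us
 * the element, so the bucket path cannot depend on tuple fields.
 */
public class HourlyBucketer implements Bucketer {

  @Override
  public boolean shouldStartNewBucket(Path basePath, Path currentBucketPath) {
    return !getNextBucketPath(basePath).equals(currentBucketPath);
  }

  @Override
  public Path getNextBucketPath(Path basePath) {
    long hour = System.currentTimeMillis() / (60 * 60 * 1000);
    return new Path(basePath, "hour-" + hour);
  }
}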
Thanks, indeed the desired behavior is to flush if bucket size exceeds a
limit but also if the bucket has been open long enough. Contrary to the
current RollingSink we don't want to flush all the time if the bucket
changes but have multiple buckets "open" as needed.
In our case the date to use for
Hi,
Has anyone tried to submit a Flink Job remotely to Yarn running in AWS ?
The case I am stuck with is where the Flink client is on my laptop and YARN is
running on AWS.
@Robert, Did you get a chance to try this out?
Regards,
Abhi
From: "Bajaj, Abhinav" mailto:abhinav.ba...@here.com>>
Date:
Hey,
Is it true that, since a taskmanager (a JVM) may run multiple subtasks
implementing the same operator, and thus the same logic and loading the same
classes, no separate classloading is done between them?
So if I use a Scala object or static code (as in Java) within that logic, then
that is shared among the
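As far as I understand, subtasks of the same job inside one TaskManager JVM share the user-code classloader, so a static field (or a Scala object) is indeed one shared instance for all of them. A minimal Java sketch, with made-up names:

import java.util.concurrent.atomic.AtomicLong;

import org.apache.flink.api.common.functions.RichMapFunction;

public class CountingMapper extends RichMapFunction<String, String> {

  // Shared by every parallel subtask of this operator inside one TaskManager JVM.
  private static final AtomicLong SEEN = new AtomicLong();

  @Override
  public String map(String value) {
    long n = SEEN.incrementAndGet();  // concurrent updates from several subtasks
    return value + " (#" + n + " in this JVM)";
  }
}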
Mentioning 100TB "in my context" is more like "saving current state" at
some point of time to "backup" or "direct access" storage and continue with
next 100TB/hours/days of streamed data.
So - no, it is not about a finite data set.
On Mon, May 23, 2016 at 11:13 AM, Matthias J. Sax wrote:
> Are y
I have been trying to send data from several processes to my Flink
application. I want to use a single port that will receive data from
multiple clients. I have implemented my own SourceFunction, but I have two
doubts.
My TCP server has multiple threads receiving data from multiple clients, is
ca
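To illustrate the setup being described, here is a minimal sketch (class and details made up, not production code): one listening port, one reader thread per client connection, all emitting through the same SourceContext. When emitting from several threads, the checkpoint lock is used to guard collect():

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.ServerSocket;
import java.net.Socket;

import org.apache.flink.streaming.api.functions.source.SourceFunction;

public class MultiClientSocketSource implements SourceFunction<String> {

  private final int port;
  private volatile boolean running = true;
  private transient ServerSocket serverSocket;

  public MultiClientSocketSource(int port) { this.port = port; }

  @Override
  public void run(SourceContext<String> ctx) throws Exception {
    serverSocket = new ServerSocket(port);
    while (running) {
      final Socket client;
      try {
        client = serverSocket.accept();
      } catch (Exception e) {
        break; // server socket closed in cancel()
      }
      new Thread(() -> {
        try (BufferedReader in =
                 new BufferedReader(new InputStreamReader(client.getInputStream()))) {
          String line;
          while (running && (line = in.readLine()) != null) {
            // Emitting from several threads: synchronize on the checkpoint lock.
            synchronized (ctx.getCheckpointLock()) {
              ctx.collect(line);
            }
          }
        } catch (Exception ignored) {
          // client disconnected
        }
      }).start();
    }
  }

  @Override
  public void cancel() {
    running = false;
    try { if (serverSocket != null) serverSocket.close(); } catch (Exception ignored) {}
  }
}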
Hi,
I have the following question. Does Flink support incremental updates?
In particular, I have a custom StateValue object and during the checkpoints
I would like to save only the fields that changed since the previous
checkpoint. Is that possible?
Regards,
Gosia
I'm looking for a way to avoid thread starvation in my tasks by returning a
future, but I don't see how that is possible.
Hence I would like to know how Flink handles the case where, in your job, you
have to perform network calls (I use akka-http or spray) or any other IO
operation and use the result of it.
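For context, without dedicated async support a common (if unsatisfying) pattern is to simply wait for the future inside the operator, which keeps the task thread busy for the whole round trip. A minimal Java sketch of that situation; callService is a placeholder:

import java.util.concurrent.CompletableFuture;

import org.apache.flink.api.common.functions.RichMapFunction;

public class BlockingEnricher extends RichMapFunction<String, String> {

  @Override
  public String map(String value) throws Exception {
    CompletableFuture<String> response =
        CompletableFuture.supplyAsync(() -> callService(value));
    // The task thread blocks here until the external call completes.
    return response.get();
  }

  private String callService(String value) {
    // Placeholder for an HTTP / IO call.
    return value;
  }
}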
Hi Ufuk,
I get the error before running it. I mean somehow the syntax is also not
right.
I am trying to do the following:
ConnectedStreams connectedStream = br.connect(mainInput);
IterativeStream.ConnectedIterativeStreams iteration =
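In case the missing generic parameters are part of the problem, here is a minimal compiling sketch of the connected-iteration pattern (types and names are made up), built directly from the main input with an explicit feedback type:

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.IterativeStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.co.CoFlatMapFunction;
import org.apache.flink.util.Collector;

public class ConnectedIterationSketch {
  public static void main(String[] args) throws Exception {
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

    DataStream<Long> mainInput = env.generateSequence(0, 1000);

    // Declare the feedback type explicitly on the iteration head.
    IterativeStream.ConnectedIterativeStreams<Long, String> iteration =
        mainInput.iterate().withFeedbackType(String.class);

    DataStream<String> feedback = iteration.flatMap(
        new CoFlatMapFunction<Long, String, String>() {
          @Override
          public void flatMap1(Long value, Collector<String> out) {
            out.collect("from input: " + value);
          }
          @Override
          public void flatMap2(String value, Collector<String> out) {
            // process feedback elements
          }
        });

    iteration.closeWith(feedback);

    env.execute("connected iteration sketch");
  }
}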
Do you have any suggestion about how to reproduce the error on a subset of
the data?
I'm trying to change the following but I can't find a configuration causing
the error :(
private static ExecutionEnvironment getLocalExecutionEnv() {
    org.apache.flink.configuration.Configuration c = new
        org.apache.flink.configuration.Configuration();
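One thing to try (just a guess, not a verified fix): shrink the managed memory of the local environment so that spilling kicks in even on a small subset of the data. A sketch with hypothetical values:

import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.configuration.Configuration;

public class LocalEnvWithConfig {
  // Smaller managed memory -> earlier spilling, which may reproduce the error locally.
  static ExecutionEnvironment getLocalExecutionEnv() {
    Configuration c = new Configuration();
    c.setInteger("taskmanager.numberOfTaskSlots", 4);
    c.setFloat("taskmanager.memory.fraction", 0.1f);  // hypothetical small fraction
    return ExecutionEnvironment.createLocalEnvironment(c);
  }
}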
Hi Bart,
From what I understand, you want to do a partial (per-node) aggregation before
shipping the results for the final aggregation at the end. In addition, the keys
do not seem to change between aggregations, right?
If this is the case, this is the functionality of the Combiner in batch.
In Batch
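For illustration, a minimal DataSet-API sketch of that pattern (class and field names made up): a GroupCombineFunction does the partial, per-partition aggregation before the shuffle, and a GroupReduceFunction finishes the aggregation afterwards:

import org.apache.flink.api.common.functions.GroupCombineFunction;
import org.apache.flink.api.common.functions.GroupReduceFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.util.Collector;

public class CombinerSketch {
  public static void main(String[] args) throws Exception {
    ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

    DataSet<Tuple2<String, Long>> input = env.fromElements(
        Tuple2.of("a", 1L), Tuple2.of("a", 2L), Tuple2.of("b", 3L));

    // Partial (per-partition) aggregation before the shuffle ...
    DataSet<Tuple2<String, Long>> preAggregated = input
        .groupBy(0)
        .combineGroup(new Summer());

    // ... followed by the final aggregation after the shuffle.
    DataSet<Tuple2<String, Long>> result = preAggregated
        .groupBy(0)
        .reduceGroup(new FinalSummer());

    result.print();
  }

  public static class Summer
      implements GroupCombineFunction<Tuple2<String, Long>, Tuple2<String, Long>> {
    @Override
    public void combine(Iterable<Tuple2<String, Long>> values, Collector<Tuple2<String, Long>> out) {
      String key = null; long sum = 0;
      for (Tuple2<String, Long> v : values) { key = v.f0; sum += v.f1; }
      out.collect(Tuple2.of(key, sum));
    }
  }

  public static class FinalSummer
      implements GroupReduceFunction<Tuple2<String, Long>, Tuple2<String, Long>> {
    @Override
    public void reduce(Iterable<Tuple2<String, Long>> values, Collector<Tuple2<String, Long>> out) {
      String key = null; long sum = 0;
      for (Tuple2<String, Long> v : values) { key = v.f0; sum += v.f1; }
      out.collect(Tuple2.of(key, sum));
    }
  }
}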
Hi Ufuk, I am willing to do some work for this issue and have a basic
solution for it, and I wish to get your professional suggestions. What is the
next step for it? Looking forward to your reply!
Zhijiang Wang
-- From: Ufuk
(migrated from IRC)
Hello All,
My situation is this:
I have a large amount of data partitioned in kafka by "session" (natural
partitioning). After I read the data, I would like to do as much as possible
before incurring re-serialization or network traffic due to the size of the
data. I am
Isn't this related to -- https://issues.apache.org/jira/browse/FLINK-2672 ??
This can be achieved with a RollingSink[1] & custom Bucketer probably.
[1]
https://ci.apache.org/projects/flink/flink-docs-master/api/java/org/apache/flink/streaming/connectors/fs/RollingSink.html
Srikanth
On Tue, May
Hi Juho,
If I understand correctly, you want a custom RollingSink that caches some
buckets, one for each topic/date key, and flushes to disk whenever the volume of
data buffered exceeds a limit, right?
If this is the case, then you are right that this is not currently supported
out-of-the-box
Hi Josh,
for the first part of your question you might be interested in our ongoing
work of adding side inputs to Flink. I started this design doc:
https://docs.google.com/document/d/1hIgxi2Zchww_5fWUHLoYiXwSBXjv-M5eOv-MKQYN3m4/edit?usp=sharing
It's still somewhat rough around the edges but could
Hi Josh,
You can trigger an occasional refresh, e.g. on every 100 elements
received. Or, you could start a thread that does that every 100
seconds (possible with a lock involved to prevent processing in the
meantime).
Cheers,
Max
On Mon, May 23, 2016 at 7:36 PM, Josh wrote:
>
> Hi Max,
>
> Than
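A minimal sketch of the second option Max describes, with a background thread refreshing the lookup data; the class and helper names are made up, and a volatile reference swap is used here instead of an explicit lock:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.configuration.Configuration;

public class EnrichingMapper extends RichMapFunction<String, String> {

  private transient volatile Map<String, String> lookup;
  private transient ScheduledExecutorService refresher;

  @Override
  public void open(Configuration parameters) {
    lookup = loadFromExternalStore();
    refresher = Executors.newSingleThreadScheduledExecutor();
    // Refresh every 100 seconds, as suggested; swap the reference atomically
    // so map() never sees a half-loaded table.
    refresher.scheduleAtFixedRate(
        () -> lookup = loadFromExternalStore(), 100, 100, TimeUnit.SECONDS);
  }

  @Override
  public String map(String key) {
    String enriched = lookup.get(key);
    return enriched != null ? key + "," + enriched : key;
  }

  @Override
  public void close() {
    if (refresher != null) refresher.shutdownNow();
  }

  private Map<String, String> loadFromExternalStore() {
    // Placeholder: fetch the reference data from wherever it lives.
    return new ConcurrentHashMap<>();
  }
}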
Could you suggest how to dynamically partition data with Flink streaming?
We've looked at RollingSink, which takes care of writing batches to S3, but
it doesn't allow defining the partition dynamically based on the tuple
fields.
Our data is coming from Kafka and essentially has the kafka topic and
The error looks really strange. Flavio, could you compile a test program
with example data and configuration to reproduce the problem? Given that,
we could try to debug it.
Cheers,
Till
Hi Thomas,
if you want to run multiple Flink clusters in HA mode, you should configure
a distinct recovery.zookeeper.path.root for every cluster in your
configuration. This will define the root path in ZooKeeper under which the
meta checkpoint state handles and the job handles are stored. If you do
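For example (paths made up), each cluster would get its own root in its flink-conf.yaml:

# cluster A
recovery.zookeeper.path.root: /flink/cluster-a

# cluster B
recovery.zookeeper.path.root: /flink/cluster-b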
Till mentioned that 'spilling to disk' was handled by catching an exception. The
last serialization error was related to bad management of the Kryo buffer, which
was not cleaned after the exception-driven spilling.
Is it possible we are dealing with an issue similar to that one but caused by
anoth