Thank you!
On Wednesday, 29 June 2016, Aljoscha Krettek wrote:
> Hi,
> the result of splitting by key is that processing can easily be
> distributed among the workers because the windows for individual keys can
> be processed independently. This should improve cluster utilization.
>
> Cheers,
>
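As an illustration of the keyed-window independence Aljoscha describes, here is a minimal sketch (the stream type, key position, and window size are assumptions, not from the thread):

import org.apache.flink.streaming.api.windowing.time.Time;

DataStream<Tuple2<String, Integer>> counts = events
    .keyBy(0)                        // hash-partition the stream by the first field
    .timeWindow(Time.minutes(1))     // each key gets its own independent window
    .sum(1);                         // windows for different keys are processed
                                     // in parallel on different workers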
Thank you, I'll check these.
In 2.) you said they are likely to exchange data through memory. Is there a
case where they wouldn't?
On Thu, Jun 30, 2016 at 5:03 AM, Ufuk Celebi wrote:
> On Thu, Jun 30, 2016 at 1:44 AM, Saliya Ekanayake wrote:
> > 1. What parameters are available to control parallelism within a node?
I’m trying to submit a stand-alone Flink job to a YARN cluster running on EMR
(Elastic MapReduce) nodes in AWS. When it tries to start a container for the
Job Manager, it fails. The error message from the container is below. The
command I’m using is:
$ flink run -m yarn-cluster -yn 1 -ynm test1
Hej,
I'm currently playing around with some machine learning algorithms in Flink
streaming.
I have an input stream that I partition by key and then do a map on each of
the keys, feeding a model and producing a prediction output. Periodically
each operator needs to send model updates to all other
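For context, a rough sketch of the keyed prediction pipeline described above; the Event, Model, and Prediction types are placeholders invented for illustration:

import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.configuration.Configuration;

DataStream<Prediction> predictions = input
    .keyBy(event -> event.getKey())               // one model per key partition
    .map(new RichMapFunction<Event, Prediction>() {
        private transient Model model;            // hypothetical model type

        @Override
        public void open(Configuration parameters) {
            model = Model.createInitial();        // hypothetical initialization
        }

        @Override
        public Prediction map(Event event) {
            model.update(event);                  // feed the model
            return model.predict(event);          // produce a prediction
        }
    });

(The periodic broadcast of model updates to the other parallel instances is the open question in this mail and is not shown.)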
Also, you're using the FsStateBackend, correct?
Reason I'm asking is that the problem should not occur for the RocksDB
state backend. There, we don't serialize any user code, only binary data. A
while back I wanted to change the FsStateBackend to also work like this.
Now might be a good time to ac
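For reference, a minimal sketch of switching the job over to the RocksDB backend (the checkpoint URI is a placeholder):

import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
// RocksDB keeps and snapshots state as binary data only, so no user code
// ends up inside the snapshot, unlike the FsStateBackend case above.
env.setStateBackend(new RocksDBStateBackend("hdfs:///flink/checkpoints"));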
Yes, that's the way to go at the moment.
Cheers,
Till
On Thu, Jun 30, 2016 at 12:47 PM, Márton Balassi wrote:
> Hi Josh,
>
> Yes, currently that is a reasonable workaround.
>
> Best,
>
> Marton
>
> On Thu, Jun 30, 2016 at 12:38 PM, Josh wrote:
>
>> Hi Till,
>>
>> Thanks, that's very helpful!
>>
Hi Prabhu,
the rolling file sinks should not suffer from data loss. The reason is the
following: the checkpointed state, the bucket state, contains the current file,
the offset, and all pending files which are ready to be moved. Once a
checkpoint is completed, the notifyCheckpointComplete method is called
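For reference, a minimal sketch of wiring up the rolling file sink with checkpointing enabled (the path, bucket format, and batch size are example values):

import org.apache.flink.streaming.connectors.fs.RollingSink;
import org.apache.flink.streaming.connectors.fs.DateTimeBucketer;

env.enableCheckpointing(10000);   // the pending-file handoff relies on checkpoints

RollingSink<String> sink = new RollingSink<>("hdfs:///data/output");
sink.setBucketer(new DateTimeBucketer("yyyy-MM-dd--HH"));
sink.setBatchSize(1024 * 1024 * 400L);   // roll to a new part file at ~400 MB
stream.addSink(sink);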
Hi Josh,
you could also try to replace your anonymous classes by explicit class
definitions. This should assign these classes a fixed name independent of
the other anonymous classes. Then the class loader should be able to
deserialize your serialized data.
Cheers,
Till
On Thu, Jun 30, 2016 at 1:
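A sketch of Till's suggestion; the Event type and the function body are made up for illustration:

import org.apache.flink.api.common.functions.MapFunction;

// Before: the class name (e.g. MyJob$1) depends on the order in which
// anonymous classes appear in the file and can change between builds.
DataStream<Long> ids = events.map(new MapFunction<Event, Long>() {
    @Override
    public Long map(Event event) { return event.getId(); }
});

// After: an explicit class has a fixed, stable name for the class loader.
public class ExtractId implements MapFunction<Event, Long> {
    @Override
    public Long map(Event event) { return event.getId(); }
}

DataStream<Long> idsNamed = events.map(new ExtractId());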
Hi,
are you talking about *enableFullyAsyncSnapshots()* in the RocksDB backend?
If not, there is this switch that is described in the JavaDoc:
/**
 * Enables fully asynchronous snapshotting of the partitioned state held in
 * RocksDB.
 *
 * By default, this is disabled. This means that RocksDB state is c
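For reference, a sketch of flipping that switch (the checkpoint URI is a placeholder):

RocksDBStateBackend backend = new RocksDBStateBackend("hdfs:///flink/checkpoints");
backend.enableFullyAsyncSnapshots();   // the switch described in the JavaDoc above
env.setStateBackend(backend);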
Good to hear! :)
On Wed, Jun 29, 2016 at 12:08 PM, ANDREA SPINA <74...@studenti.unimore.it> wrote:
> Hi,
>
> the problem was solved after I figured out there was an instance of Flink
> TaskManager running on a node of the cluster.
> Thank you,
> Andrea
>
> 2016-06-28 12:17 GMT+02:00 ANDREA SPINA <7
Hi Josh,
I think in your case the problem is that Scala might choose different names
for synthetic/generated classes. This will trip up the code that is trying
to restore from a snapshot that was done with an earlier version of the
code where the classes were named differently.
I'm afraid I don't kno
Hi Josh,
You have to assign UIDs to all operators to change the topology. Plus,
you have to add dummy operators for all UIDs which you removed; this is
currently a limitation, because Flink will attempt to find all UIDs of
the old job.
Cheers,
Max
On Wed, Jun 29, 2016 at 9:00 PM, Josh wrote:
> H
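A sketch of both points, with made-up UID strings and placeholder operator classes:

DataStream<Event> enriched = env
    .addSource(new EventSource()).uid("event-source")   // stable IDs let savepoint
    .map(new EnrichMapper()).uid("enrich-map");         // state be matched to operators

// If an operator with UID "old-filter" was removed from the job, a no-op
// stand-in currently still has to carry that UID on restore.
// (IdentityMapper is a trivial MapFunction<Event, Event> returning its input.)
DataStream<Event> withDummy = enriched.map(new IdentityMapper()).uid("old-filter");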
Hi Josh,
Yes, currently that is a reasonable workaround.
Best,
Marton
On Thu, Jun 30, 2016 at 12:38 PM, Josh wrote:
> Hi Till,
>
> Thanks, that's very helpful!
> So I guess in that case, since it isn't possible to increase the job
> parallelism later, it might be sensible to use say 10x the p
Hi Till,
Thanks, that's very helpful!
So I guess in that case, since it isn't possible to increase the job
parallelism later, it might be sensible to use say 10x the parallelism that
I need right now (even if only running on a couple of task managers) - so
that it's possible to scale the job in th
Hi Josh,
at the moment it is not possible to dynamically increase the parallelism of
your job. The same holds true for restarting a job from a savepoint. But
we're currently working on exactly this. So in order to change the
parallelism of your job, you would have to restart the job from scratch
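Since the parallelism is fixed at submission time, a minimal sketch of setting it explicitly up front (the value 40 is just an example):

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.setParallelism(40);   // or equivalently on the CLI: flink run -p 40 job.jar

The cluster then needs enough task slots to schedule all 40 subtasks, even if they initially run on only a couple of task managers.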
You are right, this is not very well-documented. You can do it like this:
env.getCheckpointConfig().setCheckpointingMode(CheckpointingMode.AT_LEAST_ONCE);
With this, the operators don't wait for all barriers to align. An example
of setting the checkpoint mode is here:
https://ci.apache.org/projects/flin
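Slightly more context around that line, as a sketch (the checkpoint interval is an example):

import org.apache.flink.streaming.api.CheckpointingMode;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.enableCheckpointing(5000);   // trigger a checkpoint every 5 seconds
env.getCheckpointConfig().setCheckpointingMode(CheckpointingMode.AT_LEAST_ONCE);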
On Thu, Jun 30, 2016 at 1:44 AM, Saliya Ekanayake wrote:
> 1. What parameters are available to control parallelism within a node?
Task Manager processing slots:
https://ci.apache.org/projects/flink/flink-docs-release-1.0/setup/config.html#configuring-taskmanager-processing-slots
> 2. Does Flink
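For reference, the setting behind that link lives in flink-conf.yaml (the value is an example):

taskmanager.numberOfTaskSlots: 8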
Hi,
I think there is no way to get the output from these log statements into
the Yarn logs. The reason is that this code is only executed on the client
and not in any Yarn context/container. This code sets up everything for
Yarn and then control is handed over, so it is executed before the Job