Stephan Ewen created FLINK-2333:
---
Summary: Stream Data Sink that periodically rolls files
Key: FLINK-2333
URL: https://issues.apache.org/jira/browse/FLINK-2333
Project: Flink
Issue Type: New Feature
David Heller created FLINK-2334:
---
Summary: IOException: Channel to path could not be opened
Key: FLINK-2334
URL: https://issues.apache.org/jira/browse/FLINK-2334
Project: Flink
Issue Type: Bug
Gyula Fora created FLINK-2335:
---
Summary: Rework iteration construction in StreamGraph
Key: FLINK-2335
URL: https://issues.apache.org/jira/browse/FLINK-2335
Project: Flink
Issue Type: Improvement
William Saar created FLINK-2336:
---
Summary: ArrayIndexOutOfBoundsException in TypeExtractor when mapping
Key: FLINK-2336
URL: https://issues.apache.org/jira/browse/FLINK-2336
Project: Flink
Issu
Matthias J. Sax created FLINK-2337:
---
Summary: Multiple SLF4J bindings using Storm compatibility layer
Key: FLINK-2337
URL: https://issues.apache.org/jira/browse/FLINK-2337
Project: Flink
Iss
Matthias J. Sax created FLINK-2338:
---
Summary: Shut down "Storm Topologies" cleanly
Key: FLINK-2338
URL: https://issues.apache.org/jira/browse/FLINK-2338
Project: Flink
Issue Type: Improvement
Hi everyone,
The docs say Flink Streaming uses "barriers" to ensure "exactly once." Does a
DataSet job use the same mechanism to ensure "exactly once" if a map task
fails?
Thanks
No, it doesn't; periodic snapshots are not needed in DataSet programs, as
DataSets are of finite size and failed partitions can be replayed
completely.
On Thu, Jul 9, 2015 at 2:43 PM, 马国维 wrote:
> Hi everyone, the docs say Flink Streaming uses "barriers" to ensure
> "exactly once." Does the DataSe
As Kostas mentioned, the failure mechanisms for streaming and batch
processing are different, but you can expect exactly-once processing
guarantees from both of them.
On Thu, Jul 9, 2015 at 2:43 PM, 马国维 wrote:
> Hi everyone, the docs say Flink Streaming uses "barriers" to ensure
> "exactly once."
The DataExchangeMode is pipelined.
If two operators exchange data in pipelined mode, a failed partition may have
already sent some data to the receiver before it failed. So does replaying all
the failed partitions cause duplicate records?
> Date: Thu, 9 Jul 2015 14:47:29 +0200
> Subject: Re: Doe
Currently, Flink restarts the entire job upon failure.
There is WIP that restricts this to all tasks involved in the pipeline of
the failed task.
Let's say we have pipelined MapReduce. If a mapper fails, the reducers that
have received some data already have to be restarted as well.
In that case
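The restart scope described above can be sketched as a toy plain-Java simulation (the names are illustrative, not Flink API): a pipelined reducer has already absorbed part of the mapper's output when the mapper fails, so replaying the mapper without also resetting the reducer would double-count those records.

```java
import java.util.List;

// Toy simulation (not Flink code) of why a pipelined consumer must be
// restarted along with its failed producer: the reducer below holds a
// partial aggregate built from records the mapper sent before crashing.
public class PipelinedRestart {
    private long reducerState = 0; // partial aggregate in the consumer

    // Mapper doubles each record and pushes it into the reducer's sum;
    // "crashes" at record index failAt.
    void runMapper(List<Integer> input, int failAt) {
        for (int i = 0; i < input.size(); i++) {
            if (i == failAt) throw new RuntimeException("mapper failed");
            reducerState += input.get(i) * 2;
        }
    }

    long execute(List<Integer> input, int failAt) {
        try {
            runMapper(input, failAt);
        } catch (RuntimeException e) {
            reducerState = 0;     // restart the reducer too, discarding partial state
            runMapper(input, -1); // replay the mapper completely
        }
        return reducerState;
    }

    public static void main(String[] args) {
        // Fails after two records; result matches a clean run.
        System.out.println(new PipelinedRestart().execute(List.of(1, 2, 3), 2)); // prints 12
    }
}
```

If the `reducerState = 0` reset were omitted, the replayed records would be added on top of the partial aggregate, which is exactly the duplication the earlier question worries about.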
Stephan Ewen created FLINK-2339:
---
Summary: Prevent asynchronous checkpoint calls from overtaking
each other
Key: FLINK-2339
URL: https://issues.apache.org/jira/browse/FLINK-2339
Project: Flink
DataSet result = in.rebalance()
                   .map(new Mapper());
In that case, does the 'map' receive all the data before it begins to work?
If not, will a failed rebalance operator cause duplicate records?
> Date: Thu, 9 Jul 2015 15:40:18 +0200
> Subject: Re: Does
In pipelined execution, the mapper will start once it receives data. In
batch-only execution, the mapper will start once it has received all data. In
either case, there won't be any duplicate records. If an error occurs, the
entire job will be restarted.
As Stephan mentioned, we will soon have a per-t
Any operator in a batch job will receive all of its elements in one
complete successful run.
The mapper starts its work immediately. On a failure, a fresh mapper is
used, and all of the data is replayed. You can think of it as if there was
only a single checkpoint at the very beginning (before any
Thank you! I see.
Thanks to all of you.
Sent from my iPhone
> On Jul 9, 2015, at 10:48 PM, Stephan Ewen wrote:
>
> Any operator in a batch job will receive all of its elements in one
> complete successful run.
>
> The mapper starts its work immediately. On a failure, a fresh mapper is
> used, and all of the data is re
Maximilian Michels created FLINK-2340:
---
Summary: Provide standalone mode for web interface of the
JobManager
Key: FLINK-2340
URL: https://issues.apache.org/jira/browse/FLINK-2340
Project: Flink
BTW: Duplicates on interaction with the outside world cannot be avoided in
the general case. If any program (batch or streaming) inserts data into
some outside system (for example a database), then that outside system
needs to cooperate to prevent duplicates.
- Either, the system could eliminate d
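One common form of that cooperation is an idempotent, key-based write. A toy plain-Java sketch (a stand-in store, not a real connector or Flink API): an external system that upserts by primary key ends up in the same state whether a record is written once or replayed several times after a failure.

```java
import java.util.HashMap;
import java.util.Map;

// Toy stand-in for an external store (not a real connector): upserting
// by primary key makes replayed writes idempotent -- a duplicate write
// just overwrites the row it already created.
public class IdempotentSink {
    private final Map<String, String> store = new HashMap<>();

    // Upsert by key: safe to call again with the same record on replay.
    void write(String key, String value) {
        store.put(key, value);
    }

    int size() { return store.size(); }
    String get(String key) { return store.get(key); }

    public static void main(String[] args) {
        IdempotentSink sink = new IdempotentSink();
        sink.write("user-1", "a");
        sink.write("user-1", "a"); // simulated replay after a failure
        System.out.println(sink.size()); // prints 1, not 2
    }
}
```

An append-only sink (a plain INSERT with no key) would instead accumulate one extra row per replay, which is why the outside system has to deduplicate by key, transaction id, or similar.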
Stephan Ewen created FLINK-2341:
---
Summary: Deadlock in SpilledSubpartitionViewAsyncIO
Key: FLINK-2341
URL: https://issues.apache.org/jira/browse/FLINK-2341
Project: Flink
Issue Type: Bug
The problem is that I am not able to incorporate this into a Flink
iteration since the MultipleLinearRegression() already contains an
iteration. Therefore this would be a nested iteration ;-)
So at the moment I just use a for loop like this:
data: DataSet[model_ID, DataVector]
for (current_model
It's perfectly fine. That way you will construct a data flow graph which
will calculate the linear model for all different groups once you define an
output and trigger the execution.
On Thu, Jul 9, 2015 at 6:05 PM, Felix Neutatz
wrote:
> The problem is that I am not able to incorporate this into
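The per-group loop discussed above can be sketched as a toy plain-Java stand-in (simple 1D least squares instead of FlinkML's MultipleLinearRegression; all names are illustrative): each loop iteration contributes one model, and everything is computed in a single execution, which mirrors how the Flink loop would add one sub-pipeline per group to a single data flow graph.

```java
import java.util.HashMap;
import java.util.Map;

// Toy stand-in for the per-group training loop (plain Java, not FlinkML):
// one call fits a separate linear model for every group, analogous to a
// data flow graph that computes all group models in one execution.
public class PerGroupModels {
    // Fits y = slope * x per group by least squares through the origin:
    // slope = sum(x*y) / sum(x*x). Each inner array is a point {x, y}.
    static Map<String, Double> fitAll(Map<String, double[][]> groups) {
        Map<String, Double> models = new HashMap<>();
        for (Map.Entry<String, double[][]> e : groups.entrySet()) {
            double sxy = 0, sxx = 0;
            for (double[] point : e.getValue()) {
                sxy += point[0] * point[1];
                sxx += point[0] * point[0];
            }
            models.put(e.getKey(), sxy / sxx);
        }
        return models;
    }

    public static void main(String[] args) {
        Map<String, double[][]> groups = new HashMap<>();
        groups.put("a", new double[][]{{1, 2}, {2, 4}}); // slope 2.0
        System.out.println(fitAll(groups));
    }
}
```

The point of Till's reply holds here too: the loop itself is just graph (or here, map) construction; nothing runs until the whole thing is triggered once.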
Hi,
I want to use t-digest by Ted Dunning (
https://github.com/tdunning/t-digest/blob/master/src/main/java/com/tdunning/math/stats/ArrayDigest.java)
on Flink.
Locally that works perfectly. But on the cluster I get the following error:
java.lang.Exception: Call to registerInputOutput() of invokab
Hi Stephan,
thanks for the input :).
I have some further questions on how the data is segmented before/when
it is moved over the network:
1. What does the number of ResultSubpartition instances in the
ResultPartition correspond to? Is one assigned to each consuming task?
If so, how can I fi