Hi everyone: thanks a lot for all your help.
In my case, there is a data stream and a huge data set. Each element in the
data stream needs to be joined with the huge data set to produce a new data
stream. But it can't use a join method like the shuffle method or the
broadcast method because of
That would also work. I thought about it already, too. Thanks for the
feedback. If two people have a similar idea, it might be the right way to
go. I will just include all this stuff and open a PR. Then we can
evaluate it again.
-Matthias
On 06/30/2015 12:01 AM, Gyula Fóra wrote:
> By declare I m
By declare I mean we assume a Flink Tuple datatype and the user declares
the name mapping (sorry, it's getting late).
On Mon, 29 Jun 2015 at 23:57, Gyula Fóra wrote:
> Ah ok, now I get what I didn't get before :)
>
> So you want to take some input stream , and execute a bolt implementatio
Ah ok, now I get what I didn't get before :)
So you want to take some input stream and execute a bolt implementation
on it. And the question is what input type to assume when the user wants to
use field-name-based access.
Can't we force the user to declare the names of the inputs/outputs even i
Well. If a whole Storm topology is executed, this is of course the way
to go. However, I want to have named-attribute access in the case of an
embedded bolt (as a single operator) in a Flink program. And in this
case, fields are not declared and do not have a name (e.g., if the bolt's
consumers emit
Hey,
I didn't look through the whole code so I probably don't get something, but
why don't you just do what Storm does? Keep a map from the field names to
indexes somewhere (make this accessible from the tuple) and then you can
just use a simple Flink tuple.
I think this is what's happening in stor
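A rough sketch of that idea (the class and method names below are made up for
illustration; Tuple.getField is the only actual Flink API used):

import java.util.HashMap;
import java.util.Map;

import org.apache.flink.api.java.tuple.Tuple;

// Hypothetical helper: resolves declared field names to positions of a plain
// Flink tuple, so named access can be layered on top of index access.
public class NamedTupleAccessor {

    private final Map<String, Integer> fieldIndexes = new HashMap<>();

    public NamedTupleAccessor(String... fieldNames) {
        for (int i = 0; i < fieldNames.length; i++) {
            fieldIndexes.put(fieldNames[i], i);
        }
    }

    // Looks up the declared name and reads the corresponding tuple field.
    public <T> T getByName(Tuple tuple, String fieldName) {
        Integer index = fieldIndexes.get(fieldName);
        if (index == null) {
            throw new IllegalArgumentException("Unknown field: " + fieldName);
        }
        return tuple.getField(index);
    }
}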
From the same series of experiments:
I am basically running an algorithm that simulates a Gather-Sum-Apply
Iteration that performs Triangle Count. (Why simulate it? Because you just
need a single superstep, so using the runGatherSumApply function in Graph
would be useless overhead.)
What happens, at a high le
Andra,
why don't you simply print to standard output and gather your metrics from
the taskmanagers' log files after execution?
Wouldn't that work for you?
-V.
On 29 June 2015 at 22:36, Andra Lungu wrote:
> Caution! I am getting philosophical. Stop me if I'm talking nonsense!
>
> You are sugges
Caution! I am getting philosophical. Stop me if I'm talking nonsense!
You are suggesting a list that will have one or two entries per vertex, i.e.
(approx.) billions of entries. Won't this over-saturate my memory? I am
already filling it with lots of junk resulting from the computation...
On Mon, Jun 29, 2015 at
Hi,
I started to work on a missing feature for the Storm compatibility
layer: named attribute access.
In Storm, each attribute of an input tuple can be accessed via index or
by name. Currently, only index access is supported. In order to support
this feature in Flink (embedded Bolt in Flink progra
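For reference, these are the two access styles Storm offers inside a bolt on
its input tuple (Storm's own Tuple API; the field name "word" is only an
example):

import backtype.storm.tuple.Tuple;

public class StormAccessStyles {

    public void readBothWays(Tuple input) {
        // positional access -- what the compatibility layer supports today
        String byIndex = input.getString(0);

        // named access -- requires the producing component to have declared
        // the field name "word" in its output fields
        String byName = input.getStringByField("word");

        System.out.println(byIndex + " / " + byName);
    }
}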
Hey all!
Just to add something new to the end of the discussion list. After some
discussion with Seif and Paris, I have added a commit that replaces the
use of the Checkpointed interface with field annotations.
This is probably the most lightweight state declaration so far and it will
probably w
Till Rohrmann created FLINK-2291:
Summary: Use ZooKeeper to elect JobManager leader and send
information to TaskManagers
Key: FLINK-2291
URL: https://issues.apache.org/jira/browse/FLINK-2291
Project:
Aljoscha Krettek created FLINK-2290:
---
Summary: CoRecordReader Does Not Read Events From Both Inputs When
No Elements Arrive
Key: FLINK-2290
URL: https://issues.apache.org/jira/browse/FLINK-2290
Proj
Have you tried to use a custom accumulator that just appends to a list?
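Something along these lines could serve as the list accumulator (a sketch
against the Accumulator interface as it looks in recent Flink versions; the
class name is made up and details may differ in older releases):

import java.util.ArrayList;

import org.apache.flink.api.common.accumulators.Accumulator;

// Collects string values into a list; Flink merges the per-subtask lists
// after the job finishes.
public class StringListAccumulator implements Accumulator<String, ArrayList<String>> {

    private ArrayList<String> values = new ArrayList<>();

    @Override
    public void add(String value) {
        values.add(value);
    }

    @Override
    public ArrayList<String> getLocalValue() {
        return values;
    }

    @Override
    public void resetLocal() {
        values.clear();
    }

    @Override
    public void merge(Accumulator<String, ArrayList<String>> other) {
        values.addAll(other.getLocalValue());
    }

    @Override
    public Accumulator<String, ArrayList<String>> clone() {
        StringListAccumulator copy = new StringListAccumulator();
        copy.values = new ArrayList<>(this.values);
        return copy;
    }
}

You would register it in a rich function's open() via
getRuntimeContext().addAccumulator("collected", new StringListAccumulator())
and read the merged list from the JobExecutionResult after execution.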
2015-06-29 12:59 GMT+02:00 Andra Lungu :
> Hey Fabian,
>
> I am aware of the way open, preSuperstep(), postSuperstep() etc can help me
> within an interation, unfortunately I am writing my own method here. I
> could try to br
Hi,
I am working on https://issues.apache.org/jira/browse/FLINK-2111
Stephan and I had a discussion about the name of the signal. See:
https://github.com/apache/flink/pull/750
Because we cannot agree on either "terminate" or "stop", we would
appreciate some feedback about it. If anybody has an th
Hey Fabian,
I am aware of the way open(), preSuperstep(), postSuperstep(), etc. can help me
within an iteration; unfortunately, I am writing my own method here. I
could try to briefly describe it:
public static final class PropagateNeighborValues implements
NeighborsFunctionWithVertexValue(...) {
Hi Chiwan,
at the moment the single-element PredictOperation only supports
non-distributed models. This means that it expects the model to be a
single-element DataSet which can be broadcast to the predict mappers.
If you need more flexibility, you can either extend the PredictOperation
interfac
As you can see from status.apache.org, the host is currently down. I hope
they fix the issue soon.
On Mon, Jun 29, 2015 at 11:50 AM, Sachin Goel
wrote:
> I've been experiencing the same problem.
>
> Regards
> Sachin Goel
>
> On Mon, Jun 29, 2015 at 3:09 PM, Felix Neutatz
> wrote:
>
> > Hi,
> >
>
Hi,
it seems that the Flink documentation is currently not available due to
security issues with the ci.apache.org host.
-- Forwarded message --
From: Tony Stevenson
Date: Mon, Jun 29, 2015 at 11:57 AM
Subject: [ NOTICE ] Services Unavailable
To: committ...@apache.org, infrastru
Thank you, Till.
I have another question. Can I use a DataSet object as the Model? In KNN, we
need the DataSet given in the fit operation.
But when I define the Model generic parameter as DataSet in PredictOperation,
the getModel method's return type is DataSet[DataSet]. I'm confused by this
situation.
If
I've been experiencing the same problem.
Regards
Sachin Goel
On Mon, Jun 29, 2015 at 3:09 PM, Felix Neutatz
wrote:
> Hi,
>
> when I want to access the documentation on the website I get the following
> error:
>
> http://ci.apache.org/projects/flink/flink-docs-release-0.9
>
> Service Unavailable
You are right, one cannot use the current window-join implementation for
this.
A workaround is to implement a custom binary stream operator that waits
until it receives the whole file, then starts joining.
For instance, a filestream.connect(streamToJoinWith).flatMap(
CustomCoFlatMap that does
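A minimal sketch of such a CoFlatMap (class name, element types and the key
handling are made up for illustration, and the imports follow the current
Flink package layout; detecting that the file side is fully consumed would
need an extra end-marker, which is omitted here):

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.functions.co.CoFlatMapFunction;
import org.apache.flink.util.Collector;

// Input 1 is the (finite) file stream, input 2 is the live stream. File
// records are buffered per key; live records are matched against the buffer.
public class BufferingJoin implements
        CoFlatMapFunction<Tuple2<String, Integer>, Tuple2<String, Long>, Tuple2<String, Long>> {

    private final Map<String, List<Integer>> fileSide = new HashMap<>();

    @Override
    public void flatMap1(Tuple2<String, Integer> fileRecord, Collector<Tuple2<String, Long>> out) {
        List<Integer> buffered = fileSide.get(fileRecord.f0);
        if (buffered == null) {
            buffered = new ArrayList<>();
            fileSide.put(fileRecord.f0, buffered);
        }
        buffered.add(fileRecord.f1);
    }

    @Override
    public void flatMap2(Tuple2<String, Long> streamRecord, Collector<Tuple2<String, Long>> out) {
        List<Integer> matches = fileSide.get(streamRecord.f0);
        if (matches != null) {
            for (Integer m : matches) {
                out.collect(new Tuple2<String, Long>(streamRecord.f0, streamRecord.f1 + m));
            }
        }
    }
}

It would be wired up roughly as
fileStream.connect(liveStream).flatMap(new BufferingJoin()).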
I am wondering what the semantics of a DataStream created from a file
are. It should be a regular (but finite) stream. From my understanding, a
window-join is defined with some timestamp constraint. So the static file
part will also have this restriction in the join, right? However, a
file-stream-join shou
Hi,
when I want to access the documentation on the website I get the following
error:
http://ci.apache.org/projects/flink/flink-docs-release-0.9
Service Unavailable
The server is temporarily unable to service your request due to maintenance
downtime or capacity problems. Please try again later.
The second issue is related to parallel time-based aggregations. I think we
should fix this for 0.9.1.
Also, since the fix is, as you said, rather straightforward, there is no harm
in doing it. As I understand it, if we keep the functionality of having
time-based global windows, the implementations for merg
I have found two off-by-one issues in the windowing code.
The first may result in duplicate data in the last window and is easy to
fix. [1]
The second may result in data being swallowed in the last window, and is
also not difficult to fix. [2]
I've talked to Aljoscha about fixing the second one, an
If you only want to "join" a finite data set (like a file) to a stream, you
can do that: you can create a DataStream from a (distributed) file.
If you want specific batch-API operations, this is still on the roadmap and
not in yet, as Marton said.
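A minimal sketch of creating such a stream (the path is a placeholder):

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class FileAsStream {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // a finite stream whose elements are the lines of the file
        DataStream<String> fileStream = env.readTextFile("file:///tmp/static-data.txt");

        fileStream.print();
        env.execute("file as stream");
    }
}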
On Sun, Jun 28, 2015 at 10:45 AM, Márton Balassi
wr
Till Rohrmann created FLINK-2289:
Summary: Make JobManager highly available
Key: FLINK-2289
URL: https://issues.apache.org/jira/browse/FLINK-2289
Project: Flink
Issue Type: Improvement
@Matthias:
I think using the KeyedDataStream will simply result in smaller programs.
It may be hard for some users to make the connection to a
1-element tumbling window, simply because they want to use state. Not
everyone is as deep into that stuff as you are ;-)
On Sun, Jun 28, 2015 at 1:13 AM, Mat
Ufuk Celebi created FLINK-2288:
--
Summary: Setup ZooKeeper for distributed coordination
Key: FLINK-2288
URL: https://issues.apache.org/jira/browse/FLINK-2288
Project: Flink
Issue Type: Sub-task
Ufuk Celebi created FLINK-2287:
--
Summary: Implement JobManager high availability
Key: FLINK-2287
URL: https://issues.apache.org/jira/browse/FLINK-2287
Project: Flink
Issue Type: Improvement
Márton Balassi created FLINK-2286:
-
Summary: Window ParallelMerge sometimes swallows elements of the
last window
Key: FLINK-2286
URL: https://issues.apache.org/jira/browse/FLINK-2286
Project: Flink
Márton Balassi created FLINK-2285:
-
Summary: Active policy emits elements of the last window twice
Key: FLINK-2285
URL: https://issues.apache.org/jira/browse/FLINK-2285
Project: Flink
Issue T
Stephan Ewen created FLINK-2284:
---
Summary: Confusing/inconsistent PartitioningStrategy
Key: FLINK-2284
URL: https://issues.apache.org/jira/browse/FLINK-2284
Project: Flink
Issue Type: Bug
You can measure the time of each iteration in the open() methods of operators
within an iteration; open() will be called before each iteration.
The times can be collected by either printing to stdout (you need to
collect the files then...) or by implementing a list accumulator. Each time
should inclu
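A rough sketch of the stdout variant (the class name is made up and the
per-element work is a placeholder):

import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.configuration.Configuration;

// open() runs at the start of every superstep when this function is used
// inside an iteration, so logging a timestamp here yields per-iteration timings.
public class TimedMapper extends RichMapFunction<Long, Long> {

    @Override
    public void open(Configuration parameters) {
        long now = System.currentTimeMillis();
        System.out.println("Subtask " + getRuntimeContext().getIndexOfThisSubtask()
                + " entering superstep " + getIterationRuntimeContext().getSuperstepNumber()
                + " at " + now);
    }

    @Override
    public Long map(Long value) {
        return value; // the actual per-element work goes here
    }
}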
Thank you all for the help, it is appreciated!
2015-06-29 9:12 GMT+01:00 Stephan Ewen :
> Hi Nuno!
>
> Ultimately, the delay is a property of the ExecutionGraph. The
> ExecutionGraph is the data structure on the master (JobManager) that tracks
> the distributed execution.
>
> The property is set
Hi Nuno!
Ultimately, the delay is a property of the ExecutionGraph. The
ExecutionGraph is the data structure on the master (JobManager) that tracks
the distributed execution.
The property is set in the method "JobManager.submitJob(...)", which is
called when a new job is sent to the cluster.
Hop
Why don't you use the Flink DataSet output functions (like writeAsText,
writeAsCsv, etc.)?
Or, if they are not sufficient, you can implement/override your own
OutputFormat. For instance:
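(Paths and the sample data below are placeholders.)

import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;

public class WriteResults {

    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        DataSet<Tuple2<Long, Double>> results = env.fromElements(
                new Tuple2<Long, Double>(1L, 0.5),
                new Tuple2<Long, Double>(2L, 0.25));

        results.writeAsCsv("file:///tmp/results.csv");   // one line per tuple
        results.writeAsText("file:///tmp/results.txt");  // toString() of each element

        env.execute("write results");
    }
}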
From my experience, static variables are evil in distributed
environments.
Moreover, one of the main strengths of Flink ar
Done
On Mon, Jun 29, 2015 at 9:33 AM, Chiwan Park wrote:
> We should assign FLINK-2066 to Nuno. :)
>
> Regards,
> Chiwan Park
>
> > On Jun 29, 2015, at 1:21 PM, Márton Balassi
> wrote:
> >
> > Hey,
> >
> > Thanks for picking up the issue. This value can be specified as
> > "execution-retries.de
Hi Chiwan,
when you use the single element predict operation, you always have to
implement the `getModel` method. There you have access to the resulting
parameters and even to the instance to which the `PredictOperation`
belongs. Within this `getModel` method you can initialize all the
informat
We should assign FLINK-2066 to Nuno. :)
Regards,
Chiwan Park
> On Jun 29, 2015, at 1:21 PM, Márton Balassi wrote:
>
> Hey,
>
> Thanks for picking up the issue. This value can be specified as
> "execution-retries.delay" in the flink-conf.yaml. Hence you can check the
> associated value in the C