Hi everyone: thanks a lot for all your help.
In my case, there is a data stream and a huge data set. Each element in the
data stream needs to be joined with the huge data set to produce a new data
stream. But it can't use a join method like the shuffle method or the
broadcast method because of
That would also work. I thought about it already, too. Thanks for the
feedback. If two people have a similar idea, it might be the right way to
go. I will just include all this stuff and open a PR. Then we can
evaluate it again.
-Matthias
On 06/30/2015 12:01 AM, Gyula Fóra wrote:
> By declare I m
By declare I mean we assume a Flink Tuple datatype and the user declares
the name mapping (sorry, it's getting late).
On Mon, 29 Jun 2015 at 23:57, Gyula Fóra wrote:
> Ah ok, now I get what I didn't get before :)
>
> So you want to take some input stream , and execute a bolt implementatio
Ah ok, now I get what I didn't get before :)
So you want to take some input stream and execute a bolt implementation
on it. And the question is what input type to assume when the user wants to
use field-name-based access.
Can't we force the user to declare the names of the inputs/outputs even i
Well. If a whole Storm topology is executed, this is of course the way
to go. However, I want to have named-attribute access in the case of an
embedded bolt (as a single operator) in a Flink program. And in this
case, fields are not declared and do not have a name (e.g., if the bolt's
consumers emit
Hey,
I didn't look through the whole code so I probably don't get something, but
why don't you just do what Storm does? Keep a map from the field names to
indexes somewhere (make this accessible from the tuple) and then you can
just use a simple Flink tuple.
I think this is what's happening in stor
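A rough sketch of that idea (the class and method names below are made up for
illustration; Tuple.getField is the only actual Flink API used):

import java.util.HashMap;
import java.util.Map;

import org.apache.flink.api.java.tuple.Tuple;

// Hypothetical helper: resolves declared field names to positions of a plain
// Flink tuple, so named access can be layered on top of index access.
public class NamedTupleAccessor {

    private final Map<String, Integer> fieldIndexes = new HashMap<>();

    public NamedTupleAccessor(String... fieldNames) {
        for (int i = 0; i < fieldNames.length; i++) {
            fieldIndexes.put(fieldNames[i], i);
        }
    }

    // Looks up the declared name and reads the corresponding tuple field.
    public <T> T getByName(Tuple tuple, String fieldName) {
        Integer index = fieldIndexes.get(fieldName);
        if (index == null) {
            throw new IllegalArgumentException("Unknown field: " + fieldName);
        }
        return tuple.getField(index);
    }
}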
From the same series of experiments:
I am basically running an algorithm that simulates a Gather-Sum-Apply
Iteration that performs Triangle Count. (Why simulate it? Because you just
need a single superstep, so using the runGatherSumApply function in Graph
would be useless overhead.)
What happens, at a high le
Andra,
why don't you simply print to standard output and gather your metrics from
the taskmanagers' log files after execution?
Wouldn't that work for you?
-V.
On 29 June 2015 at 22:36, Andra Lungu wrote:
> Caution! I am getting philosophical. Stop me if I'm talking nonsense!
>
> You are sugges
Caution! I am getting philosophical. Stop me if I'm talking nonsense!
You are suggesting a list that will have one or two entries per vertex, i.e.
(approx.) billions of entries. Won't this over-saturate my memory? I am
already filling it with lots of junk resulting from the computation...
On Mon, Jun 29, 2015 at
Hi,
I started to work on a missing feature for the Storm compatibility
layer: named attribute access.
In Storm, each attribute of an input tuple can be accessed via index or
by name. Currently, only index access is supported. In order to support
this feature in Flink (embedded Bolt in Flink progra
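For reference, these are the two access styles Storm offers inside a bolt on
its input tuple (Storm's own Tuple API; the field name "word" is only an
example):

import backtype.storm.tuple.Tuple;

public class StormAccessStyles {

    public void readBothWays(Tuple input) {
        // positional access -- what the compatibility layer supports today
        String byIndex = input.getString(0);

        // named access -- requires the producing component to have declared
        // the field name "word" in its output fields
        String byName = input.getStringByField("word");

        System.out.println(byIndex + " / " + byName);
    }
}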
Hey all!
Just to add something new to the end of the discussion list. After some
discussion with Seif and Paris, I have added a commit that replaces the
use of the Checkpointed interface with field annotations.
This is probably the most lightweight state declaration so far and it will
probably w
Till Rohrmann created FLINK-2291:
Summary: Use ZooKeeper to elect JobManager leader and send
information to TaskManagers
Key: FLINK-2291
URL: https://issues.apache.org/jira/browse/FLINK-2291
Project:
Aljoscha Krettek created FLINK-2290:
---
Summary: CoRecordReader Does Not Read Events From Both Inputs When
No Elements Arrive
Key: FLINK-2290
URL: https://issues.apache.org/jira/browse/FLINK-2290
Proj
Have you tried to use a custom accumulator that just appends to a list?
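Something along these lines could serve as the list accumulator (a sketch
against the Accumulator interface as it looks in recent Flink versions; the
class name is made up and details may differ in older releases):

import java.util.ArrayList;

import org.apache.flink.api.common.accumulators.Accumulator;

// Collects string values into a list; Flink merges the per-subtask lists
// after the job finishes.
public class StringListAccumulator implements Accumulator<String, ArrayList<String>> {

    private ArrayList<String> values = new ArrayList<>();

    @Override
    public void add(String value) {
        values.add(value);
    }

    @Override
    public ArrayList<String> getLocalValue() {
        return values;
    }

    @Override
    public void resetLocal() {
        values.clear();
    }

    @Override
    public void merge(Accumulator<String, ArrayList<String>> other) {
        values.addAll(other.getLocalValue());
    }

    @Override
    public Accumulator<String, ArrayList<String>> clone() {
        StringListAccumulator copy = new StringListAccumulator();
        copy.values = new ArrayList<>(this.values);
        return copy;
    }
}

You would register it in a rich function's open() via
getRuntimeContext().addAccumulator("collected", new StringListAccumulator())
and read the merged list from the JobExecutionResult after execution.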
2015-06-29 12:59 GMT+02:00 Andra Lungu :
> Hey Fabian,
>
> I am aware of the way open, preSuperstep(), postSuperstep() etc can help me
> within an interation, unfortunately I am writing my own method here. I
> could try to br
Hi,
I am working on https://issues.apache.org/jira/browse/FLINK-2111
Stephan and I had a discussion about the name of the signal. See:
https://github.com/apache/flink/pull/750
Because we cannot agree on either "terminate" or "stop", we would
appreciate some feedback about it. If anybody has an th
Hey Fabian,
I am aware of the way open(), preSuperstep(), postSuperstep(), etc. can help me
within an iteration; unfortunately, I am writing my own method here. I
could try to briefly describe it:
public static final class PropagateNeighborValues implements
NeighborsFunctionWithVertexValue(...) {
Hi Chiwan,
at the moment the single-element PredictOperation only supports
non-distributed models. This means that it expects the model to be a
single-element DataSet which can be broadcast to the predict mappers.
If you need more flexibility, you can either extend the PredictOperation
interfac
As you can see from status.apache.org, the host is currently down. I hope
they fix the issue soon.
On Mon, Jun 29, 2015 at 11:50 AM, Sachin Goel
wrote:
> I've been experiencing the same problem.
>
> Regards
> Sachin Goel
>
> On Mon, Jun 29, 2015 at 3:09 PM, Felix Neutatz
> wrote:
>
> > Hi,
> >
>
Hi,
it seems that the Flink documentation is currently not available due to
security issues with the ci.apache.org host.
-- Forwarded message --
From: Tony Stevenson
Date: Mon, Jun 29, 2015 at 11:57 AM
Subject: [ NOTICE ] Services Unavailable
To: committ...@apache.org, infrastru
Thank you, Till.
I have another question. Can I use a DataSet object as the Model? In KNN, we
need the DataSet given in the fit operation.
But when I define the Model generic parameter as DataSet in PredictOperation,
the getModel method's return type is DataSet[DataSet]. I'm confused by this
situation.
If
I've been experiencing the same problem.
Regards
Sachin Goel
On Mon, Jun 29, 2015 at 3:09 PM, Felix Neutatz
wrote:
> Hi,
>
> when I want to access the documentation on the website I get the following
> error:
>
> http://ci.apache.org/projects/flink/flink-docs-release-0.9
>
> Service Unavailable
You are right, one cannot use the current window-join implementation for
this.
A workaround is to implement a custom binary stream operator that waits
until it receives the whole file, then starts joining.
For instance, a filestream.connect(streamToJoinWith).flatMap(
CustomCoFlatMap that does
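A minimal sketch of such a CoFlatMap (class name, element types and the key
handling are made up for illustration, and the imports follow the current
Flink package layout; detecting that the file side is fully consumed would
need an extra end-marker, which is omitted here):

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.functions.co.CoFlatMapFunction;
import org.apache.flink.util.Collector;

// Input 1 is the (finite) file stream, input 2 is the live stream. File
// records are buffered per key; live records are matched against the buffer.
public class BufferingJoin implements
        CoFlatMapFunction<Tuple2<String, Integer>, Tuple2<String, Long>, Tuple2<String, Long>> {

    private final Map<String, List<Integer>> fileSide = new HashMap<>();

    @Override
    public void flatMap1(Tuple2<String, Integer> fileRecord, Collector<Tuple2<String, Long>> out) {
        List<Integer> buffered = fileSide.get(fileRecord.f0);
        if (buffered == null) {
            buffered = new ArrayList<>();
            fileSide.put(fileRecord.f0, buffered);
        }
        buffered.add(fileRecord.f1);
    }

    @Override
    public void flatMap2(Tuple2<String, Long> streamRecord, Collector<Tuple2<String, Long>> out) {
        List<Integer> matches = fileSide.get(streamRecord.f0);
        if (matches != null) {
            for (Integer m : matches) {
                out.collect(new Tuple2<String, Long>(streamRecord.f0, streamRecord.f1 + m));
            }
        }
    }
}

It would be wired up roughly as
fileStream.connect(liveStream).flatMap(new BufferingJoin()).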
I am wondering what the semantics of a DataStream created from a file
are. It should be a regular (but finite) stream. From my understanding, a
window-join is defined with some timestamp constraint. So the static file
part will also have this restriction in the join, right? However, a
file-stream-join shou
Hi,
when I want to access the documentation on the website I get the following
error:
http://ci.apache.org/projects/flink/flink-docs-release-0.9
Service Unavailable
The server is temporarily unable to service your request due to maintenance
downtime or capacity problems. Please try again later.
The second issue is related to parallel time-based aggregations. I think we
should fix this for 0.9.1.
Also, since the fix is, as you said, rather straightforward, there is no harm
in doing it. As I understand it, if we keep the functionality of having
time-based global windows, the implementations for merg
I have found two off-by-one issues in the windowing code.
The first may result in duplicate data in the last window and is easy to
fix. [1]
The second may result in data being swallowed in the last window, and is
also not difficult to fix. [2]
I've talked to Aljoscha about fixing the second one, an
If you only want to "join" a finite data set (like a file) to a stream, you
can do that: you can create a DataStream from a (distributed) file.
If you want specific batch-API operations, this is still on the roadmap and
not in yet, as Marton said.
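A minimal sketch of creating such a stream (the path is a placeholder):

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class FileAsStream {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // a finite stream whose elements are the lines of the file
        DataStream<String> fileStream = env.readTextFile("file:///tmp/static-data.txt");

        fileStream.print();
        env.execute("file as stream");
    }
}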
On Sun, Jun 28, 2015 at 10:45 AM, Márton Balassi
wr
Till Rohrmann created FLINK-2289:
Summary: Make JobManager highly available
Key: FLINK-2289
URL: https://issues.apache.org/jira/browse/FLINK-2289
Project: Flink
Issue Type: Improvement
@Matthias:
I think using the KeyedDataStream will simply result in smaller programs.
It may be hard for some users to make the connection to a
1-element tumbling window, simply because they want to use state. Not
everyone is as deep into that stuff as you are ;-)
On Sun, Jun 28, 2015 at 1:13 AM, Mat
Ufuk Celebi created FLINK-2288:
--
Summary: Setup ZooKeeper for distributed coordination
Key: FLINK-2288
URL: https://issues.apache.org/jira/browse/FLINK-2288
Project: Flink
Issue Type: Sub-task
Ufuk Celebi created FLINK-2287:
--
Summary: Implement JobManager high availability
Key: FLINK-2287
URL: https://issues.apache.org/jira/browse/FLINK-2287
Project: Flink
Issue Type: Improvement
Márton Balassi created FLINK-2286:
-
Summary: Window ParallelMerge sometimes swallows elements of the
last window
Key: FLINK-2286
URL: https://issues.apache.org/jira/browse/FLINK-2286
Project: Flink
Márton Balassi created FLINK-2285:
-
Summary: Active policy emits elements of the last window twice
Key: FLINK-2285
URL: https://issues.apache.org/jira/browse/FLINK-2285
Project: Flink
Issue T
Stephan Ewen created FLINK-2284:
---
Summary: Confusing/inconsistent PartitioningStrategy
Key: FLINK-2284
URL: https://issues.apache.org/jira/browse/FLINK-2284
Project: Flink
Issue Type: Bug
You can measure the time of each iteration in the open() methods of operators
within an iteration; open() will be called before each iteration.
The times can be collected by either printing to stdout (you need to
collect the files then...) or by implementing a list accumulator. Each time
should inclu
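A rough sketch of the stdout variant (the class name is made up and the
per-element work is a placeholder):

import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.configuration.Configuration;

// open() runs at the start of every superstep when this function is used
// inside an iteration, so logging a timestamp here yields per-iteration timings.
public class TimedMapper extends RichMapFunction<Long, Long> {

    @Override
    public void open(Configuration parameters) {
        long now = System.currentTimeMillis();
        System.out.println("Subtask " + getRuntimeContext().getIndexOfThisSubtask()
                + " entering superstep " + getIterationRuntimeContext().getSuperstepNumber()
                + " at " + now);
    }

    @Override
    public Long map(Long value) {
        return value; // the actual per-element work goes here
    }
}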
Thank you all for the help, it is appreciated!
2015-06-29 9:12 GMT+01:00 Stephan Ewen :
> Hi Nuno!
>
> Ultimately, the delay is a property of the ExecutionGraph. The
> ExecutionGraph is the data structure on the master (JobManager) that tracks
> the distributed execution.
>
> The property is set
Hi Nuno!
Ultimately, the delay is a property of the ExecutionGraph. The
ExecutionGraph is the data structure on the master (JobManager) that tracks
the distributed execution.
The property is set in the method "JobManager.submitJob(...)", which is
called when a new job is sent to the cluster.
Hop
Why don't you use the Flink DataSet output functions (like writeAsText,
writeAsCsv, etc.)?
Or, if they are not sufficient, you can implement/override your own
OutputFormat. For instance:
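(Paths and the sample data below are placeholders.)

import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;

public class WriteResults {

    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        DataSet<Tuple2<Long, Double>> results = env.fromElements(
                new Tuple2<Long, Double>(1L, 0.5),
                new Tuple2<Long, Double>(2L, 0.25));

        results.writeAsCsv("file:///tmp/results.csv");   // one line per tuple
        results.writeAsText("file:///tmp/results.txt");  // toString() of each element

        env.execute("write results");
    }
}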
From my experience, static variables are evil in distributed
environments.
Moreover, one of the main strengths of Flink ar
Done
On Mon, Jun 29, 2015 at 9:33 AM, Chiwan Park wrote:
> We should assign FLINK-2066 to Nuno. :)
>
> Regards,
> Chiwan Park
>
> > On Jun 29, 2015, at 1:21 PM, Márton Balassi
> wrote:
> >
> > Hey,
> >
> > Thanks for picking up the issue. This value can be specified as
> > "execution-retries.de
Hi Chiwan,
when you use the single element predict operation, you always have to
implement the `getModel` method. There you have access to the resulting
parameters and even to the instance to which the `PredictOperation`
belongs. Within this `getModel` method you can initialize all the
informat
We should assign FLINK-2066 to Nuno. :)
Regards,
Chiwan Park
> On Jun 29, 2015, at 1:21 PM, Márton Balassi wrote:
>
> Hey,
>
> Thanks for picking up the issue. This value can be specified as
> "execution-retries.delay" in the flink-conf.yaml. Hence you can check the
> associated value in the C