Jamie, Suneel, thanks a lot; your replies have been very helpful.
I will definitely take a look at XMLInputFormat.
In any case, the files are not very big: on average 100–200 kB, up to a max of
a couple of MB.
On 8 June 2016 at 04:23, Suneel Marthi wrote:
> You can use Mahout XMLInputFormat with…
What is the recommended practice for using a dedicated ExecutionContext
inside Flink code?
We are making some external network calls using Futures. Currently, all of
them run on the global execution context (import
scala.concurrent.ExecutionContext.Implicits.global).
Thanks
-Soumya
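One common pattern (a sketch only; the class name, pool size, and callExternalService below are illustrative, not from this thread) is to give each operator instance its own pool, created in open() and torn down in close(), instead of sharing the global context:

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    import org.apache.flink.api.common.functions.RichFlatMapFunction;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.util.Collector;

    // Sketch: a dedicated thread pool per operator instance for external calls.
    public class ExternalCallFunction extends RichFlatMapFunction<String, String> {

        private transient ExecutorService pool;

        @Override
        public void open(Configuration parameters) {
            pool = Executors.newFixedThreadPool(4); // sized per task instance, not per JVM
        }

        @Override
        public void flatMap(String value, Collector<String> out) throws Exception {
            // Run the network call on the dedicated pool; block for the result,
            // since the Collector must not be used from other threads.
            String result = pool.submit(() -> callExternalService(value)).get();
            out.collect(result);
        }

        @Override
        public void close() {
            if (pool != null) {
                pool.shutdown();
            }
        }

        private String callExternalService(String value) {
            return value; // placeholder for the real request
        }
    }

In Scala, the same pool can back a dedicated context via ExecutionContext.fromExecutorService(pool).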
Hi,
When the web monitor restarts, the uploaded jars disappear; in fact, the
upload directory is different every time it restarts.
This seems intentional:
https://github.com/apache/flink/blob/master/flink-runtime-web/src/main/java/org/apache/flink/runtime/webmonitor/WebRuntimeMonitor.java#L162
You can use the Mahout XMLInputFormat with Flink's HadoopInputFormat
definitions. See:
http://stackoverflow.com/questions/29429428/xmlinputformat-for-apache-flink
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Read-XML-from-HDFS-td7023.html
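For illustration, a minimal sketch of that wiring, assuming Mahout's XmlInputFormat (its package has moved between Mahout versions; org.apache.mahout.text.wikipedia is used here) and its xmlinput.start/xmlinput.end configuration keys; the <trk> tags and the S3 path are made up:

    import org.apache.flink.api.java.DataSet;
    import org.apache.flink.api.java.ExecutionEnvironment;
    import org.apache.flink.api.java.hadoop.mapreduce.HadoopInputFormat;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.mahout.text.wikipedia.XmlInputFormat;

    public class GpxRead {
        public static void main(String[] args) throws Exception {
            ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

            Job job = Job.getInstance();
            // XmlInputFormat emits everything between the start and end tag as one record.
            job.getConfiguration().set("xmlinput.start", "<trk>");
            job.getConfiguration().set("xmlinput.end", "</trk>");
            FileInputFormat.addInputPath(job, new Path("s3://my-bucket/gpx/"));

            DataSet<Tuple2<LongWritable, Text>> records = env.createInput(
                    new HadoopInputFormat<>(new XmlInputFormat(), LongWritable.class, Text.class, job));

            records.print();
        }
    }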
On Tue, Jun 7, 2016 at 10:11 PM, Jamie Grier wrote:
Hi Andrea,
How large are these data files? The implementation you've mentioned here
is only usable if they are very small. If so, you're fine. If not, read
on...
Processing XML input files in parallel is tricky. It's not a great format
for this type of processing, as you've seen. They are tricky…
Chesnay: Just want to thank you. I might have one or two related questions later
on, but for now, just thanks.
On Tuesday, June 7, 2016 8:18 AM, Greg Hogan wrote:
"The question is how to encapsulate numerous transformations into one object
or may be a function in Apache Flink Java setting
I'm assuming what you're trying to do is essentially sum over two different
fields of your data. I would do this with my own ReduceFunction.
stream
    .keyBy("someKey")
    .reduce(new CustomReduceFunction()) // sum whatever fields you want and return the result
I think it does make sense that Flink could…
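For illustration, one possible CustomReduceFunction, assuming records are Tuple3<String, Long, Long> with the key in field 0 (for tuples you would key by position, e.g. keyBy(0), rather than by name):

    import org.apache.flink.api.common.functions.ReduceFunction;
    import org.apache.flink.api.java.tuple.Tuple3;

    // Sums fields 1 and 2 pairwise; reduce runs per key after keyBy, so the
    // key field can simply be carried over.
    public class CustomReduceFunction implements ReduceFunction<Tuple3<String, Long, Long>> {
        @Override
        public Tuple3<String, Long, Long> reduce(Tuple3<String, Long, Long> a,
                                                 Tuple3<String, Long, Long> b) {
            return new Tuple3<>(a.f0, a.f1 + b.f1, a.f2 + b.f2);
        }
    }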
You can handle this in multiple ways. If there is a natural timestamp in
StreamB, you can use it directly by doing this:
streamB
.assignTimestamps(...) // your assigner
.connect(streamA)
.flatMap(...) // your CoFlatMapFunction
.timeWindow(...)
.whatever()
Here the event…
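As a sketch of the assigner part, assuming a hypothetical EventB type with an epoch-millisecond timestamp field, and using the periodic-watermark variant (assignTimestampsAndWatermarks) with one second of allowed lateness:

    import org.apache.flink.streaming.api.functions.AssignerWithPeriodicWatermarks;
    import org.apache.flink.streaming.api.watermark.Watermark;

    // EventB is hypothetical; any type carrying a millisecond timestamp works.
    public class EventBTimestampAssigner implements AssignerWithPeriodicWatermarks<EventB> {

        private long maxSeenTimestamp = Long.MIN_VALUE;

        @Override
        public long extractTimestamp(EventB event, long previousElementTimestamp) {
            maxSeenTimestamp = Math.max(maxSeenTimestamp, event.timestamp);
            return event.timestamp;
        }

        @Override
        public Watermark getCurrentWatermark() {
            // Trail the largest timestamp seen so far by one second of lateness.
            return maxSeenTimestamp == Long.MIN_VALUE
                    ? null
                    : new Watermark(maxSeenTimestamp - 1000);
        }
    }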
Suggestions in-line below...
On Mon, Jun 6, 2016 at 7:26 PM, Yukun Guo wrote:
> Hi,
>
> I'm working on a project which uses Flink to compute hourly log statistics
> like top-K. The logs are fetched from Kafka by a FlinkKafkaConsumer and
> packed into a DataStream.
>
> The problem is, I find…
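As a sketch of the hourly counting step that a top-K computation would build on (the key extraction is hypothetical; logs is the DataStream<String> coming from Kafka):

    import org.apache.flink.api.common.functions.MapFunction;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.windowing.time.Time;

    // Count log lines per key in one-hour windows.
    DataStream<Tuple2<String, Long>> hourlyCounts = logs
            .map(new MapFunction<String, Tuple2<String, Long>>() {
                @Override
                public Tuple2<String, Long> map(String log) {
                    return new Tuple2<>(extractKey(log), 1L); // extractKey is hypothetical
                }
            })
            .keyBy(0)
            .timeWindow(Time.hours(1))
            .sum(1);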
I have a question about event timestamps after a flatMap transformation; I
am using the event-time characteristic. I have two streams entering a
CoFlatMap. Stream A simply updates state in that CoFlatMap and does not
output any events. Stream B inserts events of type B, which then output a
thi…
If you use OpenVPN to access AWS, then you can use the private IP of the EC2
machine from your laptop.
Thanks
Ashutosh
On Tue, Jun 7, 2016 at 11:00 PM, Shannon Carey wrote:
> We're also starting to look at automating job deployment/start to Flink
> running on EMR. There are a few options:
>
> - …
Thanks for the clarification.
On Tue, Jun 7, 2016 at 9:15 PM, Aljoscha Krettek wrote:
> Hi,
> I'm afraid you're running into a bug in the special processing-time
> window operator. A suggested workaround would be to switch to the
> IngestionTime characteristic and use TumblingEventTimeWindows.
Hi,
I'm afraid you're running into a bug in the special processing-time
window operator. A suggested workaround would be to switch to the
IngestionTime characteristic and use TumblingEventTimeWindows.
I also opened a Jira issue for the bug so that we can keep track of it:
https://issues.apache.org/jir
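The workaround, as a sketch (the key and field names are illustrative):

    import org.apache.flink.streaming.api.TimeCharacteristic;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
    import org.apache.flink.streaming.api.windowing.time.Time;

    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    // Each record gets its ingestion time as the event timestamp, so the
    // event-time window below behaves much like a processing-time window.
    env.setStreamTimeCharacteristic(TimeCharacteristic.IngestionTime);

    stream
            .keyBy("someKey")
            .window(TumblingEventTimeWindows.of(Time.seconds(60)))
            .sum("someField");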
On Tue, Jun 7, 2016 at 4:52 AM, Stephan Ewen wrote:
> The concern you raised about the sink being synchronous is exactly what my
> last suggestion should address:
>
> The internal state backends can return a handle that can do the sync in a
> background thread. The sink would continue processing…
Ah, sorry, you are right. You could also call keyBy again before the
second sum, but maybe someone else has a better idea.
Best,
Gábor
2016-06-07 16:18 GMT+02:00 Al-Isawi Rami :
> Thanks Gábor, but the first sum call will return
>
> SingleOutputStreamOperator
>
> I could not do another sum call on that…
Hi all,
I am evaluating Apache Flink for processing large sets of Geospatial data.
The use case I am working on will involve reading a certain number of GPX
files stored on Amazon S3.
GPX files are actually XML files and therefore cannot be read on a
line-by-line basis.
One GPX file will produce…
Thanks Gábor, but the first sum call will return
SingleOutputStreamOperator
I could not do another sum call on that. Would you tell me how you managed to do
stream.sum().sum()?
Regards,
-Rami
On 7 Jun 2016, at 16:13, Gábor Gévay <gga...@gmail.com> wrote:
Hello,
In the case of "sum", yo
After a second look at KryoSerializer, I fear that Input and Output are
never closed. Am I right?
On Tue, Jun 7, 2016 at 3:06 PM, Flavio Pompermaier wrote:
> Hi Aljoscha,
> of course I can :)
> Thanks for helping me. Do you think it is the right thing to do to call
> reset()?
> Actually, I don't…
Hello,
In the case of "sum", you can just specify them one after the other, like:
stream.sum(1).sum(2)
This works because the sums over the two fields are independent of
each other. However, in the case of "keyBy", the information from both
fields is needed at the same time to produce the key.
Best,
Gábor
2016…
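To spell out the difference on a tuple stream (the fields are illustrative):

    // Two independent running sums, applied one after the other:
    stream.sum(1).sum(2);

    // A composite key needs both fields at once, so keyBy takes them together:
    stream.keyBy(0, 1).sum(2);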
Hi Aljoscha,
of course I can :)
Thanks for helping me. Do you think it is the right thing to do to call
reset()?
Actually, I don't know whether this is meaningful or not, but I already ran
the job successfully once on the cluster (a second attempt is currently
running) after my accidental modification…
The problem is: why is the window end time in the future?
For example, if my window size is 60 seconds and my window is being evaluated
at 3:00 pm, then why is the window end time 3:01 pm and not 3:00 pm, even when
the data being evaluated falls in the window 2:59–3:00?
Sent from my iPhone
That's nice. Can you try it on your cluster with an added "reset" call on
the buffer?
On Tue, 7 Jun 2016 at 14:35 Flavio Pompermaier wrote:
> After "some" digging into this problem I'm quite convinced that the
> problem is caused by a missing reset of the buffer during the Kryo
> deserialization…
Hi,
Is there any reason why "keyBy" accepts multiple fields while, for example,
"sum" does not?
-Rami
Disclaimer: This message and any attachments thereto are intended solely for
the addressed recipient(s) and may contain confidential information. If you are
not the intended recipient, please notify…
After "some" digging into this problem I'm quite convinced that the problem
is caused by a missing reset of the buffer during the Kryo deserialization,
likewise to what has been fixed by FLINK-2800 (
https://github.com/apache/flink/pull/1308/files).
That fix added an output.clear() in theKryoExcept
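The pattern of that fix, sketched here from the description above rather than copied from the Flink source:

    import java.io.IOException;

    import com.esotericsoftware.kryo.Kryo;
    import com.esotericsoftware.kryo.KryoException;
    import com.esotericsoftware.kryo.io.Output;

    // If serialization fails midway, clear the Output buffer so the stale
    // bytes are not flushed together with the next record.
    static void serializeRecord(Kryo kryo, Output output, Object record) throws IOException {
        try {
            kryo.writeClassAndObject(output, record);
            output.flush();
        } catch (KryoException e) {
            output.clear();
            throw new IOException("Kryo serialization failed", e);
        }
    }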
"The question is how to encapsulate numerous transformations into one
object or may be a function in Apache Flink Java setting."
Implement CustomUnaryOperation. This can then be applied to a DataSet by
calling `DataSet result = dataSet.runOperation(new MyOperation<>(...));`.
On Mon, Jun 6, 2016 at…
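A sketch of that pattern for a DataSet<String> (the transformations inside createResult are illustrative):

    import org.apache.flink.api.common.functions.FilterFunction;
    import org.apache.flink.api.common.functions.MapFunction;
    import org.apache.flink.api.java.DataSet;
    import org.apache.flink.api.java.operators.CustomUnaryOperation;

    // Bundles several transformations behind one reusable operation.
    public class MyOperation implements CustomUnaryOperation<String, String> {

        private DataSet<String> input;

        @Override
        public void setInput(DataSet<String> inputData) {
            this.input = inputData;
        }

        @Override
        public DataSet<String> createResult() {
            return input
                    .filter(new FilterFunction<String>() {
                        @Override
                        public boolean filter(String s) {
                            return !s.isEmpty();
                        }
                    })
                    .map(new MapFunction<String, String>() {
                        @Override
                        public String map(String s) {
                            return s.toLowerCase();
                        }
                    });
        }
    }

    // Usage: DataSet<String> result = dataSet.runOperation(new MyOperation());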
1a. Ah, yeah, I see how it could work, but I wouldn't count on it in a
cluster. You would (most likely) run the sub-job (calculating pi) only on a
single node.
1b. Different execution environments generally imply different Flink
programs.
2. Sure it does, since it's a normal Flink job. You…
Hi Elias!
The concern you raised about the sink being synchronous is exactly what my
last suggestion should address:
The internal state backends can return a handle that can do the sync in a
background thread. The sink would continue processing messages, and the
checkpoint would only be acknowledged…
Chesnay:
1a. The code actually works; that is the point.
1b. What restricts a Flink program from having several execution environments?
2. I am not sure that your modification allows for parallelism. Does it?
3. This code is a simple example of writing/organizing large and complicated
programs, wh…
Could you state a specific problem?
On 07.06.2016 06:40, Soumya Simanta wrote:
I've a simple program which takes some inputs from a command line
(socket stream) and then aggregates based on the key.
When running this program on my local machine, I see some output that
is counterintuitive to me…
From what I can tell from your code, you are trying to execute a job
within a job. This just doesn't work.
Your main method should look like this:
    public static void main(String[] args) throws Exception {
        double pi = new classPI().compute();
        System.out.println("We estimate Pi to be: " + pi);
    }
On 06.0…
Hi Robert,
Thank you for checking the issue. That INFO message is the only information
the Flink workers log.
I agree with your point of view. It looks like it closes the connections to
all other topics that are not used (idle), although the message is a bit
misleading.
Ref: https://github.com/edenhill/librdkafka/issues/437
Hi Jack,
right now this is not possible except by writing a custom operator. We
are working on support for a time-to-live setting on state; this should
solve your problem.
For writing a custom operator, check out DataStream.transform() and
StreamMap, which is the operator implementation for Map
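A bare-bones sketch of such an operator, modeled on StreamMap; the actual time-to-live bookkeeping is omitted and the names are illustrative:

    import org.apache.flink.streaming.api.operators.AbstractStreamOperator;
    import org.apache.flink.streaming.api.operators.OneInputStreamOperator;
    import org.apache.flink.streaming.api.watermark.Watermark;
    import org.apache.flink.streaming.runtime.streamrecord.StreamRecord;

    // Pass-through skeleton: a real TTL operator would record insertion times
    // in processElement and expire state accordingly.
    public class PassThroughOperator<T> extends AbstractStreamOperator<T>
            implements OneInputStreamOperator<T, T> {

        @Override
        public void processElement(StreamRecord<T> element) throws Exception {
            output.collect(element);
        }

        @Override
        public void processWatermark(Watermark mark) throws Exception {
            output.emitWatermark(mark);
        }
    }

    // Usage: stream.transform("pass-through", stream.getType(), new PassThroughOperator<T>());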
Thanks for your answer Ufuk.
However, I have been reading about KeySelector and I don't completely
understand how it works with my idea.
I am using an algorithm that gives me a score between some different
strings. My idea is: if the score is higher than 0.80, for example, then
those two strings…
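For reference, a KeySelector sees one record at a time, so it cannot compare two strings directly; at best it can map similar strings to the same key. A sketch with a made-up normalization:

    import org.apache.flink.api.java.functions.KeySelector;

    // Records whose normalized form matches end up in the same group.
    public class NormalizedKey implements KeySelector<String, String> {
        @Override
        public String getKey(String value) throws Exception {
            return value.toLowerCase().replaceAll("\\s+", " ").trim();
        }
    }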