Re: Flink parquet read.write performance

2017-08-18 Thread Aljoscha Krettek
Hi, The Sink cannot be chained to the previous two operations because it would have two predecessor operations. Chaining only works if there is one predecessor operation. Data transfer should still be pipelined, but you will see serialisation overhead. What kind of TypeSerializer is used at that boundary? Best

Re: Question about parallelism

2017-08-18 Thread Jerry Peng
I guess my previous question is also asking if the parallelism is set for the operator or "data stream". Is there implied repartitioning when the parallelism changes? On Fri, Aug 18, 2017 at 2:08 PM, Jerry Peng wrote: > Hello all, > > I have a question about parallelism and partitioning in the >

Question about parallelism

2017-08-18 Thread Jerry Peng
Hello all, I have a question about parallelism and partitioning in the DataStreams API. In Flink, a user can set the parallelism of a data source as well as of operators. So when I set the parallelism of a data source, e.g. DataStream text = env.readTextFile(params.get("input")).setParallelism(5) does
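
[Editor's sketch] To make the question concrete: as far as I know, when two consecutive DataStream operators have different parallelism, records are redistributed round-robin (a "rebalance") instead of being forwarded one-to-one. The following is a plain-Java simulation of that idea, not Flink code. The class and method names are hypothetical, and starting every sender at channel 0 is a simplifying assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java simulation of round-robin redistribution; NOT Flink API.
// All names here are hypothetical and exist only for illustration.
class RebalanceSketch {

    /**
     * Sends the records of every upstream subtask round-robin to
     * downstreamParallelism downstream subtasks.
     * Returns one list of received records per downstream subtask.
     */
    static List<List<String>> rebalance(List<List<String>> upstream, int downstreamParallelism) {
        List<List<String>> downstream = new ArrayList<>();
        for (int i = 0; i < downstreamParallelism; i++) {
            downstream.add(new ArrayList<>());
        }
        for (List<String> subtask : upstream) {
            int next = 0; // assumption: each sender starts at channel 0
            for (String record : subtask) {
                downstream.get(next).add(record);
                next = (next + 1) % downstreamParallelism;
            }
        }
        return downstream;
    }

    public static void main(String[] args) {
        // Five source subtasks (as in setParallelism(5)), two records each.
        List<List<String>> source = new ArrayList<>();
        for (int s = 0; s < 5; s++) {
            List<String> records = new ArrayList<>();
            records.add("s" + s + "-r0");
            records.add("s" + s + "-r1");
            source.add(records);
        }
        // A downstream operator with parallelism 2 receives a mix of all sources.
        List<List<String>> out = rebalance(source, 2);
        for (int i = 0; i < out.size(); i++) {
            System.out.println("downstream subtask " + i + ": " + out.get(i));
        }
    }
}
```

In this sketch every downstream subtask ends up with records from all five source subtasks, which is the sense in which a change of parallelism implies repartitioning.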

Re: Efficient grouping and parallelism on skewed data

2017-08-18 Thread Jakes John
Thanks for your reply. I don't have any special aggregation. My only requirement is, for every message in Kafka with a particular id, to write into a corresponding index in Elasticsearch. (I might need to enrich each message before writing into ES, but there are no aggregations on the incoming stream.)

PageRank - 4x slower than Spark?!

2017-08-18 Thread Kaepke, Marc
Hi everyone, I compared Flink and Spark by using PageRank. I expected Flink to beat Spark or at least perform at the same level. But Spark is up to 4x faster than Flink. I hope I made a mistake, so please help me to improve the performance of my cluster and config. The cluster has 4 computers: One JobManager

Re: akka timeout

2017-08-18 Thread Steven Wu
Till/Chesnay, thanks for the answers. Looks like this is a result/symptom of an underlying stability issue that I am trying to track down. It is Flink 1.2. On Fri, Aug 18, 2017 at 12:24 AM, Chesnay Schepler wrote: > The MetricFetcher always uses the default akka timeout value. > > > On 18.08.2017 09:

Re: Flink parquet read.write performance

2017-08-18 Thread Aljoscha Krettek
Hi Billy, Do you also have the data (picture) from the "Timeline" tab of the completed job? This would give some hints about how long that other DataSource (with chain) was active. It might be that the sink is waiting for the other input to become online. Best, Aljoscha > On 18. Aug 2017, at

Measure Job Execution Time

2017-08-18 Thread Paolo Cristofanelli
I would like to measure exactly the job execution time when executing a job in standalone mode. The result provided by the web interface is not accurate enough. I have seen the answer provided at this link https://stackoverflow.com/questions/34243365/measure-job-execution-time-in-flink , but it see
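
[Editor's sketch] One generic way to time a blocking call from the client side is to wrap it with System.nanoTime(). The sketch below is plain Java with hypothetical names (JobTimerSketch, timeMillis); it assumes that in a Flink program the Runnable would contain the blocking env.execute() call. Independently of this sketch, the JobExecutionResult returned by execute() offers getNetRuntime(), which reports the job's net runtime.

```java
import java.util.concurrent.TimeUnit;

// Plain-Java timing helper; NOT part of Flink. Names are hypothetical.
class JobTimerSketch {

    /** Runs the given job and returns the elapsed wall-clock time in milliseconds. */
    static long timeMillis(Runnable job) {
        long start = System.nanoTime(); // monotonic clock, suited to measuring intervals
        job.run();
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    public static void main(String[] args) {
        // Stand-in for a blocking job submission such as env.execute() (assumption).
        long elapsed = timeMillis(() -> {
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        System.out.println("Job took " + elapsed + " ms");
    }
}
```

A wall-clock measurement taken on the client like this also includes submission and scheduling overhead, so it will generally be larger than the net runtime reported by the cluster.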

Re: Slack invitation

2017-08-18 Thread Aljoscha Krettek
Hi Billy, I'm afraid there is no Slack for the Flink community. Best, Aljoscha > On 18. Aug 2017, at 14:55, Newport, Billy wrote: > > Email address bi...@billynewport.com > > Thanks

Re: Kafka 0.11

2017-08-18 Thread Piotr Nowojski
Hi, Yes, the Flink Connector for Kafka 0.10 should work without problems with Kafka 0.11. There is also pending work on a Kafka 0.11 connector that will add support for exactly-once semantics. Piotrek > On Aug 18, 2017, at 5:21 PM, Gabriele Di Bernardo > wrote: > > Hi guys, > > Is the F

Kafka 0.11

2017-08-18 Thread Gabriele Di Bernardo
Hi guys, Is the Flink Connector Kafka 0.10 fully compatible with Kafka 0.11? Thank you in advance. Best, Gabriele

Re: Flink CEP questions

2017-08-18 Thread Basanth Gowda
Thank you very much, Biplob and Dawid. Thanks, Dawid, for those links. That is exactly what I was looking for. On Fri, Aug 18, 2017 at 5:16 AM, Dawid Wysakowicz < wysakowicz.da...@gmail.com> wrote: > Hi Basanth, > > Ad.3 Unfortunately right now, you cannot reset, but there is ongoing work > to i

Re: Great number of jobs and numberOfBuffers

2017-08-18 Thread Nico Kruber
Hi Gwenhael, the effect you describe sounds a bit strange. Just to clarify your setup: 1) Is the loop you were posting part of the application you run on yarn? 2) How many nodes are you running with? 3) What is the error you got when you tried to run the full program without splitting it? 4) can

Slack invitation

2017-08-18 Thread Newport, Billy
Email address bi...@billynewport.com Thanks

Test, plz ignore

2017-08-18 Thread Newport, Billy

Re:Re:Re:Re:Re:Re:How to verify the data to Elasticsearch whether correct or not ?

2017-08-18 Thread mingleizhang
Hello Gordon, Thank you very much. I did remove the redundant code. However, I have to keep the following code because I use Spring Boot to get the data from Elasticsearch; the other code can be deleted. If I do not write the json.put("json", "JsonFormat.printToString(element)"

Re: Avro Serialization and RocksDB Internal State

2017-08-18 Thread Biplob Biswas
Thanks a lot Gordon, that really helps. :) One last thing: is there any way to verify that an object has been serialized with a specific serializer, other than trying to deserialize it with a different deserializer and failing?

Re: Avro Serialization and RocksDB Internal State

2017-08-18 Thread Tzu-Li (Gordon) Tai
Hi Biplob, Yes, your assumptions are correct [1]. To be a bit more exact, the `AvroSerializer` will be used to serialize your POJO data types. That would be the case for data transfers and state serialization (unless for state serialization you specify a custom state serializer; see [2]) If not

Re: Flink CEP questions

2017-08-18 Thread Dawid Wysakowicz
Hi Basanth, Ad.3 Unfortunately, right now you cannot reset, but there is ongoing work to introduce AfterMatchSkipStrategies (https://issues.apache.org/jira/browse/FLINK-7169?filter=12339990). This will allow the behaviour you described with the SKIP_PAST_LAST strategy. Ad.4 If I understand corr

Re: Avro Serialization and RocksDB Internal State

2017-08-18 Thread Biplob Biswas
Can anyone please shed some light on this?

Re: Flink CEP questions

2017-08-18 Thread Biplob Biswas
Hi Basanth, AFAIK, CEP works like a session window: a session is started for each event that comes in and expires at the end of the time limit. Technically the count is kept separately for each event, so there's no reset. For example, if you have 6 events, 1,2,3,4,5,6 (and they arrive in order lik

Re: akka timeout

2017-08-18 Thread Chesnay Schepler
The MetricFetcher always uses the default akka timeout value. On 18.08.2017 09:07, Till Rohrmann wrote: Hi Steven, I thought that the MetricFetcher picks up the right timeout from the configuration. Which version of Flink are you using? The timeout is not a critical problem for the job health

Re: akka timeout

2017-08-18 Thread Till Rohrmann
Hi Steven, I thought that the MetricFetcher picks up the right timeout from the configuration. Which version of Flink are you using? The timeout is not a critical problem for the job health. Cheers, Till On Fri, Aug 18, 2017 at 7:22 AM, Steven Wu wrote: > > We have set akka.ask.timeout to 60