Re: "Read size does not match expected size" error when using HyperLogLog

2016-04-14 Thread Hironori Ogibayashi
Aljoscha, Thanks for the fix. I tried recent master branch and it worked! Hironori 2016-04-14 23:03 GMT+09:00 Aljoscha Krettek : > Hi, > I'm afraid you found a bug. I created a Jira Issue: > https://issues.apache.org/jira/browse/FLINK-3760. I already have a fix and > hope we'll get it in the 1.0

Task Slots and Heterogeneous Tasks

2016-04-14 Thread Maxim
I'm trying to understand Flink's behavior in the case of heterogeneous operations. For example, in our pipelines some operations might accumulate large windows while others perform high-latency calls to external services. Obviously the former needs a task slot with a large memory allocation, while the

Re: Flink performance pre-packaged vs. self-compiled

2016-04-14 Thread Robert Schmidtke
You're obviously right, the configs were different. In the downloaded version I had set off heap memory to true, whereas in the version I compiled myself this one-time change to flink-conf.yaml was overwritten by recompiling. I have fixed it now and performance is the same. For the record, I had 3

Accessing StateBackend snapshots outside of Flink

2016-04-14 Thread igor.berman
Hi, we are evaluating Flink for a new solution, and several people raised concerns about coupling too tightly to Flink: 1. we understand that if we want full fault tolerance and the best performance we'll need to use Flink managed state (probably the RocksDB backend due to the volume of state) 2. but then if we

WindowOperator initial watermark

2016-04-14 Thread Michael Radford
This is a really minor issue, but it confused me during testing. The WindowOperator initial watermark is -1: https://github.com/apache/flink/blob/master/flink-streaming-java/src/main/java/org/apache/flink/streaming/runtime/operators/windowing/WindowOperator.java#L136 Whereas TimestampsAndPunctua

Re: printing datastream to the log - instead of stdout

2016-04-14 Thread Christian Kreutzfeldt
Hi Chen, I haven't tried the following code out, but it should be as easy as: stream.addSink(new SinkFunction<String>() { private final Logger LOG = LogManager.getLogger("loggerName"); public void invoke(String value) throws Exception { LOG.info(value); } }); It's Java, but I think it suffices to get the b

Re: Monitoring and alerting mechanisms for Flink on YARN

2016-04-14 Thread Christian Kreutzfeldt
Hi Soumya, we are using a StatsD / Graphite setup to extract metrics from our running Flink applications. At least for alerting and monitoring based on time series it works perfectly well. Just take a look at https://github.com/tim-group/java-statsd-client which is widely deployed in our source co

Re: Flink performance pre-packaged vs. self-compiled

2016-04-14 Thread Ovidiu-Cristian MARCU
Hi, Your assumption about the TeraSort use case for eastcirclek's implementation may be incorrect. How many times did you run your program? It would be helpful to give more details about your experiment, in terms of configuration and dataset size. Best, Ovidiu > On 14 Apr 2016, at 17:14, Rob

Re: streaming join implementation

2016-04-14 Thread Andrew Coates
Extending on what Henry is asking... What if data can be more than a day late, or, in a more streaming nature, what if updates can come through for previous values? This would obviously involve storing a great deal of state. The use case I'm thinking of has very large volumes per day. So an extern

printing datastream to the log - instead of stdout

2016-04-14 Thread Chen Bekor
hi, the .print() method will print a dataset / datastream to the stdout. how can I print the stream to the standard logger (logback/log4j)? I'm using flink scala - so scala example code is much appreciated. p.s - I noticed that there's a PrintFunction that I can implement but it feels like I'm

Re: streaming join implementation

2016-04-14 Thread Henry Cai
Cogroup is nice, thanks. But if I define a tumbling window of one day, does that mean Flink needs to cache all the data for one day in memory? I have about 5TB of data coming in per day. About 50% of the records will find a matching record (the other 50% won't). On Thu, Apr 14, 2016 at 9:05 AM, A

Re: streaming join implementation

2016-04-14 Thread Aljoscha Krettek
Hi, right now, Flink does not give you a way to get the records that were not joined in a join. You can, however, use a co-group operation instead of a join to figure out which records did not join with records from the other side and treat them separately. Let me show an example: val input1
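The following is not part of the thread, and it is not the Scala example that is cut off in the listing above. It is only a minimal sketch of the co-group idea the message describes, written with plain Java collections rather than the Flink API: both inputs are grouped by key, and a key whose group is empty on one side had no join partner. The class `CoGroupSketch`, its methods, and the sample data are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration of co-group semantics; this does not use Flink.
class CoGroupSketch {

    /** Values from both inputs for one key; an empty list means that side had no record. */
    static class Group {
        final List<String> left = new ArrayList<>();
        final List<String> right = new ArrayList<>();
    }

    /** Groups both inputs by key (element 0 of each pair), keeping both sides per key. */
    static Map<String, Group> coGroup(List<String[]> left, List<String[]> right) {
        Map<String, Group> groups = new TreeMap<>();
        for (String[] kv : left) {
            groups.computeIfAbsent(kv[0], k -> new Group()).left.add(kv[1]);
        }
        for (String[] kv : right) {
            groups.computeIfAbsent(kv[0], k -> new Group()).right.add(kv[1]);
        }
        return groups;
    }

    /** Keys that appear on only one side, i.e. records an inner join would drop. */
    static List<String> unmatchedKeys(Map<String, Group> groups) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Group> e : groups.entrySet()) {
            Group g = e.getValue();
            if (g.left.isEmpty() || g.right.isEmpty()) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String[]> left = Arrays.asList(new String[]{"a", "1"}, new String[]{"b", "2"});
        List<String[]> right = Arrays.asList(new String[]{"a", "x"}, new String[]{"c", "y"});
        // "a" has records on both sides; "b" and "c" each appear on one side only.
        System.out.println(unmatchedKeys(coGroup(left, right))); // prints [b, c]
    }
}
```

In a real Flink job the grouping would presumably happen per window inside the co-group operator, so the amount of buffered state depends on the window size, which is the concern raised in the follow-up messages of this thread.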

Re: Flink event processing immediate feedback

2016-04-14 Thread Igor Berman
Yes, indeed, this is the direction we are currently trying. Thanks. On 14 April 2016 at 18:31, Aljoscha Krettek wrote: > Could still be, as I described it by using a message queue to do the > communication between Flink and the front end. > > On Thu, 14 Apr 2016 at 17:30 Igor Berman wrote: > >> Hi A

Re: Flink event processing immediate feedback

2016-04-14 Thread Aljoscha Krettek
Could still be, as I described it by using a message queue to do the communication between Flink and the front end. On Thu, 14 Apr 2016 at 17:30 Igor Berman wrote: > Hi Aljoscha, > thanks for the response > > Synchronous - in our case means that request by end-client to frontend(say > some REST

Re: Flink event processing immediate feedback

2016-04-14 Thread Igor Berman
Hi Aljoscha, thanks for the response. Synchronous, in our case, means that a request by the end-client to the frontend (say some REST call) needs to wait until processing in the backend (Flink) is done and should return a response (e.g. an alert) back to the end-client (i.e. end-client -> frontend -> kafka -> flink) those req

Re: Flink performance pre-packaged vs. self-compiled

2016-04-14 Thread Robert Schmidtke
I have tried multiple Maven and Scala versions, but to no avail. I can't seem to achieve the performance of the downloaded archive. I am stumped by this and will need to do more experiments when I have more time. Robert On Thu, Apr 14, 2016 at 1:13 PM, Robert Schmidtke wrote: > Hi Robert, > > thank

Re: How to perform this join operation?

2016-04-14 Thread Elias Levy
Anyone from Data Artisans have some idea of how to go about this? On Wed, Apr 13, 2016 at 5:32 PM, Maxim wrote: > You could simulate the Samza approach by having a RichFlatMapFunction over > cogrouped streams that maintains the sliding window in its ListState. As I > understand the drawback is t

Re: "Read size does not match expected size" error when using HyperLogLog

2016-04-14 Thread Aljoscha Krettek
Hi, I'm afraid you found a bug. I created a Jira Issue: https://issues.apache.org/jira/browse/FLINK-3760. I already have a fix and hope we'll get it in the 1.0.2 release that we are just about to release. Cheers, Aljoscha On Wed, 13 Apr 2016 at 07:25 Hironori Ogibayashi wrote: > Hello, > > I am

Re: Flink event processing immediate feedback

2016-04-14 Thread Aljoscha Krettek
Hi, what do you mean by "synchronous"? If I understood it correctly, some events entering the Flink pipeline would trigger an alert while others would not. How would the component that receives such alerts know when to wait and when not to wait? As I see it you can pus

Re: Flink performance pre-packaged vs. self-compiled

2016-04-14 Thread Robert Schmidtke
Hi Robert, thanks for the hint! Looks like something I could have figured out myself -.-" I'll let you know if I find something. Robert On Thu, Apr 14, 2016 at 1:06 PM, Robert Metzger wrote: > Hi Robert, > > check out the tools/create_release_files.sh file in the source tree. There > you can s

Re: Flink performance pre-packaged vs. self-compiled

2016-04-14 Thread Robert Metzger
Hi Robert, check out the tools/create_release_files.sh file in the source tree. There you can see how we are building the release binaries. It would be quite interesting to find out what caused the performance difference. On Wed, Apr 13, 2016 at 5:03 PM, Robert Schmidtke wrote: > Hi everyone, >