Re: Flink, Kappa and Lambda

2015-11-11 Thread Nick Dimiduk
The first and 3rd points here aren't very fair -- they apply to all data systems. Systems downstream of your database can lose data in the same way; the database retention policy expires old data, downstream fails, and back to the tapes you must go. Likewise with 3, a bug in any ETL system can caus

Re: Flink, Kappa and Lambda

2015-11-11 Thread rss rss
Hello, regarding the Lambda architecture there is a following book - https://www.manning.com/books/big-data (Big Data. Principles and best practices of scalable realtime data systems Nathan Marz and James Warren). Regards, Roman 2015-11-12 4:47 GMT+03:00 Welly Tambunan : > Hi Stephan, > > >

Re: Flink, Kappa and Lambda

2015-11-11 Thread Welly Tambunan
Hi Stephan, Thanks for your response. We are trying to justify whether it's enough to use Kappa Architecture with Flink. This more about resiliency and message lost issue etc. The article is worry about message lost even if you are using Kafka. No matter the message queue or broker you rely o

Re: Apache Flink Operator State as Query Cache

2015-11-11 Thread Welly Tambunan
Hi Stephan, >Storing the model in OperatorState is a good idea, if you can. On the roadmap is to migrate the operator state to managed memory as well, so that should take care of the GC issues. Is this using off the heap memory ? Which version we expect this one to be available ? Another question

Error handling

2015-11-11 Thread Nick Dimiduk
Heya, I don't see a section in the online manual dedicated to this topic, so I want to raise the question here: How should errors be handled? Specifically I'm thinking about streaming jobs, which are expected to "never go down". For example, errors can be raised at the point where objects are seri

Accumulators/Metrics

2015-11-11 Thread Nick Dimiduk
Hello, I'm interested in exposing metrics from my UDFs. I see FLINK-1501 exposes task manager metrics via a UI; it would be nice to plug into the same MetricRegistry to register my own (ie, gauges). I don't see this exposed via runtime context. This did lead me to discovering the Accumulators API.

Re: Implementing samza table/stream join

2015-11-11 Thread Nick Dimiduk
Yes, I observed the RC votes underway. I did wire up 0.10 dependencies a couple days back and saw there were API changes. I will continue to work toward stabilizing my prototype before moving to the new API, hopefully timing will coincide with your release. Thanks again for being such a communicat

RE: Cluster installation gives java.lang.NoClassDefFoundError for everything - solved

2015-11-11 Thread Camelia Elena Ciolac
Hello, Thank you very much for your advices, Stephan, Robert and Maximilian. Upgrading to Java 1.7 on the cluster solved the problem indeed. java version "1.7.0_75" OpenJDK Runtime Environment (rhel-2.5.4.0.el6_6-x86_64 u75-b13) OpenJDK 64-Bit Server VM (build 24.75-b04, mixed mode) Best regard

Re: Cluster installation gives java.lang.NoClassDefFoundError for everything

2015-11-11 Thread Robert Metzger
Is "jar tf /users/camelia/thecluster/flink-0.9.1/lib/flink-dist-0.9.1.jar" listing for example the "org/apache/flink/client/CliFrontend" class? On Wed, Nov 11, 2015 at 12:09 PM, Maximilian Michels wrote: > Hi Camelia, > > Flink 0.9.X supports Java 6. So this can't be the issue. > > Out of curios

Re: finite subset of an infinite data stream

2015-11-11 Thread Robert Metzger
I think what you call "union" is a "connected stream" in Flink. Have a look at this example: https://gist.github.com/fhueske/4ea5422edb5820915fa4 It shows how to dynamically update a list of filters by external requests. Maybe that's what you are looking for? On Wed, Nov 11, 2015 at 12:15 PM, St

Re: finite subset of an infinite data stream

2015-11-11 Thread Stephan Ewen
Hi! I don not really understand what exactly you want to do, especially the "union an infinite real time data stream with filtered persistent data where the condition of filtering is provided by external requests". If you want to work on substreams in general, there are two options: 1) Create th

Re: Cluster installation gives java.lang.NoClassDefFoundError for everything

2015-11-11 Thread Maximilian Michels
Hi Camelia, Flink 0.9.X supports Java 6. So this can't be the issue. Out of curiosity, I gave it a spin on a Linux machine with OpenJDK 6. I was able to start the command-line interface, job manager and task managers. java version "1.6.0_36" OpenJDK Runtime Environment (IcedTea6 1.13.8) (6b36-1.

Re: Checkpoints in batch processing & JDBC Output Format

2015-11-11 Thread Stephan Ewen
Hi! You can use both the DataSet API or the DataStream API for that. In case of failures, they would behave slightly differently. DataSet: Fault tolerance for the DataSet API works by restarting the job and redoing all of the work. In some sense, that is similar to what happens in MapReduce, onl

Re: Apache Flink Operator State as Query Cache

2015-11-11 Thread Stephan Ewen
Hi! In general, if you can keep state in Flink, you get better throughput/latency/consistency and have one less system to worry about (external k/v store). State outside means that the Flink processes can be slimmer and need fewer resources and as such recover a bit faster. There are use cases for

RE: Cluster installation gives java.lang.NoClassDefFoundError for everything - detailed debugging

2015-11-11 Thread Camelia Elena Ciolac
Hello, As promised, I come back with debugging details. So: *** In start-cluster.sh , the following echo's echo "start-cluster ~" echo $HOSTLIST echo "start-cluster ~" echo $FLINK_BIN_DIR echo "start-cluster ~~~

Re: Flink, Kappa and Lambda

2015-11-11 Thread Stephan Ewen
Hi! Can you explain a little more what you want to achieve? Maybe then we can give a few more comments... I briefly read through some of the articles you linked, but did not quite understand their train of thoughts. For example, letting Tomcat write to Cassandra directly, and to Kafka, might just

Re: Implementing samza table/stream join

2015-11-11 Thread Robert Metzger
In Flink 0.9.1 keyBy is called "groupBy()". We've reworked the DataStream API between 0.9 and 0.10, that's why we had to rename the method. On Wed, Nov 11, 2015 at 9:37 AM, Stephan Ewen wrote: > I would encourage you to use the 0.10 version of Flink. Streaming has made > some major improvements

Re: Implementing samza table/stream join

2015-11-11 Thread Stephan Ewen
I would encourage you to use the 0.10 version of Flink. Streaming has made some major improvements there. The release is voted on now, you can refer to these repositories for the release candidate code: http://people.apache.org/~mxm/flink-0.10.0-rc8/ https://repository.apache.org/content/reposit

Re: Cluster installation gives java.lang.NoClassDefFoundError for everything

2015-11-11 Thread Stephan Ewen
Hi! Flink requires at least Java 1.7, so one of the reasons could also be that the classes are rejected for an incompatible version (class format 1.7, JVM does not understand it since it is only version 1.6). That could explain things... Greetings, Stephan On Wed, Nov 11, 2015 at 9:01 AM, Came

RE: Cluster installation gives java.lang.NoClassDefFoundError for everything

2015-11-11 Thread Camelia Elena Ciolac
Good morning, Thank you Stephan! I keep on testing and in the meantime I'm wondering if the Java version on the cluster may be part of the issue: java version "1.6.0_36" OpenJDK Runtime Environment (IcedTea6 1.13.8) (rhel-1.13.8.1.el6_7-x86_64) OpenJDK 64-Bit Server VM (build 23.25-b01, mixed mo