The first and third points here aren't very fair -- they apply to all data
systems. Systems downstream of your database can lose data in the same way:
the database retention policy expires old data, the downstream system fails,
and back to the tapes you must go. Likewise with the third point, a bug in any
ETL system can cause
Hello,
regarding the Lambda architecture, there is the following book:
https://www.manning.com/books/big-data (Big Data: Principles and best
practices of scalable realtime data systems,
by Nathan Marz and James Warren).
Regards,
Roman
2015-11-12 4:47 GMT+03:00 Welly Tambunan :
> Hi Stephan,
Hi Stephan,
Thanks for your response.
We are trying to justify whether it's enough to use the Kappa Architecture with
Flink. This is more about resiliency, message loss issues, etc.
The article worries about message loss even if you are using Kafka.
No matter which message queue or broker you rely on
Hi Stephan,
>Storing the model in OperatorState is a good idea, if you can. On the
roadmap is to migrate the operator state to managed memory as well, so that
should take care of the GC issues.
Is this using off-heap memory? In which version do we expect this to be
available?
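For reference, a rough sketch of what keeping the model in operator state could
look like with the Checkpointed interface from the 0.9/0.10 API line (untested;
MyModel, Event and Score are placeholder types, and MyModel has to be
Serializable):

import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.checkpoint.Checkpointed;

public class ModelScorer extends RichMapFunction<Event, Score>
        implements Checkpointed<MyModel> {

    private MyModel model;

    @Override
    public void open(Configuration parameters) {
        if (model == null) {
            model = new MyModel();   // fresh model on the very first start
        }
    }

    @Override
    public Score map(Event event) {
        model.update(event);         // the model lives inside the operator
        return model.score(event);
    }

    @Override
    public MyModel snapshotState(long checkpointId, long checkpointTimestamp) {
        return model;                // checkpointed together with the job
    }

    @Override
    public void restoreState(MyModel state) {
        model = state;               // handed back after a failure
    }
}
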
Another question
Heya,
I don't see a section in the online manual dedicated to this topic, so I
want to raise the question here: how should errors be handled? Specifically,
I'm thinking about streaming jobs, which are expected to "never go down".
For example, errors can be raised at the point where objects are serialized
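For illustration, one way to keep such an error from taking the whole job down
is to catch it inside the UDF and skip (or divert) the bad record -- a rough,
untested sketch, where parse() and the Event type are placeholders:

import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.util.Collector;

public class SafeParser implements FlatMapFunction<String, Event> {

    @Override
    public void flatMap(String raw, Collector<Event> out) {
        try {
            out.collect(parse(raw));
        } catch (Exception e) {
            // bad record: log it, count it, or send it to a side channel
            // instead of letting the exception kill the streaming job
        }
    }

    private Event parse(String raw) {
        // placeholder for the real deserialization logic
        return new Event(raw);
    }
}
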
Hello,
I'm interested in exposing metrics from my UDFs. I see FLINK-1501 exposes
task manager metrics via a UI; it would be nice to plug into the same
MetricRegistry to register my own (i.e., gauges). I don't see this exposed
via the runtime context. This did lead me to discover the Accumulators API.
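For what it's worth, registering an accumulator from inside a UDF looks roughly
like this (a minimal sketch; the class and accumulator names are made up):

import org.apache.flink.api.common.accumulators.IntCounter;
import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.configuration.Configuration;

public class CountingMapper extends RichMapFunction<String, String> {

    private final IntCounter processed = new IntCounter();

    @Override
    public void open(Configuration parameters) {
        // the name is arbitrary; the value is reported with the job results
        getRuntimeContext().addAccumulator("processed-records", processed);
    }

    @Override
    public String map(String value) {
        processed.add(1);
        return value;
    }
}
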
Yes, I observed the RC votes underway. I did wire up the 0.10 dependencies a
couple of days back and saw there were API changes. I will continue to work
toward stabilizing my prototype before moving to the new API; hopefully the
timing will coincide with your release.
Thanks again for being such a communicat
Hello,
Thank you very much for your advice, Stephan, Robert and Maximilian.
Upgrading to Java 1.7 on the cluster solved the problem indeed.
java version "1.7.0_75"
OpenJDK Runtime Environment (rhel-2.5.4.0.el6_6-x86_64 u75-b13)
OpenJDK 64-Bit Server VM (build 24.75-b04, mixed mode)
Best regards
Is "jar tf /users/camelia/thecluster/flink-0.9.1/lib/flink-dist-0.9.1.jar"
listing for example the "org/apache/flink/client/CliFrontend" class?
On Wed, Nov 11, 2015 at 12:09 PM, Maximilian Michels wrote:
> Hi Camelia,
>
> Flink 0.9.X supports Java 6. So this can't be the issue.
>
> Out of curiosity
I think what you call "union" is a "connected stream" in Flink. Have a look
at this example: https://gist.github.com/fhueske/4ea5422edb5820915fa4
It shows how to dynamically update a list of filters by external requests.
Maybe that's what you are looking for?
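In case it helps, the core of that pattern is a CoFlatMapFunction on a connected
stream, roughly like this (simplified and untested; the types and names are just
placeholders for the gist's logic):

import org.apache.flink.streaming.api.functions.co.CoFlatMapFunction;
import org.apache.flink.util.Collector;

import java.util.HashSet;
import java.util.Set;

// one input carries the data, the other carries filter updates
public class DynamicFilter implements CoFlatMapFunction<String, String, String> {

    private final Set<String> blocked = new HashSet<>();

    @Override
    public void flatMap1(String record, Collector<String> out) {
        // data stream: forward only records that pass the current filters
        if (!blocked.contains(record)) {
            out.collect(record);
        }
    }

    @Override
    public void flatMap2(String filterUpdate, Collector<String> out) {
        // control stream: update the filter set at runtime
        blocked.add(filterUpdate);
    }
}

// usage: dataStream.connect(controlStream).flatMap(new DynamicFilter())
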
On Wed, Nov 11, 2015 at 12:15 PM, St
Hi!
I do not really understand what exactly you want to do, especially the "union
an infinite real time data stream with filtered persistent data where the
condition of filtering is provided by external requests".
If you want to work on substreams in general, there are two options:
1) Create th
Hi Camelia,
Flink 0.9.X supports Java 6. So this can't be the issue.
Out of curiosity, I gave it a spin on a Linux machine with OpenJDK 6. I was
able to start the command-line interface, job manager and task managers.
java version "1.6.0_36"
OpenJDK Runtime Environment (IcedTea6 1.13.8) (6b36-1.
Hi!
You can use either the DataSet API or the DataStream API for that. In case of
failures, they would behave slightly differently.
DataSet:
Fault tolerance for the DataSet API works by restarting the job and redoing
all of the work. In some sense, that is similar to what happens in
MapReduce, only
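As a side note, the number of automatic restarts for such a job can be set on
the environment -- a minimal sketch, assuming setNumberOfExecutionRetries is
available in your version:

import org.apache.flink.api.java.ExecutionEnvironment;

public class RetryExample {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        // restart the whole job a few times on failure before giving up
        env.setNumberOfExecutionRetries(3);
        // ... define the DataSet program here, then env.execute();
    }
}
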
Hi!
In general, if you can keep state in Flink, you get better
throughput/latency/consistency and have one less system to worry about
(external k/v store). State outside means that the Flink processes can be
slimmer and need fewer resources and as such recover a bit faster. There
are use cases for
Hello,
As promised, I come back with debugging details.
So:
*** In start-cluster.sh , the following echo's
echo "start-cluster ~"
echo $HOSTLIST
echo "start-cluster ~"
echo $FLINK_BIN_DIR
echo "start-cluster ~~~
Hi!
Can you explain a little more what you want to achieve? Maybe then we can
give a few more comments...
I briefly read through some of the articles you linked, but did not quite
understand their train of thought.
For example, letting Tomcat write to Cassandra directly, and to Kafka,
might just
In Flink 0.9.1, keyBy() is called "groupBy()". We've reworked the DataStream
API between 0.9 and 0.10; that's why we had to rename the method.
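For instance, on 0.10 a keyed sum looks like the snippet below; on 0.9.1 you
would write groupBy(0) in place of keyBy(0) (a minimal, untested sketch):

import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class KeyByExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        DataStream<Tuple2<String, Integer>> words = env.fromElements(
                new Tuple2<>("a", 1), new Tuple2<>("b", 1), new Tuple2<>("a", 1));

        // 0.10: keyBy(0); on 0.9.1 the same call is groupBy(0)
        words.keyBy(0).sum(1).print();

        env.execute("keyBy example");
    }
}
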
On Wed, Nov 11, 2015 at 9:37 AM, Stephan Ewen wrote:
> I would encourage you to use the 0.10 version of Flink. Streaming has made
> some major improvements
I would encourage you to use the 0.10 version of Flink. Streaming has made
some major improvements there.
The release is being voted on right now; you can refer to these repositories for the
release candidate code:
http://people.apache.org/~mxm/flink-0.10.0-rc8/
https://repository.apache.org/content/reposit
Hi!
Flink requires at least Java 1.7, so one of the reasons could also be that
the classes are rejected because of an incompatible class file version (the
classes are compiled for Java 1.7, which a Java 1.6 JVM does not understand).
That could explain things...
Greetings,
Stephan
On Wed, Nov 11, 2015 at 9:01 AM, Camelia
Good morning,
Thank you Stephan!
I keep on testing and in the meantime I'm wondering if the Java version on the
cluster may be part of the issue:
java version "1.6.0_36"
OpenJDK Runtime Environment (IcedTea6 1.13.8) (rhel-1.13.8.1.el6_7-x86_64)
OpenJDK 64-Bit Server VM (build 23.25-b01, mixed mode)