Hi Jamie,
Thanks for the reply.
Yeah, I looked at savepoints. I want to start my job only from the last
checkpoint, which means I have to keep track of when the checkpoint was
taken and then trigger a savepoint. I am not sure this is the way to go. My
state backend is HDFS and I can see that the c
Jamie,
Thank you for your insight. To answer your questions: I am running on AWS
and have access to S3. Further, I already have ZooKeeper in the mix (it's
used by Mesos as well as Kafka). I was hoping to avoid the complexities of
an automated HA setup by running a single JVM and then migrate to HA do
Hi Ufuk,
Looking at the document you sent, it seems only one task manager per node
exists, and within that you have multiple slots. Is it possible to run more
than one task manager per node? Also, within a task manager, is the
parallelism achieved through threads or processes?
Thank you,
Saliya
On Thu, Jun
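[Editor's note: for reference, a Flink TaskManager's slots run subtasks as threads inside one JVM (so they share the heap), and you can start more than one TaskManager process per machine. A plain-Java sketch of the thread-per-slot idea, using a fixed thread pool as a stand-in (names are illustrative, no Flink APIs involved):]

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class SlotThreadsDemo {
    public static void main(String[] args) throws Exception {
        // Model one "TaskManager" JVM with 4 "slots" as a fixed thread pool:
        // all subtasks share the same heap, unlike separate processes.
        ExecutorService slots = Executors.newFixedThreadPool(4);
        List<Callable<Long>> subtasks = IntStream.range(0, 4)
            .mapToObj(i -> (Callable<Long>) () -> Thread.currentThread().getId())
            .collect(Collectors.toList());
        List<Future<Long>> results = slots.invokeAll(subtasks);
        for (Future<Long> f : results) {
            System.out.println("subtask ran on thread " + f.get());
        }
        slots.shutdown();
    }
}
```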
I started to answer these questions and then realized I was making an
assumption about your environment. Do you have a reliable persistent file
system such as HDFS or S3 at your disposal, or do you truly mean to run on a
single node?
If you are truly thinking of running on a single node only, there
Hi Prabhu,
Have you taken a look at Flink's savepoints feature? This allows you to
make snapshots of your job's state on demand and then at any time restart
your job from that point:
https://ci.apache.org/projects/flink/flink-docs-release-1.0/apis/streaming/savepoints.html
Also know that you can
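[Editor's note: the savepoint workflow Jamie describes is driven from the Flink CLI. A command sketch — the job ID, savepoint path, and jar name are placeholders, so this is not runnable as-is:]

```
# Trigger a savepoint for a running job (the CLI prints the savepoint path)
bin/flink savepoint <jobID>

# Later, resume a new run of the job from that savepoint
bin/flink run -s <savepointPath> your-job.jar
```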
I know this is really basic, but have you verified that your Flink lib
folder contains log4j-1.2.17.jar? I imagine that's fine given the
yarn-session.sh approach is working fine. What version of EMR are you
running? What version of Flink?
-Jamie
On Thu, Jun 30, 2016 at 11:28 AM, Hanson, Bruc
Hi Mindis,
This does actually sound like a good use case for Flink. Without knowing
more details it's a bit hard to say which of the options you mention would
be most efficient but my gut feeling is that the "one big dataset" approach
would be the way to go.
I think there probably is a simplifie
Hi,
I have a Flink streaming job that reads from Kafka and performs an
aggregation in a window. It ran fine for a while; however, when the number
of events in a window crossed a certain limit, the YARN containers failed
with Out Of Memory errors. The job was running with 10G containers.
We have about 64G me
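[Editor's note: a general observation, not specific to this job. A window that buffers every event (e.g. a generic window function collecting all elements) holds state proportional to the window size, while an incremental reduce keeps constant state per key/window; in Flink this is what applying a ReduceFunction on the window gives you. A plain-Java sketch of the difference, with hypothetical names and no Flink APIs:]

```java
import java.util.ArrayList;
import java.util.List;

public class IncrementalAggDemo {
    public static void main(String[] args) {
        // Buffering approach: memory grows with the number of events in the window.
        List<Long> buffer = new ArrayList<>();
        for (long i = 1; i <= 1_000; i++) {
            buffer.add(i);
        }
        long sumFromBuffer = buffer.stream().mapToLong(Long::longValue).sum();

        // Incremental approach: constant state, fold in each event as it arrives.
        long runningSum = 0;
        for (long i = 1; i <= 1_000; i++) {
            runningSum += i;
        }
        System.out.println(sumFromBuffer == runningSum); // same result, O(1) state
    }
}
```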
What do you mean exactly?
On 1 Jul 2016 18:58, "Aljoscha Krettek" wrote:
> Hi,
> do you have any data in the coGroup/groupBy operators that you use,
> besides the input data?
>
> Cheers,
> Aljoscha
>
> On Fri, 1 Jul 2016 at 14:17 Flavio Pompermaier
> wrote:
>
>> Hi to all,
>> I have a Flink job
Hi,
do you have any data in the coGroup/groupBy operators that you use, besides
the input data?
Cheers,
Aljoscha
On Fri, 1 Jul 2016 at 14:17 Flavio Pompermaier wrote:
> Hi to all,
> I have a Flink job that computes data correctly when launched locally from
> my IDE while it doesn't when launche
Ah, this might be in code that runs at a different layer from the
StateBackend. Can you maybe pinpoint which of your user classes this
anonymous class is and where it is used? Perhaps by replacing them with
non-anonymous classes and checking which replacement fixes the problem.
-
Aljoscha
On Fri, 1 J
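[Editor's note: Aljoscha's suggestion can be sketched in plain Java. Anonymous classes get compiler-generated names like `Outer$1`, which makes a ClassNotFoundException hard to trace back to the source; a named static class shows up with a readable name. The class names below are illustrative:]

```java
import java.io.Serializable;
import java.util.function.Function;

public class AnonymousClassDemo {
    // Named replacement for an anonymous function class: the class name
    // now identifies itself in any deserialization error.
    static class MyKeyExtractor implements Function<String, Integer>, Serializable {
        @Override
        public Integer apply(String s) {
            return s.length();
        }
    }

    public static void main(String[] args) {
        // Anonymous class: compiled as AnonymousClassDemo$1, opaque in stack traces.
        Function<String, Integer> anon = new Function<String, Integer>() {
            @Override
            public Integer apply(String s) {
                return s.length();
            }
        };
        System.out.println(anon.getClass().getName());      // ...Demo$1
        System.out.println(MyKeyExtractor.class.getName()); // ...Demo$MyKeyExtractor
    }
}
```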
Hi,
I'm afraid the only way to do it right now is using the wrapper that can
contain both, as you suggested.
Cheers,
Aljoscha
On Thu, 30 Jun 2016 at 16:50 Martin Neumann wrote:
> Hej,
>
> I'm currently playing around with some machine learning algorithms in
> Flink streaming.
>
> I have an inpu
I've just double checked and I do still get the ClassNotFound error for an
anonymous class, on a job which uses the RocksDBStateBackend.
In case it helps, this was the full stack trace:
java.lang.RuntimeException: Failed to deserialize state handle and
setup initial operator state.
at org
Hi,
I am evaluating Flink for use in a stateful streaming application. Some
information about the intended use:
- Will run in a mesos cluster and deployed via marathon in a docker
container
- Initial throughput ~ 100 messages per second (from kafka)
- Will need to scale to 10x that soon after la
Hi to all,
I have a Flink job that computes data correctly when launched locally from
my IDE while it doesn't when launched on the cluster.
Is there any suggestion/example that would help me identify the problematic
operators in such a case?
I think the root cause is the fact that some operator (e.g.
coGroup/group
Thanks guys, that's very helpful info!
@Aljoscha I thought I saw this exception on a job that was using the
RocksDB state backend, but I'm not sure. I will do some more tests today to
double check. If it's still a problem I'll try the explicit class
definitions solution.
Josh
On Thu, Jun 30, 201
Unfortunately, this is not possible at the moment. This optimization
definitely makes sense in certain situations. How large is your state
and how long does it take to recover?
On Fri, Jul 1, 2016 at 9:18 AM, Chia-Hung Lin wrote:
> After reading the document and configuring to test failure strate
After reading the documentation and configuring to test the failure
strategy, it seems to me that Flink restarts the job whenever any failure
(e.g. a thrown exception) occurs.
https://ci.apache.org/projects/flink/flink-docs-master/internals/stream_checkpointing.html
My question:
Is it possible to configure in
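[Editor's note: around Flink 1.0 the restart behavior was controlled via execution retries in flink-conf.yaml. A sketch of the relevant keys — treat the exact names and value syntax as an assumption to verify against the docs for your version:]

```
# Number of restart attempts after a failure (0 means fail the job immediately)
execution-retries.default: 3

# Pause between restart attempts
execution-retries.delay: 10 s
```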