Re: Monitoring job w/LocalStreamEnvironment

2017-10-16 Thread Piotr Nowojski
Hi,

Regarding metrics, please check the recent mailing list thread "Writing an 
Integration test for flink-metrics". You can either use the JMXReporter or 
write a custom reporter for this purpose.

Piotrek
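
As a rough sketch of the custom-reporter option (the class name and the
static map are illustrative, not an existing Flink class), something like the
following mirrors every registered metric into a static map that a test
running in the same JVM can poll. It would then be enabled in flink-conf.yaml
via the metrics.reporters and metrics.reporter.<name>.class settings.

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import org.apache.flink.metrics.Metric;
    import org.apache.flink.metrics.MetricConfig;
    import org.apache.flink.metrics.MetricGroup;
    import org.apache.flink.metrics.reporter.MetricReporter;

    // Hypothetical test-only reporter: captures registered metrics in a
    // static map. This only works when the job runs in the same JVM as the
    // test (e.g. LocalStreamEnvironment / LocalFlinkMiniCluster).
    public class StaticMapReporter implements MetricReporter {
        public static final Map<String, Metric> METRICS = new ConcurrentHashMap<>();

        @Override
        public void open(MetricConfig config) {}

        @Override
        public void close() {}

        @Override
        public void notifyOfAddedMetric(Metric metric, String name, MetricGroup group) {
            METRICS.put(group.getMetricIdentifier(name), metric);
        }

        @Override
        public void notifyOfRemovedMetric(Metric metric, String name, MetricGroup group) {
            METRICS.remove(group.getMetricIdentifier(name));
        }
    }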

> On 13 Oct 2017, at 20:57, Ken Krugler wrote:
> 
> Hi Piotr,
> 
> Thanks for responding, see below.
> 
>> On Oct 12, 2017, at 7:51 AM, Piotr Nowojski wrote:
>> 
>> Hi,
>> 
>> Have you read the following doc?
>> https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/stream/testing.html
>> 
>> There are some hints regarding testing your application. Especially take a 
>> look at the example with using static field to communicate with the running 
>> job.
> 
> Yes, I’d read those.
> 
> I already have a bunch of Flink metrics, and I was hoping to leverage those 
> to know when my test can safely terminate my iteration.
> 
> I guess I could create a metrics wrapper that also logs to a static class 
> during tests.
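> 
> A minimal sketch of that wrapper idea (the class name is made up, and the
> static counter only works because LocalStreamEnvironment keeps everything
> in one JVM):
> 
>     import java.util.concurrent.atomic.AtomicLong;
>     import org.apache.flink.metrics.Counter;
> 
>     // Delegates to the real Flink Counter while mirroring the running
>     // total into a static AtomicLong that the test can poll.
>     public class TestVisibleCounter implements Counter {
>         public static final AtomicLong TOTAL = new AtomicLong();
>         private final Counter delegate;
> 
>         public TestVisibleCounter(Counter delegate) {
>             this.delegate = delegate;
>         }
> 
>         @Override public void inc() { delegate.inc(); TOTAL.incrementAndGet(); }
>         @Override public void inc(long n) { delegate.inc(n); TOTAL.addAndGet(n); }
>         @Override public void dec() { delegate.dec(); TOTAL.decrementAndGet(); }
>         @Override public void dec(long n) { delegate.dec(n); TOTAL.addAndGet(-n); }
>         @Override public long getCount() { return delegate.getCount(); }
>     }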
> 
> Regards,
> 
> — Ken
> 
> 
>>> On 12 Oct 2017, at 16:33, Ken Krugler wrote:
>>> 
>>> Hi all,
>>> 
>>> With an iteration-based workflow, it’s helpful to be able to monitor the 
>>> job counters and explicitly terminate when the test has completed.
>>> 
>>> I didn’t see support for async job creation, though.
>>> 
>>> So I extended LocalStreamEnvironment to add an executeAsync(), which 
>>> returns the LocalFlinkMiniCluster.submitJobDetached() result.
>>> 
>>> But it appears that I need to have a ClusterClient in order to actually 
>>> monitor this job.
>>> 
>>> And ClusterClient is bound up with a lot of CLI code, so I’m hesitant to 
>>> try to extract what I need.
>>> 
>>> Is there an easier/recommended approach to the above?
>>> 
>>> Thanks!
>>> 
>>> — Ken
>>> 
>>> 
>>> http://about.me/kkrugler 
>>> +1 530-210-6378
>>> 
>> 
> 
> 
> http://about.me/kkrugler 
> +1 530-210-6378



start-cluster.sh not working in HA mode

2017-10-16 Thread Marchant, Hayden
I am attempting to run Flink 1.3.2 in HA mode with ZooKeeper.

When I run start-cluster.sh, the job manager is not started, even though 
the task manager is. When I delved into this, I saw that the command:

ssh -n $FLINK_SSH_OPTS $master -- "nohup /bin/bash -l 
\"${FLINK_BIN_DIR}/jobmanager.sh\" start cluster ${master} ${webuiport} &"

is not actually running anything on the host, i.e. I do not see "Starting 
jobmanager daemon on host .".

Only when I remove ALL quotes do I see it working, i.e. if I run:

ssh -n $FLINK_SSH_OPTS $master -- nohup /bin/bash -l 
${FLINK_BIN_DIR}/jobmanager.sh start cluster ${master} ${webuiport} &

I see that it manages to run the job manager - I see "Starting jobmanager 
daemon on host .".

Has anyone else experienced a similar problem? Are there any elegant 
workarounds that don't require changing the source code?

Thanks,
Hayden Marchant



Fwd: Exception in WordCount code.

2017-10-16 Thread Arunima Singh
Hi all,

I am new to Apache Flink and I followed 
http://training.data-artisans.com/devEnvSetup.html to get started with the 
Flink setup and sample code demo.

I am facing an issue while running the WordCount.java code in the Eclipse IDE.

I am getting " java.lang.NoClassDefFoundError:
org/jboss/netty/logging/InternalLoggerFactory"
exception.

Below is the console output -

19:35:33,364 INFO  org.apache.flink.api.java.ExecutionEnvironment - The job has 0 registered types and 0 default Kryo serializers
19:35:33,643 INFO  org.apache.flink.runtime.minicluster.FlinkMiniCluster - Disabled queryable state server
Exception in thread "main" java.lang.NoClassDefFoundError: org/jboss/netty/logging/InternalLoggerFactory
at org.apache.flink.runtime.minicluster.FlinkMiniCluster.<init>(FlinkMiniCluster.scala:90)
at org.apache.flink.runtime.minicluster.LocalFlinkMiniCluster.<init>(LocalFlinkMiniCluster.scala:65)
at org.apache.flink.runtime.minicluster.LocalFlinkMiniCluster.<init>(LocalFlinkMiniCluster.scala:74)
at org.apache.flink.client.LocalExecutor.start(LocalExecutor.java:115)
at org.apache.flink.client.LocalExecutor.executePlan(LocalExecutor.java:176)
at org.apache.flink.api.java.LocalEnvironment.execute(LocalEnvironment.java:91)
at org.apache.flink.api.java.ExecutionEnvironment.execute(ExecutionEnvironment.java:926)
at org.apache.flink.api.java.DataSet.collect(DataSet.java:410)
at org.apache.flink.api.java.DataSet.print(DataSet.java:1605)
at org.apache.flink.quickstart.WordCount.main(WordCount.java:59)
Caused by: java.lang.ClassNotFoundException: org.jboss.netty.logging.InternalLoggerFactory
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 10 more


Please help me out.

Thanks in advance.

Regards,
Arunima Singh
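
The missing class belongs to Netty 3.x (which still ships the
org.jboss.netty packages), so one possible fix, as a sketch only, assuming
the problem is simply that Netty 3 is absent from the Eclipse run classpath,
is to add the dependency explicitly to the project's pom.xml. The version
below is only an example:

    <!-- Hypothetical fix sketch: supply the org.jboss.netty classes -->
    <dependency>
        <groupId>io.netty</groupId>
        <artifactId>netty</artifactId>
        <version>3.10.6.Final</version>
    </dependency>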


Unbalanced job scheduling

2017-10-16 Thread AndreaKinn
Hi all,
I want to describe my program flow.

I have the following operators:

kafka-source -> timestamp-extractor -> map -> keyBy -> window -> apply ->
LEARN -> SELECT -> process -> cassandra-sink

The LEARN and SELECT operators come from an external library that works with
Flink. LEARN is a very heavy operation compared to the other operators.

Unfortunately LEARN has a max parallelism of 1, so if I have a cluster of 2
TMs with 1 slot each and I set parallelism = 2, one TM runs a parallel
instance of all the operators plus the single instance of LEARN, while the
other TM runs just the second parallel instance of all the operators
(clearly, there is no further instance of LEARN).
That's OK, and I have no problem understanding it.

*** The problem:
Actually I have 2 identical flows like this, because I have two sensor
streams, so I really have 2 LEARN operators corresponding to two independent
streams.

However, I noticed that even in this case one TM takes the load of the
parallel instances of all the operators AND the single instances of both
LEARN-1 and LEARN-2, while the other TM runs just the second parallel
instances of all the operators (no LEARN instances here).

Since LEARN is a heavy operator, this leads to a very unbalanced load on the
cluster, so much so that the first TM is killed during execution (from the
logs, this probably happens because it runs out of memory; in fact the sink
execution is very, very slow, so LEARN appears to be a bottleneck).

Honestly, I can't understand why Flink doesn't assign one LEARN operator to
one TM and the other LEARN to the other TM.
This prevents me from loading the cluster properly, because I will always
have one TM super busy and the other one quite "free" and unstressed.

Bye,
Andrea



--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
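
One knob that can influence placement here, as a sketch only (the sources and
operator bodies below are placeholders, and whether it actually rebalances
the job depends on the Flink version and the available slots), is to isolate
each LEARN instance in its own slot sharing group, which prevents the two
heavy operators from being co-located in the same slot:

    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

    // Placeholder sources standing in for the two Kafka sensor streams.
    DataStream<String> sensor1 = env.socketTextStream("sensor-host-1", 9999);
    DataStream<String> sensor2 = env.socketTextStream("sensor-host-2", 9999);

    // Placeholder maps standing in for the two heavy LEARN operators; each
    // is pinned to its own slot sharing group so they cannot share a slot.
    DataStream<String> learned1 = sensor1
        .map(String::toUpperCase)
        .setParallelism(1)
        .slotSharingGroup("learn-1");

    DataStream<String> learned2 = sensor2
        .map(String::toUpperCase)
        .setParallelism(1)
        .slotSharingGroup("learn-2");

Note that operators in a separate slot sharing group need their own slots, so
the cluster must offer enough slots for the extra groups; this is a direction
to experiment with rather than a guaranteed fix.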


Case Class TypeInformation

2017-10-16 Thread Joshua Griffith
Hello,

I have a case class that wraps a Flink Row and I’d like to use fields from that 
Row in a delta iteration join condition. I only have the row’s fields after the 
job starts. I can construct RowTypeInfo for the Row but I’m not sure how to add 
that to Flink’s generated type information for the case class. Without it, I 
understandably get the following error because Flink doesn’t know the Row’s 
TypeInformation:

org.apache.flink.api.common.InvalidProgramException: This type 
(GenericType) cannot be used as key.

Is there a way to manually construct or annotate the type information for the 
case class to provide the Row’s type information so it can be used in a join? I 
could alternatively replace the case class with a Tuple and construct a 
TupleTypeInfo, but a tuple is more difficult to use than a case class.

Thanks,

Joshua


Re: Case Class TypeInformation

2017-10-16 Thread Joshua Griffith
Correction: I have the row’s RowTypeInfo at runtime before the job starts. I 
don’t have RowTypeInfo at compile time.

On Oct 16, 2017, at 4:15 PM, Joshua Griffith <jgriff...@campuslabs.com> wrote:

Hello,

I have a case class that wraps a Flink Row and I’d like to use fields from that 
Row in a delta iteration join condition. I only have the row’s fields after the 
job starts. I can construct RowTypeInfo for the Row but I’m not sure how to add 
that to Flink’s generated type information for the case class. Without it, I 
understandably get the following error because Flink doesn’t know the Row’s 
TypeInformation:

org.apache.flink.api.common.InvalidProgramException: This type 
(GenericType) cannot be used as key.

Is there a way to manually construct or annotate the type information for the 
case class to provide the Row’s type information so it can be used in a join? I 
could alternatively replace the case class with a Tuple and construct a 
TupleTypeInfo, but a tuple is more difficult to use than a case class.

Thanks,

Joshua
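
For what it's worth, a sketch of the Tuple alternative mentioned above (the
field types and the input DataSet are made up; in practice the RowTypeInfo
would be built from the schema known at runtime):

    import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
    import org.apache.flink.api.java.DataSet;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.api.java.typeutils.RowTypeInfo;
    import org.apache.flink.api.java.typeutils.TupleTypeInfo;
    import org.apache.flink.types.Row;

    // Hypothetical runtime schema for the Row: (String, Integer).
    RowTypeInfo rowInfo = new RowTypeInfo(
        BasicTypeInfo.STRING_TYPE_INFO, BasicTypeInfo.INT_TYPE_INFO);

    // Wrap each Row in a Tuple2 and declare the type information explicitly,
    // so the wrapper is no longer a GenericType and its fields can be keys.
    TupleTypeInfo<Tuple2<String, Row>> wrapperInfo =
        new TupleTypeInfo<>(BasicTypeInfo.STRING_TYPE_INFO, rowInfo);

    DataSet<Tuple2<String, Row>> wrapped = rows  // rows: an existing DataSet<Row>
        .map(row -> new Tuple2<>((String) row.getField(0), row))
        .returns(wrapperInfo);

    // wrapped can now be keyed on the tuple fields in a join, e.g.
    // wrapped.join(other).where(0).equalTo(0)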



problem with increase job parallelism

2017-10-16 Thread Lei Chen
Hi,

We're trying to implement a module to help autoscale our pipeline, which is
built with Flink on YARN. According to the documentation, the suggested
procedure seems to be:

1. cancel job with savepoint
2. start new job with increased YARN TM number and parallelism.

However, step 2 always gave this error:

Caused by: java.lang.IllegalStateException: Failed to rollback to savepoint
hdfs://10.106.238.14:/tmp/savepoint-767421-20907d234655. Cannot map
savepoint state for operator 37dfe905df17858e07858039ce3d8ae1 to the new
program, because the operator is not available in the new program. If you
want to allow to skip this, you can set the --allowNonRestoredState option
on the CLI.
at org.apache.flink.runtime.checkpoint.savepoint.SavepointLoader.loadAndValidateSavepoint(SavepointLoader.java:130)
at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.restoreSavepoint(CheckpointCoordinator.java:1140)
at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1386)
at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1372)
at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1372)
at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

The procedure worked fine if parallelism was not changed.

I also want to mention that I didn't manually specify operator IDs in my job.
The documentation does mention that manual operator ID assignment is
recommended; I'm just curious whether that is mandatory in my case to fix the
problem I'm seeing, given that my program doesn't change at all, so the
auto-generated operator IDs should be unchanged after the parallelism
increase?

thanks,
Lei
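
For reference, a minimal sketch of manual operator ID assignment (the
pipeline below is a placeholder, not the actual job); a stable uid() lets
savepoint state be matched to operators even if the auto-generated IDs
change between submissions, e.g. because the job graph structure is not
identical after a change:

    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

    env.socketTextStream("example-host", 9999)  // placeholder source
       .uid("source-id")                        // stable ID for savepoint state
       .map(String::toUpperCase)                // placeholder operator
       .uid("map-id")
       .print()
       .uid("sink-id");

    env.execute("uid-example");

With stable uids in place, --allowNonRestoredState should only be needed
when operators are genuinely removed from the program.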