Re: Challenges Deploying Flink With Savepoints On Kubernetes

2019-09-24 Thread Hao Sun
case well, I do not see a need to start from checkpoint after a bug fix. >From what I know, currently you can use checkpoint as a savepoint as well Hao Sun On Tue, Sep 24, 2019 at 7:48 AM Yuval Itzchakov wrote: > AFAIK there's currently nothing implemented to solve this problem, but &

Re: Challenges Deploying Flink With Savepoints On Kubernetes

2019-09-24 Thread Hao Sun
I think I overlooked it. Good point. I am using Redis to save the path to my savepoint, I might be able to set a TTL to avoid such issue. Hao Sun On Tue, Sep 24, 2019 at 9:54 AM Yuval Itzchakov wrote: > Hi Hao, > > I think he's exactly talking about the usecase where the JM/

Re: Challenges Deploying Flink With Savepoints On Kubernetes

2019-10-10 Thread Hao Sun
Yep I know that option. That's where get me confused as well. In a HA setup, where do I supply this option (allowNonRestoredState)? This option requires a savepoint path when I start a flink job I remember. And HA does not require the path Hao Sun On Thu, Oct 10, 2019 at 11:16 AM Yun

Re: Flink 1.8.0 S3 Checkpoints fail with java.security.ProviderException: Could not initialize NSS

2019-10-10 Thread Hao Sun
I saw similar issue when using alpine linux. https://pkgs.alpinelinux.org/package/v3.3/main/x86/nss Installing this package fixed my problem Hao Sun On Thu, Oct 10, 2019 at 3:46 PM Austin Cawley-Edwards < austin.caw...@gmail.com> wrote: > Hi there, > > I'm getting the fol

Re: Flink on Kubernetes - Session vs Job cluster mode and storage

2020-02-28 Thread Hao Sun
Sounds good. Thank you! Hao Sun On Thu, Feb 27, 2020 at 6:52 PM Yang Wang wrote: > Hi Hao Sun, > > I just post the explanation to the user ML so that others could also have > the same problem. > > Gven the job graph is fetched from the jar, do we still need Zookeeper for >

anybody can start flink with job mode?

2018-08-24 Thread Hao Sun
I got an error like this. $ docker run -it flink-job:latest job-cluster Starting the job-cluster config file: jobmanager.rpc.address: localhost jobmanager.rpc.port: 6123 jobmanager.heap.size: 1024m taskmanager.heap.size: 1024m taskmanager.numberOfTaskSlots: 1 parallelism.default: 1 rest.port: 8081

Re: anybody can start flink with job mode?

2018-08-24 Thread Hao Sun
Thanks, I'll look into it. On Fri, Aug 24, 2018, 19:44 vino yang wrote: > Hi Hao Sun, > > From the error log, it seems that the jar package for the job was not > found. > You must make sure your Jar is in the classpath. > Related documentation may not be up-to-date, and t

Re: anybody can start flink with job mode?

2018-08-28 Thread Hao Sun
, > Till > > On Sat, Aug 25, 2018 at 5:11 AM Hao Sun wrote: > >> Thanks, I'll look into it. >> >> On Fri, Aug 24, 2018, 19:44 vino yang wrote: >> >>> Hi Hao Sun, >>> >>> From the error log, it seems that the jar package for the job

java.io.IOException: NSS is already initialized

2018-11-01 Thread Hao Sun
I am on Flink 1.6.2 (no Hadoop, in docker + K8S), using rocksdb and S3 (presto) I got this error when flink creating a checking point === 2018-11-02 04:00:55,011 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Job ConnectedStreams maxwell.accounts (00

Re: java.io.IOException: NSS is already initialized

2018-11-02 Thread Hao Sun
untime.executiongraph.ExecutionGraph- Try to restart or fail the job ConnectedStreams maxwell.accounts () if no longer possible. ===== On Thu, Nov 1, 2018 at 9:22 PM Hao Sun wrote: > I am on Flink 1.6.2 (no Hadoop, in docker + K8S),

Flink 1.6 job cluster mode, job_id is always 00000000000000000000000000000000

2018-11-03 Thread Hao Sun
I am wondering if I can customize job_id for job cluster mode. Currently it is always . I am running multiple job clusters and sharing s3, it means checkpoints will be shared by different jobs as well e.g. /chk-64, how can I avoid this

Re: Flink 1.6 job cluster mode, job_id is always 00000000000000000000000000000000

2018-11-04 Thread Hao Sun
Thanks that also works. To avoid same issue with zookeeper, I assume I have to do the same trick? On Sun, Nov 4, 2018, 03:34 Ufuk Celebi wrote: > Hey Hao Sun, > > this has been changed recently [1] in order to properly support > failover in job cluster mode. > > A workaround f

Re: Flink 1.6 job cluster mode, job_id is always 00000000000000000000000000000000

2018-11-05 Thread Hao Sun
Thanks all. On Mon, Nov 5, 2018 at 2:05 AM Ufuk Celebi wrote: > On Sun, Nov 4, 2018 at 10:34 PM Hao Sun wrote: > > Thanks that also works. To avoid same issue with zookeeper, I assume I > have to do the same trick? > > Yes, exactly. The following configuration [1] entry

How to run Flink 1.6 job cluster in "standalone" mode?

2018-11-07 Thread Hao Sun
"Standalone" here I mean job-mananger + taskmanager on the same JVM. I have an issue to debug on our K8S environment, I can not reproduce it in local docker env or Intellij. If JM and TM are running in different VMs, it makes things harder to debug. Or is there a way to debug a job running on JM +

Re: java.io.IOException: NSS is already initialized

2018-11-08 Thread Hao Sun
; > Dawid > On 03/11/2018 03:09, Hao Sun wrote: > > Same environment, new error. > > I can run the same docker image with my local Mac, but on K8S, this gives > me this error. > I can not think of any difference between local Docker and K8S Docker. > > Any hint will be

The heartbeat of TaskManager with id ... timed out.

2018-11-08 Thread Hao Sun
I am running Flink 1.7 on K8S. I am not sure how to debug this issue. I turned on debug on JM/TM. I am not sure this part is related or not. How could an Actor suddenly disappear? = 2018-11-09 04:47:19,480 DEBUG org.apache.flink.runtime.rest.handler.legacy.metrics.MetricFetcher - Query metri

Re: Where is the "Latest Savepoint" information saved?

2018-11-09 Thread Hao Sun
s-release-1.6/monitoring/rest_api.html > <https://ci.apache.org/projects/flink/flink-docs-release-1.6/monitoring/rest_api.html> > > Best, > Paul Lam > > > 在 2018年11月9日,13:55,Hao Sun 写道: > > Since this save point path is very useful to application updates, where is > this information stored? Can we keep it in ZK or S3 for retrieval? > > > > >

Re: java.io.IOException: NSS is already initialized

2018-11-10 Thread Hao Sun
pointing is working with the hadoop flavour. On Fri, Nov 9, 2018 at 2:02 PM Ufuk Celebi wrote: > Hey Hao Sun, > > - Is this an intermittent failure or permanent? The logs indicate that > some checkpoints completed before the error occurs (e.g. checkpoint > numbers are greater than

Re: Where is the "Latest Savepoint" information saved?

2018-11-11 Thread Hao Sun
-1.6/monitoring/rest_api.html > <https://ci.apache.org/projects/flink/flink-docs-release-1.6/monitoring/rest_api.html> > [3] > https://ci.apache.org/projects/flink/flink-docs-release-1.6/monitoring/rest_api.html#job-cancellation > <https://ci.apache.org/projects/flink/flink-docs-rel

Re: Where is the "Latest Savepoint" information saved?

2018-11-11 Thread Hao Sun
An alternative is the web UI checkpointing tab. It shows the latest > checkpoint used for restore of the job. You should see your savepoint > there. > > Best, > > Ufuk > > > On Sun, Nov 11, 2018 at 7:45 PM Hao Sun wrote: > > > > This is great, I will try option

Re: How to run Flink 1.6 job cluster in "standalone" mode?

2018-11-12 Thread Hao Sun
. > > Maybe you can tell us what wrong behavior you observe? > > Btw. Flink's metrics can also already be quite helpful. > > Regards, > Timo > > Am 07.11.18 um 14:15 schrieb Hao Sun: > > "Standalone" here I mean job-mananger + taskmanager on the same

Flink 1.7 RC missing flink-scala-shell jar

2018-11-13 Thread Hao Sun
I can not find the jar here: https://repository.apache.org/content/repositories/orgapacheflink-1191/org/apache/flink/ Here is the error: bash-4.4# ./bin/start-scala-shell.sh local Error: Could not find or load main class org.apache.flink.api.scala.FlinkShell I think somehow I have to include the

Re: Flink 1.7 RC missing flink-scala-shell jar

2018-11-13 Thread Hao Sun
Sorry I mean the scala-2.12 version is missing On Tue, Nov 13, 2018 at 10:58 AM Hao Sun wrote: > I can not find the jar here: > > https://repository.apache.org/content/repositories/orgapacheflink-1191/org/apache/flink/ > > Here is the error: > bash-4.4# ./bin/start-scala-shel

Re: Flink 1.7 RC missing flink-scala-shell jar

2018-11-13 Thread Hao Sun
: > Hi, > > Till is the release manager for 1.7, so ping him here. > > Best, > tison. > > > Hao Sun 于2018年11月14日周三 上午3:07写道: > >> Sorry I mean the scala-2.12 version is missing >> >> On Tue, Nov 13, 2018 at 10:58 AM Hao Sun wrote: &

Re: Flink 1.7 RC missing flink-scala-shell jar

2018-11-14 Thread Hao Sun
etter > though. > > > On 14.11.2018 03:44, Hao Sun wrote: > > I do not see flink-scala-shell jar under flink opt directory. To run > scala shell, do I have to include the flink-scala-shell jar in my program > jar? > Why the error is saying Could not find or load main class >

Flink 1.7 job cluster (restore from checkpoint error)

2018-12-04 Thread Hao Sun
tes(StateAssignmentOperation.java:77) at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.restoreLatestCheckpointedState(CheckpointCoordinator.java:1049) at org.apache.flink.runtime.jobmaster.JobMaster.createAndRestoreExecutionGraph(JobMaster.java:1152) at org.apache.flink.runtime.jobmaster.JobMaster.(JobMaster.java:296) at org.apache.flink.runtime.jobmaster.JobManagerRunner.(JobManagerRunner.java:157) == Can somebody help out? Thanks Hao Sun

Re: Flink 1.7 job cluster (restore from checkpoint error)

2018-12-05 Thread Hao Sun
Till, Flink is automatically trying to recover from a checkpoint not savepoint. How can I get allowNonRestoredState applied in this case? Hao Sun Team Lead 1019 Market St. 7F San Francisco, CA 94103 On Wed, Dec 5, 2018 at 10:09 AM Till Rohrmann wrote: > Hi Hao, > > I think you need t

Re: Flink 1.7 job cluster (restore from checkpoint error)

2018-12-06 Thread Hao Sun
Thanks for the tip! I did change the jobGraph this time. Hao Sun Team Lead 1019 Market St. 7F San Francisco, CA 94103 On Thu, Dec 6, 2018 at 2:47 AM Till Rohrmann wrote: > Hi Hao, > > if Flink tries to recover from a checkpoint, then the JobGraph should not > be modified and the s

How many times Flink initialize an operator?

2018-12-11 Thread Hao Sun
[I,O]).addSink(discard) Hao Sun Team Lead 1019 Market St. 7F San Francisco, CA 94103

Re: How many times Flink initialize an operator?

2018-12-11 Thread Hao Sun
Ok, thanks for the clarification. Hao Sun Team Lead 1019 Market St. 7F San Francisco, CA 94103 On Tue, Dec 11, 2018 at 2:38 PM Ken Krugler wrote: > It’s based the parallelism of that operator, not the number of > TaskManagers. > > E.g. you can have an operator with a parallelism

Kafka consumer, is there a way to filter out messages using key only?

2018-12-18 Thread Hao Sun
KeyedDeserializationSchema, but can I use it to filter data? Hao Sun Team Lead 1019 Market St. 7F San Francisco, CA 94103

Re: Kafka consumer, is there a way to filter out messages using key only?

2018-12-27 Thread Hao Sun
null value...) based on the message key, that would allow you >> later to filter it out. So assuming the Optional solution the result of >> KeyedDeserializationSchema#deserialize could be Optional.empty() for >> invalid keys and Optional.of(deserializedValue) for valid keys. >&g

Is there a better way to debug ClassNotFoundException from FlinkUserCodeClassLoaders?

2019-01-02 Thread Hao Sun
org.apache.flink.streaming.api.graph.StreamConfig.getChainedOutputs(StreamConfig.java:321) ... 5 more Hao Sun Team Lead 1019 Market St. 7F San Francisco, CA 94103

Re: Is there a better way to debug ClassNotFoundException from FlinkUserCodeClassLoaders?

2019-01-02 Thread Hao Sun
lic java.lang.Object createInstance(java.lang.Object[]); public com.zendesk.fraudprevention.examples.ConnectedStreams$$anon$90$$anon$45(com.zendesk.fraudprevention.examples.ConnectedStreams$$anon$90, org.apache.flink.api.common.typeutils.TypeSerializer[]); } Hao Sun Team Lead 1019 Market St. 7F San

Re: Is there a better way to debug ClassNotFoundException from FlinkUserCodeClassLoaders?

2019-01-03 Thread Hao Sun
om/sbt/sbt-assembly to assemble the fat jar. There might be some issue, or config issue with that as well. I am reading this article, it is a good start for me as well https://heapanalytics.com/blog/engineering/missing-scala-class-noclassdeffounderror Hao Sun Team Lead 1019 Market St. 7F San Fra

Re: Is there a better way to debug ClassNotFoundException from FlinkUserCodeClassLoaders?

2019-01-04 Thread Hao Sun
Thanks Congxian for the tip. Arthas looks great Hao Sun Team Lead 1019 Market St. 7F San Francisco, CA 94103 On Fri, Jan 4, 2019 at 5:42 PM Congxian Qiu wrote: > Hi, Hao Sun > > For debugging the `ClassNotFoundException`, maybe the Arthas[1] tool can > help. > > [1] Arthas &

inconsistent module descriptor for hadoop uber jar (Flink 1.8)

2019-04-15 Thread Hao Sun
doop2-uber-2.8.3-1.8.0.pom': bad revision: expected='2.8.3-1.8.0' found='2.4.1-1.8.0'; Is this a bug? Hao Sun

Re: inconsistent module descriptor for hadoop uber jar (Flink 1.8)

2019-04-16 Thread Hao Sun
I am using sbt and sbt-assembly. In build.sbt libraryDependencies ++= Seq("org.apache.flink" % "flink-shaded-hadoop2-uber" % "2.8.3-1.8.0") Hao Sun On Tue, Apr 16, 2019 at 12:07 AM Gary Yao wrote: > Hi, > > Can you describe how to reproduce this? >

Re: java.io.IOException: NSS is already initialized

2019-04-17 Thread Hao Sun
I think I found the root cause https://bugs.alpinelinux.org/issues/10126 I have to re-install nss after apk update/upgrade Hao Sun On Sun, Nov 11, 2018 at 10:50 AM Ufuk Celebi wrote: > Hey Hao, > > 1) Regarding Hadoop S3: are you using the repackaged Hadoop S3 > dependency f

Re: status on FLINK-7129

2019-04-23 Thread Hao Sun
+1 On Tue, Apr 23, 2019, 05:18 Vishal Santoshi wrote: > +1 > > On Tue, Apr 23, 2019, 4:57 AM kant kodali wrote: > >> Thanks all for the reply. I believe this is one of the most important >> feature that differentiates flink from other stream processing engines as >> others don't even have CEP y

Re: [VOTE] How to Deal with Split/Select in DataStream API

2019-07-04 Thread Hao Sun
Personally I prefer 3) to keep split/select and correct the behavior. I feel side output is kind of overkill for such a primitive function, and I prefer simple APIs like split/select. Hao Sun On Thu, Jul 4, 2019 at 11:20 AM Xingcan Cui wrote: > Hi folks, > > Two weeks ago, I started

Re: Graceful Task Manager Termination and Replacement

2019-07-11 Thread Hao Sun
I have a common interest in this topic. My k8s recycle hosts, and I am facing the same issue. Flink can tolerate this situation, but I am wondering if I can do better On Thu, Jul 11, 2019, 12:39 Aaron Levin wrote: > Hello, > > Is there a way to gracefully terminate a Task Manager beyond just kil

Re: [ANNOUNCE] Rong Rong becomes a Flink committer

2019-07-11 Thread Hao Sun
Congratulations Rong. On Thu, Jul 11, 2019, 11:39 Xuefu Z wrote: > Congratulations, Rong! > > On Thu, Jul 11, 2019 at 10:59 AM Bowen Li wrote: > >> Congrats, Rong! >> >> >> On Thu, Jul 11, 2019 at 10:48 AM Oytun Tez wrote: >> >> > Congratulations Rong! >> > >> > --- >> > Oytun Tez >> > >> > *M

Re: R/W traffic estimation between Flink and Zookeeper

2017-11-15 Thread Hao Sun
in your particular use case on some > small scale. > > Piotrek > > > On 11 Oct 2017, at 19:58, Hao Sun wrote: > > > > Hi Is there a way to estimate read/write traffic between flink and zk? > > I am looking for something like 1000 reads/sec or 1000 writes/sec. And > the size of the message. > > > > Thanks > >

Re: R/W traffic estimation between Flink and Zookeeper

2017-11-16 Thread Hao Sun
ytes for most messages and somewhere in the low two-digit > megabytes as a typical max size. > > Best, > Stefan > > Am 15.11.2017 um 18:41 schrieb Hao Sun : > > Thanks Piotr, does Flink read/write to zookeeper every time it process a > record? > I thought only JM use

Re: org.apache.flink.runtime.io.network.NetworkEnvironment causing memory leak?

2017-11-16 Thread Hao Sun
G1_Young_Generation is a counter of how many > gc cycles have been performed and the time is a sum. So naturally, those > values would always increase. > > Best, > Stefan > > > Am 15.11.2017 um 18:35 schrieb Hao Sun : > > > > Hi team, I am looking at some memory/GC i

Re: org.apache.flink.runtime.io.network.NetworkEnvironment causing memory leak?

2017-11-16 Thread Hao Sun
Sorry, the "killed" I mean here is JM lost the TM. The TM instance is still running inside kubernetes, but it is not responding to any requests, probably due to high load. And from JM side, JM lost heartbeat tracking of the TM, so it marked the TM as died. The „volume“ of Kafka topics, I mean, the

Task manager suddenly lost connection to JM

2017-11-16 Thread Hao Sun
Hi team, I see an wired issue that one of my TM suddenly lost connection to JM. Once the job running on the TM relocated to a new TM, it can reconnect to JM again. And after a while, the new TM running the same job will repeat the same process. It is not guaranteed the troubled TMs can reconnect to

Re: [EXTERNAL] difference between checkpoints & savepoints

2017-11-30 Thread Hao Sun
Hi team, I have one follow up question on this. There is a discussion on resuming jobs from *a saved external checkpoint*, I feel there are two aspects of that topic. *1. I do not have changes to the job, just want to resume the job from a failure.* I can see this automatically happen with ZK enab

Re: Questions about checkpoints/savepoints

2017-11-30 Thread Hao Sun
Hi team, I am a similar use case do we have any answers on this? When we trigger savepoint can we store that information to ZK as well? So I can avoid S3 file listing and do not have to use other external services? On Wed, Oct 25, 2017 at 11:19 PM vipul singh wrote: > As a followup to above, is

non-shared TaskManager-specific config file

2017-12-01 Thread Hao Sun
Hi team, I am wondering how can I create a non-shared config file and let Flink read it. Can I use include in the config? Or I have to prepare a different config for each TM? https://ci.apache.org/projects/flink/flink-docs-release-1.3/setup/config.html - taskmanager.hostname: The hostname

Trace jar file name from jobId, is that possible?

2017-12-01 Thread Hao Sun
Hi I am using Flink 1.3.2 on K8S, and need a deployment strategy for my app. I want to use savepoints to resume a job after each deployment. As you know I need jar file name and path to savepoints to resume a task. Currently `flink list` command only gives me job ids, not jar file names. And REST

Re: non-shared TaskManager-specific config file

2017-12-04 Thread Hao Sun
Thanks. If we can support include configuration dir that will be very helpful. On Mon, Dec 4, 2017, 00:50 Chesnay Schepler wrote: > You will have to create a separate config for each TaskManager. > > > On 01.12.2017 23:14, Hao Sun wrote: > > Hi team, I am wondering how can I c

Re: non-shared TaskManager-specific config file

2017-12-04 Thread Hao Sun
Sure, I will do that. On Mon, Dec 4, 2017, 07:26 Fabian Hueske wrote: > Can you create a JIRA issue to propose the feature? > > Thank you, > Fabian > > 2017-12-04 16:15 GMT+01:00 Hao Sun : > >> Thanks. If we can support include configuration dir that will be very >

Re: Trace jar file name from jobId, is that possible?

2017-12-04 Thread Hao Sun
e-1.3/monitoring/rest_api.html#submitting-programs > <https://ci.apache.org/projects/flink/flink-docs-release-1.3/monitoring/rest_api.html#submitting-programs> > > 2017-12-02 0:28 GMT+01:00 Hao Sun : > >> Hi I am using Flink 1.3.2 on K8S, and need a deployment strategy for my

Re: non-shared TaskManager-specific config file

2017-12-04 Thread Hao Sun
https://issues.apache.org/jira/browse/FLINK-8197, here is the JIRA link for xref. On Mon, Dec 4, 2017 at 7:35 AM Hao Sun wrote: > Sure, I will do that. > > On Mon, Dec 4, 2017, 07:26 Fabian Hueske wrote: > >> Can you create a JIRA issue to propose the feature? >>

Re: Trace jar file name from jobId, is that possible?

2017-12-07 Thread Hao Sun
Anything I can do for the job reschedule case? Thanks. Or is there a way to add job lifecycle hooks to trace it? On Mon, Dec 4, 2017 at 12:01 PM Hao Sun wrote: > Thanks Fabian, there is one case can not be covered by the REST API. When > a job rescheduled to run, but jobid will change,

Re: Trace jar file name from jobId, is that possible?

2017-12-07 Thread Hao Sun
I mean restarted during failure recovery On Thu, Dec 7, 2017 at 8:29 AM Fabian Hueske wrote: > What do you mean by rescheduled? > Started from a savepoint or restarted during failure recovery? > > > 2017-12-07 16:59 GMT+01:00 Hao Sun : > >> Anything I can do for the job

Re: Trace jar file name from jobId, is that possible?

2017-12-07 Thread Hao Sun
Let me check details, on top of my mind I remember the job id changes, I might be wrong. On Thu, Dec 7, 2017, 08:48 Fabian Hueske wrote: > AFAIK, a job keeps its ID in case of a recovery. > Did you observe something else? > > 2017-12-07 17:32 GMT+01:00 Hao Sun : > >> I

Re: [ANNOUNCE] Apache Flink 1.4.0 released

2017-12-12 Thread Hao Sun
Congratulations! Awesome work. Two quick questions about the HDFS free feature. I am using S3 to store checkpoints, savepoints, and I know it is being done through hadoop-aws. - Do I have to include a hadoop-aws jar in my flatjar AND flink's lib directory to make it work for 1.4? Both or just the

Could not flush and close the file system output stream to s3a, is this fixed?

2017-12-12 Thread Hao Sun
https://issues.apache.org/jira/browse/FLINK-7590 I have a similar situation with Flink 1.3.2 on K8S = 2017-12-13 00:57:12,403 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: KafkaSource(maxwell.tickets) -> MaxwellFilter->Maxwell(maxwell.tickets) -> FixedDelayWatermar

org.apache.zookeeper.ClientCnxn, Client session timed out

2017-12-27 Thread Hao Sun
Hi I need some help to figure out the root cause of this error. I am running flink 1.3.2 on K8S. My cluster has been up and running for almost two weeks and all of a sudden I see this familiar error again, my task manager is killed/lost. There are many ways cause this error, I need help to figure

Re: org.apache.zookeeper.ClientCnxn, Client session timed out

2017-12-27 Thread Hao Sun
Dec 27, 2017 at 4:41 PM, Hao Sun wrote: > >> Somehow TM detected JM leadership loss from ZK and self disconnected? >> And couple of seconds later, JM failed to connect to ZK? >> > > Yes, exactly as you describe. The TM noticed the loss of leadership before > the

Re: org.apache.zookeeper.ClientCnxn, Client session timed out

2017-12-28 Thread Hao Sun
Ok, thanks for the clarification. On Thu, Dec 28, 2017 at 1:05 AM Ufuk Celebi wrote: > On Thu, Dec 28, 2017 at 12:11 AM, Hao Sun wrote: > > Thanks! Great to know I do not have to worry duplicates inside Flink. > > > > One more question, why this happens? Because

scala 2.12 support/cross-compile

2018-01-02 Thread Hao Sun
Hi team, I am wondering if there is a schedule to support scala 2.12? If I need flink 1.3+ with scala 2.12, do I just have to cross compile myself? Is there anything blocking us from using scala 2.12? Thanks

Re: scala 2.12 support/cross-compile

2018-01-03 Thread Hao Sun
or Spark, which track their respective progress here: > https://issues.apache.org/jira/browse/SPARK-14540 > <https://issues.apache.org/jira/browse/SPARK-14540>. > > Best, > Aljoscha > > > On 3. Jan 2018, at 10:39, Stephan Ewen wrote: > > Hi Hao Sun! > > Th

akka.remote.ShutDownAssociation: Shut down address: akka.tcp://flink@fps-flink-jobmanager:45652

2018-01-07 Thread Hao Sun
I am running Flink 1.3.2 in my local docker environment. I see this error, not sure how to find the root cause. I am confused by this error message, why JM is trying to connect to JM from one random port to the RPC port: 6123? 2018-01-08 05:38:03,294 ERROR akka.remote.EndpointWriter - Associ

Re: state.checkpoints.dir

2018-01-22 Thread Hao Sun
We generate flink.conf on the fly, so we can use different values based on environment. On Mon, Jan 22, 2018 at 12:53 PM Biswajit Das wrote: > Hello , > > Is there any hack to supply *state.checkpoints.*dir as argument or JVM > parameter when running locally . I can change the source > *Checkp

Re: [ANNOUNCE] Apache Flink 1.4.1 released

2018-02-15 Thread Hao Sun
This is great! On Thu, Feb 15, 2018 at 2:50 PM Bowen Li wrote: > Congratulations everyone! > > On Thu, Feb 15, 2018 at 10:04 AM, Tzu-Li (Gordon) Tai > wrote: > >> The Apache Flink community is very happy to announce the release of >> Apache Flink 1.4.1, which is the first bugfix release for the

Can not cancel with savepoint with Flink 1.3.2

2018-03-15 Thread Hao Sun
Hi, I am running flink on K8S and store states in s3 with rocksdb backend. I used to be able to cancel and savepointing through the rest api. But sometimes the process never finish. No matter how many time I try. Is there a way to figure out what is going wrong? Why "isStoppable"=>false? Thanks

Re: Can not cancel with savepoint with Flink 1.3.2

2018-03-15 Thread Hao Sun
encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'. On Thu, Mar 15, 2018 at 8:38 PM Hao Sun wrote: > Hi, I am running flink on K8S and store states in s3 with rocksdb ba

How can I confirm a savepoint is used for a new job?

2018-03-21 Thread Hao Sun
Do we have any logs in JM/TM indicate the job is using a savepoint I passed in when I submit the job? Thanks

Re: How can I confirm a savepoint is used for a new job?

2018-03-26 Thread Hao Sun
ng job from savepoint ... > > Regards, > Timo > > Am 22.03.18 um 06:45 schrieb Hao Sun: > > Do we have any logs in JM/TM indicate the job is using a savepoint I > > passed in when I submit the job? > > > > Thanks > > >

Re: Flink and Docker ?

2018-04-03 Thread Hao Sun
Hi, we are using this docker on K8S + S3. https://github.com/docker-flink/docker-flink It works fine for us. On Tue, Apr 3, 2018 at 1:00 AM Christophe Salperwyck < christophe.salperw...@gmail.com> wrote: > Hi, > > I didn't try docker with Flink but I know that those guys did: > https://github.co

Re: Temporary failure in name resolution

2018-04-03 Thread Hao Sun
Hi Timo, we do have similar issue, TM got killed by a job. Is there a way to monitor JVM status? If through the monitor metrics, what metric I should look after? We are running Flink on K8S. Is there a possibility that a job consumes too much network bandwidth, so JM and TM can not connect? On Tue

Re: java.lang.Exception: TaskManager was lost/killed

2018-04-09 Thread Hao Sun
Same story here, 1.3.2 on K8s. Very hard to find reasons on why a TM is killed. Not likely caused by memory leak. If there is a logger I have turn on please let me know. On Mon, Apr 9, 2018, 13:41 Lasse Nedergaard wrote: > We see the same running 1.4.2 on Yarn hosted on Aws EMR cluster. The only

Re: keyBy and parallelism

2018-04-11 Thread Hao Sun
>From what I learnt, you have to control parallelism your self. You can set parallelism on operator or set default one through flink-config.yaml. I might be wrong. On Wed, Apr 11, 2018 at 2:16 PM Christophe Jolif wrote: > Hi all, > > Imagine I have a default parallelism of 16 and I do something

Re: [ANNOUNCE] Apache Flink 1.5.0 release

2018-05-25 Thread Hao Sun
This is great. Thanks for the effort to get this out! On Fri, May 25, 2018 at 9:47 AM Till Rohrmann wrote: > The Apache Flink community is very happy to announce the release of Apache > Flink 1.5.0. > > Apache Flink® is an open-source stream processing framework for > distributed, high-performin

Flink 1.5, failed to instantiate S3 FS

2018-06-01 Thread Hao Sun
I can not find anywhere I have 100M. Not sure why I get this failure. This is in my dev docker env. Same configure file worked well for 1.3.2 = Log Caused by: org.apache.flink.util.FlinkException: Failed to submit job aa75905062dd0487034bb9d8b6617dc2. at org.apache.flink.runtime

Do I still need hadoop-aws libs when using Flink 1.5 and Presto?

2018-06-01 Thread Hao Sun
I am trying to figure out how to use S3 as state storage. The recommended way is https://ci.apache.org/projects/flink/flink-docs-release-1.5/ops/deployment/aws.html#shaded-hadooppresto-s3-file-systems-recommended Seems like I only have to do two things: *1. Put flink-s3-fs-presto to the lib* *2. C

Re: Flink 1.5, failed to instantiate S3 FS

2018-06-02 Thread Hao Sun
Thanks Amit for checking. I do not use hadoop, but I am using Flink with bundled HDP 2.8 binary. I think this article is right, I mixed 2.7 lib and 2.8 binary somehow. On Sat, Jun 2, 2018 at 1:05 AM Amit Jain wrote: > Hi Hao, > > Have look over > https://issues.apache.org/jira/browse/HADOOP-1381

Re: Do I still need hadoop-aws libs when using Flink 1.5 and Presto?

2018-06-05 Thread Hao Sun
> > Also, could you also post the full stack trace, please? > > Best, > Aljoscha > > > On 2. Jun 2018, at 07:34, Hao Sun wrote: > > I am trying to figure out how to use S3 as state storage. > The recommended way is > https://ci.apache.org/projects/flink/flink-docs-

Re: Do I still need hadoop-aws libs when using Flink 1.5 and Presto?

2018-06-05 Thread Hao Sun
d yes, you add config values to the flink config as > s3.xxx. > > Best, > Aljoscha > > > On 5. Jun 2018, at 18:23, Hao Sun wrote: > > Thanks for pick up my question. I had s3a in the config now I removed it. > I will post a full trace soon, but want to get some quest

Re: Do I still need hadoop-aws libs when using Flink 1.5 and Presto?

2018-06-05 Thread Hao Sun
also a follow up question. Can I use all properties here? Should I remove `hive.` for all the keys? https://prestodb.io/docs/current/connector/hive.html#hive-configuration-properties More specifically how I configure sse for s3? On Tue, Jun 5, 2018 at 11:33 AM Hao Sun wrote: > I do not h

Re: Do I still need hadoop-aws libs when using Flink 1.5 and Presto?

2018-06-05 Thread Hao Sun
After I added these to my flink-conf.yml, everything works now. s3.sse.enabled: true s3.sse.type: S3 Thanks for the help! In general I also want to know what config keys for presto-s3 I can use. On Tue, Jun 5, 2018 at 11:43 AM Hao Sun wrote: > also a follow up question. Can I use

Re: [DISCUSS] Flink 1.6 features

2018-06-05 Thread Hao Sun
adding my vote to K8S Job mode, maybe it is this? > Smoothen the integration in Container environment, like "Flink as a Library", and easier integration with Kubernetes services and other proxies. On Mon, Jun 4, 2018 at 11:01 PM Ben Yan wrote: > Hi Stephan, > > Will [ https://issues.apache.or

StandaloneResourceManager failed to associate with JobManager leader

2017-08-15 Thread Hao Sun
Hi, I am trying to run a cluster of job-manager and task-manager in docker. One of each for now. I got a StandaloneResourceManager error, stating that it can not associate with job-manager. I do not know what was wrong. I am sure that job-manager can be connected. ===

Flink HA with Kubernetes, without Zookeeper

2017-08-20 Thread Hao Sun
Hi, I am new to Flink and trying to bring up a Flink cluster on top of Kubernetes. For HA setup, with kubernetes, I think I just need one job manager and do not need Zookeeper? I will store all states to S3 buckets. So in case of failure, kubernetes can just bring up a new job manager without losi

Re: Flink HA with Kubernetes, without Zookeeper

2017-08-21 Thread Hao Sun
;> where the JobManager stores information which needs to be recovered after >> the JobManager fails. >> >> We're eyeing https://github.com/coreos/zetcd >> <https://github.com/coreos/zetcd> as a way to run >> Zookeeper on top of Kubernetes' etcd cl

Re: StandaloneResourceManager failed to associate with JobManager leader

2017-08-22 Thread Hao Sun
Thanks Till, the DEBUG log level is a good idea. I figured it out. I made a mistake with `-` and `_`. On Tue, Aug 22, 2017 at 1:39 AM Till Rohrmann wrote: > Hi Hao Sun, > > have you checked that one can resolve the hostname flink_jobmanager from > within the container? This is

Re: Flink HA with Kubernetes, without Zookeeper

2017-08-22 Thread Hao Sun
> If you don’t want to actually rip way into the code for the Job Manager > the ETCD Operator <https://github.com/coreos/etcd-operator> would > be a good way to bring up an ETCD cluster that is separate from the core > Kubernetes ETCD database. Combined with zetcd you could probably hav

CheckpointConfig says to persist periodic checkpoints, but no checkpoint directory has been configured.

2017-09-25 Thread Hao Sun
Hi I am running flink in dev mode through Intellij, I have flink-conf.yaml correctly configured and from the log you can see job manager is reading it. 2017-09-25 20:41:52.255 [main] INFO org.apache.flink.configuration.GlobalConfiguration - *Loading configuration property: state.backend, rocksdb

Re: CheckpointConfig says to persist periodic checkpoints, but no checkpoint directory has been configured.

2017-09-26 Thread Hao Sun
ld try using > StreamExecutionEnvironment.createLocalEnvironment(int, Configuration) to > manually specify a configuration. > > Best, > Aljoscha > > On 26. Sep 2017, at 05:49, Hao Sun wrote: > > Hi I am running flink in dev mode through Intellij, I have flink-conf.yaml > correctly c

Re: CheckpointConfig says to persist periodic checkpoints, but no checkpoint directory has been configured.

2017-09-26 Thread Hao Sun
Thanks, I will try that. On Tue, Sep 26, 2017 at 8:24 AM Aljoscha Krettek wrote: > I'm not sure whether the JM is reading it or not. But you can manually set > the values on the Configuration using the setter methods. > > > On 26. Sep 2017, at 16:58, Hao Sun wrote: > >

javax.net.ssl.SSLPeerUnverifiedException: peer not authenticated for S3 access

2017-10-03 Thread Hao Sun
I am using S3 for checkpointing and external ckp as well. s3a://bucket/checkpoints/e58d369f5a181842768610b5ab6a500b I have this exception, and not sure what I can do with it. I guess to configure hadoop to use some SSLFactory? I am not using hadoop, I am on kubernetes (in AWS) with S3 Thanks!

Re: javax.net.ssl.SSLPeerUnverifiedException: peer not authenticated for S3 access

2017-10-04 Thread Hao Sun
n on the > server/client may be the cause. > > Which Flink binary (specifically, for which hadoop version) are you using? > > > On 03.10.2017 20:48, Hao Sun wrote: > > com.amazonaws.http.AmazonHttpClient - Unable to > execute H

Re: javax.net.ssl.SSLPeerUnverifiedException: peer not authenticated for S3 access

2017-10-04 Thread Hao Sun
Here is what my docker file says: ENV FLINK_VERSION=1.3.2 \ HADOOP_VERSION=27 \ SCALA_VERSION=2.11 \ On Wed, Oct 4, 2017 at 8:23 AM Hao Sun wrote: > I am running Flink 1.3.2 with docker on kubernetes. My docker is using > openjdk-8, I do not have hadoop, the version is 2.7, sc

TM get killed/disconnected after a while

2017-10-06 Thread Hao Sun
Hi, I am running Flink 1.3.2 on kubernetes, I am not sure why sometime one of my TM is killed, is there a way to debug this? Thanks = Logs *2017-10-05 22:36:42,631 INFO org.apache.flink.runtime.instance.InstanceManager - Registered TaskManager at fps-flink-taskmanager-2384273

How to make my execution graph prettier?

2017-10-09 Thread Hao Sun
Hi my execution graph looks like following, all things stuffed into on tile.[image: image.png] How can I get something like this?

Re: How to make my execution graph prettier?

2017-10-10 Thread Hao Sun
e is no shuffle between operations > and when the parallelism is the same (roughly speaking). > > If you wan't the graph to have separate tasks, you can disable chaining on > the Flink ExecutionConfig. This can lead to worse performance, though. > > Best, > Aljoscha > > On

  1   2   >