Re: How many task managers can Flink efficiently scale to?

2019-08-11 Thread qi luo
Hi Chad, In our cases, 1~2k TMs with up to ~10k TM slots are used in one job. In general, the CPU/memory of Job Manager should be increased with more TMs. Regards, Qi > On Aug 11, 2019, at 2:03 AM, Chad Dombrova wrote: > > Hi, > I'm still on my task management investigation, and I'm curious t

Re: [ANNOUNCE] Apache Flink 1.9.0 released

2019-08-22 Thread qi luo
Congratulations and thanks for the hard work! Qi > On Aug 22, 2019, at 8:03 PM, Tzu-Li (Gordon) Tai wrote: > > The Apache Flink community is very happy to announce the release of Apache > Flink 1.9.0, which is the latest major release. > > Apache Flink® is an open-source stream processing fra

Streaming write to Hive

2019-09-04 Thread Qi Luo
Hi guys, In Flink 1.9 HiveTableSink is added to support writing to Hive, but it only supports batch mode. StreamingFileSink can write to HDFS in streaming mode, but it has no Hive related functionality (e.g. adding Hive partition). Is there any easy way we can streaming write to Hive (with exactl

Re: Streaming write to Hive

2019-09-04 Thread Qi Luo
ther tweaks you need to make. > > It's on our list for 1.10, not high priority though. > > Bowen > > On Wed, Sep 4, 2019 at 2:23 AM Qi Luo wrote: > >> Hi guys, >> >> In Flink 1.9 HiveTableSink is added to support writing to Hive, but it >> only support

Re: Streaming write to Hive

2019-09-05 Thread Qi Luo
i > Send Time:2019年9月6日(星期五) 05:21 > To:Qi Luo > Cc:user ; snake.fly318 ; > lichang.bd > Subject:Re: Streaming write to Hive > > Hi, > > I'm not sure if there's one yet. Feel free to create one if not. > > On Wed, Sep 4, 2019 at 11:28 PM Qi Luo wrote: > Hi

Re: Flink Exception - assigned slot container was removed

2018-11-26 Thread qi luo
This is weird. Could you paste your entire exception trace here? > On Nov 26, 2018, at 4:37 PM, Flink Developer > wrote: > > In addition, after the Flink job has failed from the above exception, the > Flink job is unable to recover from previous checkpoint. Is this the expected > behavior? Ho

Re: Cannot configure akka.ask.timeout

2018-12-11 Thread qi luo
Hi Alex and Lukas, This error is controlled by another RPC timeout (which is hard coded and not affected by “akka.ask.timeout”). Could you open an JIRA issue so I can propose a fix on that? Cheers, Qi > On Dec 12, 2018, at 7:07 AM, Alex Vinnik wrote: > > Hi there, > > Run into the same prob

Re: Cannot configure akka.ask.timeout

2018-12-12 Thread qi luo
om>> wrote: > Hi Qi, > > Thanks for looking into this. Here is ticket > https://issues.apache.org/jira/browse/FLINK-11143 > <https://issues.apache.org/jira/browse/FLINK-11143> > > Best, > -Alex > > On Tue, Dec 11, 2018 at 8:47 PM qi luo <mailto:lu

Re: Cannot configure akka.ask.timeout

2018-12-13 Thread qi luo
MERATE_NESTED_FILES_FLAG, > true); > > JsonLinesInputFormat jsonInputFormat = new JsonLinesInputFormat(new > Path(inputPath), configuration); > jsonInputFormat.setFilesFilter(new BucketingSinkFilter()); > > DataSet input = env.readFile(jsonInputFormat, > inputPath).withParameters(co

Re: Serious stability issues when running on YARN (Flink 1.7.0)

2018-12-20 Thread qi luo
Hi Gyula, Your issue is possibly related to [1] that slots prematurely released. I’ve raised a PR which is still pending review. [1] https://issues.apache.org/jira/browse/FLINK-10941 > On Dec 20, 2018, at 9:33 PM, Gyula Fóra wrote: > > Hi! > > Since we have moved to the new execution mode wi

Re: Is there a better way to debug ClassNotFoundException from FlinkUserCodeClassLoaders?

2019-01-02 Thread qi luo
Hi Hao, Since Flink is using Child-First class loader, you may try search for the class "com.zendesk.fraudprevention.examples.ConnectedStreams$$anon$90$$anon$45” in your fat JAR. Is that an inner class? Best, Qi > On Jan 3, 2019, at 7:01 AM, Hao Sun wrote: > > Hi, > > I am wondering if the

Set partition number of Flink DataSet

2019-03-11 Thread qi luo
Hi, We’re trying to distribute batch input data to (N) HDFS files partitioning by hash using DataSet API. What I’m doing is like: env.createInput(…) .partitionByHash(0) .setParallelism(N) .output(…) This works well for small number of files. But when we need to distribute to

Re: Set partition number of Flink DataSet

2019-03-12 Thread qi luo
arallelizing the reads from the files, given > the parallelism that you’re specifying. > > — Ken > > >> On Mar 11, 2019, at 5:42 AM, qi luo > <mailto:luoqi...@gmail.com>> wrote: >> >> Hi, >> >> We’re trying to distribute batch input data

Re: Set partition number of Flink DataSet

2019-03-13 Thread qi luo
//ci.apache.org/projects/flink/flink-docs-release-1.7/api/java/org/apache/flink/api/common/functions/MapPartitionFunction.html> > would be easiest, I think. > > — Ken > > > >> On Mar 12, 2019, at 2:28 AM, qi luo > <mailto:luoqi...@gmail.com>> wrote: >

Re: Set partition number of Flink DataSet

2019-03-13 Thread qi luo
n Mar 13, 2019, at 1:26 AM, qi luo > <mailto:luoqi...@gmail.com>> wrote: >> >> Hi Ken, >> >> Do you mean that I can create a batch sink which writes to N files? > > Correct. > >> That sounds viable, but since our data size is huge (billions of r

Re: Set partition number of Flink DataSet

2019-03-14 Thread qi luo
/flink-utils/>, for a rough but working > version of a bucketing sink. > > — Ken > > >> On Mar 13, 2019, at 7:46 PM, qi luo > <mailto:luoqi...@gmail.com>> wrote: >> >> Hi Ken, >> >> Agree. I will try partitonBy() to reducer the number

Re: Set partition number of Flink DataSet

2019-03-15 Thread qi luo
uffles are an important building block for this and there might be > better support for your use case in the future. > > Best, Fabian > > Am Fr., 15. März 2019 um 03:56 Uhr schrieb qi luo <mailto:luoqi...@gmail.com>>: > Hi Ken, > > That looks awesome! I’ve

Re: Set partition number of Flink DataSet

2019-03-21 Thread qi luo
K-10429 > <https://issues.apache.org/jira/browse/FLINK-10429> > > > Am Fr., 15. März 2019 um 12:13 Uhr schrieb qi luo <mailto:luoqi...@gmail.com>>: > Hi Fabian, > > I understand this is a by-design behavior, since Flink is firstly built for > streaming. Su

Re: RemoteTransportException: Connection unexpectedly closed by remote task manager

2019-03-28 Thread qi luo
Hi Yinhua, This looks like the TM executing the sink is down, maybe due to OOM or some other error. You can check the JVM heap and GC log to see if there’re any clues. Regards, Qi > On Mar 28, 2019, at 7:23 PM, yinhua.dai wrote: > > Hi, > > I write a single flink job with flink SQL with vers

Infinitely requesting for Yarn container in Flink 1.5

2019-03-29 Thread qi luo
Hello, Today we encountered an issue where our Flink job request for Yarn container infinitely. In the JM log as below, there were errors when starting TMs (caused by underlying HDFS errors). So the allocated container failed and the job kept requesting for new containers. The failed containers

Re: Infinitely requesting for Yarn container in Flink 1.5

2019-03-31 Thread qi luo
> > Thanks, > Rong > > [1] https://issues.apache.org/jira/browse/FLINK-10868 > <https://issues.apache.org/jira/browse/FLINK-10868> > On Fri, Mar 29, 2019 at 5:09 AM qi luo <mailto:luoqi...@gmail.com>> wrote: > Hello, > > Today we encountered an issue

Re: Any suggestions about which GC collector to use in Flink?

2019-04-02 Thread qi luo
+1. It would be great if someone could benchmark between difference GC in Flink (we may do it in next few months). I’m told that the default parallel GC provides better throughput but longer pauses (we encountered 2min+ GC pauses in large dataset). Whereas the G1GC provides less pauses but also

TM occasionally hang in deploying state in Flink 1.5

2019-04-19 Thread qi luo
Hi all, We use Flink 1.5 batch and start thousands of jobs per day. Occasionally we observed some stuck jobs, due to some TM hang in “DEPLOYING” state. On checking TM log, it shows that it stuck in downloading jars in BlobClient: ... INFO org.apache.flink.runtime.taskexecutor.TaskExecuto

Re: TM occasionally hang in deploying state in Flink 1.5

2019-04-25 Thread qi luo
.run(Thread.java:748) I checked the latest master code. There’s still no socket timeout in Blob client. Should I create an issue to add this timeout? Regards, Qi > On Apr 19, 2019, at 7:49 PM, qi luo wrote: > > Hi all, > > We use Flink 1.5 batch and start thousands o

Re: TM occasionally hang in deploying state in Flink 1.5

2019-05-07 Thread qi luo
ly and isn't easy to reproduce. > On Apr 25, 2019, at 6:40 PM, Dawid Wysakowicz wrote: > > Hi, > > Feel free to open a JIRA for this issue. By the way have you investigated > what is the root cause for it hanging? > > Best, > > Dawid > > On 25/04/2019 0

Re: Batch mode with Flink 1.8 unstable?

2019-06-26 Thread qi luo
Hi Stephan, We have met similar issues described as Ken. Would all these issues be hopefully fixed in 1.9? Thanks, Qi > On Jun 26, 2019, at 10:50 PM, Stephan Ewen wrote: > > Hi Ken! > > Sorry to hear you are going through this experience. The major focus on > streaming so far means that the

Re: YarnResourceManager unresponsive under heavy containers allocations

2019-07-09 Thread qi luo
> other reasons? Or you can try to increase the value of > `taskmanager.registration.timeout`. For allocating containers using > multi-thread, I personally think it's going to get very complicated, and the > more recommended way is to put some waiting works into asynchronous

Re: YarnResourceManager unresponsive under heavy containers allocations

2019-07-10 Thread qi luo
gt; need to wait until the release code branch being cut. > > Thank you~ > Xintong Song > > > On Wed, Jul 10, 2019 at 1:55 PM qi luo <mailto:luoqi...@gmail.com>> wrote: > Thanks Xintong and Haibo, I’ve found the fix in the blink branch. > > We’re also gl

Job leak in attached mode (batch scenario)

2019-07-15 Thread qi luo
Hi guys, We runs thousands of Flink batch job everyday. The batch jobs are submitted in attached mode, so we can know from the client when the job finished and then take further actions. To respond to user abort actions, we submit the jobs with "—shutdownOnAttachedExit” so the Flink cluster can

Re: Job leak in attached mode (batch scenario)

2019-07-17 Thread qi luo
chanism now. To achieve this, I think it > may be necessary to add a REST-based heartbeat mechanism between Dispatcher > and Client. At present, perhaps you can add a monitoring service to deal with > these residual Flink clusters. > > Best, > Haibo > > At 2019-07-