Re: MaxMetaspace default may be too low?

2020-02-24 Thread Xintong Song
In that case, I think the default metaspace size is too small for your setup. The default configurations are not intended for such large task managers. In Flink 1.8 we do not set the JVM '-XX:MaxMetaspaceSize' parameter, which means you have 'unlimited' metaspace size. We changed that in Flink 1.10

state schema evolution for case classes

2020-02-24 Thread ApoorvK
Hi Team, Earlier we developed on Flink 1.6.2, so there are lots of case classes which contain Maps and nested case classes, for example: case class MyCaseClass(var a: Boolean, var b: Boolean, var c: Boolean,

Re: Timeseries aggregation with many IoT devices off of one Kafka topic.

2020-02-24 Thread theo.diefent...@scoop-software.de
Hi, At the last Flink Forward in Berlin I spoke with some people about the same problem; they had construction devices as IoT sensors which could even be offline for multiple days. They told me that the major problem for them was that the watermark in Flink is maintained per operator/subtask, e

Re: MaxMetaspace default may be too low?

2020-02-24 Thread John Smith
I would also like to add that the exact same jobs were running perfectly fine on Flink 1.8. On Tue, 25 Feb 2020 at 00:20, John Smith wrote: > Right after Job execution. Basically as soon as I deployed a 5th job. So > at 4 jobs it was ok, at 5 jobs it would take like 1-2 minutes max and the > node wo

Re: MaxMetaspace default may be too low?

2020-02-24 Thread John Smith
Right after Job execution. Basically as soon as I deployed a 5th job. So at 4 jobs it was ok, at 5 jobs it would take like 1-2 minutes max and the node would just shut off. So far with MaxMetaSpace 256m it's been stable. My task nodes are 16GB and the memory config is done as follows... taskmanager

Re: Timeseries aggregation with many IoT devices off of one Kafka topic.

2020-02-24 Thread hemant singh
Hello, I am also working on something similar. Below is the pipeline design I have; I am sharing it in case it is helpful. topic -> keyed stream on device-id -> window operation -> sink. You can PM me for further details. Thanks, Hemant On Tue, Feb 25, 2020 at 1:54 AM Marco Villalobos wrote: > I n
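The pipeline shape above (key by device id, then window, then emit) can be illustrated without Flink. The following is a minimal plain-Java sketch, not Flink API: the class `DeviceWindowSketch`, the `Reading` type, and the `device@windowStart` key format are all hypothetical, it assumes non-negative millisecond timestamps, and it ignores watermarks and late data entirely.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only: groups readings per device and tumbling window.
final class DeviceWindowSketch {

    /** One reading: device name, value, and event-time timestamp in milliseconds. */
    static final class Reading {
        final String device;
        final double value;
        final long timestampMs;

        Reading(String device, double value, long timestampMs) {
            this.device = device;
            this.value = value;
            this.timestampMs = timestampMs;
        }
    }

    /**
     * Sums values per (device, tumbling window). Because the key contains the
     * device id, the arrival order of readings across devices does not matter.
     */
    static Map<String, Double> sumPerDeviceWindow(List<Reading> readings, long windowMs) {
        Map<String, Double> sums = new TreeMap<>();
        for (Reading r : readings) {
            // Align the timestamp to the start of its window (assumes timestampMs >= 0).
            long windowStart = r.timestampMs - (r.timestampMs % windowMs);
            String key = r.device + "@" + windowStart;
            sums.merge(key, r.value, Double::sum);
        }
        return sums;
    }
}
```

In a real Flink job the grouping and windowing would be done by the framework's keyed stream and window operators; the sketch only shows why keying on the device id isolates each device's timeline.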

Re: Flink on Kubernetes - Session vs Job cluster mode and storage

2020-02-24 Thread Yang Wang
Hi M Singh, > Mans - If we use the session based deployment option for K8 - I thought > K8 will automatically restart any failed TM or JM. > In the case of failed TM - the job will probably recover, but in the case > of failed JM - perhaps we need to resubmit all jobs. > Let me know if I have mis

Re: MaxMetaspace default may be too low?

2020-02-24 Thread Xintong Song
Hi John, The default metaspace size is intended to work for the majority of jobs. We are aware that for some jobs that need to load lots of classes, the default value might not be large enough. However, having a larger default value means for other jobs that do not load many classes, the

Re: Colocating Sub-Tasks across jobs / Sharing Task Slots across jobs

2020-02-24 Thread Xintong Song
Hi Ben, You cannot share slots across jobs. Flink adopts a two-level slot scheduling mechanism. Slots are first allocated to each job, then the JobMaster decides which tasks should be executed in which slots, i.e. slot sharing. I think what you are looking for is Pipelined Region Restart Strat

Re: yarn session: one JVM per task

2020-02-24 Thread Xintong Song
Depending on your Flink version, the '-n' option might not take effect. It is removed in the latest release, but before that there were a few versions where this option is neither removed nor taking effect. Anyway, as long as you have multiple containers, I don't think there's a way to make some o

MaxMetaspace default may be too low?

2020-02-24 Thread John Smith
Hi, I just upgraded to 1.10 and I started deploying my jobs. Eventually task nodes started shutting down with OutOfMemory Metaspace. I looked at the logs and the task managers are started with: -XX:MaxMetaspaceSize=100663296 So I configured: taskmanager.memory.jvm-metaspace.size: 256m It seems to be
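For comparison, the value in the JVM flag above is given in bytes, while the configured value uses a size suffix. A minimal Java sketch of the conversion, assuming the `m` suffix means mebibytes (1024 * 1024 bytes); the class and method names are hypothetical helpers and not part of Flink:

```java
// Hypothetical helper, not part of Flink: converts between bytes and MiB so the
// logged JVM flag can be compared with the configured metaspace size.
final class MetaspaceSizes {
    private static final long BYTES_PER_MIB = 1024L * 1024L;

    /** Converts a byte count to whole mebibytes (integer division). */
    static long bytesToMiB(long bytes) {
        return bytes / BYTES_PER_MIB;
    }

    /** Converts mebibytes to bytes. */
    static long miBToBytes(long mib) {
        return mib * BYTES_PER_MIB;
    }

    public static void main(String[] args) {
        long reported = 100_663_296L; // value from -XX:MaxMetaspaceSize in the log
        System.out.println(bytesToMiB(reported) + " MiB"); // prints "96 MiB"
        System.out.println(miBToBytes(256) + " bytes");    // prints "268435456 bytes"
    }
}
```

Under that assumption, the task managers were started with a 96 MiB metaspace limit, and the 256m setting raises it to roughly 2.7 times that.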

Re: MODERATE for d...@flink.apache.org

2020-02-24 Thread Henry Saputra
Hi Sri, Thank you for your interest in Apache Flink. To continue to interact with people in the mailing list, please subscribe to the list [1] to make sure your posts are delivered to the right list. Thanks, Henry [1] https://flink.apache.org/community.html#how-to-subscribe-to-a-mailing-list

Batch Flink Job S3 write performance vs Spark

2020-02-24 Thread sri hari kali charan Tummala
Hi All, I have a question: has anyone compared the performance of a Flink batch job writing to S3 vs. Spark writing to S3? -- Thanks & Regards Sri Tummala

Re: AWS Client Builder with default credentials

2020-02-24 Thread Suneel Marthi
Not sure if this helps - this is how I invoke a Sagemaker endpoint model from a flink pipeline. See https://github.com/smarthi/NMT-Sagemaker-Inference/blob/master/src/main/java/de/dws/berlin/util/AwsUtil.java On Mon, Feb 24, 2020 at 10:08 AM David Magalhães wrote: > Hi Robert, thanks for your

Re: AWS Client Builder with default credentials

2020-02-24 Thread sri hari kali charan Tummala
check this. https://github.com/kali786516/FlinkStreamAndSql/blob/b8bcbadaa3cb6bfdae891f10ad1205e256adbc1e/src/main/scala/com/aws/examples/dynamodb/dynStreams/FlinkDynamoDBStreams.scala#L42 https://github.com/kali786516/FlinkStreamAndSql/blob/b8bcbadaa3cb6bfdae891f10ad1205e256adbc1e/src/main/scala

Re: Apache Flink Job fails repeatedly due to RemoteTransportException

2020-02-24 Thread M Singh
Thanks, I will try your recommendations, and I apologize for the delayed response. On Wednesday, January 29, 2020, 09:58:26 AM EST, Till Rohrmann wrote: Hi M Singh, have you checked the TaskManager logs of ip-xx-xxx-xxx-xxx.ec2.internal/xx.xxx.xxx.xxx:39623 for any suspicious logging statem

Re: Flink on Kubernetes - Session vs Job cluster mode and storage

2020-02-24 Thread M Singh
Thanks Wang for your detailed answers. From what I understand, native_kubernetes also leans towards creating a session and submitting a job to it. Regarding other recommendations, please see my inline comments and advise. On Sunday, February 23, 2020, 10:01:10 PM EST, Yang Wang wrote:

Getting javax.management.InstanceAlreadyExistsException when upgraded to 1.10

2020-02-24 Thread John Smith
Hi. Just upgraded to 1.10.0 and I am getting the error below when I deploy my tasks. The first one seems to deploy ok, but subsequent ones seem to throw this error. But they still seem to work. javax.management.InstanceAlreadyExistsException: kafka.consumer:type=app-info,id=consumer-2 at com.sun.jm

Timeseries aggregation with many IoT devices off of one Kafka topic.

2020-02-24 Thread Marco Villalobos
I need to collect timeseries data from thousands of IoT devices. Each device has name, value, and timestamp published to one Kafka topic. The event time timestamps are in order only in relation to the individual device, but out of order with respect to other devices. Is there a way to aggregate

Map Of DataStream getting NullPointer Exception

2020-02-24 Thread aj
I am trying the piece of code below to create multiple DataStream objects and store them in a map. for (EventConfig eventConfig : eventTypesList) { LOGGER.info("creating a stream for ", eventConfig.getEvent_name()); String key = eventConfig.getEvent_name(); final Streaming

Re: async io parallelism

2020-02-24 Thread Alexey Trenikhun
Arvid, thank you. So there is a single instance of FIFO per async IO operator regardless of parallelism of the async IO operator? Thanks, Alexey From: Arvid Heise Sent: Saturday, February 22, 2020 1:23:01 PM To: Alexey Trenikhun Cc: user@flink.apache.org Subject:

Colocating Sub-Tasks across jobs / Sharing Task Slots across jobs

2020-02-24 Thread Benoît Paris
Hello all! I have a setup composed of several streaming pipelines. These have different deployment lifecycles: I want to be able to modify and redeploy the topology of one while the other is still up. I am thus putting them in different jobs. The problem is I have a Co-Location constraint between

Re: AWS Client Builder with default credentials

2020-02-24 Thread David Magalhães
Hi Robert, thanks for your reply. GlobalConfiguration.loadConfiguration was useful to check if a flink-conf.yml file was on resources, for the integration tests that I'm doing. On the cluster I will use the default configurations. On Fri, Feb 21, 2020 at 10:58 AM Robert Metzger wrote: > There a

Re: yarn session: one JVM per task

2020-02-24 Thread David Morin
Hi, Thanks Xintong. I've noticed that when I use yarn-session.sh with the --slots (-s) parameter but without --container (-n), it creates one task/slot per taskmanager. Before, with both -n and -s, that was not the case. I prefer to use only small containers with only one task to scale my pipeline and

Re: StreamingFileSink Not Flushing All Data

2020-02-24 Thread Kostas Kloudas
Hi Austin, Dawid is correct in that you need to enable checkpointing for the StreamingFileSink to work. I hope this solves the problem, Kostas On Mon, Feb 24, 2020 at 11:08 AM Dawid Wysakowicz wrote: > > Hi Austin, > > If I am not mistaken the StreamingFileSink by default flushes on checkpoint

Re: StreamingFileSink Not Flushing All Data

2020-02-24 Thread Dawid Wysakowicz
Hi Austin, If I am not mistaken the StreamingFileSink by default flushes on checkpoints. If you don't have checkpoints enabled it might happen that not all data is flushed. I think you can also adjust that behavior with: forBulkFormat(...) .withRollingPolicy(/* your custom logic */) I also cc

Java implementations of Streaming applications for Flink

2020-02-24 Thread Piper Piper
Hi all, The examples in the Flink github repo do not seem to include many standard streaming applications compared to the batch examples. Where can I get standard (recommended) Java implementations of “Streaming” applications for Flink, that are clearly: (1) CPU-intensive, like streaming PageRank

Re: [DISCUSS] Drop Savepoint Compatibility with Flink 1.2

2020-02-24 Thread Congxian Qiu
+1 for dropping savepoint compatibility with Flink 1.2 Best, Congxian Dawid Wysakowicz wrote on Mon, Feb 24, 2020 at 4:00 PM: > +1 for dropping > > Best, > > Dawid > On 24/02/2020 08:22, Yu Li wrote: > > +1 for dropping savepoint compatibility with Flink 1.2. > > Best Regards, > Yu > > > On Sat, 22 Feb 2020

Re: [DISCUSS] Drop Savepoint Compatibility with Flink 1.2

2020-02-24 Thread Dawid Wysakowicz
+1 for dropping Best, Dawid On 24/02/2020 08:22, Yu Li wrote: > +1 for dropping savepoint compatibility with Flink 1.2. > > Best Regards, > Yu > > > On Sat, 22 Feb 2020 at 22:05, Ufuk Celebi > wrote: > > Hey Stephan, > > +1. > > Reading over the linked ticket