Re: Re: Questions of "State Processing API in Scala"

2020-09-01 Thread Matthias Pohl
for the State > >Processor API, but it's just that up to this point, we didn't have a plan > >for that yet. > >Can you open a JIRA for this? I think it'll be a reasonable extension to > >the API. > > > > > >> > >> And when I c

Re: How can I drop events which are late by more than X hours/days?

2020-09-24 Thread Matthias Pohl
the difference between the watermark and my element's timestamp is > greater than X - drop the element. > > However, I do not have access to the current watermark inside any of > Flink's operators/functions including FilterFunction. > > How can such functionality be

Re: global state and single stream

2020-09-24 Thread Matthias Pohl
Hi Adam, sorry for the late reply. Introducing a global state is something that should be avoided as it introduces bottlenecks and/or concurrency/order issues. Broadcasting the state between different subtasks will also bring a loss in performance since each state change has to be shared with every

Re: Ignoring invalid values in KafkaSerializationSchema

2020-09-24 Thread Matthias Pohl
Hi Yuval, thanks for bringing this issue up. You're right: There is no error handling currently implemented for SerializationSchema. FLIP-124 [1] addressed this for the DeserializationSchema, though. I created FLINK-19397 [2] to cover this feature. In the meantime, I cannot think of any other solu

Fwd: Flink memory usage monitoring

2020-10-27 Thread Matthias Pohl
I missed adding the mailing list in my previous email. -- Forwarded message - From: Matthias Pohl Date: Tue, Oct 27, 2020 at 12:39 PM Subject: Re: Flink memory usage monitoring To: Rajesh Payyappilly Jose Hi Rajesh, thanks for reaching out to us. We worked on providing metrics

Re: Kubernetes Job Cluster, does it autoterminate?

2020-10-28 Thread Matthias Pohl
Hi Ruben, thanks for reaching out to us. Flink's native Kubernetes Application mode [1] might be what you're looking for. Best, Matthias [1] https://ci.apache.org/projects/flink/flink-docs-release-1.11/ops/deployment/native_kubernetes.html#flink-kubernetes-application On Wed, Oct 28, 2020 at 11:

Re: Running flink in a Local Execution Environment for Production Workloads

2020-10-29 Thread Matthias Pohl
Hi Joseph, thanks for reaching out to us. There shouldn't be any downsides other than the one you already mentioned as far as I know. Best, Matthias On Fri, Oct 23, 2020 at 1:27 PM Joseph Lorenzini wrote: > Hi all, > > > > I plan to run flink jobs as docker containers in a AWS Elastic Container

Re: a couple of memory questions

2020-11-05 Thread Matthias Pohl
Hello Edward, please find my answers within your message below: On Wed, Nov 4, 2020 at 1:35 PM Colletta, Edward wrote: > Using Flink 1.9.2 with FsStateBackend, Session cluster. > > > >1. Does heap state get cleaned up when a job is cancelled? > > We have jobs that we run on a daily basis. W

Re: Job crash in job cluster mode

2020-11-10 Thread Matthias Pohl
Hi Tim, I'm not aware of any memory-related issues being related to the deployment mode used. Have you checked the logs for hints? Additionally, you could try to extract a heap dump. That might help you in analyzing the cause of the memory consumption. The TaskManager and JobManager are logging th

Re: Caching Mechanism in Flink

2020-11-11 Thread Matthias Pohl
Hi Iacovos, The task's off-heap configuration value is used when spinning up TaskManager containers in a clustered environment. It will contribute to the overall memory reserved for a TaskManager container during deployment. This parameter can be used to influence the amount of memory allocated if

Re: SSL setup for YARN deployment when hostnames are unknown.

2020-11-11 Thread Matthias Pohl
Hi Jiahui, thanks for reaching out to the mailing list. This is not something I have expertise in. But have you checked out the Flink SSL Setup documentation [1]? Maybe, you'd find some help there. Additionally, I did go through the code a bit: A SecurityContext is loaded during ClusterEntrypoint

Re: Caching Mechanism in Flink

2020-11-11 Thread Matthias Pohl
on. I find that the off-heap > is used when Flink uses HybridMemorySegments. Well, how the Flink knows > when to use these HybridMemorySegments and in which operations this is > happened? > > Best, > Iacovos > On 11/11/20 11:41 π.μ., Matthias Pohl wrote: > > Hi Iacovos, > The task

Re: Data loss exception using hash join in batch mode

2020-11-13 Thread Matthias Pohl
Hi 键, we would need more context on your case (e.g. logs and more details on what you're doing exactly or any other useful information) to help. Best, Matthias On Thu, Nov 12, 2020 at 3:25 PM 键 <1941890...@qq.com> wrote: > Data loss exception using hash join in batch mode >

Re: Job is still in ACTIVE state after /jobs/:jobid/stop

2020-11-13 Thread Matthias Pohl
Hi Averell, thanks for sharing this with the Flink community. Is there anything suspicious in the logs which you could share? Best, Matthias On Fri, Nov 13, 2020 at 2:27 AM Averell wrote: > I have some updates. Some weird behaviours were found. Please refer to the > attached photo. > > All requ

Re: Logs of JobExecutionListener

2020-11-13 Thread Matthias Pohl
Hi Flavio, thanks for sharing this with the Flink community. Could you answer the following questions, please: - What's the code of your Job's main method? - What cluster backend and application do you use to execute the job? - Is there anything suspicious you can find in the logs that might be rel

Re: Filter By Value in List

2020-11-13 Thread Matthias Pohl
Hi Rex, after verifying with Timo I created a new issue to address your proposal of introducing a new operator [1]. Feel free to work on that one if you like. Best, Matthias [1] https://issues.apache.org/jira/browse/FLINK-20148 On Thu, Nov 5, 2020 at 6:35 PM Rex Fenley wrote: > Thanks Timo, >

Re: Assistance configuring access to GoogleCloudStorage for row format streaming file sink

2020-11-13 Thread Matthias Pohl
erval(TimeUnit.MINUTES.toMillis(1)) > .withMaxPartSize(1024 * 1024 * 1024) > .build()) > .build(); > > //input.print(); > input.addSink(sink); > > > Not sure what else to try. Any pointers appreciated. > > > > Sent with ProtonMail <https://protonmail.com&g

Re: Is possible that make two operators always locate in same taskmanager?

2020-11-13 Thread Matthias Pohl
Hi Si-li, trying to answer your initial question: Theoretically, you could try using the co-location constraints to achieve this. But keep in mind that this might lead to multiple Join operators running in the same JVM reducing the amount of memory each operator can utilize. Best, Matthias On Mon

Re: Questions on Table to Datastream conversion and onTimer method in KeyedProcessFunction

2020-11-13 Thread Matthias Pohl
Hi Fuyao, for your first question about the different behavior depending on whether you chain the methods or not: Keep in mind that you have to save the return value of the assignTimestampsAndWatermarks method call if you don't chain the methods together as it is also shown in [1]. At least the fol

Re: Batch compressed file output

2020-11-27 Thread Matthias Pohl
Hi Flavio, others might have better ideas to solve this but I'll give it a try: Have you considered extending FileOutputFormat to achieve what you need? That approach (which is discussed in [1]) sounds like something you could do. Another pointer I want to give is the DefaultRollingPolicy [2]. It l

Re: Flink jobmanager TLS connectivity to Zookeeper

2020-12-10 Thread Matthias Pohl
gt; some way that I can go about doing this? > > -- Matthias Pohl | Engineer Follow us @VervericaData Ververica <https://www.ververica.com/> -- Join Flink Forward <https://flink-forward.org/> - The Apache Flink Conference Stream Processing | Event Driven | Real Time -- V

Re: Accumulators storage path

2020-12-10 Thread Matthias Pohl
Hi Hanan, thanks for reaching out to the Flink community. Have you considered changing io.tmp.dirs [1][2]? Best, Matthias [1] https://ci.apache.org/projects/flink/flink-docs-release-1.9/ops/config.html#io-tmp-dirs [2] https://ci.apache.org/projects/flink/flink-docs-release-1.9/ops/deployment/clus

Re: Putting record on kinesis (actually kinesalite) results with The security token included in the request is invalid.

2020-12-10 Thread Matthias Pohl
Hi Avi, thanks for reaching out to the Flink community. I haven't worked with the KinesisConsumer. Unfortenately, I cannot judge whether there's something missing in your setup. But first of all: Could you confirm that the key itself is valid? Did you try to use it in other cases? Best, Matthias

Re: Putting record on kinesis (actually kinesalite) results with The security token included in the request is invalid.

2020-12-11 Thread Matthias Pohl
Dec 10, 2020 at 6:53 PM Matthias Pohl > wrote: > >> Hi Avi, >> thanks for reaching out to the Flink community. I haven't worked with the >> KinesisConsumer. Unfortenately, I cannot judge whether there's something >> missing in your setup. But first of all: Could

Re: Official Flink 1.12.0 image

2020-12-22 Thread Matthias Pohl
Hi Robert, there is a discussion about it in FLINK-20632 [1]. PR #9249 [2] still needs to get reviewed. You might want to follow that PR as Xintong suggested in [1]. I hope that helps. Best, Matthias [1] https://issues.apache.org/jira/browse/FLINK-20632 [2] https://github.com/docker-library/offic

Re: Idea import Flink source code

2021-01-12 Thread Matthias Pohl
Hi, you might want to move these kinds of questions into the user@flink.apache.org which is the mailing list for community support questions [1]. Coming back to your question: Is it just me or is the image not accessible? Could you provide a textual description of your problem? Best, Matthias [1

Re: Re: Idea import Flink source code

2021-01-13 Thread Matthias Pohl
lve plugins 14 errors > Cannot resolve plugin org.codehaus.mojo:build-helper-maven-plugin: > > > Best, > penguin > > > > > 在 2021-01-13 15:24:22,"Matthias Pohl" 写道: > > Hi, > you might want to move these kinds of questions into the > us

Re: Re: Re: Idea import Flink source code

2021-01-13 Thread Matthias Pohl
. Does this mean that only the > replied person can see the email? > > > If Maven fails to download plugins or dependencies, is mvn -clean > install -DskipTests a must? > I'll try first. > > penguin > > > > > 在 2021-01-13 16:35:10,"Matthias Pohl&qu

Re: Flink Application cluster/standalone job: some JVM Options added to Program Arguments

2021-01-17 Thread Matthias Pohl
Hi Alexey, thanks for reaching out to the Flink community. I'm not 100% sure whether you have an actual issue or whether it's just the changed behavior you are confused about. The change you're describing was introduced in Flink 1.12 as part of the work on FLIP-104 [1] exposing the actual memory us

Re: Flink Application cluster/standalone job: some JVM Options added to Program Arguments

2021-01-19 Thread Matthias Pohl
“, set system property ? If > user code has nothing to do with such arguments, why Flink append these > arguments to user JOB args? > Thanks, > Alexey > > > ------ > *From:* Matthias Pohl > *Sent:* Sunday, January 17, 2021 11:53:29 PM >

Re: Question about setNestedFileEnumeration()

2021-01-22 Thread Matthias Pohl
Hi Wayne, based on other mailing list discussion ([1]) you can assume that the combination of FileProcessingMode.PROCESS_CONTINUOUSLY and setting FileInputFormat.setNestedFileEnumeration to true should work as you expect it to work. Can you provide more context on your issue like log files? Which

Re: JDBC connection pools

2021-01-22 Thread Matthias Pohl
Hi Marco, have you had a look into the connector documentation ([1] for the regular connector or [2] for the SQL connector)? Maybe, discussions about connection pooling in [3] and [4] or the code snippets provided in the JavaDoc of JdbcInputFormat [5] help as well. Best, Matthias [1] https://ci.a

Re: Handling validations/errors in the flink job

2021-01-22 Thread Matthias Pohl
Hi Sagar, have you had a look at CoProcessFunction [1]? CoProcessFunction enables you to join two streams into one and also provide context to use SideOutput [2]. Best, Matthias [1] https://github.com/apache/flink/blob/c6997c97c575d334679915c328792b8a3067cfb5/flink-streaming-java/src/main/java/or

Re: Re: Test failed in flink-end-to-end-tests/flink-end-to-end-tests-common-kafka

2021-01-22 Thread Matthias Pohl
Hi Smile, Have you used a clean checkout? I second Robert's statement considering that the dependency you're talking about is already part of flink-end-to-end-tests/flink-end-to-end-tests-common-kafka/pom.xml. It also has the correct scope set both in master and release-1.12. Best, Matthias On Fr

Re: Flink 1.11 checkpoint compatibility issue

2021-01-22 Thread Matthias Pohl
Hi Lu, thanks for reaching out to the community, Lu. Interesting observation. There's no change between 1.9.1 and 1.11 that could explain this behavior as far as I can tell. Have you had a chance to debug the code? Can you provide the code so that we could look into it more closely? Another thing:

Re: [Flink SQL] CompilerFactory cannot be cast error when executing statement

2021-01-22 Thread Matthias Pohl
Hi Sebastián, have you tried changing the dependency scope to provided for flink-table-planner-blink as it is suggested in [1]? Best, Matthias [1] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Flink-1-10-exception-Unable-to-instantiate-java-compiler-td38221.html On Fri, Jan 22,

Re: [Flink SQL] CompilerFactory cannot be cast error when executing statement

2021-01-22 Thread Matthias Pohl
perience with mvn or "modern" Java > in general. > > :-) > > Thanks! > > On Fri, 22 Jan 2021 at 15:19, Matthias Pohl > wrote: > >> Hi Sebastián, >> have you tried changing the dependency scope to provided >> for flink-table-planner-blink as

Re: unsubscribe

2021-01-24 Thread Matthias Pohl
Hi Abhishek, unsubscribing works by sending an email to user-unsubscr...@flink.apache.org as stated in [1]. Best, Matthias [1] https://flink.apache.org/community.html#mailing-lists On Sun, Jan 24, 2021 at 3:06 PM Abhishek Jain wrote: > unsubscribe >

Re: FlinkSQL submit query and then the jobmanager failed.

2021-01-24 Thread Matthias Pohl
Hi, thanks for reaching out to the community. I'm not an Hive nor Orc format expert. But could it be that this is a configuration problem? The error is caused by an ArrayIndexOutOfBounds exception in ValidReadTxnList.readFromString on an array generated by splitting a String using colons as separat

Re: Re: Re: Test failed in flink-end-to-end-tests/flink-end-to-end-tests-common-kafka

2021-01-25 Thread Matthias Pohl
/confluent/kafka-avro-serializer/5.5.2/kafka-avro-serializer-5.5.2.pom > [3]. https://mvnrepository.com/artifact/io.confluent/kafka-avro-serializer > > At 2021-01-22 21:22:51, "Matthias Pohl" wrote: > > Hi Smile, > Have you used a clean checkout? I second Robert's statement

Re: memory tuning

2021-01-26 Thread Matthias Pohl
Hi Marco, Could you share the preconfiguration logs? They are printed in the beginning of the taskmanagers' logs and contain a summary of the used memory configuration? Best, Matthias On Tue, Jan 26, 2021 at 6:35 AM Marco Villalobos wrote: > > I have a flink job that collects and aggregates tim

Re: JobManager seems to be leaking temporary jar files

2021-01-26 Thread Matthias Pohl
Hi Maciek, my understanding is that the jars in the JobManager should be cleaned up after the job is terminated (I assume that your jobs successfully finished). The jars are managed by the BlobService. The dispatcher will trigger the jobCleanup in [1] after job termination. Are there any suspicious

Re: memory tuning

2021-01-27 Thread Matthias Pohl
rg.apache.flink.configuration.GlobalConfiguration [] - Loading > configuration property: jobmanager.memory.process.size, 1600m > 2021-01-25 21:41:18,046 INFO > org.apache.flink.configuration.GlobalConfiguration [] - Loading > configuration property: taskmanager.memory.process.size, 1728m > 2021-01-25 21:41:18,0

Re: [ANNOUNCE] Apache Flink 1.10.3 released

2021-02-01 Thread Matthias Pohl
Yes, thanks for taking over the release! Best, Matthias On Mon, Feb 1, 2021 at 5:04 AM Zhu Zhu wrote: > Thanks Xintong for being the release manager and everyone who helped with > the release! > > Cheers, > Zhu > > Dian Fu 于2021年1月29日周五 下午5:56写道: > >> Thanks Xintong for driving this release! >

Re: S3 parquet files as Sink in the Table SQL API

2021-02-10 Thread Matthias Pohl
Hi, have tried using the bundled hadoop uber jar [1]. It looks like some Hadoop dependencies are missing. Best, Matthias [1] https://flink.apache.org/downloads.html#additional-components On Wed, Feb 10, 2021 at 1:24 PM meneldor wrote: > Hello, > I am using PyFlink and I want to write records f

Re: Flink’s Kubernetes HA services - NOT working

2021-02-10 Thread Matthias Pohl
Hi Daniel, what's the exact configuration you used? Did you use the resource definitions provided in the Standalone Flink on Kubernetes docs [1]? Did you do certain things differently in comparison to the documentation? Best, Matthias [1] https://ci.apache.org/projects/flink/flink-docs-release-1.

Re: How does Flink handle shorted lived keyed streams

2021-02-10 Thread Matthias Pohl
Hi narashima, not sure whether this fits your use case, but have you considered creating a savepoint and analyzing it using the State Processor API [1]? Best, Matthias [1] https://ci.apache.org/projects/flink/flink-docs-stable/dev/libs/state_processor_api.html#state-processor-api On Wed, Feb 10,

Re: Should flink job manager crash during zookeeper upgrade?

2021-02-10 Thread Matthias Pohl
Hi Barisa, thanks for sharing this. I'm gonna add Till to this thread. He might have some insights. Best, Matthias On Wed, Feb 10, 2021 at 4:19 PM Barisa Obradovic wrote: > I'm trying to understand if behaviour of the flink jobmanager during > zookeeper upgrade is expected or not. > > I'm runni

Re: Flink’s Kubernetes HA services - NOT working

2021-02-11 Thread Matthias Pohl
https://ci.apache.org/projects/flink/flink-docs-release-1.12/deployment/ha/kubernetes_ha.html#configuration On Wed, Feb 10, 2021 at 6:08 PM Matthias Pohl wrote: > Hi Daniel, > what's the exact configuration you used? Did you use the resource > definitions provided in the Standalone

Re: threading and distribution

2021-02-11 Thread Matthias Pohl
Hi Marco, sorry for the late reply. The documentation you found [1] is already a good start. You can define how many subtasks of an operator run in parallel using the operator's parallelism configuration [2]. Each operator's subtask will run in a separate task slot. There's the concept of slot shar

Re: Question

2021-02-12 Thread Matthias Pohl
Hi Abu Bakar Siddiqur Rahman, Have you had a look at the Flink documentation [1]? It provides step-by-step instructions on how to run a job (the Flink binaries provide example jobs under ./examples) using a local standalone cluster. This should also work on a Mac. You would just need to start the F

Re: Question

2021-02-12 Thread Matthias Pohl
not, could you please provide me the source code to access in the > storage where snapshots are saved? > > Thank you > > > > > On Fri, Feb 12, 2021 at 2:44 AM Matthias Pohl > wrote: > >> Hi Abu Bakar Siddiqur Rahman, >> Have you had a look at the Flink docume

Re: Flink’s Kubernetes HA services - NOT working

2021-02-15 Thread Matthias Pohl
I'm adding the Flink user ML to the conversation again. On Mon, Feb 15, 2021 at 8:18 AM Matthias Pohl wrote: > Hi Omer, > thanks for sharing the configuration. You're right: Using NFS for HA's > storageDir is fine. > > About the error message you're referri

Re: Question

2021-02-22 Thread Matthias Pohl
flink-docs-release-1.12/try-flink/local_installation.html > > I can run the code in the UI of Apache Flink that is in the bin file of > Apache Flink. If I run a java code from intellij idea or eclipse, then how > can I connect the code to apache flink UI? > > Thank you! > >

Re: Question

2021-02-22 Thread Matthias Pohl
docs-stable/dev/project-configuration.html#maven-quickstart On Mon, Feb 22, 2021 at 5:06 PM Abu Bakar Siddiqur Rahman Rocky < bakar121...@gmail.com> wrote: > Hi Matthias Pohl, > > Thank you for your reply. > > At first, I'm sorry if my question make you confuse. Let me

Re: Flink application kept restarting

2021-02-26 Thread Matthias Pohl
Hi Rainie, the network buffer pool was destroyed for some reason. This happens when the NettyShuffleEnvironment gets closed which is triggered when an operator is cleaned up, for instance. Maybe, the timeout in the metric system caused this. But I'm not sure how this is connected. I'm gonna add Che

Re: Get JobId and JobManager RPC Address in RichMapFunction executed in TaskManager

2021-02-26 Thread Matthias Pohl
Hi Sandeep, thanks for reaching out to the community. Unfortunately, the information you're looking for is not exposed in a way that you could access it from within your RichMapFunction. Could you elaborate a bit more on what you're trying to achieve? Maybe, we can find another solution for your pr

Re: apache-flink +spring boot - logs related to spring boot application start up not printed in file in flink 1.12

2021-02-26 Thread Matthias Pohl
Hi Abhishek, this might be caused by the switch from log4j to log4j2 as the default in Flink 1.11 [1]. Have you had a chance to look at the logging documentation [2] to enable log4j again? Best, Matthias [1] https://ci.apache.org/projects/flink/flink-docs-stable/release-notes/flink-1.11.html#swit

Re: Processing-time temporal join is not supported yet.

2021-02-26 Thread Matthias Pohl
Hi Eric, it looks like you ran into FLINK-19830 [1]. I'm gonna add Leonard to the thread. Maybe, he has a workaround for your case. Best, Matthias [1] https://issues.apache.org/jira/browse/FLINK-19830 On Fri, Feb 26, 2021 at 11:40 AM eric hoffmann wrote: > Hello > Working with flink 1.12.1 i r

Re: Flink 1.12.1 example applications failing on a single node yarn cluster

2021-02-26 Thread Matthias Pohl
Hi Debraj, thanks for reaching out to the Flink community. Without knowing the details on how you've set up the Single-Node YARN cluster, I would still guess that it is a configuration issue on the YARN side. Flink does not know about a .flink folder. Hence, there is no configuration to set this fo

Re: Is it possible to specify max process memory in flink 1.8.2, similar to what is possible in flink 1.11

2021-02-26 Thread Matthias Pohl
Hi Bariša, have you had the chance to analyze the memory usage in more detail? An OutOfMemoryError might be an indication for some memory leak which should be solved instead of lowering some memory configuration parameters. Or is it that the off-heap memory is not actually used but blocks the JVM f

Re: Flink 1.12.1 example applications failing on a single node yarn cluster

2021-02-26 Thread Matthias Pohl
t > stores the Flink jar and configuration file."* > > Same mentioned here > <https://docs.cloudera.com/csa/1.2.0/installation/topics/csa-hdfs-home-install.html> > . > > > <https://wints.github.io/flink-web//faq.html#the-yarn-session-crashes-with-a-hdfs-permission-exc

Re: apache-flink +spring boot - logs related to spring boot application start up not printed in file in flink 1.12

2021-02-28 Thread Matthias Pohl
Hi Abhishek, have you also tried to apply the instructions listed in [1]? Best, Matthias [1] https://ci.apache.org/projects/flink/flink-docs-release-1.12/deployment/advanced/logging.html#configuring-log4j1 On Mon, Mar 1, 2021 at 4:42 AM Abhishek Shukla wrote: > Hi Matthias, > Thanks for replyi

Re: Flink application kept restarting

2021-03-01 Thread Matthias Pohl
On Fri, Feb 26, 2021 at 6:33 AM Matthias Pohl > wrote: > >> Hi Rainie, >> the network buffer pool was destroyed for some reason. This happens when >> the NettyShuffleEnvironment gets closed which is triggered when an operator >> is cleaned up, for instance. Maybe, the t

Re: Flink application kept restarting

2021-03-01 Thread Matthias Pohl
Another question is: The timeout of 48 hours sounds strange. There should have been some other system noticing the connection problem earlier assuming that you have a reasonably low heartbeat interval configured. Matthias On Mon, Mar 1, 2021 at 1:22 PM Matthias Pohl wrote: > Thanks

Re: Flink application kept restarting

2021-03-03 Thread Matthias Pohl
stractEventExecutor.safeExecute(AbstractEventExecutor.java:163) > at > org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:404) > at > org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.

Re: apache-flink +spring boot - logs related to spring boot application start up not printed in file in flink 1.12

2021-03-10 Thread Matthias Pohl
Hi Abhishek, sorry for the late reply. Did you manage to fix it? One remark: Are you sure you're referring to the right configuration file? log4j-cli.properties is used for the CLI tool [1]. Or do you try to get the logs from within the main of your job? Best, Matthias [1] https://ci.apache.org/p

Re: Evenly Spreading Out Source Tasks

2021-03-12 Thread Matthias Pohl
Hi Aeden, just to be sure: All task managers have the same hardware/memory configuration, haven't they? I'm not 100% sure whether this affects the slot selection in the end, but it looks like this parameter has also an influence on the slot matching strategy preferring slots with less utilization o

Re: Flink History server ( running jobs )

2021-03-19 Thread Matthias Pohl
Hi Vishal, yes, as the documentation explains [1]: Only jobs that reached a globally terminal state are archived into Flink's history server. State information about running jobs can be retrieved through Flink's REST API. Best, Matthias [1] https://ci.apache.org/projects/flink/flink-docs-release-

Re: Read the metadata files (got from savepoints)

2021-03-22 Thread Matthias Pohl
Hi Abdullah, you might also want to have a look at the State Processor API [1]. Best, Matthias [1] https://ci.apache.org/projects/flink/flink-docs-stable/dev/libs/state_processor_api.html On Mon, Mar 22, 2021 at 6:28 AM Congxian Qiu wrote: > Hi >Maybe you can reach to this test[1] for refe

Re: RocksDB StateBuilder unexpected exception

2021-03-23 Thread Matthias Pohl
Hi Danesh, thanks for reaching out to the Flink community. Checking the code, it looks like the OutputStream is added to a CloseableRegistry before writing to it [1]. My suspicion is - based on the exception cause - that the CloseableRegistry got triggered while restoring the state. I tried to tra

Re: Evenly distribute task slots across task-manager

2021-03-23 Thread Matthias Pohl
Hi Vignesh, are you trying to achieve an even distribution of tasks for this one operator that has the parallelism set to 16? Or do you observe the described behavior also on a job level? I'm adding Chesnay to the thread as he might have more insights on this topic. Best, Matthias On Mon, Mar 22,

Re: Evenly distribute task slots across task-manager

2021-03-23 Thread Matthias Pohl
.nabble.com/Evenly-Spreading-Out-Source-Tasks-tp42108p42235.html On Tue, Mar 23, 2021 at 1:42 PM Matthias Pohl wrote: > Hi Vignesh, > are you trying to achieve an even distribution of tasks for this one > operator that has the parallelism set to 16? Or do you observe the > described b

Re: QueryableStateClient getKVState

2021-03-23 Thread Matthias Pohl
Hi Sandeep, the equals method does not compare the this.map with that.map but that.dimensions. ...at least in your commented out code. Might this be the problem? Best, Matthias On Tue, Mar 23, 2021 at 5:28 AM Sandeep khanzode wrote: > Hi, > > I have a stream that exposes the state for Queryable

Re: Flink Streaming Counter

2021-03-23 Thread Matthias Pohl
Hi Vijayendra, thanks for reaching out to the Flink community. What do you mean by displaying it in your local IDE? Would it be ok to log the information out onto stdout? You might want to have a look at the docs about setting up a slf4j metrics report [1] if that's the case. Best, Matthias [1] h

Re: QueryableStateClient getKVState

2021-03-23 Thread Matthias Pohl
name. > > I can use String, Int, Enum, Long type keys in the Key that I send in the > Query getKvState … but the moment I introduce a TreeMap, even though it > contains a simple one entry String, String, it doesn’t work … > > Thanks, > Sandeep > > On 23-Mar-2021, at 7:00 P

Re: pipeline.auto-watermark-interval vs setAutoWatermarkInterval

2021-03-23 Thread Matthias Pohl
Hi Aeden, sorry for the late reply. I looked through the code and verified that the JavaDoc is correct. Setting pipeline.auto-watermark-interval to 0 will disable the automatic watermark generation. I created FLINK-21931 [1] to cover this. Thanks, Matthias [1] https://issues.apache.org/jira/brows

Re: How to get operator uid from a sql

2021-03-23 Thread Matthias Pohl
Hi XU Qinghui, sorry for the late reply. Unfortunately, the operator ID does not mean to be accessible for Flink SQL through the API. You might have a chance to extract the Operator ID through the debug logs. StreamGraphHasherV2.generateDeterministicHash logs out the operator ID [1]: "[main] DEBUG

Re: Do data streams partitioned by same keyBy but from different operators converge to the same sub-task?

2021-03-23 Thread Matthias Pohl
Hi Vishal, I'm not 100% sure what you're trying to do. But the partitioning by a key just relies on the key on the used parallelism. So, I guess, what you propose should work. You would have to rely on some join function, though, when merging two input operators into one again. I hope that was hel

Re: Flink Streaming Counter

2021-03-24 Thread Matthias Pohl
o quickly. I am looking for a sample > example where I can increment counters on each stage #1 thru #3 for > DATASTREAM. > Then probably I can print it using slf4j. > > Thanks, > Vijay > > On Tue, Mar 23, 2021 at 6:35 AM Matthias Pohl > wrote: > >> Hi Vijaye

Re: Do data streams partitioned by same keyBy but from different operators converge to the same sub-task?

2021-03-24 Thread Matthias Pohl
1. yes - the same key would affect the same state variable 2. you need a join to have the same operator process both streams Matthias On Wed, Mar 24, 2021 at 7:29 AM vishalovercome wrote: > Let me make the example more concrete. Say O1 gets as input a data stream > T1 > which it splits into two

Re: DataDog and Flink

2021-03-24 Thread Matthias Pohl
Hi Vishal, what about the TM metrics' REST endpoint [1]. Is this something you could use to get all the metrics for a specific TaskManager? Or are you looking for something else? Best, Matthias [1] https://ci.apache.org/projects/flink/flink-docs-release-1.12/ops/rest_api.html#taskmanagers-metrics

Re: [BULK]Re: [SURVEY] Remove Mesos support

2021-03-25 Thread Matthias Pohl
Hi everyone, considering the upcoming release of Flink 1.13, I wanted to revive the discussion about the Mesos support ones more. Mesos is also already listed as deprecated in Flink's overall roadmap [1]. Maybe, it's time to align the documentation accordingly to make it more explicit? What do you

Re: pipeline.auto-watermark-interval vs setAutoWatermarkInterval

2021-03-26 Thread Matthias Pohl
.auto-watermak-interval and setAutoWatermarkInterval are >> effectively the same setting/option. However I am not sure if Table API >> interprets it in the same way as DataStream APi. The documentation you >> linked, Aeden, describes the SQL API. >> >> @Jark @Timo Could you v

Re: Fw:A question about flink watermark illustration in official documents

2021-03-31 Thread Matthias Pohl
Hi 罗昊, the 2nd picture is meant to visualize the issue of out-of-orderness in general. I'd say it's not referring to a specific strategy. But one way to interpret the image is using the BoundedOutOfOrderness strategy for watermark generation [1]: You can define an upper bound B for the out-of-orde

Re: DataStream from kafka topic

2021-03-31 Thread Matthias Pohl
Hi Maminspapin, I haven't worked with Kafka/Flink, yet. But have you had a look at the docs about the DeserializationSchema [1]? It mentions ConfluentRegistryAvroDeserializationSchema. Is this something you're looking for? Best, Matthias [1] https://ci.apache.org/projects/flink/flink-docs-stable/

Re: DataStream from kafka topic

2021-03-31 Thread Matthias Pohl
Ok, it looks like you've found that solution already based on your question in [1]. [1] http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Proper-way-to-get-DataStream-lt-GenericRecord-gt-td42640.html On Wed, Mar 31, 2021 at 1:26 PM Matthias Pohl wrote: > Hi Maminspa

Re: Proper way to get DataStream

2021-03-31 Thread Matthias Pohl
Hi Maminspapin again, have you checked whether your topic actually contains data that matches your schema specified through cep.model.User? Best, Matthias On Tue, Mar 30, 2021 at 3:39 PM Maminspapin wrote: > Hi, > > I'm trying to solve a task with getting data from topic. This topic keeps > avr

Re: Restoring from Flink Savepoint in Kubernetes not working

2021-03-31 Thread Matthias Pohl
Hi Claude, thanks for reaching out to the Flink community. Could you provide the Flink logs for this run to get a better understanding of what's going on? Additionally, what exact Flink 1.12 version are you using? Did you also verify that the snapshot was created by checking the actual folder? Bes

Re: IO benchmarking

2021-03-31 Thread Matthias Pohl
Hi Deepthi, 1. Have you had a look at flink-benchmarks [1]? I haven't used it but it might be helpful. 2. Unfortunately, Flink doesn't provide metrics like that. But you might want to follow FLINK-21736 [2] for future developments. 3. Is there anything specific you are looking for? Unfortunately, I

Re: IO benchmarking

2021-03-31 Thread Matthias Pohl
zed into one (or > more?) files on persistent storage. I'll check out the code pointers! > > On Wed, Mar 31, 2021 at 7:07 AM Matthias Pohl > wrote: > >> Hi Deepthi, >> 1. Have you had a look at flink-benchmarks [1]? I haven't used it but it >> might be helpful.

Re: JDBC connector support for JSON

2021-04-01 Thread Matthias Pohl
Hi Fanbin, I'm not that familiar with the FlinkSQL features. But it looks like the JdbcConnector does not support Json as stated in the documentation [1]. You might work around it by implementing your own user-defined functions [2]. I hope this helps. Matthias [1] https://ci.apache.org/projects/f

Re: Restoring from Flink Savepoint in Kubernetes not working

2021-04-01 Thread Matthias Pohl
> In the UI, I see the following: > > *Latest Failed Checkpoint ID: 50Failure Time: 2021-03-31 09:34:43Cause: > Asynchronous task checkpoint failed.* > > What does this failure mean? > > > On Wed, Mar 31, 2021 at 9:22 AM Matthias Pohl > wrote: > >> Hi Claud

Re: [BULK]Re: [SURVEY] Remove Mesos support

2021-04-14 Thread Matthias Pohl
;> Cheers, > > >> Till > > >> > > >> On Thu, Mar 25, 2021 at 1:49 PM Konstantin Knauf > > wrote: > > >>> > > >>> Hi Matthias, > > >>> > > >>> Thank you for following up on this. +1 to officially deprecate

Re: Flink streaming job's taskmanager process killed by yarn nodemanager because of exceeding 'PHYSICAL' memory limit

2021-04-22 Thread Matthias Pohl
Hi, I have a few questions about your case: * What is the option you're referring to for the bounded shuffle? That might help to understand what streaming mode solution you're looking for. * What does the job graph look like? Are you assuming that it's due to a shuffling operation? Could you provid

Re: event-time window cannot become earlier than the current watermark by merging

2021-04-22 Thread Matthias Pohl
Hi Vishal, based on the error message and the behavior you described, introducing a filter for late events is the way to go - just as described in the SO thread you mentioned. Usually, you would collect late events in some kind of side output [1]. I hope that helps. Matthias [1] https://ci.apache

Re: Flink Hadoop config on docker-compose

2021-04-22 Thread Matthias Pohl
I think you're right, Flavio. I created FLINK-22414 to cover this. Thanks for bringing it up. Matthias [1] https://issues.apache.org/jira/browse/FLINK-22414 On Fri, Apr 16, 2021 at 9:32 AM Flavio Pompermaier wrote: > Hi Yang, > isn't this something to fix? If I look at the documentation at [1

Re: MemoryStateBackend Issue

2021-04-22 Thread Matthias Pohl
Hi Milind, I bet someone else might have a faster answer. But could you provide the logs and config to get a better understanding of what your issue is? In general, the state is maintained even in cases where a TaskManager fails. Best, Matthias On Thu, Apr 22, 2021 at 5:11 AM Milind Vaidya wrote

Re: Long to Timestamp(3) Conversion

2021-04-22 Thread Matthias Pohl
Hi Aeden, there are some improvements to time conversions coming up in Flink 1.13. For now, the best solution to overcome this is to provide a user-defined function. Hope, that helps. Best, Matthias On Wed, Apr 21, 2021 at 9:36 PM Aeden Jameson wrote: > I've probably overlooked something simple

Re: Kubernetes Setup - JM as job vs JM as deployment

2021-04-22 Thread Matthias Pohl
Hi Gil, I'm not sure whether I understand you correctly. What do you mean by deploying the job manager as "job" or "deployment"? Are you referring to the different deployment modes, Flink offers [1]? These would be independent of Kubernetes. Or do you wonder what the differences are between the Fli

  1   2   3   >