Testing Flink job with bounded input

2024-01-18 Thread Jan Lukavský
Hi, I have a question about how to correctly set up a test that will read input from a locally provided collection in bounded mode and provide outputs at the end of the computation. My test case looks something like the following: String[] lines = ...; try (StreamExecutionEnvironment env = St

Re: Flink TCP custom source - secured server socket

2023-07-02 Thread Jan Lukavský
your Flink application. Best,  Jan On 7/1/23 05:20, Kamal Mittal wrote: Hello, Thanks for your suggestion but please confirm below. Is it the case that TCP socket source job can’t be restored from last checkpoint? Rgds, Kamal *From:*Jan Lukavský *Sent:* 29 June 2023 06:18 PM

Re: Flink TCP custom source - secured server socket

2023-06-29 Thread Jan Lukavský
> ... a state backward in (processing) time ... (of course not processing, I meant to say event time) On 6/29/23 14:45, Jan Lukavský wrote: Hi Kamal, you probably have several options:  a) bundle your private key and certificate into your Flink application's jar (not recommend

Re: Flink TCP custom source - secured server socket

2023-06-29 Thread Jan Lukavský
Hi Kamal, you probably have several options:  a) bundle your private key and certificate into your Flink application's jar (not recommended, your service's private key will have to be not exactly "private")  b) create a service which will provide certificate for your service during runtime (e

Re: Watermark in global commit

2023-02-14 Thread Jan Lukavský
Hi, I'm not an expert on Flink specifically, but your approach might be easier to solve when broken down into two steps - create a "stable" input to downstream processing, this might include a specific watermark. In Flink, the "stability" of input for downstream processing is ensured by a checkpoint

Re: beam + flink + k8

2023-02-02 Thread Jan Lukavský
ubmit python Job to flink it's not showing or flink UI or simply hangs without any error. Sent from my Galaxy Original message ---- From: Jan Lukavský Date: 02/02/2023 15:07 (GMT+05:30) To: user@flink.apache.org Subject: Re: beam + flink + k8 I'm not sure how exactly minik

Re: beam + flink + k8

2023-02-02 Thread Jan Lukavský
the same script which I shared with you.. Do I need to make some changes for Google Kubernetes Environment? On Tue, 31 Jan 2023 at 20:20, Jan Lukavský wrote: The script looks good to me, did you run the SDK harness? External environment needs the SDK harness to be run external

Re: Non-temporal watermarks

2023-02-02 Thread Jan Lukavský
Hi, I will not speak about details related to Flink specifically, the concept of watermarks is more abstract, so I'll leave implementation details aside. Speaking generally, yes, there is a set of requirements that must be met in order to be able to generate a system that uses watermarks.

Re: beam + flink + k8

2023-01-31 Thread Jan Lukavský
e/kubernetes/> I have changed the log level to ERROR but didn't find much... Can you please help me out how to run the script from inside the pod. On Tue, 31 Jan 2023 at 15:40, Jan Lukavský wrote: Hi, can you please also share the script itself? I'd say that the pro

Re: beam + flink + k8

2023-01-31 Thread Jan Lukavský
/30/23 18:36, P Singh wrote: Hi Jan, Yeah I am using minikube and beam image with python 3.10. Please find the attached screenshots. On Mon, 30 Jan 2023 at 21:22, Jan Lukavský wrote: Hi, can you please share the command-line and complete output of the script? Are you using mi

Re: beam + flink + k8

2023-01-30 Thread Jan Lukavský
Hi, can you please share the command-line and complete output of the script? Are you using minikube? Can you share a list of your running pods?  Jan On 1/30/23 14:25, P Singh wrote: Hi Team, I am trying to run beam job on top of flink on my local machine (kubernetes).  I have flink 1.14 an

Re: Flink task stuck - MapPartition WAITING on java.util.concurrent.CompletableFuture$Signaller

2022-06-02 Thread Jan Lukavský
Starting worker with command ['/opt/apache/beam/boot', '--id=3-3', '--logging_endpoint=localhost:38683', '--artifact_endpoint=localhost:44867', '--provision_endpoint=localhost:34833', '--control_endpoint=localhost:44351'] Startin

Re: Flink task stuck - MapPartition WAITING on java.util.concurrent.CompletableFuture$Signaller

2022-06-01 Thread Jan Lukavský
Hi Gorjan, +user@beam The trace you posted is just waiting for a bundle to finish in the SDK harness. I would suspect there is a problem in the logs of the harness. Did you look for possible errors there?  Jan On 5/31/22 13:54, Gorjan Todorovski wrote: Hi, I

Re: Beam with Flink runner - Issues when writing to S3 in Parquet Format

2021-09-14 Thread Jan Lukavský
Hi Sandeep, a few questions:  a) which state backend do you use for Flink?  b) what is your checkpointingInterval set for FlinkRunner?  c) how much data is there in your input Kafka topic(s)? FileIO has to buffer all elements per window (by default) into state, so this might create a high pressu

Re: Flink Kafka Partition Discovery Never Forgets Deleted Partitions

2021-09-14 Thread Jan Lukavský
nsibility of Flink to verify that such source is removed only after a checkpoint is taken. Otherwise there would be possible risk of data loss. This definitely looks like quite complex process. Best, D. On Tue, Sep 14, 2021 at 3:44 PM Jan Lukavský wrote:

Re: Flink Kafka Partition Discovery Never Forgets Deleted Partitions

2021-09-14 Thread Jan Lukavský
Hi, just out of curiosity, would this problem be solvable by the ability to remove partitions that declare that they do not contain more data (watermark reaching end of global window)? There is probably another problem in that a topic can be recreated after being deleted, which could result in w

Re: Running Beam on a native Kubernetes Flink cluster

2021-08-15 Thread Jan Lukavský
Hi Gorjan, the address of localhost is hard-coded in the python worker pool (see [1]). There should be no need to set up a load-balancer for the worker_pool; if you have it as another container in each TM pod, it should suffice to replace {beam_sdk_url} with 'localhost'. Each TM will then have

Re: Delay data elements in pipeline by X minutes

2021-07-19 Thread Jan Lukavský
    }     @Override     public void onTimer(long timestamp, OnTimerContext ctx, Collector out) throws Exception {     out.collect(this.state.value());     }     @Override     public TypeInformation getProducedType() {     return TypeInformation.of(clazz);     } } Best regards, Da

Re: Delay data elements in pipeline by X minutes

2021-07-18 Thread Jan Lukavský
Hi Dario, out of curiosity, could you briefly describe the driving use-case? What is the (logical) constraint that drives the requirement? I'd guess that it could be related to waiting for some (external) condition? Or maybe related to late data? I think that there might be better approache

Re: Does the Kafka source perform retractions on Key?

2021-02-24 Thread Jan Lukavský
Hi Rex, If I understand correctly, you are concerned about the behavior of the Kafka source in the case of a compacted topic, right? If that is the case, then this is not directly related to Flink, Flink will expose the behavior defined by Kafka. You can read about it for instance here [1]. TL;DR - you

Re: Random Task executor shutdown (java.lang.OutOfMemoryError: Metaspace)

2020-11-16 Thread Jan Lukavský
:*     [1] https://ci.apache.org/projects/flink/flink-docs-master/dev/project-configuration.html#appendix-template-for-building-a-jar-with-dependencies On Mon, Nov 16, 2020 at 3:29 PM Jan Lukavský wrote: Yes, that could definitely cause this. You should p

Re: Random Task executor shutdown (java.lang.OutOfMemoryError: Metaspace)

2020-11-16 Thread Jan Lukavský
shaded-jackson classes in my user code? For example import org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.ObjectMapper? Best, Flavio On Mon, Nov 16, 2020 at 3:15 PM Jan Lukavský wrote: Hi Flavio, when I encountered quite similar pro

Re: Random Task executor shutdown (java.lang.OutOfMemoryError: Metaspace)

2020-11-16 Thread Jan Lukavský
Hi Flavio, when I encountered a problem quite similar to the one you describe, it was related to a static storage located in a class that was loaded "parent-first". In my case it was in java.lang.ClassValue, but it might (and probably will be) different in your case. The problem is that if user-

Re: Streaming data to parquet

2020-09-14 Thread Jan Lukavský
Hi, I'd like to mention another approach, which might not be as "flinkish", but removes the source of issues which arise when writing bulk files. The actual cause of issues here is that when creating bulk output, the most efficient option is to have _reversed flow of commit_. That is to say -

Re: k8s job cluster using StatefulSet

2020-08-14 Thread Jan Lukavský
Hi Alexey, I'm using StatefulSet for JM exactly as you describe (Deployment for TM is just fine). The main advantage is that you don't need distributed storage for JM fault tolerance, because you can use persistent volume mount (provided your cloud provider provides it as fault tolerant volum

Re: POJO serialization vs immutability

2019-10-07 Thread Jan Lukavský
, Jan Lukavský wrote: Exactly. And that's why it is good for mutable data, because they are not suited for keys either. Jan On 10/7/19 2:58 PM, Chesnay Schepler wrote: The default hashCode implementation is effectively random and not suited for keys as they may not be routed to the same ins

Re: POJO serialization vs immutability

2019-10-07 Thread Jan Lukavský
/2019 14:54, Jan Lukavský wrote: Hi Stephen, I found a very nice article [1], which might help you solve the issues you are concerned about. The elegant solution to this problem might be summarized as "do not implement equals() and hashCode() for POJO types, use Object's default imple

Re: POJO serialization vs immutability

2019-10-07 Thread Jan Lukavský
Hi Stephen, I found a very nice article [1], which might help you solve the issues you are concerned about. The elegant solution to this problem might be summarized as "do not implement equals() and hashCode() for POJO types, use Object's default implementation". I'm not 100% sure that this wi
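
To illustrate why field-based equals()/hashCode() and mutability interact badly, here is a minimal sketch using only the JDK. It is not taken from the linked article or from the thread; the class names (ValueKey, IdentityKey, MutableKeyDemo) are hypothetical and exist only for this example.

```java
import java.util.HashSet;
import java.util.Set;

public class MutableKeyDemo {

    // Hypothetical mutable POJO whose hashCode() depends on a mutable field.
    static final class ValueKey {
        int x;
        ValueKey(int x) { this.x = x; }
        @Override public boolean equals(Object o) {
            return o instanceof ValueKey && ((ValueKey) o).x == x;
        }
        @Override public int hashCode() { return Integer.hashCode(x); }
    }

    // Hypothetical mutable POJO that keeps Object's default equals()/hashCode().
    static final class IdentityKey {
        int x;
        IdentityKey(int x) { this.x = x; }
    }

    // Mutating the field after insertion changes the hash, so the lookup misses.
    static boolean valueKeyFoundAfterMutation() {
        Set<ValueKey> set = new HashSet<>();
        ValueKey k = new ValueKey(1);
        set.add(k);
        k.x = 2;
        return set.contains(k);
    }

    // The default identity-based hash does not depend on field values.
    static boolean identityKeyFoundAfterMutation() {
        Set<IdentityKey> set = new HashSet<>();
        IdentityKey k = new IdentityKey(1);
        set.add(k);
        k.x = 2;
        return set.contains(k);
    }

    public static void main(String[] args) {
        System.out.println("value-based key found:    " + valueKeyFoundAfterMutation());
        System.out.println("identity-based key found: " + identityKeyFoundAfterMutation());
    }
}
```

The sketch only shows single-JVM collection behaviour; as the rest of the thread points out, the default hashCode() is not suitable for keys that must be routed consistently across instances.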

Re: [SURVEY] What is the most subtle/hard to catch bug that people have seen?

2019-10-01 Thread Jan Lukavský
Hi, I'd add another one regarding Java hashCode() and its practical usability for distributed systems [1], although practically all (Java based) data processing systems rely on it. One bug directly related to this I once saw was that using an Enum inside another object used as partitioning ke
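
The message above is truncated, so the following is only a sketch of one way this class of bug can arise, not a reconstruction of the case described. Enum.hashCode() is final and inherits Object's identity-based hash, so a composite hash that includes an enum constant can differ from one JVM process to another. Hashing the constant's name() instead gives a value that is fixed by the String.hashCode() specification. The names StableEnumHash, Color, unstableHash and stableHash are hypothetical.

```java
import java.util.Objects;

public class StableEnumHash {

    // Hypothetical enum used as part of a partitioning key.
    enum Color { RED, GREEN, BLUE }

    // Includes Enum.hashCode(), which is identity-based: the result may
    // differ between JVM processes, so it is unsafe for routing data.
    static int unstableHash(String id, Color color) {
        return Objects.hash(id, color);
    }

    // Uses the constant's name, whose String.hashCode() is specified by
    // the JDK and therefore identical on every JVM.
    static int stableHash(String id, Color color) {
        return Objects.hash(id, color.name());
    }

    public static void main(String[] args) {
        // The first value can change from run to run; the second cannot.
        System.out.println("unstable: " + unstableHash("a", Color.RED));
        System.out.println("stable:   " + stableHash("a", Color.RED));
    }
}
```

Using ordinal() would also be stable across processes, but it changes if constants are reordered, which is why the sketch hashes the name.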