Re: Write JSON string to JDBC as JSONB?

2022-07-24 Thread John Tipper
d AWS does not give the owner of the database sufficient permissions to execute this. From: John Tipper Sent: 24 July 2022 22:01 To: user@flink.apache.org Subject: Write JSON string to JDBC as JSONB? Hi all, I am using PyFlink and SQL and have a JSON string that I

Write JSON string to JDBC as JSONB?

2022-07-24 Thread John Tipper
Hi all, I am using PyFlink and SQL and have a JSON string that I wish to write to a JDBC sink (it's a Postgresql DB). I'd like to write that string to a column that is of type JSONB (or even JSON). I'm getting exceptions when Flink tries to write the column: Batch entry 0 INSERT INTO my_table(

Re: What do columns for TM memory usage in Flink UI Console mean?

2022-07-20 Thread John Tipper
Sorry, pressed send too early. What is the unit of measure for "count" and does this tell me I have too little Direct Memory and if so, what do I do to specifically increase this number? Many thanks, John ____ From: John Tipper Sent: 20 July 2022 17:5

What do columns for TM memory usage in Flink UI Console mean?

2022-07-20 Thread John Tipper
Hi all, I can't find mention of what the columns mean for the "Outside JVM Memory' for the Task Manager in the Flink console. I have:   Count UsedCapacity Direct4,203   227 MB  227MB Mapped0   0 B   0 B Wh

Re: PyFlink SQL: force maximum use of slots

2022-07-20 Thread John Tipper
copy all events to all slots, in which case I don't understand how parallelism assists? Many thanks, John From: Dian Fu Sent: 20 July 2022 05:19 To: John Tipper Cc: user@flink.apache.org Subject: Re: PyFlink SQL: force maximum use of slots Hi John

PyFlink SQL: force maximum use of slots

2022-07-18 Thread John Tipper
Hi all, Is there a way of forcing a pipeline to use as many slots as possible? I have a program in PyFlink using SQL and the Table API and currently all of my pipeline is using just a single slot. I've tried this:    StreamExecutionEnvironment.get_execution_environment().disable_operator_

Re: PyFlink and parallelism

2022-07-18 Thread John Tipper
s broken if the tablestream contains a timestamp, which I reported a little while ago and Dian filed as FLINK-28253. Kind regards, John From: Juntao Hu Sent: 18 July 2022 04:13 To: John Tipper Cc: user@flink.apache.org Subject: Re: PyFlink and parallelism It&

Re: PyFlink and parallelism

2022-07-16 Thread John Tipper
t in Java and not the int primitive. I actually see this if I just call set_parallelism(1)​ without the call to get_config()​. Is this a bug or is there a workaround? ________ From: John Tipper Sent: 15 July 2022 16:44 To: user@flink.apache.org Subject: PyFlink and p

PyFlink and parallelism

2022-07-15 Thread John Tipper
Hi all, I have a processing topology using PyFlink and SQL where there is data skew: I'm splitting a stream of heterogenous data into separate streams based on the type of data that's in it and some of these substreams have very many more events than others and this is causing issues when check

Re: PyFlink: restoring from savepoint

2022-07-08 Thread John Tipper
nks, John From: Dian Fu Sent: 08 July 2022 02:27 To: John Tipper Cc: user@flink.apache.org Subject: Re: PyFlink: restoring from savepoint Hi John, Could you provide more information, e.g. the exact command submitting the job, the logs file, the PyFlink ve

Re: Restoring a job from a savepoint

2022-07-07 Thread John Tipper
operators that did not have any state in the savepoint: INFO o.a.f.r.c.CheckpointCoordinator [] - Skipping empty savepoint state for operator a0f11f7a2c416beb6b7aed14be0d63ca. Best, Alexander Fedulov On Wed, Jul 6, 2022 at 9:50 PM John Tipper mailto:john_tip...@hotmail.com>> wrote:

PyFlink: restoring from savepoint

2022-07-07 Thread John Tipper
Hi all, I have a PyFlink job running in Kubernetes. Savepoints and checkpoints are being successfully saved to S3. However, I am unable to get the job to start from a save point. The container is started with these args: “standalone-job”, “-pym”, “foo.main”, “-s”, “s3://”, “-n” In the JM logs

Restoring a job from a savepoint

2022-07-06 Thread John Tipper
Hi all, The docs on restoring a job from a savepoint (https://nightlies.apache.org/flink/flink-docs-master/docs/ops/state/savepoints/#resuming-from-savepoints) state that the syntax is: $ bin/flink run -s :savepointPath [:runArgs] and where "you may give a path to either the savepoint’s dire

Re: How to convert Table containing TIMESTAMP_LTZ into DataStream in PyFlink 1.15.0?

2022-06-27 Thread John Tipper
with to_append_stream` to see if it works. Regards, Dian On Sat, Jun 25, 2022 at 4:07 AM John Tipper mailto:john_tip...@hotmail.com>> wrote: Hi, I have a source table using a Kinesis connector reading events from AWS EventBridge using PyFlink 1.15.0. An example of the sorts of data that

How to convert Table containing TIMESTAMP_LTZ into DataStream in PyFlink 1.15.0?

2022-06-24 Thread John Tipper
Hi, I have a source table using a Kinesis connector reading events from AWS EventBridge using PyFlink 1.15.0. An example of the sorts of data that are in this stream is here: https://docs.aws.amazon.com/codebuild/latest/userguide/sample-build-notifications.html#sample-build-notifications-ref.

How to use connectors in PyFlink 1.15.0 when not defined in Python API?

2022-06-24 Thread John Tipper
Hi all, There are a number of connectors which do not appear to be in the Python API v1.15.0, e.g. Kinesis. I can see that it's possible to use these connectors by using the Table API: CREATE TABLE my_table (...) WITH ('connector' = 'kinesis' ...) I guess if you wanted the stream as a DataStre

Re: Is it possible to use DataStream API keyBy followed by Table API SQL in PyFlink?

2022-06-23 Thread John Tipper
Sorry for the noise, I completely missed this part of the documentation describing exactly how to do this: https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/dev/table/data_stream_api/ From: John Tipper Sent: 23 June 2022 21:35 To: user

Is it possible to use DataStream API keyBy followed by Table API SQL in PyFlink?

2022-06-23 Thread John Tipper
Hi all, In PyFlink, is it possible to use the DataStream API to create a DataStream by means of StreamExecutionEnvironment's addSource(...), then perform transformations on this data stream using the DataStream API and then convert that stream into a form where SQL statements can be executed on

Flink TaskManager memory configuration failed

2022-06-22 Thread John Tipper
Hi all, I'm wanting to run quite a number of PyFlink jobs on Kubernetes, where the amount of state and number of events being processed is small and therefore I'd like to use as little memory in my clusters as possible so I can bin pack most efficiently. I'm running a Flink cluster per job. I'm

Re: The configured managed memory fraction for Python worker process must be within (0, 1], was: %s

2022-06-17 Thread John Tipper
naged.consumer-weights`, I really can't think of any situation where this problem will occur. Best, Xingbo John Tipper mailto:john_tip...@hotmail.com>> 于2022年6月16日周四 19:41写道: Hi Xingbo, Yes, there are a number of temporary views being created, where each is being created using

Re: The configured managed memory fraction for Python worker process must be within (0, 1], was: %s

2022-06-16 Thread John Tipper
From: Xingbo Huang Sent: 16 June 2022 12:34 To: John Tipper Cc: user@flink.apache.org Subject: Re: The configured managed memory fraction for Python worker process must be within (0, 1], was: %s Hi John, Does your job logic include conversion between Table and

Re: The configured managed memory fraction for Python worker process must be within (0, 1], was: %s

2022-06-16 Thread John Tipper
version of pyflink you used? Best, Xingbo John Tipper mailto:john_tip...@hotmail.com>> 于2022年6月16日周四 17:05写道: Hi all, I'm trying to run a PyFlink unit test to test some PyFlink SQL and where my code uses a Python UDF. I can't share my code but the test case is similar to the c

The configured managed memory fraction for Python worker process must be within (0, 1], was: %s

2022-06-16 Thread John Tipper
Hi all, I'm trying to run a PyFlink unit test to test some PyFlink SQL and where my code uses a Python UDF. I can't share my code but the test case is similar to the code here: https://github.com/apache/flink/blob/f8172cdbbc27344896d961be4b0b9cdbf000b5cd/flink-python/pyflink/testing/test_case_

Re: How to handle deletion of items using PyFlink SQL?

2022-06-14 Thread John Tipper
-06-09 08:53:36,"Dian Fu" 写道: Hi John, If you are using Table API & SQL, the framework is handling the RowKind and it's transparent for you. So usually you don't need to handle RowKind in Table API & SQL. Regards, Dian On Thu, Jun 9, 2022 at 6:56 AM John Tipper mailto

Re: How to implement unnest() as udtf using Python?

2022-06-13 Thread John Tipper
you? Best regards, Martijn [1] https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/dev/table/sql/queries/joins/#array-expansion Op ma 13 jun. 2022 om 13:55 schreef John Tipper mailto:john_tip...@hotmail.com>>: Hi all, Flink doesn’t support the unnest() function, which takes an a

How to implement unnest() as udtf using Python?

2022-06-13 Thread John Tipper
Hi all, Flink doesn’t support the unnest() function, which takes an array and creates a row for each element in the array. I have column containing an array of map and I’d like to implement my own unnest. I try this as an initial do-nothing implementation: @udtf(result_types=Datatypes.MAP(

Re: How to handle deletion of items using PyFlink SQL?

2022-06-08 Thread John Tipper
ble-functions [2] https://github.com/apache/flink/blob/44f73c496ed1514ea453615b77bee0486b8998db/flink-core/src/main/java/org/apache/flink/types/RowKind.java#L52 -- Best! Xuyang At 2022-06-08 20:06:17, "John Tipper" wrote: Hi all, I have some reference data that is periodica

How to handle deletion of items using PyFlink SQL?

2022-06-08 Thread John Tipper
Hi all, I have some reference data that is periodically emitted by a crawler mechanism into an upstream Kinesis data stream, where those rows are used to populate a sink table (and where I am using Flink 1.13 PyFlink SQL within AWS Kinesis Data Analytics). What is the best pattern to handle de

Re: Multiple INSERT INTO within single PyFlink job?

2022-04-29 Thread John Tipper
-python-api-with-amazon-kinesis-data-analytics/ Sorry for the noise. From: John Tipper Sent: 29 April 2022 14:07 To: user@flink.apache.org Subject: Multiple INSERT INTO within single PyFlink job? Hi all, Is it possible to have more than one `INSERT INTO ... SELECT

Multiple INSERT INTO within single PyFlink job?

2022-04-29 Thread John Tipper
Hi all, Is it possible to have more than one `INSERT INTO ... SELECT ...` statement within a single PyFlink job (on Flink 1.13.6)? I have a number of output tables that I create and I am trying to write to write to these within a single job, where the example SQL looks like (assume there is an

Re: Unit testing PyFlink SQL project

2022-04-25 Thread John Tipper
Hi Dian, I've tried this and it works nicely, on both MacOS and Windows, thank you very much indeed for your help. Kind regards, John From: Dian Fu Sent: 25 April 2022 02:42 To: John Tipper Cc: user@flink.apache.org Subject: Re: Unit testing PyFlin

Re: Unit testing PyFlink SQL project

2022-04-24 Thread John Tipper
via command line argument '--jarfile' or the config option 'pipeline.jars' -- Ran 1 test in 0.401s FAILED (errors=1) sys:1: ResourceWarning: unclosed file <_io.BufferedWriter name=4> From: John Tipper Sent: 24 April 2022 20:48 To: Dian Fu

Re: Unit testing PyFlink SQL project

2022-04-24 Thread John Tipper
osed file <_io.BufferedWriter name=4> (.venv) Johns-MacBook-Pro:testing john$ Unable to get the Python watchdog object, now exit. From: John Tipper Sent: 24 April 2022 20:30 To: Dian Fu Cc: user@flink.apache.org Subject: Re: Unit testing PyFlink SQL project Hi Dian, Thank you very much, th

Re: Unit testing PyFlink SQL project

2022-04-24 Thread John Tipper
code needs all of the transitive dependencies on the classpath? Have you managed to get your example tests to run in a completely clean virtual environment? It looks like if it's working on your computer that your computer perhaps has Java and Python dependencies already downloaded into partic

Unit testing PyFlink SQL project

2022-04-23 Thread John Tipper
Hi all, Is there an example of a self-contained repository showing how to perform SQL unit testing of PyFlink (specifically 1.13.x if possible)? I have cross-posted the question to Stack Overflow here: https://stackoverflow.com/questions/71983434/is-there-an-example-of-pyflink-sql-unit-testing

XXX doesn't exist in the parameters of the SQL statement

2022-04-17 Thread John Tipper
Hi all, I'm having some issues with getting a Flink SQL application to work, where I get an exception and I'm not sure why it's occurring. I have a source table, reading from Kinesis, where the incoming data in JSON format has a "source" and "detail-type" field. CREATE TABLE `input` ( `so

Re: How to process events with different JSON schemas in single Kinesis stream using PyFlink?

2022-04-11 Thread John Tipper
Hi Dian, Thank you very much, that worked very nicely. Kind regards, John From: Dian Fu Sent: 11 April 2022 06:32 To: John Tipper Cc: user@flink.apache.org Subject: Re: How to process events with different JSON schemas in single Kinesis stream using PyFlink

How to process events with different JSON schemas in single Kinesis stream using PyFlink?

2022-04-10 Thread John Tipper
TLDR; I want to know how best to process a stream of events using PyFlink, where the events in the stream have a number of different schemas. Details: I want to process a stream of events coming from a Kinesis data stream which originate from an AWS EventBridge bus. The events in this stream ar

Re: Does Flink support raw generic types in a merged stream?

2019-07-17 Thread John Tipper
nay Schepler mailto:ches...@apache.org>> wrote: Have you looked at org.apache.flink.types.Either? If you'd wrap all elements in both streams before the union you should be able to join them properly. On 17/07/2019 14:18, John Tipper wrote: Hi All, Can I union/join 2 streams cont

Does Flink support raw generic types in a merged stream?

2019-07-17 Thread John Tipper
Hi All, Can I union/join 2 streams containing generic classes, where each stream has a different parameterised type? I'd like to process the combined stream of values as a single raw type, casting to a specific type for detailed processing, based on some information in the type that will allow

What order are events processed in iterative loop?

2019-06-17 Thread John Tipper
For the case of a single iteration of an iterative loop where the feedback type is different to the input stream type, what order are events processed in the forward flow? So for example, if we have: * the input stream contains input1 followed by input2 * a ConnectedIterativeStream at th

How to join/group 2 streams by key?

2019-06-14 Thread John Tipper
Hi All, I have 2 streams of events that relate to a common base event, where one stream is the result of a flatmap. I want to join all events that share a common identifier. Thus I have something that looks like: DataStream streamA = ... DataStream streamB = someDataStream.flatMap(...) // pro

How are timestamps treated within an iterative DataStream loop within Flink?

2019-06-08 Thread John Tipper
Hi All, How are timestamps treated within an iterative DataStream loop within Flink? For example, here is an example of a simple iterative loop within Flink where the feedback loop is of a different type to the input stream: DataStream inputStream = env.addSource(new MyInputSourceFunction()); I

Is it possible to configure Flink pre-flight type serialization scanning?

2019-06-03 Thread John Tipper
Flink performs significant scanning during the pre-flight phase of a Flink application (https://ci.apache.org/projects/flink/flink-docs-stable/dev/types_serialization.html). The act of creating sources, operators and sinks causes Flink to scan the data types of the objects that are used within

Re: FLIP-16, FLIP-15 Status Updates?

2019-02-19 Thread John Tipper
admap > page can be expected in the next 2-3 weeks. > > I hope this information helps. > > Regards, > Timo > > [1] > https://flink.apache.org/news/2019/02/13/unified-batch-streaming-blink.html > >> Am 19.02.19 um 12:47 schrieb John Tipper: >> Hi A

FLIP-16, FLIP-15 Status Updates?

2019-02-19 Thread John Tipper
Hi All, Does anyone know what the current status is for FLIP-16 (loop fault tolerance) and FLIP-15 (redesign iterations) please? I can see lots of work back in 2016, but it all seemed to stop and go quiet since about March 2017. I see iterations as offering very interesting capabilities for Fli