Re: Re: Re: Using Kafka as bounded source with DataStream API in batch mode (Flink 1.12)

2021-01-14 Thread Yun Gao
Hi Sagar, I rechecked and found that the new Kafka source is not formally published yet, so I think a stable approach may be to try adding the FlinkKafkaConsumer as a BOUNDED source first. Sorry for the inconvenience. Best, Yun -- Send

Re: Getting Exception in thread "main" java.util.concurrent.ExecutionException: scala.tools.reflect.ToolBoxError: reflective compilation has failed: cannot initialize the compiler

2021-01-14 Thread Arvid Heise
Hi Avi, apparently the maximum Scala version that Flink supports is 2.12.7 [1]. Do you have a specific reason to use a higher version? [1] https://issues.apache.org/jira/browse/FLINK-12461 On Thu, Jan 14, 2021 at 5:11 AM Avi Levi wrote: > Hi Arvid, > Please find attached full build.gradle

Deterministic rescale for test

2021-01-14 Thread Martin Frank Hansen
Hi, I am trying to make a test suite for our Flink jobs, and am having problems making the input data deterministic. We are reading a file input with parallelism 1 and want to rescale to a higher parallelism, such that the ordering of the data is the same every time. I have tried using rebalanc

Re: StreamingFileSink with ParquetAvroWriters

2021-01-14 Thread Dawid Wysakowicz
Hi Jan, could you make sure you are packaging that dependency with your job jar? There are instructions on how to configure your build setup[1]. The part on how to build a jar with dependencies might come in especially handy[2]. Best, Dawid [1] https://ci.apache.org/projects/flink/flink-docs-release-1

Re: Elasticsearch config maxes can not be disabled

2021-01-14 Thread Dawid Wysakowicz
Hi, First of all, what Flink version are you using? You are right, it is a mistake in the documentation of sink.bulk-flush.max-actions. It should say: Can be set to |'-1'| to disable it. I created a ticket[1] to track that. As far as I can tell from a quick check, it should work. A

Fwd: Error querying flink state

2021-01-14 Thread Falak Kansal
Hi, I have set up a flink cluster on my local machine. I created a flink job ( TrackMaximumTemperature) and made the state queryable. I am using *github/streamingwithflink/chapter7/QueryableState.scala* example from *https://github.com/streaming-with-flink

Re: Flink[Python] questions

2021-01-14 Thread Shuiqiang Chen
Hi Dc, Thank you for your feedback. 1. Currently, only built-in types are supported in the Python DataStream API. However, as a workaround you can use a Row type to represent a custom Python class, where field names stand for the names of member variables and field types stand for the type of member

Re: Dead code in ES Sink

2021-01-14 Thread Aljoscha Krettek
On 2021/01/13 07:50, Rex Fenley wrote: Are you saying that this option does get passed along to Elasticsearch still or that it's just arbitrarily validated? According to [1] it's been deprecated in ES 6 and removed in ES 7. [1] https://github.com/elastic/elasticsearch/pull/38085 Sorry, I wasn'

Re: Getting Exception in thread "main" java.util.concurrent.ExecutionException: scala.tools.reflect.ToolBoxError: reflective compilation has failed: cannot initialize the compiler

2021-01-14 Thread Arvid Heise
Hi Avi, could you run `gradle dependencies` and report back to me? Also, did you make sure to run `gradle clean` beforehand? The gradle version you are using is ancient, so I'm not sure if it's picking up the change correctly. On Thu, Jan 14, 2021 at 10:55 AM Avi Levi wrote: > No, I don't. I actually tri

Re: Re: Re: Using Kafka as bounded source with DataStream API in batch mode (Flink 1.12)

2021-01-14 Thread sagar
Thanks Yun On Thu, Jan 14, 2021 at 1:58 PM Yun Gao wrote: > Hi Sagar, > > I rechecked and found that the new Kafka source is not formally published > yet, so I think a stable approach may be to try adding the FlinkKafkaConsumer > as a BOUNDED source first. Sorry for the inconvenience. > > Best, > Yu

Declaring and using ROW data type in PyFlink DataStream

2021-01-14 Thread meneldor
Hello, What is the correct way to use Python dicts as ROW type in PyFlink? I'm trying this: output_type_info = Types.ROW_NAMED(['id', 'data', 'timestamp'], [Types.STRING(), Types.STRING(), Types.LONG()]) class MyProcessFunction(KeyedProcessFunction): de

Pyflink Join with versioned view / table

2021-01-14 Thread Barth, Torben
Dear List, I have trouble implementing a join between two streaming tables in the Python Table API. The left table of my join should be enriched with the last value of the right_table. The right_table is updated only rarely (maybe after 15 minutes). When implementing the join

Re: Declaring and using ROW data type in PyFlink DataStream

2021-01-14 Thread Shuiqiang Chen
Hi meneldor, The main cause of the error is a bug in `ctx.timer_service().current_watermark()`. At the beginning of the stream, when the first record comes into KeyedProcessFunction.process_element(), the current_watermark will be Long.MIN_VALUE on the Java side, while at the Python

Re: Elasticsearch config maxes can not be disabled

2021-01-14 Thread Rex Fenley
Flink 1.11.2 CREATE TABLE sink_es ( ... ) WITH ( 'connector' = 'elasticsearch-7', 'hosts' = '${sys:proxyEnv.ELASTICSEARCH_HOSTS}', 'index' = '${sys:graph.flink.index_name}', 'format' = 'json', 'sink.bulk-flush.max-actions' = '0', 'sink.bulk-flush.max-size' = '0', 'sink.bulk-flush.interval' = '1s',

Re: error accessing S3 bucket 1.12

2021-01-14 Thread Dawid Wysakowicz
Hi Billy, I think you might be hitting the same problem as described in this thread[1]. Does your bucket meet all the name requirements as described in here[2] (e.g. have an underscore)? Best, Dawid [1] http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Unable-to-set-S3-like-ob

Re: Declaring and using ROW data type in PyFlink DataStream

2021-01-14 Thread meneldor
Thank you for the answer Shuiqiang! I'm using the latest apache-flink version: > Requirement already up-to-date: apache-flink in > ./venv/lib/python3.7/site-packages (1.12.0) however the method signature is using a collector: [image: image.png] I'm using the *setup-pyflink-virtual-env.sh* shell scr

Re: Deterministic rescale for test

2021-01-14 Thread Jaffe, Julian
Martin, You can use `.partitionCustom` and provide a partitioner if you want to control explicitly how elements are distributed to downstream tasks. From: Martin Frank Hansen Reply-To: "m...@berlingskemedia.dk" Date: Thursday, January 14, 2021 at 1:48 AM To: user Subject: Deterministic rescal
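
A minimal sketch of the idea behind the suggestion above, written in plain Python rather than against the Flink API: if the partitioner is a pure function of a stable key (here, the position of the record in the single-parallelism input), every run sends each record to the same downstream task. The names `partition` and `distribute` are hypothetical helpers for illustration only; in a real job the partitioning function would be handed to `.partitionCustom` together with a key selector.

```python
from typing import List, Sequence


def partition(key: int, num_partitions: int) -> int:
    """Deterministic partitioner: depends only on the key and the parallelism."""
    return key % num_partitions


def distribute(records: Sequence[str], num_partitions: int) -> List[List[str]]:
    """Simulate sending records to downstream tasks (hypothetical helper).

    The key is the record's position in the input, which is stable when the
    source is read with parallelism 1.
    """
    buckets: List[List[str]] = [[] for _ in range(num_partitions)]
    for index, record in enumerate(records):
        buckets[partition(index, num_partitions)].append(record)
    return buckets


first_run = distribute(["a", "b", "c", "d", "e"], 2)
second_run = distribute(["a", "b", "c", "d", "e"], 2)
```

Because no randomness or task-local state is involved, `first_run` and `second_run` are identical, which is the property a test suite needs. Note that this only fixes which task receives each record and the order within a task; it does not impose an ordering across parallel tasks.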

Re: Enrich stream with SQL api

2021-01-14 Thread Dawid Wysakowicz
Hi Marek, I am afraid I don't have a good answer for your question. The problem indeed is that the JDBC source can work only as a bounded source. As you correctly pointed out, as of now mixing bounded with unbounded sources does not work with checkpointing, which we want to address in the FLIP-147

Re: Flink SQL - IntervalJoin doesn't support consuming update and delete - trying to deduplicate rows

2021-01-14 Thread Dan Hill
Hey, sorry for the late reply. I'm using v1.11.1. Cool. I did a non-SQL way of using the first row. I'll try to see if I can do this in the SQL version. On Wed, Jan 13, 2021 at 11:26 PM Jark Wu wrote: > Hi Dan, > > Sorry for the late reply. > > I guess you applied a "deduplication with keepi

Re: Declaring and using ROW data type in PyFlink DataStream

2021-01-14 Thread Xingbo Huang
Hi meneldor, I guess Shuiqiang is not using PyFlink 1.12.0 to develop the example. The signature of the `process_element` method has been changed in the new version[1]. In PyFlink 1.12.0, you can use `collector.collect` to send out your results. [1] https://issues.apache.org/jira/browse/FLINK
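
A rough sketch of the collector style described above, assuming the 1.12.0 `process_element` receives the collector as a third argument. To keep it runnable without a Flink installation, `ListCollector` and this `MyProcessFunction` are hypothetical stand-ins, not PyFlink classes; only the `collect` call mirrors what the reply describes.

```python
class ListCollector:
    """Hypothetical stand-in for the collector; keeps emitted values in a list."""

    def __init__(self):
        self.results = []

    def collect(self, value):
        self.results.append(value)


class MyProcessFunction:
    """Stand-in for a KeyedProcessFunction subclass (no PyFlink import here)."""

    def process_element(self, value, ctx, out):
        # Results are emitted through the collector instead of being returned.
        out.collect({"id": value["id"], "data": value["data"].upper()})


out = ListCollector()
MyProcessFunction().process_element({"id": "1", "data": "x"}, None, out)
```

After the call, `out.results` holds the single emitted record. In an actual job the function would extend PyFlink's `KeyedProcessFunction` and the framework would supply `ctx` and the collector.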

Gauges generating same graph

2021-01-14 Thread Manish G
Hi All, I have a few RichFlatMapFunction classes, and I have a gauge added to each one of them. For a particular use case I am updating these gauges incrementally. I have a class member variable in each of these classes which keeps increasing as the flatMap function in these classes is called, and then I

Re: Flink app logs to Elastic Search

2021-01-14 Thread bat man
I was able to make it work with a fresh Elastic installation. Now taskmanager and jobmanager logs are available in elastic. Thanks for the pointers. -Hemant. On Wed, Jan 13, 2021 at 6:21 PM Aljoscha Krettek wrote: > On 2021/01/11 01:29, bat man wrote: > >Yes, no entries to the elastic search. N

Re: Re: [DISCUSS] FLIP-147: Support Checkpoints After Tasks Finished

2021-01-14 Thread Yun Gao
Hi all, We had an offline discussion together with @Arvid, @Roman and @Aljoscha, and I'd like to post some points we discussed: 1) For the problem that the "new" root task coincidentally finished before getting triggered successfully, we have listed two options in the FLIP-147[1], for the fi

Re: Deterministic rescale for test

2021-01-14 Thread Martin Frank Hansen
Hi Jaffe, Thanks for your reply, I will try to use a custom Partitioner. Den tor. 14. jan. 2021 kl. 19.39 skrev Jaffe, Julian < julianja...@activision.com>: > Martin, > > > > You can use `.partitionCustom` and provide a partitioner if you want to > control explicitly how elements are distributed t

Re: Simplest way to deploy flink job on k8s for e2e testing purposes

2021-01-14 Thread Salva Alcántara
Can anyone explain why I am getting this error? "Exception in thread "main" java.lang.IllegalStateException: No ExecutorFactory found to execute the application." I have tried a slightly different approach by running the jar that `sbt assembly` produces inside a container that looks like this (Doc