Re: Read data from Oracle using Flink SQL API

2020-02-06 Thread Jingsong Li
Hi Flavio, We can document it once dialects are supported through JDBCTableFactory. Best, Jingsong Lee On Thu, Feb 6, 2020 at 10:54 PM Flavio Pompermaier wrote: > Ok to me, I just want this to be documented at > https://ci.apache.org/projects/flink/flink-docs-stable/dev/table/connect.html#jdbc

Flink DataTypes json parse exception

2020-02-06 Thread sunfulin
Hi guys, when upgrading to Flink 1.10 rc0, I am confused by the new DataTypes schema definition. I am reading and consuming records from Kafka with a JSON schema like {"user_id":1866071598998,"recv_time":1547011832729}, and the code snippet is: .withSchema( new Schema() // eve
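The thread is cut off before the exception details, but one plausible pitfall with the sample record: the user_id value 1866071598998 does not fit in a 32-bit INT, so the schema must declare it as DataTypes.BIGINT(). A self-contained sketch demonstrating the range issue (the class name is ours, for illustration only):

```java
public class JsonTypeCheck {
    public static void main(String[] args) {
        String userId = "1866071598998"; // from the sample record in the thread
        long asLong = Long.parseLong(userId); // fits in a 64-bit BIGINT
        boolean fitsInt;
        try {
            Integer.parseInt(userId); // would be the INT mapping
            fitsInt = true;
        } catch (NumberFormatException e) {
            fitsInt = false; // exceeds Integer.MAX_VALUE
        }
        System.out.println(asLong + " fitsInt=" + fitsInt);
    }
}
```

If this is the cause, mapping both fields to DataTypes.BIGINT() in the schema should avoid the parse exception.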

Re: Failed to transfer file from TaskExecutor : Vanilla Flink Cluster

2020-02-06 Thread Yang Wang
Maybe you forgot to limit the blob server port (blob.server.port) to the range. Best, Yang Milind Vaidya wrote on Fri, Feb 7, 2020 at 7:03 AM: > I figured out that it was a problem with the ports. 39493/34094 were not > accessible. So to get this working I opened all the ports 0-65535 for the > security group.
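A hedged flink-conf.yaml sketch of the port pinning Yang suggests; the port numbers here are illustrative assumptions, not values from the thread:

```yaml
# Pin the otherwise-random ports so only these ranges need opening
# in the security group instead of 0-65535.
blob.server.port: 6124-6134
taskmanager.rpc.port: 6122-6132
taskmanager.data.port: 6121
```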

Running a Beam Pipeline on GCP Dataproc Flink Cluster

2020-02-06 Thread Xander Song
I am attempting to run a Beam pipeline on a GCP Dataproc Flink cluster. I have followed the instructions at this repo to create a Flink cluster on Dataproc using an initialization action. However, the resulting cluste

Re: Failed to transfer file from TaskExecutor : Vanilla Flink Cluster

2020-02-06 Thread Milind Vaidya
I figured out that it was a problem with the ports. 39493/34094 were not accessible. So to get this working I opened all the ports 0-65535 for the security group. How do I control that if I want to open only a certain range of ports? Is "taskmanager.rpc.port" the right parameter to set? I did try a

Flink Job Cleanup Sequence

2020-02-06 Thread Abdul Qadeer
Hi! Is there a FLIP or doc describing how Flink (1.8+) cleans up the states of a Job upon cancellation in Zookeeper HA mode? I am trying to find the sequence of cleaning between SubmittedJobGraphs in Zookeeper, blobs in FS maintained by BlobStore etc.

Re: NoClassDefFoundError when an exception is throw inside invoke in a Custom Sink

2020-02-06 Thread David Magalhães
Hi Andrey, thanks for your reply. The class is in the jar created with `sbt assembly` that is submitted to Flink to start a Job. unzip -l target/jar/myapp-0.0.1-SNAPSHOT.jar | grep DateTimeParserBucket 1649 05-27-2016 10:24 org/joda/time/format/DateTimeParserBucket$SavedField.class 1

Re: Performance issue with RegistryAvroSerializationSchema

2020-02-06 Thread Steve Whelan
Robert, You are correct that it is using a CachedSchemaRegistryClient object. Therefore, schemaRegistryClient.register() should be checking the cache first before sending a request to the Registry. However, turning on debug logging of my Registry, I can see a request being sent for every seria

Re: NoClassDefFoundError when an exception is throw inside invoke in a Custom Sink

2020-02-06 Thread Andrey Zagrebin
Hi David, This looks like a problem with resolution of Maven dependencies or something. The custom WindowParquetGenericRecordListFileSink probably transitively depends on org/joda/time/format/DateTimeParserBucket and it is missing from the runtime classpath of Flink. Best, Andrey On Wed, Feb 5, 20
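One quick way to check Andrey's diagnosis is to probe the runtime classpath for the class named in the NoClassDefFoundError. A minimal sketch (the class name ClasspathProbe is ours, for illustration; it prints "missing" when joda-time is absent):

```java
public class ClasspathProbe {
    public static void main(String[] args) {
        String name = "org.joda.time.format.DateTimeParserBucket";
        try {
            // initialize=false: just resolve the class, don't run static init
            Class.forName(name, false, ClasspathProbe.class.getClassLoader());
            System.out.println("present: " + name);
        } catch (ClassNotFoundException e) {
            System.out.println("missing: " + name);
        }
    }
}
```

Running this from inside the job (or from a Flink classloading context) shows whether the class the assembled jar bundles is actually visible at runtime.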

Re: Performance issue with RegistryAvroSerializationSchema

2020-02-06 Thread Dawid Wysakowicz
Hi Steve, I think your observation is correct. If I am not mistaken we should use schemaRegistryClient.getId(subject, schema) instead of schemaRegistryClient.register(subject, schema). The former should perform an HTTP request only if the schema is not in the cache. I created an issue t
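The cache-first behaviour Dawid describes can be sketched in plain Java. This is an illustrative stand-in, not the Confluent client API; the names and the hash-based "id" are assumptions made for the sketch:

```java
import java.util.HashMap;
import java.util.Map;

public class CachingRegistryClient {
    private final Map<String, Integer> idCache = new HashMap<>();
    int remoteCalls = 0; // observable proxy for HTTP round trips

    // Hypothetical stand-in for an HTTP request to the registry.
    private int remoteLookup(String key) {
        remoteCalls++;
        return key.hashCode();
    }

    // Cache-first lookup: only goes remote on a cache miss,
    // which is the getId(subject, schema) behaviour discussed above.
    public int getId(String subject, String schema) {
        return idCache.computeIfAbsent(subject + "|" + schema, this::remoteLookup);
    }

    public static void main(String[] args) {
        CachingRegistryClient client = new CachingRegistryClient();
        client.getId("topic-value", "{\"type\":\"record\"}");
        client.getId("topic-value", "{\"type\":\"record\"}"); // served from cache
        System.out.println("remoteCalls=" + client.remoteCalls);
    }
}
```

A per-message register() that bypasses such a cache would produce one request per serialized record, matching what Steve observed in his registry logs.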

Re: Performance issue with RegistryAvroSerializationSchema

2020-02-06 Thread Robert Metzger
Hi, thanks a lot for your message. It's certainly not intentional to do an HTTP request for every single message :) Isn't the schemaRegistryClient an instance of CachedSchemaRegistryClient, which, as the name says, caches? Can you check with a debugger at runtime what registry client is used, and

Re: Best approach for recalculating statistics based on amended or deleted events?

2020-02-06 Thread Timo Walther
Hi Stephen, what I meant was that the schema of the table might still contain a column that describes the change ("isRetract"). We cannot apply it internally, but of course you can deal with this column in a SQL query. This answer here and the linked answer might also help you: https://stac
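Timo's suggestion of handling the change column in a query can be sketched as follows; this is illustrative only, and the table and column names ("events", "isRetract", "amount", "key_col") are assumptions, not from the thread:

```sql
-- Fold an insert-only changelog with a retract flag into an aggregate:
-- retractions contribute negatively, inserts positively.
SELECT key_col,
       SUM(CASE WHEN isRetract THEN -amount ELSE amount END) AS total
FROM events
GROUP BY key_col;
```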

RE: Flink 1.10 feedback

2020-02-06 Thread ZAIDI Farouk - externe
Thanks, I'll give it a try. From: danrtsey...@gmail.com Sent: Thursday, February 6, 2020 15:24 To: Farouk Cc: user Subject: Re: Flink 1.10 feedback Hi Farouk, Currently, the community does not provide an image on Docker Hub for unreleased versions. I think you could use the following steps to bui

Re: Read data from Oracle using Flink SQL API

2020-02-06 Thread Flavio Pompermaier
Ok to me, I just want this to be documented at https://ci.apache.org/projects/flink/flink-docs-stable/dev/table/connect.html#jdbc-connector. Do I need to open another JIRA for that? On Thu, Feb 6, 2020 at 11:04 AM Jingsong Li wrote: > Hi Flavio, > > Instead of "TableEnvironment.registerTableSo

Re: Flink 1.10 feedback

2020-02-06 Thread Yang Wang
Hi Farouk, Currently, the community does not provide an image on Docker Hub for unreleased versions. I think you could use the following steps to build your image and then push it to your Docker repository. 1. Download the Flink dist for 1.10, e.g. flink-1.10.0-rc2 [1] 2. Unzip and cd flink-1.10
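The steps above can be sketched as a minimal Dockerfile. The base image, paths, and entrypoint here are assumptions for illustration, not from the thread; the official flink-docker build scripts are the more complete route:

```dockerfile
# Sketch: package an unzipped, unreleased Flink dist into a local image.
FROM openjdk:8-jre
ENV FLINK_HOME=/opt/flink
# Assumes the flink-1.10.0 directory from step 2 sits next to this Dockerfile.
COPY flink-1.10.0 ${FLINK_HOME}
WORKDIR ${FLINK_HOME}
ENTRYPOINT ["bin/jobmanager.sh", "start-foreground"]
```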

Re: Best approach for recalculating statistics based on amended or deleted events?

2020-02-06 Thread Stephen Young
Are you able to advise any further, Timo? Thanks! On 2020/02/04 16:10:04, Stephen Young wrote: > Hi Timo, > > Thanks for replying to me so quickly! > > We could do it with insert-only rows. When you say flags in the data, do you > mean a field with a name like 'retracts' and then the value of t

Re: [BULK]Re: DisableGenericTypes is not compatible with Kafka

2020-02-06 Thread Aljoscha Krettek
We can just directly create a KryoSerializer instead of going through TypeInformation, I think. Best, Aljoscha On 05.02.20 18:29, Oleksandr Nitavskyi wrote: Thanks, guys, for the answers. Aljoscha, I have a question to ensure I get it right. Do I understand correctly that this newly created T

Re: Flink TaskManager Logs polluted with InfluxDB metrics reporter errors

2020-02-06 Thread Chesnay Schepler
This could be related to FLINK-12579: some Kafka metrics use Infinity as a value, which InfluxDB doesn't support. For 1.10 we updated our influxdb-java dependency, which automatically filters out these values, but we haven't released this yet. We
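The filtering Chesnay describes can be sketched in plain Java: drop NaN/Infinity values before they reach the line-protocol writer, since InfluxDB rejects them. The names here are illustrative, not Flink's or influxdb-java's actual code:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class FiniteMetricFilter {
    // Keep only metrics whose value InfluxDB can store (finite doubles).
    static Map<String, Double> filterFinite(Map<String, Double> metrics) {
        Map<String, Double> out = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : metrics.entrySet()) {
            if (Double.isFinite(e.getValue())) {
                out.put(e.getKey(), e.getValue());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Double> metrics = new LinkedHashMap<>();
        // Kafka reports -Infinity for some lag metrics when no data is available.
        metrics.put("records-lag-max", Double.NEGATIVE_INFINITY);
        metrics.put("bytes-consumed-rate", 1024.5);
        System.out.println(filterFinite(metrics));
    }
}
```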

Re: Flink TaskManager Logs polluted with InfluxDB metrics reporter errors

2020-02-06 Thread Morgan Geldenhuys
The only difference that I can tell is the Kubernetes cluster was upgraded from 1.14 to 1.17; however, I rolled this back to test and got the same result on the older version. I've created a stripped-down job which produces the errors: // create a testing sink private static class CollectSink imple

Re: Read data from Oracle using Flink SQL API

2020-02-06 Thread Jingsong Li
Hi Flavio, As for "TableEnvironment.registerTableSource", we want to drop it in 1.10. I think you can create a JIRA to support passing the dialect through JDBCTableFactory. We should support it. What do you think? Best, Jingsong Lee On Thu, Feb 6, 2020 at 6:01 PM Flavio Pompermaier wrote: > Shou

Re: Read data from Oracle using Flink SQL API

2020-02-06 Thread Flavio Pompermaier
Should I open a JIRA to better document how to connect to an RDBMS that has no "out-of-the-box" dialect? I.e. pass the dialect implementation in TableEnvironment.registerTableSource? On Thu, Feb 6, 2020 at 10:53 AM Jingsong Li wrote: > Hi Bowen, > > JIRA exists: https://issues.apache.org/jira/browse/FL

Flink 1.10 feedback

2020-02-06 Thread Farouk
Hi guys, Is there an easy way to test Flink 1.10 in advance with Docker images before releasing 1.10? We can maybe give you feedback. We have tests running with TestContainers. Thanks, Farouk

Re: Flink TaskManager Logs polluted with InfluxDB metrics reporter errors

2020-02-06 Thread Chesnay Schepler
Setup-wise, are there any differences to what you had a few months ago? On 06/02/2020 10:40, Morgan Geldenhuys wrote: Further info, the flink cluster (1.9) is running on Kubernetes (1.17) with InfluxDB. I have tried the following images for InfluxDB: docker.io/influxdb:1.6.4 and influxdb:la

Re: Read data from Oracle using Flink SQL API

2020-02-06 Thread Jingsong Li
Hi Bowen, JIRA exists: https://issues.apache.org/jira/browse/FLINK-14078 Best, Jingsong Lee On Thu, Feb 6, 2020 at 12:57 PM Bowen Li wrote: > Hi Flavio, > > +1 for adding Oracle (potentially more dbms like SqlServer, etc) to > flink-jdbc. Would you mind open a parent ticket and some subtasks,

Re: Flink TaskManager Logs polluted with InfluxDB metrics reporter errors

2020-02-06 Thread Morgan Geldenhuys
Further info: the Flink cluster (1.9) is running on Kubernetes (1.17) with InfluxDB. I have tried the following images for InfluxDB: docker.io/influxdb:1.6.4 and influxdb:latest. When going into the database and showing the series, there are really weird results: > show series key --- job

Re: Flink TaskManager Logs polluted with InfluxDB metrics reporter errors

2020-02-06 Thread Chesnay Schepler
What InfluxDB version are you using? On 05/02/2020 19:41, Morgan Geldenhuys wrote: I am trying to set up metrics reporting for Flink using InfluxDB; however, I am receiving tons of exceptions (listed right at the bottom). Reporting is set up as recommended by the documentation: metrics.reporter
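For reference, a minimal reporter configuration of the shape the documentation recommends; the host and database values here are placeholders, not from the thread:

```yaml
metrics.reporter.influxdb.class: org.apache.flink.metrics.influxdb.InfluxdbReporter
metrics.reporter.influxdb.host: influxdb
metrics.reporter.influxdb.port: 8086
metrics.reporter.influxdb.db: flink
```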