[DISCUSS] FLIP-111: Docker image unification

2020-03-04 Thread Andrey Zagrebin
Hi All, If you have ever touched the Docker topic in Flink, you have probably noticed that we have multiple places in the docs and repos which address its various concerns. We have prepared a FLIP [1] to simplify how users perceive the Docker topic in Flink. It mostly advocates for an approach of exte…

Re: StreamingFileSink Not Flushing All Data

2020-03-04 Thread Kostas Kloudas
Hi Austin, I will have a look at your repo. In the meantime, given that [1] is already merged in 1.10, would upgrading to 1.10 and using the newly introduced CompressWriterFactory be an option for you? It is unfortunate that this feature was not documented. Cheers, Kostas [1] https://issues.apa…
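
For anyone else in this situation, here is a minimal sketch of what using the then-undocumented CompressWriterFactory could look like on 1.10; the output path, record type, and codec name are assumptions for illustration, not details from this thread:

```java
import org.apache.flink.core.fs.Path;
import org.apache.flink.formats.compress.CompressWriters;
import org.apache.flink.formats.compress.extractor.DefaultExtractor;
import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;

// Bulk-format sink that compresses records with a Hadoop codec
// (flink-compress plus Hadoop must be on the classpath).
StreamingFileSink<String> sink = StreamingFileSink
    .forBulkFormat(
        new Path("hdfs:///output"),
        CompressWriters.forExtractor(new DefaultExtractor<String>())
            .withHadoopCompression("Gzip"))
    .build();
```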

Re: Very large _metadata file

2020-03-04 Thread Kostas Kloudas
Hi Jacob, Could you specify which StateBackend you are using? The reason I am asking is that, from the documentation in [1]: "Note that if you use the MemoryStateBackend, metadata and savepoint state will be stored in the _metadata file. Since it is self-contained, you may move the file and rest…

CREATE TABLE with Schema derived from format

2020-03-04 Thread Gyula Fóra
Hi All! I am wondering if it would be possible to change the CREATE TABLE statement so that it would also work without specifying any columns. The format generally defines the available columns, so maybe we could simply use them as-is if we want. This would be very helpful when exploring differen…

Question on the Kafka connector parameter "connector.properties.zookeeper.connect"

2020-03-04 Thread Weike Dong
Hi, Recently I have found that in the Flink Kafka Connector, the parameter "connector.properties.zookeeper.connect" is mandatory for users. Therefore, without it, Flink throws an exception saying "Caused by: org.apache.flink.table.api.ValidationException: Could not find required proper…

Re: Question on the Kafka connector parameter "connector.properties.zookeeper.connect"

2020-03-04 Thread Jark Wu
Hi Weike, You are right. It is not needed since Kafka 0.9+. We already have an issue to make it optional. See https://issues.apache.org/jira/browse/FLINK-16125. We are planning to fix it in 1.10.1 too. Best, Jark On Wed, 4 Mar 2020 at 18:23, Weike Dong wrote: > Hi, Recently I have foun…
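
Until the fix is released, a workaround I believe works (an assumption, not from this thread) is to satisfy the validator with a throwaway value, since Kafka 0.9+ clients simply ignore the property:

```java
// The zookeeper.connect value below is a dummy that only satisfies Flink's
// property validation; Kafka 0.9+ clients do not use it (they may log a
// "not a known config" warning). Table name and topic are hypothetical.
tableEnv.sqlUpdate("CREATE TABLE orders (\n" +
    "  order_id BIGINT,\n" +
    "  order_no VARCHAR\n" +
    ") WITH (\n" +
    "  'connector.type' = 'kafka',\n" +
    "  'connector.version' = 'universal',\n" +
    "  'connector.topic' = 'orders',\n" +
    "  'connector.properties.zookeeper.connect' = 'unused:2181',\n" +
    "  'connector.properties.bootstrap.servers' = 'broker:9092',\n" +
    "  'format.type' = 'json'\n" +
    ")");
```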

Re: Very large _metadata file

2020-03-04 Thread Tzu-Li (Gordon) Tai
Hi Jacob, Apart from what Klou already mentioned, one other possible reason: If you are using the FsStateBackend, it is also possible that your state is small enough to be stored inline within the metadata file. That is governed by the "state.backend.fs.memory-threshold" confi…
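
For reference, a sketch of the knob Gordon mentions; the checkpoint URI and threshold value are illustrative only, and this statement belongs inside a job's setup code (the URI constructor can throw):

```java
import java.net.URI;
import org.apache.flink.runtime.state.filesystem.FsStateBackend;

// State handles smaller than the threshold (default 1024 bytes, the same knob as
// state.backend.fs.memory-threshold in flink-conf.yaml) are inlined into _metadata.
env.setStateBackend(new FsStateBackend(new URI("hdfs:///flink/checkpoints"), 1024));
```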

Re: CREATE TABLE with Schema derived from format

2020-03-04 Thread Jark Wu
Hi Gyula, That's a good point and is on the roadmap. In 1.10, the JSON and CSV formats can derive the format schema from the table schema, so you don't need to specify the format schema in the properties anymore if you are using 1.10. Going the other direction, we are planning to derive the table schema from the format schema if it i…
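
To illustrate what Jark describes, a 1.10-style DDL where the JSON format schema is derived from the table schema, so no format.fields properties are needed (topic and broker names are placeholders):

```java
tableEnv.sqlUpdate("CREATE TABLE user_events (\n" +
    "  user_id BIGINT,\n" +
    "  event_type STRING,\n" +
    "  event_time TIMESTAMP(3)\n" +
    ") WITH (\n" +
    "  'connector.type' = 'kafka',\n" +
    "  'connector.version' = 'universal',\n" +
    "  'connector.topic' = 'user-events',\n" +
    "  'connector.properties.bootstrap.servers' = 'broker:9092',\n" +
    // no 'format.fields.*' entries: the JSON schema is derived from the columns above
    "  'format.type' = 'json'\n" +
    ")");
```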

Re: CREATE TABLE with Schema derived from format

2020-03-04 Thread Gyula Fóra
Hi Jark, Thank you for the clarification, this is exactly what I was looking for, especially the second part regarding the schema registry integration. This question came up as we were investigating what the schema registry integration should look like :) Cheers, Gyula On Wed, Mar 4, 2020 at 12:0…

Flink Serialization as stable (Kafka) output format?

2020-03-04 Thread Theo Diefenthal
Hi, Without knowing too much about Flink serialization, I know that Flink states it serializes POJO types much faster than even the fast Kryo for Java. I further know that it supports schema evolution in the same way as Avro. In our project, we have a star architecture, where one Flink j…

History server UI not working

2020-03-04 Thread pwestermann
We recently upgraded from Flink 1.7 to Flink 1.9.2 and the history server UI now seems to be broken. It doesn't load and always just displays a blank screen. The individual endpoints (e.g. /jobs/overview) still work. Could this be an issue caused by the Angular update for the regular UI?

Re: CREATE TABLE with Schema derived from format

2020-03-04 Thread Jark Wu
Yes. From my perspective, deriving the schema from a schema registry is the most important use case of FLINK-16420. Some initial ideas about this: 1) introduce a SchemaRegistryCatalog to allow users to run queries on existing topics without manual table definitions, see FLINK-12256; 2) provide a connector pr…

Re: CREATE TABLE with Schema derived from format

2020-03-04 Thread Gyula Fóra
Hi! Initially we were looking at 2), but 1) would be the best solution. I think both would be very valuable. My only concern related to using the Schema Registry as a Catalog is the interaction with other Catalogs in the system. Maybe you are using a Hive catalog to track a bunch of tables, an…

Re: EXTERNAL: Re: Should I use a Sink or Connector? Or Both?

2020-03-04 Thread Castro, Fernando C.
Thank you guys. So I have no idea why data is not being pushed to Elasticsearch… ☹ My complete code is at https://stackoverflow.com/questions/60512064/flink-is-not-adding-any-data-to-elasticsearch-but-no-errors Btw, for some reason I still need to pass .documentType to the Elasticsearch conn…

RE: Flink's Either type information

2020-03-04 Thread jacopo.gobbi
Hi all, Yes, my problem is that I do not create the function inline but create it directly when creating the data stream job. My code (which I cannot share) is exactly like your example. Yun, are you aware of a way to prevent type erasure? Kind regards, Jacopo Gobbi From: Yun…

CFP: Workshop on Large Scale RDF Analytics (LASCAR-20) at ESWC'20

2020-03-04 Thread Hajira Jabeen
** We apologize for cross-postings. We appreciate your great help in forwarding this CFP to your colleagues and friends. **

Teradata as JDBC Connection

2020-03-04 Thread Norm Vilmer (Contractor)
Using Flink 1.10 and coding in Java 11, is it possible to write to Teradata in append mode? MySQL, PostgreSQL, and Derby are the only supported drivers listed. Thanks. https://ci.apache.org/projects/flink/flink-docs-stable/dev/table/connect.html#connectors I created the ConnectorDescriptor…

Re: Building with Hadoop 3

2020-03-04 Thread Stephan Ewen
Have you tried to just export Hadoop 3's classpath to `HADOOP_CLASSPATH` and see if that works out of the box? If the main use case is HDFS access, then there is a fair chance it might just work, because Flink uses only a small subset of the Hadoop FS API, which is stable between 2.x and 3.x, as fa…
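
Concretely, that amounts to something like the following (assuming the hadoop binary is on the PATH; the job jar is just an example):

```sh
# Expose Hadoop 3's jars to Flink instead of bundling a flink-shaded-hadoop jar
export HADOOP_CLASSPATH=$(hadoop classpath)
./bin/flink run ./examples/streaming/WordCount.jar
```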

checkpoint _metadata file has >20x difference in size among different checkpoints

2020-03-04 Thread Yu Yang
Hi all, We have a Flink job that checkpoints every 10 minutes. We noticed that the _metadata file size can vary a lot between checkpoints of this job. In some checkpoints the _metadata file size was >900MB, while in other checkpoints of the same job the _metadata file s…

Re: Very large _metadata file

2020-03-04 Thread Jacob Sevart
Kostas and Gordon, Thanks for the suggestions! I'm on RocksDB. We don't have that setting configured, so it should be at the default 1024b. This is the full "state.*" section shown in the JobManager UI. [image: Screen Shot 2020-03-04 at 9.56.20 AM.png] Jacob On Wed, Mar 4, 2020 at 2:45 AM Tzu-…

Re: checkpoint _metadata file has >20x difference in size among different checkpoints

2020-03-04 Thread Arvid Heise
Hi Yu, are you using incremental checkpoints [1]? If so, the smaller checkpoints would be the deltas and the larger ones the complete state. [1] https://flink.apache.org/features/2018/01/30/incremental-checkpointing.html On Wed, Mar 4, 2020 at 6:41 PM Yu Yang wrote: > Hi all, We have a fli…
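
For context, incremental checkpointing is an opt-in flag on the RocksDB backend; a sketch with an illustrative checkpoint path (the constructor can throw, so this belongs in job setup code):

```java
import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;

// 'true' enables incremental checkpoints: most checkpoints then contain only the
// RocksDB delta since the previous one, which can explain large size variance.
env.setStateBackend(new RocksDBStateBackend("hdfs:///flink/checkpoints", true));
```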

Re: Flink's Either type information

2020-03-04 Thread Arvid Heise
Hi Jacopo, to prevent type erasure in Java, you need to create a sub-type that contains only reified types. Instead of using a generic type with bound type variables, as in stream.process(new MyKeyedBroadcastProcessFunction<…>()); you can use stream.process(new MyKeyedBroadcastProcessFunction<…>()…
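
A self-contained sketch of the idea with a simpler generic function (names are illustrative, not from Jacopo's code):

```java
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.types.Either;

// Generic function: T is erased, so Flink cannot extract the output type
// from `new ToLeft<Long>()`.
class ToLeft<T> implements MapFunction<T, Either<T, String>> {
    @Override
    public Either<T, String> map(T value) {
        return Either.Left(value);
    }
}

// Fix: a non-generic sub-type reifies T in its superclass declaration, so the
// type is recoverable via reflection.
class LongToLeft extends ToLeft<Long> {}

// An anonymous subclass at the call site achieves the same:
// stream.map(new ToLeft<Long>() {});
```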

Re: Unable to recover from savepoint and checkpoint

2020-03-04 Thread Puneet Kinra
I killed the task manager and job manager forcefully with the kill -9 command, and while recovering I am checking the flag returned by the isRestored() method in the initializeState() function. Anyway, I figured out the issue and fixed it, thanks for the support. On Tue, Mar 3, 2020 at 7:24 PM Gary Yao wrote…

Re: JobMaster does not register with ResourceManager in high availability setup

2020-03-04 Thread Bajaj, Abhinav
Thanks Xintong for pointing that out. I will dig deeper and get back with my findings. ~ Abhinav Bajaj From: Xintong Song Date: Tuesday, March 3, 2020 at 7:36 PM To: "Bajaj, Abhinav" Cc: "user@flink.apache.org" Subject: Re: JobMaster does not register with ResourceManager in high availabilit…

Re: JobMaster does not register with ResourceManager in high availability setup

2020-03-04 Thread Bajaj, Abhinav
While I set up to reproduce the issue with debug logs, I would like to share more information I noticed in the INFO logs. Below is the sequence of events/exceptions I noticed during the time ZooKeeper was disrupted. I apologize in advance as they are a bit verbose. * ZooKeeper seems to be down a…

RE: Teradata as JDBC Connection

2020-03-04 Thread Norm Vilmer (Contractor)
Same error with this change:

```java
public class Teradata extends ConnectorDescriptor {
    /**
     * Constructs a {@link ConnectorDescriptor}.
     */
    public Teradata() {
        super("jdbc", 1, false);
    }

    @Override
    protected Map<String, String> toConnectorProperties() {
        Map<String, String> map = new HashMap<…
```

RocksDB Serialization issue

2020-03-04 Thread David Morin
Hello, I have this exception in my DataStream app and I can't find the root cause. I consume data from Kafka and it fails when I try to get a value from my MapState in RocksDB. It was working in a previous release of my app, but I can't find the cause of this error. java.lang.ArrayIndexOutOfBounds…

How to test Flink job recovery from checkpoint

2020-03-04 Thread Eleanore Jin
Hi, I have a Flink application with checkpointing enabled, and I am running locally using MiniCluster. I just wonder if there is a way to simulate a failure and verify that the Flink job restarts from the checkpoint? Thanks a lot! Eleanore

Re: StreamingFileSink Not Flushing All Data

2020-03-04 Thread Austin Cawley-Edwards
Hey Kostas, We’re a little bit off from a 1.10 update, but I can certainly see if that CompressWriterFactory might solve my use case once we do. If there is anything I can do to help document that feature, please let me know. Thanks! Austin On Wed, Mar 4, 2020 at 4:58 AM Kostas Kloudas wro…

Re: How to test Flink job recovery from checkpoint

2020-03-04 Thread Zhu Zhu
Hi Eleanore, You can change your application tasks to throw exceptions at a certain frequency. Alternatively, if the application has external dependencies (e.g. a source), you can trigger failures manually by manipulating the status of the external service (e.g. shut down the source service, or break…

Re: How to test Flink job recovery from checkpoint

2020-03-04 Thread Bajaj, Abhinav
I implemented a custom function that throws a runtime exception. You can extend the simpler MapFunction or the more complicated RichParallelSourceFunction, depending on your use case. You can add logic to throw a runtime exception on a certain condition in the map or run method. …
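
A minimal sketch of such a failing mapper (the trigger condition and message are arbitrary); note that a restart strategy must be configured, or the job simply fails instead of recovering:

```java
import org.apache.flink.api.common.functions.MapFunction;

// Throws on a marker record, forcing a restart from the last checkpoint.
public class FailingMapper implements MapFunction<String, String> {
    @Override
    public String map(String value) throws Exception {
        if ("poison".equals(value)) {
            throw new RuntimeException("Injected failure for recovery testing");
        }
        return value;
    }
}
```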

Re: JobMaster does not register with ResourceManager in high availability setup

2020-03-04 Thread Xintong Song
Hi Abhinav, Do you mind sharing the complete 'jobmanager.log'? org.apache.flink.runtime.jobmaster.slotpool.SlotPool - Cannot serve slot request, no ResourceManager connected. Sometimes you see this log because the ResourceManager is not yet connected when the slot request arrives at the SlotPool.…

How to use a self-defined JSON format when creating a table from a Kafka stream?

2020-03-04 Thread wangl...@geekplus.com.cn
I want to register a table from a MySQL binlog like this:

```java
tEnv.sqlUpdate("CREATE TABLE order(\n" +
    "  order_id BIGINT,\n" +
    "  order_no VARCHAR\n" +
    ") WITH (\n" +
    "  'connector.type' = 'kafka',\n" +
    // ...
    "  'update-mode' = 'append',\n" +
    "  'format.typ…
```

Re: Question about runtime filter

2020-03-04 Thread faaron zheng
I finally got through the runtime filter in 1.10. The reason it didn't call the commit method is in OperatorCodeGenerator: it should call the endInput() method correctly in generateOneInputStreamOperator. The complete process of the runtime filter is: 1. Add insert and remove rules in the batch rules. 2. Build side c…

Re: Question about runtime filter

2020-03-04 Thread Jingsong Li
Great exploration, and thanks for the information. I believe you have a deep understanding of Flink's internal mechanisms. Best, Jingsong Lee On Thu, Mar 5, 2020 at 12:09 PM faaron zheng wrote: > I finally got through the runtime filter in 1.10, the reason it didn't > call the commit method is i…

Re: How to test Flink job recovery from checkpoint

2020-03-04 Thread Eleanore Jin
Hi Zhu Zhu and Abhinav, I was able to verify the recovery from checkpoint based on your suggestions, thanks a lot for the help! Eleanore On Wed, Mar 4, 2020 at 5:40 PM Bajaj, Abhinav wrote: > I implemented a custom function that throws a runtime exception. > You can extend the simpler…

Re: History server UI not working

2020-03-04 Thread Yang Wang
If all the REST API endpoints can be viewed successfully, then the reason may be the JS cache. You could try to force a refresh (e.g. Cmd+Shift+R on Mac). It solved my problem before. Best, Yang pwestermann wrote on Wed, Mar 4, 2020 at 8:40 PM: > We recently upgraded from Flink 1.7 to Flink 1.9.2 and the history server >…

Re: How to use a self-defined JSON format when creating a table from a Kafka stream?

2020-03-04 Thread Jark Wu
Hi Lei, Currently, Flink SQL doesn't support registering a binlog format (i.e. just defining "order_id" and "order_no" while the JSON schema has other binlog fields). This is exactly what we want to support in FLIP-105 [1] and FLIP-95. For now, if you want to consume such JSON data, you have to defi…
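
To make "define the full schema" concrete, a hedged sketch that declares the whole binlog envelope rather than just the payload columns. The envelope fields below (data/type/ts) are an assumed, canal-style layout, not Lei's actual format:

```java
tEnv.sqlUpdate("CREATE TABLE order_binlog (\n" +
    "  data ROW<order_id BIGINT, order_no VARCHAR>,\n" +  // payload wrapped by the binlog tool
    "  `type` STRING,\n" +                                 // e.g. INSERT / UPDATE / DELETE
    "  ts BIGINT\n" +
    ") WITH (\n" +
    "  'connector.type' = 'kafka',\n" +
    "  'connector.version' = 'universal',\n" +
    "  'connector.topic' = 'order-binlog',\n" +
    "  'connector.properties.bootstrap.servers' = 'broker:9092',\n" +
    "  'update-mode' = 'append',\n" +
    "  'format.type' = 'json'\n" +
    ")");
```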

Re: RocksDB Serialization issue

2020-03-04 Thread Arvid Heise
Hi David, the obvious reason is that your state stored an enum value that is not present anymore. It tries to deserialize the 512th entry in your enum, which is not available. However, since it's highly unlikely that you actually have that many enum values in the same enum class, we are actually loo…

Re: Teradata as JDBC Connection

2020-03-04 Thread Arvid Heise
Hi Norm, the error message already points to the main issue: your property names are not correct. *Unsupported property keys: drivername update-mode password dburl username* You should use the builder to properly configure the sink [1]. [1] https://ci.apache.org/projects/flink/flink-docs-st…
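
For reference, the 1.10 flink-jdbc builder Arvid points to looks roughly like this; the Teradata driver class, URL format, and table layout are my assumptions, not verified against Teradata:

```java
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.io.jdbc.JDBCAppendTableSink;

// The builder sets the properties the ConnectorDescriptor attempt misnamed
// (drivername, dburl, username, password).
JDBCAppendTableSink sink = JDBCAppendTableSink.builder()
    .setDrivername("com.teradata.jdbc.TeraDriver")
    .setDBUrl("jdbc:teradata://host/database=mydb")
    .setUsername("user")
    .setPassword("secret")
    .setQuery("INSERT INTO mytable (id, name) VALUES (?, ?)")
    .setParameterTypes(Types.INT, Types.STRING)
    .build();
```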

Re: EXTERNAL: Re: Should I use a Sink or Connector? Or Both?

2020-03-04 Thread Arvid Heise
Hi Fernando, How much data are you trying to write? If you just use single messages for testing, it could be that the default bulk settings are not working well. If so, could you please adjust the following settings and report back?

```java
public enum SinkOption {
    BULK_FLUSH_MAX_ACTIONS,
    BULK_FLU…
```
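
For example, flushing after every single action makes test messages visible immediately (the builder method names are from the Elasticsearch connector; `esSinkBuilder` is an assumed ElasticsearchSink.Builder instance):

```java
// Flush each record immediately -- useful when testing with single messages.
esSinkBuilder.setBulkFlushMaxActions(1);
// Alternatively, flush on a time interval (milliseconds) for low-volume streams.
esSinkBuilder.setBulkFlushInterval(1000);
```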