Flink sql using Hive for metastore throws Exception

2021-02-01 Thread Eleanore Jin
Hi experts, I am experimenting with using Hive to store metadata alongside Flink SQL. I am running Hive inside a Docker container locally, and running the Flink SQL program through an IDE. The Flink version is 1.12.0. The sample code looks like: StreamExecutionEnvironment bsEnv = StreamExecutionEnviro

Re: Custom JdbcSink?

2021-02-01 Thread Cecile Kim
Nevermind! I realized I was just misreading the code. I can override the class. JdbcExec is just a generic, duh 😊 Thank you, Cecile From: Cecile Kim Date: Monday, February 1, 2021 at 2:01 PM To: user@flink.apache.org Subject: Custom JdbcSink? Hi, I would like to override the JdbcSink, so that

Re: LEAD/LAG functions

2021-02-01 Thread Jark Wu
Yes. RANK/ROW_NUMBER is not allowed with ROW/RANGE over window, i.e. the "ROWS BETWEEN 1 PRECEDING AND CURRENT ROW" clause. Best, Jark On Mon, 1 Feb 2021 at 22:06, Timo Walther wrote: > Hi Patrick, > > I could imagine that LEAD/LAG are translated into RANK/ROW_NUMBER > operations that are not s
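For readers following the LEAD/LAG thread, here is a minimal plain-Java sketch of what LAG with a fixed offset returns over one already-ordered partition. It only illustrates the semantics; it is not how Flink translates or executes the function, and the class and method names (`LagSketch`, `lag`) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of LAG semantics; not Flink code. */
class LagSketch {

    /**
     * For each position i in an ordered partition, returns the value at
     * position i - offset, or null when no such earlier row exists.
     */
    static <T> List<T> lag(List<T> ordered, int offset) {
        List<T> out = new ArrayList<>(ordered.size());
        for (int i = 0; i < ordered.size(); i++) {
            out.add(i - offset >= 0 ? ordered.get(i - offset) : null);
        }
        return out;
    }
}
```

With an offset of 1, each row sees the value of the row immediately before it, and the first row sees null.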

Re: Flink CheckPoint/Savepoint Behavior Question

2021-02-01 Thread Raghavendar T S
Flink is aware of all the tasks running in the cluster. If any task fails, the failed task is restored from the checkpoint (only if the task uses Flink operator state). This scenario does not use savepoints. Savepoints are the same as checkpoints; the difference is that savepoints are c

How to implement a FTP connector Flink Table/sql support?

2021-02-01 Thread 1095193...@qq.com
Hi, I have investigated the relevant documentation and code for Flink connectors. Flink supports the local filesystem and several pluggable file systems, which do not include FTP. Could you give me some suggestions on how to make Flink read data from FTP? One way I have learned is implementing an FTP connector ac

Re: Integration with Apache AirFlow

2021-02-01 Thread 姜鑫
Hi Flavio, Could you explain what your direct question is? In my opinion, it is possible to define two Airflow operators to submit dependent Flink jobs, as long as the first one can reach the end. Regards, Xin On 1 Feb 2021, at 6:43 PM, Flavio Pompermaier wrote: Any advice here? On Wed, Jan 27,

[Flink SQL] Run SQL programmatically

2021-02-01 Thread cristi.cioriia
Hey guys, We have been trying out Flink SQL to implement use-cases like "compute the last X minutes of data" and we'd like to return the computed data programmatically from Scala/Java. Is there a way to write a program to run SQL queries over Flink in a reactive manner? We used for now for experime

Flink CheckPoint/Savepoint Behavior Question

2021-02-01 Thread Jason Liu
We currently have some logic to load data from S3 into memory in our Flink/Kinesis Analytics app. This happens before the RichFunction.open() function. We have two questions here and I can't find much information on the apache.org website: 1. (More of a clarification) When Flink does c

Custom JdbcSink?

2021-02-01 Thread Cecile Kim
Hi, I would like to override the JdbcSink, so that, given one record, it adds N insert SQL statements to the batch, where N is equal to a length computed by the given record. To do this, I need to override JdbcBatchingOutputFormat.writeRecord(), so that I can adjust how batchCount is increment
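The counting behaviour asked about here can be sketched in plain Java, independent of Flink. This is a minimal sketch under stated assumptions, not the connector's implementation: `MultiStatementBatcher`, its `expander` function, and the in-memory `flushedBatches` list are all hypothetical stand-ins, and the real override would live in a subclass of the Flink class named in the message.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Hypothetical stand-in for a batching writer; not part of Flink's JDBC connector. */
class MultiStatementBatcher<T> {
    private final int batchSize;                       // flush threshold, counted in statements
    private final Function<T, List<String>> expander;  // turns one record into N statements
    private final List<String> pending = new ArrayList<>();
    private final List<List<String>> flushed = new ArrayList<>();

    MultiStatementBatcher(int batchSize, Function<T, List<String>> expander) {
        this.batchSize = batchSize;
        this.expander = expander;
    }

    /** Adds every statement derived from the record, so the count grows by N, not by 1. */
    void writeRecord(T record) {
        pending.addAll(expander.apply(record));
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    /** Moves the pending statements into a completed batch (a real sink would execute them). */
    void flush() {
        if (!pending.isEmpty()) {
            flushed.add(new ArrayList<>(pending));
            pending.clear();
        }
    }

    int pendingCount() {
        return pending.size();
    }

    List<List<String>> flushedBatches() {
        return flushed;
    }
}
```

The design choice is to compare the flush threshold against the number of buffered statements rather than the number of records, so a record that expands into many inserts can trigger a flush on its own.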

Running Beam Pipelines on a Flink Application Mode Cluster

2021-02-01 Thread Jan Bensien
Hello, I am currently trying to run my Apache Beam applications using Flink as my backend. Currently I use a session cluster running on Kubernetes. Is it possible to run Beam pipelines using the application mode? I would like to change to application mode, as I currently benchmark my applicat

[Flink SQL] Insert query fails for partitioned table

2021-02-01 Thread Cristian Cioriia
Hey guys, I’m trying to create a Kafka backed partitioned table [1] and insert some data into it [2] using the sql-client, but I get the error [3] when doing it. Can you guys help with this? Also, I wanted to add the partition to the table as in [4] as per the documentation, but then the creati

Re: Proctime consistency

2021-02-01 Thread Rex Fenley
We need to aggregate in precisely row order. Is there a safe way to do this? Maybe with some sort of row time sequence number? As written in another email, we're currently doing the following set of operations val compactedUserDocsStream = userDocsStream .window(TumblingProcessingTimeWindows.of(Ti

Re: Flink and Amazon EMR

2021-02-01 Thread Piotr Nowojski
Hi Marco, > Is this assumption correct? Yes. More or less, each operator first creates a copy of its state locally and then uploads this whole file to S3 at once. Please first check which part of checkpointing is taking so long. Re backpressure. Keep in mind that Checkpoint Barriers need

Re: question on checkpointing

2021-02-01 Thread Marco Villalobos
Actually, perhaps I misworded it. This particular checkpoint seems to occur in an operator that is flat mapping (it is actually a keyed processing function) a single blob data-structure into several hundred thousand elements (sometimes a million) that immediately flow into a sink. I am speculatin

Re: Flink and Amazon EMR

2021-02-01 Thread Marco Villalobos
Thank you. Checkpoints time out often, even though the timeout limit is 20 minutes. The volume of records in our processing window that require checkpointing is large (between 20 and 2 million). I made the assumption that Flink would batch a blob of bytes to S3, and not create an S3 call per re

Re: Connect to schema registry via SSL

2021-02-01 Thread Laurent Exsteens
Hi Dawid, Thank you for your answer. I suspected the same and was about to try to follow the calls to verify it. You saved me some time. I guess it might be early to tell, but if you know in which Flink and Ververica Platform version this would become available, and when, I'm of course inte

Re: importing types doesn't fix “could not find implicit value for evidence parameter of type …TypeInformation”

2021-02-01 Thread Piotr Nowojski
Hey, Sorry for my hasty response. I didn't notice you have the import inside the code block. Have you maybe tried one of the responses suggested on Stack Overflow by other users? Best, Piotrek On Mon, 1 Feb 2021 at 15:49, Piotr Nowojski wrote: > Hey Devin, > > Have you maybe tried looking

Re: Flink and Amazon EMR

2021-02-01 Thread Piotr Nowojski
Hi, Yes, it's working. You would need to analyse what's working slower than expected. Checkpointing times? (Async duration? Sync duration? Start delay/back pressure?) Throughput? Recovery/startup? Are you being rate limited by Amazon? Piotrek On Thu, 28 Jan 2021 at 03:46, Marco Villalobos wrote

Re: importing types doesn't fix “could not find implicit value for evidence parameter of type …TypeInformation”

2021-02-01 Thread Piotr Nowojski
Hey Devin, Have you maybe tried looking for an answer via Google? Just by copying and pasting your error message into Google I'm getting hundreds of results pointing towards: import org.apache.flink.api.scala._ Best, Piotrek On Thu, 28 Jan 2021 at 04:13, Devin Bost wrote: > I posted this proble

Re: Connect to schema registry via SSL

2021-02-01 Thread Dawid Wysakowicz
Hi, I am afraid passing of these options is not supported in SQL yet. I created FLINK-21229 to add support for it. In a regular job you can construct a schema registry client manually:     RegistryAvroDeserializationSchema deserializationSchema = new RegistryAvroDeserializationSchema<>(

Re: Proctime consistency

2021-02-01 Thread Timo Walther
Hi Rex, processing-time gives you no alignment of operators across nodes. Each operation works with its local machine clock that might be interrupted by the OS, Java garbage collector, etc. It is always a best effort timing. Regards, Timo On 27.01.21 18:16, Rex Fenley wrote: Hello, I'm lo
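Since processing time is best effort, one way to get a deterministic order is the idea raised earlier in this thread: attach an explicit sequence number and order by it instead of by arrival time. The following is a minimal plain-Java sketch of that idea only. It assumes each record carries a sequence number assigned upstream; the `Row` type and `latestValue` method are hypothetical and are not Flink APIs.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical illustration of ordering by sequence number; not Flink code. */
class SequenceOrderedAggregation {

    /** A payload tagged with a sequence number assumed to be assigned at the source. */
    static final class Row {
        final long seq;
        final String value;

        Row(long seq, String value) {
            this.seq = seq;
            this.value = value;
        }
    }

    /**
     * Folds rows in sequence order regardless of arrival order.
     * The aggregate here is "last write wins"; returns null for empty input.
     */
    static String latestValue(List<Row> arrived) {
        List<Row> sorted = new ArrayList<>(arrived);
        sorted.sort(Comparator.comparingLong(r -> r.seq));
        String result = null;
        for (Row r : sorted) {
            result = r.value;
        }
        return result;
    }
}
```

The result depends only on the sequence numbers, so two machines with different local clocks would compute the same aggregate for the same set of rows.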

Re: Flink sql problem

2021-02-01 Thread Timo Walther
Hi Jiazhi, I think an OVER window might solve your use case. It gives you a rolling aggregation over a period of time. Maybe you need to define a custom aggregate function to emit the final record as you need it. Let me know if you have further questions. Regards, Timo On 27.01.21 15:02, ?g?

Re: Rocksdb - org.apache.flink.util.SerializedThrowable : bad entry in block

2021-02-01 Thread Timo Walther
Hi Omkar, sorry for the late reply. This sounds like a serious issue. It looks like some of the RocksDB data is corrupt. Are you sure this is not a problem with your storage layer? Otherwise I would investigate whether the serializers work correctly. Maybe Beam put corrupt data into Flink

Re: Problem restoring state

2021-02-01 Thread Timo Walther
Hi Shridhar, the exception indicates that something is wrong with the object serialization. Kryo is unable to serialize the given object. It might help to 1) register a custom Kryo serializer in the ExecutionConfig or 2) pass dedicated type information using the types from org.apache.flink

Re: Is Flink able to parse strings into dynamic JSON?

2021-02-01 Thread Chesnay Schepler
Flink needs to know upfront what kind of types it deals with to set up the serialization stack between operators. As such, generally speaking, you will have to use some generic container for transmitting data (e.g., a String or a Jackson ObjectNode) and either work on them directly or map them

Re: Is Flink able to parse strings into dynamic JSON?

2021-02-01 Thread Timo Walther
Hi Devin, Flink supports arbitrary data types. You can simply read the JSON object as a big string first and process the individual event types in a UDF using e.g. the Jackson library. Are you using SQL or DataStream API? An alternative is to set the "fail-on-missing-field" flag to false. Th

Re: LEAD/LAG functions

2021-02-01 Thread Timo Walther
Hi Patrick, I could imagine that LEAD/LAG are translated into RANK/ROW_NUMBER operations that are not supported in this context. But I will loop in @Jark who might know more about the limitations here. Regards, Timo On 29.01.21 17:37, Patrick Angeles wrote: Another (hopefully newbie) questi

Re: [Stateful Functions] Problems with Protobuf Versions

2021-02-01 Thread Igal Shilman
Adding user@flink (was accidentally omitted previously) On Fri, Jan 29, 2021 at 5:18 PM Igal Shilman wrote: > Hi Jan, > > Glad to hear that 3.71 and 3.3.0 works for you. You can still include > protobuf in your project, but the version needs to be compatible with > what you will find at runtime.

Re: Integration with Apache AirFlow

2021-02-01 Thread Flavio Pompermaier
Any advice here? On Wed, Jan 27, 2021 at 9:49 PM Flavio Pompermaier wrote: > Hello everybody, > is there any suggested way/pointer to schedule Flink jobs using Apache > AirFlow? > What I'd like to achieve is the submission (using the REST API of AirFlow) > of 2 jobs, where the second one can be

Re: question on checkpointing

2021-02-01 Thread Chesnay Schepler
1) An operator that just blocks for a long time (for example, because it does a synchronous call to some external service) can indeed cause a checkpoint timeout. 2) What kind of effects are you worried about? On 1/28/2021 8:05 PM, Marco Villalobos wrote: Is it possible that checkpointing times

Re: Cannot access state from a empty taskmanager - using kubernetes

2021-02-01 Thread Chesnay Schepler
Yes, it does sound quite a lot like FLINK-10225. I assume it is only happening for some task executors and not all of them? Unfortunately I don't think this issue will be fixed anytime soon. On 1/28/2021 12:59 PM, Daniel Peled wrote: Hi, We have followed the instructions in the following link

Re: Question about setNestedFileEnumeration()

2021-02-01 Thread Piotr Nowojski
Hi Billy, Could you maybe share some minimal code reproducing the problem? For example, I would suggest starting with reading from local files with some trivial application. Best, Piotrek On Fri, 22 Jan 2021 at 00:21, Billy Bain wrote: > I have a Streaming process where new directories are added

Re: Question

2021-02-01 Thread Chesnay Schepler
Could you expand a bit on what you mean? Are you referring to /savepoints/? On 1/28/2021 3:24 PM, Abu Bakar Siddiqur Rahman Rocky wrote: Hi, Is there any library to use and remember the apache flink snapshot? Thank you -- Regards, Abu Bakar Siddiqur Rahman

Re: [Stateful Functions] Problems with Protobuf Versions

2021-02-01 Thread Tzu-Li (Gordon) Tai
Hi, This hints at an incompatibility between the Protobuf class generated by the protoc compiler and the runtime dependency used by the code. Could you try to make sure the `protoc` compiler version matches the Protobuf version in your code? Cheers, Gordon On Fri, Jan 29, 2021 at 6:07 AM Jan Brusch wrote: > H

Re: Comment in source code of CoGroupedStreams

2021-02-01 Thread Piotr Nowojski
Hi Sudharsan, Sorry for the somewhat late response, but as far as I can tell, this comment refers to this piece of code: public void apply(KEY key, W window, Iterable<TaggedUnion<T1, T2>> values, Collector<T> out) throws Exception { List<T1> oneValues = new ArrayList<>(); Lis

Re: stopping with save points

2021-02-01 Thread Chesnay Schepler
It would be good if you could take a look at the Job-/TaskManager logs to see whether the operation is making progress or whether an exception has occurred. Does the job stop eventually? It could be that draining the jobs just takes longer than the client timeout allows by default (60 seconds)

Re: Flink 1.11.2 test cases fail with Scala 2.12.12

2021-02-01 Thread Chesnay Schepler
Scala 2.12.8 broke binary compatibility with 2.12.7 which Flink currently is compiled against. As a result you must either stay at 2.12.7, or recompile Flink yourself against 2.12.12 as shown here

Re: Unknown call expression: avg(amount) when use distinct() in Flink Thanks~!

2021-02-01 Thread Timo Walther
Hi, sorry I forgot to further investigate this issue. It seems the last refactoring of the code base caused this documented feature to break. I opened an issue for it: https://issues.apache.org/jira/browse/FLINK-21225 For now, I would suggest using SQL for the same behavior. I hope someone

Re: Flink SQL and checkpoints and savepoints

2021-02-01 Thread Timo Walther
I agree with Max. Within the same Flink release you can perform savepoints and sometimes also change parts of the query. But the latter depends on the individual case and needs to be tested. Regards, Timo On 30.01.21 11:43, Maximilian Michels wrote: It is true that there are no strict upgr

Re: Newbie question: Machine Learning Library of Apache Flink

2021-02-01 Thread Timo Walther
Hi, it is true that there is no dedicated machine learning library for Flink. Flink is a general data processing framework. It allows you to embed any available algorithm library within user-defined functions. Flink's focus is on stream processing. There are not many dedicated stream processi

Re: [ANNOUNCE] Apache Flink 1.10.3 released

2021-02-01 Thread Matthias Pohl
Yes, thanks for taking over the release! Best, Matthias On Mon, Feb 1, 2021 at 5:04 AM Zhu Zhu wrote: > Thanks Xintong for being the release manager and everyone who helped with > the release! > > Cheers, > Zhu > > On Fri, Jan 29, 2021 at 5:56 PM Dian Fu wrote: > >> Thanks Xintong for driving this release! >