Re: Flink 1.11 test Parquet sink

2020-07-15 Thread Flavio Pompermaier
If I use tableEnv.fromValues(Row.of(1L, "Hello"),...) things work if I also change the query to "INSERT INTO `out` SELECT CAST(f0 AS STRING), f1 FROM ParquetDataset". If there is still a bug, file a proper JIRA ticket with the exact description of the problem. Just to conclude this thread there a
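A minimal sketch of the working combination described above (Flink 1.11 Table API; a blackhole sink stands in here for the thread's real Parquet sink table `out`):

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.TableEnvironment;
import org.apache.flink.types.Row;

public class CastInsertExample {
    public static void main(String[] args) {
        TableEnvironment tableEnv = TableEnvironment.create(
                EnvironmentSettings.newInstance().useBlinkPlanner().inBatchMode().build());

        // Stand-in sink; in the thread `out` is a Parquet table.
        tableEnv.executeSql(
                "CREATE TABLE `out` (a STRING, b STRING) WITH ('connector' = 'blackhole')");

        // Source built from literal rows, as in the thread.
        Table source = tableEnv.fromValues(Row.of(1L, "Hello"));
        tableEnv.createTemporaryView("ParquetDataset", source);

        tableEnv.executeSql(
                "INSERT INTO `out` SELECT CAST(f0 AS STRING), f1 FROM ParquetDataset");
    }
}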

Communicating with my operators

2020-07-15 Thread Tom Wells
Hi everyone, I'm looking for some advice on designing my operators (which unsurprisingly tend to take the form of SourceFunctions, ProcessFunctions, or SinkFunctions) to allow them to be "dynamically configured" while running. By way of example, I have a SourceFunction which collects the names of v

Re: Flink 1.11 test Parquet sink

2020-07-15 Thread Dawid Wysakowicz
Hi, Unfortunately this is a bug. The problem is in CustomizedConvertRule#convertCast as it drops the requested nullability. It was fixed in master as part of FLINK-13784 [1]. Therefore the example works on master. Could you create a JIRA issue for the 1.11 version? We could backport the corresponding

Re: How to debug window states

2020-07-15 Thread Paul Lam
It turns out to be a bug of StateTTL [1]. But I'm still interested in debugging window states. Any suggestions are appreciated. Thanks! 1. https://issues.apache.org/jira/browse/FLINK-16581 Best, Paul Lam > On 15 Jul 2020, at 13:13, Paul Lam wrote: >

Re: RestartStrategy failure count when losing a Task Manager

2020-07-15 Thread Chesnay Schepler
1) A restart in one region only increments the count by 1, independent of how many tasks from that region fail at the same time. If tasks from different regions fail at the same time, then the count is incremented by the number of affected regions. 2) I would consider what failure rate is acce
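For reference, the failure-rate bound discussed here is configured per job; a minimal sketch (the concrete numbers are placeholders, not recommendations):

import java.util.concurrent.TimeUnit;
import org.apache.flink.api.common.restartstrategy.RestartStrategies;
import org.apache.flink.api.common.time.Time;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class RestartStrategyExample {
    public static void main(String[] args) {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Fail the job only if more than 3 (region) failures occur within
        // any 5-minute window, waiting 10s between restart attempts.
        env.setRestartStrategy(RestartStrategies.failureRateRestart(
                3, Time.of(5, TimeUnit.MINUTES), Time.of(10, TimeUnit.SECONDS)));
    }
}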

Re: ERROR submmiting a flink job

2020-07-15 Thread Yun Tang
Hi Aissa, The reason why the job exits is "Recovery is suppressed by NoRestartBackoffTimeStrategy", and this is because Flink uses the "no restart" strategy when checkpointing is not enabled [1]. That is to say, you had better look at why the job failed the 1st time; once the job failed and
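A minimal sketch of enabling checkpointing, which also replaces the default "no restart" behavior with a fixed-delay restart strategy (the interval is arbitrary):

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CheckpointingExample {
    public static void main(String[] args) {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // With checkpointing enabled and no explicit restart strategy,
        // Flink uses a fixed-delay restart strategy instead of "no restart".
        env.enableCheckpointing(60_000); // checkpoint every 60s
    }
}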

Re: Communicating with my operators

2020-07-15 Thread Chesnay Schepler
Using an S3 bucket containing the configuration is the way to go. 1) web sockets, or more generally all approaches where you connect to the source: The JobGraph won't help you; it doesn't contain the information on where tasks are deployed to at runtime. It is just an abstract representation
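By way of illustration, a source that periodically re-reads its configuration from S3 might look like this sketch (bucket and key names are made up, and the AWS SDK v1 client is just one option):

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.source.RichSourceFunction;

public class ConfigDrivenSource extends RichSourceFunction<String> {
    private volatile boolean running = true;
    private transient AmazonS3 s3;

    @Override
    public void open(Configuration parameters) {
        s3 = AmazonS3ClientBuilder.defaultClient();
    }

    @Override
    public void run(SourceContext<String> ctx) throws Exception {
        while (running) {
            // Re-read the config every cycle so updates take effect without
            // restarting the job (bucket/key are hypothetical).
            String config = s3.getObjectAsString("my-config-bucket", "source-config.json");
            ctx.collect(config); // ...or use `config` to decide what to emit
            Thread.sleep(60_000);
        }
    }

    @Override
    public void cancel() {
        running = false;
    }
}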

Pravega connector cannot recover from the checkpoint due to "Failure to finalize checkpoint"

2020-07-15 Thread B.Zhou
Hi community, To give some background, https://github.com/pravega/flink-connectors is a Pravega connector for Flink. The ReaderCheckpointHook [1] class uses the Flink `MasterTriggerRestoreHook` interface to trigger the Pravega checkpoint during Flink checkpoints to ensure data recovery. We

pyFlink UDTF function registration

2020-07-15 Thread Manas Kale
Hi, I am trying to use a user-defined table function (UDTF) to split up some data as follows: from pyflink.table.udf import udtf @udtf(input_types=DataTypes.STRING(), result_types=[DataTypes.STRING(), DataTypes.DOUBLE()]) def split_feature_values(data_string): json_data = loads(data_string) for

Re: Flink 1.11 test Parquet sink

2020-07-15 Thread Flavio Pompermaier
I've just opened a ticket on JIRA: https://issues.apache.org/jira/browse/FLINK-18608 On Wed, Jul 15, 2020 at 10:10 AM Dawid Wysakowicz wrote: > Hi, > > Unfortunately this is a bug. > > The problem is in CustomizedConvertRule#convertCast as it drops the > requested nullability. It was fixed in ma

Re: pyFlink UDTF function registration

2020-07-15 Thread Xingbo Huang
Hi Manas, You need to join with the Python UDTF function. You can try the following SQL: ddl_populate_temporary_table = f""" INSERT INTO {TEMPORARY_TABLE} SELECT * FROM ( SELECT monitorId, featureName, featureData, time_st FROM {INPUT_TABLE}, LATERAL TABLE(split(data)) as T(feature
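The essential part is the SQL join itself; the same pattern as a sketch with placeholder table names (assuming `split` is the registered UDTF from the thread and `tableEnv` an existing table environment, here via the 1.11 Java API):

tableEnv.executeSql(
    "INSERT INTO temporary_table " +
    "SELECT monitorId, featureName, featureData, time_st " +
    "FROM input_table, LATERAL TABLE(split(data)) AS T(featureName, featureData)");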

Parquet format in Flink 1.11

2020-07-15 Thread Flavio Pompermaier
Hi to all, in my current code I use the legacy Hadoop OutputFormat to write my Parquet files. I wanted to use the new Parquet format of Flink 1.11 but I can't find out how to migrate the following properties: ParquetOutputFormat.setBlockSize(job, parquetBlockSize); ParquetOutputFormat.setEnableDictio

Re: Parquet format in Flink 1.11

2020-07-15 Thread godfrey he
Hi Flavio, the Parquet format supports configuration from ParquetOutputFormat. Please refer to [1] for details. [1] https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/
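As a sketch, the ParquetOutputFormat settings from the previous message map onto format options of the filesystem connector (key names follow ParquetOutputFormat's configuration constants; the table schema and path are placeholders):

tableEnv.executeSql(
    "CREATE TABLE parquet_sink (" +
    "  id BIGINT," +
    "  name STRING" +
    ") WITH (" +
    "  'connector' = 'filesystem'," +
    "  'path' = 'file:///tmp/parquet-out'," +
    "  'format' = 'parquet'," +
    "  'parquet.block.size' = '134217728'," +     // was ParquetOutputFormat.setBlockSize
    "  'parquet.enable.dictionary' = 'true'" +    // was ParquetOutputFormat.setEnableDictionary
    ")");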

Re: Parquet format in Flink 1.11

2020-07-15 Thread Flavio Pompermaier
Ok, thanks Godfrey. On Wed, Jul 15, 2020 at 3:03 PM godfrey he wrote: > Hi Flavio, > > Parquet format supports configuration from ParquetOutputFormat. > Please > refer to [

Re: map JSON to scala case class & off-heap optimization

2020-07-15 Thread Aljoscha Krettek
On 11.07.20 10:31, Georg Heiler wrote: 1) similarly to Spark, the Table API works on some optimized binary representation 2) this is only available in the SQL way of interaction - there is no programmatic API. Yes, it's available from SQL, but also from the Table API, which is a programmatic declarati

Is there a way to access the number of 'keys' that have been used for partitioning?

2020-07-15 Thread orionemail
Hi, I need to query the number of keys that a stream has been split by. Is there a way to do this? Thanks, O

Pyflink sink rowtime field

2020-07-15 Thread Jesse Lord
I am trying to sink the rowtime field in pyflink 1.10 and I get the following error. For the source schema I use: .field("rowtime", DataTypes.TIMESTAMP(2)) .rowtime( Rowtime() .timestamps_from_field("timestamp") .watermarks_periodic_ascending() )

Status of a job when a kafka source dies

2020-07-15 Thread Nick Bendtner
Hi guys, I want to know the default behavior of the Kafka source when a Kafka cluster goes down during streaming. Will the job status go to failing, or is the exception caught and there is a back-off before the source tries to poll for more events? Best, Nick.

Re: Is there a way to access the number of 'keys' that have been used for partitioning?

2020-07-15 Thread Chesnay Schepler
This information is not readily available; in fact, Flink itself doesn't know how many keys there are at any point. You'd have to calculate it yourself. On 15/07/2020 17:11, orionemail wrote: Hi, I need to query the number of keys that a stream has been split by. Is there a way to do this? T
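One way to calculate it yourself, as suggested: emit a marker the first time each key is seen and aggregate the markers downstream. A sketch, assuming String keys and values:

import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

// Emits 1 the first time a key is observed; summing the output downstream
// yields a running count of distinct keys seen so far.
public class FirstSeen extends KeyedProcessFunction<String, String, Long> {
    private transient ValueState<Boolean> seen;

    @Override
    public void open(Configuration parameters) {
        seen = getRuntimeContext().getState(
                new ValueStateDescriptor<>("seen", Boolean.class));
    }

    @Override
    public void processElement(String value, Context ctx, Collector<Long> out) throws Exception {
        if (seen.value() == null) {
            seen.update(true);
            out.collect(1L); // first occurrence of this key
        }
    }
}

Applied as stream.keyBy(...).process(new FirstSeen()) followed by, e.g., a windowed sum. Note the "seen" state grows with the number of distinct keys, so consider state TTL.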

Print table content in Flink 1.11

2020-07-15 Thread Flavio Pompermaier
Hi to all, I'm trying to read and print out the content of my Parquet directory with Flink 1.11 (using the bridge API). However, Flink complains that there is no topology to execute. What am I doing wrong? The exception is: java.lang.IllegalStateException: No operators defined in streaming topology

flink app crashed

2020-07-15 Thread Rainie Li
Hi All, I am new to Flink; any idea why my Flink app's Job Manager is stuck? Here is the bottom part of the Job Manager log. Any suggestion will be appreciated. 2020-07-15 16:49:52,749 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcService - Starting RPC endpoint for org.apache.flink.runtime.dispatcher.Sta

Performance test Flink vs Storm

2020-07-15 Thread Prasanna kumar
Hi, We are testing Flink and Storm for our streaming pipelines on various features. In terms of latency, I see that Flink comes up short against Storm even if more CPU is given to it. I will explain in detail. *Machine*: t2.large (4 cores, 16 GB), used for the Flink task manager and Storm supervisor

How to write junit testcases for KeyedBroadCastProcess Function

2020-07-15 Thread bujjirahul45
Hi, I am new to Flink and I am trying to write JUnit test cases to test a KeyedBroadcastProcessFunction. Below is my code: I am currently calling the getDataStreamOutput method in the TestUtils class and passing input data and pattern rules to the method; once the input data is evaluated against the list of pattern rules

Re: flink app crashed

2020-07-15 Thread Jesse Lord
Hi Rainie, I am relatively new to Flink, but I suspect that your error is somewhere else in the log. I have found most of my problems by searching for the word “error” or “exception”. Since all of these log lines are at the info level, they are probably not highlighting any real issues. If

Re: flink app crashed

2020-07-15 Thread Rainie Li
Thank you, Jesse. Here is more log info: 2020-07-15 18:19:36,456 INFO org.apache.flink.client.cli.CliFrontend - 2020-07-15 18:19:36,460 INFO org.apache.flink.configuration.GlobalConfiguration

Re: Flink Pojo Serialization for Map Values

2020-07-15 Thread KristoffSC
Hi, Any ideas about that one? -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: flink app crashed

2020-07-15 Thread Rainie Li
This is the console log after launching the app: 2020-07-15 19:25:28,507 INFO org.apache.flink.yarn.AbstractYarnClusterDescriptor - YARN application has been deployed successfully. Starting execution of program ---Environment Variables- DOCKER_CONFIG=/etc/.docker FLINK_BIN_DIR=/u

Re: Flink Pojo Serialization for Map Values

2020-07-15 Thread Theo Diefenthal
Hi Krzysztof, Your problems arise due to Java type erasure. If you have DataPoint with a Map field, all Flink's type system will see is a raw Map; the generic parameters are erased. So in the first case, with DataPoint having an explicit member of type "BadPojo", Flink will deduce "DataPoint" to be a PojoType with two fields,
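A sketch of working around the erasure by handing Flink the field type explicitly (a fragment; DataPoint and BadPojo are the classes from the thread, and the field name "values" is an assumption):

import java.util.Collections;
import java.util.Map;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.common.typeinfo.Types;

// "values" is an assumed name for the Map<String, BadPojo> field.
Map<String, TypeInformation<?>> fields = Collections.singletonMap(
        "values", Types.MAP(Types.STRING, Types.POJO(BadPojo.class)));
TypeInformation<DataPoint> dataPointInfo = Types.POJO(DataPoint.class, fields);

// Attach it where Flink would otherwise derive the type, e.g.:
// stream.map(...).returns(dataPointInfo);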

Using md5 hash while sinking files to s3

2020-07-15 Thread nikita Balakrishnan
Hello team, I’m developing a system where we are trying to sink to an immutable S3 bucket. This bucket has server-side encryption set as KMS. The DataStream sink works perfectly fine when I don’t use the immutable bucket, but when I use an immutable bucket, I get exceptions regarding multipart uplo

Re: Print table content in Flink 1.11

2020-07-15 Thread Leonard Xu
Hi, Flavio > On 16 Jul 2020, at 00:19, Flavio Pompermaier wrote: > > final JobExecutionResult jobRes = tableEnv.execute("test-job"); > In Flink 1.11, once a Table has been transformed to a DataStream, only StreamExecutionEnvironment can execute the DataStream program, so please use env.execute("test-job") in t

Re: Print table content in Flink 1.11

2020-07-15 Thread Kurt Young
Hi Flavio, In 1.11 we have provided an easier way to print table content; after you get the `table` object, all you need to do is call `table.execute().print();` Best, Kurt On Thu, Jul 16, 2020 at 9:35 AM Leonard Xu wrote: > Hi, Flavio > > > On 16 Jul 2020, at 00:19, Flavio Pompermaier wrote: > > f

HELP!!! Flink 1.10 SQL insert into HBase, error like validateSchemaAndApplyImplicitCast

2020-07-15 Thread Jim Chen
Hi, I use Flink 1.10.1 SQL to connect to HBase 1.4.3. When inserting into HBase, it reports an error like validateSchemaAndApplyImplicitCast, meaning that the query schema and sink schema are inconsistent. Mainly Row(EXPR$0) in the query schema, which are all expressions. The sink schema is Row(device_id). I don't k

Re: Print table content in Flink 1.11

2020-07-15 Thread Jingsong Li
Hi Flavio, For print: - As Kurt said, you can use `table.execute().print();`; records will be collected to the client (NOTE: it is the client) and printed to the client console. - But if you want to print records in runtime tasks like DataStream.print, you can use [1] [1] https://ci.apache.org/projects/flink/f
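Both variants side by side, as a sketch (assuming `tableEnv` is a StreamTableEnvironment created over `env`, and Row is org.apache.flink.types.Row):

// 1) Collect the result to the client and print it there (Flink 1.11):
table.execute().print();

// 2) Print inside the runtime tasks, via the DataStream bridge:
tableEnv.toAppendStream(table, Row.class).print();
env.execute("print-job");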

Re: HELP!!! Flink 1.10 SQL insert into HBase, error like validateSchemaAndApplyImplicitCast

2020-07-15 Thread Leonard Xu
Hi, Jim Could you post the error message in text that contains the entire schema of the query and sink? I suspect some field types were mismatched. Best, Leonard Xu > On 16 Jul 2020, at 10:29, Jim Chen wrote: > > Hi, > I use Flink 1.10.1 SQL to connect to HBase 1.4.3. When inserting into HBase, it reports > an

ElasticSearch_Sink

2020-07-15 Thread C DINESH
Hello All, Can we implement the two-phase commit protocol for an Elasticsearch sink? Will there be any limitations? Thanks in advance. Warm regards, Dinesh.

Re: ElasticSearch_Sink

2020-07-15 Thread Yun Gao
Hi Dinesh, As far as I know, to implement the two-phase commit protocol for an external system, the external system is required to provide some kind of transaction that can stay across sessions. With such a transaction mechanism, we could first start a new transaction and write
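For reference, Flink's generic building block for this pattern is TwoPhaseCommitSinkFunction; the skeleton sketch below shows the hooks that would have to be backed by Elasticsearch transactions (the Txn type is a hypothetical placeholder -- a transaction handle surviving across sessions is exactly what Elasticsearch does not provide natively):

import org.apache.flink.api.common.ExecutionConfig;
import org.apache.flink.api.common.typeutils.base.VoidSerializer;
import org.apache.flink.api.java.typeutils.runtime.kryo.KryoSerializer;
import org.apache.flink.streaming.api.functions.sink.TwoPhaseCommitSinkFunction;

public class TwoPhaseEsSink extends TwoPhaseCommitSinkFunction<String, TwoPhaseEsSink.Txn, Void> {

    // Placeholder transaction handle -- hypothetical.
    public static class Txn {}

    public TwoPhaseEsSink() {
        super(new KryoSerializer<>(Txn.class, new ExecutionConfig()), VoidSerializer.INSTANCE);
    }

    @Override
    protected Txn beginTransaction() { return new Txn(); } // open/stage a transaction

    @Override
    protected void invoke(Txn txn, String value, Context context) { /* buffer writes in txn */ }

    @Override
    protected void preCommit(Txn txn) { /* flush; runs as part of the checkpoint */ }

    @Override
    protected void commit(Txn txn) { /* make writes visible; may be retried after recovery */ }

    @Override
    protected void abort(Txn txn) { /* discard buffered writes */ }
}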