Re: Question about Scala Case Class and List in Flink

2020-01-15 Thread Timo Walther
Hi, Reg. 1: Scala case classes are supported in the Scala-specific version of the DataStream API. If you are using case classes in the Java API, you will get the INFO below because the Java API uses pure reflection extraction for analyzing POJOs. The Scala API tries to analyze Scala classes
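For readers landing on this thread from the archive, a minimal sketch of what "supported in the Scala-specific version" means in practice; the case class, field names, and job below are hypothetical examples, not code from the thread.

```scala
import org.apache.flink.streaming.api.scala._

// Hypothetical event type: the Scala API derives TypeInformation for case
// classes via the implicit macros pulled in by the wildcard import above,
// instead of the Java API's reflection-based POJO analysis.
case class SensorReading(id: String, value: Double)

object CaseClassExample {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    env
      .fromElements(SensorReading("a", 1.0), SensorReading("b", 2.0))
      .keyBy(_.id)                              // keying on a case-class field
      .map(r => r.copy(value = r.value * 2))    // case classes stay immutable-friendly
      .print()

    env.execute("case class example")
  }
}
```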

Filter with large key set

2020-01-15 Thread Jin Yi
Hi there, I have the following use case: a key set, say [A,B,C,], with around 10M entries; the type of the entries can be one of the types in BasicTypeInfo, e.g. String, Long, Integer etc... and each message looks like below: message: { header: A body: {} } I would like to use Flink to fi
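One straightforward reading of this use case is sketched below, under the assumption that the ~10M keys fit in each TaskManager's heap; the Message shape, key-set file, and loading logic are illustrative only (broadcast state or an external lookup would be the alternatives if the set is too large).

```scala
import org.apache.flink.api.common.functions.RichFilterFunction
import org.apache.flink.configuration.Configuration

// Hypothetical message shape: only the header key matters for the filter.
case class Message(header: String, body: String)

// Keep only messages whose header is contained in the key set.
class KeySetFilter(keySetPath: String) extends RichFilterFunction[Message] {
  @transient private var keys: Set[String] = _

  override def open(parameters: Configuration): Unit = {
    // Load the key set once per parallel instance; path and file format are assumptions.
    keys = scala.io.Source.fromFile(keySetPath).getLines().toSet
  }

  override def filter(msg: Message): Boolean = keys.contains(msg.header)
}

// usage sketch: messages.filter(new KeySetFilter("/path/to/keys.txt"))
```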

Re: Flink Sql Join, how to clear the sql join state?

2020-01-15 Thread Benchao Li
Hi LakeShen, Maybe "Idle State Retention Time"[1] can help in your case. [1] https://ci.apache.org/projects/flink/flink-docs-master/dev/table/streaming/query_configuration.html#idle-state-retention-time LakeShen wrote on Thursday, Jan 16, 2020 at 10:15 AM: > Hi community, now I am using Flink SQL inner join in my co
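For context, the linked page boils down to roughly the following. This is a sketch against the Flink 1.9-era query-config API; the retention values are arbitrary examples, and newer releases expose the same setting on TableConfig instead.

```scala
import org.apache.flink.api.common.time.Time
import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment
import org.apache.flink.table.api.StreamQueryConfig
import org.apache.flink.table.api.scala.StreamTableEnvironment

object IdleStateRetentionExample {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    val tableEnv = StreamTableEnvironment.create(env)

    // Obtain the query configuration and allow join/aggregation state for keys
    // that have been idle longer than the maximum retention to be cleaned up.
    val qConfig: StreamQueryConfig = tableEnv.queryConfig
    qConfig.withIdleStateRetentionTime(Time.hours(12), Time.hours(24))

    // The config is then passed along when emitting the query result,
    // e.g. tableEnv.sqlUpdate("INSERT INTO sink SELECT ...", qConfig)
  }
}
```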

Re: How to handle startup for mandatory config parameters?

2020-01-15 Thread Yang Wang
Hi John, Most of the config options have default values. However, you still need to specify some required fields, for example the TaskManager resource-related options. If you do not specify any of them, an exception will be thrown on the client side like the following. Exception in thread "main" or
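A common pattern for the "fail fast on missing config" part of this thread is ParameterTool.getRequired, sketched below; the parameter names are examples only.

```scala
import org.apache.flink.api.java.utils.ParameterTool

object RequiredConfigExample {
  def main(args: Array[String]): Unit = {
    val params = ParameterTool.fromArgs(args)

    // getRequired throws a RuntimeException naming the missing key, so the job
    // fails on the client side (message visible in the client log / console)
    // before anything is deployed to the cluster.
    val bootstrapServers = params.getRequired("bootstrap.servers") // example key
    val inputTopic       = params.getRequired("input.topic")       // example key

    // ... build the pipeline with the validated values and call env.execute() ...
  }
}
```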

Flink Sql Join, how to clear the sql join state?

2020-01-15 Thread LakeShen
Hi community, now I am using Flink SQL inner join in my code. I saw in the Flink documentation that the Flink SQL inner join will keep both sides of the join input in Flink’s state forever. As a result, the HDFS files are so big; is there any way to clear the SQL join state? Thanks for your reply.

Re: When I use Flink 1.9.1 and produce data to Kafka 1.1.1, the StreamTask checkpoint errors.

2020-01-15 Thread Yun Tang
Hi, the root cause is a checkpoint error due to failing to send data to Kafka during 'preCommit'. The right solution is to avoid unsuccessful sends to Kafka in the first place, which is in the scope of Kafka itself. If you cannot ensure the status of Kafka with its client and have no requirement for exactly-once, you can pass Fli
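The advice is cut off after "you can pass Fli", but it most likely refers to the producer semantic. A hedged sketch of passing FlinkKafkaProducer.Semantic.AT_LEAST_ONCE with the universal Kafka connector follows; constructor overloads differ between connector versions, and the topic, properties, and source are placeholders.

```scala
import java.util.Properties

import org.apache.flink.api.common.serialization.SimpleStringSchema
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer
import org.apache.flink.streaming.util.serialization.KeyedSerializationSchemaWrapper

object AtLeastOnceProducerExample {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    val input: DataStream[String] = env.fromElements("a", "b") // placeholder source

    val props = new Properties()
    props.setProperty("bootstrap.servers", "kafka:9092") // placeholder address

    // AT_LEAST_ONCE only flushes pending records in preCommit and avoids Kafka
    // transactions, removing the exactly-once transactional failure mode.
    val producer = new FlinkKafkaProducer[String](
      "my-topic",                                                   // example topic
      new KeyedSerializationSchemaWrapper(new SimpleStringSchema()),
      props,
      FlinkKafkaProducer.Semantic.AT_LEAST_ONCE)

    input.addSink(producer)
    env.execute("at-least-once kafka sink")
  }
}
```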

Re: PubSub source throwing grpc errors

2020-01-15 Thread Richard Deurwaarder
Hi Itamar and Till, Yes, this actually looks a lot worse than it is, fortunately. From what I understand this means: something has not released or properly shut down a gRPC client, and the library likes to inform you about this. I would definitely expect to see this if the job crashes at the 'wron

How to handle startup for mandatory config parameters?

2020-01-15 Thread John Smith
Hi, so I have no problem reading config from resource files or anything like that... But my question is about how we handle mandatory fields. 1- If a mandatory field is missing during startup... Do we just "log" it and do System.exit()? 2- If we do log it, where does the log end up, the task

Re: PubSub source throwing grpc errors

2020-01-15 Thread Till Rohrmann
Hi Itamar, could you share a bit more detail about the serialization problem? Which class is not serializable and where does it originate from? Cheers, Till On Tue, Jan 14, 2020 at 9:47 PM Itamar Syn-Hershko <ita...@bigdataboutique.com> wrote: > Thanks! > > I was able to track this down. Esse

Re: Flink task node shut itself off.

2020-01-15 Thread John Smith
Hi, so far it seems stable. On Mon, 6 Jan 2020 at 14:16, John Smith wrote: > So I increased all the jobs to a 1 minute checkpoint... I'll let you know how > it goes... Or if I need to rethink Gluster lol > > On Sat., Jan. 4, 2020, 9:27 p.m. John Smith, > wrote: > >> It seems to have happened again...
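For readers of the thread, the "1 minute checkpoint" change is simply the checkpoint interval; a minimal sketch follows, where the minimum-pause value is an assumption rather than something stated in the thread.

```scala
import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment

object CheckpointIntervalExample {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    // Checkpoint once a minute, and keep some breathing room between checkpoints
    // so slow storage (GlusterFS in this thread) is not hit back to back.
    env.enableCheckpointing(60 * 1000L)
    env.getCheckpointConfig.setMinPauseBetweenCheckpoints(30 * 1000L) // assumption

    // ... rest of the job ...
  }
}
```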

Re: When I use Flink 1.9.1 and produce data to Kafka 1.1.1, the StreamTask checkpoint errors.

2020-01-15 Thread jose farfan
Hi, I have the same issue. BR Jose On Thu, 9 Jan 2020 at 10:28, ouywl wrote: > Hi all: > When I use Flink 1.9.1 and produce data to Kafka 1.1.1, the error happens as in > log-1; the code is: > > input.addSink( > new FlinkKafkaProducer( > parameterTool.getRequired("bootstra

Let Flink SQL PlannerExpressionParserImpl#FieldReference use Unicode as its default charset

2020-01-15 Thread ??????
Hi all, the related issue: https://issues.apache.org/jira/browse/FLINK-15573 As the title says, what I want to do is let `FieldReference` use Unicode as its default charset (or maybe as an optional charset which can be configured). According to the `PlannerExpressionParserImpl`, cur

RE: Table API: Joining on Tables of Complex Types

2020-01-15 Thread Hailu, Andreas
Dawid, this approach looks promising. I'm able to flatten out my Avro records into Rows and run simple queries on top of them. I've got a question: when I register my Rows as a table, I see the following log message warning me: 2020-01-14 17:16:43,083 [main] INFO TypeExtractor - class org.apac
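The INFO from TypeExtractor typically means the Row type fell back to a GenericType, since Row fields cannot be inferred by reflection. One hedged way to keep the flattened records strongly typed is to supply an explicit RowTypeInfo, sketched below with made-up field names and types.

```scala
import org.apache.flink.api.common.typeinfo.{BasicTypeInfo, TypeInformation}
import org.apache.flink.api.java.typeutils.RowTypeInfo
import org.apache.flink.types.Row

object RowTypeInfoExample {
  // Field types and names are examples; they should mirror the flattened Avro schema.
  val rowTypeInfo: TypeInformation[Row] = new RowTypeInfo(
    Array[TypeInformation[_]](BasicTypeInfo.STRING_TYPE_INFO, BasicTypeInfo.LONG_TYPE_INFO),
    Array("name", "count"))

  // In the Java API the type info is attached with .returns(rowTypeInfo) on the
  // operator producing the Rows; in the Scala API it can be passed implicitly.
}
```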

Re: Custom File Sink using EventTime and defined custom file name for parquet file

2020-01-15 Thread Kostas Kloudas
Oops, sorry for not sending the reply to everyone, and thanks David for reposting it here. Great to hear that you solved your issue! Kostas On Wed, Jan 15, 2020 at 1:57 PM David Magalhães wrote: > > Sorry, I've only seen the replies today. > > Regarding my previous email, > >> Still, there is so

Re: Custom File Sink using EventTime and defined custom file name for parquet file

2020-01-15 Thread David Magalhães
Sorry, I've only seen the replies today. Regarding my previous email, Still, there is something missing in this solution to close a window > with a given timeout, so it can write the last events into the sink if no > more events are sent. I've fixed this using a custom trigger, val flag =
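David's actual trigger isn't visible beyond `val flag =`, so the following is only an illustrative sketch of the idea: fire the window if no new element arrives within a processing-time timeout. The class name, timeout handling, and simplifications (earlier processing-time timers are not deleted, so it may fire more often than strictly necessary) are assumptions, not his implementation.

```scala
import org.apache.flink.streaming.api.windowing.triggers.{Trigger, TriggerResult}
import org.apache.flink.streaming.api.windowing.windows.TimeWindow

// Fires either at the end of the event-time window or after timeoutMs of
// processing time without new elements, whichever comes first.
class TimeoutTrigger[T](timeoutMs: Long) extends Trigger[T, TimeWindow] {

  override def onElement(element: T, timestamp: Long, window: TimeWindow,
                         ctx: Trigger.TriggerContext): TriggerResult = {
    // Register a processing-time timer relative to now; when it fires, flush.
    ctx.registerProcessingTimeTimer(ctx.getCurrentProcessingTime + timeoutMs)
    // Keep the regular event-time firing at the end of the window as well.
    ctx.registerEventTimeTimer(window.maxTimestamp())
    TriggerResult.CONTINUE
  }

  override def onProcessingTime(time: Long, window: TimeWindow,
                                ctx: Trigger.TriggerContext): TriggerResult =
    TriggerResult.FIRE

  override def onEventTime(time: Long, window: TimeWindow,
                           ctx: Trigger.TriggerContext): TriggerResult =
    if (time == window.maxTimestamp()) TriggerResult.FIRE else TriggerResult.CONTINUE

  override def clear(window: TimeWindow, ctx: Trigger.TriggerContext): Unit =
    ctx.deleteEventTimeTimer(window.maxTimestamp())
}

// usage sketch: .window(...).trigger(new TimeoutTrigger[MyEvent](30 * 1000L))
```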

Re: Slots Leak Observed when

2020-01-15 Thread Till Rohrmann
Hi, have you tried one of the latest Flink versions to see whether the problem still exists? I'm asking because there are some improvements which allow for slot reconciliation between the TaskManager and the JobMaster [1]. As a side note, the community is no longer supporting Flink 1.6.x. For fur

Re: How Flink reads files from the local filesystem

2020-01-15 Thread Tillman Peng
You can use env.readTextFile(path), which accepts a path to a directory and reads all files in that directory, producing a record for each line. On 2020/1/15 17:58, Soheil Pourbafrani wrote: Suppose we have a Flink single-node cluster with multiple slots and some input files exist in the local file syste
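A minimal sketch of the suggestion; the directory path is a placeholder.

```scala
import org.apache.flink.streaming.api.scala._

object ReadLocalFilesExample {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    // Pointing readTextFile at a directory reads every file in it and emits one
    // record per line; on a single-node cluster the path only has to be visible
    // to that node's TaskManager.
    val lines: DataStream[String] = env.readTextFile("file:///data/input")

    lines.print()
    env.execute("read local files")
  }
}
```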

Re: [DISCUSS] decentralized scheduling strategy is needed

2020-01-15 Thread HuWeihua
Hi Andrey, Thanks for your response. I have checked this Jira ticket and I think it can work in standalone mode, where the TaskManagers have been started before tasks are scheduled. But we are currently running Flink on YARN in per-job cluster mode. I noticed that this issue has already been raised. I wi

Re: [DISCUSS] decentralized scheduling strategy is needed

2020-01-15 Thread Chesnay Schepler
This is a known issue that will be fixed in 1.9.2/1.10.0; see https://issues.apache.org/jira/browse/FLINK-12122. On 15/01/2020 10:07, HuWeihua wrote: Hi all, We encountered some problems during the upgrade from Flink 1.5 to Flink 1.9. Flink's scheduling strategy has changed. Flink 1.9 prefe

How Flink reads files from the local filesystem

2020-01-15 Thread Soheil Pourbafrani
Hi, Suppose we have a Flink single-node cluster with multiple slots, and some input files exist in the local file system. In this case, where we have no distributed file system to dedicate each file's blocks to TaskManagers, how will Flink read the files? Will all the task managers open the file separa

Re: StreamingFileSink doesn't close multipart uploads to s3?

2020-01-15 Thread Kostas Kloudas
Hi Ken, Jingsong and Li, Sorry for the late reply. As Jingsong pointed out, upon calling close() the StreamingFileSink does not commit the in-progress/pending files. The reason for this is that the close() method of any UDF, including sink functions, is called on both normal termination and termina