Re: JDBC Table and parameters provider

2020-04-22 Thread Jingsong Li
Hi, Now in JDBCTableSource.getInputFormat, it's written explicitly: WHERE XXX BETWEEN ? AND ?. So we must use `NumericBetweenParametersProvider`. I don't think this is a good long-term solution. I think we should support filter push-down for JDBCTableSource, so in this way, we can write the fi
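The splitting behind the generated WHERE XXX BETWEEN ? AND ? query can be sketched in plain Java. This is a simplified model with illustrative names, not Flink's actual `NumericBetweenParametersProvider` class:

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of a BETWEEN-style parameters provider: split an
// inclusive numeric range [min, max] into per-split (lower, upper)
// pairs, one pair per parallel read. Names are illustrative only.
public class BetweenParams {
    public static List<long[]> splitRange(long min, long max, int numSplits) {
        long count = max - min + 1;
        long batchSize = (count + numSplits - 1) / numSplits; // ceiling division
        List<long[]> params = new ArrayList<>();
        for (long start = min; start <= max; start += batchSize) {
            long end = Math.min(start + batchSize - 1, max);
            params.add(new long[] {start, end}); // bound to "BETWEEN ? AND ?"
        }
        return params;
    }
}
```

Each returned pair would be bound to the two placeholders of one parallel query, e.g. splitRange(0, 9, 2) yields the sub-ranges [0, 4] and [5, 9].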

Re: How to use OpenTSDB as Source?

2020-04-22 Thread Marta Paes Moreira
Hi, Lucas. There was a lot of refactoring in the Table API / SQL in the last release, so the user experience is not ideal at the moment — sorry for that. You can try using the DDL syntax to create your table, as shown in [1,2]. I'm CC'ing Timo and Jark, who should be able to help you further. Ma

Re: JDBC Table and parameters provider

2020-04-22 Thread Flavio Pompermaier
Maybe I am wrong, but supporting pushdown for JDBC is one thing (that is probably useful), while parameters providers are required if you want to parallelize the fetch of the data. You are not mandated to use NumericBetweenParametersProvider; you can use the ParametersProvider you prefer, depending on t

Re: Re: Blink SQL java.lang.ArrayIndexOutOfBoundsException

2020-04-22 Thread Jingsong Li
Hi, Just like Jark said, it may be FLINK-13702[1]. Has been fixed in 1.9.2 and later versions. > Can it be a thread-safe problem or something else? Yes, it is a thread-safe problem with lazy materialization. [1]https://issues.apache.org/jira/browse/FLINK-13702 Best, Jingsong Lee On Tue, Apr 2

Re: Latency tracking together with broadcast state can cause job failure

2020-04-22 Thread Yun Tang
Hi Lasse, After debugging locally, this should be a bug in Flink (even in the latest version). However, the bug seems to be caused in the network stack, with which I am not very familiar, so it is not easy to find the root cause directly. After discussion with our network guys in Flink, we decided to first create FLI

Re: JDBC Table and parameters provider

2020-04-22 Thread Jingsong Li
Hi, You are right about the lower and upper bounds; they are a must to parallelize the fetch of the data. And filter pushdown is used to filter more data at the JDBC server. Yes, we can provide "scan.query.statement" and "scan.parameter.values.provider.class" for the jdbc connector. But maybe we need to be careful ab

Re: Re: Blink SQL java.lang.ArrayIndexOutOfBoundsException

2020-04-22 Thread Jingsong Li
Hi, Sorry for the mistake: [1] is related, but this bug has only been fixed completely in [2], so the safe versions are 1.9.3+ and 1.10.1+, which means there is no safe released version right now. 1.10.1 will be released very soon. [1]https://issues.apache.org/jira/browse/FLINK-13702 [2]https://issues.apache.or

Re: JDBC Table and parameters provider

2020-04-22 Thread Flavio Pompermaier
Because in my use case the parallelism was not based on a range of keys/numbers but on a range of dates, I needed a custom Parameter Provider. As for pushdown, I don't know how Flink/Blink currently works... For example, let's say I have a Postgres catalog containing 2 tables (public.A an
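A custom provider of the kind described here could split a date range instead of a numeric one. The sketch below is hypothetical (class and method names are invented, and it does not implement Flink's actual provider interface); it only illustrates how per-split BETWEEN bounds over dates might be produced:

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

// Hypothetical date-based parameters provider: parallelize the fetch
// over a range of dates. Each (start, end) pair would be bound to a
// "WHERE ts BETWEEN ? AND ?" query for one parallel split.
public class DateRangeParams {
    public static List<LocalDate[]> splitByDays(LocalDate from, LocalDate to, int daysPerSplit) {
        List<LocalDate[]> splits = new ArrayList<>();
        for (LocalDate start = from; !start.isAfter(to); start = start.plusDays(daysPerSplit)) {
            LocalDate end = start.plusDays(daysPerSplit - 1);
            if (end.isAfter(to)) {
                end = to; // clamp the last split to the overall upper bound
            }
            splits.add(new LocalDate[] {start, end});
        }
        return splits;
    }
}
```

For example, splitting April 1-10 into 7-day chunks yields two splits: April 1-7 and April 8-10.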

Re: How to use OpenTSDB as Source?

2020-04-22 Thread Som Lima
For the sake of brevity, the code example does not show the complete code for setting up the environment using the EnvironmentSettings class: EnvironmentSettings settings = EnvironmentSettings.newInstance()... As you can see, comparatively, the same protocol is not followed when showing how to set up the envi

Re: JDBC Table and parameters provider

2020-04-22 Thread Jingsong Li
Thanks for the explanation. You can create a JIRA for this. For "SELECT public.A.x, public.B.y FROM public.A JOIN public.B on public.A.pk = public.B.fk", we can not use a simple "scan.query.statement", because in JDBCTableSource it also deals with project

Flink 1.10.0 stop command

2020-04-22 Thread seeksst
Hi, When I tested 1.10.0, I found I must set a savepoint path, otherwise I can't stop the job. I am confused about this because, as far as I know, a savepoint is often larger than a checkpoint, so I usually resume jobs from checkpoints. Another problem is that sometimes the job throws an exception and I can't trigger a savepoint

Re: JDBC Table and parameters provider

2020-04-22 Thread Flavio Pompermaier
Sorry Jingsong, but I didn't understand your reply... Can you explain the following sentences better, please? Probably I am missing some Table API background here (I have used only JDBCOutputFormat). "We can not use a simple "scan.query.statement", because in JDBCTableSource, it also deal with project pushdown too

Re: Latency tracking together with broadcast state can cause job failure

2020-04-22 Thread Lasse Nedergaard
Hi Yun, Thanks for looking into it and forwarding it to the right place. Med venlig hilsen / Best regards Lasse Nedergaard > On 22 Apr 2020 at 11:06, Yun Tang wrote: > >  > Hi Lasse > > After debug locally, this should be a bug in Flink (even the latest version). > However, the bug shou

Re: JDBC Table and parameters provider

2020-04-22 Thread Jingsong Li
Hi, You can configure table name for JDBC source. So this table name can be a rich sql: "(SELECT public.A.x, public.B.y FROM public.A JOIN public.B on public.A.pk = public.B.fk )" So the final scan query statement will be: "select ... from (SELECT public.

Re: How to use OpenTSDB as Source?

2020-04-22 Thread Jark Wu
Hi Som, You can have a look at this documentation: https://ci.apache.org/projects/flink/flink-docs-master/dev/table/common.html#create-a-tableenvironment It describes how to create different TableEnvironments based on EnvironmentSettings. EnvironmentSettings is a setting to set up a table's environme

About “distribute the function jar I submitted to all taskManager” question

2020-04-22 Thread ??????
I want to make a UDTF into a jar package, then load the jar in the Main method of the job through dynamic loading and get the UDTF class. But in this way, Flink does not automatically distribute the jar to taskManager, so it caused an error. I find that FlinkClient provides the -C Bringing -C with t

Re: Change to StreamingFileSink in Flink 1.10

2020-04-22 Thread Averell
Thanks @Seth Wiesman and all. -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: How to use OpenTSDB as Source?

2020-04-22 Thread Som Lima
Thanks for the link. On Wed, 22 Apr 2020, 12:19 Jark Wu, wrote: > Hi Som, > > You can have a look at ths documentation: > https://ci.apache.org/projects/flink/flink-docs-master/dev/table/common.html#create-a-tableenvironment > It describe how to create differnet TableEnvironments based > on En

Re: About “distribute the function jar I submitted to all taskManager” question

2020-04-22 Thread Yang Wang
I think you could use the Flink distributed cache to make the files available on all TaskManagers. For example, *env.registerCachedFile(cacheFilePath, "cacheFile", false);* and then use the following code to get the registered file in the operator: *getRuntimeContext().getDistributedCache().getF

Using Stateful Functions within a Flink pipeline

2020-04-22 Thread Annemarie Burger
I was wondering if it is possible to use a Stateful Function within a Flink pipeline. I know they work with different APIs, so I was wondering if it is possible to have a DataStream as ingress for a Stateful Function. Some context: I'm working on a streaming graph analytics system, and want to sa

Re: Using Stateful Functions within a Flink pipeline

2020-04-22 Thread Oytun Tez
Hi Annemarie, Unfortunately this is not possible at the moment, but DataStream as ingress/egress is in the plans, as far as I know. -- [image: MotaWord] Oytun Tez M O T A W O R D | CTO & Co-Founder oy...@motaword.com On Wed, Apr 22, 2020 at 10:26 AM Annemar

Re: JDBC Table and parameters provider

2020-04-22 Thread Jingsong Li
> Specify "query" and "provider" Yes, your proposal looks reasonable to me. The key can be "scan.***" like in [1]. > specify parameters Maybe we need to add something like "scan.parametervalues.provider.type"; it can be "bound, specify, custom": - when bound, using the old partitionLowerBound and partitionUp

Re: Using Stateful Functions within a Flink pipeline

2020-04-22 Thread Igal Shilman
Hi Annemarie, There are plans to make Stateful Functions more easily embeddable within a Flink job, perhaps skipping the ingress/egress routing abstraction altogether and basically exposing the core Flink job that is the heart of Stateful Functions. Although these plans are not concrete yet, I believe

Re: Using Stateful Functions within a Flink pipeline

2020-04-22 Thread Oytun Tez
@Igal, this sounds more comprehensive (better) than just opening DataStreams: "basically exposing the core Flink job that is the heart of stateful functions. " Great! -- [image: MotaWord] Oytun Tez M O T A W O R D | CTO & Co-Founder oy...@motaword.com

Running 2 instances of LocalExecutionEnvironments with same consumer group.id

2020-04-22 Thread Suraj Puvvada
Hello, I have two JVMs that run LocalExecutionEnvironments, each using the same consumer group.id. I noticed that the consumers in each instance have all partitions assigned. I was expecting that the partitions would be split across consumers across the two JVMs. Any help on what might be happening
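For background on this behaviour: Flink's Kafka source assigns partitions to its own parallel subtasks deterministically instead of joining Kafka's consumer-group rebalance protocol, so group.id only scopes offset commits and does not split work between independent jobs. The sketch below is a simplified model of that static assignment, not Flink's exact code:

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model: each Flink job independently maps EVERY partition
// to one of ITS OWN subtasks, regardless of group.id. Two separate
// jobs therefore both end up reading all partitions.
public class StaticAssignment {
    public static List<Integer> partitionsFor(int subtask, int parallelism, int numPartitions) {
        List<Integer> mine = new ArrayList<>();
        for (int p = 0; p < numPartitions; p++) {
            if (p % parallelism == subtask) { // round-robin style assignment
                mine.add(p);
            }
        }
        return mine;
    }
}
```

With 4 partitions and parallelism 2, subtask 0 reads partitions 0 and 2, subtask 1 reads 1 and 3, and a second job repeats the same assignment over all 4 partitions.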

Re: Two questions about Async

2020-04-22 Thread Gary Yao
> Bytes Sent but Records Sent is always 0 Sounds like a bug. However, I am unable to reproduce this using the AsyncIOExample [1]. Can you provide a minimal working example? > Is there an Async Sink? Or do I just rewrite my Sink as an AsyncFunction followed by a dummy sink? You will have to impl

Re: Changing number of partitions for a topic

2020-04-22 Thread Suraj Puvvada
Thanks, I'll check it out. On Mon, Apr 20, 2020 at 6:23 PM Benchao Li wrote: > Hi Suraj, > > There is a config option[1] to enable partition discovery, which is > disabled by default. > The community discussed to enable it by default[2], but only aims to the > new Source API. > > [1] > https://c

Re: Flink 1.10.0 stop command

2020-04-22 Thread Yun Tang
Hi I think you could still use ./bin/flink cancel to cancel the job. What is the exception thrown? Best Yun Tang From: seeksst Sent: Wednesday, April 22, 2020 18:17 To: user Subject: Flink 1.10.0 stop command Hi, When i test 1.10.0, i found i must to se

Re: Flink 1.10.0 stop command

2020-04-22 Thread tison
'flink cancel' is broken because of https://issues.apache.org/jira/browse/FLINK-16626 Best, tison. Yun Tang wrote on Thu, Apr 23, 2020 at 1:18 AM: > Hi > > I think you could still use ./bin/flink cancel to cancel the job. > What is the exception thrown? > > Best > Yun Tang > -- > *F

Re: Flink 1.10.0 stop command

2020-04-22 Thread tison
To be precise, the cancel command would succeed on the cluster side, but the response *might* be lost so that the client throws a TimeoutException. If that is the case, this is the root cause, which will be fixed in 1.10.1. Best, tison. tison wrote on Thu, Apr 23, 2020 at 1:20 AM: > 'flink cancel' broken because of > https://

Re: Flink incremental checkpointing - how long does data is kept in the share folder

2020-04-22 Thread Yun Tang
Hi Shachar, The basic rules to remove old data: 1. No two running Flink jobs use the same checkpoint directory (at the job-id level). 2. Files that are not recorded in the latest checkpoint metadata are candidates for removal. 3. If the checkpoint directory is still being written by a Flink job
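Rule 2 can be illustrated with a small sketch (assumed names, not Flink's internals): a file in the shared folder becomes a removal candidate only once no retained checkpoint's metadata references it.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Illustrative sketch of shared-state cleanup for incremental
// checkpoints: the set of removable files is the shared directory
// contents minus everything referenced by any retained checkpoint.
public class SharedFileGc {
    public static Set<String> removable(Set<String> allSharedFiles,
                                        List<Set<String>> retainedCheckpointRefs) {
        Set<String> live = retainedCheckpointRefs.stream()
                .flatMap(Set::stream)       // union of all referenced files
                .collect(Collectors.toSet());
        Set<String> candidates = new HashSet<>(allSharedFiles);
        candidates.removeAll(live);         // keep only unreferenced files
        return candidates;
    }
}
```

So a shared file can legitimately stay in the folder for a long time: as long as any retained checkpoint still references it, it is live.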

RE: History Server Not Showing Any Jobs - File Not Found?

2020-04-22 Thread Hailu, Andreas
Hi Chesnay, thanks for responding. We're using Flink 1.9.1. I enabled DEBUG level logging and this is something relevant I see: 2020-04-22 13:25:52,566 [Flink-HistoryServer-ArchiveFetcher-thread-1] DEBUG DFSInputStream - Connecting to datanode 10.79.252.101:1019 2020-04-22 13:25:52,567 [Flink-Hi

Re: Running 2 instances of LocalExecutionEnvironments with same consumer group.id

2020-04-22 Thread Gary Yao
Hi Suraj, This question has been asked before: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Flink-kafka-consumers-don-t-honor-group-id-td21054.html Best, Gary On Wed, Apr 22, 2020 at 6:08 PM Suraj Puvvada wrote: > > Hello, > > I have two JVMs that run LocalExecution

Re: Running 2 instances of LocalExecutionEnvironments with same consumer group.id

2020-04-22 Thread Som Lima
I followed the link; maybe the same Suraj is advertising a DataBricks webinar going on live right now. On Wed, 22 Apr 2020, 18:38 Gary Yao, wrote: > Hi Suraj, > > This question has been asked before: > > > http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Flink-kafka-consumers-don

flink couchbase sink

2020-04-22 Thread 令狐月弦
Greetings, We have been using Couchbase for large-scale data storage. I saw there is a Flink sink for Couchbase: https://github.com/isopropylcyanide/flink-couchbase-data-starter But it was built for Flink 1.6.0 and has not been updated for some months. Do you know if there is another available Flink sink

Re: RocksDB default logging configuration

2020-04-22 Thread Bajaj, Abhinav
Bumping this one again to catch some attention. From: "Bajaj, Abhinav" Date: Monday, April 20, 2020 at 3:23 PM To: "user@flink.apache.org" Subject: RocksDB default logging configuration Hi, Some of our teams ran into the disk space issues because of RocksDB default logging configuration - FL

Re: Flink 1.10.0 stop command

2020-04-22 Thread seeksst
Thanks a lot. I’m glad to hear that and am looking forward to 1.10.1. Is there more of a plan for the stop command? Will it replace cancel in the future? Is the state.savepoints.dir required in the end? Original message From: tison wander4...@gmail.com To: Yun Tang myas...@live.com Cc: seeksst seek...@163.com; user u...@flink.ap

batch range sort support

2020-04-22 Thread Benchao Li
Hi, Currently the sort operator in the blink planner is global, which becomes a bottleneck if we sort a lot of data. And I found the 'table.exec.range-sort.enabled' config in BatchExecSortRule, which makes me very excited. After enabling this config, I found that it's not implemented completely yet. This conf
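The general idea behind a range sort, sketched under assumptions (this is not the BatchExecSortRule implementation): pick boundary keys, for instance from a data sample, route each record to the range partition that contains its key, sort each partition locally, and the concatenation of partitions is then globally sorted without a single global sort bottleneck.

```java
import java.util.Arrays;

// Illustrative range partitioner: boundaries are sorted split points
// (length = partitions - 1); a record with a given key goes to the
// first partition whose range contains the key.
public class RangePartitioner {
    private final long[] boundaries;

    public RangePartitioner(long[] boundaries) {
        this.boundaries = boundaries.clone();
        Arrays.sort(this.boundaries);
    }

    public int partitionFor(long key) {
        int pos = Arrays.binarySearch(boundaries, key);
        // exact match goes to the upper partition; otherwise use the insertion point
        return pos >= 0 ? pos + 1 : -(pos + 1);
    }
}
```

With boundaries {10, 20}, keys below 10 go to partition 0, keys in [10, 20) to partition 1, and the rest to partition 2.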

Re: batch range sort support

2020-04-22 Thread Jingsong Li
Hi Benchao, Glad to see your requirement for range partition. I have a branch to support range partition: [1] Can you describe your scenario in more detail? What sink did you use for your jobs? A simple and complete business scenario? This can help the community judge the importance of the range

Re: Flink

2020-04-22 Thread Navneeth Krishnan
Thanks a lot Timo. I will take a look at it. But does flink automatically scale up and down at this point with native integration? Thanks On Tue, Apr 14, 2020 at 11:27 PM Timo Walther wrote: > Hi Navneeth, > > it might be also worth to look into Ververica Plaform for this. The > community editi

Task Assignment

2020-04-22 Thread Navneeth Krishnan
Hi All, Is there a way for an upstream operator to know how the downstream operator tasks are assigned? Basically I want to group my messages to be processed on slots in the same node based on some key. Thanks
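As background for this question: an upstream operator cannot pick the target slot or node directly. Flink routes each key to a key group, and the key group determines the downstream subtask index (see KeyGroupRangeAssignment). The sketch below is a simplified model; the real code murmur-hashes the key's hashCode before taking the modulo:

```java
// Simplified model of Flink's key-to-subtask routing. keyGroupFor
// stands in for the real implementation, which applies a murmur hash
// to key.hashCode() first; subtaskFor mirrors the documented mapping
// keyGroup * parallelism / maxParallelism.
public class KeyRouting {
    public static int keyGroupFor(Object key, int maxParallelism) {
        return Math.floorMod(key.hashCode(), maxParallelism);
    }

    public static int subtaskFor(int keyGroup, int parallelism, int maxParallelism) {
        return keyGroup * parallelism / maxParallelism;
    }
}
```

So two keys land on the same subtask only if their key groups map to the same index; co-locating messages on "slots in the same node" by key is not something the upstream side can control, only keyBy-style grouping to the same subtask.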

Re: RocksDB default logging configuration

2020-04-22 Thread Chesnay Schepler
AFAIK this is not possible; the client doesn't know anything about the cluster configuration. FLINK-15747 proposes to add an additional config option for controlling the logging behavior. The only workaround I can think of would be to create a custom Flink distribution with a modified RocksD

Re: batch range sort support

2020-04-22 Thread Benchao Li
Hi Jingsong, Thanks for your quick response. I've CC'ed Chongchen, who understands the scenario much better. Jingsong Li wrote on Thu, Apr 23, 2020 at 12:34 PM: > Hi, Benchao, > > Glad to see your requirement about range partition. > I have a branch to support range partition: [1] > > Can you describe your sc

how to enable retract?

2020-04-22 Thread lec ssmi
Hi: Is there an aggregation or window operation whose result has retract characteristics?