Re: KafkaConsumer keeps getting InstanceAlreadyExistsException

2020-03-18 Thread Rong Rong
information. There might be some slight impact on the accuracy of the consumer metrics, but should be almost ignorable because the partition discoverer is quite inactive compared with the actual consumer.

Re: KafkaConsumer keeps getting InstanceAlreadyExistsException

2020-03-15 Thread Rong Rong
We have also seen this issue when running Flink apps in a shared cluster environment. Basically, Kafka is trying to register a JMX MBean[1] for application monitoring. This is only a WARN suggesting that you are registering more than one MBean with the same client id "consumer-1"; it should not a

Flink Kafka consumer auto-commit timeout

2020-03-08 Thread Rong Rong
Hi All, I would like to bring back this discussion which I saw multiple times in previous ML threads [1], but there seems to be no solution if checkpointing is disabled. All of the exceptions reported in these ML threads have one common pattern: > *INFO* org.apache.kafka.clients.consumer.internals.AbstractCoo

Re: [ANNOUNCE] Jingsong Lee becomes a Flink committer

2020-02-21 Thread Rong Rong
Congratulations Jingsong!! Cheers, Rong On Fri, Feb 21, 2020 at 8:45 AM Bowen Li wrote: > Congrats, Jingsong! > > On Fri, Feb 21, 2020 at 7:28 AM Till Rohrmann > wrote: > >> Congratulations Jingsong! >> >> Cheers, >> Till >> >> On Fri, Feb 21, 2020 at 4:03 PM Yun Gao wrote: >> >>> Congr

Re: [ANNOUNCE] Apache Flink 1.10.0 released

2020-02-12 Thread Rong Rong
Congratulations, a big thanks to the release managers for all the hard works!! -- Rong On Wed, Feb 12, 2020 at 5:52 PM Yang Wang wrote: > Excellent work. Thanks Gary & Yu for being the release manager. > > > Best, > Yang > > Jeff Zhang 于2020年2月13日周四 上午9:36写道: > >> Congratulations! Really appre

Re: Yarn Kerberos issue

2020-01-13 Thread Rong Rong
hub.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/security/README.md#dt-renewal-renewers-and-yarn If you think that you need more information about our issue, we can organize a call and discuss about it. Regards, J

Re: Yarn Kerberos issue

2020-01-12 Thread Rong Rong
Yang. Juan Gentile wrote on Monday, January 6, 2020 at 3:55 PM: Hello Rong, Chesnay,

Re: Completed job wasn't saved to archive

2020-01-09 Thread Rong Rong
Hi Pavel, Sorry for bringing this thread up so late. I was digging into the usage of the Flink history server and I found one situation where there would be no logs and no failure/success message from the cluster: In a very rare case in our Flink-YARN session cluster: if an application master (JobMa

Re: Yarn Kerberos issue

2020-01-04 Thread Rong Rong
Hi Juan, Chesnay was right. If you are using the CLI to launch your session cluster based on the document [1], following the instruction to use kinit [2] first seems to be one of the right ways to go. Another way of approaching it is to set up the Kerberos settings in the flink-conf.yaml file [3]. F

Re: Flink ML feature

2019-12-12 Thread Rong Rong
Hi guys, Yes, as Till mentioned, the community is working on a new ML library and we are working closely with the Alink project to bring in the algorithms. You can find more information regarding the new ML design architecture in FLIP-39 [1]. One of the major changes is that the new ML library [2] wi

Re: Apache Flink - Throttling stream flow

2019-11-27 Thread Rong Rong
Hi Mans, is this what you are looking for [1][2]? -- Rong [1] https://issues.apache.org/jira/browse/FLINK-11501 [2] https://github.com/apache/flink/pull/7679 On Mon, Nov 25, 2019 at 3:29 AM M Singh wrote: > Thanks Ciazhi & Thomas for your responses. > > I read the throttling example but want

Re: [DISCUSS] Support configure remote flink jar

2019-11-23 Thread Rong Rong
Thanks @Tison for starting the discussion and sorry for joining so late. Yes, I think this is a very good idea. We already tweak the flink-yarn package internally to support something similar to what @Thomas mentioned: to support registering a Jar that has already been uploaded to some DFS (needless to

Re: Limit max cpu usage per TaskManager

2019-11-09 Thread Rong Rong
Hi Lu, Yang is right. Enabling cgroup isolation is probably the one you are looking for to control how Flink utilizes the CPU resources. One more idea is to enable the DominantResourceCalculator[1], which I think you've probably done already. Found an interesting read[2] about this if you would lik

Re: flink on yarn-cluster kerberos authentication for hbase

2019-11-08 Thread Rong Rong
Hi, can you share more information regarding how you currently set up your Kerberos that only works with Zookeeper? Does your HBase share the same KDC? -- Rong On Fri, Nov 8, 2019 at 12:33 AM venn wrote: > Thanks, I have already seen it; it does not work for me > From: Jaqie Chan Sent: Friday, No

Re: [DISCUSS] Semantic and implementation of per-job mode

2019-10-31 Thread Rong Rong
Hi All, Thanks @Tison for starting the discussion. I think we have a very similar scenario to Theo's use cases. In our case we also generate the job graph using a client service (which serves multiple job graph generations from multiple user code) and we've found that managing the upload/downlo

Re: JDBC Table Sink doesn't seem to sink to database.

2019-10-17 Thread Rong Rong
On Thu, 17 Oct 2019 at 14:00, Rong Rong wrote: Yes, I think having a time interval execution (for the AppendableSink) should be a good idea. Can you please open a Jira issue[1] for further discussion? -- Rong

Re: JDBC Table Sink doesn't seem to sink to database.

2019-10-17 Thread Rong Rong
it to batch interval = 1 and it works fine. Anyway, I think the JDBC sink could have some improvements like batchInterval + time interval execution, so if the batch doesn't fill up then it executes whatever is left at that time interval. On Thu, 17 Oct 2019 at 12:22, Ro

Re: JDBC Table Sink doesn't seem to sink to database.

2019-10-17 Thread Rong Rong
Hi John, You are right. IMO the batch interval setting is used to increase JDBC execution performance. The reason the exception doesn't happen for your INSERT INTO statement with a `non_existing_table` is that the JDBCAppendableSink does not check table existence beforehand. That
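A minimal sketch of the batch-size knob discussed in this thread, using the JDBCAppendTableSink builder; driver, URL, query and parameter types are placeholders, and setting the batch size to 1 flushes every row instead of waiting for a batch to fill up:

```java
import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
import org.apache.flink.api.java.io.jdbc.JDBCAppendTableSink;

// Flush after every row so inserts are not held back waiting for a batch to fill.
JDBCAppendTableSink sink = JDBCAppendTableSink.builder()
        .setDrivername("org.postgresql.Driver")                       // placeholder
        .setDBUrl("jdbc:postgresql://localhost:5432/mydb")            // placeholder
        .setQuery("INSERT INTO my_table (id, name) VALUES (?, ?)")    // placeholder
        .setParameterTypes(BasicTypeInfo.INT_TYPE_INFO, BasicTypeInfo.STRING_TYPE_INFO)
        .setBatchSize(1)
        .build();
```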

Re: Flink Join Time Window

2019-09-30 Thread Rong Rong
Hi Nishant, On a brief look, I think this is a problem with your 2nd query: *Table2*... Table latestBadIps = tableEnv.sqlQuery("SELECT MAX(b_proctime) AS mb_proctime, bad_ip FROM BadIP GROUP BY bad_ip HAVING MIN(b_proctime) > CURRENT_TIMESTAMP - INTERVAL '2' DAY "); tableEnv.regi

Re: Extending Flink's SQL-Parser

2019-09-18 Thread Rong Rong
Hi Dominik, To add to Rui's answer, there are other examples I can think of on how to extend Calcite's DDL syntax: one is in Calcite's server module [1] and another in one of our open-sourced projects [2]. You might want to check them out. -- Rong [1] https://github.com/apache/calcite/blob/master/server/

Re: [ANNOUNCE] Zili Chen becomes a Flink committer

2019-09-11 Thread Rong Rong
Congratulations Zili! -- Rong On Wed, Sep 11, 2019 at 6:26 PM Hequn Cheng wrote: > Congratulations! > > Best, Hequn > > On Thu, Sep 12, 2019 at 9:24 AM Jark Wu wrote: > >> Congratulations Zili! >> >> Best, >> Jark >> >> On Wed, 11 Sep 2019 at 23:06, wrote: >> >> > Congratulations, Zili. >> >

Re: type error with generics ..

2019-08-26 Thread Rong Rong
Otherwise the type has to be specified explicitly using type information. [info] at org.apache.flink.api.java.typeutils.TypeExtractor.createTypeInfoWithTypeHierarchy(TypeExtractor.java:882) [info] at org.apache.flink.api.java.typeutils

Re: type error with generics ..

2019-08-25 Thread Rong Rong
.. (BTW I am not sure how to invoke returns on a DataStream and hence had to do a fake map - any suggestions here?) regards. On Sat, Aug 24, 2019 at 10:26 PM Rong Rong wrote: Hi Debasish, I think the error refers to the output of

Re: type error with generics ..

2019-08-24 Thread Rong Rong
Hi Debasish, I think the error refers to the output of your source instead of the result of your map function. E.g. DataStream ins = readStream(in, Data.class, serdeData).returns(new TypeInformation); DataStream simples = ins.map((Data d) -> new Simple(d.getName())).returns(new TypeHint(){}.get
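A runnable sketch of the returns() hint being suggested here, with stand-in Data/Simple POJOs replacing the types from the thread:

```java
import org.apache.flink.api.common.typeinfo.TypeHint;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class ReturnsHintExample {

    // Stand-ins for the Data/Simple types from the thread.
    public static class Data { public String name; public Data() {} public Data(String n) { this.name = n; } }
    public static class Simple { public String name; public Simple() {} public Simple(String n) { this.name = n; } }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        DataStream<Data> ins = env.fromElements(new Data("a"), new Data("b"));

        // Java lambdas erase generic types, so the mapper's output type is declared
        // explicitly via returns(); without it the TypeExtractor may fail as in the error above.
        DataStream<Simple> simples = ins
                .map(d -> new Simple(d.name))
                .returns(TypeInformation.of(new TypeHint<Simple>() {}));

        simples.print();
        env.execute("returns() hint example");
    }
}
```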

Re: Problem with Flink on Yarn

2019-08-23 Thread Rong Rong
It seems like your Kerberos server is starting to issue invalid tokens to your job manager. Can you share how your Kerberos setting is configured? This might also relate to how your KDC servers are configured. -- Rong On Fri, Aug 23, 2019 at 7:00 AM Zhu Zhu wrote: > Hi Juan, > > Have you tried

Re: Issue with FilterableTableSource and the logical optimizer rules

2019-08-19 Thread Rong Rong
Hi Itamar, The problem you described sounds similar to this ticket[1]. Can you try to see if following the solution listed resolves your issue? -- Rong [1] https://issues.apache.org/jira/browse/FLINK-12399 On Mon, Aug 19, 2019 at 8:56 AM Itamar Ravid wrote: > Hi, I’m facing a strange issue wi

Re: [ANNOUNCE] Andrey Zagrebin becomes a Flink committer

2019-08-14 Thread Rong Rong
Congratulations Andrey! On Wed, Aug 14, 2019 at 10:14 PM chaojianok wrote: > Congratulations Andrey! > At 2019-08-14 21:26:37, "Till Rohrmann" wrote: > >Hi everyone, > > > >I'm very happy to announce that Andrey Zagrebin accepted the offer of the > >Flink PMC to become a committer of the Flink

Re: how to get the code produced by Flink Code Generator

2019-08-07 Thread Rong Rong
+1. I think this would be a very nice way to provide more verbose feedback for debugging. -- Rong On Wed, Aug 7, 2019 at 9:28 AM Fabian Hueske wrote: > Hi Vincent, > > I don't think there is such a flag in Flink. > However, this sounds like a really good idea. > > Would you mind creating a Jira

Re: [ANNOUNCE] Hequn becomes a Flink committer

2019-08-07 Thread Rong Rong
Congratulations Hequn, well deserved! -- Rong On Wed, Aug 7, 2019 at 8:30 AM wrote: > Congratulations, Hequn! > > > > *From:* Xintong Song > *Sent:* Wednesday, August 07, 2019 10:41 AM > *To:* d...@flink.apache.org > *Cc:* user > *Subject:* Re: [ANNOUNCE] Hequn becomes a Flink committer > > >

Re: Timestamp(timezone) conversion bug in non blink Table/SQL runtime

2019-07-23 Thread Rong Rong
Hi Shuyi, I think there were some discussions on the mailing list [1,2] and JIRA tickets [3,4] that might be related. Since the blink table planner doesn't produce such an error, I think this problem is valid and should be fixed. Thanks, Rong [1] http://apache-flink-user-mailing-list-archive.233605

Re: [Table API] ClassCastException when converting a table to DataStream

2019-07-23 Thread Rong Rong
ases that force us to convert Table from and to DataStream. One such case is to append to Table a column showing the current watermark of each record; there's no other way but to do that as ScalarFunction doesn't allow us to get the runtime con

Re: [Table API] ClassCastException when converting a table to DataStream

2019-07-19 Thread Rong Rong
a type information hint as you said. That is needed later when I need to make another table like "Table anotherTable = tEnv.fromDataStream(stream);". Without the type information hint, I've got an error "An input of GenericTypeInfo cannot be

Re:

2019-07-18 Thread Rong Rong
Hi Tangkailin, If I understand correctly from the snippet, you are trying to invoke this in some sort of window, correct? If that's the case, your "apply" method will be invoked every time the window fires[1]. This means there will be one new instance of the HashMap created each time "apply" is i
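A sketch of one way to avoid allocating a new HashMap on every firing: keep it as a field of a rich window function and create it once in open(). The String input type and the counting logic are placeholders:

```java
import java.util.HashMap;
import java.util.Map;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.windowing.RichWindowFunction;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
import org.apache.flink.util.Collector;

// The map lives in the operator instance, created once in open(), instead of
// being re-created inside apply() on every window firing.
public class CountingWindowFunction
        extends RichWindowFunction<String, Tuple2<String, Integer>, String, TimeWindow> {

    private transient Map<String, Integer> counts;

    @Override
    public void open(Configuration parameters) {
        counts = new HashMap<>();
    }

    @Override
    public void apply(String key, TimeWindow window, Iterable<String> input,
                      Collector<Tuple2<String, Integer>> out) {
        counts.clear(); // reuse the instance; reset per firing
        for (String value : input) {
            counts.merge(value, 1, Integer::sum);
        }
        counts.forEach((k, v) -> out.collect(Tuple2.of(k, v)));
    }
}
```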

Re: [Table API] ClassCastException when converting a table to DataStream

2019-07-18 Thread Rong Rong
Hi Dongwon, Can you provide a bit more information: which Flink version are you using? What does "sourceTable.getSchema().toRowType()" return? What does the line ".map(a -> a)" do, and can you remove it? If I am understanding correctly, you are also using "time1" as the rowtime; is that what your

Re: Flink SQL API: Extra columns added from order by

2019-07-12 Thread Rong Rong
Hi Morrisa, Can you share more information regarding what type of function "formatDate" is and how you configured its return type? For the question on the first query: if the return type is String, then ASC on a string value should use alphabetical ordering. However, on the th

Re: [ANNOUNCE] Rong Rong becomes a Flink committer

2019-07-11 Thread Rong Rong
gt;>> Best, >>> >>>> Xingcan >>> >>>> >>> >>>> On Jul 11, 2019, at 1:08 PM, Shuyi Chen wrote: >>> >>>> >>> >>>> Congratulations, Rong! >>> >>>> >>> >>>>

Re: How are kafka consumer offsets handled if sink fails?

2019-07-08 Thread Rong Rong
Hi John, I think what Konstantin is trying to say is: Flink's Kafka consumer does not start consuming from the Kafka-committed offset when the consumer starts; it actually starts from the offset that was last checkpointed to the external DFS (e.g. the starting point of the consumer has no relevance
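A minimal sketch of that behavior: with checkpointing enabled, the offsets stored in the checkpoint are what the consumer resumes from, while committing offsets back to Kafka is only for monitoring. Broker, group and topic names are placeholders:

```java
import java.util.Properties;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.enableCheckpointing(60_000); // offsets are snapshotted into each checkpoint

Properties props = new Properties();
props.setProperty("bootstrap.servers", "localhost:9092"); // placeholder
props.setProperty("group.id", "my-group");                // placeholder

FlinkKafkaConsumer<String> consumer =
        new FlinkKafkaConsumer<>("my-topic", new SimpleStringSchema(), props);
// Offsets committed back to Kafka are for monitoring only; on recovery Flink
// resumes from the offsets stored in the last completed checkpoint.
consumer.setCommitOffsetsOnCheckpoints(true);

env.addSource(consumer).print();
```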

Re: Flink Table API and Date fields

2019-07-08 Thread Rong Rong
Hi Flavio, Yes, I think Flink's handling of DateTime could be better when dealing with DATE/TIME types. There are several limitations Jingsong briefly mentioned about java.util.Date. Some of these limitations even affect the correctness of the results (e.g. Gregorian vs Julian c

Re: Apache Flink Sql - How to access EXPR$0, EXPR$1 values from a Row in a table

2019-06-17 Thread Rong Rong
Hi Mans, I am not sure if you intended to name them like this, but if you were to access them you need to escape them like `EXPR$0` [1]. Also, I think Flink defaults unnamed fields in a row to `f0`, `f1`, ... [2], so you might be able to access them like that. -- Rong [1] https://calcite.apache.o

Re: How can I improve this Flink application for "Distinct Count of elements" in the data stream?

2019-06-13 Thread Rong Rong
Hi Rong, thanks for your answer. If I understood well, the option will be to use ProcessFunction [1] since it has the method onTimer(), but not the ProcessWindowFunction [2], because it does not have the method onTimer(). I will

Re: How can I improve this Flink application for "Distinct Count of elements" in the data stream?

2019-06-11 Thread Rong Rong
Hi Felipe, There are multiple ways to do DISTINCT COUNT in the Table/SQL API; in fact there's already a thread going on recently [1]. Based on the description you provided, it seems like that might be a better API level to use. To answer your question: - You should be able to use other TimeCharacteristi

Re: [SURVEY] Usage of flink-ml and [DISCUSS] Delete flink-ml

2019-05-23 Thread Rong Rong
+1 for the deletion. I think it might also be a good idea to update the roadmap with the plan of removal/development since we've reached consensus on FLIP-39. Thanks, Rong On Wed, May 22, 2019 at 8:26 AM Shaoxuan Wang wrote: > Hi Chesnay, > Yes, you are right. There is not any active

Re: Propagating delta from window upon trigger

2019-05-22 Thread Rong Rong
Hi Miki, Upon trigger, the window will be fired with all of its content in state, invoking the "emitWindowContent" method, which will further invoke the window function you define. Thus if your goal is to only emit the delta, one option is to do so in your window function. One challenge you might fa

Re: RichAsyncFunction for Scala?

2019-05-17 Thread Rong Rong
Hi Shannon, I think the RichAsyncFunction[1] extends the normal AsyncFunction, so from the API perspective you should be able to use it. The problem, I think, is with the Scala anonymous function, which goes through a different code path when wrapping the Scala RichAsyncFunction [

Re: Approach to Auto Scaling Flink Job

2019-05-16 Thread Rong Rong
Hi Anil, A typical YARN ResourceManager setup consists of 2 RM nodes [1] in an active/standby configuration. FYI: We've also shared some practical experiences on the limitations of this setup, and potential redundant fail-safe mechanisms, in our latest talk[2] at this year's FlinkForward. Thanks, Rong [1

Re: Flink ML Use cases

2019-05-14 Thread Rong Rong
Hi Abhishek, Based on your description, I think this FLIP proposal[1] seems to fit your use case perfectly. You can also check out the GitHub repo by Boris (CCed) for the PMML implementation[2]. This proposal is still under development [3]; you are more than welcome to test it out and share your f

Re: Approach to Auto Scaling Flink Job

2019-05-12 Thread Rong Rong
Hi Anil, The reason we are using Docker is that we internally support Dockerized containers for microservices. Ideally speaking this can be any external service running on something other than the actual YARN cluster your Flink application resides on. Basically the watchdog runs outside of the Flin

Re: Approach to Auto Scaling Flink Job

2019-05-08 Thread Rong Rong
Hi Anil, We have a presentation[1] from FlinkForward 2018 that briefly discusses the high-level approach (via a watchdog). We are also restructuring the approach of our open-source AthenaX: right now our internal implementation has diverged from the open-source version for too long; it has been a prob

Re: Inconsistent documentation of Table Conditional functions

2019-05-08 Thread Rong Rong
Hi Flavio, I believe the documentation meant "X" as a placeholder, where you can convert "X" into the numeric values (1, 2, ...) depending on how many "CASE WHEN" conditions you have. *"resultZ"* is the default result in the "ELSE" statement, and thus it is a literal. Thanks, Rong On Wed, May 8, 2

Re: Approach to Auto Scaling Flink Job

2019-05-08 Thread Rong Rong
Hi Anil, Thanks for reporting the issue. I went through the code and I believe the auto-scaling functionality is still in our internal branch and has not been merged to the open-source branch yet. I will change the documentation accordingly. Thanks, Rong On Mon, May 6, 2019 at 9:54 PM Anil wrot

Re: Filter push-down not working for a custom BatchTableSource

2019-05-04 Thread Rong Rong
Hi Josh, I think I found the root cause of this issue (please see my comment in https://issues.apache.org/jira/browse/FLINK-12399). As of now, you can try overriding the explainSource() interface to let Calcite know that the TableSource after calling applyPredicate is different from the one before c

Re: How to implement custom stream operator over a window? And after the Count-Min Sketch?

2019-04-25 Thread Rong Rong
-- Felipe Gutierrez -- skype: felipe.o.gutierrez -- https://felipeogutierrez.blogspot.com On Wed, Apr 24, 2019 at 2:06 AM Rong Rong wrote: Hi Felipe, In a short glance, the que

Re: Apache Flink - Question about dynamically changing window end time at run time

2019-04-24 Thread Rong Rong
Regarding the CEP option - I believe that CEP patterns cannot be changed dynamically once they've been compiled, which limits their usage. Please feel free to correct me. Thanks for your help and pointers. On Tuesday, April 23, 2019, 8:12:5

Re: Apache Flink - Question about dynamically changing window end time at run time

2019-04-23 Thread Rong Rong
Hi Mans, I am not sure what you meant by "dynamically change the end-time of a window". If you are referring to dynamically determining the firing time of the window, then it fits the description of a session window [1]. If you want to handle window end time dynamically, one way I can th

Re: How to implement custom stream operator over a window? And after the Count-Min Sketch?

2019-04-23 Thread Rong Rong
Hi Felipe, In a short glance, the answer can depend on what your window is like (is there any overlap, like a sliding window?) and how much data you would like to process. In general, you can always buffer all the data into a ListState and apply your window function by iterating through all those buffere

Re: Create Dynamic data type

2019-04-19 Thread Rong Rong
Hi Soheil, If I understand correctly, when you said "according to the number of rows", you were trying to dynamically determine the RowType based on how long one row is, correct? In this case, I am not sure this is considered supported in JDBCInputFormat at this moment and it would be hard to supp

Re: Service discovery on YARN - find out which port was dynamically assigned to the JobManager Web Interface

2019-04-17 Thread Rong Rong
As far as I know, the port will be bound randomly. YARN actually has the ability to translate the proxy link to the right node/port. If your goal is trying to avoid going through the YARN REST proxy, this could be a problem: there's a chance that the host/port will get changed by YARN witho

Re: Flink JDBC: Disable auto-commit mode

2019-04-15 Thread Rong Rong
From: Papadopoulos, Konstantinos Sent: Monday, 15 April 2019 12:30 PM To: Fabian Hueske Cc: Rong Rong; user Subject: RE: Flink JDBC: Disable auto-commit mode Hi Fabian,

Re: Flink JDBC: Disable auto-commit mode

2019-04-12 Thread Rong Rong
Hi Konstantinos, Seems like setting auto-commit is not directly possible in the current JDBCInputFormatBuilder. However, there's a way to specify the fetch size [1] for your DB round-trips; doesn't that resolve your issue? Similarly, in JDBCOutputFormat a batching mode is used to stash up

Re: [ANNOUNCE] Apache Flink 1.8.0 released

2019-04-10 Thread Rong Rong
Congrats! Thanks Aljoscha for being the release manager and all for making the release possible. -- Rong On Wed, Apr 10, 2019 at 4:23 AM Stefan Richter wrote: > Congrats and thanks to Aljoscha for managing the release! > > Best, > Stefan > > > On 10. Apr 2019, at 13:01, Biao Liu wrote: > > >

Re: Source reinterpretAsKeyedStream

2019-03-29 Thread Rong Rong
Hi Adrienne, I think you should be able to reinterpretAsKeyedStream by passing in a DataStreamSource based on the ITCase example [1]. Can you share the full code/error logs or the IAE? -- Rong [1] https://github.com/apache/flink/blob/release-1.7.2/flink-streaming-java/src/test/java/org/apache/fl

Re: Infinitely requesting for Yarn container in Flink 1.5

2019-03-29 Thread Rong Rong
Hi Qi, I think the problem may be related to another similar problem reported in a previous JIRA [1]. I think a PR is also in discussion. Thanks, Rong [1] https://issues.apache.org/jira/browse/FLINK-10868 On Fri, Mar 29, 2019 at 5:09 AM qi luo wrote: > Hello, > > Today we encountered an issue

Re: Calcite SQL Map to Pojo Map

2019-03-29 Thread Rong Rong
I think the proper solution should not be Types.GENERIC(Map.class), as you will not be able to successfully process the returned object. For example, Map['k', 'v'].get('k') will not work. I think there might be some problem, as you suggested, with them being handled as GenericType instead of

Re: Calcite SQL Map to Pojo Map

2019-03-28 Thread Rong Rong
If your conversion is done using a UDF, you need to override the getResultType method [1] to explicitly specify the key and value type information, as generic type erasure will not preserve that part of the type information in your code. Thanks, Rong [1] https://ci.apache.org/projects/flink/flink-docs-release-1.7/dev/table/udf
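A sketch of the getResultType override described here, for a hypothetical varargs-to-Map ScalarFunction:

```java
import java.util.HashMap;
import java.util.Map;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.table.functions.ScalarFunction;

// Generic type erasure hides the Map's key/value types, so getResultType()
// declares them explicitly instead of letting the planner fall back to a GenericType.
public class ToMap extends ScalarFunction {

    public Map<String, String> eval(String... kvPairs) {
        Map<String, String> result = new HashMap<>();
        for (int i = 0; i + 1 < kvPairs.length; i += 2) {
            result.put(kvPairs[i], kvPairs[i + 1]);
        }
        return result;
    }

    @Override
    public TypeInformation<?> getResultType(Class<?>[] signature) {
        return Types.MAP(Types.STRING, Types.STRING);
    }
}
```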

Re: Map UDF : The Nothing type cannot have a serializer

2019-03-21 Thread Rong Rong
Based on what I saw in the implementation, I think you meant to implement a ScalarFunction, right? Since you are only trying to structure a VarArg string into a Map. If my understanding is correct, I think the Map constructor[1] is something you might be able to leverage. It doesn't resolve your N

Re: Backoff strategies for async IO functions?

2019-03-19 Thread Rong Rong
Cheers, Till. On Wed, Mar 13, 2019 at 5:42 PM Rong Rong wrote: Thanks for raising the concern @shuyi and the explanation @konstantin. Upon glancing on the Flink document, it seems like user have full control on the timeout behavior [1]. But u

Re: Backoff strategies for async IO functions?

2019-03-13 Thread Rong Rong
Thanks for raising the concern @shuyi and the explanation @konstantin. Upon glancing at the Flink document, it seems like users have full control over the timeout behavior [1]. But unlike in the AsyncWaitOperator, it is not straightforward to access the internal state of the operator to, for example, put th
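A sketch of that control point: overriding AsyncFunction#timeout to emit a fallback instead of failing the job; the external lookup is a placeholder, and the function would be wired in via AsyncDataStream.unorderedWait(stream, fn, timeout, unit):

```java
import java.util.Collections;
import java.util.concurrent.CompletableFuture;
import org.apache.flink.streaming.api.functions.async.ResultFuture;
import org.apache.flink.streaming.api.functions.async.RichAsyncFunction;

// The timeout() hook controls what happens when a request exceeds the timeout
// passed to AsyncDataStream -- here a fallback value is emitted instead of
// failing the job with a TimeoutException.
public class LookupAsyncFunction extends RichAsyncFunction<String, String> {

    @Override
    public void asyncInvoke(String key, ResultFuture<String> resultFuture) {
        CompletableFuture
                .supplyAsync(() -> lookup(key))
                .thenAccept(value -> resultFuture.complete(Collections.singleton(value)));
    }

    @Override
    public void timeout(String key, ResultFuture<String> resultFuture) {
        resultFuture.complete(Collections.singleton("TIMED_OUT:" + key));
    }

    private String lookup(String key) {
        return key.toUpperCase(); // stand-in for an external call
    }
}
```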

Re: Schema Evolution on Dynamic Schema

2019-03-09 Thread Rong Rong
Hi Shahar, From my understanding, if you use "groupBy" with AggregateFunctions, they save the accumulators to SQL internal states, which are invariant to your input schema. Based on what you described, I think that's why it is fine for recovering from existing state. I think one confusion you mig

Re: Schema Evolution on Dynamic Schema

2019-03-08 Thread Rong Rong
y I can transform the Row object to my generated class, maybe by the Row's column names corresponding to the generated class field names, though I don't see that the Row object has any notion of column names. Would love to hear your thoughts. If you want me to paste some code he

Re: Schema Evolution on Dynamic Schema

2019-03-07 Thread Rong Rong
Hi Shahar, I wasn't sure which schema you are describing that is going to "evolve" (is it the registered_table, or the output sink?). It would be great if you could clarify more. For the example you provided, IMO it is more of a logic change than a schema evolution: - if you are changin

Re: [Flink-Question] In Flink parallel computing, how do different windows receive the data of their own partition, that is, how does Window determine which partition number the current Window belongs

2019-03-03 Thread Rong Rong
Hi, I am not sure if I understand your question correctly, so I will try to explain the flow of how elements get into window operators. Flink makes the partition assignment before invoking the operator to process an element. For the word count example, WindowOperator is invoked by StreamInputProcessor[1]

Re: Flink window triggering and timing on connected streams

2019-02-26 Thread Rong Rong
Hi Andrew, To add to the answers Till and Hequn already provided: WindowOperators operate on a per-key-group basis, so as long as you only have one open session per partition key group, you should be able to manage the windowing using the watermark strategy Hequn mentioned. As Till mentioned, t

Re: SinkFunction.Context

2019-02-21 Thread Rong Rong
Hi Durga, 1. currentProcessingTime: refers to this operator's (the SinkFunction's) system time at the moment of invoke(). 1a. The time you are referring to as "flink window got the message" is currentProcessingTime() invoked at the window operator (which is provided by the WindowContext, similar to this one
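A small sketch showing the three timestamps available on SinkFunction.Context (the logging sink itself is a placeholder):

```java
import org.apache.flink.streaming.api.functions.sink.SinkFunction;

// currentProcessingTime(): the sink operator's own system time at invoke();
// currentWatermark(): the latest watermark that has reached the sink;
// timestamp(): the element's record/event timestamp (may be null).
public class TimeLoggingSink implements SinkFunction<String> {

    @Override
    public void invoke(String value, Context context) {
        long processingTime = context.currentProcessingTime();
        long watermark = context.currentWatermark();
        Long elementTimestamp = context.timestamp();
        System.out.printf("value=%s processingTime=%d watermark=%d timestamp=%s%n",
                value, processingTime, watermark, elementTimestamp);
    }
}
```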

Re: Metrics for number of "open windows"?

2019-02-21 Thread Rong Rong
Hi Andrew, I am assuming you are actually using a customized WindowAssigner, Trigger and process function. I think the best way for you to track in-flight, not-yet-triggered windows is to emit metrics in these 3 pieces. Upon looking at the window operator, I don't think there's a metric (gauge) t

Re: [DISCUSS] Adding a mid-term roadmap to the Flink website

2019-02-21 Thread Rong Rong
Rong. On Thu, Feb 21, 2019 at 8:50 AM Stephan Ewen wrote: Hi Rong Rong! I would add the security / kerberos threads to the roadmap. They seem to be advanced enough in the discussions so that there is clarity what will come. For the window operator with slicing, I

Re: Impact of occasional big pauses in stream processing

2019-02-14 Thread Rong Rong
derstanding is that this should not impact other flink jobs. Is that correct? Thanks. Ajay. From: Andrey Zagrebin Date: Thursday, February 14, 2019 at 5:09 AM To: Rong Rong Cc: "Aggarwal, Ajay", "

Re: [DISCUSS] Adding a mid-term roadmap to the Flink website

2019-02-14 Thread Rong Rong
Because it is easier to update the roadmap on the wiki compared to on the Flink web site. And I guess we may need to update the roadmap very often at the beginning as there are so many discussions and proposals in the community recently. We can move it into fli

Re: Impact of occasional big pauses in stream processing

2019-02-13 Thread Rong Rong
Hi Ajay, Flink handles "backpressure" in a graceful way so that it doesn't get affected when your processing pipeline is occasionally slowed down. I think the following articles will help [1,2]. In your specific case: the "KeyBy" operation will re-hash data so they can be reshuffled from all inpu

Re: [DISCUSS] Adding a mid-term roadmap to the Flink website

2019-02-13 Thread Rong Rong
Thanks Stephan for the great proposal. This would not only be beneficial for new users but also for contributors to keep track of all upcoming features. I think that better window operator support could also be separately grouped into its own category, as it affects both the future DataStream API and b

Re: Is there a windowing strategy that allows a different offset per key?

2019-02-11 Thread Rong Rong
Hi Stephen, Yes, we had a discussion regarding dynamic offsets and keys [1]. The main idea was the same: we don't have many complex operators after the window operator, thus a huge spike of traffic will occur after firing on the window boundary. In the discussion the best idea

Re: Can an Aggregate the key from a WindowedStream.aggregate()

2019-02-11 Thread Rong Rong
Hi Stephen, Chesnay was right: you will have to use a more complex version of the window processing function. Perhaps your goal can be achieved by this specific function with incremental aggregation [1]. If not, you can always use the regular process window function [2]. Both of these methods have a
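A sketch of the incremental-aggregation variant referenced as [1]: an AggregateFunction does the counting and a ProcessWindowFunction attaches the key to the pre-aggregated result; it would be wired up via keyedStream.window(...).aggregate(new CountAgg(), new AttachKey()). The event type (String) and the counting logic are placeholders:

```java
import org.apache.flink.api.common.functions.AggregateFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.functions.windowing.ProcessWindowFunction;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
import org.apache.flink.util.Collector;

public class CountPerKey {

    // Incremental aggregation: only the running count is kept per window.
    public static class CountAgg implements AggregateFunction<String, Long, Long> {
        @Override public Long createAccumulator() { return 0L; }
        @Override public Long add(String value, Long acc) { return acc + 1; }
        @Override public Long getResult(Long acc) { return acc; }
        @Override public Long merge(Long a, Long b) { return a + b; }
    }

    // Runs once per window firing with the single pre-aggregated value,
    // giving access to the key (and, via ctx, the window metadata).
    public static class AttachKey
            extends ProcessWindowFunction<Long, Tuple2<String, Long>, String, TimeWindow> {
        @Override
        public void process(String key, Context ctx, Iterable<Long> counts,
                            Collector<Tuple2<String, Long>> out) {
            out.collect(Tuple2.of(key, counts.iterator().next()));
        }
    }
}
```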

Re: Get nested Rows from Json string

2019-02-08 Thread Rong Rong
I've changed Rows to Map, which eases the conversion process. Nevertheless, I'm interested in any explanation about why row1.setField(i, row2) appends row2 at the end of row1. All the best, François. On Wed, Feb 6, 2019 at 19:33, Rong Rong wrote:

Re: Using custom evictor and trigger on Table API

2019-02-06 Thread Rong Rong
Hi Dongwon, There was a previous thread regarding this[1]; unfortunately this is not supported yet. However, there are some recent development proposals[2,3] to enhance the Table API that might be able to support your use case. -- Rong [1] http://apache-flink-user-mailing-list-archive.2336050.n4.

Re: Get nested Rows from Json string

2019-02-06 Thread Rong Rong
Hi François, I wasn't exactly sure whether it is a JSON object or a JSON string you are trying to process. For a JSON string, this [1] article might help. For a JSON object, I am assuming you are trying to convert it into a TableSource and process it using the Table/SQL API; you could probably use the example

Re: TimeZone shift problem in Flink SQL

2019-01-25 Thread Rong Rong
Hi Henry, Unix epoch time values are always under GMT timezone, for example: - 1548162182001 <=> GMT: Tuesday, January 22, 2019 1:03:02.001 PM, or CST: Tuesday, January 22, 2019 9:03:02.001 PM. - 1548190982001 <=> GMT: Tuesday, January 22, 2019 9:03:02.001 PM, or CST: Wednesday, January 23, 2019 4
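A tiny illustration of the point: the epoch value itself is zone-agnostic and only its rendering differs (using Asia/Shanghai here for the +8 offset):

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZonedDateTime;

// The same epoch millisecond rendered in two zones.
Instant instant = Instant.ofEpochMilli(1548162182001L);
ZonedDateTime gmt = instant.atZone(ZoneId.of("GMT"));               // 2019-01-22T13:03:02.001
ZonedDateTime shanghai = instant.atZone(ZoneId.of("Asia/Shanghai")); // 2019-01-22T21:03:02.001
System.out.println(gmt);
System.out.println(shanghai);
```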

Re: ElasticSearch RestClient throws NoSuchMethodError due to shade mechanism

2019-01-15 Thread Rong Rong
Hi Henry, I am not sure if this is the suggested way, but from what I understand of the pom file in elasticsearch5, you are allowed to change the minor version of the org.elasticsearch.client via a manual override using -Delasticsearch.version=5.x.x during the Maven build process, if you are using a

Re: Is the eval method invoked in a thread safe manner in Fink UDF

2019-01-13 Thread Rong Rong
According to the codegen result, I think each field is invoked sequentially. However, if you maintain internal state within your UDF, it is your responsibility to maintain the internal state's consistency. Are you invoking an external RPC in your "GetName" UDF method, and does it have to be async? -- Rong

Re: Java Exapmle of Stochastic Outlier Selection

2019-01-12 Thread Rong Rong
Hi James, Usually Flink ML is highly integrated with Scala. I did poke around and try to make the example work in Java, and it does require a significant amount of effort, but you can try: First, the implicit type parameters need to be passed to the execution environment to generate the Da

Re: Questions about UDTF in flink SQL

2018-11-30 Thread Rong Rong
Hi Wangsan, If your requirement is essentially what Jark described, we already have a proposal following up [FLINK-9249] in its related/parent task: [FLINK-9484]. We are already implementing some of these internally and have one PR ready for review for FLINK-9294. Please kindly take a look and see if t

Re: Reset kafka offets to latest on restart

2018-11-21 Thread Rong Rong
Hi Vishal, You can probably try using a similar offset configuration as a service consumer would. Maybe this will be useful to look at [1]. Thanks, Rong [1] https://ci.apache.org/projects/flink/flink-docs-stable/dev/connectors/kafka.html#kafka-consumers-start-position-configuration On Wed, Nov 21, 2018
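A sketch of the start-position configuration referenced in [1], with placeholder broker/topic/group values; note these setters only apply when the job is not restored from a checkpoint/savepoint:

```java
import java.util.Properties;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

Properties props = new Properties();
props.setProperty("bootstrap.servers", "localhost:9092"); // placeholder
props.setProperty("group.id", "my-group");                // placeholder

FlinkKafkaConsumer<String> consumer =
        new FlinkKafkaConsumer<>("my-topic", new SimpleStringSchema(), props);
consumer.setStartFromLatest();           // ignore committed offsets, start from latest
// consumer.setStartFromEarliest();      // or start from the earliest record
// consumer.setStartFromGroupOffsets();  // default: resume from committed group offsets
```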

Re: [DISCUSS] Integrate Flink SQL well with Hive ecosystem

2018-10-11 Thread Rong Rong
Hi Xuefu, Thanks for putting together the overview. I would like to add some more on top of Timo's comments. 1,2. I agree with Timo that a proper catalog support should also address the metadata compatibility issues. I was actually wondering if you are referring to something like utilizing table s

Re: Does Flink SQL "in" operation has length limit?

2018-10-01 Thread Rong Rong
e corresponding code? I haven't looked into the code but we should definitely support this query. @Henry feel free to open an issue for it. Regards, Timo. On 28.09.18 at 19:14, Rong Rong wrote: Yes.

Re: Does Flink SQL "in" operation has length limit?

2018-09-28 Thread Rong Rong
Best, Hequn. On Fri, Sep 28, 2018 at 9:27 PM Rong Rong wrote: Hi Henry, Vino. I think the IN operator was translated into either a RexSubQuery or a SqlStdOperatorTable.IN operator. I think Vino was referring to the first case.

Re: Does Flink SQL "in" operation has length limit?

2018-09-28 Thread Rong Rong
Hi Henry, Vino. I think the IN operator is translated into either a RexSubQuery or a SqlStdOperatorTable.IN operator. I think Vino was referring to the first case. For the second case (I think that's what you are facing here), they are converted into tuples, and the maximum we currently have in Flink

Re: How to get the location of keytab when using flink on yarn

2018-09-24 Thread Rong Rong
Hi, just a quick thought on this: you might be able to use a delegation token to access HBase[1]. It might be a more secure way than distributing your keytab to all the YARN nodes. Hope this helps. -- Rong [1] https://wiki.apache.org/hadoop/Hbase/HBaseTokenAuthentication On Mon, Sep 24

Re: Accessing Global State when processing KeyedStreams

2018-09-19 Thread Rong Rong
Hi Scott, Your use case seems to be a perfect fit for the Broadcast state pattern [1]. -- Rong [1]: https://ci.apache.org/projects/flink/flink-docs-release-1.6/dev/stream/state/broadcast_state.html On Wed, Sep 19, 2018 at 7:11 AM Scott Sue wrote: > Hi, > > In our application, we receive Orde
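A compact sketch of the broadcast state pattern from [1], with placeholder String streams standing in for the order stream and the rule/config stream:

```java
import org.apache.flink.api.common.state.MapStateDescriptor;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.functions.KeySelector;
import org.apache.flink.streaming.api.datastream.BroadcastStream;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.functions.co.KeyedBroadcastProcessFunction;
import org.apache.flink.util.Collector;

// ruleStream and orderStream are placeholder DataStream<String> defined elsewhere.
MapStateDescriptor<String, String> rulesDescriptor =
        new MapStateDescriptor<>("rules", Types.STRING, Types.STRING);

BroadcastStream<String> broadcastRules = ruleStream.broadcast(rulesDescriptor);

DataStream<String> result = orderStream
        .keyBy(new KeySelector<String, String>() {
            @Override
            public String getKey(String value) {
                return value; // placeholder key selector
            }
        })
        .connect(broadcastRules)
        .process(new KeyedBroadcastProcessFunction<String, String, String, String>() {
            @Override
            public void processElement(String order, ReadOnlyContext ctx, Collector<String> out)
                    throws Exception {
                // Every parallel instance sees the broadcast state written below.
                String rule = ctx.getBroadcastState(rulesDescriptor).get("default");
                out.collect(order + " handled with rule " + rule);
            }

            @Override
            public void processBroadcastElement(String rule, Context ctx, Collector<String> out)
                    throws Exception {
                ctx.getBroadcastState(rulesDescriptor).put("default", rule);
            }
        });
```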

Re: Question about Window Tigger

2018-09-19 Thread Rong Rong
Hi Chang, There were some previous discussions regarding how to debug watermarks and window triggers[1]. Basically, if there's no data for some partitions, there's no way to advance the watermark, as it would not be able to determine whether this is due to a network failure or there is actually no data arriv

Re: why same Sliding ProcessTime TimeWindow triggered twice

2018-09-17 Thread Rong Rong
I haven't dug too deep into the content, but it seems like this line is the reason: .keyBy(s => s.endsWith("FRI")). Essentially you are creating two key partitions (true, false), where each of them has its own sliding window, I believe. Can you print out the key space for each of th

Re: Potential bug in Flink SQL HOP and TUMBLE operators

2018-09-17 Thread Rong Rong
This is in fact a very strange behavior. To add to the discussion, when you mentioned: "raw Flink (windowed or not) nor when using Flink CEP", how were the comparisons being done? Also, were you able to get the results correct without the additional GROUP BY term of "foo" or "userId"? -- Rong On

Re: ListState - elements order

2018-09-13 Thread Rong Rong
I don't think ordering is guaranteed in the internal implementation, to the best of my knowledge. I agree with Aljoscha: if there is no clear definition of ordering, it is assumed not to be preserved by default. -- Rong On Thu, Sep 13, 2018 at 7:30 PM vino yang wrote: > Hi Aljoscha, > > Regard
