Re: Concatenating a bounded and unbounded stream

2022-10-26 Thread Jin Yi
would using a hybrid source work for you if it's the same type between the sources? https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/datastream/hybridsource/ On Wed, Oct 26, 2022 at 8:05 AM Noel OConnor wrote: > Hi, > I have a need to create a new stream from a bounded and unbounded…
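For readers landing on this thread: a minimal sketch of what the suggested HybridSource wiring might look like, assuming Flink 1.15+, a bounded file source and a live Kafka topic of the same String type (paths, brokers, and topic names are placeholders):

```java
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.base.source.hybrid.HybridSource;
import org.apache.flink.connector.file.src.FileSource;
import org.apache.flink.connector.file.src.reader.TextLineInputFormat;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class HybridExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Bounded part: read historical records from files.
        FileSource<String> bounded = FileSource
                .forRecordStreamFormat(new TextLineInputFormat(), new Path("s3://bucket/history/"))
                .build();

        // Unbounded part: switch over to Kafka once the files are exhausted.
        KafkaSource<String> unbounded = KafkaSource.<String>builder()
                .setBootstrapServers("broker:9092")
                .setTopics("events")
                .setValueOnlyDeserializer(new SimpleStringSchema())
                .build();

        HybridSource<String> hybrid = HybridSource.builder(bounded)
                .addSource(unbounded)
                .build();

        env.fromSource(hybrid, WatermarkStrategy.noWatermarks(), "bounded-then-kafka").print();
        env.execute();
    }
}
```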

Re: Write to Aliyun OSS via FileSystem connector hang Job Master on Finishing

2022-04-25 Thread Yi Tang
https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-files/src/main/java/org/apache/flink/connector/file/sink/FileSink.java > Best, > Guowei > On Sun, Apr 24, 2022 at 11:39 AM Yi Tang wrote: >> -- Forwarded message - >> From…

Fwd: Write to Aliyun OSS via FileSystem connector hang Job Master on Finishing

2022-04-23 Thread Yi Tang
-- Forwarded message - From: Yi Tang Date: Sun, Apr 24, 2022 at 11:29 AM Subject: Write to Aliyun OSS via FileSystem connector hang Job Master on Finishing To: Hi team; I'm trying to write to Aliyun OSS via the FileSystem connector. The job master always hangs on finishing.

Re: Weird Flink Kafka source watermark behavior

2022-04-13 Thread Jin Yi
> Qingsheng > On Apr 13, 2022, at 17:52, Qingsheng Ren wrote: > Hi Jin, > Unfortunately I don't have any quick bypass in mind except increasing the tolerance of out-of-orderness. > Best regards, > Qingsheng…
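The workaround mentioned in the reply (widening the out-of-orderness tolerance) would look roughly like this; the five-minute bound and the Event type are illustrative, not from the thread:

```java
import java.time.Duration;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;

public class Tolerance {
    // Stand-in for the user's record type.
    public static class Event { public long timestamp; }

    public static WatermarkStrategy<Event> strategy() {
        return WatermarkStrategy
                .<Event>forBoundedOutOfOrderness(Duration.ofMinutes(5)) // widened tolerance
                .withTimestampAssigner((event, recordTs) -> event.timestamp);
    }
}
```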

Re: Weird Flink Kafka source watermark behavior

2022-04-08 Thread Jin Yi
confirmed that moving back to FlinkKafkaConsumer fixes things. is there some notification channel/medium that highlights critical bugs/issues on the intended features like this pretty readily? On Fri, Apr 8, 2022 at 2:18 AM Jin Yi wrote: > based on symptoms/observations on the first operator…

Re: Weird Flink Kafka source watermark behavior

2022-04-08 Thread Jin Yi
…if the first split pushes the watermark far away, then records from other splits will be treated as late events. > [1] https://issues.apache.org/jira/browse/FLINK-26018 > Best regards, > Qingsheng > On Apr 8, 2022, at 15:54, Jin Yi wrote: …

Re: flink docker image (1.14.4) unable to access other pods from flink program (job and task manager access is fine)

2022-03-30 Thread Jin Yi
…thanks for trying to help, 胡伟华. On Tue, Mar 29, 2022 at 7:45 AM 胡伟华 wrote: > Are you referring to creating a Flink cluster on Kubernetes by yaml file? > How did you submit the job to the Flink cluster? Not via the command line (flink run xxx)? > On Mar 29, 2022, at 10:38 PM, Jin…

Re: flink docker image (1.14.4) unable to access other pods from flink program (job and task manager access is fine)

2022-03-29 Thread Jin Yi
On Mar 29, 2022, at 10:32 PM, Jin Yi wrote: > it's running in k8s. we're not running in app mode b/c we have many jobs running in the same flink cluster. > On Tue, Mar 29, 2022 at 4:29 AM huweihua wrote: >> Hi, Jin >> Can you provide more information…

Re: flink docker image (1.14.4) unable to access other pods from flink program (job and task manager access is fine)

2022-03-29 Thread Jin Yi
…YARN or standalone mode? > Maybe you can use application mode to keep the environment (network accessibility) the same. Application mode runs the user main method in the JobManager. > [1] > https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/over…

flink docker image (1.14.4) unable to access other pods from flink program (job and task manager access is fine)

2022-03-28 Thread Jin Yi
i have a flink job that uses redis as a sink. i optionally do some wiping and metadata writing from the job submitting flink program before it actually executes/submits the job to the job manager. when i don't do this redis preparation, the redis sink works completely fine. that is, the redis co

streaming mode with both finite and infinite input sources

2022-02-24 Thread Jin Yi
so we have a streaming job where the main work to be done is processing infinite kafka sources. recently, i added a fromCollection (finite) source to simply write some state once upon startup. this all seems to work fine. the finite source operators all finish, while all the infinite source operators…
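One common shape for mixing a finite bootstrap source with infinite ones, sketched under the assumption that both sides share a type (a socket source stands in for Kafka so the sketch stays self-contained):

```java
import java.util.Arrays;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class MixedSources {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Finite source: its operators reach FINISHED after emitting all elements.
        DataStream<String> bootstrap = env.fromCollection(Arrays.asList("seed-1", "seed-2"));

        // Infinite source: keeps RUNNING. A Kafka source would sit here in practice.
        DataStream<String> live = env.socketTextStream("localhost", 9999);

        bootstrap.union(live).print();
        env.execute();
    }
}
```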

Re: Loading a Hive lookup table data into TM memory takes so long.

2022-02-07 Thread Jason Yi
…is not only slow but also very memory-consuming. > Have you tried joining your main stream with the hive table directly (that is, using streaming joins instead of lookup joins)? Does that meet your need or why do you have to use lookup joins? > Jason Yi <93t...@gmail.…

Loading a Hive lookup table data into TM memory takes so long.

2022-02-04 Thread Jason Yi
Hello, I created external tables on Hive with data in s3 and wanted to use those tables as lookup tables in Flink. When I used an external table containing a small amount of data as a lookup table, Flink quickly loaded the data into TM memory and did a temporal join to an event stream. But, when I…
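For context, the lookup-join shape the thread is discussing might look like this in Flink SQL; table and column names here are hypothetical, and `orders` is assumed to carry a processing-time attribute:

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class LookupJoinSketch {
    public static void main(String[] args) {
        TableEnvironment tEnv = TableEnvironment.create(EnvironmentSettings.inStreamingMode());

        // `orders` and `hive_dim` are assumed to be registered already.
        tEnv.executeSql(
            "SELECT o.order_id, d.name " +
            "FROM orders AS o " +
            "JOIN hive_dim FOR SYSTEM_TIME AS OF o.proc_time AS d " +
            "ON o.dim_id = d.id").print();
    }
}
```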

Re: Question - Filesystem connector for lookup table

2022-01-20 Thread Jason Yi
wrote: > Hi Jason, > It's not (properly) supported and we should update the documentation. > There is no out-of-the-box possibility to use a file from the filesystem as a lookup table as far as I know. > Best regards, > Martijn > On Thu, Jan 20, 20…

Question - Filesystem connector for lookup table

2022-01-20 Thread Jason Yi
Hello, I have data sets in s3 and want to use them as lookup tables in Flink. I defined tables with the filesystem connector and joined the tables to a table defined with the Kinesis connector in my Flink application. I expected its output to be written to s3, but no data was written to a sink t…

class Foo extends TupleN {}

2022-01-12 Thread Jin Yi
probably a dumb question, but will the serde performance in flink for the class Foo (from the subject) behave like a POJO or a TupleN?
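One empirical way to answer this yourself: ask the type extractor what it infers for the class. A self-contained sketch (Foo's type parameters are assumptions for illustration):

```java
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.java.tuple.Tuple2;

public class TupleSubclassCheck {
    // The Foo from the subject line, with concrete type parameters for the sketch.
    public static class Foo extends Tuple2<String, Long> {}

    public static void main(String[] args) {
        TypeInformation<Foo> ti = TypeInformation.of(Foo.class);
        // A tuple-style TypeInformation means the Tuple serializer is used;
        // a GenericTypeInfo would mean a Kryo fallback instead.
        System.out.println(ti.getClass().getName() + " -> " + ti);
    }
}
```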

Re: REST API for detached minicluster based integration test

2021-12-01 Thread Jin Yi
…to fetch watermarks from the metrics. If processWatermark is never called then it means no watermark ever comes and you might want to check your watermark strategy implementation. > On Wed, Dec 1, 2021 at 4:14 AM, Jin Yi wrote: >> thanks for the reply caizhi! >> we're…

Re: REST API for detached minicluster based integration test

2021-11-30 Thread Jin Yi
> I'm also curious about why you need to extract the output watermarks just for stopping the source. You can control the records and the watermark strategy from the source. From my point of view, constructing some test data with some specific row time would be enough.

Re: REST API for detached minicluster based integration test

2021-11-29 Thread Jin Yi
bump. a more general question is what do people do for more end-to-end, full integration tests to test event-time based jobs with timers? On Tue, Nov 23, 2021 at 11:26 AM Jin Yi wrote: > i am writing an integration test where i execute a streaming flink job using faked, "unbounded"…

REST API for detached minicluster based integration test

2021-11-23 Thread Jin Yi
i am writing an integration test where i execute a streaming flink job using faked, "unbounded" input where i want to control when the source function(s) complete by triggering them once the job's operator's maximum output watermarks are beyond some job completion watermark that's relative to the m

Re: redis sink from flink

2021-08-17 Thread Jin Yi
>> …Regarding Redis clients, Jedis (https://github.com/redis/jedis) is quite popular and simple to get started. >> Let me know if you love to learn more details about our implementation. >> Best, >> Yik San >> On Tue,…

redis sink from flink

2021-08-16 Thread Jin Yi
is apache bahir still a thing? it hasn't been touched for months (since redis 2.8.5). as such, looking at the current flink connector docs, it's no longer pointing to anything from the bahir project. looking around in either the flink or bahir newsgroups doesn't turn up anything regarding bahir's…
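Given the stalled bahir connector described above, a minimal hand-rolled sink along the lines later discussed in this thread, assuming the Jedis client (host/port are placeholders, and connection tuning is omitted):

```java
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;
import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisPool;

public class RedisSink extends RichSinkFunction<Tuple2<String, String>> {
    private transient JedisPool pool;

    @Override
    public void open(Configuration parameters) {
        pool = new JedisPool("localhost", 6379); // one pool per task
    }

    @Override
    public void invoke(Tuple2<String, String> value, Context context) {
        try (Jedis jedis = pool.getResource()) {
            jedis.set(value.f0, value.f1); // key -> value write
        }
    }

    @Override
    public void close() {
        if (pool != null) {
            pool.close();
        }
    }
}
```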

is there a way to get the watermark per operator in tabular form?

2021-06-15 Thread Jin Yi
in the flink ui, is there a way to update the columns being shown to include the watermarks? in lieu of this, is it possible to query the watermarks throughout a flink job somehow? the rest api? thanks.
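On the REST-API part of the question: Flink exposes per-vertex watermarks at `/jobs/:jobid/vertices/:vertexid/watermarks`. A plain-Java fetch sketch (host and ids are placeholders you'd look up via `GET /jobs` first):

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class WatermarkFetch {
    public static void main(String[] args) throws Exception {
        String jobId = args[0];    // e.g. from GET /jobs
        String vertexId = args[1]; // e.g. from GET /jobs/<jobId>
        HttpRequest request = HttpRequest.newBuilder(URI.create(
                "http://localhost:8081/jobs/" + jobId + "/vertices/" + vertexId + "/watermarks"))
                .build();
        String body = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString())
                .body();
        System.out.println(body); // JSON array of per-subtask watermarks
    }
}
```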

should i expect POJO serialization warnings when dealing w/ kryo protobuf serialization?

2021-06-12 Thread Jin Yi
i'm currently using protobufs, and registering the serializers using kryo protobuf using the following snippet of code: static void optionalRegisterProtobufSerializer(ExecutionConfig config, Class clazz) { if (clazz != null) { config.registerTypeWithKryoSerializer(clazz,
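The snippet above is cut off by the archive; a completed version, assuming the commonly used chill-protobuf serializer (an assumption — the message does not name the serializer class), might look like:

```java
import com.google.protobuf.Message;
import com.twitter.chill.protobuf.ProtobufSerializer;
import org.apache.flink.api.common.ExecutionConfig;

public class ProtoRegistration {
    // Registers the chill-protobuf Kryo serializer for a protobuf class, if present.
    static void optionalRegisterProtobufSerializer(ExecutionConfig config,
                                                   Class<? extends Message> clazz) {
        if (clazz != null) {
            config.registerTypeWithKryoSerializer(clazz, ProtobufSerializer.class);
        }
    }
}
```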

Re: behavior differences when sinking side outputs to a custom KeyedCoProcessFunction

2021-06-03 Thread Jin Yi
…fixing the local watermark generator used in the strategy to account for this properly fixed all of my issues. On Fri, May 21, 2021 at 10:09 AM Jin Yi wrote: > (sorry that the last sentence fragment made it into my email... it was a draft comment that i forgot to remove. my t…

Re: behavior differences when sinking side outputs to a custom KeyedCoProcessFunction

2021-05-21 Thread Jin Yi
…the incorrect behavior? should i file an issue/bug? On Thu, May 20, 2021 at 3:39 PM Jin Yi wrote: > hello, > sorry for a long post, but this is a puzzling problem and i am enough of a flink non-expert to be unsure what details are important or not. > background: >

behavior differences when sinking side outputs to a custom KeyedCoProcessFunction

2021-05-20 Thread Jin Yi
hello, sorry for a long post, but this is a puzzling problem and i am enough of a flink non-expert to be unsure what details are important or not. background: i have a flink pipeline that is a series of custom "joins" for the purposes of user event "flattening" that i wrote a custom KeyedCoProcessFunction…
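For readers of this thread, the side-output mechanics in question boil down to an OutputTag wired through a KeyedCoProcessFunction; a minimal sketch with String types standing in for the real event types:

```java
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.functions.co.KeyedCoProcessFunction;
import org.apache.flink.util.Collector;
import org.apache.flink.util.OutputTag;

public class SideOutputJoin {
    // Anonymous subclass so the generic type survives erasure.
    static final OutputTag<String> UNMATCHED = new OutputTag<String>("unmatched") {};

    static class Join extends KeyedCoProcessFunction<String, String, String, String> {
        @Override
        public void processElement1(String left, Context ctx, Collector<String> out) {
            ctx.output(UNMATCHED, left); // route to the side output
        }
        @Override
        public void processElement2(String right, Context ctx, Collector<String> out) {
            out.collect(right); // main output
        }
    }

    static DataStream<String> unmatched(SingleOutputStreamOperator<String> joined) {
        return joined.getSideOutput(UNMATCHED);
    }
}
```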

Re: two questions about flink stream processing: kafka sources and TimerService

2021-05-19 Thread Jin Yi
…could be to implement this yourself using a MapState. Otherwise I think you need to implement your own operator from which you can then access InternalTimerService, similar to how KeyedCoProcessOperator does it. > Regards > Ingo > On Wed, May 12, 2021 a…

Re: two questions about flink stream processing: kafka sources and TimerService

2021-05-17 Thread Jin Yi
ping? On Tue, May 11, 2021 at 11:31 PM Jin Yi wrote: > hello. thanks ahead of time for anyone who answers. > 1. verifying my understanding: for a kafka source that's partitioned on the same piece of data that is later used in a keyBy, if we are relying on the kafka…

two questions about flink stream processing: kafka sources and TimerService

2021-05-11 Thread Jin Yi
hello. thanks ahead of time for anyone who answers. 1. verifying my understanding: for a kafka source that's partitioned on the same piece of data that is later used in a keyBy, if we are relying on the kafka timestamp as the event timestamp, is it guaranteed that the event stream of the source

flink cep and out of order events

2021-04-24 Thread Jin Yi
does the default within behavior of flink cep handle out of order events (relative to event time)? obviously, it'd be best if the event time was guaranteed correct, but sometimes it's too difficult to do. do people end up writing different patterns with some event orderings reversed to capture th

WindowFunction is stuck until next message is processed although Watermark with idle timeout is applied.

2021-04-14 Thread Sung Gon Yi
Hello, I have a question about watermark with idle timeout. I made an example about it, https://github.com/skonmeme/rare_stream/blob/main/src/main/scala/com/skonuniverse/flink/RareStreamWithIdealTimeout.scala
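For reference, the idle-timeout piece of the linked example looks like this on recent Flink versions (the Java rendering and the Event type are mine, not from the Scala example). Note that withIdleness only stops an idle stream from holding back other streams' watermarks; it does not advance the idle stream's own watermark, which is consistent with a window staying stuck until the next record arrives:

```java
import java.time.Duration;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;

public class IdleTimeout {
    public static class Event { public long timestamp; }

    public static WatermarkStrategy<Event> strategy() {
        return WatermarkStrategy
                .<Event>forBoundedOutOfOrderness(Duration.ofSeconds(5))
                .withIdleness(Duration.ofSeconds(30)) // mark stream idle after 30s of silence
                .withTimestampAssigner((e, ts) -> e.timestamp);
    }
}
```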

Re: Handle late message with flink SQL

2021-03-17 Thread Yi Tang
Thanks Timo. The whole idea is also based on the side output and output tag. Let me explain it in detail: 1. Introduce a VirtualTableScan (or SideOutputTableScan), which can be optimized into a physical RelNode. Then we can create a source catalog table which will be converted to a VirtualTableScan, a…

parquet protobuf output and aws athena support

2021-03-15 Thread Jin Yi
using ParquetProtoWriters, does anyone have this working with aws athena ingestion via aws glue crawls? the parquet files being generated by our flink job look fine
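For readers new to the thread, the writer setup being discussed might look like this (the bucket path is a placeholder, and T is any generated protobuf message class; the Athena/Glue side is outside this sketch):

```java
import com.google.protobuf.Message;
import org.apache.flink.connector.file.sink.FileSink;
import org.apache.flink.core.fs.Path;
import org.apache.flink.formats.parquet.protobuf.ParquetProtoWriters;
import org.apache.flink.streaming.api.datastream.DataStream;

public class ParquetOut {
    static <T extends Message> void sink(DataStream<T> stream, Class<T> clazz) {
        FileSink<T> sink = FileSink
                .forBulkFormat(new Path("s3://bucket/out/"), ParquetProtoWriters.forType(clazz))
                .build();
        stream.sinkTo(sink); // rolls files on checkpoints
    }
}
```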

Handle late message with flink SQL

2021-03-15 Thread Yi Tang
We can get a stream from the DataStream API by SideOutput. But it's hard to do the same thing with Flink SQL. I have an idea about how to get the late records while using Flink SQL. Assuming we have a source table for the late records, then we can query late records on it. Obviously, it's not a real…
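In the DataStream API, the late-record hook this thread is reaching for already exists; a sketch (types, window size, and the aggregate are illustrative):

```java
import org.apache.flink.api.common.functions.AggregateFunction;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.util.OutputTag;

public class LateRecords {
    static final OutputTag<String> LATE = new OutputTag<String>("late") {};

    static class CountAgg implements AggregateFunction<String, Long, Long> {
        public Long createAccumulator() { return 0L; }
        public Long add(String value, Long acc) { return acc + 1; }
        public Long getResult(Long acc) { return acc; }
        public Long merge(Long a, Long b) { return a + b; }
    }

    static DataStream<String> late(DataStream<String> events) {
        SingleOutputStreamOperator<Long> counts = events
                .keyBy(s -> s)
                .window(TumblingEventTimeWindows.of(Time.minutes(1)))
                .sideOutputLateData(LATE)  // late records go here instead of being dropped
                .aggregate(new CountAgg());
        return counts.getSideOutput(LATE);
    }
}
```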

mixing java libraries between 1.12.x and 1.11.x

2021-03-10 Thread Jin Yi
(also updated https://issues.apache.org/jira/browse/FLINK-19955 w/ this question) i'm in the situation where i want to use ParquetProtoWriters found in flink-parquet 1.12.x. the rest of our codebase, anticipating possibly running on the fully-managed aws flink solution for production, is depending…

Re: Changing the topology while upgrading a job submitted by SQL

2020-11-10 Thread Yi Tang
…operators or better optimization rules that create a smarter pipeline could change the entire topology with every major Flink version upgrade. > Regards, > Timo > On 10.11.20 10:15, Yi Tang wrote: > Hi folks, > A question about changing the topology…

Changing the topology while upgrading a job submitted by SQL

2020-11-10 Thread Yi Tang
Hi folks, A question about changing the topology while upgrading a job submitted by SQL. Is it possible for now? Looks like if we want to recover a job from a savepoint, it requires the uid of the operator matches the corresponding one in the state. The automatically generated uid depends largely
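The uid pinning this message refers to is a one-liner in the DataStream API (SQL offers no direct equivalent, which is the crux of the thread); a sketch with an illustrative map function:

```java
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.streaming.api.datastream.DataStream;

public class StableUids {
    static DataStream<String> withUid(DataStream<String> stream) {
        return stream
                .map(new MapFunction<String, String>() {
                    @Override public String map(String value) { return value.toUpperCase(); }
                })
                .uid("to-upper")   // stable operator id used for savepoint state mapping
                .name("to-upper"); // display name only
    }
}
```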

Re: Unify error handler and late window record output for SQL api

2020-10-29 Thread Yi Tang
Hi Yun, Thanks for your quick reply. To be clear, it's not essential to implement these features in the SQL statement. And precisely because of the limitations of SQL, we want these features to happen. 1. Yeah, I think the stream api also has no similar api. We want it because sometimes we want to…

Unify error handler and late window record output for SQL api

2020-10-29 Thread Yi Tang
Hi, I'm looking for a way to handle potential errors in a job submitted with the SQL API, but unfortunately found nothing. Handling errors manually in the SQL API is hard, I think. Is there a way to handle all errors and send them to a SideOutput to avoid task failure? Also one can put late records into a SideOutput…

UnsupportedOperatorException with TensorFlow on checkpointing

2020-07-16 Thread Sung Gon Yi
Hi, The following code throws an UnsupportedOperatorException on checkpointing (every time). Could you suggest any solution? Example code: A.java -- public class A extends RichWindowFunction { private transient MapState state; @Override…

Convert sql table with field of type MULITSET to datastream with field of type java.util.Map[T, java.lang.Integer]

2020-06-28 Thread YI
…Process finished with exit code 1 ``` The result type of the aggregation function collect is MULTISET. How do I convert it to a `java.util.Map[String, java.lang.Integer]`? Cheers, YI

Join tables created from Datastream whose element scala type has field Option[_]

2020-06-18 Thread YI
…use Option to indicate some value is missing, just like null in a database and hopefully without NPE. How should I define my data types? And which configuration should I take special care of? Bests, Yi

Re: Convert flink table with field of type RAW to datastream

2020-06-17 Thread YI
…all the intermediate data types. Best, Yi ‐‐‐ Original Message ‐‐‐ On Thursday, June 18, 2020 12:11 PM, Jark Wu wrote: > Hi YI, > Flink doesn't have a TypeInformation for `java.util.Date`, but only SqlTimeTypeInfo.DATE for `java.sql.Date`. > That's why the Typ…

Convert flink table with field of type RAW to datastream

2020-06-17 Thread YI
http://mail-archives.apache.org/mod_mbox/flink-user/201907.mbox/%3CCA+3UsY2-L1OKTjNBwX2ajG3o6v5M6QS=jbwyybemzlvdm5x...@mail.gmail.com%3E Unfortunately, I didn't find a satisfactory solution. Cheers, Yi

[Question] enable end2end Kafka exactly once processing

2020-03-01 Thread Jin Yi
Hi experts, My application is using Apache Beam and with Flink to be the runner. My source and sink are kafka topics, and I am using KafkaIO connector provided by Apache Beam to consume and publish. I am reading through Beam's java doc: https://beam.apache.org/releases/javadoc/2.16.0/org/apache/b
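For later readers: Beam's KafkaIO does expose an exactly-once sink mode via withEOS; a hedged sketch (topic and broker names are placeholders, and EOS additionally requires a supporting runner with Flink checkpointing enabled):

```java
import org.apache.beam.sdk.io.kafka.KafkaIO;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
import org.apache.kafka.common.serialization.LongSerializer;
import org.apache.kafka.common.serialization.StringSerializer;

public class EosSink {
    static void write(PCollection<KV<Long, String>> records) {
        records.apply(KafkaIO.<Long, String>write()
                .withBootstrapServers("broker:9092")
                .withTopic("output-topic")
                .withKeySerializer(LongSerializer.class)
                .withValueSerializer(StringSerializer.class)
                .withEOS(1, "eos-sink-group")); // exactly-once sink, sharded writer group
    }
}
```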

Re: Apache Beam Side input vs Flink Broadcast Stream

2020-02-28 Thread Jin Yi
…On Thu, Feb 27, 2020 at 6:46 AM Jin Yi wrote: >> Hi All, >> there is a recently published article on the flink official website for running beam on top of flink >> https://flink.apache.org/ecosystem/2020/02/22/apache-beam-how-beam-runs-on-top-of…

Re: Flink on Kubernetes - Session vs Job cluster mode and storage

2020-02-26 Thread Jin Yi
Hi Yang, regarding your statement below: Since you are starting JM/TM with a K8s deployment, when they fail a new JM/TM will be created. If you do not set the high availability configuration, your jobs could recover when a TM failed. However, they could not recover when the JM failed. With HA configured,…

Apache Beam Side input vs Flink Broadcast Stream

2020-02-26 Thread Jin Yi
Hi All, there is a recently published article on the flink official website for running beam on top of flink https://flink.apache.org/ecosystem/2020/02/22/apache-beam-how-beam-runs-on-top-of-flink.html In the article: - You get additional features like side inputs and cross-language pipeline…

Re: How does Flink manage the kafka offset

2020-02-20 Thread Jin Yi
…this consumer group has joined, and it will rebalance the partition consumption if needed. > No. Flink does not rebalance the partitions when new task managers join the cluster. It only does so when the job restarts and the job parallelism changes. > Hope it helps. >

How does Flink manage the kafka offset

2020-02-20 Thread Jin Yi
Hi there, We are running an apache beam application with flink being the runner. We use the KafkaIO connector to read from topics: https://beam.apache.org/releases/javadoc/2.19.0/ and we have the following configuration, which enables auto commit of offsets, no checkpointing is enabled, and it is pe…

Re: Parallelize Kafka Deserialization of a single partition?

2020-02-18 Thread Jin Yi
Hi Till, I just read your comment: Currently, enabling object reuse via ExecutionConfig.enableObjectReuse() only affects the DataSet API. DataStream programs will always do defensive copies. There is a FLIP to improve this behaviour [1]. I have an application that is written in apache beam, but th
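The switch being quoted, for reference (per the comment above, at the time it only took effect for the DataSet API):

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class ObjectReuse {
    public static void main(String[] args) {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.getConfig().enableObjectReuse(); // skip defensive copies where supported
    }
}
```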

[Question] how to have additional logging in Apache Beam KafkaWriter

2020-02-18 Thread Jin Yi
Hi there, I am using Apache Beam (v2.16) in my application, and the Runner is Flink(1.8). I use KafkaIO connector to consume from source topics and publish to sink topics. Here is the class that Apache Beam provides for publishing messages. https://github.com/apache/beam/blob/master/sdks/java/io/

Re: [State Processor API] how to convert savepoint back to broadcast state

2020-01-27 Thread Jin Yi
Hi Yun, Thanks for the suggestion! Best Eleanore On Mon, Jan 27, 2020 at 1:54 AM Yun Tang wrote: > Hi Yi > Glad to know you have already resolved it. The State Processor API would use the data stream API instead of the data set API in the future [1]. > Besides, you could also follow…

Re: [State Processor API] how to convert savepoint back to broadcast state

2020-01-26 Thread Jin Yi
https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/state/state.html#checkpointedfunction Thanks a lot for your help! Eleanore On Sun, Jan 26, 2020 at 6:53 PM Jin Yi wrote: > Hi Yun, > Thanks for the response, I have checked the official document, and I have referred to thi…

Re: [State Processor API] how to convert savepoint back to broadcast state

2020-01-26 Thread Jin Yi
…broadcast state in the BroadcastProcessFunction. Thanks a lot! Eleanore On Sun, Jan 26, 2020 at 8:53 AM Yun Tang wrote: > Hi Yi > Does the official doc on writing broadcast state [1] satisfy your request? > [1] > https://ci.apache.org/projects/flink/flink…

[State Processor API] how to convert savepoint back to broadcast state

2020-01-22 Thread Jin Yi
Hi there, I would like to read the savepoints (for broadcast state) back into the broadcast state, how should I do it? // load the existingSavepoint; ExistingSavepoint existingSavepoint = Savepoint.load(environment, "file:///tmp/new_savepoints", new MemoryStateBackend()); // read state from existing…
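Continuing the snippet in the message, reading broadcast state back might look like this (the uid, state name, and key/value types are placeholders that must match the operator that owns the state):

```java
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.runtime.state.memory.MemoryStateBackend;
import org.apache.flink.state.api.ExistingSavepoint;
import org.apache.flink.state.api.Savepoint;

public class ReadBroadcast {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        ExistingSavepoint savepoint = Savepoint.load(
                env, "file:///tmp/new_savepoints", new MemoryStateBackend());

        DataSet<Tuple2<String, String>> state = savepoint.readBroadcastState(
                "broadcast-op-uid", "broadcast-state-name", Types.STRING, Types.STRING);
        state.print();
    }
}
```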

Re: Question regarding checkpoint/savepoint and State Processor API

2020-01-21 Thread Jin Yi
t and stop of your job. It is just to bootstrap the > initial state of your application. After that, you will use savepoints to > carry over the current state of your applications between runs. > > > > On Mon, Jan 20, 2020 at 6:07 PM Jin Yi wrote: > >> Hi there, >> &g

Question regarding checkpoint/savepoint and State Processor API

2020-01-20 Thread Jin Yi
Hi there, 1. in my job, I have a broadcast stream; initially there is no savepoint that can be used as bootstrap values for the broadcast stream states. BootstrapTransformation transform = OperatorTransformation.bootstrapWith(dataSet).transform(bootstrapFunction); Savepoint.create(new MemoryStateBac…
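A completed version of the bootstrap snippet, under the assumption that bootstrapFunction is a BroadcastStateBootstrapFunction (the descriptor, uid, path, and max parallelism are placeholders):

```java
import org.apache.flink.api.common.state.MapStateDescriptor;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.runtime.state.memory.MemoryStateBackend;
import org.apache.flink.state.api.BootstrapTransformation;
import org.apache.flink.state.api.OperatorTransformation;
import org.apache.flink.state.api.Savepoint;
import org.apache.flink.state.api.functions.BroadcastStateBootstrapFunction;

public class BootstrapBroadcast {
    static final MapStateDescriptor<String, String> DESC =
            new MapStateDescriptor<>("rules", Types.STRING, Types.STRING);

    static class Fn extends BroadcastStateBootstrapFunction<String> {
        @Override
        public void processElement(String value, Context ctx) throws Exception {
            ctx.getBroadcastState(DESC).put(value, value);
        }
    }

    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        DataSet<String> dataSet = env.fromElements("a", "b");

        BootstrapTransformation<String> transform = OperatorTransformation
                .bootstrapWith(dataSet)
                .transform(new Fn());

        Savepoint.create(new MemoryStateBackend(), 128)
                .withOperator("broadcast-op-uid", transform)
                .write("file:///tmp/new_savepoints");
        env.execute("bootstrap");
    }
}
```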

Re: Filter with large key set

2020-01-20 Thread Jin Yi
…TypeInformation, serializer, and comparator yourself. The Either classes should give you good guidance for that. 2) have different operators and flows for each basic data type. This will fan out your job, but should be the easier approach. > Best, Fabian >…

Filter with large key set

2020-01-15 Thread Jin Yi
Hi there, I have the following usecase: a key set, say [A,B,C,…], with around 10M entries; the type of the entries can be one of the types in BasicTypeInfo, e.g. String, Long, Integer etc., and each message looks like below: message: { header: A body: {} } I would like to use Flink to filter…
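One of the simpler shapes this can take, if the ~10M keys fit in each task's heap (loadKeys() is a hypothetical stand-in for wherever the set actually comes from, and String stands in for the message header type):

```java
import java.util.HashSet;
import java.util.Set;
import org.apache.flink.api.common.functions.RichFilterFunction;
import org.apache.flink.configuration.Configuration;

public class KeySetFilter extends RichFilterFunction<String> {
    private transient Set<String> keys;

    @Override
    public void open(Configuration parameters) {
        keys = loadKeys(); // hypothetical: load the key set once per task
    }

    @Override
    public boolean filter(String header) {
        return keys.contains(header);
    }

    private static Set<String> loadKeys() {
        return new HashSet<>(); // placeholder; real source of keys goes here
    }
}
```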

Re: Maximal watermark when two streams are connected

2019-08-22 Thread Sung Gon Yi
…> One way to solve this problem is to call ".assignTimestampsAndWatermarks()" before the condition to make sure there are messages. > Best, > Jark > On Thu, 22 Aug 2019 at 13:52, Sung Gon Yi <mailto:skonmem...@mac.com> wrote: …

Maximal watermark when two streams are connected

2019-08-21 Thread Sung Gon Yi
Hello, Originally, the watermark of a connected stream is set to the minimum of the watermarks of the two streams when they are connected. I wrote code to connect two streams, but one of the streams does not have any messages due to a condition. In this situation, the watermark is never increased and processing is stuck.
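Jark's suggestion in the reply above — assign watermarks before the condition that may empty the stream — might look like this on recent Flink versions (the Event type and the filter predicate are illustrative):

```java
import java.time.Duration;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.streaming.api.datastream.DataStream;

public class AssignBeforeFilter {
    public static class Event { public long timestamp; }

    static DataStream<Event> prepare(DataStream<Event> raw) {
        return raw
                // Assign watermarks BEFORE the filter, so the watermark keeps
                // advancing downstream even when the filter lets nothing through.
                .assignTimestampsAndWatermarks(
                        WatermarkStrategy.<Event>forBoundedOutOfOrderness(Duration.ofSeconds(5))
                                .withTimestampAssigner((e, ts) -> e.timestamp))
                .filter(e -> e.timestamp % 2 == 0); // stand-in for the emptying condition
    }
}
```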

Parallelism issue

2019-07-19 Thread Sung Gon Yi
Hello. I wrote the code below. It behaves unexpectedly: processing only happens after the SourceFunction generates all data and quits. If the SourceFunction works infinitely, processing is never performed. But it works well when parallelismForTimestamp is given another value (e.g. 3). I want to know the mecha…

Re: Checkpointing & File stream with

2019-06-18 Thread Sung Gon Yi
https://ci.apache.org/projects/flink/flink-docs-release-1.8/dev/datastream_api.html#data-sources > Best > Yun Tang > From: Sung Gon Yi > Sent: Tuesday, June 18, 2019 14:13 > To: user@flink.apache.org > Subject: Checkpointing & File stream with > Hello, > I work on joining two…

Checkpointing & File stream with

2019-06-17 Thread Sung Gon Yi
Hello, I work on joining two streams, one from Kafka and another from a file (small size). Stream processing works well, but checkpointing fails with the following message. The file has fewer than 100 lines, and the pipeline related to file reading finishes with "FINISHED" as soon as…

Re: POJO with private fields and toAppendStream of StreamTableEnvironment

2019-04-29 Thread Sung Gon Yi
Regards, > Timo > On 29.04.19 at 15:44, Sung Gon Yi wrote: >> In >> https://ci.apache.org/projects/flink/flink-docs-release-1.8/dev/table/common.html#convert-a-table-into-a-datastream-or-dataset

Re: POJO with private fields and toAppendStream of StreamTableEnvironment

2019-04-29 Thread Sung Gon Yi
…reflection. > Regards, > Timo > On 29.04.19 at 15:44, Sung Gon Yi wrote: >> In >> https://ci.apache.org/projects/flink/flink-docs-release-1.8/dev/table/common.html#convert-a-table-into-a-datastream-or-dataset

POJO with private fields and toAppendStream of StreamTableEnvironment

2019-04-29 Thread Sung Gon Yi
In https://ci.apache.org/projects/flink/flink-docs-release-1.8/dev/table/common.html#convert-a-table-into-a-datastream-or-dataset , the POJO data type is available to convert…
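For reference, the shape Flink's POJO analysis expects when fields are private (class and field names here are illustrative):

```java
// Flink POJO rules: public class, public no-arg constructor, and private
// fields exposed through conventional getters/setters.
public class Person {
    private String name;
    private int age;

    public Person() {}

    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
    public int getAge() { return age; }
    public void setAge(int age) { this.age = age; }
}
```

With that shape in place, tableEnv.toAppendStream(table, Person.class) can map columns to fields by name.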

Re: entrypoint for executing job in task manager

2018-05-18 Thread Jimmy (Yi Pin) Cao
Well this thread was super helpful. I'm also looking for ways to integrate DI. In particular, something like JVM init hooks for TM and JM would be nice. On Wed, Mar 21, 2018 at 5:08 PM, Steven Wu wrote: > Thanks, let me clarify the requirement. Sorry that it wasn't clear in the original email…