Re: Concatenating a bounded and unbounded stream

2022-10-26 Thread Jin Yi
would using a hybrid source work for you if both sources produce the same type? https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/datastream/hybridsource/ On Wed, Oct 26, 2022 at 8:05 AM Noel OConnor wrote: > Hi, > I have a need to create a new stream from a bounded and un…
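For reference, a minimal sketch of that setup, assuming Flink 1.15+ (for TextLineInputFormat), a file-backed bounded source, and placeholder path/broker/topic names:

    // bounded half: a file source that will run to completion
    FileSource<String> bounded =
            FileSource.forRecordStreamFormat(new TextLineInputFormat(), new Path("/history"))
                    .build();

    // unbounded half: a kafka source of the same record type
    KafkaSource<String> unbounded =
            KafkaSource.<String>builder()
                    .setBootstrapServers("broker:9092")
                    .setTopics("events")
                    .setValueOnlyDeserializer(new SimpleStringSchema())
                    .build();

    // HybridSource drains the bounded source first, then switches to kafka
    HybridSource<String> hybrid =
            HybridSource.builder(bounded).addSource(unbounded).build();
    env.fromSource(hybrid, WatermarkStrategy.noWatermarks(), "hybrid-source");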

Re: Weird Flink Kafka source watermark behavior

2022-04-13 Thread Jin Yi
> Qingsheng > > > On Apr 13, 2022, at 17:52, Qingsheng Ren wrote: > > > > Hi Jin, > > > > Unfortunately I don’t have any quick bypass in mind except increasing > the tolerance of out-of-orderness. > > > > Best regards, > > > > Qingsheng…
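The "tolerance" knob being discussed is the bounded-out-of-orderness watermark strategy; a minimal sketch (the duration, the Event type, and its getTimestampMillis() accessor are placeholders):

    KafkaSource<Event> source = /* configured elsewhere */;
    DataStream<Event> stream = env.fromSource(
            source,
            WatermarkStrategy
                    // allow records to arrive up to 30s behind the watermark
                    .<Event>forBoundedOutOfOrderness(Duration.ofSeconds(30))
                    .withTimestampAssigner((event, recordTs) -> event.getTimestampMillis()),
            "kafka-source");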

Re: Weird Flink Kafka source watermark behavior

2022-04-08 Thread Jin Yi
confirmed that moving back to FlinkKafkaConsumer fixes things. is there some notification channel/medium that readily highlights critical bugs/issues like this in newly released features? On Fri, Apr 8, 2022 at 2:18 AM Jin Yi wrote: > based on symptoms/observations on the first opera…

Re: Weird Flink Kafka source watermark behavior

2022-04-08 Thread Jin Yi
> if the first split pushes the watermark far away, then records from other splits will be > treated as late events. > > [1] https://issues.apache.org/jira/browse/FLINK-26018 > > Best regards, > > Qingsheng > > > On Apr 8, 2022, at 15:54, Jin Yi wrote: …

Re: flink docker image (1.14.4) unable to access other pods from flink program (job and task manager access is fine)

2022-03-30 Thread Jin Yi
…onous. thanks for trying to help, 胡伟华. On Tue, Mar 29, 2022 at 7:45 AM 胡伟华 wrote: > Are you referring to creating a Flink cluster on Kubernetes by yaml file? > > How did you submit the job to the Flink cluster? Not via the command line > (flink run xxx)? > > On Mar 29, 2022, at 10:38 PM, Jin…

Re: flink docker image (1.14.4) unable to access other pods from flink program (job and task manager access is fine)

2022-03-29 Thread Jin Yi
> On Mar 29, 2022, at 10:32 PM, Jin Yi wrote: > > it's running in k8s. we're not running in app mode b/c we have many jobs > running in the same flink cluster. > > On Tue, Mar 29, 2022 at 4:29 AM huweihua wrote: >> Hi, Jin >> >> Can you provide more information…

Re: flink docker image (1.14.4) unable to access other pods from flink program (job and task manager access is fine)

2022-03-29 Thread Jin Yi
…YARN or standalone mode? > Maybe you can use application mode to keep the environment (network > accessibility) the same. Application mode will run the user-main > method in the JobManager. > > [1] > https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/over…

flink docker image (1.14.4) unable to access other pods from flink program (job and task manager access is fine)

2022-03-28 Thread Jin Yi
i have a flink job that uses redis as a sink. i optionally do some wiping and metadata writing from the job-submitting flink program before it actually executes/submits the job to the job manager. when i don't do this redis preparation, the redis sink works completely fine. that is, the redis co…

streaming mode with both finite and infinite input sources

2022-02-24 Thread Jin Yi
so we have a streaming job where the main work to be done is processing infinite kafka sources. recently, i added a fromCollection (finite) source to simply write some state once upon startup. this all seems to work fine. the finite source operators all finish, while all the infinite source operators…
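a sketch of that shape of job under the default STREAMING execution mode (kafkaSource and seedSink are placeholders built elsewhere):

    // finite source: emits a fixed collection once at startup; its operators then FINISH
    DataStream<String> seeds = env.fromCollection(Arrays.asList("seed-a", "seed-b"));
    seeds.sinkTo(seedSink); // hypothetical one-shot state write

    // infinite source: keeps the job alive indefinitely
    DataStream<String> events =
            env.fromSource(kafkaSource, WatermarkStrategy.noWatermarks(), "kafka");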

class Foo extends TupleN {}

2022-01-12 Thread Jin Yi
probably a dumb question, but will the serde performance in flink for the class Foo (from the subject) behave like a POJO or a TupleN?
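one way to answer this empirically is to ask the type extractor what it infers (a sketch; Foo's tuple arity and field types here are hypothetical):

    public class Foo extends Tuple2<String, Long> {}

    TypeInformation<Foo> info = TypeExtractor.createTypeInfo(Foo.class);
    // prints a TupleTypeInfo if Foo is analyzed as a tuple type,
    // a PojoTypeInfo (or GenericTypeInfo) otherwise
    System.out.println(info);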

Re: REST API for detached minicluster based integration test

2021-12-01 Thread Jin Yi
…to fetch watermarks from the metrics. > If processWatermark is never called then it means no watermark ever comes > and you might want to check your watermark strategy implementation. > > On Wed, Dec 1, 2021 at 4:14 AM Jin Yi wrote: >> thanks for the reply caizhi! >> >> we're…

Re: REST API for detached minicluster based integration test

2021-11-30 Thread Jin Yi
> I'm also curious about why you need to extract the output watermarks > just for stopping the source. You can control the records and the watermark > strategy from the source. From my point of view, constructing some test > data with some specific row time would be enough. …
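the suggestion above, driving the test from records with explicit row times instead of watching watermarks, could look like this sketch (Event and its timestampMillis field are placeholders):

    DataStream<Event> input = env
            .fromElements(new Event("a", 1_000L), new Event("b", 2_000L), new Event("c", 3_000L))
            .assignTimestampsAndWatermarks(
                    WatermarkStrategy
                            // test data is emitted in timestamp order
                            .<Event>forMonotonousTimestamps()
                            .withTimestampAssigner((e, recordTs) -> e.timestampMillis));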

Re: REST API for detached minicluster based integration test

2021-11-29 Thread Jin Yi
bump. a more general question: what do people do for end-to-end, full integration tests of event-time-based jobs with timers? On Tue, Nov 23, 2021 at 11:26 AM Jin Yi wrote: > i am writing an integration test where i execute a streaming flink job > using faked, "unbounded"…

REST API for detached minicluster based integration test

2021-11-23 Thread Jin Yi
i am writing an integration test where i execute a streaming flink job using faked, "unbounded" input, where i want to control when the source function(s) complete by triggering them once the job's operators' maximum output watermarks are beyond some job-completion watermark that's relative to the m…

Re: redis sink from flink

2021-08-17 Thread Jin Yi
…>> inside. Regarding Redis clients, Jedis (https://github.com/redis/jedis) >> is quite popular and simple to get started with. >> >> Let me know if you'd like to learn more details about our implementation. >> >> Best, >> Yik San >> >> On Tue, …
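in case it helps others, a bare-bones custom sink along those lines using Jedis directly (host/port and the key/value layout are placeholders; this is a sketch, not the Bahir connector):

    public class RedisSink extends RichSinkFunction<Tuple2<String, String>> {
        private transient Jedis jedis;

        @Override
        public void open(Configuration parameters) {
            // one client per parallel subtask
            jedis = new Jedis("redis-host", 6379);
        }

        @Override
        public void invoke(Tuple2<String, String> kv, Context context) {
            jedis.set(kv.f0, kv.f1);
        }

        @Override
        public void close() {
            if (jedis != null) jedis.close();
        }
    }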

redis sink from flink

2021-08-16 Thread Jin Yi
is apache bahir still a thing? it hasn't been touched for months (since redis 2.8.5). as such, looking at the current flink connector docs, it's no longer pointing to anything from the bahir project. looking around in either the flink or bahir newsgroups doesn't turn up anything regarding bahir'…

is there a way to get the watermark per operator in tabular form?

2021-06-15 Thread Jin Yi
in the flink ui, is there a way to update the columns being shown to include the watermarks? failing that, is it possible to query the watermarks throughout a flink job somehow? the rest api? thanks.
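one option, as far as I can tell, is the per-vertex watermarks endpoint of the REST API (the same data the UI's watermarks tab reads); a minimal polling sketch, where host, jobId, and vertexId are placeholders:

    // GET the current output watermark(s) for one job vertex
    URL url = new URL("http://localhost:8081/jobs/" + jobId
            + "/vertices/" + vertexId + "/watermarks");
    try (BufferedReader reader = new BufferedReader(
            new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
        // e.g. [{"id":"0.currentOutputWatermark","value":"1623772800000"}]
        System.out.println(reader.readLine());
    }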

should i expect POJO serialization warnings when dealing w/ kryo protobuf serialization?

2021-06-12 Thread Jin Yi
i'm currently using protobufs and registering the serializers with kryo-protobuf via the following snippet of code: static void optionalRegisterProtobufSerializer(ExecutionConfig config, Class clazz) { if (clazz != null) { config.registerTypeWithKryoSerializer(clazz, …
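a hedged completion of that helper, assuming the serializer being registered is chill-protobuf's com.twitter.chill.protobuf.ProtobufSerializer (the usual choice for this pattern):

    static void optionalRegisterProtobufSerializer(ExecutionConfig config, Class<?> clazz) {
        if (clazz != null) {
            // kryo falls back to this serializer for the protobuf-generated type
            config.registerTypeWithKryoSerializer(clazz, ProtobufSerializer.class);
        }
    }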

Re: behavior differences when sinking side outputs to a custom KeyedCoProcessFunction

2021-06-03 Thread Jin Yi
…ency to do. fixing the local watermark generator used in the strategy to account for this properly fixed all of my issues. On Fri, May 21, 2021 at 10:09 AM Jin Yi wrote: > (sorry that the last sentence fragment made it into my email... it was a > draft comment that i forgot to remove. my t…

Re: behavior differences when sinking side outputs to a custom KeyedCoProcessFunction

2021-05-21 Thread Jin Yi
…uld be the incorrect behavior? should i file an issue/bug? On Thu, May 20, 2021 at 3:39 PM Jin Yi wrote: > hello, > > sorry for a long post, but this is a puzzling problem and i am enough of a > flink non-expert to be unsure what details are important or not. > > background: …

behavior differences when sinking side outputs to a custom KeyedCoProcessFunction

2021-05-20 Thread Jin Yi
hello, sorry for a long post, but this is a puzzling problem and i am enough of a flink non-expert to be unsure which details are important. background: i have a flink pipeline that is a series of custom "joins" for the purposes of user event "flattening", for which i wrote a custom KeyedCoProcessFunction…
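to make the setup concrete, a skeleton of that kind of function with a side output for late records (all type names and the late-routing logic here are placeholders, not the original pipeline's):

    public class EventJoin
            extends KeyedCoProcessFunction<String, ImpressionEvent, ActionEvent, JoinedEvent> {

        private final OutputTag<ImpressionEvent> lateTag =
                new OutputTag<ImpressionEvent>("late") {};

        @Override
        public void processElement1(ImpressionEvent left, Context ctx, Collector<JoinedEvent> out) {
            if (ctx.timestamp() < ctx.timerService().currentWatermark()) {
                ctx.output(lateTag, left); // route late records to the side output
                return;
            }
            // buffer/join logic for the left input would live here
        }

        @Override
        public void processElement2(ActionEvent right, Context ctx, Collector<JoinedEvent> out) {
            // symmetric handling for the right input
        }
    }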

Re: two questions about flink stream processing: kafka sources and TimerService

2021-05-19 Thread Jin Yi
…could be to implement this yourself using a MapState. Otherwise I think you > need to implement your own operator from which you can then access > InternalTimerService, similar to how KeyedCoProcessOperator does it. > > Regards > Ingo > > On Wed, May 12, 2021 a…

Re: two questions about flink stream processing: kafka sources and TimerService

2021-05-17 Thread Jin Yi
ping? On Tue, May 11, 2021 at 11:31 PM Jin Yi wrote: > hello. thanks ahead of time for anyone who answers. > > 1. verifying my understanding: for a kafka source that's partitioned on > the same piece of data that is later used in a keyBy, if we are relying on > the kafka…

two questions about flink stream processing: kafka sources and TimerService

2021-05-11 Thread Jin Yi
hello. thanks ahead of time for anyone who answers. 1. verifying my understanding: for a kafka source that's partitioned on the same piece of data that is later used in a keyBy, if we are relying on the kafka timestamp as the event timestamp, is it guaranteed that the event stream of the source…

flink cep and out of order events

2021-04-24 Thread Jin Yi
does the default within behavior of flink cep handle out-of-order events (relative to event time)? obviously, it'd be best if the event time was guaranteed correct, but sometimes it's too difficult to do. do people end up writing different patterns with some event orderings reversed to capture th…
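for concreteness, the kind of pattern in question; a minimal sketch with a placeholder Event type and conditions:

    Pattern<Event, ?> pattern = Pattern.<Event>begin("first")
            .where(new SimpleCondition<Event>() {
                @Override public boolean filter(Event e) { return e.isStart(); }
            })
            .next("second")
            .where(new SimpleCondition<Event>() {
                @Override public boolean filter(Event e) { return e.isEnd(); }
            })
            .within(Time.minutes(10)); // time bound for the whole match

    PatternStream<Event> matches = CEP.pattern(stream, pattern);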

parquet protobuf output and aws athena support

2021-03-15 Thread Jin Yi
using ParquetProtoWriters, does anyone have this working with aws athena ingestion via aws glue crawls? the parquet files being generated by our flink job look fine…
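for context, the sink setup in question looks roughly like this sketch (MyProtoRecord and the output path are placeholders; ParquetProtoWriters.forType expects a protobuf-generated Message class):

    StreamingFileSink<MyProtoRecord> sink = StreamingFileSink
            .forBulkFormat(
                    new Path("s3://bucket/output"),
                    ParquetProtoWriters.forType(MyProtoRecord.class))
            .build();
    stream.addSink(sink);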

mixing java libraries between 1.12.x and 1.11.x

2021-03-10 Thread Jin Yi
(also updated https://issues.apache.org/jira/browse/FLINK-19955 w/ this question) i'm in the situation where i want to use the ParquetProtoWriters found in flink-parquet 1.12.x. the rest of our codebase, anticipating possibly running on the fully-managed aws flink solution for production, is depending…

[Question] enable end2end Kafka exactly once processing

2020-03-01 Thread Jin Yi
Hi experts, My application is using Apache Beam with Flink as the runner. My source and sink are kafka topics, and I am using the KafkaIO connector provided by Apache Beam to consume and publish. I am reading through Beam's java doc: https://beam.apache.org/releases/javadoc/2.16.0/org/apache/b…
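for reference, the knob in question on the producer side is KafkaIO's withEOS; a sketch (broker/topic/group names are placeholders, and exactly-once additionally depends on Kafka 0.11+ brokers and runner support):

    PCollection<KV<String, String>> records = /* upstream transforms */;
    records.apply(KafkaIO.<String, String>write()
            .withBootstrapServers("broker:9092")
            .withTopic("output-topic")
            .withKeySerializer(StringSerializer.class)
            .withValueSerializer(StringSerializer.class)
            // exactly-once sink, sharded across 1 writer, in a named sink group
            .withEOS(1, "eos-sink-group"));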

Re: Apache Beam Side input vs Flink Broadcast Stream

2020-02-28 Thread Jin Yi
…m+API > > On Thu, Feb 27, 2020 at 6:46 AM Jin Yi wrote: > >> Hi All, >> >> there is a recently published article on the flink official website about >> running beam on top of flink: >> https://flink.apache.org/ecosystem/2020/02/22/apache-beam-how-beam-runs-on-top-of…

Re: Flink on Kubernetes - Session vs Job cluster mode and storage

2020-02-26 Thread Jin Yi
Hi Yang, regarding your statement below: Since you are starting JM/TM with a K8s deployment, when they fail, new JM/TM will be created. If you do not set the high availability configuration, your jobs can recover when a TM fails. However, they cannot recover when the JM fails. With HA configured, …
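the HA settings being referred to are along these lines in flink-conf.yaml (a sketch for a ZooKeeper-based setup of that era; quorum address, storage path, and cluster id are placeholders):

    high-availability: zookeeper
    high-availability.zookeeper.quorum: zk-1:2181,zk-2:2181,zk-3:2181
    high-availability.storageDir: s3://bucket/flink/ha
    high-availability.cluster-id: /my-flink-cluster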

Apache Beam Side input vs Flink Broadcast Stream

2020-02-26 Thread Jin Yi
Hi All, there is a recently published article on the flink official website about running beam on top of flink: https://flink.apache.org/ecosystem/2020/02/22/apache-beam-how-beam-runs-on-top-of-flink.html In the article: - You get additional features like side inputs and cross-language pipelines…

Re: How does Flink manage the kafka offset

2020-02-20 Thread Jin Yi
…this consumer group has joined, and it will >> rebalance the partition consumption if needed. > > No. Flink does not rebalance the partitions when new task managers join the > cluster. It only does so when the job restarts and the job parallelism changes. > > Hope it helps. > …

How does Flink manage the kafka offset

2020-02-20 Thread Jin Yi
Hi there, We are running an apache beam application with flink as the runner. We use the KafkaIO connector to read from topics: https://beam.apache.org/releases/javadoc/2.19.0/ and we have the following configuration, which enables auto commit of offsets; no checkpointing is enabled, and it is pe…
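the reader configuration being described, roughly (broker/topic/group names are placeholders; withConsumerConfigUpdates is the Beam 2.19-era way to pass Kafka client properties, and ImmutableMap is Guava's):

    pipeline.apply(KafkaIO.<String, String>read()
            .withBootstrapServers("broker:9092")
            .withTopic("input-topic")
            .withKeyDeserializer(StringDeserializer.class)
            .withValueDeserializer(StringDeserializer.class)
            // rely on the Kafka client's periodic auto-commit; checkpointing stays off
            .withConsumerConfigUpdates(ImmutableMap.of(
                    ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, true,
                    ConsumerConfig.GROUP_ID_CONFIG, "my-consumer-group")));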

Re: Parallelize Kafka Deserialization of a single partition?

2020-02-18 Thread Jin Yi
Hi Till, I just read your comment: Currently, enabling object reuse via ExecutionConfig.enableObjectReuse() only affects the DataSet API. DataStream programs will always do defensive copies. There is a FLIP to improve this behaviour [1]. I have an application that is written in apache beam, but th…

[Question] how to have additional Logging in Apache Beam KafkaWriter

2020-02-18 Thread Jin Yi
Hi there, I am using Apache Beam (v2.16) in my application, and the Runner is Flink (1.8). I use the KafkaIO connector to consume from source topics and publish to sink topics. Here is the class that Apache Beam provides for publishing messages: https://github.com/apache/beam/blob/master/sdks/java/io/…

Re: [State Processor API] how to convert savepoint back to broadcast state

2020-01-27 Thread Jin Yi
….html#why-dataset-api > [2] > https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/state/broadcast_state.html#provided-apis > [3] > https://github.com/apache/flink/blob/53f956fb57dd5601d2e3ca9289207d21796cdc4d/flink-streaming-java/src/main/java/org/apache/flink/streaming/api…

Re: [State Processor API] how to convert savepoint back to broadcast state

2020-01-26 Thread Jin Yi
…: https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/state/state.html#checkpointedfunction Thanks a lot for your help! Eleanore On Sun, Jan 26, 2020 at 6:53 PM Jin Yi wrote: > Hi Yun, > > Thanks for the response, I have checked the official document, and I have > referred to this…

Re: [State Processor API] how to convert savepoint back to broadcast state

2020-01-26 Thread Jin Yi
…-docs-stable/dev/libs/state_processor_api.html#broadcast-state-1 > > Best > Yun Tang > ------ > *From:* Jin Yi > *Sent:* Thursday, January 23, 2020 8:12 > *To:* user ; user...@flink.apache.org > *Subject:* [State Processor API]…

[State Processor API] how to convert savepoint back to broadcast state

2020-01-22 Thread Jin Yi
Hi there, I would like to read the savepoints (for broadcast state) back into broadcast state. How should I do it? // load the existingSavepoint; ExistingSavepoint existingSavepoint = Savepoint.load(environment, "file:///tmp/new_savepoints", new MemoryStateBackend()); // read state from exis…
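the read side can be completed along these lines via ExistingSavepoint.readBroadcastState (uid, state name, and types below are placeholders):

    ExistingSavepoint existingSavepoint = Savepoint.load(
            environment, "file:///tmp/new_savepoints", new MemoryStateBackend());

    // read a broadcast state back as a DataSet of key/value pairs
    DataSet<Tuple2<String, MyValue>> entries = existingSavepoint.readBroadcastState(
            "broadcast-operator-uid",            // operator uid
            "broadcast-state-name",              // MapStateDescriptor name
            Types.STRING,                        // key type
            TypeInformation.of(MyValue.class));  // value type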

Re: Question regarding checkpoint/savepoint and State Processor API

2020-01-21 Thread Jin Yi
…start and stop of your job. It is just to bootstrap the > initial state of your application. After that, you will use savepoints to > carry over the current state of your applications between runs. > > On Mon, Jan 20, 2020 at 6:07 PM Jin Yi wrote: >> Hi there, …

Question regarding checkpoint/savepoint and State Processor API

2020-01-20 Thread Jin Yi
Hi there, 1. in my job, I have a broadcast stream; initially there is no savepoint that can be used to bootstrap the broadcast stream's state. BootstrapTransformation transform = OperatorTransformation.bootstrapWith(dataSet).transform(bootstrapFunction); Savepoint.create(new MemoryStateBackend()…
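the truncated snippet continues along these lines (uid, write path, and max parallelism are placeholders):

    BootstrapTransformation<MyValue> transform = OperatorTransformation
            .bootstrapWith(dataSet)
            .transform(bootstrapFunction);

    Savepoint
            .create(new MemoryStateBackend(), 128)             // 128 = max parallelism
            .withOperator("broadcast-operator-uid", transform) // must match the job's uid
            .write("file:///tmp/new_savepoints");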

Re: Filter with large key set

2020-01-20 Thread Jin Yi
…TypeInformation, > serializer, and comparator yourself. The Either classes should give you > good guidance for that. > 2) have different operators and flows for each basic data type. This will > fan out your job, but should be the easier approach. > > Best, Fabian > > > > Am D…

Filter with large key set

2020-01-15 Thread Jin Yi
Hi there, I have the following use case: a key set, say [A,B,C,…], with around 10M entries; the type of the entries can be one of the types in BasicTypeInfo, e.g. String, Long, Integer etc., and each message looks like below: message: { header: A body: {} } I would like to use Flink to filter…
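one simple baseline (not necessarily what the thread ends up recommending) is to load the key set into memory in open() and filter against it; a sketch where Message, getHeader(), and loadKeys() are placeholders:

    public class HeaderKeyFilter extends RichFilterFunction<Message> {
        private transient Set<String> keys;

        @Override
        public void open(Configuration parameters) {
            // loadKeys() is hypothetical; ~10M strings must fit on every subtask's heap
            keys = new HashSet<>(loadKeys());
        }

        @Override
        public boolean filter(Message message) {
            return keys.contains(message.getHeader());
        }
    }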