Re: How to enable RocksDB native metrics?

2024-04-07 Thread Marco Villalobos
Hi Lei, Have you tried enabling these Flink configuration properties?Configurationnightlies.apache.orgSent from my iPhoneOn Apr 7, 2024, at 6:03 PM, Lei Wang wrote:I  want to enable it only for specified jobs, how can I specify the   configurations on  cmd line when submitting a job?Thanks,LeiOn

Re: Flink cache support

2024-03-28 Thread Marco Villalobos
Zhanghao is correct. You can use what is called "keyed state". It's like a cache. https://nightlies.apache.org/flink/flink-docs-release-1.19/docs/dev/datastream/fault-tolerance/state/ > On Mar 28, 2024, at 7:48 PM, Zhanghao Chen wrote: > > Hi, > > You can maintain a cache manually in your

Re: need flink support framework for dependency injection

2024-03-26 Thread Marco Villalobos
Hi Ganesh, I disagree. I don’t think Flink needs a dependency injection framework. I have implemented many complex jobs without one. Can you please articulate why you think it needs a dependency injection framework, along with some use cases that will show its benefit? I would rather see more

Re: Request for sample codes for Dockerizing Java application

2024-02-13 Thread Marco Villalobos
Hi Nida, I request that you read https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/resource-providers/standalone/docker/ in order to learn how to Dockerize your Flink job. You're Welcome & Regard Marco A. Villalobos > On Feb 13, 2024, at 12:00 AM, Fidea Lidea wr

Re: Request to provide sample codes on Data stream using flink, Spring Boot & kafka

2024-02-07 Thread Marco Villalobos
Hi Nida, You can find sample code for using Kafka here: https://kafka.apache.org/documentation/ You can find sample code for using Flink here: https://nightlies.apache.org/flink/flink-docs-stable/ You can find sample code for using Flink with Kafka here: https://nightlies.apache.org/flink/fl

Re: [DISCUSS] Status of Statefun Project

2023-06-06 Thread Marco Villalobos
rves well as a bridge between a Flink Streaming job and >>>>> micro-services. >>>> >>>> This is essentially how I use it as well, and I would also be sad to see >>>> it sunsetted. It works well; I don't know that there is a lot of new >

Re: A WordCount job using DataStream API but behave like the batch WordCount example

2023-04-29 Thread Marco Villalobos
ame result. 3. You can also run the stream in batch mode. Remember, a stream does not end (unless it is run in batch mode). > On Apr 29, 2023, at 9:11 AM, Marco Villalobos > wrote: > > Hi Luke, > > A batch has a beginning and an end. Although a stream has a beginning,

Re: [DISCUSS] Status of Statefun Project

2023-04-17 Thread Marco Villalobos
I am currently using Stateful Functions in my application. I use Apache Flink for stream processing, and StateFun as a hand-off point for the rest of the application. It serves well as a bridge between a Flink Streaming job and micro-services. I would be disappointed if StateFun was sunsetted.

OOM taskmanager

2023-01-25 Thread marco andreas
Hello, We are deploying a flink application cluster in kubernetes, 2 pods one for the JM and the other for the TM. The problem is when we launch load tests we see that task manager memory usage increases, after the tests are finished and flink stop processing data the memory usage never comes d

Activate Flink HA without checkpoints on k8S

2022-10-13 Thread marco andreas
Hello, Can someone explain to me what is the point of using HA when deploying an application cluster with a single JM and the checkpoints are not activated. AFAK when the pod of the JM goes down kubernetes will restart it anyway so we don't need to activate the HA in this case. Maybe there's som

New flink release date

2022-10-09 Thread marco andreas
Hello, Does anyone know when the 1.16.0 version will be released please? Regards.

Re: Is it practicle to enrich a Flink DataStream in middle operator with Flink Stateful Functions?

2022-09-28 Thread Marco Villalobos
Did this list receive my email? I’m only asking because my last few questions have gone unanswered and maybe the list server is blocking me. Anybody, please let me know. > On Sep 26, 2022, at 8:41 PM, Marco Villalobos > wrote: > > I indeed see the value of Flink Statef

Is it practicle to enrich a Flink DataStream in middle operator with Flink Stateful Functions?

2022-09-26 Thread Marco Villalobos
k you. Marco A. Villalobos

Re: Enrichment of stream from another stream

2022-09-17 Thread Marco Villalobos
I might need more details, but conceptually, streams can be thought of as never ending tables and our code as functions applied to them. JOIN is a concept supported in the SQL API and DataStream API. However, the SQL API is more succinct (unlike my writing ;). So, how about the "fast stream" ma

Re: Deploying Jobmanager on k8s as a Deployment

2022-09-04 Thread marco andreas
écrit : > Hi! > You should check out the Flink Kubernetes Operator. I think that covers > all your needs . > > https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-main/ > > Cheers, > Gyula > > On Sat, 3 Sep 2022 at 13:45, marco andreas > wrote: > &

Deploying Jobmanager on k8s as a Deployment

2022-09-03 Thread marco andreas
We are deploying a flink application cluster on k8S. Following the official documentation the JM is deployed As a job resource , however we are deploying a long running flink job that is not supposed to be terminated and also we need to update the image of the flink job. The problem is that the j

Re: What is the recommended solution for this error of too many files open during a checkpoint?

2022-09-03 Thread Marco Villalobos
Thus, Flink 1.12.2 was using the version of RocksDB with a known bug. > On Sep 2, 2022, at 10:49 AM, Marco Villalobos > wrote: > > What is the recommended solution for this error of too many files open during > a checkpoint? > > 2022-09-02 10:04:56 java.io.IOExcepti

What is the recommended solution for this error of too many files open during a checkpoint?

2022-09-02 Thread Marco Villalobos
What is the recommended solution for this error of too many files open during a checkpoint? 2022-09-02 10:04:56 java.io.IOException: Could not perform checkpoint 119366 for operator tag enrichment (3/4)#104. at org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpointOnBarrier(Strea

Re: Is this a Batch SQL Bug?

2022-08-18 Thread Marco Villalobos
sInputFormat is reused, but > DeserializationSchemaAdapter#Reader only do shallow copy of the produced > data, so that the finnal result will always be the last row value. > > Could you please help create a jira to track it? > > Best regards, > Yuxia > > - 原始邮件 -

Is this a Batch SQL Bug?

2022-08-17 Thread Marco Villalobos
hat the transient field: private transient RecordCollector collector; in org.apache.flink.connector.file.table.DeserializationSchemaAdapter.LineBytesInputFormat becomes empty on each iteration, as though it failed to serialize correctly. Regardless, I don't know what's wrong. Any advice would deeply help. Marco A. Villalobos

without DISTINCT unique lines show up many times in FLINK SQL

2022-08-16 Thread Marco Villalobos
Hello everybody, When I perform this simple set of queries, a unique line from the source file shows up many times. I have verified many times that a unique line in the source shows up as much as 100 times in the select statement. Is this the correct behavior for Flink 1.15.1? FYI, it does s

Flink SQL and tumble window by size (number of rows)

2022-08-02 Thread Marco Villalobos
Is it possible in Flink SQL to tumble a window by row size instead of time? Let's say that I want a window for every 1 rows for example using the Flink SQL API. is that possible? I can't find any documentation on how to do that, and I don't know if it is supported.

how does a slow FaaS affect the Flink StateFun cluster?

2022-05-25 Thread Marco Villalobos
If the performance of a stateful function (FaaS) is very slow, how does this impact performance on the Flink StateFun Cluster? I am trying to figure out what is too slow for a FaaS. I expect the Flink StateFun Cluster to receive about 2000 events per a minute, but some, not all FaaS might take

Re: Supporting different content-types such as String or JSON seems to always fail for me in Stateful Functions.

2022-04-16 Thread Marco Villalobos
You're right. I didn't notice that the ports were different. That was very subtle. Thank you for pointing this out to me. I was stuck on it for quite a while. > On Apr 16, 2022, at 6:17 PM, Marco Villalobos > wrote: > > I'm sorry, I accidentally hit send before

Re: Supporting different content-types such as String or JSON seems to always fail for me in Stateful Functions.

2022-04-16 Thread Marco Villalobos
16, 2022, at 6:12 PM, Marco Villalobos > wrote: > > IK > > If what you're saying is true, then why do most of the examples in the > flink-statefun-playground example use HTTP as an alternative entry point? > > Here is the greeter example: > > https://githu

Re: Supporting different content-types such as String or JSON seems to always fail for me in Stateful Functions.

2022-04-16 Thread Marco Villalobos
he/flink-statefun-playground/tree/main/java/greeter> > On Apr 16, 2022, at 6:06 PM, Tymur Yarosh wrote: > > Hi Marco, > The problem is that you’re trying to call the function directly via HTTP. > This is not how it's supposed to work. Instead, check out how to define y

Supporting different content-types such as String or JSON seems to always fail for me in Stateful Functions.

2022-04-13 Thread Marco Villalobos
I'm trying to write very simple echo app with Stateful Function to prove it as a technology for some of my use cases. I have not been able to accept different content types though. Here is an example of my code for a simple echo function: My Echo stateful function class. package statefun.impl

JobManager failed to renew it's leadership (K8S HA)

2022-03-27 Thread marco andreas
Hello, Does anyone have the same issue or have an idea why the jobmanager fails to renew its leadership when using kubernetes ha service. Configuration : kubernetes.namespace: flink-ps-flink-dev high-availability.kubernetes.leader-election.lease-duration: 200 s high-availability.kubernetes.leader

Kubernetes HA on an application cluster

2022-03-21 Thread marco andreas
Hello everyone, I am deploying a flink application cluster using k8S HA . I notice this message in the log @timestamp":"2022-03-21T17:11:39.436+01:00","@version":"1","message":"Renew deadline reached after 200 seconds while renewing lock ConfigMapLock: flink-pushavoo-flink-rec - elifibre-000

Re: Trouble sinking to Kafka

2022-02-23 Thread Marco Villalobos
eid...@ververica.com> wrote: > Hi Marco, > > I'm no expert on the Kafka producer, but I will try to help. [1] seems to > have a decent explanation of possible error causes for the error you > encountered. > Which leads me to two questions: >

Trouble sinking to Kafka

2022-02-22 Thread Marco Villalobos
I keep on receiving this exception during the execution of a simple job that receives time series data via Kafka, transforms it into avro format, and then sends into a Kafka topic consumed by druid. Any advise would be appreciated as to how to resolve this type of error. I'm using Apache Kafka

Cannot upgrade helm chart

2022-02-21 Thread marco andreas
Hello flink community, I am deploying a flink application cluster using a helm chart , the problem is that the jobmanager component type is a "Job" , and with helm i can't do an upgrade of the chart in order to change the application image version because helm is unable to upgrade the docker image

Flink 1.12.1 and KafkaSource

2022-02-02 Thread Marco Villalobos
KafkaSource and KafkaSourceBuilder? Should I revert to using FlinkKafkaSource? Any advice or insight would be very helpful. Thank you. Marco A. Villalobos

Re: Is it possible to support many different windows for many different keys?

2022-01-26 Thread Marco Villalobos
solution. Again, thank you for your input. > On Jan 26, 2022, at 1:32 PM, Alexander Fedulov > wrote: > >  > Hi Marco, > > Not sure if I get your problem correctly, but you can process those windows > on data "split" from the same input within the same Flink

Is it possible to support many different windows for many different keys?

2022-01-26 Thread Marco Villalobos
, if there are tens of thousands of time series names / windows? Any help or advice would be appreciated. Thank you. Marco A. Villalobos

Re: How do you configure setCommitOffsetsOnCheckpoints in Flink 1.12 when using KafkaSourceBuilder?

2021-12-01 Thread Marco Villalobos
Thank you. One last question. What is an RP? Where can I read it? Marco > On Nov 30, 2021, at 11:06 PM, Hang Ruan wrote: > > Hi, > > In 1.12.0-1.12.5 versions, committing offset to kafka when the checkpoint is > open is the default behavior in KafkaSourceBuilder. And it

Re: How do you configure setCommitOffsetsOnCheckpoints in Flink 1.12 when using KafkaSourceBuilder?

2021-11-30 Thread Marco Villalobos
ther additional properties could be found here : > https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/connectors/datastream/kafka/#additional-properties > > Marco Villalobos 于2021年11月30日周二 上午11:08写道: > >> Thank you for the information. That still does not answer my

Re: How do you configure setCommitOffsetsOnCheckpoints in Flink 1.12 when using KafkaSourceBuilder?

2021-11-29 Thread Marco Villalobos
that I am using KafkaSourceBuilder, how do I configure that behavior so that offsets get committed on checkpoints? Or is that the default behavior with checkpoints? -Marco On Mon, Nov 29, 2021 at 5:58 PM Caizhi Weng wrote: > Hi! > > Flink 1.14 release note states about this. See [1

How do you configure setCommitOffsetsOnCheckpoints in Flink 1.12 when using KafkaSourceBuilder?

2021-11-29 Thread Marco Villalobos
kpoint. > The interval of drawing checkpoints therefore defines how much the program > may have to go back at most, in case of a failure. To use fault tolerant > Kafka Consumers, checkpointing of the topology needs to be enabled in the > job. > If checkpointing is disabled, the Kafka consumer will periodically commit > the offsets to Zookeeper. Thank you. Marco

How do I configure commit offsets on checkpoint in new KafkaSource api?

2021-11-19 Thread Marco Villalobos
The FlinkKafkaConsumer that will be deprecated has the method "setCommitOffsetsOnCheckpoints(boolan)" method. However, that functionality is not the new KafkaSource class. How is this behavior / functionality configured in the new API? -Marco A. Villalobos

Re: to join or not to join, that is the question...

2021-11-05 Thread Marco Villalobos
Dario Heinisch wrote: > > Union creates a new stream containing all elements of the unioned > > streams: > > > https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/dev/datastream/operators/overview/#union > > > > > > On 05.11.21 14:25, Marco Villalobo

to join or not to join, that is the question...

2021-11-05 Thread Marco Villalobos
Can two different streams flow to the same operator (an operator with the same name, uid, and implementation) and then share keyed state or will that require joining the streams first?

Re: Flink on kubernetes HA ::Renew deadline reached

2021-10-25 Thread marco
n either > try to fix the API server accessibility or increase the leadership renew > timeout (high-availability.kubernetes.leader-election.renew-deadline). > > Thank you~ > > Xintong Song > > > > On Mon, Oct 25, 2021 at 5:08 PM marco wrote: >

Re: Flink on kubernetes HA ::Renew deadline reached

2021-10-25 Thread marco
Any suggestions would be appreciated. On 2021/10/20 16:18:39, marco wrote: > > > Hello flink community:: > > I am deploying flink application cluster standalone mode on kubernetes, but i > am facing some problems > > the job starts normally and it continues to

Re: Problem with Flink job and Kafka.

2021-10-20 Thread Marco Villalobos
appreciated. -Marco On Mon, Oct 18, 2021 at 8:03 PM Qingsheng Ren wrote: > Hi Marco, > > Sorry I forgot to cc the user mailing list just now. > > From the exception message it looks like a versioning issue. Could you > provide some additional information, such as Flink & Kaf

Flink on kubernetes HA ::Renew deadline reached

2021-10-20 Thread marco
Hello flink community:: I am deploying flink application cluster standalone mode on kubernetes, but i am facing some problems the job starts normally and it continues to run but at some point in time it crushes and gets restarted. Does anyone facing the same problem or know how to resolve

Problem with Flink job and Kafka.

2021-10-18 Thread Marco Villalobos
I have the simplest Flink job that simply deques off of a kafka topic and writes to another kafka topic, but with headers, and manually copying the event time into the kafka sink. It works as intended, but sometimes I am getting this error: org.apache.kafka.common.errors.UnsupportedVersionExcepti

How do I verify data is written to a JDBC sink?

2021-09-26 Thread Marco Villalobos
In my Flink Job, I am using event time to process time-series data. Due to our business requirements, I need to verify that a specific subset of data written to a JDBC sink has been written before I send an activemq message to another component. My job flows like this: 1. Kafka Source 2. Split s

could not stop with a Savepoint.

2021-09-26 Thread Marco Villalobos
Today, I kept on receiving a timeout exception when stopping my job with a savepoint. This happened with Flink version 1.12.2 running in EMR. I had to use the deprecated cancel with savepoint feature instead. In fact, stopping with a savepoint, creating a savepoint, and cancelling with a savepoin

Re: stream processing savepoints and watermarks question

2021-09-24 Thread Marco Villalobos
never to use -d again for this situation. Again, thank you. -Marco On Thu, Sep 23, 2021 at 11:01 PM JING ZHANG wrote: > Hi Macro, > Do you specified drain flag when stop a job with a savepoint? > If the --drain flag is specified, then a MAX_WATERMARK will be emitted > before the las

stream processing savepoints and watermarks question

2021-09-23 Thread Marco Villalobos
Something strange happened today. When we tried to shutdown a job with a savepoint, the watermarks became equal to 2^63 - 1. This caused timers to fire indefinitely and crash downstream systems with overloaded untrue data. We are using event time processing with Kafka as our source. It seems imp

Re: What is the event time of an element produced in a timer?

2021-09-07 Thread Marco Villalobos
27;ll test my hypothesis later today. > On Sep 7, 2021, at 2:07 AM, JING ZHANG wrote: > > Hi Marco, > I'm not sure which API or SQL query do you use. > If you use Windowed Stream API in DataStream [1]. The input data would be > assigned to a Window based on which Window A

Re: What is the event time of an element produced in a timer?

2021-09-07 Thread Marco Villalobos
t triggers the timer firing. Must I create a special window assigner that accepts late data? On Tue, Sep 7, 2021 at 2:07 AM JING ZHANG wrote: > Hi Marco, > I'm not sure which API or SQL query do you use. > If you use Windowed Stream API in DataStream [1]. The input data would be >

What is the event time of an element produced in a timer?

2021-09-06 Thread Marco Villalobos
If an event time timer is registered to fire exactly every 15 minutes, starting from exactly at the top of the hour (exactly 00:00, 00:15, 00:30, 00:45 for example), and within that timer it produces an element in the stream, what event time will that element have, and what window will it belong to

aggregation, triggers, and no activity

2021-08-20 Thread Marco Villalobos
I use event time,with Kafka as my source. The system that I am developing requires data to be aggregated every 15 minutes, thus I am using a Tumbling Event Time window. However, my system also is required to take action every 15 minutes even if there is activity. I need the elements collected in t

State Processor API and existing state

2021-06-28 Thread Marco Villalobos
Let's say that a job has operators with UIDs: W, X, Y, and Z, and uses RocksDB as a backend with checkpoint data URI s3://checkpoints" Then I stop the job with a savepoint at s3://savepoint-1. I assume that all the data within the checkpoint are stored within the given Savepoint. Is that assumpti

Re: Please advise bootstrapping large state

2021-06-18 Thread Marco Villalobos
It was not clear to me that JdbcInputFormat was part of the DataSet api. Now I understand. Thank you. On Fri, Jun 18, 2021 at 5:23 AM Timo Walther wrote: > Hi Marco, > > as Robert already mentioned, the BatchTableEnvironment is simply build > on top of the DataSet API,

Re: Please advise bootstrapping large state

2021-06-17 Thread Marco Villalobos
bleEnvImpl.scala:555) at org.apache.flink.table.api.internal.BatchTableEnvImpl.translate(BatchTableEnvImpl.scala:537) at org.apache.flink.table.api.bridge.java.internal.BatchTableEnvironmentImpl.toDataSet(BatchTableEnvironmentImpl.scala:101) On Thu, Jun 17, 2021 at 12:51 AM Timo Walther wrote: > Hi Marco, > > which oper

Re: Please advise bootstrapping large state

2021-06-16 Thread Marco Villalobos
the DataSet API with the Table SQL api in Flink 1.12.1? On Wed, Jun 16, 2021 at 4:55 AM Robert Metzger wrote: > Hi Marco, > > The DataSet API will not run out of memory, as it spills to disk if the > data doesn't fit anymore. > Load is distributed by partitioning data. > &

Please advise bootstrapping large state

2021-06-15 Thread Marco Villalobos
I must bootstrap state from postgres (approximately 200 GB of data) and I notice that the state processor API requires the DataSet API in order to bootstrap state for the Stream API. I wish there was a way to use the SQL API and use a partitioned scan, but I don't know if that is even possible wit

Re: DataStream API in Batch Execution mode

2021-06-09 Thread Marco Villalobos
ening an issue for lacking the document? > > [1] > https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-files/src/test/java/org/apache/flink/connector/file/src/FileSourceTextLinesITCase.java > Best, > Guowei > > > On Tue, Jun 8, 2021 at 5:59 AM Marc

DataStream API in Batch Execution mode

2021-06-07 Thread Marco Villalobos
How do I use a hierarchical directory structure as a file source in S3 when using the DataStream API in Batch Execution mode? I have been trying to find out if the API supports that, because currently our data is organized by years, halves, quarters, months, and but before I launch the job, I flat

Re: Is it possible to use OperatorState, when NOT implementing a source or sink function?

2021-06-05 Thread Marco Villalobos
this operator, it only goes to one task manager node. I need state, but I don't really need it keyed. On Sat, Jun 5, 2021 at 4:56 AM Marco Villalobos wrote: > Does that work in the DataStream API in Batch Execution Mode? > > On Sat, Jun 5, 2021 at 12:04 AM JING ZHANG wrote: >

Re: Is it possible to use OperatorState, when NOT implementing a source or sink function?

2021-06-05 Thread Marco Villalobos
> Best regards, > JING ZHANG > > > Marco Villalobos 于2021年6月5日周六 下午1:55写道: > >> Is it possible to use OperatorState, when NOT implementing a source or >> sink function? >> >> If yes, then how? >> >

Is it possible to use OperatorState, when NOT implementing a source or sink function?

2021-06-04 Thread Marco Villalobos
Is it possible to use OperatorState, when NOT implementing a source or sink function? If yes, then how?

DataStream API in Batch Mode job is timing out, please advise on how to adjust the parameters.

2021-05-24 Thread Marco Villalobos
I am running with one job manager and three task managers. Each task manager is receiving at most 8 gb of data, but the job is timing out. What parameters must I adjust? Sink: back fill db sink) (15/32) (50626268d1f0d4c0833c5fa548863abd) switched from SCHEDULED to FAILED on [unassigned resource]

How do you debug a DataStream flat join on common window?

2021-05-23 Thread Marco Villalobos
Hi, Stream one has one element. Stream two has 2 elements. Both streams derive from side-outputs. I am using the DataStream API in batch execution mode. I want to join them on a common key and window. I am certain the keys match, but the flat join does not seem to be working. I deduce that t

Re: Does Flink 1.12.1 DataStream API batch execution mode support side outputs?

2021-05-23 Thread Marco Villalobos
I found the problem. I tried to sign timestamps to the operator (I don't know why), and when I did that, because I used the Flink API fluently, I was no longer referencing the operator that contained the side-outputs. Disregard my question. On Sat, May 22, 2021 at 9:28 PM Marco Villa

Does Flink 1.12.1 DataStream API batch execution mode support side outputs?

2021-05-22 Thread Marco Villalobos
I have been struggling for two days with an issue using the DataStream API in Batch Execution mode. It seems as though my side-output has no elements available to downstream operators. However, I am certain that the downstream operator received events. I logged the side-output element just before

Is it possible to leverage the sort order in DataStream Batch Execution Mode?

2021-05-21 Thread Marco Villalobos
Hello. I am using Flink 1.12.1 in EMR. I am processing historical time-series data with the DataStream API in Batch execution mode. I must average time series data into a fifteen minute interval and forward fill missing values. For example, this input: name, timestamp, value a,2019-06-23T00:07:

Re: Questions Flink DataStream in BATCH execution mode scalability advice

2021-05-19 Thread Marco Villalobos
> On May 19, 2021, at 7:26 AM, Yun Gao wrote: > > Hi Marco, > > For the remaining issues, > > 1. For the aggregation, the 500GB of files are not required to be fit into > memory. > Rough speaking for the keyed().window().reduce(), the input records would be

Re: DataStream Batch Execution Mode and large files.

2021-05-18 Thread Marco Villalobos
Thank you very much. You've been very helpful. Since my intermediate results are large, I suspect that io.tmp.dirs must literally be on the local file system. Thus, since I use EMR, I'll need to configure EBS to support more data. On Tue, May 18, 2021 at 11:08 PM Yun Gao wrote:

Re: DataStream API Batch Execution Mode restarting...

2021-05-18 Thread Marco Villalobos
Thank you. I used the default restart strategy. I'll change that. On Tue, May 18, 2021 at 11:02 PM Yun Gao wrote: > Hi Marco, > > Have you configured the restart strategy ? if the restart-strategy [1] is > configuration > into some strategies other than none, Flink shoul

Questions Flink DataStream in BATCH execution mode scalability advice

2021-05-18 Thread Marco Villalobos
Questions Flink DataStream in BATCH execution mode scalability advice. Here is the problem that I am trying to solve. Input is an S3 bucket directory with about 500 GB of data across many files. The instance that I am running on only has 50GB of EBS storage. The nature of this data is time series

DataStream API Batch Execution Mode restarting...

2021-05-18 Thread Marco Villalobos
I have a DataStream running in Batch Execution mode within YARN on EMR. My job failed an hour into the job two times in a row because the task manager heartbeat timed out. Can somebody point me out how to restart a job in this situation? I can't find that section of the documentation. thank you.

DataStream Batch Execution Mode and large files.

2021-05-18 Thread Marco Villalobos
ry curious how that happens. Can somebody please explain? Thank you. Marco A. Villalobos

How can I demarcate which event elements are the boundaries of a window?

2021-04-19 Thread Marco Villalobos
I have a tumbling window that aggregates into a process window function. Downstream there is a keyed process function. [window aggregate into process function] -> keyed process function I am not quite sure how the keyed process knows which elements are at the boundary of the window. Is there a m

Re: questions regarding stateful functions

2021-04-07 Thread Marco Villalobos
eam to a stateful function can > emit a message that > in turn will be routed to that function using the data stream integration. > > > On Wed, Apr 7, 2021 at 7:16 PM Marco Villalobos > wrote: > >> Thank you for the clarification. >> >> BUTthere was o

Re: questions regarding stateful functions

2021-04-07 Thread Marco Villalobos
Thank you for the clarification. BUTthere was one question not addressed: Can a stateful function be called by a process function? On Wed, Apr 7, 2021 at 8:19 AM Igal Shilman wrote: > Hello Marco! > > Your understanding is correct, but in addition > You can also use State

questions regarding stateful functions

2021-04-06 Thread Marco Villalobos
another datastream within the data stream job. Is my understanding correct? I was hoping that a stateful function could be called by a process function, is that possible? (I am guessing no). Please let me know. Thank you. Sincerely, Marco A. Villalobos

questions about broadcasts

2021-03-05 Thread Marco Villalobos
Is it possible for an operator to receive two different kinds of broadcasts? Is it possible for an operator to receive two different types of streams and a broadcast? For example, I know there is a KeyedCoProcessFunction, but is there a version of that which can also receive broadcasts?

clarification on backpressure metrics in Apache Flink Dashboard

2021-02-10 Thread Marco Villalobos
given: [source] -> [operator 1] -> [operator 2] -> [sink]. If within the dashboard, operator 1 shows that it has backpressure, does that mean I need to improve the performance of operator 2 in order to alleviate backpressure upon operator 1?

What is the difference between RuntimeContext state and ProcessWindowFunction.Context state when using a ProcessWindowFunction?

2021-02-09 Thread Marco Villalobos
Hi, I am having a difficult time distinguishing the difference between RuntimeContext state and global state when using a ProcessWindowFunction. A ProcessWindowFunction has three access different kinds of state. 1. RuntimeContext state. 2. ProcessWindowFunction.Context global state 3. ProcessWind

Re: threading and distribution

2021-02-05 Thread Marco Villalobos
. On Fri, Feb 5, 2021 at 3:06 AM Marco Villalobos wrote: > as data flows from a source through a pipeline of operators and finally > sinks, is there a means to control how many threads are used within an > operator, and how an operator is distributed across the network? > > Where c

hybrid state backends

2021-02-05 Thread Marco Villalobos
Is it possible to use different statebackends for different operators? There are certain situations where I want the state to reside completely in memory, and other situations where I want it stored in rocksdb.

threading and distribution

2021-02-05 Thread Marco Villalobos
as data flows from a source through a pipeline of operators and finally sinks, is there a means to control how many threads are used within an operator, and how an operator is distributed across the network? Where can I read up on these types of details specifically?

Re: org.codehaus.janino.CompilerFactory cannot be cast ....

2021-02-03 Thread Marco Villalobos
Oh, I found the solution. I simply need to not use TRACE log level for Flink. On Wed, Feb 3, 2021 at 7:07 PM Marco Villalobos wrote: > > Please advise me. I don't know what I am doing wrong. > > After I added the blink table planner to my my dependency managemen

org.codehaus.janino.CompilerFactory cannot be cast ....

2021-02-03 Thread Marco Villalobos
Please advise me. I don't know what I am doing wrong. After I added the blink table planner to my my dependency management: dependency "org.apache.flink:flink-table-planner-blink_${scalaVersion}:${flinkVersion}" and added it as a dependency: implementation "org.apache.flink:flink-table-planner-

Re: Question regarding a possible use case for Iterative Streams.

2021-02-03 Thread Marco Villalobos
our deployment topology for now) and package it with my Data Stream Job? Thank you. > On Feb 2, 2021, at 10:08 PM, Tzu-Li (Gordon) Tai wrote: > > Hi Marco, > > In the ideal setup, enrichment data existing in external databases is > bootstrapped into the streaming job via Fli

Question a possible use can for Iterative Streams.

2021-02-02 Thread Marco Villalobos
Hi everybody, I am brainstorming how it might be possible to perform database enrichment with the DataStream API, use keyed state for caching, and also utilize Async IO. Since AsyncIO does not support keyed state, then is it possible to use an Iterative Stream that uses keyed state for caching in

Re: question on checkpointing

2021-02-01 Thread Marco Villalobos
gt; checkpoint timeout. > > 2) What kind of effects are you worried about? > > On 1/28/2021 8:05 PM, Marco Villalobos wrote: > > Is it possible that checkpointing times out due to an operator taking > > too long? > > > > Also, does windowing affect the checkpoint barriers? > > >

Re: Flink and Amazon EMR

2021-02-01 Thread Marco Villalobos
s working. You would need to analyse what's working slower than > expected. Checkpointing times? (Async duration? Sync duration? Start > delay/back pressure?) Throughput? Recovery/startup? Are you being rate > limited by Amazon? > > Piotrek > > czw., 28 sty 2021 o 03:46 M

Re: flink checkpoints adjustment strategy

2021-01-29 Thread Marco Villalobos
wrote: > Hi Marco > You need to figure out why the checkpoint timed out(you can see the > consumed time of each period for one checkpoint in UI), if it indeed needs > such long time to complete the checkpoint, then you need to configure a > longer timeout. > If there are

flink checkpoints adjustment strategy

2021-01-28 Thread Marco Villalobos
errors during our onTimer calls, I don't know if that's related. Marco A. Villalobos

question on checkpointing

2021-01-28 Thread Marco Villalobos
Is it possible that checkpointing times out due to an operator taking too long? Also, does windowing affect the checkpoint barriers?

Re: presto s3p checkpoints and local stack

2021-01-28 Thread Marco Villalobos
Is it possible to use an environmental credentials provider? On Thu, Jan 28, 2021 at 8:35 AM Arvid Heise wrote: > Hi Marco, > > afaik you don't need HADOOP_HOME or core-site.xml. > > I'm also not sure from where you got your config keys. (I guess from the > Presto p

presto s3p checkpoints and local stack

2021-01-28 Thread Marco Villalobos
. Any advice would be appreciated. -Marco Villalobos

Re: What causes a buffer pool exception? How can I mitigate it?

2021-01-28 Thread Marco Villalobos
at 1:17 AM Arvid Heise wrote: > Hi Marco, > > In general, sending a compressed log to ML is totally fine. You can > further minimize the log by disabling restarts. > I looked into the logs that you provided. > > 2021-01-26 04:37:43,280 INFO org.apache

Flink and Amazon EMR

2021-01-27 Thread Marco Villalobos
Just curious, has anybody had success with Amazon EMR with RocksDB and checkpointing in S3? That's the configuration I am trying to setup, but my system is running more slowly than expected.

stopping with save points

2021-01-27 Thread Marco Villalobos
When I try to stop with a savepoint, I usually get the error below. I have not been able to create a single save point. Please advise. I am using Flink 1.11.0 Draining job "ed51084378323a7d9fb1c4c97c2657df" with a savepoint. The progr

  1   2   >