[jira] [Created] (FLINK-30095) Flink's JobCluster ResourceManager should throw an exception when the failure number of starting worker reaches the maximum failure rate

2022-11-19 Thread zhanglu153 (Jira)
zhanglu153 created FLINK-30095:
--

 Summary: Flink's JobCluster ResourceManager should throw an 
exception when the failure number of starting worker reaches the maximum 
failure rate
 Key: FLINK-30095
 URL: https://issues.apache.org/jira/browse/FLINK-30095
 Project: Flink
  Issue Type: Improvement
Affects Versions: 1.16.0, 1.15.0, 1.14.0, 1.13.0
Reporter: zhanglu153


As shown in https://issues.apache.org/jira/browse/FLINK-10868, even when 
resourcemanager.start-worker.max-failure-rate and 
resourcemanager.start-worker.retry-interval are set, in the worst case, when 
newly started containers consistently fail, the YarnResourceManager goes into 
an infinite resource acquisition loop without failing the job. Resources on 
YARN are repeatedly acquired and released after a period of time, which affects 
other applications.

We should consider having the Flink JobCluster ResourceManager throw an 
exception once the number of worker start failures reaches the configured 
maximum failure rate, instead of sending yet another request to start a new 
worker after the retry interval. Today the job does not fail and stays in the 
RUNNING state, so users may not notice in time that it keeps occupying YARN 
resources, which can prevent other applications from obtaining resources on 
YARN.
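
For context, the two options involved are configured in flink-conf.yaml, for 
example (the values below are only illustrative):
{code}
resourcemanager.start-worker.max-failure-rate: 10.0
resourcemanager.start-worker.retry-interval: 3 s
{code}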



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


RE: Could not connect Amazon S3 file system with different creds at runtime.

2022-11-19 Thread kursat kursatkurt . com
Thank you Martijn, I am watching the issue.

Regards,
Kursat

-Original Message-
From: Martijn Visser  
Sent: Saturday, November 19, 2022 01:01
To: dev@flink.apache.org
Subject: Re: Could not connect Amazon S3 file system with different creds at 
runtime.

Hi,

As I mentioned on Stack Overflow, this is currently not possible. See
https://issues.apache.org/jira/browse/FLINK-19589
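
For completeness: with the bundled S3 filesystem plugins, credentials are
currently supplied statically, for example in flink-conf.yaml (a sketch only;
on AWS, IAM roles are the generally recommended alternative to access keys):

  s3.access-key: ***
  s3.secret-key: ***

Making these configurable per connector/source is what that ticket tracks.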

Best regards,

Martijn

On Fri, Nov 18, 2022 at 8:13 PM kursat kursatkurt.com 
wrote:

> Hi;
>
>
>
> I am new to Flink development.
>
> Is there any way to set S3 credentials at runtime?
>
> How can we connect to 3 or more different S3 buckets (with different creds)?
>
> Let's say you have 3 CSV files on AWS S3, and you want to join them on
> their id fields.
>
>
>
> How can we do this? I don't want to use the flink-conf.yaml file or another
> config file, because the sources can change dynamically, so I need to set
> the creds dynamically.
>
>
>
> I could not get the credentials check to pass for even one CSV file; here is
> the code you can try (Scala):
>
>
>
> object AwsS3CSVTest {
>
>   def main(args: Array[String]): Unit = {
>     val conf = new Configuration()
>     conf.setString("fs.s3a.access.key", "***")
>     conf.setString("fs.s3a.secret.key", "***")
>
>     val env = ExecutionEnvironment.createLocalEnvironment(conf)
>
>     val datafile = env.readCsvFile("s3a://anybucket/anyfile.csv")
>       .ignoreFirstLine()
>       .fieldDelimiter(";")
>       .types(classOf[String], classOf[String], classOf[String],
>         classOf[String], classOf[String], classOf[String])
>
>     datafile.print()
>   }
> }
>
>
>
> I also asked this on Stack Overflow, for reference:
>
>
>
>
> https://stackoverflow.com/questions/74482619/apache-flink-s3-file-system-credentials-does-not-work/
>
>
>
> I want to note that I know I can do this with Spark: you can access the
> HadoopConfiguration and set the creds at runtime:
>
>
>
>   def getAwsS3DF = {
>     val ss = SparkFactory.getSparkSession
>
>     ss.sparkContext.hadoopConfiguration.set("fs.s3a.access.key", "xxx")
>     ss.sparkContext.hadoopConfiguration.set("fs.s3a.secret.key", "xxx")
>
>     val df = ss.read.format("csv")
>       .option("header", true)
>       .option("sep", "\t")
>       .load("s3a://anybucket/anyfile.csv")
>
>     df.show
>   }
>
>
>
> So is there anything I am missing, or is it not possible?
>
>
>
> Thank you.
>
>


[jira] [Created] (FLINK-30096) Rename DynamoDB config destinationTableName to tableName

2022-11-19 Thread Hong Liang Teoh (Jira)
Hong Liang Teoh created FLINK-30096:
---

 Summary: Rename DynamoDB config destinationTableName to tableName
 Key: FLINK-30096
 URL: https://issues.apache.org/jira/browse/FLINK-30096
 Project: Flink
  Issue Type: Improvement
Reporter: Hong Liang Teoh


The word "destination" is redundant, since the config is part of a DynamoDB 
table sink.

Rename destinationTableName to tableName in all places.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-30097) CachedDataStream java example in the document is not correct

2022-11-19 Thread Prabhu Joseph (Jira)
Prabhu Joseph created FLINK-30097:
-

 Summary: CachedDataStream java example in the document is not 
correct
 Key: FLINK-30097
 URL: https://issues.apache.org/jira/browse/FLINK-30097
 Project: Flink
  Issue Type: Bug
  Components: Documentation
Affects Versions: 1.16.0
Reporter: Prabhu Joseph


The CachedDataStream Java example in the documentation is not correct: 
[https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/operators/overview/#datastream-rarr-cacheddatastream]

 
{code:java}
DataStream dataStream = //...
CachedDataStream cachedDataStream = dataStream.cache();{code}

The example shows cache() being invoked on a DataStream instance, but the 
DataStream class does not have a cache() method. The correct usage is to call 
cache() on an instance of DataStreamSource, SideOutputDataStream, or 
SingleOutputStreamOperator.
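
A corrected snippet could look roughly like the following (only a sketch based 
on the 1.16 API; the class name CacheExample and the sample data are 
illustrative, and caching is only expected to work under batch execution mode):
{code:java}
import org.apache.flink.api.common.RuntimeExecutionMode;
import org.apache.flink.streaming.api.datastream.CachedDataStream;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CacheExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Caching is only supported under batch execution mode.
        env.setRuntimeMode(RuntimeExecutionMode.BATCH);

        // cache() is defined on DataStreamSource / SingleOutputStreamOperator /
        // SideOutputDataStream, not on the DataStream base class.
        DataStreamSource<Integer> source = env.fromElements(1, 2, 3);
        CachedDataStream<Integer> cachedDataStream = source.cache();

        cachedDataStream.print();
        env.execute("CachedDataStream example");
    }
}
{code}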



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[DISCUSS] OLM Bundles for Flink Kubernetes Operator

2022-11-19 Thread Hao t Chang
Hi everyone,



A while ago, we created the OLM bundle for the Flink Kubernetes Operator on 
OperatorHub [1]; however, the bundle is not endorsed by the Apache Flink 
community. I would like to ask your opinion on how we can make the bundle more 
official, or even part of the Flink Kubernetes Operator release. Please take a 
look at the document [2], where I have addressed some concerns from the 
community. I will be happy to answer any questions. Thank you.



For people who are not familiar with OLM (Operator Lifecycle Manager): it 
manages the lifecycle of Operators on Kubernetes clusters and comes by default 
with the OpenShift Container Platform. OperatorHub is the online catalog where 
users search for and install Operators with OLM.



[1] https://operatorhub.io/operator/flink-kubernetes-operator
[2] 
https://docs.google.com/document/d/1BiLzfPYb3Thzk01H4teB8_Crqs5FydQYS2Utj11HlLs/edit?usp=sharing

--
Best,
Ted Chang | Software Engineer | htch...@us.ibm.com



Re: [DISCUSS] FLIP-271: Autoscaling

2022-11-19 Thread Chen Qin
Hi Gyula,

Do we think the scaler could be a plugin, or will it be hard-coded?
We have observed some cases the scaler can't address (e.g. async I/O dependency
service degradation, or a small spike that isn't worth restarting the job for).

Thanks,
Chen

On Fri, Nov 18, 2022 at 1:03 AM Gyula Fóra  wrote:

> Hi Dong!
>
> Could you please confirm that your main concerns have been addressed?
>
> Some other minor details that might not have been fully clarified:
>  - Yes, the prototype has been validated on some production workloads.
>  - We are only planning to use metrics that are generally available and have
> previously been accepted as standardized connector metrics (not
> Kafka-specific). This is actually specified in the FLIP.
>  - Even if some metrics (such as pendingRecords) are not accessible, the
> scaling algorithm still works and can be used. For source scaling based on
> utilization alone we still need some trivial modifications on the
> implementation side.
>
> Cheers,
> Gyula
>
> On Thu, Nov 17, 2022 at 5:22 PM Gyula Fóra  wrote:
>
> > Hi Dong!
> >
> > This is not an experimental feature proposal. The implementation of the
> > prototype is still in an experimental phase, but by the time the FLIP,
> > initial prototype and review are done, this should be in a good, stable
> > first version.
> > This proposal is about as general as autoscalers/tuners get, as far as I
> > understand, and there is no history of any alternative effort that even
> > comes close to the applicability of this solution.
> >
> > All large features that were added to Flink in the past have gone through
> > several iterations over the years, and the APIs have evolved as they
> > matured.
> > Something like the autoscaler can only be successful if there is enough
> > user exposure and feedback to make it good; putting it in an external repo
> > will not get us anywhere.
> >
> > We have a prototype implementation ready that works well and it is more or
> > less feature complete. We proposed this FLIP based on something that we see
> > as a working solution, so please do not underestimate the effort that went
> > into this proposal and the validation of the ideas. In this sense our
> > approach here is the same as with the Table Store and Kubernetes Operator
> > and other big components of the past. On the other hand, it's impossible to
> > sufficiently explain all the technical depth/implementation details of such
> > complex components in FLIPs to 100%; I feel we have a good overview of the
> > algorithm in the FLIP and the implementation should cover all remaining
> > questions. We will have an extended code review phase following the FLIP
> > vote before this makes it into the project.
> >
> > I understand your concern regarding the stability of the Flink Kubernetes
> > Operator config and metric names. We have decided not to provide guarantees
> > there yet, but if you feel that it's time for the operator to support such
> > guarantees, please open a separate discussion on that topic; I don't want
> > to mix the two problems here.
> >
> > Regards,
> > Gyula
> >
> > On Thu, Nov 17, 2022 at 5:07 PM Dong Lin  wrote:
> >
> >> Hi Gyula,
> >>
> >> If I understand correctly, this autopilot proposal is an experimental
> >> feature, and its configs/metrics are not mature enough to provide backward
> >> compatibility yet. The proposal provides the high-level ideas of the
> >> algorithm, but it is probably too complicated to explain end-to-end.
> >>
> >> On the one hand, I do agree that having an auto-tuning prototype, even if
> >> not mature, is better than nothing for Flink users. On the other hand, I
> >> am concerned that this FLIP seems a bit too experimental, and starting
> >> with an immature design might make it harder for us to reach a
> >> production-ready and generally applicable auto-tuner in the future. And
> >> introducing too many backward-incompatible changes generally hurts users'
> >> trust in the Flink project.
> >>
> >> One alternative might be to develop and experiment with this feature in a
> >> non-Flink repo. You can iterate fast without worrying about the backward
> >> compatibility requirements that apply to most Flink public features. And
> >> once the feature is reasonably evaluated and mature enough, it will be
> >> much easier to explain the design and address all the issues mentioned
> >> above. For example, Jingsong implemented a Flink Table Store prototype
> >> before proposing FLIP-188 in this thread.
> >>
> >> I don't intend to block your progress. Just my two cents. It will be great
> >> to hear more from other developers (e.g. in the voting thread).
> >>
> >> Thanks,
> >> Dong
> >>
> >>
> >> On Thu, Nov 17, 2022 at 1:24 AM Gyula Fóra  wrote:
> >>
> >> > Hi Dong,
> >> >
> >> > Let me address your comments.
> >> >
> >> > Time for scale / backlog processing time

[jira] [Created] (FLINK-30098) Update DynamoDb Sink unit tests to JUnit 5

2022-11-19 Thread Hong Liang Teoh (Jira)
Hong Liang Teoh created FLINK-30098:
---

 Summary: Update DynamoDb Sink unit tests to JUnit 5
 Key: FLINK-30098
 URL: https://issues.apache.org/jira/browse/FLINK-30098
 Project: Flink
  Issue Type: Improvement
  Components: Connectors / DynamoDB
Reporter: Hong Liang Teoh


Use JUnit 5 in all unit tests for the DynamoDB sink.
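
A minimal sketch of the target style (JUnit 5 plus AssertJ, which the Flink 
code base generally uses for new tests); the class name and assertion below 
are only illustrative placeholders:
{code:java}
import static org.assertj.core.api.Assertions.assertThat;

import org.junit.jupiter.api.Test;

/** Illustrative class name only; the real tests exercise the DynamoDB sink classes. */
class DynamoDbSinkMigrationExampleTest {

    @Test
    void tableNameShouldBeConfigurable() {
        // Placeholder assertion standing in for assertions against the sink builder.
        assertThat("user-table").isNotBlank();
    }
}
{code}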



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-30099) Add tests to cover model data APIs for all existing algorithms

2022-11-19 Thread Dong Lin (Jira)
Dong Lin created FLINK-30099:


 Summary: Add tests to cover model data APIs for all existing 
algorithms
 Key: FLINK-30099
 URL: https://issues.apache.org/jira/browse/FLINK-30099
 Project: Flink
  Issue Type: Improvement
Reporter: Dong Lin


test_linearsvc.py should be updated to cover get_model_data() and 
set_model_data() usage. The same applies to the other existing algorithms in 
Flink ML.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)