Re: [DISCUSS] Hadoop 2 vs Hadoop 3 usage

2024-01-15 Thread Yang Wang
I could share some metrics about Alibaba Cloud EMR clusters. The ratio of Hadoop 2 vs Hadoop 3 is 1:3. Best, Yang On Thu, Dec 28, 2023 at 8:16 PM Martijn Visser wrote: > Hi all, > > I want to get some insights on how many users are still using Hadoop 2 > vs how many users are us

[DISCUSS] Hadoop 2 vs Hadoop 3 usage

2023-12-28 Thread Martijn Visser
Hi all, I want to get some insights on how many users are still using Hadoop 2 vs how many users are using Hadoop 3. Flink currently requires a minimum version of Hadoop 2.10.2 for certain features, but also extensively uses Hadoop 3 (like for the file system implementations). Hadoop 2 has a

Re: Hadoop Error on ECS Fargate

2023-07-17 Thread Martijn Visser
Hi Mengxi Wang, Which Flink version are you using? Best regards, Martijn On Thu, Jul 13, 2023 at 3:21 PM Wang, Mengxi X via user < user@flink.apache.org> wrote: > Hi community, > > > > We got this Kerberos error with Hadoop as file system on ECS Fargate > deplo

Hadoop Error on ECS Fargate

2023-07-13 Thread Wang, Mengxi X via user
Hi community, We got this Kerberos error with Hadoop as file system on ECS Fargate deployment. Caused by: org.apache.hadoop.security.KerberosAuthException: failure to login: javax.security.auth.login.LoginException: java.lang.NullPointerException: invalid null input: name Caused by

Re: fail to mount hadoop-config-volume when using flink-k8s-operator

2022-10-13 Thread Yang Wang
Currently, exporting the env "HADOOP_CONF_DIR" only works for the native K8s integration. The flink client will try to create the hadoop-config-volume automatically if the hadoop env is found. If you want to set the HADOOP_CONF_DIR in the docker image, please also make sure the specified h

fail to mount hadoop-config-volume when using flink-k8s-operator

2022-10-12 Thread Liting Liu (litiliu)
Hi, community: I'm using flink-k8s-operator v1.2.0 to deploy a flink job. And the "HADOOP_CONF_DIR" environment variable was set in the image that I built from flink:1.15. I found the taskmanager pod was trying to mount a volume named "hadoop-config-volume"

Setting boundedness for legacy Hadoop sequence file sources

2022-05-03 Thread Ken Krugler
Hi all, I’m converting several batch Flink workflows to streaming, with bounded sources. Some of our sources are reading Hadoop sequence files via StreamExecutionEnvironment.createInput(HadoopInputFormat). The problem is that StreamGraphGenerator.existsUnboundedSource is returning true
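
For reference, a minimal sketch of the legacy read path being discussed, with key/value types borrowed from a related thread below; the input path is illustrative, not Ken's actual code:

```java
import org.apache.flink.hadoopcompatibility.HadoopInputs;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.hadoop.io.ByteWritable;
import org.apache.hadoop.io.Text;

public class SequenceFileSource {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Legacy input-format source: StreamGraphGenerator.existsUnboundedSource()
        // reports this as unbounded even though the underlying files are finite.
        env.createInput(HadoopInputs.readSequenceFile(
                Text.class, ByteWritable.class, "hdfs:///data/events")) // illustrative path
           .print();
        env.execute("sequence-file-read");
    }
}
```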

Re: [statefun] hadoop dependencies and StatefulFunctionsConfigValidator

2022-03-09 Thread Igal Shilman
s these old protobuf dependencies to get loaded over statefun's >> protobuf-java 3.7.1, and NoSuchMethod exceptions occur. >> > >> > We hacked together a version of statefun that doesn't perform the check >> whether the classloader settings contain the three patt

Re: [statefun] hadoop dependencies and StatefulFunctionsConfigValidator

2022-03-08 Thread Filip Karnicki
loaded over statefun's > protobuf-java 3.7.1, and NoSuchMethod exceptions occur. > > > > We hacked together a version of statefun that doesn't perform the check > whether the classloader settings contain the three patterns from above, and > as long as our job uses p

Re: [statefun] hadoop dependencies and StatefulFunctionsConfigValidator

2022-03-08 Thread Roman Khachatryan
uses protobuf-java 3.7.1 and the com.google.protobuf > pattern is not present in the classloader.parent-first-patterns.additional > setting, then all is well. > > Aside from removing old hadoop from the classpath, which may not be possible > given that it's a shared cluster,

[statefun] hadoop dependencies and StatefulFunctionsConfigValidator

2022-03-04 Thread Filip Karnicki
ntain the three patterns from above, and as long as our job uses protobuf-java 3.7.1 and the com.google.protobuf pattern is not present in the classloader.parent-first-patterns.additional setting, then all is well. Aside from removing old hadoop from the classpath, which may not be possible gi

Re: [DISCUSS] Changing the minimal supported version of Hadoop

2022-01-03 Thread David Morávek
As there were no strong objections, we'll proceed with bumping the Hadoop version to 2.8.5 and removing the safeguards and the CI for any earlier versions. This will effectively make Hadoop 2.8.5 the lowest supported version in Flink 1.15. Best, D. On Thu, Dec 23, 2021 at 11:03 AM

Re: [DISCUSS] Changing the minimal supported version of Hadoop

2021-12-23 Thread Till Rohrmann
If there are no users strongly objecting to dropping Hadoop support for < 2.8, then I am +1 for this since otherwise we won't gain a lot as Xintong said. Cheers, Till On Wed, Dec 22, 2021 at 10:33 AM David Morávek wrote: > Agreed, if we drop the CI for lower versions, there is

Re: [DISCUSS] Changing the minimal supported version of Hadoop

2021-12-22 Thread David Morávek
Agreed, if we drop the CI for lower versions, there is actually no point in having safeguards, as we can't really test for them. Maybe one more thought (it's more of a feeling): I feel that users running really old Hadoop versions are usually slower to adopt (they most likely use what t

Re: [DISCUSS] Changing the minimal supported version of Hadoop

2021-12-21 Thread Xintong Song
Sorry to join the discussion late. +1 for dropping support for hadoop versions < 2.8 from my side. TBH, wrapping the reflection-based logic with safeguards sounds a bit neither fish nor fowl to me. It weakens the major benefits that we look for by dropping support for early versions. -

Re: [DISCUSS] Changing the minimal supported version of Hadoop

2021-12-21 Thread David Morávek
CC user@f.a.o Is anyone aware of something that blocks us from doing the upgrade? D. On Tue, Dec 21, 2021 at 5:50 PM David Morávek wrote: > Hi Martijn, > > from personal experience, most Hadoop users are lagging behind the release > lines by a lot, because upgrading a Hadoop cl

Re: Passing arbitrary Hadoop s3a properties from FileSystem SQL Connector options

2021-12-15 Thread Arvid Heise
he effort will be >> tracked under the following ticket: >> >> https://issues.apache.org/jira/browse/FLINK-19589 >> >> I will loop-in Arvid (in CC) who might help you in contributing the >> missing functionality. >> >> Regards, >> Timo >>

Re: Passing arbitrary Hadoop s3a properties from FileSystem SQL Connector options

2021-12-13 Thread Timothy James
> Regards, > Timo > > > On 10.12.21 23:48, Timothy James wrote: > > Hi, > > > > The Hadoop s3a library itself supports some properties we need, but the > > "FileSystem SQL Connector" (via FileSystemTableFactory) does not pass > > connec

Re: Passing arbitrary Hadoop s3a properties from FileSystem SQL Connector options

2021-12-13 Thread Timo Walther
23:48, Timothy James wrote: Hi, The Hadoop s3a library itself supports some properties we need, but the "FileSystem SQL Connector" (via FileSystemTableFactory) does not pass connector options for these to the "Hadoop/Presto S3 File Systems plugins" (via S3FileSystemFacto

Passing arbitrary Hadoop s3a properties from FileSystem SQL Connector options

2021-12-10 Thread Timothy James
Hi, The Hadoop s3a library itself supports some properties we need, but the "FileSystem SQL Connector" (via FileSystemTableFactory) does not pass connector options for these to the "Hadoop/Presto S3 File Systems plugins" (via S3FileSystemFactory). Instead, only Job-global Fli
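
A short sketch of the limitation: options in a table's WITH clause do not reach the s3a filesystem (the tracking ticket FLINK-19589 is mentioned in the replies above), so such settings currently have to be applied job-globally instead. Table name and bucket are illustrative:

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class FileSystemConnectorS3a {
    public static void main(String[] args) {
        TableEnvironment tEnv = TableEnvironment.create(EnvironmentSettings.inStreamingMode());
        // s3a-specific options placed in this WITH clause are NOT forwarded to the
        // Hadoop/Presto S3 filesystem plugins; only job-global Flink config (s3.* keys) is.
        tEnv.executeSql(
                "CREATE TABLE sink (msg STRING) WITH ("
                        + " 'connector' = 'filesystem',"
                        + " 'path' = 's3a://my-bucket/out'," // illustrative bucket
                        + " 'format' = 'json'"
                        + ")");
    }
}
```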

Re: Issue with Flink jobs after upgrading to Flink 1.13.1/Ververica 2.5 - java.lang.NoClassDefFoundError: org/apache/hadoop/conf/Configuration

2021-12-08 Thread Natu Lauchande
> Something you could try is removing the packaged parquet format and > defining a custom format[1]. For this custom format you can then fix the > dependencies by packaging all of the following into the format: > > * flink-sql-parquet > * flink-shaded-hadoop-2-uber > * hadoop-

Re: Issue with Flink jobs after upgrading to Flink 1.13.1/Ververica 2.5 - java.lang.NoClassDefFoundError: org/apache/hadoop/conf/Configuration

2021-12-08 Thread Ingo Bürk
Hi Natu, Something you could try is removing the packaged parquet format and defining a custom format[1]. For this custom format you can then fix the dependencies by packaging all of the following into the format: * flink-sql-parquet * flink-shaded-hadoop-2-uber * hadoop-aws * aws-java-sdk

Re: Issue with Flink jobs after upgrading to Flink 1.13.1/Ververica 2.5 - java.lang.NoClassDefFoundError: org/apache/hadoop/conf/Configuration

2021-12-06 Thread Natu Lauchande
: building the image with hadoop-client libraries) : java.lang.NoClassDefFoundError: org/apache/hadoop/conf/Configuration at java.lang.Class.getDeclaredConstructors0(Native Method) at java.lang.Class.privateGetDeclaredConstructors(Class.java:2671) at java.lang.Class.getDeclaredConstructors

Re: Replacing S3 Client in Hadoop plugin

2021-11-23 Thread Martijn Visser
Hi Tamir, Thanks for providing the information. I don't know of a solution right now; perhaps some other user has an idea, but I do find your input valuable for future improvements with regards to the S3 Client in Hadoop. Best regards, Martijn On Fri, 19 Nov 2021 at 09:21, Tamir

Replacing S3 Client in Hadoop plugin

2021-11-19 Thread Tamir Sagi
Hey Martijn, sorry for the late response. We wanted to replace the default client with our custom S3 client and not use the AmazonS3Client provided by the plugin. We used flink-s3-fs-hadoop v1.12.2 and for our needs we had to upgrade to v1.14.0 [1]. The AmazonS3 client factory is initialized[2] - if

Re: Replacing S3 Client in Hadoop plugin

2021-10-13 Thread Martijn Visser
Hi, Could you elaborate on why you would like to replace the S3 client? Best regards, Martijn On Wed, 13 Oct 2021 at 17:18, Tamir Sagi wrote: > I found the dependency > > > org.apache.hadoop > hadoop-aws > 3.3.1 > > > apparently it's

Re: Replacing S3 Client in Hadoop plugin

2021-10-13 Thread Tamir Sagi
I found the dependency org.apache.hadoop hadoop-aws 3.3.1 apparently it's possible; there is a method setAmazonS3Client. I think I found the solution. Thanks. Tamir. From: Tamir Sagi Sent: Wednesday, October 13, 2021 5:44 PM To: user

Replacing S3 Client in Hadoop plugin

2021-10-13 Thread Tamir Sagi
Hey community. I would like to know if there is any way to replace the S3 client in the Hadoop plugin[1] with a custom client (AmazonS3). I did notice that the Hadoop plugin supports replacing the implementation of S3AFileSystem using "fs.s3a.impl" (in flink-conf.yaml it will be "s3.imp
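
A bare sketch of the S3AFileSystem replacement route described here; the subclass is hypothetical, and the jar containing it would have to be visible to the flink-s3-fs-hadoop plugin's classloader:

```java
import org.apache.hadoop.fs.s3a.S3AFileSystem;

// Hypothetical replacement, registered via "s3.impl" in flink-conf.yaml,
// which the plugin forwards to Hadoop as "fs.s3a.impl".
public class CustomS3AFileSystem extends S3AFileSystem {
    // Overriding initialize(URI, Configuration) is one place to install
    // a custom AmazonS3 client instead of the default one.
}
```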

Re: Can anyone help me? How to connect offline hadoop cluster and realtime hadoop cluster by different hive catalog?

2021-08-26 Thread Caizhi Weng
Hi! It seems that your Flink cluster cannot connect to realtime-cluster-master001/xx.xx.xx.xx:8050. Please check your network and port status. Jim Chen wrote on Fri, Aug 27, 2021 at 2:20 PM: > Hi, All > My flink version is 1.13.1 and my company has two hadoop clusters, > an offline hadoop cluster and

Can anyone help me? How to connect offline hadoop cluster and realtime hadoop cluster by different hive catalog?

2021-08-26 Thread Jim Chen
Hi, All My flink version is 1.13.1 and my company has two hadoop clusters, an offline hadoop cluster and a realtime hadoop cluster. Now, on the realtime hadoop cluster, we want to submit a flink job to connect to the offline hadoop cluster by a different hive catalog. I use a different hive configuration directory in
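
A minimal Table API sketch of the two-catalog setup being asked about, assuming each directory holds the hive-site.xml (plus matching client-side cluster config) for its cluster; names and paths are illustrative:

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;
import org.apache.flink.table.catalog.hive.HiveCatalog;

public class TwoHiveCatalogs {
    public static void main(String[] args) {
        TableEnvironment tEnv = TableEnvironment.create(EnvironmentSettings.inBatchMode());
        // One HiveCatalog per cluster, each pointing at its own conf directory
        tEnv.registerCatalog("offline_hive",
                new HiveCatalog("offline_hive", "default", "/etc/hive-conf/offline"));
        tEnv.registerCatalog("realtime_hive",
                new HiveCatalog("realtime_hive", "default", "/etc/hive-conf/realtime"));
        tEnv.useCatalog("offline_hive");
    }
}
```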

Re: Issue with Flink jobs after upgrading to Flink 1.13.1/Ververica 2.5 - java.lang.NoClassDefFoundError: org/apache/hadoop/conf/Configuration

2021-07-23 Thread Flavio Pompermaier
ntil then, a > workaround could be to add Hadoop manually and set the HADOOP_CLASSPATH > environment variable. The root cause seems that Hadoop cannot be found. > > Alternatively, you could also build a custom image and include Hadoop in > the lib folder of Flink: > > https:/

Re: Issue with Flink jobs after upgrading to Flink 1.13.1/Ververica 2.5 - java.lang.NoClassDefFoundError: org/apache/hadoop/conf/Configuration

2021-07-22 Thread Timo Walther
Thanks, this should definitely work with the pre-packaged connectors of Ververica platform. I guess we have to investigate what is going on. Until then, a workaround could be to add Hadoop manually and set the HADOOP_CLASSPATH environment variable. The root cause seems that Hadoop cannot be

Re: Issue with Flink jobs after upgrading to Flink 1.13.1/Ververica 2.5 - java.lang.NoClassDefFoundError: org/apache/hadoop/conf/Configuration

2021-07-22 Thread Natu Lauchande
DDL? > > Regards, > Timo > > > On 22.07.21 14:11, Natu Lauchande wrote: > > Hey Timo, > > > > Thanks for the reply. > > > > No custom file as we are using Flink SQL and submitting the job directly > > through the SQL Editor UI. We are using Flink 1

Re: Issue with Flink jobs after upgrading to Flink 1.13.1/Ververica 2.5 - java.lang.NoClassDefFoundError: org/apache/hadoop/conf/Configuration

2021-07-22 Thread Timo Walther
1.13.1 as the supported flink version. No custom code all through Flink SQL on UI no jars. Thanks, Natu On Thu, Jul 22, 2021 at 2:08 PM Timo Walther <mailto:twal...@apache.org>> wrote: Hi Natu, Ververica Platform 2.5 has updated the bundled Hadoop version but this should n

Re: Issue with Flink jobs after upgrading to Flink 1.13.1/Ververica 2.5 - java.lang.NoClassDefFoundError: org/apache/hadoop/conf/Configuration

2021-07-22 Thread Natu Lauchande
Walther wrote: > Hi Natu, > > Ververica Platform 2.5 has updated the bundled Hadoop version but this > should not result in a NoClassDefFoundError exception. How are you > submitting your SQL jobs? You don't use Ververica's SQL service but have > built a regular JAR file

Re: Issue with Flink jobs after upgrading to Flink 1.13.1/Ververica 2.5 - java.lang.NoClassDefFoundError: org/apache/hadoop/conf/Configuration

2021-07-22 Thread Timo Walther
Hi Natu, Ververica Platform 2.5 has updated the bundled Hadoop version but this should not result in a NoClassDefFoundError exception. How are you submitting your SQL jobs? You don't use Ververica's SQL service but have built a regular JAR file, right? If this is the case, can you

Issue with Flink jobs after upgrading to Flink 1.13.1/Ververica 2.5 - java.lang.NoClassDefFoundError: org/apache/hadoop/conf/Configuration

2021-07-22 Thread Natu Lauchande
FAILED on 10.243.3.0:42337-2a3224 @ 10-243-3-0.flink-metrics.vvp-jobs.svc.cluster.local (dataPort=39309). java.lang.NoClassDefFoundError: org/apache/hadoop/conf/Configuration at java.lang.Class.getDeclaredConstructors0(Native Method) ~[?:1.8.0_292] at

Re: Failure running Flink locally with flink-s3-fs-hadoop + AWS SDK v2 as a dependency

2021-07-20 Thread Yaroslav Tkachenko
ven called. Could this be a LocalStreamEnvironment limitation? Is there any way to enable plugin loading locally? Thanks! On 2021/06/21 11:13:29, Yuval Itzchakov wrote: > Currently I have the s3-hadoop dependency in my build.sbt. > > I guess I need to move it to the PLUGIN direct

Flink Hive connector: hive-conf-dir supports hdfs URI, while hadoop-conf-dir supports local path only?

2021-04-26 Thread Yik San Chan
Hi community, This question is cross-posted on Stack Overflow https://stackoverflow.com/questions/67264156/flink-hive-connector-hive-conf-dir-supports-hdfs-uri-while-hadoop-conf-dir-sup In my current setup, local dev env can access testing env. I would like to run Flink job on local dev env

Re: Flink Hadoop config on docker-compose

2021-04-22 Thread Matthias Pohl
>> options. It is only used to construct the classpath for the JM/TM process. >> However, in "HadoopUtils"[2] we do not support getting the hadoop >> configuration from classpath. >> >> >> [1]. >> https://github.com/apache/flink/blob/release-1.11/fli

Re: Flink Hadoop config on docker-compose

2021-04-22 Thread Flavio Pompermaier
16, 2021 at 4:52 AM Yang Wang wrote: >> >>> It seems that we do not export HADOOP_CONF_DIR as environment variables >>> in current implementation, even though we have set the env.xxx flink config >>> options. It is only used to construct the classpath for the JM/TM

Re: Flink Hadoop config on docker-compose

2021-04-16 Thread Flavio Pompermaier
sed to construct the classpath for the JM/TM process. > However, in "HadoopUtils"[2] we do not support getting the hadoop > configuration from classpath. > > > [1]. > https://github.com/apache/flink/blob/release-1.11/flink-dist/src/main/flink-bin/bin/config.sh#L256 > [2].

Re: Flink Hadoop config on docker-compose

2021-04-15 Thread Yang Wang
It seems that we do not export HADOOP_CONF_DIR as environment variables in current implementation, even though we have set the env.xxx flink config options. It is only used to construct the classpath for the JM/TM process. However, in "HadoopUtils"[2] we do not support getting

Re: Flink Hadoop config on docker-compose

2021-04-15 Thread Flavio Pompermaier
Hi Robert, indeed my docker-compose only works if I also add the Hadoop and YARN home variables, while I was expecting those two variables to be generated automatically just by setting env.xxx variables in the FLINK_PROPERTIES variable. I just want to understand what to expect, if I really need to specify

Re: Flink Hadoop config on docker-compose

2021-04-15 Thread Robert Metzger
Hi, I'm not aware of any known issues with Hadoop and Flink on Docker. I also tried what you are doing locally, and it seems to work: flink-jobmanager| 2021-04-15 18:37:48,300 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint[] - Starting StandaloneSessionClusterEntry

Flink Hadoop config on docker-compose

2021-04-14 Thread Flavio Pompermaier
Hi everybody, I'm trying to set up reading from HDFS using docker-compose and Flink 1.11.3. If I pass 'env.hadoop.conf.dir' and 'env.yarn.conf.dir' using FLINK_PROPERTIES (under environment section of the docker-compose service) I see in the logs the following line

Re: Hadoop is not in the classpath/dependencies

2021-03-30 Thread Chesnay Schepler
This looks related to HDFS-12920; where Hadoop 2.X tries to read a duration from hdfs-default.xml expecting plain numbers, but in 3.x they also contain time units. On 3/30/2021 9:37 AM, Matthias Seiler wrote: Thank you all for the replies! I did as @Maminspapin suggested and indeed the
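
A small repro sketch of that incompatibility (the key comes from HDFS-12920; run it against Hadoop 2.x client jars to see the failure):

```java
import org.apache.hadoop.conf.Configuration;

public class DurationParseRepro {
    public static void main(String[] args) {
        Configuration conf = new Configuration(false);
        // Hadoop 3's hdfs-default.xml ships this duration with a time unit ...
        conf.set("dfs.client.datanode-restart.timeout", "30s");
        // ... while Hadoop 2.x code paths still read it as a plain long, like this:
        conf.getLong("dfs.client.datanode-restart.timeout", 30L); // NumberFormatException: "30s"
    }
}
```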

Re: Hadoop is not in the classpath/dependencies

2021-03-30 Thread Matthias Seiler
t;30s" // this is thrown by the flink-shaded-hadoop library ``` I thought that it relates to the windowing I do, which has a slide interval of 30 seconds, but removing it displays the same error. I also added the dependency to the maven pom, but without effect. Since I use Hadoop 3.2.1, I also t

Re: Hadoop is not in the classpath/dependencies

2021-03-26 Thread Robert Metzger
Hey Matthias, Maybe the classpath contains hadoop libraries, but not the HDFS libraries? The "DistributedFileSystem" class needs to be accessible to the classloader. Can you check if that class is available? Best, Robert On Thu, Mar 25, 2021 at 11:10 AM Matthias Seiler <
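
A one-line probe for the check Robert suggests, run with the same classpath as the Flink process:

```java
public class HdfsClasspathProbe {
    public static void main(String[] args) throws ClassNotFoundException {
        // Fails with ClassNotFoundException if the common Hadoop jars are present
        // but the HDFS client (hadoop-hdfs) is missing from the classpath.
        Class.forName("org.apache.hadoop.hdfs.DistributedFileSystem");
        System.out.println("DistributedFileSystem is on the classpath");
    }
}
```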

Re: Hadoop is not in the classpath/dependencies

2021-03-25 Thread Maminspapin
I downloaded the lib (latest version) from here: https://repo.maven.apache.org/maven2/org/apache/flink/flink-shaded-hadoop-2-uber/2.8.3-7.0/ and put it in the flink_home/lib directory. It helped.

Re: Hadoop is not in the classpath/dependencies

2021-03-25 Thread Maminspapin
I have the same problem ...

Hadoop is not in the classpath/dependencies

2021-03-25 Thread Matthias Seiler
Hello everybody, I set up a Flink (1.12.1) and Hadoop (3.2.1) cluster on two machines. The job should store the checkpoints on HDFS like so: ```java StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); env.enableCheckpointing(15000
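
For context, a sketch of the setup described, on the Flink 1.12 API; the namenode address is illustrative, and the HDFS client must be on the classpath for the scheme to resolve:

```java
import org.apache.flink.runtime.state.filesystem.FsStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class HdfsCheckpointing {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.enableCheckpointing(15000);
        // Resolving "hdfs://" requires the Hadoop/HDFS client jars; otherwise Flink
        // reports "Hadoop is not in the classpath/dependencies".
        env.setStateBackend(new FsStateBackend("hdfs://namenode:9000/flink/checkpoints"));
        // ... define the job, then env.execute(...)
    }
}
```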

Re: Hadoop Integration Link broken in downloads page

2021-03-10 Thread Till Rohrmann
Thanks a lot for reporting this problem, Debraj. I've created a JIRA issue for it [1]. [1] https://issues.apache.org/jira/browse/FLINK-21723 Cheers, Till On Tue, Mar 9, 2021 at 5:28 AM Debraj Manna wrote: > Hi > > It appears the Hadoop Integration > <https://ci.apache.org/

Hadoop Integration Link broken in downloads page

2021-03-08 Thread Debraj Manna
Hi It appears the Hadoop Integration <https://ci.apache.org/projects/flink/flink-docs-release-1.12/ops/deployment/hadoop.html> link is broken on the downloads <https://flink.apache.org/downloads.html> page. Apache Flink® 1.12.2 is our latest stable release. > If you plan to use Apache

[DISCUSS] Removal of flink-swift-fs-hadoop module

2021-01-26 Thread Robert Metzger
Hi all, during a security maintenance PR [1], Chesnay noticed that the flink-swift-fs-hadoop module is lacking test coverage [2]. Also, there hasn't been any substantial change since 2018, when it was introduced. On the user@ ML, I could not find any proof of significant use of the module (n

Re: Issues with Flink Batch and Hadoop dependency

2020-08-31 Thread Arvid Heise
Hi Dan, Your approach in general is good. You might want to use the bundled hadoop uber jar [1] to save some time if you find the appropriate version. You can also build your own version and include it then in lib/. In general, I'd recommend moving away from sequence files. As soon as you c

Re: Issues with Flink Batch and Hadoop dependency

2020-08-29 Thread Dan Hill
I was able to get a basic version to work by including a bunch of hadoop and s3 dependencies in the job jar and hacking in some hadoop config values. It's probably not optimal but it looks like I'm unblocked. On Fri, Aug 28, 2020 at 12:11 PM Dan Hill wrote: > I'm assumi

Issues with Flink Batch and Hadoop dependency

2020-08-28 Thread Dan Hill
b to read these Sequence files, I get the following error: NoClassDefFoundError: org/apache/hadoop/mapred/FileInputFormat It fails on this readSequenceFile. env.createInput(HadoopInputs.readSequenceFile(Text.class, ByteWritable.class, INPUT_FILE)) If I directly depend on org-apache-hadoop/had

Re: Flink S3 Hadoop dependencies

2020-08-14 Thread Chesnay Schepler
Saley wrote: Hi team, Was there a reason for not shading hadoop-common https://github.com/apache/flink/commit/e1e7d7f7ecc080c850a264021bf1b20e3d27d373#diff-e7b798a682ee84ab804988165e99761cR38-R44 ? This is leaking lots of classes such as guava and causing issues in our flink application. I see

Flink S3 Hadoop dependencies

2020-08-14 Thread Satish Saley
Hi team, Was there a reason for not shading hadoop-common https://github.com/apache/flink/commit/e1e7d7f7ecc080c850a264021bf1b20e3d27d373#diff-e7b798a682ee84ab804988165e99761cR38-R44 ? This is leaking lots of classes such as guava and causing issues in our flink application. I see that hadoop

Re: Hadoop FS when running standalone

2020-07-16 Thread Lorenzo Nicora
Thanks Alessandro, I think I solved it. I cannot set any HADOOP_HOME as I have no Hadoop installed on the machine running my tests. But adding *org.apache.flink:flink-shaded-hadoop-2:2.8.3-10.0* as a compile dependency to the Maven profile building the standalone version fixed the issue. Lorenzo

Re: Hadoop FS when running standalone

2020-07-16 Thread Alessandro Solimando
Hi Lorenzo, IIRC I had the same error message when trying to write snappified parquet on HDFS with a standalone fat jar. Flink could not "find" the hadoop native/binary libraries (specifically I think for me the issue was related to snappy), because my HADOOP_HOME was not (properly) se

Hadoop FS when running standalone

2020-07-16 Thread Lorenzo Nicora
Hi I need to run my streaming job as a *standalone* Java application, for testing The job uses the Hadoop S3 FS and I need to test it (not a unit test). The job works fine when deployed (I am using AWS Kinesis Data Analytics, so Flink 1.8.2) I have *org.apache.flink:flink-s3-fs-hadoop* as a

Re: Dockerised Flink 1.8 with Hadoop S3 FS support

2020-07-03 Thread Yang Wang
Hi Lorenzo, Since Flink 1.8 does not support plugin mechanism to load filesystem, you need to copy flink-s3-fs-hadoop-*.jar from opt to lib directory. The dockerfile could be like following. FROM flink:1.8-scala_2.11 RUN cp /opt/flink/opt/flink-s3-fs-hadoop-*.jar /opt/flink/lib Then build you

Dockerised Flink 1.8 with Hadoop S3 FS support

2020-07-02 Thread Lorenzo Nicora
cluster support for S3 Hadoop File System (s3a://), we have on KDA out of the box. Note I do not want to add dependencies to the job directly, as I want to deploy locally exactly the same JAR I deploy to KDA. Flink 1.8 docs [1] say is supported out of the box but does not look to be the case for

Re: flink-s3-fs-hadoop retry configuration

2020-06-17 Thread Jeff Henrikson
23456 into the flink-conf.yaml file results in the following DEBUG log output: 2020-05-08 16:20:47,461 DEBUG org.apache.flink.fs.s3hadoop.common.HadoopConfigLoader       [] - Adding Flink config entry for s3.connection.maximum as fs.s3a.connection.maximum to Hadoop config I guess that i

Re: Testing jobs locally against secure Hadoop cluster

2020-05-11 Thread Khachatryan Roman
Hi Őrhidi, Can you please provide some details about the errors you get? Regards, Roman On Mon, May 11, 2020 at 9:32 AM Őrhidi Mátyás wrote: > Dear Community, > > I'm having troubles testing jobs against a secure Hadoop cluster. Is that > possible? The mini cluster seem

Testing jobs locally against secure Hadoop cluster

2020-05-11 Thread Őrhidi Mátyás
Dear Community, I'm having trouble testing jobs against a secure Hadoop cluster. Is that possible? The mini cluster seems to not load any security modules. Thanks, Matyas
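
For what it's worth, a hedged sketch of installing Flink's Kerberos security context by hand before running locally; keytab and principal are placeholders, and whether this is enough for a given mini-cluster setup is untested here:

```java
import org.apache.flink.configuration.Configuration;
import org.apache.flink.runtime.security.SecurityConfiguration;
import org.apache.flink.runtime.security.SecurityUtils;

public class SecureLocalRun {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.setString("security.kerberos.login.keytab", "/path/to/test.keytab"); // placeholder
        conf.setString("security.kerberos.login.principal", "flink@EXAMPLE.COM"); // placeholder
        // Installs the security modules that a plain local run does not load
        SecurityUtils.install(new SecurityConfiguration(conf));
        // ... then build and execute the job inside the installed context
    }
}
```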

Re: flink-s3-fs-hadoop retry configuration

2020-05-08 Thread Robert Metzger
fs.s3a.connection.maximum to Hadoop config I guess that is the recommended way of passing configuration into the S3 connectors of Flink. You also asked how to detect retries: DEBUG-log level is helpful again. I just tried connecting against an invalid port, and got these messages: 2020-05-08 16:26

Re: flink-s3-fs-hadoop retry configuration

2020-05-08 Thread Robert Metzger
Hey Jeff, Which Flink version are you using? Have you tried configuring the S3 filesystem via Flink's config yaml? Afaik all config parameters prefixed with "s3." are mirrored into the Hadoop file system connector. On Mon, May 4, 2020 at 8:45 PM Jeff Henrikson wrote: > &
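
Following that description, a sketch of tuning retry/connection behaviour through the mirrored s3.* keys; the underlying fs.s3a.* names come from hadoop-aws and may differ between versions:

```java
import org.apache.flink.configuration.Configuration;
import org.apache.flink.core.fs.FileSystem;

public class S3RetrySettings {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        conf.setString("s3.attempts.maximum", "10");   // mirrored to fs.s3a.attempts.maximum
        conf.setString("s3.connection.maximum", "64"); // mirrored to fs.s3a.connection.maximum
        // Standalone/test usage; on a cluster these keys belong in flink-conf.yaml
        FileSystem.initialize(conf, null);
    }
}
```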

Overriding hadoop core-site.xml keys using the flink-fs-hadoop-shaded assemblies

2020-05-05 Thread Jeff Henrikson
Has anyone had success overriding hadoop core-site.xml keys using the flink-fs-hadoop-shaded assemblies? If so, what versions were known to work? Using btrace, I am seeing a bug in the hadoop shaded dependencies distributed with 1.10.0. Some (but not all) of the core-site.xml keys cannot be

Flink - Hadoop Connectivity - Unable to read file

2020-05-05 Thread Samik Mukherjee
Hi All, I am trying to read a file from HDFS, which is locally installed, but I am not able to. I tried both of these ways, but each time the program ends with "Process finished with exit code 239." Any help would be appreciated- public class Processor { public static void main(String[]
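
Since the snippet is cut off, here is a minimal self-contained variant of the kind of program described; the HDFS URI is an assumption for a local single-node install:

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class Processor {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Assumes a single-node HDFS on the default RPC port; adjust host/port/path
        env.readTextFile("hdfs://localhost:9000/user/test/input.txt")
           .print();
        env.execute("hdfs-read-test");
    }
}
```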

Re: flink-s3-fs-hadoop retry configuration

2020-05-04 Thread Jeff Henrikson
> 2) How can I tell if flink-s3-fs-hadoop is actually managing to pick up > the hadoop configuration I have provided, as opposed to some separate > default configuration? I'm reading the docs and source of flink-fs-hadoop-shaded. I see that core-default-shaded.xml has fs.s3a.conn

flink-s3-fs-hadoop retry configuration

2020-05-01 Thread Jeff Henrikson
Hello Flink users, I could use help with three related questions: 1) How can I observe retries in the flink-s3-fs-hadoop connector? 2) How can I tell if flink-s3-fs-hadoop is actually managing to pick up the hadoop configuration I have provided, as opposed to some separate default

Re: Hadoop user jar for flink 1.9 plus

2020-03-20 Thread Vishal Santoshi
Awesome, thanks! On Tue, Mar 17, 2020 at 11:14 AM Chesnay Schepler wrote: > You can download flink-shaded-hadoop from the downloads page: > https://flink.apache.org/downloads.html#additional-components > > On 17/03/2020 15:56, Vishal Santoshi wrote: > > We have been on flink 1

Re: Hadoop user jar for flink 1.9 plus

2020-03-17 Thread Chesnay Schepler
You can download flink-shaded-hadoop from the downloads page: https://flink.apache.org/downloads.html#additional-components On 17/03/2020 15:56, Vishal Santoshi wrote: We have been on flink 1.8.x on production and were planning to go to flink 1.9 or above. We have always used hadoop uber jar

Hadoop user jar for flink 1.9 plus

2020-03-17 Thread Vishal Santoshi
We have been on flink 1.8.x in production and were planning to go to flink 1.9 or above. We have always used the hadoop uber jar from https://mvnrepository.com/artifact/org.apache.flink/flink-shaded-hadoop2-uber but it seems they go up to 1.8.3 and their distribution ends in 2019. How do or where do we

Re: Building with Hadoop 3

2020-03-04 Thread Stephan Ewen
Have you tried to just export Hadoop 3's classpath to `HADOOP_CLASSPATH` and see if that works out of the box? If the main use case is HDFS access, then there is a fair chance it might just work, because Flink uses only a small subset of the Hadoop FS API which is stable between 2.x and 3.

RE: Building with Hadoop 3

2020-03-03 Thread LINZ, Arnaud
Hello, Have you shared it somewhere on the web already? Best, Arnaud From: vino yang Sent: Wednesday, December 4, 2019 11:55 To: Márton Balassi Cc: Chesnay Schepler ; Foster, Craig ; user@flink.apache.org; d...@flink.apache.org Subject: Re: Building with Hadoop 3 Hi Marton, Thanks for your

Re: Flink 1.10 - Hadoop libraries integration with plugins and class loading

2020-02-28 Thread Piotr Nowojski
Hi, > Since we have "flink-s3-fs-hadoop" at the plugins folder and therefore being > dynamically loaded upon task/job manager(s) startup (also, we are keeping > Flink's default inverted class loading strategy), shouldn't Hadoop > dependencies be l

Flink 1.10 - Hadoop libraries integration with plugins and class loading

2020-02-26 Thread Ricardo Cardante
AR is submitted to a Flink setup running on docker, we're getting the following exception: - java.lang.NoClassDefFoundError: org/apache/hadoop/fs/Path - Which refers to the usage of that class in a RichSinkFunction while b

Re: Re: Flink connect hive with hadoop HA

2020-02-14 Thread Robert Metzger
There's a configuration value "env.hadoop.conf.dir" to set the hadoop configuration directory: https://ci.apache.org/projects/flink/flink-docs-master/ops/config.html#env-hadoop-conf-dir If the files in that directory correctly configure Hadoop HA, the client side should pick up the

Re:Re: Flink connect hive with hadoop HA

2020-02-10 Thread sunfulin
Hi, guys Thanks for the kind reply. Actually I want to know how to change the client-side hadoop conf while using the table API within my program. Hope for some useful suggestions. At 2020-02-11 02:42:31, "Bowen Li" wrote: Hi sunfulin, Sounds like you didn't config the hadoop HA correctl

Re: Flink connect hive with hadoop HA

2020-02-10 Thread Bowen Li
Hi sunfulin, Sounds like you didn't config the hadoop HA correctly on the client side according to [1]. Let us know if it helps resolve the issue. [1] https://stackoverflow.com/questions/25062788/namenode-ha-unknownhostexception-nameservice1 On Mon, Feb 10, 2020 at 7:11 AM Khachatryan

Re: Flink connect hive with hadoop HA

2020-02-10 Thread Khachatryan Roman
Hi, Could you please provide a full stacktrace? Regards, Roman On Mon, Feb 10, 2020 at 2:12 PM sunfulin wrote: > Hi, guys > I am using Flink 1.10 and test functional cases with hive intergration. > Hive with 1.1.0-cdh5.3.0 and with hadoop HA enabled.Running flink job I can > se

Flink connect hive with hadoop HA

2020-02-10 Thread sunfulin
Hi, guys I am using Flink 1.10 and testing functional cases with hive integration: Hive 1.1.0-cdh5.3.0 with hadoop HA enabled. Running a flink job I can see a successful connection with the hive metastore, but cannot read table data, with exception: java.lang.IllegalArgumentException

Re: Location of flink-s3-fs-hadoop plugin (Flink 1.9.0 on EMR)

2020-01-28 Thread Arvid Heise
└── s3 (name is arbitrary) └── flink-s3-fs-hadoop.jar On Tue, Jan 28, 2020 at 9:18 AM Arvid Heise wrote: > Hi Aaron, > > I encountered a similar issue when running on EMR. On the slaves, there > are some lingering hadoop versions that are older than 2.7 (it was 2.6 if

Re: Location of flink-s3-fs-hadoop plugin (Flink 1.9.0 on EMR)

2020-01-28 Thread Arvid Heise
Hi Aaron, I encountered a similar issue when running on EMR. On the slaves, there are some lingering hadoop versions that are older than 2.7 (it was 2.6 if I remember correctly), which bleed into the classpath of Flink. Flink checks the Hadoop version to check if certain capabilities like file

Re: Location of flink-s3-fs-hadoop plugin (Flink 1.9.0 on EMR)

2020-01-24 Thread Aaron Langford
This seems to confirm that the S3 file system implementation is not being loaded when you start your job. Can you share the details of how you are getting the flink-s3-fs-hadoop artifact onto your cluster? Are you simply ssh-ing to the master node and doing this manually? Are you doing this via a

Re: Location of flink-s3-fs-hadoop plugin (Flink 1.9.0 on EMR)

2020-01-24 Thread Senthil Kumar
e.org" Subject: Re: Location of flink-s3-fs-hadoop plugin (Flink 1.9.0 on EMR) When creating your cluster, you can provide configurations that EMR will find the right home for. Example for the aws cli: aws emr create-cluster ... --configurations '[{ "Classification": "

Re: Location of flink-s3-fs-hadoop plugin (Flink 1.9.0 on EMR)

2020-01-23 Thread Aaron Langford
"InstanceGroupId": "", > "Configurations": [{ > "Classification": "flink-log4j", > "Properties": { > "log4j.rootLogger": "DEBUG,file" > } > },{ >

Re: Location of flink-s3-fs-hadoop plugin (Flink 1.9.0 on EMR)

2020-01-23 Thread Senthil Kumar
Could you tell us how to turn on debug level logs? We attempted this (on driver) sudo stop hadoop-yarn-resourcemanager followed the instructions here https://stackoverflow.com/questions/27853974/how-to-set-debug-log-level-for-resourcemanager and sudo start hadoop-yarn-resourcemanager but we

Re: Location of flink-s3-fs-hadoop plugin (Flink 1.9.0 on EMR)

2020-01-21 Thread Aaron Langford
thil Kumar wrote: > Yang, I appreciate your help! Please let me know if I can provide any > other info. > > > > I resubmitted my executable jar file as a step to the flink EMR and here > are all the exceptions. I see two of them. > > > > I fished them out of /v

Re: Location of flink-s3-fs-hadoop plugin (Flink 1.9.0 on EMR)

2020-01-21 Thread Senthil Kumar
Yang, I appreciate your help! Please let me know if I can provide any other info. I resubmitted my executable jar file as a step to the flink EMR and here are all the exceptions. I see two of them. I fished them out of /var/log/Hadoop//syslog 2020-01-21 16:31:37,587 ERROR

Re: Location of flink-s3-fs-hadoop plugin (Flink 1.9.0 on EMR)

2020-01-18 Thread Yang Wang
I think this exception is not because the hadoop version isn't high enough. It seems that the "s3" URI scheme could not be recognized by `S3FileSystemFactory`. So it fallbacks to the `HadoopFsFactory`. Could you share the debug level jobmanager/taskmanger logs so that we could conf

Location of flink-s3-fs-hadoop plugin (Flink 1.9.0 on EMR)

2020-01-17 Thread Senthil Kumar
Hello all, Newbie here! We are running in Amazon EMR with the following installed in the EMR Software Configuration: Hadoop 2.8.5 JupyterHub 1.0.0 Ganglia 3.7.2 Hive 2.3.6 Flink 1.9.0 I am trying to get a Streaming job from one S3 bucket into another S3 bucket using the

Re: Building with Hadoop 3

2019-12-04 Thread vino yang
Hi Marton, Thanks for your explanation. Personally, I look forward to your contribution! Best, Vino Márton Balassi wrote on Wed, Dec 4, 2019 at 5:15 PM: > Wearing my Cloudera hat I can tell you that we have done this exercise for > our distros of the 3.0 and 3.1 Hadoop versions. We have not contr

Re: Building with Hadoop 3

2019-12-04 Thread Márton Balassi
Wearing my Cloudera hat I can tell you that we have done this exercise for our distros of the 3.0 and 3.1 Hadoop versions. We have not contributed these back just yet, but we are open to do so. If the community is interested we can contribute those changes back to flink-shaded and suggest the

Re: Building with Hadoop 3

2019-12-04 Thread Chesnay Schepler
There's no JIRA and no one actively working on it. I'm not aware of any investigations on the matter; hence the first step would be to just try it out. A flink-shaded artifact isn't a hard requirement; Flink will work with any 2.X hadoop distribution (provided that there aren't

Re: Building with Hadoop 3

2019-12-03 Thread vino yang
cc @Chesnay Schepler to answer this question. Foster, Craig wrote on Wed, Dec 4, 2019 at 1:22 AM: > Hi: > > I don’t see a JIRA for Hadoop 3 support. I see a comment on a JIRA here > from a year ago that no one is looking into Hadoop 3 support [1]. Is there > a document or JIRA that now exi
