is to run " hadoop classpath " command, and paste
> its value for export HADOOP_CLASSPATH variable.
>
> This way we don't have to copy any S3-specific jars to the Flink lib folder.
@vinay patil - Can you check whether the same problem occurs with Flink 1.1,
to see if this is a regression in Flink 1.2?
On Tue, Mar 7, 2017 at 6:43 PM, Shannon Carey wrote:
> Generally, using S3 filesystem in EMR with Flink has worked pretty well
> for me in Flink < 1.2 (unless you run out of connections in your HTTP pool)…
Generally, using S3 filesystem in EMR with Flink has worked pretty well for me
in Flink < 1.2 (unless you run out of connections in your HTTP pool). When you
say, "using Hadoop File System class", what do you mean? In my experience, it's
sufficient to just use the "s3://" filesystem protocol and…
…libs are already included in the Hadoop classpath.
Is there any other way I can make this work?
You can always explicitly request a broadcast join, via "joinWithTiny",
"joinWithHuge", or by supplying a JoinHint.
Greetings,
Stephan
On Sat, Apr 9, 2016 at 1:56 AM, Timur Fayruzov wrote:
> Thank you Robert. One of my test cases is broadcast join, so I need to
> make statistics work. The only workaround I have found so far is to copy…
Thank you Robert. One of my test cases is broadcast join, so I need to make
statistics work. The only workaround I have found so far is to copy the
contents of /usr/share/aws/emr/emrfs/lib/, /usr/share/aws/aws-java-sdk/ and
/usr/share/aws/emr/s3-dist-cp/lib/s3-dist-cp-2.2.0.jar to flink/lib.
Putting…
Hi Timur,
the Flink optimizer runs on the client, so the exception is thrown from the
JVM running the ./bin/flink client.
Since the statistics sampling is an optional step, it's surrounded by a
try/catch block that just logs the error message.
More answers inline below
On Thu, Apr 7, 2016 at 1…
The exception does not show up in the console when I run the job; it only
shows up in the logs. I took that to mean it happens either on the AM or a TM
(I assume what I see in stdout is the client log). Is my thinking right?
On Thu, Apr 7, 2016 at 12:29 PM, Ufuk Celebi wrote:
> Hey Timur,
>
> Just had a chat with Robert about this…
Hey Timur,
Just had a chat with Robert about this. I agree that the error message
is confusing, but it is fine in this case. The file system classes are
not on the class path of the client process, which is submitting the
job. It fails to sample the input file sizes, but this is just an
optimization step, so the job itself still runs fine.
There's one more filesystem integration failure that I have found. My job
on a toy dataset succeeds, but the Flink log contains the following message:
2016-04-07 18:10:01,339 ERROR
org.apache.flink.api.common.io.DelimitedInputFormat - Unexpected
problen while getting the file statistics for f…
Yes, for sure.
I added some documentation for AWS here:
https://ci.apache.org/projects/flink/flink-docs-release-1.0/setup/aws.html
Would be nice to update that page with your pull request. :-)
– Ufuk
On Wed, Apr 6, 2016 at 4:58 AM, Chiwan Park wrote:
> Hi Timur,
>
> Great! Bootstrap action for Flink is good for AWS users…
Hi Timur,
Great! A bootstrap action for Flink is good for AWS users. I think the
bootstrap action scripts would be placed in the `flink-contrib` directory.
If you want, one of the Flink PMC members can assign FLINK-1337 to you.
Regards,
Chiwan Park
> On Apr 6, 2016, at 3:36 AM, Timur Fayruzov wrote:
Yes, the Hadoop version was the culprit. It turns out that EMRFS requires at
least Hadoop 2.4.0 (judging from the exception in the initial post; I was not
able to find the official requirements).
Rebuilding Flink with Hadoop 2.7.1 and Scala 2.11 worked like a charm,
and I was able to run WordCount using…
Hey Timur,
if you are using EMR with IAM roles, Flink should work out of the box.
You don't need to change the Hadoop config and the IAM role takes care
of setting up all credentials at runtime. You don't need to hardcode
any keys in your application that way, and this is the recommended way
to go.
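As a minimal sketch (the bucket and paths are made up, not from this thread),
the job then just references s3:// paths and never touches credentials:

    import org.apache.flink.api.scala._

    val env = ExecutionEnvironment.getExecutionEnvironment
    // No access keys configured anywhere: on EMR, the instance profile
    // (the IAM role) supplies credentials to the filesystem at runtime.
    env.readTextFile("s3://my-bucket/input/data.txt")
      .writeAsText("s3://my-bucket/output/")
    env.execute("s3-read-write")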
Hello Ufuk,
I'm using EMR 4.4.0.
Thanks,
Timur
On Tue, Apr 5, 2016 at 2:18 AM, Ufuk Celebi wrote:
> Hey Timur,
>
> which EMR version are you using?
>
> – Ufuk
>
> On Tue, Apr 5, 2016 at 1:43 AM, Timur Fayruzov wrote:
> > Thanks for the answer, Ken.
> >
> > My understanding is that file system selection is driven by the following…
Hey Timur,
which EMR version are you using?
– Ufuk
On Tue, Apr 5, 2016 at 1:43 AM, Timur Fayruzov wrote:
> Thanks for the answer, Ken.
>
> My understanding is that file system selection is driven by the following
> sections in core-site.xml:
>
> fs.s3.impl
> org.apache.hadoop.fs.s3native.NativeS3FileSystem…
Thanks for the answer, Ken.
My understanding is that file system selection is driven by the following
sections in core-site.xml:

  <property>
    <name>fs.s3.impl</name>
    <value>org.apache.hadoop.fs.s3native.NativeS3FileSystem</value>
  </property>
  <property>
    <name>fs.s3n.impl</name>
    <value>com.amazon.ws.emr.hadoop.fs.EmrFileSystem</value>
  </property>
If I run the program using configurat…
Normally in Hadoop jobs you’d want to use s3n:// as the protocol, not s3.
Though EMR has some support for magically treating the s3 protocol as s3n (or
maybe s3a now, with Hadoop 2.6 or later).
What happens if you use s3n://… for the --input parameter?
— Ken
> On Apr 4, 2016, at 2:51pm, Timur Fayruzov wrote:
Hello,
I'm trying to run a Flink WordCount job on an AWS EMR cluster. I succeeded
with a three-step procedure: load data from S3 into the cluster's HDFS, run
the Flink job, and unload the outputs from HDFS back to S3.
However, ideally I'd like to read/write data directly from/to S3. I
followed the instructions here:
ht…
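Concretely, the goal is something like the following sketch (bucket and paths
are made up), reading and writing S3 directly instead of staging through HDFS:

    import org.apache.flink.api.scala._

    object WordCount {
      def main(args: Array[String]): Unit = {
        val env = ExecutionEnvironment.getExecutionEnvironment
        env.readTextFile("s3://my-bucket/input/")
          .flatMap(_.toLowerCase.split("\\W+"))
          .filter(_.nonEmpty)
          .map(word => (word, 1))
          .groupBy(0)
          .sum(1)
          .writeAsCsv("s3://my-bucket/output/")
        env.execute("WordCount on S3")
      }
    }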