Hi Vinay,
using the HADOOP_CLASSPATH variable on the client machine is the
recommended way to solve this problem.
I'll update the documentation accordingly.
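A minimal sketch of the client-side classpath approach Robert recommends. The paths are the EMR defaults and the job jar name is hypothetical:

```shell
# Put the EMR-provided Hadoop/EMRFS jars on the *client* classpath instead
# of copying them into flink/lib (EMR default locations; adjust as needed):
export HADOOP_CLASSPATH="/usr/share/aws/emrfs/lib/*:/usr/lib/hadoop/*:/usr/lib/hadoop/lib/*"
# The Flink CLI adds HADOOP_CLASSPATH to the client JVM, so the EMRFS
# classes resolve at job-submission time:
# ./bin/flink run my-job.jar
echo "$HADOOP_CLASSPATH"
```

This keeps the Flink distribution itself untouched, which is the main advantage over copying jars into flink/lib.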
On Wed, Mar 8, 2017 at 10:26 AM, vinay patil wrote:
Hi,
@Shannon - I am not facing any issue while writing to S3; I was getting
NoClassDef errors when reading the file from S3.
"Hadoop File System" - I mean I am using the FileSystem class of Hadoop to read
the file from S3.
@Stephan - I tried with 1.1.4 and was getting the same issue.
@vinay patil - Can you check whether the same problem occurs with Flink 1.1,
to see if this is a regression in Flink 1.2?
On Tue, Mar 7, 2017 at 6:43 PM, Shannon Carey wrote:
Generally, using S3 filesystem in EMR with Flink has worked pretty well for me
in Flink < 1.2 (unless you run out of connections in your HTTP pool). When you
say, "using Hadoop File System class", what do you mean? In my experience, it's
sufficient to just use the "s3://" filesystem protocol and
Hi Guys,
Has anyone got this error before? If yes, have you found any other
solution apart from copying the jar files to the Flink lib folder?
Regards,
Vinay Patil
On Mon, Mar 6, 2017 at 8:21 PM, vinay patil [via Apache Flink User Mailing List archive.] wrote:
Hi Guys,
I am getting the same exception:
EMRFileSystem not Found
I am trying to read an encrypted S3 file using the Hadoop FileSystem class
(using Flink 1.2.0).
When I copy all the libs from /usr/share/aws/emrfs/lib and /usr/lib/hadoop
to the Flink lib folder, it works.
However I see that all these lib
You can always explicitly request a broadcast join, via "joinWithTiny",
"joinWithHuge", or by supplying a JoinHint.
Greetings,
Stephan
On Sat, Apr 9, 2016 at 1:56 AM, Timur Fayruzov wrote:
Thank you Robert. One of my test cases is broadcast join, so I need to make
statistics work. The only workaround I have found so far is to copy the
contents of /usr/share/aws/emr/emrfs/lib/, /usr/share/aws/aws-java-sdk/ and
/usr/share/aws/emr/s3-dist-cp/lib/s3-dist-cp-2.2.0.jar to flink/lib.
Puttin
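The copy workaround Timur describes above can be sketched as follows. The source paths only exist on an EMR master node, so each copy is guarded, and the Flink install location is hypothetical:

```shell
# Stage the EMR jars into flink/lib (FLINK_LIB is a hypothetical location;
# the copies are skipped when the EMR directories are not present):
FLINK_LIB="${FLINK_LIB:-/tmp/flink/lib}"
mkdir -p "$FLINK_LIB"
for src in /usr/share/aws/emr/emrfs/lib /usr/share/aws/aws-java-sdk; do
  if [ -d "$src" ]; then
    cp "$src"/*.jar "$FLINK_LIB"/
  fi
done
# plus the version-specific s3-dist-cp jar, e.g.:
# cp /usr/share/aws/emr/s3-dist-cp/lib/s3-dist-cp-2.2.0.jar "$FLINK_LIB"/
echo "jars staged in $FLINK_LIB"
```

Note the jar versions in these directories change between EMR releases, which is part of why the HADOOP_CLASSPATH approach discussed later in the thread is preferable.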
Hi Timur,
the Flink optimizer runs on the client, so the exception is thrown from the
JVM running the ./bin/flink client.
Since the statistics sampling is an optional step, it's surrounded by a
try/catch block that just logs the error message.
More answers inline below
On Thu, Apr 7, 2016 at 1
The exception does not show up in the console when I run the job, it only
shows in the logs. I thought it means that it happens either on AM or TM (I
assume what I see in stdout is client log). Is my thinking right?
On Thu, Apr 7, 2016 at 12:29 PM, Ufuk Celebi wrote:
Hey Timur,
Just had a chat with Robert about this. I agree that the error message
is confusing, but it is fine in this case. The file system classes are
not on the class path of the client process, which is submitting the
job. It fails to sample the input file sizes, but this is just an
optimization.
There's one more filesystem integration failure that I have found. My job
on a toy dataset succeeds, but the Flink log contains the following message:
2016-04-07 18:10:01,339 ERROR
org.apache.flink.api.common.io.DelimitedInputFormat - Unexpected
problem while getting the file statistics for f
Yes, for sure.
I added some documentation for AWS here:
https://ci.apache.org/projects/flink/flink-docs-release-1.0/setup/aws.html
Would be nice to update that page with your pull request. :-)
– Ufuk
On Wed, Apr 6, 2016 at 4:58 AM, Chiwan Park wrote:
Hi Timur,
Great! A bootstrap action for Flink is good for AWS users. I think the bootstrap
action scripts would be placed in the `flink-contrib` directory.
If you want, one of the people in the Flink PMC will assign FLINK-1337 to you.
Regards,
Chiwan Park
> On Apr 6, 2016, at 3:36 AM, Timur Fayruzov
Yes, Hadoop version was the culprit. It turns out that EMRFS requires at
least 2.4.0 (judging from the exception in the initial post, I was not able
to find the official requirements).
Rebuilding Flink with Hadoop 2.7.1 and with Scala 2.11 worked like a charm
and I was able to run WordCount using
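A hedged sketch of the rebuild Timur describes, following the Flink 1.x build instructions of that era (the exact script and flags can differ between Flink releases, so treat these as illustrative):

```shell
# Switch the source tree to Scala 2.11, then build against the EMR Hadoop
# version (commands shown for reference; they require a Flink source checkout):
# tools/change-scala-version.sh 2.11
# mvn clean install -DskipTests -Dhadoop.version=2.7.1
HADOOP_VERSION="2.7.1"
echo "building against Hadoop $HADOOP_VERSION"
```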
Hey Timur,
if you are using EMR with IAM roles, Flink should work out of the box.
You don't need to change the Hadoop config and the IAM role takes care
of setting up all credentials at runtime. You don't need to hardcode
any keys in your application that way and this is the recommended way
to go
Hello Ufuk,
I'm using EMR 4.4.0.
Thanks,
Timur
On Tue, Apr 5, 2016 at 2:18 AM, Ufuk Celebi wrote:
Hey Timur,
which EMR version are you using?
– Ufuk
On Tue, Apr 5, 2016 at 1:43 AM, Timur Fayruzov wrote:
Thanks for the answer, Ken.
My understanding is that file system selection is driven by the following
sections in core-site.xml:

  <property>
    <name>fs.s3.impl</name>
    <value>org.apache.hadoop.fs.s3native.NativeS3FileSystem</value>
  </property>
  <property>
    <name>fs.s3n.impl</name>
    <value>com.amazon.ws.emr.hadoop.fs.EmrFileSystem</value>
  </property>

If I run the program using configurat
Normally in Hadoop jobs you’d want to use s3n:// as the protocol, not s3.
Though EMR has some support for magically treating the s3 protocol as s3n (or
maybe s3a now, with Hadoop 2.6 or later).
What happens if you use s3n:/// for the --input
parameter?
— Ken
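Ken's suggestion amounts to something like the following invocation (the example jar path, bucket, and file names are made up):

```shell
# Submit with an explicit s3n:// URI so the fs.s3n.impl mapping is used:
INPUT="s3n://my-bucket/input.txt"
# ./bin/flink run examples/batch/WordCount.jar \
#     --input "$INPUT" --output s3n://my-bucket/wordcount-out
echo "$INPUT"
```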
> On Apr 4, 2016, at 2:51pm, Timur