Hi Kostia,
thank you for writing to the Flink mailing list. I actually started to try
out our S3 File system support after I saw your question on StackOverflow
[1].
I found that our S3 connector is very broken. I had to resolve two more
issues with it before I was able to get the same exception y
Hi guys,
I'm trying to get Apache Flink 0.9.1 to work on EMR, basically to read
data from S3. I tried the following path for data
s3://mybucket.s3.amazonaws.com/folder, but it throws me the following
exception:
java.io.IOException: Cannot establish connection to Amazon S3:
com.amazonaws.services
I filed a bug for this issue in our bug tracker
https://issues.apache.org/jira/browse/FLINK-2821 (even though we cannot do
much about it, we should track the resolution of the issue).
On Mon, Oct 5, 2015 at 5:34 AM, Stephan Ewen wrote:
> I think this is yet another problem caused by Akka's over
Hi Hanen,
It appears that the environment variables are not set. Thus, Flink cannot
pick up the Hadoop configuration. Could you please paste the output of
"echo $HADOOP_HOME" and "echo $HADOOP_CONF_DIR" here?
In any case, your problem looks similar to the one discussed here:
http://stackoverflow.
Hi all,
I tried to start a Yarn session on an Amazon EMR cluster with Hadoop 2.6.0
following the instructions provided in this link and using Flink 0.9.1 for
Hadoop 2.6.0
https://ci.apache.org/projects/flink/flink-docs-release-0.9/setup/yarn_setup.html
Running the following command line: ./bin/ya
Hi Pieter,
a FlatMapFunction can only emit values when its flatMap() method is called.
However, in your use case, you would like to emit values *after* the
function was called the last time. This is not possible with a
FlatMapFunction, because you cannot identify the last flatMap() call.
The MapPartit
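The difference Fabian describes can be sketched in plain Python (not Flink's actual Java/Scala API; the function names and the trailing count emitted here are only illustrative): a flatMap-style function sees one element per call and never knows which call is the last, while a mapPartition-style function receives the whole partition as an iterator and regains control after consuming it.

```python
# Python model of flatMap vs. mapPartition emission semantics
# (illustrative only; not the Flink API).

def flat_map(collect, element):
    # Called once per element; there is no way to tell, from inside
    # this function, that this is the last call.
    collect(element * 2)

def map_partition(collect, elements):
    # Receives the entire partition as an iterable; any code after the
    # loop runs exactly once, after the final element was consumed.
    count = 0
    for element in elements:
        collect(element * 2)
        count += 1
    collect(count)  # emitted *after* the last element, e.g. a summary value

out = []
map_partition(out.append, [1, 2, 3])
print(out)  # [2, 4, 6, 3]  -- the final 3 was emitted after the last element
```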
Hi Flavio,
I don't think this is a feature that needs to go into Flink core.
To me it looks like this could be implemented as a utility method, without
major effort, by anybody who needs it.
Best, Fabian
2015-10-02 15:27 GMT+02:00 Flavio Pompermaier :
> Hi to all,
>
> in many of my jobs I have to read
Okay, nice to hear it works out!
On Mon, Oct 5, 2015 at 1:50 PM, Pieter Hameete wrote:
> Hi Stephen,
>
> it was not the SimpleInputProjection, because that is a stateless object.
> The boolean endReached was not reset upon opening a new file however, so
> for each consecutive file no records wer
Hi Stephen,
it was not the SimpleInputProjection, because that is a stateless object.
The boolean endReached was not reset upon opening a new file however, so
for each consecutive file no records were parsed.
Thanks a lot for your help!
- Pieter
2015-10-05 12:50 GMT+02:00 Stephan Ewen :
> If yo
If you have more files than task slots, then some tasks will get multiple
files. That means that open() and close() are called multiple times on the
input format.
Make sure that your input format tolerates that and does not get confused
with lingering state (maybe create a new SimpleInputProjectio
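Stephan's point can be modeled with a toy input format in Python (this is not Flink's real InputFormat interface; class and field names are made up to mirror Pieter's endReached flag): when one task reads several files, open() is called once per file, so any per-file state must be reset there.

```python
# Toy model of an input format whose open() is invoked once per file split.
# The crucial line is resetting end_reached in open(); without it, every
# file after the first would yield no records -- Pieter's bug.

class ToyInputFormat:
    def __init__(self):
        self.end_reached = False

    def open(self, records):
        self._records = iter(records)
        self.end_reached = False  # reset per-file state on every open()

    def next_record(self):
        try:
            return next(self._records)
        except StopIteration:
            self.end_reached = True
            return None

fmt = ToyInputFormat()
out = []
for split in (["a", "b"], ["c"]):  # two files assigned to the same task
    fmt.open(split)
    while not fmt.end_reached:
        rec = fmt.next_record()
        if rec is not None:
            out.append(rec)
print(out)  # ['a', 'b', 'c']
```

If the reset line in open() is removed, the second file contributes nothing, which matches the varying dataset sizes Pieter observed.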
Hi Stephen,
it concerns the DataSet API.
The program im running can be found at
https://github.com/PHameete/dawn-flink/blob/development/src/main/scala/wis/dawnflink/performance/xmark/XMarkQuery11.scala
The Custom Input Format at
https://github.com/PHameete/dawn-flink/blob/development/src/main/sca
I assume this concerns the streaming API?
Can you share your program and/or the custom input format code?
On Mon, Oct 5, 2015 at 12:33 PM, Pieter Hameete wrote:
> Hello Flinkers!
>
> I run into some strange behavior when reading from a folder of input files.
>
> When the number of input files i
I think this is yet another problem caused by Akka's overly strict message
routing.
An actor system bound to a certain URL can only receive messages that are
sent to that exact URL. All other messages are dropped.
This has many problems:
- Proxy routing (as described here, send to the proxy UR
Hello Flinkers!
I run into some strange behavior when reading from a folder of input files.
When the number of input files in the folder exceeds the number of task
slots I noticed that the size of my datasets varies with each run. It seems
as if the transformations don't wait for all input files
Thanks, the off-heap solution is indeed faster. 15s instead of 45s for the
amounts of memory I allocate.
On Fri, Oct 2, 2015 at 6:09 PM, Stephan Ewen wrote:
> Yeah, registration is fast; JVM heat-up is what takes time.
>
> You can try two things:
>
> - Use the off-heap memory variant and see if
Hi Fabian,
I have a question regarding the first approach. Is there a benefit to
choosing a RichMapPartitionFunction over a RichMapFunction in this case? I
assume that each broadcast dataset is sent only once to each task manager?
If I would broadcast dataset B, then I could for each e
Matthias' solution should work in most cases.
In cases where you do not control the source (or the source can never be
finite, like the Kafka source), we often use a trick in the tests, which is
throwing a special type of exception (a SuccessException).
You can catch this exception on env.execute
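The SuccessException trick can be sketched in plain Python (not Flink code; the job, sink, and helper names here are invented for illustration): a sink raises a marker exception once it has seen the expected data, and the test treats that exception, propagated out of execute(), as success.

```python
# Sketch of the SuccessException testing trick: terminate an otherwise
# unbounded job by throwing a marker exception from the sink and catching
# it around the execute() call.

class SuccessException(Exception):
    """Marker: the job has produced everything the test expected."""

def run_job(sink):
    # Stands in for a streaming job whose source never finishes on its own.
    for record in ["a", "b", "c"]:
        sink(record)

def make_sink(expected):
    seen = []
    def sink(record):
        seen.append(record)
        if seen == expected:
            raise SuccessException()  # abort the job: test condition met
    return sink

def execute_and_check():
    try:
        run_job(make_sink(["a", "b"]))
    except SuccessException:
        return True   # the marker exception means the test passed
    return False      # job ended without ever meeting the condition

print(execute_and_check())  # True
```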
Hi,
you just need to terminate your source (i.e., return from the run() method
if you implement your own source function). This will finish the complete
program. For already available sources, just make sure you read finite
input.
Hope this helps.
-Matthias
On 10/05/2015 12:15 AM, jay vyas wrote:
> H
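Matthias' suggestion can be modeled with a minimal Python sketch (not Flink's SourceFunction interface; the class below is a stand-in): once run() returns, the source is finished and the whole pipeline can shut down naturally.

```python
# Minimal model of a finite source: emitting a bounded amount of data and
# then returning from run() is all it takes for the job to terminate.

class FiniteSource:
    def __init__(self, n):
        self.n = n

    def run(self, collect):
        for i in range(self.n):
            collect(i)
        # returning here == source finished, downstream can shut down

out = []
FiniteSource(3).run(out.append)
print(out)  # [0, 1, 2]
```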
I'm not a Janino expert, but it might be related to the fact that Janino
does not fully support generic types (see http://unkrig.de/w/Janino under
limitations). Maybe it works if you use the untyped MapFunction type.
Cheers,
Till
On Sat, Oct 3, 2015 at 8:04 PM, Giacomo Licari
wrote:
> Hi guys,
> I