Hi, I find that WordCount on Flink is slow: 75% of the time is spent in the
groupBy operator. The dataset is 90 GB, with only 1000 distinct words. Could
you tell me how groupBy is implemented?
Best Regards,
Jeffrey
Hi Johann!
You can try and use the Table API, it has logical tuples that you program
with, rather than tuple classes.
Have a look here:
https://ci.apache.org/projects/flink/flink-docs-master/libs/table.html
Stephan
On Thu, Oct 29, 2015 at 6:53 AM, Fabian Hueske wrote:
> Hi Johann,
>
> I see
Hi Johann,
I see three options for your use case.
1) Generate POJO code at planning time, i.e., when the program is composed.
This does not work when the program is already running. The benefit is that
you can use key expressions, have typed fields, and get type-specific
serializers and comparators.
Hi Thomas,
Try switching to EMR AMI 3.5 and registering Hadoop's S3 FileSystem instead of
the one packaged with Flink.
On Oct 29, 2015 4:36 AM, "Thomas Götzinger" wrote:
> Hello Flink Team,
>
> We at IESE Fraunhofer are evaluating Flink for a project and I'm a bit
> frustrated i
Ok, thanks a lot for the info guys!
On Thu, Oct 29, 2015 at 11:30 AM, Maximilian Michels wrote:
> Here's the jira issue for the cancel button:
> https://issues.apache.org/jira/browse/FLINK-2939
>
> On Thu, Oct 29, 2015 at 11:28 AM, Aljoscha Krettek
> wrote:
>
>> Hi
>> yes, a lot of people have
Here's the jira issue for the cancel button:
https://issues.apache.org/jira/browse/FLINK-2939
On Thu, Oct 29, 2015 at 11:28 AM, Aljoscha Krettek
wrote:
> Hi
> yes, a lot of people have complained about the missing cancel button
> already. :D (myself included)
>
> The number of retained jobs can
Hi
yes, a lot of people have complained about the missing cancel button already.
:D (myself included)
The number of retained jobs can be configured in conf/flink-conf.yaml by
setting the configuration key “jobmanager.web.history” to a different number.
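
For illustration, the entry could look like the following sketch; the value 20 is an arbitrary example, not a recommended setting:

```yaml
# conf/flink-conf.yaml
# Number of completed jobs retained in the web dashboard history.
# 20 is an arbitrary example value chosen for illustration.
jobmanager.web.history: 20
```
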
Cheers,
Aljoscha
> On 29 Oct 2015, at 11
Yes, I was referring exactly to that :)
Thanks for the clarification Aljoscha.
Are there plans to add buttons to the dashboard for managing jobs
(a cancel button, for example, would be useful when running tests)?
And where do I set the number of completed jobs to show in history?
On Thu, Oct 29, 2015
Hi,
are you referring to the “Job statistics/Accumulators” tab? This tab does not
display actual information but is a placeholder page that we forgot to remove.
It will be removed before the 0.10 release; there is currently a pull request
open to remove it.
Cheers,
Aljoscha
> On 29 Oct 2015, at
Hi to all,
I'm using Flink 0.10-SNAPSHOT and on my cluster I've tested the new
Dashboard (some days ago).
In the job info the parallelism was wrong (I see 2 but it's 36).
Does this happen only to me?
Best,
Flavio
Hi Thomas,
until recently, Flink provided its own implementation of an S3FileSystem,
which wasn't fully tested and was buggy.
We removed that implementation and are now using (in 0.10-SNAPSHOT)
Hadoop's S3 implementation by default.
If you want to continue using 0.9.1 you can configure Flink to use Hado
Hello Flink Team,
We at IESE Fraunhofer are evaluating Flink for a project and I'm a bit
frustrated at the moment.
I've written a few test cases with the Flink API and want to deploy them to
a Flink EC2 cluster. I set up the cluster using the
Karamel recipe which was addressed in the following video