Yes, the issue I mentioned is solved in the next release.
On Tue, May 5, 2015 at 12:56 AM, Flavio Pompermaier wrote:
Thanks for the support! So this issue will be solved in the next 0.9
release?
On Tue, May 5, 2015 at 12:17 AM, Stephan Ewen wrote:
Here is a list of all values you can set:
http://ci.apache.org/projects/flink/flink-docs-master/setup/config.html
On Tue, May 5, 2015 at 12:17 AM, Stephan Ewen wrote:
Hi Flavio!
This may be a known and fixed issue. It relates to the fact that task
deployment may take a long time in the case of big jar files. The current
master should not have this issue any more, but 0.9-SNAPSHOT has it.
As a temporary workaround, you can increase "akka.ask.timeout" in the
Flink configuration.
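For a job run in a local environment, one way to pass that setting is sketched below, assuming the local environment honors a custom Configuration; the value "300 s" is only an example, and on a cluster the same key would normally go into conf/flink-conf.yaml:

import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.configuration.Configuration;

public class LocalTimeoutExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Example value only; pick something comfortably larger than the default.
        conf.setString("akka.ask.timeout", "300 s");

        // The custom configuration is only honored by a local environment;
        // a cluster reads the value from its flink-conf.yaml instead.
        ExecutionEnvironment env = ExecutionEnvironment.createLocalEnvironment(conf);

        env.fromElements(1, 2, 3).print();
    }
}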
Hi to all,
In my current (local) job I receive a lot of Akka timeout errors during
task deployment at:
org.apache.flink.runtime.executiongraph.Execution$2.onComplete(Execution.java:342)
Is this normal? Which parameter do I have to increase?
Best,
Flavio
Hi Stephan,
I confirm that I was using custom types in collect(), and that the bug is
not present in the master.
Thanks
Flavio
On Mon, May 4, 2015 at 2:33 PM, Stephan Ewen wrote:
That might help with cardinality estimation for cost-based optimization,
for example when deciding about join strategies (broadcast vs. repartition,
build side of a hash join).
However, as Stephan said, there are many cases where it does not make a
difference, e.g. if the input cardinality of the f…
When you use cross() it is always good to use crossWithHuge or
crossWithTiny to tell the system which side is small. It cannot always
infer that automagically at this point.
If you have an optimized structure for the lookups, go with a broadcast
variable and a map() function.
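A minimal sketch of the cross hints (data is made up; crossWithHuge is the symmetric hint when the argument is the large side):

import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;

public class CrossHintExample {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        DataSet<String> big = env.fromElements("a", "b", "c");
        DataSet<Integer> tiny = env.fromElements(1, 2);

        // crossWithTiny marks the argument as the small side, so the system
        // can broadcast it instead of guessing which input to replicate.
        DataSet<Tuple2<String, Integer>> pairs = big.crossWithTiny(tiny);
        pairs.print();
    }
}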
If the system has to decide data shipping strategies for a join (e.g.,
broadcasting one side), it helps to have good estimates of the input sizes.
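If you want to fix the strategy by hand rather than rely on those estimates, here is a sketch, assuming the JoinHint overload of join() in the Java DataSet API (data and key positions are made up):

import org.apache.flink.api.common.operators.base.JoinOperatorBase.JoinHint;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;

public class JoinHintExample {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        DataSet<Tuple2<Integer, String>> small = env.fromElements(
            new Tuple2<>(1, "a"), new Tuple2<>(2, "b"));
        DataSet<Tuple2<Integer, String>> big = env.fromElements(
            new Tuple2<>(1, "x"), new Tuple2<>(2, "y"), new Tuple2<>(3, "z"));

        // Broadcast the first (small) input and build the hash table on it,
        // instead of leaving the decision to the optimizer's size estimates.
        small.join(big, JoinHint.BROADCAST_HASH_FIRST)
             .where(0)
             .equalTo(0)
             .print();
    }
}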
On 04.05.2015 14:53, Flavio Pompermaier wrote:
Thanks Sebastian and Fabian for the feedback, just one last question:
what changes from the system's point of view when it knows that the number
of output tuples is <= the number of input tuples?
Is there any optimization that Flink can apply to the pipeline?
On Mon, May 4, 2015 at 2:49 PM, Fabian Hueske wrote:
Hah, interesting to see how opinions differ ;-)
Sebastian has a point: filter + project is more transparent to the
system. In some situations this knowledge can help the optimizer, but
often it will not matter.
Greetings,
Stephan
On Mon, May 4, 2015 at 2:49 PM, Fabian Hueske wrote:
It should not make a difference. I think it's just personal taste.
If your filter condition is simple enough, I'd go with Flink's Table API
because it does not require you to define a FilterFunction or FlatMapFunction.
2015-05-04 14:43 GMT+02:00 Flavio Pompermaier:
filter + project is easier to understand for the system, as the number
of output tuples is guaranteed to be <= the number of input tuples. With
flatMap, the system cannot know an upper bound.
--sebastian
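A minimal sketch of the two variants being compared, assuming the Java DataSet API; on older releases project() may additionally need a .types(...) call, and the data and field positions are made up:

import org.apache.flink.api.common.functions.FilterFunction;
import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.util.Collector;

public class FilterProjectVsFlatMap {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        DataSet<Tuple3<Integer, String, Double>> input = env.fromElements(
            new Tuple3<>(1, "a", 0.5), new Tuple3<>(2, "b", 1.5));

        // Variant 1: filter + project. The system knows the output has at
        // most as many tuples as the input.
        DataSet<Tuple2<Integer, String>> viaProject = input
            .filter(new FilterFunction<Tuple3<Integer, String, Double>>() {
                @Override
                public boolean filter(Tuple3<Integer, String, Double> t) {
                    return t.f2 > 1.0;
                }
            })
            .<Tuple2<Integer, String>>project(0, 1);

        // Variant 2: one flatMap doing both. Functionally identical, but the
        // system cannot infer an upper bound on the output size.
        DataSet<Tuple2<Integer, String>> viaFlatMap = input
            .flatMap(new FlatMapFunction<Tuple3<Integer, String, Double>, Tuple2<Integer, String>>() {
                @Override
                public void flatMap(Tuple3<Integer, String, Double> t,
                                    Collector<Tuple2<Integer, String>> out) {
                    if (t.f2 > 1.0) {
                        out.collect(new Tuple2<>(t.f0, t.f1));
                    }
                }
            });

        viaProject.print();
        viaFlatMap.print();
    }
}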
On 04.05.2015 14:43, Flavio Pompermaier wrote:
Hi Flinkers,
I'd like to know whether it's better to perform a filter + project or a
flatMap to filter tuples and do some projection after the filter.
Functionally they are equivalent, but maybe I'm ignoring something.
Thanks in advance,
Flavio
Hi Flavio!
This issue is known and has been fixed already. It occurs when you use
custom types in collect(), because it uses the wrong classloader/serializer
to transfer them.
The current master should not have this issue any more.
Greetings,
Stephan
On Mon, May 4, 2015 at 2:09 PM, Flavio Baront… wrote:
Hello,
I'm testing the new DataSet.collect() method on version 0.9-milestone-1, but
I obtain the following error on cluster execution (no problem with local
execution), which also causes the job manager to crash:
14:05:41,145 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Deployin…
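For reference, a minimal sketch of the kind of program involved, i.e. collect() over a DataSet of a custom type; the Event POJO and its values are made up for illustration:

import java.util.List;

import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;

public class CollectExample {
    // A simple custom (POJO) type: public fields plus a no-arg constructor.
    public static class Event {
        public String name;
        public long ts;
        public Event() {}
        public Event(String name, long ts) { this.name = name; this.ts = ts; }
        @Override
        public String toString() { return name + "@" + ts; }
    }

    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        DataSet<Event> events = env.fromElements(new Event("a", 1L), new Event("b", 2L));

        // collect() triggers execution and ships the result back to the client.
        List<Event> local = events.collect();
        System.out.println(local);
    }
}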
Hi,
Thanks. The use case I have right now does not require too much magic; my
historical data set is small enough to fit in RAM, so I'll spread it over
each node and use a simple mapping with a log(n) lookup. It was more a
theoretical question.
If my dataset becomes too large, I may use some hashi…
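A sketch of that approach, assuming the validity intervals do not overlap: broadcast the historical set, index it by start timestamp in a TreeMap, and resolve each event with an O(log n) floorEntry lookup (all names and values are made up):

import java.util.List;
import java.util.Map;
import java.util.TreeMap;

import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.configuration.Configuration;

public class IntervalLookupJoin {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // (startTs, endTs, label) -- the small "historical" side.
        DataSet<Tuple3<Long, Long, String>> historical = env.fromElements(
            new Tuple3<>(0L, 10L, "old"), new Tuple3<>(10L, 20L, "new"));

        // (eventName, eventTs) -- the large side.
        DataSet<Tuple2<String, Long>> events = env.fromElements(
            new Tuple2<>("login", 5L), new Tuple2<>("logout", 15L));

        DataSet<Tuple2<String, String>> labelled = events
            .map(new RichMapFunction<Tuple2<String, Long>, Tuple2<String, String>>() {
                private TreeMap<Long, Tuple3<Long, Long, String>> index;

                @Override
                public void open(Configuration parameters) {
                    // The broadcast set is available on every parallel task.
                    List<Tuple3<Long, Long, String>> bc =
                        getRuntimeContext().getBroadcastVariable("historical");
                    index = new TreeMap<>();
                    for (Tuple3<Long, Long, String> t : bc) {
                        index.put(t.f0, t); // keyed by interval start
                    }
                }

                @Override
                public Tuple2<String, String> map(Tuple2<String, Long> event) {
                    // Interval starting at or before the event timestamp.
                    Map.Entry<Long, Tuple3<Long, Long, String>> e = index.floorEntry(event.f1);
                    String label = (e != null && event.f1 < e.getValue().f1)
                        ? e.getValue().f2 : "unknown";
                    return new Tuple2<>(event.f0, label);
                }
            })
            .withBroadcastSet(historical, "historical");

        labelled.print();
    }
}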
Hi,
there is no other system support to express this join.
However, you could perform some "hand-wired" optimization by partitioning
your input data into distinct intervals. It might be tricky though,
especially if the time-ranges in your "range-key" dataset are overlapping
everywhere (-> data r…
Hello,
I was wondering how to join large data sets on inequalities.
Let's say I have a data set whose “keys” are two timestamps (start time &
end time of validity) and whose value is a label:
final DataSet<…> historical = …;
I also have events, with an event name and a timestamp:
final …