Hi!
The bottom of this page also has an illustration of how tasks map to task slots.
https://ci.apache.org/projects/flink/flink-docs-release-0.9/setup/config.html
There are two optimizations involved:
(1) Chaining:
Here, sources, mappers, and filters are chained together. This is pretty
classic; most systems do this.
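For illustration, a minimal sketch of such a chain in the Java DataSet API
(the paths are placeholders):

import org.apache.flink.api.common.functions.FilterFunction;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.ExecutionEnvironment;

public class ChainingSketch {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // Source -> map -> filter are chained into a single task: records are
        // handed over via method calls, without serialization or a network
        // hop, and each parallel instance of the chain occupies one task slot.
        env.readTextFile("hdfs:///path/to/input")      // placeholder path
           .map(new MapFunction<String, String>() {
               @Override
               public String map(String line) {
                   return line.toLowerCase();
               }
           })
           .filter(new FilterFunction<String>() {
               @Override
               public boolean filter(String line) {
                   return !line.isEmpty();
               }
           })
           .writeAsText("hdfs:///path/to/output");     // placeholder path

        env.execute("chaining sketch");
    }
}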
Good morning!
I have the following usecase:
My program reads nested data (in this specific case XML) based on
projections (path expressions) of this data. Often multiple paths are
projected onto the same input. I would like each path to result in its own
dataset.
Is it possible to generate more than one DataSet with a single pass over the input?
Hi Pieter,
at the moment there is no support for partitioning a `DataSet` into multiple
subsets in one pass over it. If you really want to have distinct data
sets for each path, then you have to filter, afaik.
Cheers,
Till
On Thu, Oct 22, 2015 at 11:38 AM, Pieter Hameete wrote:
> Good morning!
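A minimal sketch of the filter approach in the Java DataSet API (the record
type and path names are made up):

import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;

public class FilterPerPathSketch {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // Hypothetical input: (path expression, payload) pairs produced by
        // the XML input format.
        DataSet<Tuple2<String, String>> parsed = env.fromElements(
                new Tuple2<>("a/b", "payload1"),
                new Tuple2<>("a/c", "payload2"));

        // One filter per projected path; each derived data set is produced
        // by its own filter over the same input.
        DataSet<Tuple2<String, String>> pathAB =
                parsed.filter(t -> t.f0.equals("a/b"));
        DataSet<Tuple2<String, String>> pathAC =
                parsed.filter(t -> t.f0.equals("a/c"));

        pathAB.print();
        pathAC.print();
    }
}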
Hello!
> I have thought about a workaround where the InputFormat would return
> Tuple2s and the first field is the name of the dataset to which a record
> belongs. This would however require me to filter the read data once for
> each dataset, or to do a groupReduce, which is some overhead I'm
> looking to avoid.
I fear that the filter operations are not chained because there are at
least two of them which have the same DataSet as input. However, it's true
that the intermediate results are not materialized.
It is also correct that the filter operators are deployed colocated to the
data sources. Thus, there should be no network transfer between the sources
and the filters.
It might even be materialized (to disk) if both derived data sets are
joined.
2015-10-22 12:01 GMT+02:00 Till Rohrmann :
> I fear that the filter operations are not chained because there are at
> least two of them which have the same DataSet as input. However, it's true
> that the intermediate results are not materialized.
Thanks for your responses!
The derived datasets would indeed be grouped after the filter operations.
Why would this cause them to be materialized to disk? And if I understand
correctly, the data source will not chain to more than one filter,
causing (de)serialization to transfer the records from the source to the filters?
Hello,
Trying to understand why my code was giving strange results, I’ve ended up
adding “useless” checks in my code and came across what seems to me to be a
bug. I group my dataset according to a key, but in the reduceGroup function I
am passed values with different keys.
My code has the following
In principle, a data set that branches only needs to be materialized if both
branches are pipelined until they are merged (i.e., in a hybrid-hash join).
Otherwise, the data flow might deadlock due to pipelining.
If you group both data sets before they are joined, the pipeline is broken
due to the blocking sort.
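A made-up plan of the shape under discussion, for illustration:

import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;

public class BranchAndJoinSketch {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        DataSet<Tuple2<Integer, String>> source = env.fromElements(
                new Tuple2<>(1, "a"), new Tuple2<>(2, "a"), new Tuple2<>(3, "b"));

        // The data set branches into two derived data sets...
        DataSet<Tuple2<Integer, String>> big   = source.filter(t -> t.f0 > 1);
        DataSet<Tuple2<Integer, String>> small = source.filter(t -> t.f0 <= 1);

        // ...which are merged again by a join. If both branches were fully
        // pipelined into the join (e.g. a hybrid-hash join), the flow could
        // deadlock, so the branching data set may be materialized; grouping
        // the inputs first breaks the pipeline via the sort instead.
        big.join(small).where(1).equalTo(1).print();
    }
}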
Hi!
You are checking for equality / inequality with "!=" - can you check with
"equals()" ?
The key objects will most certainly be different in each record (as they
are deserialized individually), but they should be equal.
Stephan
On Thu, Oct 22, 2015 at 12:20 PM, LINZ, Arnaud
wrote:
> Hello,
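As a standalone illustration of the reference-vs-value pitfall (the values
are made up; 128 is chosen to fall outside the JVM's Long cache):

public class BoxedKeyComparison {
    public static void main(String[] args) {
        // Keys deserialized from two records end up as distinct objects.
        Long key1 = 128L;
        Long key2 = 128L;

        System.out.println(key1 == key2);       // false: reference comparison
        System.out.println(key1.equals(key2));  // true: value comparison

        // Comparing against a primitive long unboxes the Long, so the
        // comparison is by value again (Aljoscha's point below).
        long primitive = 128L;
        System.out.println(key1 != primitive);  // false: values are equal
    }
}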
Hi,
but he’s comparing it to a primitive long, so shouldn’t the Long key be unboxed
and the comparison still be valid?
My question is whether you enabled object-reuse-mode on the
ExecutionEnvironment?
Cheers,
Aljoscha
> On 22 Oct 2015, at 12:31, Stephan Ewen wrote:
>
> Hi!
>
> You are checking for equality / inequality with "!=" - can you check with
> "equals()" ?
If not, could you provide us with the program and test data to reproduce
the error?
Cheers,
Till
On Thu, Oct 22, 2015 at 12:34 PM, Aljoscha Krettek
wrote:
> Hi,
> but he’s comparing it to a primitive long, so shouldn’t the Long key be
> unboxed and the comparison still be valid?
>
> My question is whether you enabled object-reuse-mode on the
> ExecutionEnvironment?
Hi,
I was using primitive types, and EnableObjectReuse was turned on. My next move
was to turn it off, and it did solve the problem.
It also increased execution time by 10%, but it’s hard to say if this overhead
is due to the copying or to the change of behavior of the reduceGroup algorithm
once object reuse is disabled.
With object reuse activated, Flink heavily reuses objects. Each call to the
Iterator in the reduceGroup function gives back one of the same two
objects, which has been filled with different contents.
Your list of all values will effectively only contain two different objects.
Furthermore, the loop over the iterator keeps overwriting the contents of
those two objects on every call to next().
You don’t modify the objects; however, the ReusingKeyGroupedIterator, which
is the iterator you have in your reduce function, does. Internally it uses
two objects, in your case of type Tuple2, to
deserialize the input records. These two objects are alternately returned
when you call next on the iterator.
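A minimal sketch of the pitfall and the usual fix, copying each record before
buffering it (the data and types are made up):

import java.util.ArrayList;
import java.util.List;

import org.apache.flink.api.common.functions.GroupReduceFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.util.Collector;

public class ObjectReuseSketch {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        env.getConfig().enableObjectReuse();  // the mode under discussion

        DataSet<Tuple2<Long, String>> data = env.fromElements(
                new Tuple2<>(1L, "a"), new Tuple2<>(1L, "b"), new Tuple2<>(2L, "c"));

        data.groupBy(0)
            .reduceGroup(new GroupReduceFunction<Tuple2<Long, String>, String>() {
                @Override
                public void reduce(Iterable<Tuple2<Long, String>> values,
                                   Collector<String> out) {
                    List<Tuple2<Long, String>> buffered = new ArrayList<>();
                    for (Tuple2<Long, String> t : values) {
                        // Buggy with object reuse: buffered.add(t) would store
                        // the same two alternating objects over and over.
                        buffered.add(t.copy());  // copy before buffering
                    }
                    out.collect(buffered.toString());
                }
            })
            .print();
    }
}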
Hi,
I want to write a long-running (i.e. never stop it) streaming Flink
application on a Kerberos-secured Hadoop/YARN cluster. My application needs
to do things with files on HDFS and HBase tables on that cluster, so having
the correct Kerberos tickets is very important. The stream is to be
ingested
Hi,
Thanks a lot for the explanation. I cannot even say that it wasn’t stated in
the documentation, I’ve simply missed the iterator part:
“by default, user defined functions (like map() or reduce()) are getting new
objects on each call (or through an iterator). So it is possible to keep
references to the objects.”
Stephan Ewen wrote
> This is actually not a bug, or a POJO or Avro problem. It is simply a
> limitation in the functionality, as the exception message says:
> "Specifying
> fields by name is only supported on Case Classes (for now)."
>
> Try this with a regular reduce function that selects the max
Hi Trevor,
that’s actually my bad since I only tested my branch against a remote
cluster. I fixed the problem (not properly starting the
LocalFlinkMiniCluster) so that you can now use Zeppelin also in local mode.
Just check out my branch again.
Cheers,
Till
On Wed, Oct 21, 2015 at 10:00 PM, Trevor wrote:
Hi,
I am trying to load a dataset that has null values in some fields by using
readCsvFile().
// e.g. _date|_click|_sales|_item|_web_page|_user
case class WebClick(_click_date: Long, _click_time: Long, _sales: Int,
_item: Int, _page: Int, _user: Int)
private def getWebClickDataSet(env: ExecutionEnvironment): DataSet[WebClick] = ...
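One workaround sketch (shown in the Java API for consistency with the other
sketches; the file name, column layout, and the -1 sentinel are made up): read
the nullable columns as strings and substitute a default in a map. Depending
on the Flink version, CsvReader.ignoreInvalidLines() can also skip rows that
fail to parse.

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple3;

public class CsvWithMissingValues {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // Read the possibly-empty columns as String instead of Int, since the
        // CSV reader cannot produce null for primitive fields.
        DataSet<Tuple3<Long, String, String>> raw = env
                .readCsvFile("web_clicks.dat")   // made-up path
                .fieldDelimiter("|")
                .types(Long.class, String.class, String.class);

        DataSet<Tuple3<Long, Integer, Integer>> clicks = raw.map(
                new MapFunction<Tuple3<Long, String, String>,
                                Tuple3<Long, Integer, Integer>>() {
                    @Override
                    public Tuple3<Long, Integer, Integer> map(
                            Tuple3<Long, String, String> t) {
                        // Substitute a sentinel for missing values.
                        int sales = t.f1.isEmpty() ? -1 : Integer.parseInt(t.f1);
                        int item  = t.f2.isEmpty() ? -1 : Integer.parseInt(t.f2);
                        return new Tuple3<>(t.f0, sales, item);
                    }
                });

        clicks.print();
    }
}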
In the Java API, we only support the `max` operation for tuple types where
you reference the fields via indices.
Cheers,
Till
On Thu, Oct 22, 2015 at 4:04 PM, aawhitaker wrote:
> Stephan Ewen wrote
> > This is actually not a bug, or a POJO or Avro problem. It is simply a
> > limitation in the functionality, as the exception message says:
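A small made-up example of referencing tuple fields by index:

import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;

public class MaxByIndex {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        DataSet<Tuple2<String, Integer>> scores = env.fromElements(
                new Tuple2<>("a", 3), new Tuple2<>("b", 7), new Tuple2<>("a", 5));

        // Max of field 1; note that the values of the non-aggregated fields
        // in the result are not defined.
        scores.max(1).print();

        // Grouped variant: per key (field 0), the max of field 1.
        scores.groupBy(0).max(1).print();
    }
}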