Hi!
The bottom of this page also has an illustration of how tasks map to task slots.
https://ci.apache.org/projects/flink/flink-docs-release-0.9/setup/config.html
There are two optimizations involved:
(1) Chaining:
Here, sources, mappers, and filters are chained together. This is pretty
classic; most systems do this.
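For illustration, a minimal sketch of such a chain in the Java DataSet API
(the paths are placeholders):

import org.apache.flink.api.common.functions.FilterFunction;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.ExecutionEnvironment;

public class ChainingSketch {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // Source -> map -> filter are chained into a single task: records are
        // handed over via method calls, without serialization or a network
        // hop, and each parallel instance of the chain occupies one task slot.
        env.readTextFile("hdfs:///path/to/input")      // placeholder path
           .map(new MapFunction<String, String>() {
               @Override
               public String map(String line) {
                   return line.toLowerCase();
               }
           })
           .filter(new FilterFunction<String>() {
               @Override
               public boolean filter(String line) {
                   return !line.isEmpty();
               }
           })
           .writeAsText("hdfs:///path/to/output");     // placeholder path

        env.execute("chaining sketch");
    }
}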
Good morning!
I have the following usecase:
My program reads nested data (in this specific case XML) based on
projections (path expressions) of this data. Often multiple paths are
projected onto the same input. I would like each path to result in its own
dataset.
Is it possible to generate more than one DataSet with a single pass over the input?
Hi Pieter,
at the moment there is no support for partitioning a `DataSet` into multiple
subsets in one pass over it. If you really want to have distinct data
sets for each path, then you have to filter, afaik.
Cheers,
Till
On Thu, Oct 22, 2015 at 11:38 AM, Pieter Hameete wrote:
> Good morning!
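A minimal sketch of the filter approach in the Java DataSet API (the record
type and path names are made up):

import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;

public class FilterPerPathSketch {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // Hypothetical input: (path expression, payload) pairs produced by
        // the XML input format.
        DataSet<Tuple2<String, String>> parsed = env.fromElements(
                new Tuple2<>("a/b", "payload1"),
                new Tuple2<>("a/c", "payload2"));

        // One filter per projected path; each derived data set is produced
        // by its own filter over the same input.
        DataSet<Tuple2<String, String>> pathAB =
                parsed.filter(t -> t.f0.equals("a/b"));
        DataSet<Tuple2<String, String>> pathAC =
                parsed.filter(t -> t.f0.equals("a/c"));

        pathAB.print();
        pathAC.print();
    }
}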
Hello!
> I have thought about a workaround where the InputFormat would return
> Tuple2s and the first field is the name of the dataset to which a record
> belongs. This would however require me to filter the read data once for
> each dataset, or to do a groupReduce, which is some overhead I'm
> looking to avoid.
I fear that the filter operations are not chained because there are at
least two of them which have the same DataSet as input. However, it's true
that the intermediate results are not materialized.
It is also correct that the filter operators are deployed colocated to the
data sources. Thus, there should be no network transfer between the sources
and the filters.
It might even be materialized (to disk) if both derived data sets are
joined.
2015-10-22 12:01 GMT+02:00 Till Rohrmann :
> I fear that the filter operations are not chained because there are at
> least two of them which have the same DataSet as input. However, it's true
> that the intermediate results are not materialized.
Thanks for your responses!
The derived datasets would indeed be grouped after the filter operations.
Why would this cause them to be materialized to disk? And if I understand
correctly, the data source will not chain to more than one filter,
causing (de)serialization to transfer the records from the source to the filters?
Hello,
Trying to understand why my code was giving strange results, I’ve ended up
adding “useless” checks in my code and came across what seems to me to be a
bug. I group my dataset according to a key, but in the reduceGroup function I
am passed values with different keys.
My code has the following
In principle, a data set that branches only needs to be materialized if both
branches are pipelined until they are merged (i.e., in a hybrid-hash join).
Otherwise, the data flow might deadlock due to pipelining.
If you group both data sets before they are joined, the pipeline is broken
due to the blocking sort.
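A made-up plan of the shape under discussion, for illustration:

import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;

public class BranchAndJoinSketch {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        DataSet<Tuple2<Integer, String>> source = env.fromElements(
                new Tuple2<>(1, "a"), new Tuple2<>(2, "a"), new Tuple2<>(3, "b"));

        // The data set branches into two derived data sets...
        DataSet<Tuple2<Integer, String>> big   = source.filter(t -> t.f0 > 1);
        DataSet<Tuple2<Integer, String>> small = source.filter(t -> t.f0 <= 1);

        // ...which are merged again by a join. If both branches were fully
        // pipelined into the join (e.g. a hybrid-hash join), the flow could
        // deadlock, so the branching data set may be materialized; grouping
        // the inputs first breaks the pipeline via the sort instead.
        big.join(small).where(1).equalTo(1).print();
    }
}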
Hi!
You are checking for equality / inequality with "!=" - can you check with
"equals()" ?
The key objects will most certainly be different in each record (as they
are deserialized individually), but they should be equal.
Stephan
On Thu, Oct 22, 2015 at 12:20 PM, LINZ, Arnaud
wrote:
> Hello,
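As a standalone illustration of the reference-vs-value pitfall (the values
are made up; 128 is chosen to fall outside the JVM's Long cache):

public class BoxedKeyComparison {
    public static void main(String[] args) {
        // Keys deserialized from two records end up as distinct objects.
        Long key1 = 128L;
        Long key2 = 128L;

        System.out.println(key1 == key2);       // false: reference comparison
        System.out.println(key1.equals(key2));  // true: value comparison

        // Comparing against a primitive long unboxes the Long, so the
        // comparison is by value again (Aljoscha's point below).
        long primitive = 128L;
        System.out.println(key1 != primitive);  // false: values are equal
    }
}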
Hi,
but he’s comparing it to a primitive long, so shouldn’t the Long key be unboxed
and the comparison still be valid?
My question is whether you enabled object-reuse-mode on the
ExecutionEnvironment?
Cheers,
Aljoscha
> On 22 Oct 2015, at 12:31, Stephan Ewen wrote:
>
> Hi!
>
> You are checking for equality / inequality with "!=" - can you check with
> "equals()" ?
If not, could you provide us with the program and test data to reproduce
the error?
Cheers,
Till
On Thu, Oct 22, 2015 at 12:34 PM, Aljoscha Krettek
wrote:
> Hi,
> but he’s comparing it to a primitive long, so shouldn’t the Long key be
> unboxed and the comparison still be valid?
>
> My question is whether you enabled object-reuse-mode on the
> ExecutionEnvironment?
Hi,
I was using primitive types, and EnableObjectReuse was turned on. My next move
was to turn it off, and it did solve the problem.
It also increased execution time by 10%, but it’s hard to say if this overhead
is due to the copying or to the change of behavior of the reduceGroup algorithm
once object reuse is disabled.
With object reuse activated, Flink heavily reuses objects. Each call to the
Iterator in the reduceGroup function gives back one of the same two
objects, which has been filled with different contents.
Your list of all values will effectively only contain two different objects.
Furthermore, the loop over the iterator keeps overwriting the contents of
those two objects on every call to next().
You don’t modify the objects; however, the ReusingKeyGroupedIterator, which
is the iterator you have in your reduce function, does. Internally it uses
two objects, in your case of type Tuple2, to
deserialize the input records. These two objects are alternately returned
when you call next on the iterator.
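A minimal sketch of the pitfall and the usual fix, copying each record before
buffering it (the data and types are made up):

import java.util.ArrayList;
import java.util.List;

import org.apache.flink.api.common.functions.GroupReduceFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.util.Collector;

public class ObjectReuseSketch {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        env.getConfig().enableObjectReuse();  // the mode under discussion

        DataSet<Tuple2<Long, String>> data = env.fromElements(
                new Tuple2<>(1L, "a"), new Tuple2<>(1L, "b"), new Tuple2<>(2L, "c"));

        data.groupBy(0)
            .reduceGroup(new GroupReduceFunction<Tuple2<Long, String>, String>() {
                @Override
                public void reduce(Iterable<Tuple2<Long, String>> values,
                                   Collector<String> out) {
                    List<Tuple2<Long, String>> buffered = new ArrayList<>();
                    for (Tuple2<Long, String> t : values) {
                        // Buggy with object reuse: buffered.add(t) would store
                        // the same two alternating objects over and over.
                        buffered.add(t.copy());  // copy before buffering
                    }
                    out.collect(buffered.toString());
                }
            })
            .print();
    }
}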
Hi,
I want to write a long-running (i.e. never stop it) streaming Flink
application on a Kerberos-secured Hadoop/YARN cluster. My application needs
to do things with files on HDFS and HBase tables on that cluster, so having
the correct Kerberos tickets is very important. The stream is to be
ingested
Hi,
Thanks a lot for the explanation. I cannot even say that it wasn’t stated in
the documentation, I’ve simply missed the iterator part:
“by default, user defined functions (like map() or reduce()) are getting new
objects on each call (or through an iterator). So it is possible to keep
references to the objects.”
Stephan Ewen wrote
> This is actually not a bug, or a POJO or Avro problem. It is simply a
> limitation in the functionality, as the exception message says:
> "Specifying
> fields by name is only supported on Case Classes (for now)."
>
> Try this with a regular reduce function that selects the max
Hi Trevor,
that’s actually my bad since I only tested my branch against a remote
cluster. I fixed the problem (not properly starting the
LocalFlinkMiniCluster) so that you can now use Zeppelin also in local mode.
Just check out my branch again.
Cheers,
Till
On Wed, Oct 21, 2015 at 10:00 PM, Trevor wrote:
Hi,
I am trying to load a dataset that has null values in some fields by using
readCsvFile().
// e.g. _date|_click|_sales|_item|_web_page|_user
case class WebClick(_click_date: Long, _click_time: Long, _sales: Int,
_item: Int, _page: Int, _user: Int)
private def getWebClickDataSet(env: ExecutionEnvironment): DataSet[WebClick] = ...
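One workaround sketch (shown in the Java API for consistency with the other
sketches; the file name, column layout, and the -1 sentinel are made up): read
the nullable columns as strings and substitute a default in a map. Depending
on the Flink version, CsvReader.ignoreInvalidLines() can also skip rows that
fail to parse.

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple3;

public class CsvWithMissingValues {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // Read the possibly-empty columns as String instead of Int, since the
        // CSV reader cannot produce null for primitive fields.
        DataSet<Tuple3<Long, String, String>> raw = env
                .readCsvFile("web_clicks.dat")   // made-up path
                .fieldDelimiter("|")
                .types(Long.class, String.class, String.class);

        DataSet<Tuple3<Long, Integer, Integer>> clicks = raw.map(
                new MapFunction<Tuple3<Long, String, String>,
                                Tuple3<Long, Integer, Integer>>() {
                    @Override
                    public Tuple3<Long, Integer, Integer> map(
                            Tuple3<Long, String, String> t) {
                        // Substitute a sentinel for missing values.
                        int sales = t.f1.isEmpty() ? -1 : Integer.parseInt(t.f1);
                        int item  = t.f2.isEmpty() ? -1 : Integer.parseInt(t.f2);
                        return new Tuple3<>(t.f0, sales, item);
                    }
                });

        clicks.print();
    }
}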
In the Java API, we only support the `max` operation for tuple types where
you reference the fields via indices.
Cheers,
Till
On Thu, Oct 22, 2015 at 4:04 PM, aawhitaker wrote:
> Stephan Ewen wrote
> > This is actually not a bug, or a POJO or Avro problem. It is simply a
> > limitation in the functionality, as the exception message says:
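A small made-up example of referencing tuple fields by index:

import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;

public class MaxByIndex {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        DataSet<Tuple2<String, Integer>> scores = env.fromElements(
                new Tuple2<>("a", 3), new Tuple2<>("b", 7), new Tuple2<>("a", 5));

        // Max of field 1; note that the values of the non-aggregated fields
        // in the result are not defined.
        scores.max(1).print();

        // Grouped variant: per key (field 0), the max of field 1.
        scores.groupBy(0).max(1).print();
    }
}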