Re: Help with Flink experimental Table API

2015-06-14 Thread Aljoscha Krettek
Hi, the reason why this doesn't work is that the TupleSerializer cannot deal with null values: @Test def testAggregationWithNull(): Unit = { val env = ExecutionEnvironment.getExecutionEnvironment val table = env.fromElements[(Integer, String)]( (123, "a"), (234, "b"), (345, "c"), (null, "d")).
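Aljoscha's point, that a serializer which unconditionally dereferences each tuple field must fail on a null field, can be sketched without Flink. This is a toy illustration only; `serializeInt` is an invented stand-in, not Flink's actual `IntSerializer`:

```scala
object TupleSerializerSketch {
  // Toy field "serializer" that, like a typical primitive serializer,
  // assumes a non-null value and unboxes it.
  def serializeInt(v: Integer): Array[Byte] = {
    val i = v.intValue() // throws NullPointerException when v is null
    Array((i >> 24).toByte, (i >> 16).toByte, (i >> 8).toByte, i.toByte)
  }

  def main(args: Array[String]): Unit = {
    println(serializeInt(123).length) // 4 bytes for a non-null Integer
    try {
      serializeInt(null)
    } catch {
      case _: NullPointerException =>
        // This is the failure the test with (null, "d") runs into.
        println("null field cannot be serialized")
    }
  }
}
```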

Re: Reading separate files in parallel tasks as input

2015-06-14 Thread Fabian Hueske
Hi, reading local files in a distributed setting is a tricky thing because Flink assumes that all InputSplits can be read from all TaskManagers. This is obviously not possible if files are located on the local file systems of different physical machines. Hence, you cannot use one of the provided File
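Fabian's constraint, that every split must be readable by the TaskManager it is assigned to, amounts to locality-aware split assignment. A toy sketch of the idea in plain Scala (no Flink; `LocalSplit` and `assign` are invented names for illustration, not Flink API):

```scala
object LocalSplitAssignment {
  // A hypothetical split that is only readable on a single host.
  case class LocalSplit(path: String, host: String)

  // Assign each split to the worker running on its host. A split whose
  // host has no TaskManager simply cannot be read, which is the constraint
  // described above.
  def assign(splits: Seq[LocalSplit], workers: Seq[String]): Map[String, Seq[LocalSplit]] = {
    val byHost = splits.groupBy(_.host)
    workers.map(w => w -> byHost.getOrElse(w, Seq.empty)).toMap
  }

  def main(args: Array[String]): Unit = {
    val splits = Seq(LocalSplit("/data/part-0", "node1"), LocalSplit("/data/part-1", "node2"))
    val assignment = assign(splits, Seq("node1", "node2", "node3"))
    println(assignment("node1").map(_.path)) // List(/data/part-0)
    println(assignment("node3"))             // List() -- no local file there
  }
}
```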

Re: Reading separate files in parallel tasks as input

2015-06-14 Thread Dániel Bali
Hi Robert, We are not using HDFS. We have a large file that's already split into 8 parts, each of them on a node that runs a separate task manager, at the same place, with the same name. The job manager is in another node. If I start a job that uses readTextFile, I get an exception, saying that th

Re: Reading separate files in parallel tasks as input

2015-06-14 Thread Robert Metzger
Hi Daniel, Are the files in HDFS? What exactly do you mean by "`readTextFile` wants to read the file on the JobManager"? The JobManager is not reading input files. Also, Flink assigns input splits locally (when reading from distributed file systems). In the JobManager log you can see how man

Re: Help with Flink experimental Table API

2015-06-14 Thread Shiti Saxena
Hi, Re-writing the test in the following manner works. But I am not sure if this is the correct way. def testAggregationWithNull(): Unit = { val env = ExecutionEnvironment.getExecutionEnvironment val dataSet = env.fromElements[(Integer, String)]((123, "a"), (234, "b"), (345, "c"), (0, "d
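Replacing null with a sentinel such as 0 silently changes aggregate results, since the 0 counts toward sums and averages. A common null-safe alternative in plain Scala is `Option`, sketched here outside the Table API (this is not how the Table API itself represents nulls):

```scala
object OptionAggregation {
  def main(args: Array[String]): Unit = {
    // The null-like fourth entry is None; flatten drops it, so the sum
    // reflects only the real values instead of counting a sentinel 0.
    val values: Seq[Option[Int]] = Seq(Some(123), Some(234), Some(345), None)
    val sum = values.flatten.sum
    println(sum) // 702
  }
}
```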

Re: Reading separate files in parallel tasks as input

2015-06-14 Thread Dániel Bali
Hi Márton, Thanks for the reply! I suppose I have to implement `createInputSplits` too then. I tried looking at the documentation for the InputFormat interface, but I can't see how I could force it to load separate files on separate task managers, instead of one file on the job manager. Where is t

Re: Help with Flink experimental Table API

2015-06-14 Thread Shiti Saxena
Hi, For val table = env.fromElements[(Integer, String)]((123, "a"), (234, "b"), (345, "c"), (null, "d")).toTable I get the following error, Error translating node 'Data Source "at org.apache.flink.api.scala.ExecutionEnvironment.fromElements(ExecutionEnvironment.scala:505) (org.apache.flink.api.

Re: Flink 0.9 built with Scala 2.11

2015-06-14 Thread Robert Metzger
There was already a discussion regarding the two options here [1], back then we had a majority for giving all modules a Scala suffix. I'm against giving all modules a suffix because we force our users to migrate the name and it's confusing for Java users (I was confused myself when I was trying out

Re: Help with Flink experimental Table API

2015-06-14 Thread Aljoscha Krettek
Hi, sorry, my mail client sent before I was done. I think the problem is that the Scala compiler derives a wrong type for this statement: val table = env.fromElements((123, "a"), (234, "b"), (345, "c"), (null, "d")).toTable Because of the null value it derives (Any, String) as the type if you do
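The widening Aljoscha describes can be reproduced in plain Scala, independent of Flink, and fixed the same way as in his suggestion: give the compiler the element type explicitly.

```scala
object NullTupleInference {
  def main(args: Array[String]): Unit = {
    // With no expected type, 123 (Int) and null (Null) unify to Any,
    // so the inferred element type is (Any, String).
    val inferred = Seq((123, "a"), (null, "d"))
    println(inferred.head._1.getClass.getSimpleName) // Integer (boxed under Any)

    // Ascribing the type keeps (Integer, String): 123 is boxed via
    // Predef.int2Integer and null is a legal java.lang.Integer value.
    val ascribed: Seq[(Integer, String)] = Seq((123, "a"), (null, "d"))
    println(ascribed.last._1 == null) // true
  }
}
```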

Re: Help with Flink experimental Table API

2015-06-14 Thread Aljoscha Krettek
Hi, I think the problem is that the Scala compiler derives a wrong type for this statement: On Sun, 14 Jun 2015 at 18:28 Shiti Saxena wrote: > Hi Aljoscha, > > I created the issue FLINK-2210 > for aggregate on null. > I made changes to Express

Re: Reading separate files in parallel tasks as input

2015-06-14 Thread Márton Balassi
Hi Dani, The batch API does not expose an addSource-like method, but you can always write your own InputFormat and pass it directly to the constructor of the DataSource. DataSource extends DataSet, so you will get all the usual methods in the end. For an example you can have a look e.g. here. [1] [

Re: Help with Flink experimental Table API

2015-06-14 Thread Shiti Saxena
Hi Aljoscha, I created the issue FLINK-2210 for aggregate on null. I made changes to ExpressionAggregateFunction to ignore null values. But I am unable to create a Table with null values in tests. The code I used is, def testAggregationWi

Reading separate files in parallel tasks as input

2015-06-14 Thread Dániel Bali
Hello! We are running an experiment on a cluster and we have a large input split into multiple files. We'd like to run a Flink job that reads the local file on each instance and processes that. Is there a way to do this in the batch environment? `readTextFile` wants to read the file on the JobMana

Re: Flink 0.9 built with Scala 2.11

2015-06-14 Thread Stephan Ewen
Good idea, Chiwan! On Sat, Jun 13, 2015 at 6:25 PM, Chiwan Park wrote: > Hi. I think that we don’t need to deploy all modules with a Scala variation. > The pure Java-based modules such as flink-java, flink-core, > flink-optimizers, …, etc. don’t need to be deployed with a Scala version > variation. We

Re: Help with Flink experimental Table API

2015-06-14 Thread Shiti Saxena
I'll do the fix On Sun, Jun 14, 2015 at 12:42 AM, Aljoscha Krettek wrote: > I merged your PR for the RowSerializer. Teaching the aggregators to deal > with null values should be a very simple fix in > ExpressionAggregateFunction.scala. There it is simply always aggregating > the values without c