Re: when run progrm on big data customer 2.5GB orders 5GB disply error why

2015-06-18 Thread Robert Metzger
Hi, the problem is that Flink is trying to parse the input data as CSV, but there seem to be rows in the data which do not conform to the specified schema. On Thu, Jun 18, 2015 at 12:51 PM, hagersaleh wrote: > when run progrm on big data customer 2.5GB orders 5GB disply error why > > > DataSou
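One way to track down such rows before handing the file to Flink is to scan the input and report lines that have fewer fields than the schema expects. The following is only a minimal sketch in plain Java, not part of Flink: the class name RowCheck, its methods, and the sample rows are hypothetical, and it assumes a field count of "delimiters + 1" per line, which may differ from how the actual CSV parser treats trailing delimiters.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not a Flink class): finds rows with too few fields.
class RowCheck {

    // Counts fields as (number of delimiters + 1); an empty line counts as one empty field.
    static int countFields(String line, char delimiter) {
        int fields = 1;
        for (int i = 0; i < line.length(); i++) {
            if (line.charAt(i) == delimiter) {
                fields++;
            }
        }
        return fields;
    }

    // Returns the 1-based line numbers of all rows with fewer than expectedFields fields.
    static List<Integer> shortRows(List<String> lines, char delimiter, int expectedFields) {
        List<Integer> bad = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            if (countFields(lines.get(i), delimiter) < expectedFields) {
                bad.add(i + 1);
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        // Made-up sample rows for illustration; the second row is missing a field.
        List<String> sample = Arrays.asList("1|Alice|Berlin", "2|Bob", "3|Carol|Paris");
        System.out.println("Short rows at lines: " + shortRows(sample, '|', 3));
    }
}
```

For a real input, the lines would be read from the file and expectedFields set to the number of columns declared for the CSV source.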

Re: Job Statistics

2015-06-18 Thread Stephan Ewen
Hi! There are no I/O or record statistics collected at the moment. It is work in progress. Also a new Web Frontend that visualizes those is in the works, so this is going to improve soon, but for now, there is no easy way to grab those numbers. If you are interested in contributing, I could pull

Re: Job Statistics

2015-06-18 Thread Jean Bez
Hi, I tried to view directly from the web interface but I could not find any other information about the completed jobs. I have the list, but when I open it, no further information is provided. Is this correct? 2015-06-18 15:10 GMT-03:00 Jean Bez : > Hello Max, > > I will try to do that! Do you

when run progrm on big data customer 2.5GB orders 5GB disply error why

2015-06-18 Thread hagersaleh
When I run the program on big data (customer 2.5GB, orders 5GB), it displays an error. Why? DataSource (at getCustomerDataSet(TPCHQuery3.java:252) (org.apache.flink.api.java.io.CsvInputFormat)) (1/1) switched to FAILED org.apache.flink.api.common.io.ParseException: Row too short: 1499|Customer#01499|3emQ49UZt

Re: Job Statistics

2015-06-18 Thread Jean Bez
Hello Max, I will try to do that! Do you know if I could obtain data about the I/O and communication as well? From what I could understand I can get the runtime and the accumulator results only. Is that right? 2015-06-18 11:37 GMT-03:00 Maximilian Michels : > Hi Jean, > > As I said, there is cur

Re: Job Statistics

2015-06-18 Thread Maximilian Michels
Hi Jean, As I said, there is currently only the run time available. You can print the run time and accumulator results to stdout by retrieving the JobExecutionResult from the ExecutionEnvironment: JobExecutionResult result = env.execute(); System.out.println("runtime: " + result.getNetRuntime());

Re: Using collect and accessing accumulator results

2015-06-18 Thread Aljoscha Krettek
@Ufuk, probably should. yes. On Thu, 18 Jun 2015 at 16:18 Tamara Mendt wrote: > Great, thanks! > > On Thu, Jun 18, 2015 at 4:16 PM, Ufuk Celebi wrote: > >> Should we add this to the Javadoc of the eagerly executed operations? >> >> On 18 Jun 2015, at 16:11, Maximilian Michels wrote: >> >> > Hi

Re: Using collect and accessing accumulator results

2015-06-18 Thread Tamara Mendt
Great, thanks! On Thu, Jun 18, 2015 at 4:16 PM, Ufuk Celebi wrote: > Should we add this to the Javadoc of the eagerly executed operations? > > On 18 Jun 2015, at 16:11, Maximilian Michels wrote: > > > Hi Tamara! > > > > Yes, there is. Since count/collect/print trigger an execution of the > Exec

Re: Using collect and accessing accumulator results

2015-06-18 Thread Ufuk Celebi
Should we add this to the Javadoc of the eagerly executed operations? On 18 Jun 2015, at 16:11, Maximilian Michels wrote: > Hi Tamara! > > Yes, there is. Since count/collect/print trigger an execution of the > ExecutionEnvironment, you can get the result afterwards using > env.getLastExecutio

Re: Job Statistics

2015-06-18 Thread Jean Bez
Hi Maximilian, The metrics I am interested in are I/O, run time and communication. Could you please provide an example of how to obtain such results? Thank you!! 2015-06-18 10:45 GMT-03:00 Maximilian Michels : > Hi Jean, > > I think it would be a nice to have feature to display some metrics on th

Re: Using collect and accessing accumulator results

2015-06-18 Thread Maximilian Michels
Hi Tamara! Yes, there is. Since count/collect/print trigger an execution of the ExecutionEnvironment, you can get the result afterwards using env.getLastExecutionResult(). Best, Max On Thu, Jun 18, 2015 at 3:57 PM, Tamara Mendt wrote: > Hey! > > I am currently running a job in which I wish to

Using collect and accessing accumulator results

2015-06-18 Thread Tamara Mendt
Hey! I am currently running a job in which I wish to use collect to trigger my job execution, but I also need to have access to the final accumulator results. Up until now I have been accessing the accumulator results through the JobExecutionResult that the function execute() returns. Not surpris

Re: Job Statistics

2015-06-18 Thread Maximilian Michels
Hi Jean, I think it would be a nice-to-have feature to display some metrics on the command line after a job has completed. We already have the run time and the accumulator results available at the CLI and printing those would be easy. What metrics in particular are you looking for? Best, Max On

Re: Job Statistics

2015-06-18 Thread Jean Bez
Hi Fabian, I am trying to compare some examples on Hadoop, Spark and Flink. If possible I would like to see the job statistics like the report given by Hadoop. Since I am running these examples on a large cluster it would be much better if I could obtain such data directly from the console. Thank

Re: sorting groups

2015-06-18 Thread Fabian Hueske
The reason for this restriction is that KeySelector keys (i.e., keys that are extracted using a function) require special case handling at runtime. If we allow combinations of KeySelector keys with other keys for grouping and groupSorting, we have four different cases to cover compared to two. So t

benchmark my application on hadoop cluster

2015-06-18 Thread Pa Rö
Hello, I want to benchmark my MapReduce, Mahout, Spark and Flink k-means implementations on a Hadoop cluster. I have written a JMH benchmark, but I get an error when running it on the cluster; locally it works fine. Maybe someone can solve this problem. I have posted it on Stack Overflow: http://stackoverflow.com/questions/30892720/jmh-benchm

Re: Job Statistics

2015-06-18 Thread Fabian Hueske
Hi Jean, what kind of job execution stats are you interested in? Cheers, Fabian 2015-06-18 9:01 GMT+02:00 Matthias J. Sax : > Hi, > > the CLI cannot show any job statistics. However, you can use the > JobManager web interface that is accessible at port 8081 from a browser. > > -Matthias > > > O

Re: Job Statistics

2015-06-18 Thread Matthias J. Sax
Hi, the CLI cannot show any job statistics. However, you can use the JobManager web interface that is accessible at port 8081 from a browser. -Matthias On 06/17/2015 10:13 PM, Jean Bez wrote: > Hello, > > Is it possible to view job statistics after it finished to execute > directly in the comm