Re: word count on parquet file

2016-08-22 Thread shamu
I changed the code to the following:

JavaPairRDD rdd = sc.newAPIHadoopFile(inputFile, ParquetInputFormat.class, NullWritable.class, String.class, mrConf);
JavaRDD words = rdd.values().flatMap( new FlatMapFunction() { public Iterable call(String x) { return Arrays.asLi

Re: word count on parquet file

2016-08-22 Thread ayan guha
You are missing the input. mrConf is not the way to add input files. In Spark, try the DataFrame read functions or the sc.textFile function.

Best,
Ayan

On 23 Aug 2016 07:12, "shamu" wrote:
> Hi All,
> I am a newbie to Spark/Hadoop.
> I want to read a parquet file and perform a simple word-count. Below is
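
For reference, the DataFrame read route suggested above might look roughly like the sketch below. It assumes Spark 2.x (SparkSession) and that the parquet file has a string column holding the text; the class name ParquetWordCount, the helper countWords, and the column name passed as the second argument are hypothetical and not taken from the original post.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.explode;
import static org.apache.spark.sql.functions.split;

public class ParquetWordCount {

    // Hypothetical helper: splits the given string column on whitespace,
    // emits one row per word, and counts the occurrences of each word.
    static Dataset<Row> countWords(Dataset<Row> df, String column) {
        return df
            .select(explode(split(col(column), "\\s+")).as("word"))
            .groupBy("word")
            .count();
    }

    public static void main(String[] args) {
        String inputFile = args[0]; // path to the parquet file
        String column = args[1];    // assumption: name of a string column in that file

        SparkSession spark = SparkSession.builder()
            .appName("ParquetWordCount")
            .getOrCreate();

        // Parquet carries its own schema, so no Hadoop InputFormat or
        // job configuration is needed to read it.
        Dataset<Row> df = spark.read().parquet(inputFile);

        countWords(df, column).show();
        spark.stop();
    }
}
```

On Spark 1.x the equivalent entry point would be sqlContext.read().parquet(inputFile), which returns a DataFrame instead of a Dataset<Row>.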