Hi, I'm afraid you sent this email to the wrong mailing list. This is the Spark users mailing list. We could probably tell you how to do this with Spark, but I don't think that's your intention :)
kr, Gerard.

On Thu, May 9, 2019 at 11:03 AM Balakumar iyer S <bala93ku...@gmail.com> wrote:

> Hi All,
>
> I am trying to read an ORC file and perform a groupBy operation on it, but
> when I run it on a large data set we get the following error message.
>
> Input format of INPUT DATA
>
> |178111256| 107125374|
> |178111256| 107148618|
> |178111256| 107175361|
> |178111256| 107189910|
>
> and we are trying to group by the first column.
>
> As far as we can tell the logic and syntax of the code are correct, and it
> works well on a small data set. I have attached the code in the text file.
>
> Thank you for your time.
>
> ERROR MESSAGE:
> Error: java.lang.ArrayIndexOutOfBoundsException
>     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$Buffer.write(MapTask.java:1453)
>     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$Buffer.write(MapTask.java:1349)
>     at java.io.DataOutputStream.writeByte(DataOutputStream.java:153)
>     at org.apache.hadoop.io.WritableUtils.writeVLong(WritableUtils.java:273)
>     at org.apache.hadoop.io.WritableUtils.writeVInt(WritableUtils.java:253)
>     at org.apache.hadoop.io.Text.write(Text.java:330)
>     at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:98)
>     at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:82)
>     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1149)
>     at org.apache.hadoop.mapred.MapTask$OldOutputCollector.collect(MapTask.java:610)
>     at orc_groupby.orc_groupby.Orc_groupBy$MyMapper.map(Orc_groupBy.java:73)
>     at orc_groupby.orc_groupby.Orc_groupBy$MyMapper.map(Orc_groupBy.java:39)
>     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
>     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:170)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
>     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:164)
>
> --
> REGARDS
> BALAKUMAR SEETHARAMAN
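
As a side note for readers of this thread, the operation described in the quoted message (collect the second column's values under each distinct first-column value) can be sketched in plain Java. This is only an illustration of the grouping itself, not the code from the attachment, which is not reproduced here. The class name `GroupByFirstColumn` and the `group` helper are hypothetical, and the parsing assumes every row has the `|key| value|` shape shown in the sample input. It uses neither Hadoop nor Spark and makes no claim about the cause of the reported exception.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration: group "|key| value|" rows by their first column.
public class GroupByFirstColumn {

    // Returns first-column key -> list of second-column values, in input order.
    public static Map<String, List<String>> group(List<String> rows) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (String row : rows) {
            // "|a| b|" splits into ["", "a", " b"]; trailing empty parts are dropped.
            String[] parts = row.split("\\|");
            if (parts.length < 3) {
                continue; // assumption: silently skip rows that do not match the sample shape
            }
            String key = parts[1].trim();
            String value = parts[2].trim();
            groups.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
        }
        return groups;
    }

    public static void main(String[] args) {
        // The four sample rows from the quoted message.
        List<String> rows = List.of(
                "|178111256| 107125374|",
                "|178111256| 107148618|",
                "|178111256| 107175361|",
                "|178111256| 107189910|");
        // prints {178111256=[107125374, 107148618, 107175361, 107189910]}
        System.out.println(group(rows));
    }
}
```

This sketch holds every group in memory on one machine, so it only shows the intended result on a small sample; on a large data set the same grouping would be delegated to a distributed framework.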