Hi,

I'm afraid you sent this email to the wrong mailing list.
This is the Spark users mailing list. We could probably tell you how to do
this with Spark, but I think that's not your intention :)
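
For what it's worth, if you ever did want to do this with Spark, the group-by
itself would be roughly the following (untested sketch; the input path and the
final show() are placeholders, not a recommendation for your actual job):

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class OrcGroupBySpark {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder().appName("orc-group-by").getOrCreate();
    // Read the ORC file and count rows per value of the first column.
    Dataset<Row> df = spark.read().orc(args[0]);
    df.groupBy(df.columns()[0]).count().show();
    spark.stop();
  }
}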

kr, Gerard.


On Thu, May 9, 2019 at 11:03 AM Balakumar iyer S <bala93ku...@gmail.com>
wrote:

> Hi All,
>
> I am trying to read an ORC file and perform a groupBy operation on it, but
> when I run it on a large data set we get the following error message.
>
> Format of the input data:
>
> |178111256|  107125374|
> |178111256|  107148618|
> |178111256|  107175361|
> |178111256|  107189910|
>
> We are trying to group by the first column.
>
> As far as I can tell the logic and syntax of the code are correct, and it
> works fine on a small data set. I have attached the code in a text file.
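>
> In outline the job is very simple: read each ORC row with the old mapred
> API, emit the first column as the key and the second column as the value,
> and let the shuffle group the values by key. A simplified sketch (not the
> attached code; the class names, the field positions 0/1, and the use of the
> ORC mapred bindings are placeholders/assumptions) looks like this:
>
> import java.io.IOException;
> import java.util.Iterator;
>
> import org.apache.hadoop.fs.Path;
> import org.apache.hadoop.io.NullWritable;
> import org.apache.hadoop.io.Text;
> import org.apache.hadoop.mapred.FileInputFormat;
> import org.apache.hadoop.mapred.FileOutputFormat;
> import org.apache.hadoop.mapred.JobClient;
> import org.apache.hadoop.mapred.JobConf;
> import org.apache.hadoop.mapred.MapReduceBase;
> import org.apache.hadoop.mapred.Mapper;
> import org.apache.hadoop.mapred.OutputCollector;
> import org.apache.hadoop.mapred.Reducer;
> import org.apache.hadoop.mapred.Reporter;
> import org.apache.orc.mapred.OrcInputFormat;
> import org.apache.orc.mapred.OrcStruct;
>
> public class OrcGroupBySketch {
>
>   // Mapper: emit (first column, second column) for every ORC row.
>   // Field positions 0 and 1 are assumptions about the ORC schema.
>   public static class MyMapper extends MapReduceBase
>       implements Mapper<NullWritable, OrcStruct, Text, Text> {
>     public void map(NullWritable key, OrcStruct row,
>                     OutputCollector<Text, Text> out, Reporter reporter)
>         throws IOException {
>       out.collect(new Text(row.getFieldValue(0).toString()),
>                   new Text(row.getFieldValue(1).toString()));
>     }
>   }
>
>   // Reducer: the shuffle has already grouped the values by key,
>   // so just concatenate them per key.
>   public static class MyReducer extends MapReduceBase
>       implements Reducer<Text, Text, Text, Text> {
>     public void reduce(Text key, Iterator<Text> values,
>                        OutputCollector<Text, Text> out, Reporter reporter)
>         throws IOException {
>       StringBuilder grouped = new StringBuilder();
>       while (values.hasNext()) {
>         grouped.append(values.next().toString()).append(' ');
>       }
>       out.collect(key, new Text(grouped.toString().trim()));
>     }
>   }
>
>   public static void main(String[] args) throws IOException {
>     JobConf conf = new JobConf(OrcGroupBySketch.class);
>     conf.setJobName("orc-group-by-first-column");
>     conf.setInputFormat(OrcInputFormat.class);   // rows arrive as OrcStruct
>     conf.setMapperClass(MyMapper.class);
>     conf.setReducerClass(MyReducer.class);
>     conf.setOutputKeyClass(Text.class);
>     conf.setOutputValueClass(Text.class);
>     FileInputFormat.setInputPaths(conf, new Path(args[0]));
>     FileOutputFormat.setOutputPath(conf, new Path(args[1]));
>     JobClient.runJob(conf);
>   }
> }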
>
> Thank you for your time.
>
> ERROR MESSAGE:
> Error: java.lang.ArrayIndexOutOfBoundsException
>     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$Buffer.write(MapTask.java:1453)
>     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$Buffer.write(MapTask.java:1349)
>     at java.io.DataOutputStream.writeByte(DataOutputStream.java:153)
>     at org.apache.hadoop.io.WritableUtils.writeVLong(WritableUtils.java:273)
>     at org.apache.hadoop.io.WritableUtils.writeVInt(WritableUtils.java:253)
>     at org.apache.hadoop.io.Text.write(Text.java:330)
>     at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:98)
>     at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:82)
>     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1149)
>     at org.apache.hadoop.mapred.MapTask$OldOutputCollector.collect(MapTask.java:610)
>     at orc_groupby.orc_groupby.Orc_groupBy$MyMapper.map(Orc_groupBy.java:73)
>     at orc_groupby.orc_groupby.Orc_groupBy$MyMapper.map(Orc_groupBy.java:39)
>     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
>     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:170)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
>     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:164)
>
>
>
> --
> REGARDS
> BALAKUMAR SEETHARAMAN
>
>
