Re: The following Java MR code works for small dataset but throws(arrayindexoutofBound) error for large dataset

Gerard Maas Thu, 09 May 2019 04:02:18 -0700

Hi,

I'm afraid you sent this email to the wrong mailing list.
This is the Spark users mailing list. We could probably tell you how to do
this with Spark, but I think that's not your intention :)
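
For reference, the grouping described in the quoted message (collect the
second-column values under each first-column key) can be sketched in plain
Java. This is only an illustration of the intended result on the four sample
rows. It is neither the poster's MapReduce job nor Spark code, and the class
and method names are made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class GroupByFirstColumn {

    // Hypothetical helper: parses rows shaped like "|178111256|  107125374|"
    // and groups the second-column values under their first-column key.
    public static Map<String, List<String>> groupByFirstColumn(List<String> rows) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (String row : rows) {
            // Splitting on '|' yields an empty first element before the leading pipe.
            String[] parts = row.split("\\|");
            if (parts.length < 3) {
                continue; // skip rows that do not have two columns
            }
            String key = parts[1].trim();
            String value = parts[2].trim();
            groups.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
        }
        return groups;
    }

    public static void main(String[] args) {
        // The four sample rows from the quoted message.
        List<String> rows = Arrays.asList(
                "|178111256|  107125374|",
                "|178111256|  107148618|",
                "|178111256|  107175361|",
                "|178111256|  107189910|");
        System.out.println(groupByFirstColumn(rows));
    }
}
```

An in-memory map like this only works when the data fits on one machine; it
shows the shape of the expected output, nothing more.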


kr, Gerard.


On Thu, May 9, 2019 at 11:03 AM Balakumar iyer S <bala93ku...@gmail.com>
wrote:

> Hi All,
>
> I am trying to read an ORC file and perform a groupBy operation on it, but
> when I run it on a large data set we get the following error
> message.
>
> Format of the input data:
>
> |178111256|  107125374|
> |178111256|  107148618|
> |178111256|  107175361|
> |178111256|  107189910|
>
> and we are trying to group by the first column.
>
> As far as I can tell the logic and syntax of the code are correct, and it
> works well on a small data set. I have attached the code in a text file.
>
> Thank you for your time.
>
> ERROR MESSAGE:
> Error: java.lang.ArrayIndexOutOfBoundsException
>     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$Buffer.write(MapTask.java:1453)
>     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$Buffer.write(MapTask.java:1349)
>     at java.io.DataOutputStream.writeByte(DataOutputStream.java:153)
>     at org.apache.hadoop.io.WritableUtils.writeVLong(WritableUtils.java:273)
>     at org.apache.hadoop.io.WritableUtils.writeVInt(WritableUtils.java:253)
>     at org.apache.hadoop.io.Text.write(Text.java:330)
>     at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:98)
>     at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:82)
>     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1149)
>     at org.apache.hadoop.mapred.MapTask$OldOutputCollector.collect(MapTask.java:610)
>     at orc_groupby.orc_groupby.Orc_groupBy$MyMapper.map(Orc_groupBy.java:73)
>     at orc_groupby.orc_groupby.Orc_groupBy$MyMapper.map(Orc_groupBy.java:39)
>     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
>     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:170)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
>     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:164)
>
>
>
> --
> REGARDS
> BALAKUMAR SEETHARAMAN
>
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
