I think if you wrap the byte[] in an object that implements equals() and hashCode() based on the array contents, you should be able to do this. There will be the overhead of an extra object per key, but conceptually it should work unless I am missing something.
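A minimal sketch of such a wrapper, in case it helps. The class name ByteArrayKey and the method toBytes() are made up for illustration and are not part of Spark. It assumes the keys are treated as immutable once wrapped, and it implements Serializable on the assumption that the keys will be shuffled between executors.

```java
import java.io.Serializable;
import java.util.Arrays;

// Hypothetical wrapper (not a Spark class): gives byte[] content-based
// equality so that equal byte sequences compare and hash as the same key.
final class ByteArrayKey implements Serializable {
    private static final long serialVersionUID = 1L;

    private final byte[] bytes;

    ByteArrayKey(byte[] bytes) {
        // Defensive copy so later changes to the caller's array
        // cannot alter the hash of a key that is already in use.
        this.bytes = bytes.clone();
    }

    // Returns a copy of the wrapped bytes.
    byte[] toBytes() {
        return bytes.clone();
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof ByteArrayKey)) {
            return false;
        }
        // Compare contents, not references.
        return Arrays.equals(bytes, ((ByteArrayKey) other).bytes);
    }

    @Override
    public int hashCode() {
        // Hash of the contents, consistent with equals().
        return Arrays.hashCode(bytes);
    }
}
```

The pairs would then be Tuple2<ByteArrayKey, obj> instead of Tuple2<byte[], obj>, with each key wrapped as new ByteArrayKey(bytes) before any by-key operation. The two clone() calls are optional and can be dropped if the extra copy matters more than the safety.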
Best Regards,
Sonal
Founder, Nube Technologies <http://www.nubetech.co>
Check out Reifier at Spark Summit 2015 <https://spark-summit.org/2015/events/real-time-fuzzy-matching-with-spark-and-elastic-search/>
<http://in.linkedin.com/in/sonalgoyal>

On Thu, Jun 11, 2015 at 9:27 PM, Mark Tse <mark....@d2l.com> wrote:
> I would like to work with RDD pairs of Tuple2<byte[], obj>, but byte[]s
> with the same contents are considered as different values because their
> reference values are different.
>
> I didn't see any to pass in a custom comparer. I could convert the byte[]
> into a String with an explicit charset, but I'm wondering if there's a more
> efficient way.
>
> Also posted on SO: http://stackoverflow.com/q/30785615/2687324
>
> Thanks,
> Mark