Good point, Sean. I've filed a ticket to document the equals() / hashCode() requirements for custom keys in the Spark documentation, as this has come up a few times on the user@ list.
https://issues.apache.org/jira/browse/SPARK-2148

On Sun, Jun 15, 2014 at 12:11 PM, Sean Owen <so...@cloudera.com> wrote:

> In Java at large, you must always implement hashCode() when you implement
> equals(). This is not specific to Spark. This is to maintain the contract
> that two equals() instances have the same hash code, and that's not the
> case for your class now. This causes weird things to happen wherever the
> hash code contract is depended upon.
>
> This probably works fine:
>
> @Override
> public int hashCode() {
>   return dcxId.hashCode() ^ trxId.hashCode() ^ msgType.hashCode();
> }
>
>
> On Sun, Jun 15, 2014 at 11:45 AM, Gaurav Jain <ja...@student.ethz.ch>
> wrote:
>
>> I have a simple Java class as follows, that I want to use as a key while
>> applying groupByKey or reduceByKey functions:
>>
>> private static class FlowId {
>>     public String dcxId;
>>     public String trxId;
>>     public String msgType;
>>
>>     public FlowId(String dcxId, String trxId, String msgType)
>>     {
>>         this.dcxId = dcxId;
>>         this.trxId = trxId;
>>         this.msgType = msgType;
>>     }
>>
>>     public boolean equals(Object other) {
>>         if (other == this) return true;
>>         if (other == null) return false;
>>         if (getClass() != other.getClass()) return false;
>>         FlowId fid = (FlowId) other;
>>         if (this.dcxId.equals(fid.dcxId) &&
>>             this.trxId.equals(fid.trxId) &&
>>             this.msgType.equals(fid.msgType))
>>         {
>>             return true;
>>         }
>>         return false;
>>     }
>> }
>>
>> I figured that an equals() method would need to be overridden to ensure
>> comparison of keys, but still entries with the same key are listed
>> separately after applying a groupByKey(), for example. What further
>> modifications are necessary to enable usage of above class as a key. Right
>> now, I have fallen back to using Tuple3<String, String, String> instead of
>> the FlowId class, but it makes the code unnecessarily verbose.
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/Using-custom-class-as-a-key-for-groupByKey-or-reduceByKey-tp7640.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
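For reference, here is a minimal, self-contained sketch of the fix discussed in this thread, using only the JDK (Java 8+, no Spark on the classpath). FlowIdDemo and its groupByKey helper are hypothetical names for illustration: the helper groups values in a plain HashMap as a single-machine stand-in for Spark's groupByKey. The relevant point is that Spark's default HashPartitioner picks a partition from key.hashCode(), so keys whose equals() and hashCode() disagree can be routed to different partitions and never be grouped together, which matches the symptom Gaurav describes. The class is also marked Serializable, on the assumption that Spark's default Java serializer is used to ship keys during the shuffle.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// FlowId from the thread, with a hashCode() that is consistent with equals().
// Serializable so Java serialization can ship it between executors (assumption:
// Spark's default JavaSerializer, not Kryo).
final class FlowId implements Serializable {
    private static final long serialVersionUID = 1L;

    final String dcxId;
    final String trxId;
    final String msgType;

    FlowId(String dcxId, String trxId, String msgType) {
        this.dcxId = dcxId;
        this.trxId = trxId;
        this.msgType = msgType;
    }

    @Override
    public boolean equals(Object other) {
        if (other == this) return true;
        if (other == null || getClass() != other.getClass()) return false;
        FlowId fid = (FlowId) other;
        // Objects.equals is null-safe, unlike this.dcxId.equals(fid.dcxId).
        return Objects.equals(dcxId, fid.dcxId)
            && Objects.equals(trxId, fid.trxId)
            && Objects.equals(msgType, fid.msgType);
    }

    @Override
    public int hashCode() {
        // Hashes exactly the fields that equals() compares, so equal keys hash equally.
        return Objects.hash(dcxId, trxId, msgType);
    }

    @Override
    public String toString() {
        return "FlowId(" + dcxId + ", " + trxId + ", " + msgType + ")";
    }
}

public class FlowIdDemo {
    // Hypothetical helper: groups values[i] under keys[i] in a HashMap,
    // which, like a hash partitioner, relies on hashCode() agreeing with equals().
    static Map<FlowId, List<Integer>> groupByKey(List<FlowId> keys, List<Integer> values) {
        Map<FlowId, List<Integer>> groups = new HashMap<>();
        for (int i = 0; i < keys.size(); i++) {
            groups.computeIfAbsent(keys.get(i), k -> new ArrayList<>()).add(values.get(i));
        }
        return groups;
    }

    public static void main(String[] args) {
        List<FlowId> keys = Arrays.asList(
            new FlowId("d1", "t1", "REQ"),
            new FlowId("d2", "t2", "RSP"),
            new FlowId("d1", "t1", "REQ")); // equal to the first key, but a distinct object
        List<Integer> values = Arrays.asList(1, 2, 3);

        Map<FlowId, List<Integer>> groups = groupByKey(keys, values);
        System.out.println(groups.size());                             // 2
        System.out.println(groups.get(new FlowId("d1", "t1", "REQ"))); // [1, 3]
    }
}
```

Sean's XOR-based hashCode() also satisfies the contract. Objects.hash is used here instead because XOR is order-insensitive: swapping field values, e.g. FlowId("a", "b", "c") versus FlowId("b", "a", "c"), always yields the same hash, whereas Objects.hash mixes in each field's position. It also tolerates null fields, which would throw a NullPointerException in both the original equals() and the XOR version.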