> On March 8, 2014, 12:33 a.m., Xuefu Zhang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/AbstractParquetMapInspector.java, line 154
> > <https://reviews.apache.org/r/18925/diff/2/?file=513918#file513918line154>
> >
> > I guess I wasn't clear. It's not inappropriate, but it goes beyond its
> > responsibility. An equality implementation lives within a context, which is
> > the class. The instance to be checked doesn't necessarily have the runtime
> > class info. In fact, the context shouldn't be aware of the runtime class of
> > these instances, as child classes can be added at any time. Doing
> > getClass() == other.getClass() goes beyond the current context.
> >
> > What's more appropriate is to check type compatibility by calling if
> > (other instanceof this.class). This is different from checking
> > this.getClass() == other.getClass().
> >
> > Take Java's ArrayList.equals() method as an example
> > (http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/util/AbstractList.java#AbstractList.equals%28java.lang.Object%29).
> > This method doesn't do a runtime class check. The implementation is saying,
> > other.getClass() doesn't have to be ArrayList, but has to be an instance of
> > ArrayList. It could be an instance of MyArrayList as long as MyArrayList
> > inherits from ArrayList.
> >
> > If we think it's more practical to do such a check, we'd expect that
> > ArrayList.equals() would also check this.getClass() == other.getClass().
> >
> > Btw, I don't understand how removing this check breaks transitivity.
> >
> > I understand this check was there before your change. I missed it in my
> > previous review.
>
> Szehon Ho wrote:
> Hm, I actually did not realize that Java's code has that for collections,
> thanks for pointing that out. I suppose in the list case, the semantics are
> that the user doesn't care about the list implementation, but about the
> contents.
>
> What I meant about breaking the transitive property if you allow each
> class to decide: say we remove the check of runtime class equality. There is
> a subclass called 'A' which chooses to override equals to return true only if
> 'other' is an A. Another subclass 'B' doesn't override .equals, and by
> inheritance can return true if 'other' is any subclass of the parent (A or B).
> A.equals(B) is false, B.equals(A) is true, breaking transitivity. Now that I
> think about it, this argument doesn't justify having the check in the parent
> one way or another. All I meant is that a class cannot implement .equals just
> in its own context as you mentioned: all subclasses must choose the same way
> to be consistent, and I felt that having this check in the parent would
> ensure that all the children followed it.
>
> But coming back to this particular issue, I still don't think it's
> safe to remove that check. There are two subclasses of
> AbstractParquetMapInspector, the Deep and the Standard one, depending on the
> type of map. If we don't do this check, then Deep will be considered equal to
> Standard, and the wrong one may be returned from the cache and used in
> the inspection; they are not interchangeable. This is unlike the Java list
> and map, where the contents matter; here the actual class matters more than
> the content. At least that is my understanding from looking at the code.
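
For readers following the thread, the A/B scenario described above can be reproduced with a minimal sketch. All class names here (`ParentSketch`, `SubA`, `SubB`) are hypothetical stand-ins, not Hive classes, and the parent is assumed to use an `instanceof` check instead of comparing runtime classes. Strictly speaking, the property that fails in this scenario is the symmetry clause of the `Object.equals` contract.

```java
// Hypothetical classes for illustration only; none of these exist in Hive.
class ParentSketch {
    final String name;

    ParentSketch(String name) {
        this.name = name;
    }

    // Parent WITHOUT a runtime-class check: any subclass instance is accepted.
    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof ParentSketch)) {
            return false;
        }
        return name.equals(((ParentSketch) other).name);
    }

    @Override
    public int hashCode() {
        return name.hashCode();
    }
}

// Subclass 'A' narrows equality to other SubA instances.
class SubA extends ParentSketch {
    SubA(String name) {
        super(name);
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof SubA && super.equals(other);
    }
}

// Subclass 'B' inherits the parent's equals unchanged.
class SubB extends ParentSketch {
    SubB(String name) {
        super(name);
    }
}

class SymmetrySketch {
    public static void main(String[] args) {
        ParentSketch a = new SubA("x");
        ParentSketch b = new SubB("x");
        System.out.println(a.equals(b)); // false: SubA rejects anything that is not a SubA
        System.out.println(b.equals(a)); // true: inherited check accepts any ParentSketch
    }
}
```

With a `getClass() != other.getClass()` check in the parent, both calls would return false, which is the consistency argument made in the comment above.
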
Okay. Frankly, I don't know what the difference is between the two child classes; the whole Parquet code is very confusing. Since the code was there before this change, it's fine to keep it as it is.


- Xuefu


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18925/#review36586
-----------------------------------------------------------


On March 8, 2014, 12:01 a.m., Szehon Ho wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/18925/
> -----------------------------------------------------------
>
> (Updated March 8, 2014, 12:01 a.m.)
>
>
> Review request for hive, Brock Noland, justin coffey, and Xuefu Zhang.
>
>
> Repository: hive-git
>
>
> Description
> -------
>
> The issue is that, as part of a select * query, a DeepParquetHiveMapInspector
> is used for one column of an overall parquet-table struct object inspector.
>
> The problem lies in the ObjectInspectorFactory's cache for struct object
> inspectors. For performance, there is a cache keyed on an array list of all
> the columns' object inspectors. The second time the query is run, it attempts
> to look up the cached struct inspector. But when the hashmap looks up the part
> of the key consisting of the DeepParquetHiveMapInspector, Java calls .equals
> against the existing DeepParquetHiveMapInspector. This fails, as the .equals
> method cast the "other" to a "StandardParquetHiveMapInspector".
>
> This change regenerates .equals and .hashCode from Eclipse.
>
> It also adds one more check in .equals before casting, to handle the case
> where an object inspector of another class hashes to the same hash code in
> the cache. Java would then call .equals against that other object, which in
> this case is not of the same class.
>
>
> Diffs
> -----
>
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/AbstractParquetMapInspector.java 1d72747
>
> Diff: https://reviews.apache.org/r/18925/diff/
>
>
> Testing
> -------
>
> Manual testing.
>
>
> Thanks,
>
> Szehon Ho
>
>
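
The cache behaviour described in the review request can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the Hive code: `SketchMapInspector`, `SketchDeepInspector`, `SketchStandardInspector`, and the `keyType`/`valueType` fields are hypothetical, and the cache is reduced to a plain `HashMap` keyed on a list of inspectors. It shows an `equals` that compares runtime classes before casting, so a Deep and a Standard inspector with identical fields share a hash code but are never treated as the same cache key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical simplification of an abstract map inspector; fields are assumed.
abstract class SketchMapInspector {
    final String keyType;
    final String valueType;

    SketchMapInspector(String keyType, String valueType) {
        this.keyType = keyType;
        this.valueType = valueType;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) {
            return true;
        }
        if (obj == null) {
            return false;
        }
        // Runtime-class check before the cast: Deep never equals Standard.
        if (getClass() != obj.getClass()) {
            return false;
        }
        SketchMapInspector other = (SketchMapInspector) obj;
        return keyType.equals(other.keyType) && valueType.equals(other.valueType);
    }

    @Override
    public int hashCode() {
        return 31 * keyType.hashCode() + valueType.hashCode();
    }
}

class SketchDeepInspector extends SketchMapInspector {
    SketchDeepInspector(String k, String v) {
        super(k, v);
    }
}

class SketchStandardInspector extends SketchMapInspector {
    SketchStandardInspector(String k, String v) {
        super(k, v);
    }
}

class CacheSketch {
    // Builds a cache key: an array list of column inspectors.
    static List<SketchMapInspector> keyOf(SketchMapInspector... inspectors) {
        return new ArrayList<>(Arrays.asList(inspectors));
    }

    public static void main(String[] args) {
        Map<List<SketchMapInspector>, String> cache = new HashMap<>();
        cache.put(keyOf(new SketchDeepInspector("string", "int")), "struct inspector (deep)");

        // Second run of the query: an equal Deep inspector finds the cached entry.
        System.out.println(cache.get(keyOf(new SketchDeepInspector("string", "int"))));
        // Same hash code, different class: equals returns false, so no entry is found.
        System.out.println(cache.get(keyOf(new SketchStandardInspector("string", "int"))));
    }
}
```

The first lookup prints the cached value and the second prints `null`, because the list's `equals` delegates to each element's `equals`, which rejects a different runtime class without throwing.
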