> On March 8, 2014, 12:33 a.m., Xuefu Zhang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/AbstractParquetMapInspector.java,
> >  line 154
> > <https://reviews.apache.org/r/18925/diff/2/?file=513918#file513918line154>
> >
> >     I guess I wasn't clear. It's not inapproapriate, but goes beyond its 
> > responsibility. Equality implementation is within a context which is the 
> > class. The instance to be checked doesn't necessarily has the runtime class 
> > info. In fact, the context shouldn't be aware the runtime class of these 
> > instances, as child classes can be added any time. Doing getClass == 
> > other.getClass() goes beyond the current context.
> >     
> >     What's more appropriate to check type compatibility by calling if 
> > (other instanceof this.class). This is different from checking 
> > this.getClass() == other.getClass().
> >     
> >     Take Java ArrayList.equals() method as an example. 
> > (http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/util/AbstractList.java#AbstractList.equals%28java.lang.Object%29).
> >  This method doesn't do runtime class check. The implementation is saying, 
> > other.getClass() doesn't have to be ArrayList, but has to be an instance of 
> > ArrayList. It could be an instance of MyArrayList as long as MyArrayList is 
> > inherited from ArrayList.
> >     
> >     If we think it's more protical to do such a check, we'd expect that 
> > ArrayList.equals() would also check this.getClass() == other.getClass().
> >     
> >     Btw, I don't understand how it breaks transitivity by removing this 
> > check.
> >     
> >     I understand this check was there before your change. I missed it in my 
> > previous review.
> >
> 
> Szehon Ho wrote:
>     Hm I actually did not realize that Java's code has that for collections, 
> thanks for pointing that out.  I suppose in list case, the semantic is the 
> user doesn't care about list implementation, but about the contents. 
>     
>     What I meant about breaking the transitive property if you allow each 
> class to decide:  Say we remove the check of RT class equality.  There is a 
> subclass called 'A' which choose to override equal to return true only if 
> 'other' is A.  Another subclass 'B' doesn't override .equals, and by 
> inheritance can return true if 'other' is any subclass of parent (A or B).  
> A.equals(B) is false, B.equals(A) is true, breaking transitive.  Now that I 
> think about it, this argument doesn't justify having the parent one way or 
> another, all I meant is that a class cannot implement .equals just in its own 
> context as you mentioned, all subclass must choose the same way to be 
> consistent, and I felt that having this check at the parent would ensure that 
> all the children followed it.
>     
>     But coming back down to this particular issue, I still don't think its 
> safe to remove that check.  There are two subclass of 
> AbstractParquetMapInspector, the Deep and Standard one depending on the type 
> of map.  If we don't do this check, then Deep will be considered equal to 
> Standard, and perhaps the wrong one may be returned from cache and used in 
> the inspection, they are not interchangeable.  This is unlike java list,map, 
> here the actual class matters more than the content.  At least that is my 
> understanding looking at the code.

okay. Frankly, I don't know what's the difference between the two child class: 
the whole parquet code is very confusing. Since the code was there before this, 
it's fine to keep it as it is.


- Xuefu


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18925/#review36586
-----------------------------------------------------------


On March 8, 2014, 12:01 a.m., Szehon Ho wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/18925/
> -----------------------------------------------------------
> 
> (Updated March 8, 2014, 12:01 a.m.)
> 
> 
> Review request for hive, Brock Noland, justin coffey, and Xuefu Zhang.
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> The issue is, as part of select * query, a DeepParquetHiveMapInspector is 
> used for one column of an overall parquet-table struct object inspector.  
> 
> The problem lies in the ObjectInspectorFactory's cache for struct object 
> inspector.  For performance, there is a cache keyed on an array list, of all 
> object inspectors of columns.  The second time the query is run, it attempts 
> to lookup cached struct inspector.  But when the hashmap looks up the part of 
> the key consisting of the DeepParquetHiveMapInspector, java calls .equals 
> against the existing DeepParquetHivemapInspector.  This fails, as the .equals 
> method casted the "other" to a "StandardParquetHiveInspector".
> 
> Regenerating the .equals and .hashcode from eclipse.  
> 
> Also adding one more check in .equals before casting, to handle the case if 
> another class of object inspector gets hashed to the same hashcode in the 
> cache.  Then java would call .equals against the other, which in this case is 
> not of the same class.
> 
> 
> Diffs
> -----
> 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/AbstractParquetMapInspector.java
>  1d72747 
> 
> Diff: https://reviews.apache.org/r/18925/diff/
> 
> 
> Testing
> -------
> 
> Manual testing.
> 
> 
> Thanks,
> 
> Szehon Ho
> 
>

Reply via email to