> On Jan. 29, 2018, 6:26 p.m., Jesús Camacho Rodríguez wrote:
> > common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
> > Lines 3699 (patched)
> > <https://reviews.apache.org/r/65304/diff/3/?file=1946552#file1946552line3699>
> >
> >     Is this the permission for each results directory? Does this mean that 
> > results cannot be shared by different users?
> >     
> >     Why does this need to be configurable? (I would assume this is not 
> > something that you let the user decide).

If this is HiveServer2 (with doAs=false), the hive user would be the directory 
owner regardless of which user is submitting the query, so I would not expect 
issues with sharing the cache in the HiveServer2 case. One danger with making 
this directory readable by others is the possibility that it allows a user to 
access cached results which the user may not have permission to see, if 
user-level filtering/masking rules are enabled. Different instances of Hive do 
not share caches, so sharing results between different Hive CLI instances is 
not something I am worrying about here.

I was basing the directory creation rules on the scratchdir logic, which does 
allow this to be configurable. But really the setting at the time of cache 
initialization is what matters. If you think this should just be hardcoded to 
700 let me know and I can make the change.


> On Jan. 29, 2018, 6:26 p.m., Jesús Camacho Rodríguez wrote:
> > common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
> > Lines 3710 (patched)
> > <https://reviews.apache.org/r/65304/diff/3/?file=1946552#file1946552line3710>
> >
> >     What is _entry_ referring to? Does it mean the SQL query string size? 
> > We should extend the description to make it clear.

Entry here refers to the size of the cached results that are saved on the 
filesystem. Will make this more clear.


> On Jan. 29, 2018, 6:26 p.m., Jesús Camacho Rodríguez wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/Driver.java
> > Lines 1829 (patched)
> > <https://reviews.apache.org/r/65304/diff/3/?file=1946566#file1946566line1829>
> >
> >     Currently what happens when we get this exception?
> >     
> >     It seems to me that mechanism in HIVE-17626 works when execution itself 
> > fails, but in this case, whether the cache entry is still valid or not can 
> > be inferred statically at planning time, hence I am not sure whether it 
> > should be handled the same way? It seems we will have certain overhead that 
> > might not be necessary.
> >     
> >     Can't we check the validity of the entry when we are replacing the plan 
> > by the scan on the cached results, e.g., in SemanticAnalyzer?

So the table locks for the query are not acquired until query execution, which 
occurs after query compilation. If we implement automatic invalidation of the 
cache based on updates to the Hive tables, the following could happen:

1. Query A goes through query compilation and finds an entry in the cache that 
can be used to satisfy the query. At this point there have been no updates to 
the table which would invalidate the cache.
2. Query B acquires a write lock and begins making updates to one or more of 
the tables involved in query A
3. Query A attempts to acquire read locks and blocks while query B is running.
4. Query B finishes updating the tables and releases its lock.
5. Query A now acquires the read lock, but at this point the cached result is 
stale.

The options here would be to either attempt to recompile the query (the current 
approach), or to just go ahead and serve the stale results.


> On Jan. 29, 2018, 6:26 p.m., Jesús Camacho Rodríguez wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/cache/results/QueryResultsCache.java
> > Lines 174 (patched)
> > <https://reviews.apache.org/r/65304/diff/3/?file=1946568#file1946568line174>
> >
> >     What happens if execution fails? Will the results still be cleaned 
> > properly?

This is being called in Driver.closeInProcess()/Driver.close(), so my 
impression was this should be the case.


> On Jan. 29, 2018, 6:26 p.m., Jesús Camacho Rodríguez wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/cache/results/QueryResultsCache.java
> > Lines 262 (patched)
> > <https://reviews.apache.org/r/65304/diff/3/?file=1946568#file1946568line262>
> >
> >     Code might have been simpler using Guava cache. This is just a note, 
> > maybe something that can be considered for a follow-up.

Thanks for the pointer, will check it out.


> On Jan. 29, 2018, 6:26 p.m., Jesús Camacho Rodríguez wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/HiveRelOpMaterializationValidator.java
> > Lines 72 (patched)
> > <https://reviews.apache.org/r/65304/diff/3/?file=1946571#file1946571line72>
> >
> >     Leaf can be other node too, e.g., _DruidQuery_. Now reusing results is 
> > time-based, but for those type of tables, we do not have guarantees (as 
> > with other external tables). Thus, we should probably return true iff scan 
> > is an instance of HiveTableScan, false otherwise.

Ok, will fix.


> On Jan. 29, 2018, 6:26 p.m., Jesús Camacho Rodríguez wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/HiveRelOpMaterializationValidator.java
> > Lines 77 (patched)
> > <https://reviews.apache.org/r/65304/diff/3/?file=1946571#file1946571line77>
> >
> >     _getProjects()_

will fix.


> On Jan. 29, 2018, 6:26 p.m., Jesús Camacho Rodríguez wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/HiveRelOpMaterializationValidator.java
> > Lines 87 (patched)
> > <https://reviews.apache.org/r/65304/diff/3/?file=1946571#file1946571line87>
> >
> >     It should be _getCondition()_.

will fix


> On Jan. 29, 2018, 6:26 p.m., Jesús Camacho Rodríguez wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/HiveRelOpMaterializationValidator.java
> > Lines 96 (patched)
> > <https://reviews.apache.org/r/65304/diff/3/?file=1946571#file1946571line96>
> >
> >     I think all the information should be in _getCondition()_.

will fix. i really messed up this class :)


> On Jan. 29, 2018, 6:26 p.m., Jesús Camacho Rodríguez wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/HiveRelOpMaterializationValidator.java
> > Lines 124 (patched)
> > <https://reviews.apache.org/r/65304/diff/3/?file=1946571#file1946571line124>
> >
> >     - There are some operators missing, e.g., semijoin.
> >     - I would create specific visit methods for all operators, then a 
> > fall-back visit method on RelNode that returns false if it does not 
> > recognize the operator that we are visiting (as a way to prevent any 
> > incorrect assesment, e.g., if new operators are added in the future).

good suggestion, will do.


- Jason


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65304/#review196440
-----------------------------------------------------------


On Jan. 25, 2018, 7:34 a.m., Jason Dere wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65304/
> -----------------------------------------------------------
> 
> (Updated Jan. 25, 2018, 7:34 a.m.)
> 
> 
> Review request for hive, Ashutosh Chauhan, Gopal V, and Jesús Camacho 
> Rodríguez.
> 
> 
> Bugs: HIVE-18513
>     https://issues.apache.org/jira/browse/HIVE-18513
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> - For queries that result in MR/Tez/Spark jobs on the cluster, save the 
> temporary query results to a cache directory where they can be re-used.
> - Add QueryResultsCache to manage cached results. Currently cache 
> invalidation is time-based, update-based cache invalidation needs to be added 
> later.
> - Driver/SemanticAnalyzer/Calcite planner changes to lookup queries in the 
> cache and use in place of the query plan.
> 
> 
> Diffs
> -----
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 0c2cf05 
>   common/src/java/org/apache/hadoop/hive/ql/log/PerfLogger.java 2767bca 
>   data/conf/hive-site.xml 01f83d1 
>   data/conf/llap/hive-site.xml cdda875 
>   data/conf/perf-reg/spark/hive-site.xml 497a61f 
>   data/conf/perf-reg/tez/hive-site.xml 012369f 
>   data/conf/rlist/hive-site.xml 9de00e5 
>   data/conf/spark/local/hive-site.xml fd0e6a0 
>   data/conf/spark/standalone/hive-site.xml 1e5bd65 
>   data/conf/spark/yarn-client/hive-site.xml a9a788b 
>   data/conf/tez/hive-site.xml 4519678 
>   itests/hive-blobstore/src/test/resources/hive-site.xml 038db0d 
>   itests/src/test/resources/testconfiguration.properties 1017249 
>   itests/util/src/main/java/org/apache/hadoop/hive/ql/QTestUtil.java 3f377f9 
>   ql/src/java/org/apache/hadoop/hive/ql/Driver.java 74595b0 
>   ql/src/java/org/apache/hadoop/hive/ql/cache/results/CacheUsage.java 
> PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/cache/results/QueryResultsCache.java 
> PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 2e1fd37 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/HiveCalciteUtil.java 
> f0dd167 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/HiveRelOpMaterializationValidator.java
>  PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java 
> 372cfad 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java 85a1f34 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/QBParseInfo.java ae2ec3d 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 28e1041 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/FetchWork.java 7243dc7 
>   ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java 8af19d8 
>   ql/src/test/queries/clientpositive/results_cache_1.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/results_cache_2.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/results_cache_capacity.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/results_cache_lifetime.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/results_cache_temptable.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/results_cache_with_masking.q 
> PRE-CREATION 
>   ql/src/test/results/clientpositive/llap/results_cache_1.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/results_cache_1.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/results_cache_2.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/results_cache_capacity.q.out 
> PRE-CREATION 
>   ql/src/test/results/clientpositive/results_cache_lifetime.q.out 
> PRE-CREATION 
>   ql/src/test/results/clientpositive/results_cache_temptable.q.out 
> PRE-CREATION 
>   ql/src/test/results/clientpositive/results_cache_with_masking.q.out 
> PRE-CREATION 
>   service/src/java/org/apache/hive/service/server/HiveServer2.java 2a528cd 
> 
> 
> Diff: https://reviews.apache.org/r/65304/diff/3/
> 
> 
> Testing
> -------
> 
> qfile tests added.
> 
> 
> Thanks,
> 
> Jason Dere
> 
>

Reply via email to