-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64193/#review192601
-----------------------------------------------------------


Since we touch the `LoadSemanticAnalyzer`, could we add a q-test (it could go in one of the existing `lineage*.q` files) for `LOAD` statements? Same for import / export statements (as far as I can tell there are no existing ones; correct me if I am wrong). If you have time, it would be great to run some of the lineage tests for HoS too, but since that's a bit orthogonal to this JIRA, it can be done in a follow-up JIRA.


ql/src/java/org/apache/hadoop/hive/ql/Driver.java
Lines 365 (patched)
<https://reviews.apache.org/r/64193/#comment270856>

    Sounds good. Just curious, is there any way to know for sure where code run by a `Driver` creates another `Driver`? How did you determine when this is necessary?


ql/src/java/org/apache/hadoop/hive/ql/exec/MoveTask.java
Line 401 (original), 401 (patched)
<https://reviews.apache.org/r/64193/#comment270854>

    OK, but do we need to do `if (queryState.getLineageState() != null)` to ensure an NPE isn't thrown? That seems to be what the old code is doing.


ql/src/java/org/apache/hadoop/hive/ql/parse/TaskCompiler.java
Lines 111 (patched)
<https://reviews.apache.org/r/64193/#comment270857>

    Doesn't a `TaskCompiler` already have a `QueryState` object? Why do we need to explicitly pass in a `LineageState`?


- Sahil Takiar


On Nov. 30, 2017, 1:22 a.m., Andrew Sherman wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64193/
> -----------------------------------------------------------
>
> (Updated Nov. 30, 2017, 1:22 a.m.)
>
>
> Review request for hive.
>
>
> Repository: hive-git
>
>
> Description
> -------
>
> A Hive Session can contain multiple concurrent sql Operations.
> Lineage is currently tracked in SessionState and is cleared when a query
> completes. This results in Lineage for other running queries being lost.
>
> To fix this, move LineageState from SessionState to QueryState.
> In MoveTask/MoveWork use the LineageState from the MoveTask's QueryState
> rather than trying to use it from MoveWork.
> Add a test which runs multiple jdbc queries in a thread pool
> against the same connection and show that Vertices are not lost from Lineage.
> As part of this test, add ReadableHook, an ExecuteWithHookContext that stores
> HookContexts in memory and makes them available for reading.
> Make LineageLogger methods static so they can be used elsewhere.
>
> Sometimes a running query (originating in a Driver) will instantiate
> another Driver to run or compile another query. Because these Drivers
> shared a Session, the child Driver would accumulate Lineage information
> along with that of the parent Driver. For consistency a LineageState is
> passed to these child Drivers and stored in the new Driver's QueryState.
>
>
> Diffs
> -----
>
>   itests/hive-unit/src/test/java/org/apache/hive/jdbc/TestJdbcWithMiniHS2.java f5ed735c1ec14dfee338e56020fa2629b168389d
>   ql/src/java/org/apache/hadoop/hive/ql/Driver.java af9f193dc94e2e05caa88d965a34f4483c9d7069
>   ql/src/java/org/apache/hadoop/hive/ql/QueryState.java 7d5aa8b179e536e25c41a8946e667f8dd5669e0f
>   ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java e7af5e004fb560b574b82f6d1b60517511802f37
>   ql/src/java/org/apache/hadoop/hive/ql/exec/MoveTask.java e2f8c1f8012ad25114e279747e821b291c7f4ca6
>   ql/src/java/org/apache/hadoop/hive/ql/exec/Task.java 1f0487f4f72ab18bcf876f45ad5758d83a7f001b
>   ql/src/java/org/apache/hadoop/hive/ql/exec/repl/bootstrap/load/table/LoadPartitions.java 262225fc202d4627652acfd77350e44b0284b3da
>   ql/src/java/org/apache/hadoop/hive/ql/exec/repl/bootstrap/load/table/LoadTable.java bb1f4e50509e57a9d0b9e6793c1fc08baa4d2981
>   ql/src/java/org/apache/hadoop/hive/ql/hooks/HookContext.java 7b617309f6b0d8a7ce0dea80ab1f790c2651b147
>   ql/src/java/org/apache/hadoop/hive/ql/hooks/LineageLogger.java 2f764f8a29a9d41a7db013a949ffe3a8a9417d32
>   ql/src/java/org/apache/hadoop/hive/ql/hooks/ReadableHook.java PRE-CREATION
>   ql/src/java/org/apache/hadoop/hive/ql/index/AggregateIndexHandler.java 68709b4d3baf15d78e60e948ccdef3df84f28cec
>   ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndexHandler.java 1e577da82343a1b7361467fb662661f9c6642ec0
>   ql/src/java/org/apache/hadoop/hive/ql/index/TableBasedIndexHandler.java 29886ae7f97f8dae7116f4fc9a2417ab8f9dac0a
>   ql/src/java/org/apache/hadoop/hive/ql/index/bitmap/BitmapIndexHandler.java 7b067a0d45e33bc3347c43b050af933c296a9227
>   ql/src/java/org/apache/hadoop/hive/ql/index/compact/CompactIndexHandler.java 504b0623142a6fa6cdb45a26b49f146e12ec2d7a
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRFileSink1.java d7a83f775abca39b219f71aff88173a14ffaee9f
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRProcContext.java 4387c4297fee48d4c03e95d5a2fcb822ab480eeb
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java 67739a1db9fc52a67f4f5ea7dba80fe0e95750c8
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/IndexUtils.java 338c1856672f09bb7da35d2336ebb5b6f3fdc5a6
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/lineage/Generator.java e6c07713b24df719315d804f006151106eea9aed
>   ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java 1fd634c928a5384b09d97322c3ea785f518d73fe
>   ql/src/java/org/apache/hadoop/hive/ql/parse/ExplainSemanticAnalyzer.java 065c7e50986872cd35386feee712f3452597d643
>   ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezProcContext.java 0c160acf46eb1eb07c5f04091099c1024e166638
>   ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezUtils.java b6f1139fe1a78283277bf4d0c5224ab1d718c634
>   ql/src/java/org/apache/hadoop/hive/ql/parse/ImportSemanticAnalyzer.java cd75130d7c5f0b402f1b4331c57edc611eb4b2ed
>   ql/src/java/org/apache/hadoop/hive/ql/parse/IndexUpdater.java f31775ed942160da73344c4dca707da7b8c658a6
>   ql/src/java/org/apache/hadoop/hive/ql/parse/LoadSemanticAnalyzer.java 238fbd60572ee5f7f8f6c4d5b2abce8f66c7e495
>   ql/src/java/org/apache/hadoop/hive/ql/parse/MapReduceCompiler.java d7a56e5846d5754dec5070d8c44443543a3695e4
>   ql/src/java/org/apache/hadoop/hive/ql/parse/ReplicationSemanticAnalyzer.java 498b6741c3f40b92ce3fb218e91e7809a17383f0
>   ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java b66817f65f65b6aaf8dbc339a969b8e9e0565e9e
>   ql/src/java/org/apache/hadoop/hive/ql/parse/TaskCompiler.java 7b2937032ab8dd57f8923e0a9e7aab4a92de55ee
>   ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java be33f380030ea8b416a4549c3947d767bba66356
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java 4d2bcfa285dc08811106f3c234346efff22afd99
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java 604c8aee151a45cf942852a3644b5e79f779f353
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 965044d9253585eeaeef50d7fe4fc4d818042df8
>   ql/src/java/org/apache/hadoop/hive/ql/plan/MoveWork.java 28a33740b30b7be0057ce91de55a0407dd2f2cbf
>   ql/src/java/org/apache/hadoop/hive/ql/session/LineageState.java 056d6141d6239816699ed5f730cbd14e48d8d9bb
>   ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java bb6ddc6fa4667ac0e30994d0f9ee8b969542383c
>   ql/src/test/org/apache/hadoop/hive/ql/optimizer/TestGenMapRedUtilsCreateConditionalTask.java 340689255c738ea497bcd269463b8b8bc38cf34c
>   ql/src/test/org/apache/hadoop/hive/ql/parse/TestGenTezWork.java 2c28c398ca49ba661df460c9f3e6d578c785d3ce
>
>
> Diff: https://reviews.apache.org/r/64193/diff/1/
>
>
> Testing
> -------
>
>
> Thanks,
>
> Andrew Sherman
>
>