[ https://issues.apache.org/jira/browse/HIVE-17019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16083774#comment-16083774 ]
Harish Jaiprakash commented on HIVE-17019:
------------------------------------------

Thanks [~sseth].

- Change the top level package from llap-debug to tez-debug? (Works with both, I believe.) [~ashutoshc], [~thejas] - any recommendations on whether the code gets a top-level module, or goes under an existing module? This allows downloading various debug artifacts for a Tez job: logs and metrics for LLAP, HiveServer2 logs (soon), Tez AM logs, and ATS data for the query (Hive and Tez).
  Will change the directory.

- In the new pom.xml, dependency on hive-llap-server. 1) Is it required? 2) Will need to exclude some dependent artifacts; see the llap-server dependency handling in service/pom.xml.
  It is required: the LLAP status is fetched using LlapStatusServiceDriver, which is part of hive-llap-server.

- LogDownloadServlet - should this throw an error as soon as the filename pattern validation fails?
  The filename check is to prevent injection attacks via the file name/HTTP header, not to validate the id.

- LogDownloadServlet - change to dagId/queryId validation instead.
  Can do, but it will be sensitive to changes in the id format. Currently the id is passed down to ATS, and nothing will be retrieved for an invalid one.

- LogDownloadServlet - thread being created inside the request handler? This should be moved outside the request, so that only a controlled number of parallel artifact downloads can run.
  Will create a shared executor. Does it make sense to use Guava's direct executor, which schedules the task on the current thread?

- LogDownloadServlet - what happens in case of aggregator failure? Exception back to the user?
  Jetty will handle the exception, returning a 500 to the user. Not sure if the exception trace is part of it; will try and see.

- LogDownloadServlet - seems to be generating the file to disk and then streaming it over. Can this be streamed over directly instead? Otherwise there's the possibility of leaking files. (Artifact.downloadIntoStream or some such?)
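For the filename check discussed above, a minimal sketch of what such a whitelist validation could look like. This is illustrative only (the class and method names are not from the patch): the point is to reject anything usable for path traversal or HTTP header injection, not to validate the dag/query id format.

```java
import java.util.regex.Pattern;

// Hypothetical sketch of the servlet's filename check. A conservative
// character whitelist blocks '/', '..', CR/LF and similar, which is enough
// to prevent path traversal and header injection in the download name.
public class LogFileNameCheck {
  // Names like "dag_1499_0001_1.zip" pass; anything else is rejected.
  private static final Pattern SAFE_NAME =
      Pattern.compile("[A-Za-z0-9_\\-]+(\\.zip)?");

  public static boolean isSafeFileName(String name) {
    return name != null && SAFE_NAME.matcher(name).matches();
  }

  public static void main(String[] args) {
    System.out.println(isSafeFileName("dag_1499_0001_1.zip")); // prints "true"
    System.out.println(isSafeFileName("../../etc/passwd"));    // prints "false"
  }
}
```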
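On the shared-executor point above, a sketch of the difference between a bounded shared pool and a direct executor, using only the JDK (Guava's MoreExecutors.directExecutor() behaves like the Runnable::run stand-in below). Names are illustrative, not from the patch.

```java
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch: one bounded pool for the whole servlet, created once rather than
// per request, so at most 4 artifact downloads run in parallel overall.
public class DownloadExecutor {
  private static final ExecutorService POOL = Executors.newFixedThreadPool(4);

  public static Future<String> submitDownload(String artifact) {
    return POOL.submit(() -> "downloaded:" + artifact); // placeholder work
  }

  public static void shutdown() {
    POOL.shutdown();
  }

  // A "direct" executor runs the task on the calling thread. Simpler, but a
  // slow download then blocks the request thread, which the pool avoids.
  public static final Executor DIRECT = Runnable::run;

  public static void main(String[] args) throws Exception {
    System.out.println(submitDownload("tez-am-log").get()); // prints "downloaded:tez-am-log"
    shutdown();
  }
}
```

The trade-off hinted at in the review: a direct executor caps concurrency at the number of servlet request threads, while a dedicated pool caps it independently of how many requests arrive.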
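For the direct-streaming question, a sketch of the single-threaded ZipOutputStream approach mentioned in the reply below, with java.util.zip from the JDK. The servlet's response stream is simulated here with a ByteArrayOutputStream; class and method names are illustrative.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Sketch: stream the archive straight into the output stream instead of
// writing a temp file to disk first (which risks leaking files on failure).
public class StreamingArchive {
  public static byte[] buildArchive(Map<String, byte[]> artifacts)
      throws IOException {
    ByteArrayOutputStream sink = new ByteArrayOutputStream();
    // try-with-resources plays the role of the try/finally cleanup: the zip
    // stream is closed even if writing one of the entries fails.
    try (ZipOutputStream zip = new ZipOutputStream(sink)) {
      for (Map.Entry<String, byte[]> e : artifacts.entrySet()) {
        zip.putNextEntry(new ZipEntry(e.getKey()));
        zip.write(e.getValue());
        zip.closeEntry();
      }
    }
    return sink.toByteArray();
  }

  public static void main(String[] args) throws IOException {
    Map<String, byte[]> m = new LinkedHashMap<>();
    m.put("am.log", "hello".getBytes());
    System.out.println(buildArchive(m).length > 0); // prints "true"
  }
}
```

As noted in the discussion, this only works if entries are written by one thread at a time; a multi-threaded downloader would have to hand finished artifacts to a single writer.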
  Guessing this is complicated further by the multi-threaded artifact downloader. Alternately, a cleanup mechanism is needed.
  Streaming directly would not be possible because of the multithreading; if it's single threaded, then I can use a ZipOutputStream and add one entry at a time. Oops, sorry - the finally block got moved down, since the aggregator had to be closed before streaming the file. I'll handle the cleanup using a try/finally.

- Timeout on the tests.
  Setting timeouts on the tests.

- Apache header needs to be added to files where it is missing.
  Sorry, will add the license header to all files.

- Main - please rename to something more indicative of what the tool does.
  I was planning to remove this and integrate with the Hive CLI: --service <download_logs>. This does not work without a lot of classpath fixes, or I'll have to create a script to add the Hive jars.

- Main - likely a follow-up jira - parse using a standard library, instead of trying to parse the arguments to main directly.
  Will check a few libraries. Apache Commons CLI's OptionBuilder uses a static instance in its builder; that should be OK for a CLI app invoked once, but I will look at something better along the lines of Python's argparse.

- Server - enabling the artifact should be controlled via a config. It does not always need to be hosted in HS2. (Default disabled, at least till security can be sorted out.)
  I'll add a config.

- Is it possible to support a timeout on the downloads? (Can be a follow-up jira.)
  Sure, will do. Global, per download, or both?

- ArtifactAggregator - I believe this does 2 stages of dependent artifacts/downloads? Stage 1 - download whatever it can; information from this should be adequate for the stage 2 downloads?
  It could be more stages. For example, given a dag_id:
  stage 1: fetch Tez ATS info, which is used to extract the Hive id and the task container/node list.
  stage 2: fetch Hive ATS info and the Tez container log list.
  stage 3: fetch the LLAP container log list and the Tez task logs.
  stage 4: fetch the LLAP container logs.
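The staged fetching could be sketched as a fixed-point loop over the sources, where each pass runs every source whose required parameter is already known and the parameters it produces unlock the next pass. This is a simplified, sequential sketch with illustrative names (the real aggregator schedules each pass's sources in parallel).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

// Sketch of multi-stage artifact aggregation: repeat passes over the
// pending sources until a pass makes no progress or none remain.
public class ArtifactAggregator {
  public interface Source {
    String needs();     // parameter this source requires before it can run
    String produces();  // parameter its download adds for later stages
  }

  public static Source src(final String needs, final String produces) {
    return new Source() {
      public String needs() { return needs; }
      public String produces() { return produces; }
    };
  }

  public static List<String> aggregate(List<Source> sources, Set<String> params) {
    List<String> order = new ArrayList<>();
    List<Source> pending = new ArrayList<>(sources);
    boolean progress = true;
    while (progress && !pending.isEmpty()) {
      progress = false;
      Set<String> produced = new HashSet<>();
      for (Iterator<Source> it = pending.iterator(); it.hasNext(); ) {
        Source s = it.next();
        if (params.contains(s.needs())) {      // can download in this pass
          order.add(s.needs() + "->" + s.produces());
          produced.add(s.produces());
          it.remove();
          progress = true;
        }
      }
      params.addAll(produced); // new info only unlocks the *next* stage
    }
    return order; // anything left in 'pending' could never download
  }

  public static void main(String[] args) {
    List<Source> sources = Arrays.asList(
        src("dagId", "hiveQueryId"),
        src("hiveQueryId", "llapNodeList"),
        src("llapNodeList", "llapLogs"));
    Set<String> params = new HashSet<>(Collections.singleton("dagId"));
    // prints "[dagId->hiveQueryId, hiveQueryId->llapNodeList, llapNodeList->llapLogs]"
    System.out.println(aggregate(sources, params));
  }
}
```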
  The aggregator iterates through the list of sources and finds those which can download using the info in the params. It schedules those sources, waits for all of them to complete, and then repeats. It stops if no new sources could download, or all sources are exhausted.

- For the ones not implemented yet (DummyArtifact) - think it's better to just comment out the code, instead of invoking the DummyArtifacts downloader.
  Sorry, will do.

- Security - ACL enforcement required on secure clusters to make sure users can only download what they have access to. This is a must fix before this can be enabled by default.
  Working on this.

- Security - this can work around YARN restrictions on log downloads, since the files are being accessed by the hive user.
  Yes, this should work.

- Could you please add some details on cluster testing.
  I'll add another comment with the details of the testing.

> Add support to download debugging information as an archive.
> ------------------------------------------------------------
>
> Key: HIVE-17019
> URL: https://issues.apache.org/jira/browse/HIVE-17019
> Project: Hive
> Issue Type: Bug
> Reporter: Harish Jaiprakash
> Assignee: Harish Jaiprakash
> Attachments: HIVE-17019.01.patch
>
> Given a queryId or dagId, get all information related to it: e.g. Tez AM and
> task logs, Hive ATS data, Tez ATS data, Slider AM status, etc. Package it
> into an archive.

-- This message was sent by Atlassian JIRA (v6.4.14#64029)