[ https://issues.apache.org/jira/browse/HIVE-17019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16083774#comment-16083774 ]
Harish Jaiprakash commented on HIVE-17019:
------------------------------------------

Thanks [~sseth].

- Change the top level package from llap-debug to tez-debug? (Works with both, I believe.) [~ashutoshc], [~thejas] - any recommendations on whether the code gets a top-level module, or goes under an existing module? This allows downloading various debug artifacts for a Tez job: logs and metrics for LLAP, HiveServer2 logs (soon), Tez AM logs, and ATS data for the query (Hive and Tez).
  Will change the directory.

- In the new pom.xml, dependency on hive-llap-server. 1) Is it required? 2) Will need to exclude some dependent artifacts; see the llap-server dependency handling in service/pom.xml.
  It is required: the LLAP status is fetched using LlapStatusServiceDriver, which is part of hive-llap-server.

- LogDownloadServlet - should this throw an error as soon as the filename pattern validation fails?
  The filename check is to prevent injection attacks via the file name/HTTP header, not to validate the id.

- LogDownloadServlet - change to dagId/queryId validation instead.
  Can do, but it will be sensitive to changes in the id format. Currently the id is passed down to ATS, and nothing will be retrieved for an invalid one.

- LogDownloadServlet - thread being created inside the request handler? This should be moved outside the request, so that only a controlled number of parallel artifact downloads can run.
  Will create a shared executor. Does it make sense to use Guava's direct executor, which schedules the task on the current thread?

- LogDownloadServlet - what happens in case of aggregator failure? Exception back to the user?
  Jetty will handle the exception, returning a 500 to the user. Not sure if the exception trace is part of it; will try and see.

- LogDownloadServlet - seems to be generating the file to disk and then streaming it over. Can this be streamed over directly instead? Otherwise there's the possibility of leaking files. (Artifact.downloadIntoStream or some such?)
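For the filename check discussed above, a minimal sketch of what such a whitelist validation could look like. This is illustrative only (the class and method names are not from the patch): the point is to reject anything usable for path traversal or HTTP header injection, not to validate the dag/query id format.

```java
import java.util.regex.Pattern;

// Hypothetical sketch of the servlet's filename check. A conservative
// character whitelist blocks '/', '..', CR/LF and similar, which is enough
// to prevent path traversal and header injection in the download name.
public class LogFileNameCheck {
  // Names like "dag_1499_0001_1.zip" pass; anything else is rejected.
  private static final Pattern SAFE_NAME =
      Pattern.compile("[A-Za-z0-9_\\-]+(\\.zip)?");

  public static boolean isSafeFileName(String name) {
    return name != null && SAFE_NAME.matcher(name).matches();
  }

  public static void main(String[] args) {
    System.out.println(isSafeFileName("dag_1499_0001_1.zip")); // prints "true"
    System.out.println(isSafeFileName("../../etc/passwd"));    // prints "false"
  }
}
```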
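On the shared-executor point above, a sketch of the difference between a bounded shared pool and a direct executor, using only the JDK (Guava's MoreExecutors.directExecutor() behaves like the Runnable::run stand-in below). Names are illustrative, not from the patch.

```java
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch: one bounded pool for the whole servlet, created once rather than
// per request, so at most 4 artifact downloads run in parallel overall.
public class DownloadExecutor {
  private static final ExecutorService POOL = Executors.newFixedThreadPool(4);

  public static Future<String> submitDownload(String artifact) {
    return POOL.submit(() -> "downloaded:" + artifact); // placeholder work
  }

  public static void shutdown() {
    POOL.shutdown();
  }

  // A "direct" executor runs the task on the calling thread. Simpler, but a
  // slow download then blocks the request thread, which the pool avoids.
  public static final Executor DIRECT = Runnable::run;

  public static void main(String[] args) throws Exception {
    System.out.println(submitDownload("tez-am-log").get()); // prints "downloaded:tez-am-log"
    shutdown();
  }
}
```

The trade-off hinted at in the review: a direct executor caps concurrency at the number of servlet request threads, while a dedicated pool caps it independently of how many requests arrive.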
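For the direct-streaming question, a sketch of the single-threaded ZipOutputStream approach mentioned in the reply below, with java.util.zip from the JDK. The servlet's response stream is simulated here with a ByteArrayOutputStream; class and method names are illustrative.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Sketch: stream the archive straight into the output stream instead of
// writing a temp file to disk first (which risks leaking files on failure).
public class StreamingArchive {
  public static byte[] buildArchive(Map<String, byte[]> artifacts)
      throws IOException {
    ByteArrayOutputStream sink = new ByteArrayOutputStream();
    // try-with-resources plays the role of the try/finally cleanup: the zip
    // stream is closed even if writing one of the entries fails.
    try (ZipOutputStream zip = new ZipOutputStream(sink)) {
      for (Map.Entry<String, byte[]> e : artifacts.entrySet()) {
        zip.putNextEntry(new ZipEntry(e.getKey()));
        zip.write(e.getValue());
        zip.closeEntry();
      }
    }
    return sink.toByteArray();
  }

  public static void main(String[] args) throws IOException {
    Map<String, byte[]> m = new LinkedHashMap<>();
    m.put("am.log", "hello".getBytes());
    System.out.println(buildArchive(m).length > 0); // prints "true"
  }
}
```

As noted in the discussion, this only works if entries are written by one thread at a time; a multi-threaded downloader would have to hand finished artifacts to a single writer.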
  Guessing this is complicated further by the multi-threaded artifact downloader. Alternately, a cleanup mechanism is needed.
  Streaming directly would not be possible because of the multithreading; if it's single threaded, then I can use a ZipOutputStream and add one entry at a time. Oops, sorry - the finally block got moved down, since the aggregator had to be closed before streaming the file. I'll handle the cleanup using a try/finally.

- Timeout on the tests.
  Setting timeouts on the tests.

- Apache header needs to be added to files where it is missing.
  Sorry, will add the license header to all files.

- Main - please rename to something more indicative of what the tool does.
  I was planning to remove this and integrate with the Hive CLI: --service <download_logs>. This does not work without a lot of classpath fixes, or I'll have to create a script to add the Hive jars.

- Main - likely a follow-up jira - parse using a standard library, instead of trying to parse the arguments to main directly.
  Will check a few libraries. Apache Commons CLI's OptionBuilder uses a static instance in its builder; that should be OK for a CLI app invoked once, but I will look at something better along the lines of Python's argparse.

- Server - enabling the artifact should be controlled via a config. It does not always need to be hosted in HS2. (Default disabled, at least till security can be sorted out.)
  I'll add a config.

- Is it possible to support a timeout on the downloads? (Can be a follow-up jira.)
  Sure, will do. Global, per download, or both?

- ArtifactAggregator - I believe this does 2 stages of dependent artifacts/downloads? Stage 1 - download whatever it can; information from this should be adequate for the stage 2 downloads?
  It could be more stages. For example, given a dag_id:
  stage 1: fetch Tez ATS info, which is used to extract the Hive id and the task container/node list.
  stage 2: fetch Hive ATS info and the Tez container log list.
  stage 3: fetch the LLAP container log list and the Tez task logs.
  stage 4: fetch the LLAP container logs.
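The staged fetching could be sketched as a fixed-point loop over the sources, where each pass runs every source whose required parameter is already known and the parameters it produces unlock the next pass. This is a simplified, sequential sketch with illustrative names (the real aggregator schedules each pass's sources in parallel).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

// Sketch of multi-stage artifact aggregation: repeat passes over the
// pending sources until a pass makes no progress or none remain.
public class ArtifactAggregator {
  public interface Source {
    String needs();     // parameter this source requires before it can run
    String produces();  // parameter its download adds for later stages
  }

  public static Source src(final String needs, final String produces) {
    return new Source() {
      public String needs() { return needs; }
      public String produces() { return produces; }
    };
  }

  public static List<String> aggregate(List<Source> sources, Set<String> params) {
    List<String> order = new ArrayList<>();
    List<Source> pending = new ArrayList<>(sources);
    boolean progress = true;
    while (progress && !pending.isEmpty()) {
      progress = false;
      Set<String> produced = new HashSet<>();
      for (Iterator<Source> it = pending.iterator(); it.hasNext(); ) {
        Source s = it.next();
        if (params.contains(s.needs())) {      // can download in this pass
          order.add(s.needs() + "->" + s.produces());
          produced.add(s.produces());
          it.remove();
          progress = true;
        }
      }
      params.addAll(produced); // new info only unlocks the *next* stage
    }
    return order; // anything left in 'pending' could never download
  }

  public static void main(String[] args) {
    List<Source> sources = Arrays.asList(
        src("dagId", "hiveQueryId"),
        src("hiveQueryId", "llapNodeList"),
        src("llapNodeList", "llapLogs"));
    Set<String> params = new HashSet<>(Collections.singleton("dagId"));
    // prints "[dagId->hiveQueryId, hiveQueryId->llapNodeList, llapNodeList->llapLogs]"
    System.out.println(aggregate(sources, params));
  }
}
```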
  The aggregator iterates through the list of sources and finds those which can download using the info in the params. It schedules those sources, waits for all of them to complete, and then repeats. It stops if no new sources could download, or all sources are exhausted.

- For the ones not implemented yet (DummyArtifact) - think it's better to just comment out the code, instead of invoking the DummyArtifacts downloader.
  Sorry, will do.

- Security - ACL enforcement required on secure clusters to make sure users can only download what they have access to. This is a must fix before this can be enabled by default.
  Working on this.

- Security - this can work around YARN restrictions on log downloads, since the files are being accessed by the hive user.
  Yes, this should work.

- Could you please add some details on cluster testing.
  I'll add another comment with the details of the testing.

> Add support to download debugging information as an archive.
> ------------------------------------------------------------
>
> Key: HIVE-17019
> URL: https://issues.apache.org/jira/browse/HIVE-17019
> Project: Hive
> Issue Type: Bug
> Reporter: Harish Jaiprakash
> Assignee: Harish Jaiprakash
> Attachments: HIVE-17019.01.patch
>
> Given a queryId or dagId, get all information related to it: e.g. Tez AM and
> task logs, Hive ATS data, Tez ATS data, Slider AM status, etc. Package it
> into an archive.

-- This message was sent by Atlassian JIRA (v6.4.14#64029)