[ https://issues.apache.org/jira/browse/HIVE-5072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13966752#comment-13966752 ]
Shuaishuai Nie commented on HIVE-5072:
--------------------------------------

Thanks [~ekoifman] for the comments. Please see below for the answers:

0. "If I understand this correctly, optionsFile should contain the details of the Sqoop command to execute. But in the code it seems that the expectation is that this file is present in DFS. Thus to submit a Sqoop job via WebHCat (and use optionsFile) the user has to first upload this file to the cluster. This is an extra call for job submission and possibly extra config on the cluster side to enable the client of WebHCat to upload files. Why not just let the client upload the file to WebHCat as part of the REST POST request? This seems a lot more user friendly/usable."

The scenario for the options file is that a user may want to reuse part of the Sqoop command arguments across different commands, such as the connection string, username, or password. In that case the user expects the file to already exist on DFS so it can be reused across jobs. Since Sqoop only supports options files on the local file system, and Templeton may launch the Sqoop job on any worker node, Templeton needs to add the options file to the distributed cache so that it can be used in the Sqoop command. You asked "Why not just let the client upload the file to WebHCat as part of the REST POST request" — where would the file be located originally? If it comes from the local file system, that would require an extra copy and an extra command for every Templeton Sqoop job.

1. "-d 'user.name=foo' is deprecated (i.e. user.name as a form parameter). user.name has to be part of the query string. The test cases and examples in the .pdf should be updated."

2. "Formatting in SqoopDelegator doesn't follow Hive conventions."

3. "Server.sqoop() - there is Server.checkEnableLogPrerequisite() to check the 'enableLog' parameter setting."

4. "I see that new parameters for Sqoop tests are added in 3 places in build.xml. Only the 'test' target actually runs jobsubmission.conf."

I will change the patch and documentation accordingly for 1-4.
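To make the point-1 fix concrete, here is a minimal sketch of how a client could build the submission request. It assumes the endpoint path (/templeton/v1/sqoop) and parameter names ("command", "optionsFile") proposed in the attached design document; these may change before the patch is committed.

```python
# Sketch of building a WebHCat (Templeton) Sqoop submission request.
# Assumptions: endpoint path and parameter names follow the patch's
# proposed API. Per review point 1, user.name must go in the query
# string, not the POST body; per point 7, exactly one of "command"
# or "optionsFile" may be supplied.
from urllib.parse import urlencode

def build_sqoop_request(host, user, command=None, options_file=None):
    # Reject "both" and "neither" — the server-side check point 7 asks for.
    if (command is None) == (options_file is None):
        raise ValueError("specify exactly one of command or options_file")
    # user.name as a form parameter is deprecated; keep it in the URL.
    url = "http://%s/templeton/v1/sqoop?%s" % (
        host, urlencode({"user.name": user}))
    body = ({"command": command} if command is not None
            else {"optionsFile": options_file})
    return url, urlencode(body)

url, body = build_sqoop_request(
    "webhcat.example.com:50111", "hadoopuser",
    command="import --connect jdbc:mysql://dbhost/corp --table emp")
```

The returned `url` and `body` would then be sent as an HTTP POST (for instance with curl's `-d`), keeping user.name out of the form data.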
5. "For the tests you added, where does the JDBC driver come from for any particular DB?"

The JDBC driver should come from the Sqoop installation, based on which database is used. It should be located in the %SQOOP_HOME%\lib folder.

6. "Can the form parameter for optionsFile (Server.sqoop()) be called 'optionsFile' instead of just 'file'?"

The "file" argument does not work exactly the same as "--options-file" in Sqoop, since "--options-file" can supply just part of the command, whereas "file" here must contain the entire command. But I agree that renaming it to "optionsFile" may be more explanatory for users.

7. "It seems from http://sqoop.apache.org/docs/1.4.4/SqoopUserGuide.html#_using_options_files_to_pass_arguments that in a Sqoop command, either an options file (with command and args) or the command name and all args inline can be specified. The tests you added seem to expect only command args to be in the options file. In particular, Server.sqoop() tests 'command == null && optionsFile == null' but not whether both options are specified. Seems like this is not expected usage."

As I mentioned in 6, the optionsFile in Server.sqoop() does not work exactly the same as "--options-file" in Sqoop. The use of "--options-file" from Sqoop is covered in the second e2e test for Sqoop; in that test, "--options-file" substitutes for part of the Sqoop command. The Templeton Sqoop endpoint should not allow both "command" and "optionsFile" to be defined, since the optionsFile here is supposed to contain the entire Sqoop command. I will add a condition check for this scenario.

8. "Is there anything that can be done to make the test self-contained, so that the DB table is automatically created, for example in the DB that contains the metastore data?"

There is no efficient way to make the test self-contained, given that any database may be used for the test, and even the metastore's database type can differ.
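To illustrate the distinction discussed in points 6 and 7, a Sqoop options file lists one token per line (tool name, flags, and values), so a file uploaded to DFS can hold the entire command. The contents below are purely illustrative; the host, database, and table names are hypothetical:

```
# Illustrative options file holding an entire Sqoop command
# (hypothetical connection details).
import
--connect
jdbc:mysql://dbhost/corp
--username
dbuser
--table
employees
```

By contrast, Sqoop's native "--options-file" flag can pull in only a fragment of such a file (e.g. just the shared --connect/--username lines), with the rest of the command given inline; that usage is what the second e2e test exercises.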
> [WebHCat]Enable directly invoke Sqoop job through Templeton
> -----------------------------------------------------------
>
>                 Key: HIVE-5072
>                 URL: https://issues.apache.org/jira/browse/HIVE-5072
>             Project: Hive
>          Issue Type: Improvement
>          Components: WebHCat
>    Affects Versions: 0.12.0
>            Reporter: Shuaishuai Nie
>            Assignee: Shuaishuai Nie
>         Attachments: HIVE-5072.1.patch, HIVE-5072.2.patch, HIVE-5072.3.patch, Templeton-Sqoop-Action.pdf
>
> It is currently hard to invoke a Sqoop job through Templeton. The only way is to use the classpath jar generated by a Sqoop job and submit it with the jar delegator in Templeton. We should implement a Sqoop delegator to enable directly invoking Sqoop jobs through Templeton.

-- This message was sent by Atlassian JIRA (v6.2#6252)