[ https://issues.apache.org/jira/browse/HIVE-20953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashutosh Bapat updated HIVE-20953:
----------------------------------
    Description: 
The testcase is intended to test REPL LOAD with retry. The test creates a 
partitioned table and a function in the source database and loads those to the 
replica. The first attempt to load a dump is intended to fail while loading one 
of the partitions. Based on the order in which the objects get loaded, if the 
function is queued after the table, it will not be available in replica after 
the load failure. But if it's queued before the table, it will be available in 
replica even after the load failure. The test assumes the latter case, which may 
not always be true.

Hence, fix the testcase to load the objects in a fixed order. By setting 
hive.in.repl.test.files.sorted to true, the objects are ordered by their 
directory names. This ordering is available with minimal changes for testing, 
hence we use it. With this ordering a function gets loaded before a table. So 
the test was changed to not expect the function to be available after the 
failed load, but to be available after the retry.
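
As a rough illustration of why sorting removes the order dependence, the 
sketch below sorts a listing of directory names before queuing them for load. 
It is not Hive's implementation: the class name, the method name, and the 
directory names are hypothetical, and only the idea of ordering by directory 
name comes from the description above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Illustrative sketch only; these names are hypothetical, not Hive classes. */
final class SortedLoadOrderSketch {

    /**
     * Returns the order in which the listed directories would be queued.
     * With sorted == false the listing order is kept as-is, and a file system
     * listing gives no guarantee about that order. With sorted == true the
     * result depends only on the names, not on the listing order.
     */
    static List<String> loadOrder(List<String> listedDirNames, boolean sorted) {
        List<String> order = new ArrayList<>(listedDirNames);
        if (sorted) {
            // Natural String ordering (lexicographic by UTF-16 code unit).
            Collections.sort(order);
        }
        return order;
    }
}
```

Calling loadOrder with the same set of names in two different listing orders 
and sorted set to true yields the same queue both times, which is the property 
the test relies on.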

While writing that testcase, I found that even if a function fails to load, 
it is still visible through show functions and can be called just as 
if the failure had not happened. Digging further, I found that when 
creating a function we add it to the registry and also to the metastore. If the 
latter fails, we do not clean it up from the registry, and thus it remains 
visible after the failure. This is fixed as well.
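
The registry cleanup can be sketched as follows. This is a minimal 
illustration under stated assumptions, not the actual patch: 
FunctionCreateSketch, the Metastore interface, and persistFunction are 
hypothetical names standing in for the registry and the metastore mentioned 
above.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Illustrative sketch only; these names are hypothetical, not Hive classes. */
final class FunctionCreateSketch {

    /** Hypothetical stand-in for the persistent store (the metastore). */
    interface Metastore {
        void persistFunction(String name) throws Exception;
    }

    // Hypothetical stand-in for the in-memory function registry.
    private final Set<String> registry = ConcurrentHashMap.newKeySet();
    private final Metastore metastore;

    FunctionCreateSketch(Metastore metastore) {
        this.metastore = metastore;
    }

    /** Registers the function, then persists it; undoes the first step if the second fails. */
    void createFunction(String name) throws Exception {
        // add() returns true only if this call actually inserted the name.
        boolean added = registry.add(name);
        try {
            metastore.persistFunction(name);
        } catch (Exception e) {
            // Roll back only what this call added, so the function does not
            // stay visible after the failed creation.
            if (added) {
                registry.remove(name);
            }
            throw e;
        }
    }

    boolean isVisible(String name) {
        return registry.contains(name);
    }
}
```

Without the catch block, a failure in persistFunction would leave the name in 
the registry, which matches the behaviour described above where the function 
stays visible and callable after a failed load.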

  was:
The testcase is intended to test REPL LOAD with retry. The test creates a 
partitioned table and a function in the source database and loads those to the 
replica. The first attempt to load a dump is intended to fail while loading one 
of the partitions. Based on the order in which the objects get loaded, if the 
function is queued after the table, it will not be available in replica after 
the load failure. But if it's queued before the table, it will be available in 
replica even after the load failure. The test assumes the latter case, which may 
not always be true.

Hence fix the testcase to order the objects by a fixed ordering. By setting 
hive.in.repl.test.files.sorted to true, the objects are ordered by the 
directory names. This ordering is available with minimal changes for testing, 
hence we use it. With this ordering a function gets loaded before a table. So 
changed the test to not expect the function to be available after the failed 
load, but be available after the retry.


> Fix testcase 
> TestReplicationScenariosAcrossInstances#testBootstrapReplLoadRetryAfterFailureForPartitions
>  to not depend upon the order in which objects get loaded
> -----------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-20953
>                 URL: https://issues.apache.org/jira/browse/HIVE-20953
>             Project: Hive
>          Issue Type: Bug
>          Components: Tests
>    Affects Versions: 4.0.0
>            Reporter: Ashutosh Bapat
>            Assignee: Ashutosh Bapat
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>         Attachments: HIVE-20953.01, HIVE-20953.02, HIVE-20953.02, 
> test_func_load_failure_retry.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> The testcase is intended to test REPL LOAD with retry. The test creates a 
> partitioned table and a function in the source database and loads those to 
> the replica. The first attempt to load a dump is intended to fail while 
> loading one of the partitions. Based on the order in which the objects get 
> loaded, if the function is queued after the table, it will not be available 
> in replica after the load failure. But if it's queued before the table, it 
> will be available in replica even after the load failure. The test assumes 
> the latter case, which may not always be true.
>
> Hence, fix the testcase to load the objects in a fixed order. By setting 
> hive.in.repl.test.files.sorted to true, the objects are ordered by their 
> directory names. This ordering is available with minimal changes for testing, 
> hence we use it. With this ordering a function gets loaded before a table. So 
> the test was changed to not expect the function to be available after the 
> failed load, but to be available after the retry.
>
> While writing that testcase, I found that even if a function fails to load, 
> it is still visible through show functions and can be called just 
> as if the failure had not happened. Digging further, I found that when 
> creating a function we add it to the registry and also to the metastore. If 
> the latter fails, we do not clean it up from the registry, and thus it remains 
> visible after the failure. This is fixed as well.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
