[ 
https://issues.apache.org/jira/browse/IMPALA-12487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17912048#comment-17912048
 ] 

ASF subversion and git services commented on IMPALA-12487:
----------------------------------------------------------

Commit 1f7b9601e5a768c0b2061fef95c750ae74059b84 in impala's branch 
refs/heads/master from Sai Hemanth Gantasala
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=1f7b9601e ]

IMPALA-13403: Refactor the checks of skip reloading file metadata for
ALTER_TABLE events

IMPALA-12487 adds an optimization that if an ALTER_TABLE event has
trivial changes in StorageDescriptor (e.g. removing optional field
'storedAsSubDirectories'=false which defaults to false), file
metadata reload will be skipped, no matter what changes are in the
table properties. This is problematic since some HMS clients (e.g.
Spark) could modify both the table properties and StorageDescriptor.
If there are non-trivial changes in table properties (e.g. a 'location'
change), we shouldn't skip reloading file metadata.

Testing:
- Added a unit test to verify the same

Change-Id: Ia969dd32385ac5a1a9a65890a5ccc8cd257f4b97
Reviewed-on: http://gerrit.cloudera.org:8080/21971
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
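
The check described in the commit message above could be sketched as
follows. This is only an illustration under stated assumptions, not the
code from the commit: the class name, the method names and the idea of
passing in a caller-supplied set of "trivial" property keys are all
hypothetical. The point it shows is that the reload is skipped only when
both the StorageDescriptor change and the table property changes are
trivial.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

// Hypothetical helper; names and signatures are assumptions for illustration.
final class AlterTableReloadDecision {
  private AlterTableReloadDecision() {}

  /** Returns the keys that were added, removed or modified between the two maps. */
  static Set<String> changedKeys(Map<String, String> before, Map<String, String> after) {
    Set<String> keys = new HashSet<>(before.keySet());
    keys.addAll(after.keySet());
    // Drop every key whose value is the same on both sides.
    keys.removeIf(k -> Objects.equals(before.get(k), after.get(k)));
    return keys;
  }

  /**
   * Skips the file metadata reload only if the StorageDescriptor change is
   * trivial AND every changed table property is in the trivial set.
   */
  static boolean canSkipFileMetadataReload(boolean sdChangeIsTrivial,
      Map<String, String> propsBefore, Map<String, String> propsAfter,
      Set<String> trivialPropertyKeys) {
    if (!sdChangeIsTrivial) return false;
    return trivialPropertyKeys.containsAll(changedKeys(propsBefore, propsAfter));
  }
}
```

With a trivial set containing only 'transient_lastDdlTime' (an assumed
choice), an event that also changes 'location' in the table properties
would not skip the reload, even if its StorageDescriptor change is trivial.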


> Skip reloading file metadata for ALTER_TABLE events with trivial changes in 
> StorageDescriptor
> ---------------------------------------------------------------------------------------------
>
>                 Key: IMPALA-12487
>                 URL: https://issues.apache.org/jira/browse/IMPALA-12487
>             Project: IMPALA
>          Issue Type: Improvement
>            Reporter: Quanlong Huang
>            Assignee: Sai Hemanth Gantasala
>            Priority: Critical
>             Fix For: Impala 4.4.0
>
>         Attachments: ALTER_TABLE_event_with_SD_changes.png
>
>
> IMPALA-11534 skips reloading file metadata for some trivial ALTER_TABLE 
> events. However, ALTER_TABLE events that have trivial changes in 
> StorageDescriptor are not handled in IMPALA-11534. Some of them can skip 
> reloading file metadata. The thrift definition of StorageDescriptor (not all 
> of the fields are related to file metadata):
> {code:java}
> // this object holds all the information about physical storage of the data belonging to a table
> struct StorageDescriptor {
>   1: list<FieldSchema> cols,  // required (refer to types defined above)
>   2: string location,         // defaults to <warehouse loc>/<db loc>/tablename
>   3: string inputFormat,      // SequenceFileInputFormat (binary) or TextInputFormat or custom format
>   4: string outputFormat,     // SequenceFileOutputFormat (binary) or IgnoreKeyTextOutputFormat or custom format
>   5: bool   compressed,       // compressed or not
>   6: i32    numBuckets,       // this must be specified if there are any dimension columns
>   7: SerDeInfo    serdeInfo,  // serialization and deserialization information
>   8: list<string> bucketCols, // reducer grouping columns and clustering columns and bucketing columns
>   9: list<Order>  sortCols,   // sort order of the data in each bucket
>   10: map<string, string> parameters, // any user supplied key value hash
>   11: optional SkewedInfo skewedInfo, // skewed information
>   12: optional bool   storedAsSubDirectories       // stored as subdirectories or not
> } {code}
> The attached screenshot is an example comparing the before and after Table 
> object of an ALTER_TABLE event that has trivial changes in StorageDescriptor. 
> It just clears the field 'storedAsSubDirectories:false', and that field 
> defaults to false, so the change actually makes no difference in the 
> StorageDescriptor.
> I think we can compare changes in the StorageDescriptor and only reload file 
> metadata if either of these fields changes:
>  * 'location'
>  * 'storedAsSubDirectories'
> Note that the default of 'storedAsSubDirectories' is false, so removing 
> 'storedAsSubDirectories:false' is considered unchanged.
> CC [~hemanth619], [~csringhofer] 
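
The comparison proposed in the description above could look roughly like
the following. This is a minimal sketch, not Impala's implementation:
SdSnapshot is a hypothetical stand-in that carries only the two fields of
interest, and a null Boolean models an unset optional Thrift field.

```java
import java.util.Objects;

/** Hypothetical, simplified stand-in for the two StorageDescriptor fields discussed. */
final class SdSnapshot {
  final String location;                 // field 2 in the Thrift struct
  final Boolean storedAsSubDirectories;  // field 12; null models "unset"

  SdSnapshot(String location, Boolean storedAsSubDirectories) {
    this.location = location;
    this.storedAsSubDirectories = storedAsSubDirectories;
  }
}

// Hypothetical helper; names are assumptions for illustration.
final class SdChangeCheck {
  private SdChangeCheck() {}

  /** Treats an unset optional flag as its default value, false. */
  static boolean effectiveSubDirs(Boolean flag) {
    return flag != null && flag;
  }

  /** Returns true if the StorageDescriptor change could affect file metadata. */
  static boolean needsFileMetadataReload(SdSnapshot before, SdSnapshot after) {
    if (!Objects.equals(before.location, after.location)) return true;
    return effectiveSubDirs(before.storedAsSubDirectories)
        != effectiveSubDirs(after.storedAsSubDirectories);
  }
}
```

Normalizing the flag before comparing is what makes clearing
'storedAsSubDirectories:false' count as unchanged, while a switch to true
or a different 'location' still triggers a reload.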



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
