[
https://issues.apache.org/jira/browse/IMPALA-13548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Joe McDonnell resolved IMPALA-13548.
------------------------------------
Fix Version/s: Impala 5.0.0
Resolution: Fixed
> Add a mode to schedule scan ranges in order of modification time
> ----------------------------------------------------------------
>
> Key: IMPALA-13548
> URL: https://issues.apache.org/jira/browse/IMPALA-13548
> Project: IMPALA
> Issue Type: Task
> Components: Backend
> Affects Versions: Impala 4.5.0
> Reporter: Joe McDonnell
> Assignee: Joe McDonnell
> Priority: Major
> Fix For: Impala 5.0.0
>
>
> When a file is added to a table, the scheduler can be unstable in how it
> assigns scan ranges. The scheduler walks through the scan ranges and hands
> them out in a single pass. If the new scan range is at the end of the list,
> there is minimal disruption: every assignment stays the same except on the
> node that receives the new scan range. However, if the new scan range is
> early in the list, its assignment can change the subsequent assignments of
> other scan ranges. This can cascade and result in an entirely different
> assignment.
> This is bad for the tuple cache, because it makes it difficult to get cache
> hits for a table that is ingesting data.
> If the scan ranges were ordered by modification time (ascending), then new
> scan ranges for an ingest would be at the end of the list and cause minimal
> disruption.
> We should add a mode that does this.
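
The effect described above can be illustrated with a toy model. This is only a
sketch, not Impala's scheduler: the ScanRange fields, the least-loaded-node
rule in assign(), and the file names are all assumptions made for the example.
It compares a path-ordered list, where the new file lands first, with a
modification-time-ordered list, where it lands last.

```python
from typing import Dict, List, NamedTuple


class ScanRange(NamedTuple):
    # Hypothetical fields, chosen only for this illustration.
    path: str
    length: int
    mtime: int


def assign(ranges: List[ScanRange], num_nodes: int) -> Dict[str, int]:
    """Single-pass greedy assignment: each range goes to the least-loaded node.

    Ties are broken by the lowest node index. This is a stand-in for a real
    scheduler, used only to show how list order affects the result.
    """
    load = [0] * num_nodes
    assignment: Dict[str, int] = {}
    for r in ranges:
        node = min(range(num_nodes), key=lambda n: (load[n], n))
        assignment[r.path] = node
        load[node] += r.length
    return assignment


def count_moved(before: Dict[str, int], after: Dict[str, int]) -> int:
    """Number of pre-existing ranges whose node changed."""
    return sum(1 for path, node in before.items() if after[path] != node)


# Five existing files with increasing modification times.
existing = [ScanRange(f"f{i}", 10, mtime=i) for i in range(1, 6)]
# A newly ingested file whose path sorts before all existing files.
new_file = ScanRange("f0", 10, mtime=6)

before = assign(existing, num_nodes=3)

# Path order puts the new range first; mtime order puts it last.
by_path = assign(sorted(existing + [new_file], key=lambda r: r.path), 3)
by_mtime = assign(sorted(existing + [new_file], key=lambda r: r.mtime), 3)

moved_by_path = count_moved(before, by_path)
moved_by_mtime = count_moved(before, by_mtime)
print("moved with path order: ", moved_by_path)
print("moved with mtime order:", moved_by_mtime)
```

In this toy setup, ordering by path shifts every existing range to a different
node, while ordering by modification time leaves the existing assignments
untouched and only places the new range.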