Mit Desai created YUNIKORN-3076:
-----------------------------------

             Summary: Web UI Fails to Load Applications for Certain Queues on 
Heavily Loaded Clusters
                 Key: YUNIKORN-3076
                 URL: https://issues.apache.org/jira/browse/YUNIKORN-3076
             Project: Apache YuniKorn
          Issue Type: Bug
          Components: webapp
            Reporter: Mit Desai
            Assignee: Mit Desai
             Fix For: 1.6.3, 1.6.2, 1.6.1, 1.6.0, 1.5.2


On heavily loaded clusters, the Web UI intermittently fails to load 
applications for certain queues. Allocations and resource usage are reported 
correctly, but the list of applications does not load for some queues. There 
is no reliable way to reproduce the problem, but it has been observed 
frequently.

{*}Initial Assumptions{*}: The issue was first attributed to a large payload 
being exchanged between the scheduler and the Web UI, causing network latency 
or client-side parsing delays when a queue holds a large number of 
applications/pods. That does not seem to be the case: the issue was observed 
yesterday on a queue with just 3 applications and approximately 200 pods.

{*}Root Cause{*}: Further debugging showed that not every application in the 
response includes a 'stateLog' object. The UI rendering code accesses stateLog 
unconditionally, and that access fails for applications that do not have it. 
The failure aborts rendering and results in a blank applications page.

{*}Steps to Validate{*}:
 # When the Web UI shows this problem, open the browser's inspect panel and 
navigate to the network tab.
 # Clear any existing network items. Note: also clear them whenever you move 
to a different queue, because the UI caches the applications object until the 
page is refreshed.
 # Go to the applications tab and select the desired queue from the drop-down 
menu.
 # An 'Applications' request should appear in the network tab, showing the 
payload the UI received.
 # If the UI is not loading the applications, the payload will contain an 
application with {{applicationState=New}} that does not have a stateLog object.
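
The check in the last step can also be run against a copied payload instead of 
scanning it by eye. The following is a minimal sketch, assuming the payload is 
a JSON array of application objects. Only {{applicationState}} and 
{{stateLog}} are named in this report; the {{applicationID}} field, the 
{{AppEntry}} type, and the {{findAppsWithoutStateLog}} helper are hypothetical 
names used for illustration.

```typescript
// Hypothetical shape of one entry in the applications payload.
// applicationState and stateLog come from this report; applicationID
// is an assumed identifier field.
interface AppEntry {
  applicationID: string;
  applicationState: string;
  stateLog?: unknown[] | null;
}

// Return the entries whose stateLog is absent or null.
function findAppsWithoutStateLog(apps: AppEntry[]): AppEntry[] {
  return apps.filter(
    (app) => app.stateLog === undefined || app.stateLog === null
  );
}

// Made-up sample standing in for a payload copied from the network tab.
const samplePayload: AppEntry[] = [
  { applicationID: "app-1", applicationState: "Running", stateLog: [{}] },
  { applicationID: "app-2", applicationState: "New" },
  { applicationID: "app-3", applicationState: "New", stateLog: null },
];

// List the applications that would trip an unconditional stateLog access.
for (const app of findAppsWithoutStateLog(samplePayload)) {
  console.log(`${app.applicationID} (${app.applicationState})`);
}
```

An empty result for a queue whose page is still blank would suggest a 
different cause than the one described here.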

Attached is a screenshot of an occurrence from one of the queues and the 
information available in the network tab.

{*}Proposed Solution{*}: Modify the UI rendering logic to handle a missing 
stateLog object, so that one such application no longer prevents the entire 
applications page from rendering. Applications without a stateLog object 
should either be skipped or be given a default value.
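
The default-value option could look roughly like the sketch below. It is not 
the actual Web UI code: this report does not say what the UI reads from 
stateLog, so the sketch assumes it takes a timestamp from the most recent 
entry. The {{time}} and {{applicationID}} fields and the names 
{{AppPayload}}, {{StateLogEntry}}, {{lastStateChangeTime}} and {{toRows}} are 
all hypothetical.

```typescript
// Assumed shape of a stateLog entry; the time field is hypothetical.
interface StateLogEntry {
  time: number;
  applicationState: string;
}

// Assumed shape of an application in the payload; stateLog may be missing.
interface AppPayload {
  applicationID: string;
  applicationState: string;
  stateLog?: StateLogEntry[] | null;
}

// Return the time of the most recent state change, or null when the
// application has no stateLog (or an empty one) instead of failing.
function lastStateChangeTime(app: AppPayload): number | null {
  const log = app.stateLog ?? []; // default for a missing or null stateLog
  const last = log[log.length - 1];
  return last ? last.time : null;
}

// Build display rows for every application; one application without a
// stateLog no longer stops the whole list from being produced.
function toRows(apps: AppPayload[]) {
  return apps.map((app) => ({
    id: app.applicationID,
    state: app.applicationState,
    lastStateChangeTime: lastStateChangeTime(app),
  }));
}
```

Keeping the application in the list with a null value, rather than skipping 
it, means applications in the New state stay visible in the UI.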



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@yunikorn.apache.org
For additional commands, e-mail: dev-h...@yunikorn.apache.org
