Gokul Kolady created IMPALA-14953:
-------------------------------------
Summary: Impala AI Query Profile Analyzer
Key: IMPALA-14953
URL: https://issues.apache.org/jira/browse/IMPALA-14953
Project: IMPALA
Issue Type: New Feature
Reporter: Gokul Kolady
h3. Summary
Impala query profiles describe all the intricate details of a query's planning,
execution, and resource usage. However, these profiles have grown extremely
large, and the average user struggles to parse through them to find the
information of interest. We want to create an AI-driven query profile analyzer
that can be accessed from within the Impala Web UI. This analyzer will ingest
the given query profile as context and provide a summary of the profile along
with an analysis of performance bottlenecks and their sources.
h3. Background & Problem Statement
In Impala, query profiles are the ultimate source of truth for diagnosing
performance issues. They describe all the intricate details of a query's
planning, execution, and resource usage (e.g., memory spills, scanner thread
wait times, join cardinality, and HDFS I/O).
However, as workloads have scaled, these profiles have become incredibly dense,
highly technical documents, often spanning thousands of lines of text or massive
JSON structures. For the average data analyst, developer, or even junior
platform administrator, parsing through this wall of metrics to find the actual
root cause of a slow or failed query is overwhelming and requires deep,
specialized domain expertise.
h3. Business Value
By integrating GenAI directly into the diagnostic workflow, we can democratize
performance tuning. Instead of relying on escalation to Level 3 support or
expert DBAs, average users will get instant, actionable insights into why their
query failed or ran slowly, and exactly how to fix it (e.g., "Add table
statistics," or "Fix data skew on the join key"). This will drastically reduce
support tickets and accelerate Mean Time To Resolution (MTTR).
h3. Proposed Solution & User Experience
We will build an AI-driven Query Profile Analyzer natively embedded within the
existing Impala Web UI. When a user views a specific query execution in the Web
UI, they will see a new "AI Analysis" panel. The system will ingest the query
profile as context and instantly generate a plain-English summary of the
execution, highlighting the primary performance bottlenecks.
h3. High-Level Acceptance Criteria
h4. UI/UX Integration
The Impala Web UI (specifically the query details page) includes a clearly
visible "AI Analysis" tab that contains a "Generate AI Analysis" button.
h4. Context Ingestion & Prompting
The backend must successfully parse the active query profile (stripping
unnecessary boilerplate to fit within standard LLM token limits if necessary)
and pass it to the AI model as context.
The system must securely handle the query text and profile data, ensuring that
PII is handled according to enterprise security standards before being sent to
the LLM.
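The preprocessing described above could look roughly like the following sketch. The function and constant names (`strip_boilerplate`, `redact_literals`, `BOILERPLATE_PREFIXES`) are hypothetical placeholders, not existing Impala code; the redaction shown is a minimal literal-masking pass, not a full enterprise PII policy.

```python
import re

# Hypothetical helper names -- a sketch of the preprocessing the acceptance
# criteria describe, not shipped Impala code.

# Example low-signal counter lines to drop; the real list would be tuned.
BOILERPLATE_PREFIXES = ("- InactiveTotalTime", "- TotalTime")

def strip_boilerplate(profile_text: str) -> str:
    """Drop low-signal counter lines so the profile fits an LLM token budget."""
    kept = [
        line for line in profile_text.splitlines()
        if not line.strip().startswith(BOILERPLATE_PREFIXES)
    ]
    return "\n".join(kept)

def redact_literals(sql_text: str) -> str:
    """Mask string and numeric literals that may contain PII before the
    query text is sent to the LLM."""
    sql_text = re.sub(r"'[^']*'", "'?'", sql_text)        # string literals
    sql_text = re.sub(r"\b\d+(\.\d+)?\b", "?", sql_text)  # numeric literals
    return sql_text
```

Masking literals while keeping table and column names preserves the structural information the model needs for diagnosis while removing the values most likely to be sensitive.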
h4. Analysis Accuracy
The AI's responses must explicitly reference the specific metrics from the
user's profile (e.g., "I see your TotalStorageWaitTime was 45 seconds...") and
map them to documented Impala behaviors.
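One way to enforce this metric-grounding requirement is through the system prompt itself. The template below is purely illustrative (`TotalStorageWaitTime` is a real profile counter taken from the example above, but the prompt wording and `build_prompt` helper are assumptions):

```python
# Illustrative prompt template; the wording is an assumption, not a spec.
ANALYSIS_PROMPT = """You are an expert Impala performance engineer.
Analyze the query profile below. In your answer:
 1. Quote the exact counter names and values each finding is based on
    (e.g. "TotalStorageWaitTime: 45s").
 2. Map each finding to documented Impala behavior (missing table
    statistics, data skew, spill to disk, scanner thread wait, etc.).
 3. End with concrete remediation steps.

Query profile:
{profile}
"""

def build_prompt(profile_text: str) -> str:
    """Embed the (preprocessed) profile into the analysis prompt."""
    return ANALYSIS_PROMPT.format(profile=profile_text)
```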
h4. Configurable AI Backend
Administrators must have the ability to configure which LLM endpoint the
analyzer points to (e.g., an internal enterprise model or a secure external
API) via Cloudera Manager or Impala startup flags.
The feature must be optional, so that organizations whose security policy
prohibits sending diagnostic data to an AI model can leave it disabled.
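A minimal sketch of what such configuration might look like is below. The flag names (`enable_ai_profile_analysis`, `ai_analysis_endpoint`, `ai_analysis_api_key_file`) are hypothetical placeholders for whatever startup flags or Cloudera Manager settings the final design chooses:

```python
from dataclasses import dataclass

# Flag names are hypothetical placeholders, not existing Impala flags.

@dataclass(frozen=True)
class AiAnalyzerConfig:
    enabled: bool = False   # off by default: opt-in per security policy
    endpoint: str = ""      # e.g. an internal enterprise model URL
    api_key_file: str = ""  # credential read from a file, never a flag value

def parse_flags(flags: dict) -> AiAnalyzerConfig:
    """Build the analyzer config from a flag-name -> value mapping."""
    return AiAnalyzerConfig(
        enabled=flags.get("enable_ai_profile_analysis", "false") == "true",
        endpoint=flags.get("ai_analysis_endpoint", ""),
        api_key_file=flags.get("ai_analysis_api_key_file", ""),
    )
```

Defaulting to disabled means a cluster never sends diagnostic data to an LLM unless an administrator explicitly opts in.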
h4. Performance & Error Handling
The system must parse the profile into digestible pieces that help the LLM
retrieve information about the query without exceeding its context window limit.
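The chunking requirement could be sketched as follows. This is an assumption-laden illustration: it uses the rough four-characters-per-token heuristic in place of a real tokenizer, and splits on blank lines as a stand-in for real profile section boundaries.

```python
def chunk_profile(profile_text: str, max_tokens: int = 4000,
                  chars_per_token: int = 4) -> list[str]:
    """Split a profile into chunks that each fit an assumed token budget.

    Uses the rough ~4-chars-per-token heuristic; a real implementation
    would use the target model's tokenizer. Splits on blank lines so each
    chunk keeps whole profile sections together where possible.
    """
    budget = max_tokens * chars_per_token
    chunks, current = [], ""
    for section in profile_text.split("\n\n"):
        # Start a new chunk when appending this section would blow the budget.
        if current and len(current) + len(section) + 2 > budget:
            chunks.append(current)
            current = section
        else:
            current = section if not current else current + "\n\n" + section
    if current:
        chunks.append(current)
    return chunks
```

Chunking on section boundaries rather than fixed byte offsets keeps each operator's counters together, so the model never sees a metric severed from the node it belongs to.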
--
This message was sent by Atlassian Jira
(v8.20.10#820010)