yibo wen created CALCITE-7612:
---------------------------------

             Summary: Track whether a column origin is derived from an aggregate
                 Key: CALCITE-7612
                 URL: https://issues.apache.org/jira/browse/CALCITE-7612
             Project: Calcite
          Issue Type: Improvement
          Components: core
    Affects Versions: 1.38.0
            Reporter: yibo wen


Description:

  RelColumnOrigin currently exposes whether an output column is derived from an
  origin column via isDerived(), but it does not distinguish ordinary expression
  derivation from aggregate derivation.

  For example:

  SELECT a + b AS c FROM t

  and

  SELECT SUM(a) AS s FROM t

  both produce column origins marked as derived (isDerived() returns true), but
  downstream lineage or impact-analysis tools may need to know whether the
  output column was derived by an aggregate call.
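  The toy model below makes the ambiguity concrete. It is plain Java, not
  Calcite's API, and the names Origin, BooleanLineage, column, expression and
  aggregate are hypothetical. It mimics an origin that carries only a single
  "derived" flag. When it is applied to the two queries above, the origin
  recorded for column a is identical in both cases.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a column origin: table, column ordinal, and a
// single "derived" flag, i.e. the information isDerived() exposes today.
record Origin(String table, int column, boolean derived) {}

final class BooleanLineage {
  // Origin of a base-table column that is read directly (e.g. "a" from t).
  static Set<Origin> column(String table, int ordinal) {
    return Set.of(new Origin(table, ordinal, false));
  }

  // A non-trivial expression such as a + b marks every contributing
  // origin as derived.
  static Set<Origin> expression(List<Set<Origin>> operands) {
    Set<Origin> result = new LinkedHashSet<>();
    for (Set<Origin> operand : operands) {
      for (Origin o : operand) {
        result.add(new Origin(o.table(), o.column(), true));
      }
    }
    return result;
  }

  // An aggregate call such as SUM(a) can only set the same boolean, so its
  // origins look exactly like those of an ordinary expression.
  static Set<Origin> aggregate(List<Set<Origin>> arguments) {
    return expression(arguments);
  }
}
```

  Assume a is ordinal 0 and b is ordinal 1 of t. Then c = a + b yields
  {Origin(t, 0, true), Origin(t, 1, true)} and s = SUM(a) yields
  {Origin(t, 0, true)}. Nothing in the origin of a says that s came from an
  aggregate.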

  Expected behavior:
  Column-origin metadata should be able to indicate whether an origin is derived
  from an aggregate expression.

  Motivation:
  For column lineage, data governance, and impact analysis, aggregate-derived
  columns often need to be handled differently from ordinary expression-derived
  columns. For example, SUM(a), COUNT(a), AVG(a), and a + b all depend on source
  columns, but their semantic lineage differs: the aggregates summarize many
  input rows into a single value, whereas a + b is computed row by row.

  Possible design direction:
  Extend RelColumnOrigin or related metadata to expose aggregate-derived
  information. This may require discussion because RelColumnOrigin is part of
  Calcite's public metadata API.
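  One way to picture the "derivation kind enum" option is the sketch below. It
  is not a proposed Calcite API, and DerivationKind, KindedOrigin, through and
  isAggregateDerived are hypothetical names. The origin records how the output
  was derived and still offers a boolean isDerived() view for existing callers.
  The sketch includes the kind in equals/hashCode, which is only one possible
  answer to the open question below.

```java
import java.util.Objects;

// Hypothetical derivation kinds, declared from weakest to strongest.
enum DerivationKind { NONE, EXPRESSION, AGGREGATE }

// Hypothetical column origin that records its derivation kind.
final class KindedOrigin {
  private final String table;
  private final int column;
  private final DerivationKind kind;

  KindedOrigin(String table, int column, DerivationKind kind) {
    this.table = Objects.requireNonNull(table);
    this.column = column;
    this.kind = Objects.requireNonNull(kind);
  }

  // Backward-compatible boolean view: anything other than NONE is derived.
  boolean isDerived() { return kind != DerivationKind.NONE; }

  boolean isAggregateDerived() { return kind == DerivationKind.AGGREGATE; }

  DerivationKind kind() { return kind; }

  // Passing through another derivation step keeps the stronger kind
  // (enum order), so SUM(a) + 1 stays aggregate-derived.
  KindedOrigin through(DerivationKind step) {
    DerivationKind next = kind.compareTo(step) >= 0 ? kind : step;
    return new KindedOrigin(table, column, next);
  }

  // Including the kind here means an expression-derived and an
  // aggregate-derived origin of the same column are distinct set elements.
  @Override
  public boolean equals(Object o) {
    if (this == o) return true;
    if (!(o instanceof KindedOrigin)) return false;
    KindedOrigin that = (KindedOrigin) o;
    return column == that.column && table.equals(that.table) && kind == that.kind;
  }

  @Override
  public int hashCode() { return Objects.hash(table, column, kind); }
}
```

  Declaring the enum in order of strength lets through() combine steps with a
  single comparison. The question of whether an aggregate wrapped in a further
  expression should still count as aggregate-derived is part of the design
  discussion.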

  Open questions:
  Should aggregate derivation be represented as a new boolean flag, a derivation
  kind enum, or a separate metadata API?
  Should this information affect equals/hashCode semantics of RelColumnOrigin?
  How should aggregate calls with zero arguments, such as COUNT(*), be
  represented?
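  The "separate metadata API" option could look like the sketch below. It is
  illustrative only, and AggregateInfo, AggregateLineage and aggregateFor are
  hypothetical names rather than Calcite API. The origin class and its
  equals/hashCode stay untouched. Aggregate information is reported per output
  column instead, so a zero-argument call such as COUNT(*) is still visible as
  an aggregate even though it has no argument column to trace. Identifying the
  function by a name string is a simplification; a real design would more
  likely carry the aggregate call itself.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical description of the aggregate call producing an output column:
// function name plus the input ordinals of its arguments.
record AggregateInfo(String function, List<Integer> argumentColumns) {
  AggregateInfo {
    argumentColumns = List.copyOf(argumentColumns); // defensive copy
  }

  // COUNT(*) has no argument columns but is still an aggregate.
  boolean isZeroArgument() { return argumentColumns.isEmpty(); }
}

// Hypothetical side-channel metadata, kept separate from column origins.
final class AggregateLineage {
  // Output column ordinal -> aggregate info; ordinals absent from the map
  // (e.g. GROUP BY keys) are not produced by an aggregate call.
  private final Map<Integer, AggregateInfo> byOutputColumn;

  AggregateLineage(Map<Integer, AggregateInfo> byOutputColumn) {
    this.byOutputColumn = Map.copyOf(byOutputColumn);
  }

  Optional<AggregateInfo> aggregateFor(int outputColumn) {
    return Optional.ofNullable(byOutputColumn.get(outputColumn));
  }
}
```

  Take SELECT b, COUNT(*), SUM(a) FROM t GROUP BY b with a at input ordinal 0.
  Output 0 (the key b) has no entry, output 1 maps to ("COUNT", []), and output
  2 maps to ("SUM", [0]).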



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
