[jira] [Updated] (CALCITE-2959) Support collation on struct fields

Ruben Quesada Lopez (Jira) Fri, 13 Sep 2019 06:04:15 -0700


     [ 
https://issues.apache.org/jira/browse/CALCITE-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Ruben Quesada Lopez updated CALCITE-2959:
-----------------------------------------
    Description: 
Currently, the class {{RelFieldCollation}} is used to define _"the ordering of 
one field of a RelNode whose output is to be sorted"_. This representation can 
hold only "simple" (top-level) fields. To sort on a field nested inside a 
struct, a projection must first be applied so that the nested field can be 
referenced as a simple one. For example, given this table:
{code}
CREATE TYPE Address AS (
  street VARCHAR(20) NOT NULL, 
  zipcode VARCHAR(20) NOT NULL,
  city VARCHAR(20) NOT NULL);

CREATE TABLE Person (
  id VARCHAR(20) NOT NULL,
  name VARCHAR(20) NOT NULL,
  address Address NOT NULL);
{code}

With a SQL query such as: "{{SELECT p.name, p.address.city FROM Person p ORDER 
BY p.address.city}}" the pseudo-plan generated would look like:
{code}
Sort(1)  // --> Collation: [1]
  Project(0=$1, 1=$2.city)
    Scan(table=Person)
{code}

However, what would happen if we had a specialized Scan operator that 
guarantees the records are returned already ordered by address.city? 
Something like:
{code}
EnhancedScan(table=Person, sort=$2.city)  // --> Collation???
{code}
The collation of such an operator cannot be represented with the current 
Calcite capabilities ({{RelFieldCollation}}), because the sort key is not a 
"simple" field but a field nested in a struct. We would need a new collation 
abstraction to represent it, e.g. [2.city] or [2.2].

I would like to open the discussion to see if / how we could find a solution to 
represent this case.
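
To make the [2.2] notation concrete, here is a minimal, self-contained sketch 
of a collation key expressed as a path of field ordinals. It does not use any 
Calcite API: the names {{StructCollationSketch}} and {{FieldPath}} are 
hypothetical and exist only for illustration. It assumes 0-based ordinals, so 
in Person the address column is 2 and in Address the city attribute is 2.

```java
import java.util.List;
import java.util.stream.Collectors;

public class StructCollationSketch {

    /** Hypothetical collation key: a path of field ordinals, not a single index. */
    static final class FieldPath {
        private final List<Integer> ordinals;

        FieldPath(Integer... ordinals) {
            if (ordinals.length == 0) {
                throw new IllegalArgumentException("a field path needs at least one ordinal");
            }
            this.ordinals = List.of(ordinals);
        }

        /** True when the path addresses a top-level ("simple") field. */
        boolean isSimple() {
            return ordinals.size() == 1;
        }

        /** Renders the path in dotted form, e.g. "2.2" for address.city. */
        @Override public String toString() {
            return ordinals.stream().map(String::valueOf).collect(Collectors.joining("."));
        }
    }

    public static void main(String[] args) {
        // Sort key after the Project: output field 1, a simple field.
        FieldPath projected = new FieldPath(1);
        // Sort key on the scan itself: Person.address (2) -> Address.city (2).
        FieldPath nested = new FieldPath(2, 2);

        System.out.println("[" + projected + "] simple=" + projected.isSimple());
        System.out.println("[" + nested + "] simple=" + nested.isSimple());
    }
}
```

A single-element path corresponds to what a field index can express today; 
only the multi-element case would require the new abstraction discussed above.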

  was:
Currently, the class {{RelFieldCollation}} is used to define _"the ordering of 
one field of a RelNode whose output is to be sorted"_. This representation can 
hold only "simple" fields. In case of struct fields, a projection needs to be 
applied in order to reference the struct field as a simple one. For example, 
given this table:
{code}
CREATE TYPE Address AS (
  street VARCHAR(20) NOT NULL, 
  zipcode VARCHAR(20) NOT NULL,
  city VARCHAR(20) NOT NULL);

CREATE TABLE Person (
  id VARCHAR(20) NOT NULL,
  name VARCHAR(20) NOT NULL,
  address Address NOT NULL);
{code}

With a SQL query such as: "{{SELECT p.name, p.address.city FROM Person p ORDER 
BY p.address.city}}" the pseudo-plan generated would look like:
{code}
Sort(1)  // --> Collation: [1]
  Project(0=$1, 1=$2.city)
    Scan(table=Person)
{code}

However, what would happen if we had a specific Scan operator that would 
guarantee us that the records would be scanned already ordered by address.city? 
Something like:
{code}
EnhancedScan(table=Person, sort=$2.city)  // --> Collation???
{code}
The collation of such an operator cannot be represented with the current 
Calcite capabilities (RelFieldCollation), because it would not be a "simple" 
field, but a struct field, i.e. we would need a new collation abstraction to 
represent it, e.g. [2.city] or [2.2]

I would like to open the discuss to see if / how we could find a solution to 
represent this case.


> Support collation on struct fields
> ----------------------------------
>
>                 Key: CALCITE-2959
>                 URL: https://issues.apache.org/jira/browse/CALCITE-2959
>             Project: Calcite
>          Issue Type: New Feature
>            Reporter: Ruben Quesada Lopez
>            Priority: Major
>
> Currently, the class {{RelFieldCollation}} is used to define _"the ordering 
> of one field of a RelNode whose output is to be sorted"_. This representation 
> can hold only "simple" (top-level) fields. To sort on a field nested inside a 
> struct, a projection must first be applied so that the nested field can be 
> referenced as a simple one. For example, given this table:
> {code}
> CREATE TYPE Address AS (
>   street VARCHAR(20) NOT NULL, 
>   zipcode VARCHAR(20) NOT NULL,
>   city VARCHAR(20) NOT NULL);
> CREATE TABLE Person (
>   id VARCHAR(20) NOT NULL,
>   name VARCHAR(20) NOT NULL,
>   address Address NOT NULL);
> {code}
> With a SQL query such as: "{{SELECT p.name, p.address.city FROM Person p 
> ORDER BY p.address.city}}" the pseudo-plan generated would look like:
> {code}
> Sort(1)  // --> Collation: [1]
>   Project(0=$1, 1=$2.city)
>     Scan(table=Person)
> {code}
> However, what would happen if we had a specialized Scan operator that 
> guarantees the records are returned already ordered by address.city? 
> Something like:
> {code}
> EnhancedScan(table=Person, sort=$2.city)  // --> Collation???
> {code}
> The collation of such an operator cannot be represented with the current 
> Calcite capabilities ({{RelFieldCollation}}), because the sort key is not a 
> "simple" field but a field nested in a struct. We would need a new collation 
> abstraction to represent it, e.g. [2.city] or [2.2].
> I would like to open the discussion to see if / how we could find a solution 
> to represent this case.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)
