Also this is at very least a Beam bug. You can file a Beam issue if you
want, otherwise I will when I get back.

Andrew

On Fri, Jan 27, 2023 at 11:27 AM Andrew Pilloud <apill...@google.com> wrote:

> Hi Talat,
>
> I did get your test case running and added some logging to
> RexProgramBuilder.mergePrograms. There is only one merge that occurs during
> the test and it has an output type of RecordType(JavaType(int) ID,
> JavaType(class java.lang.String) V). This does seem like the correct output
> name but it doesn't match the final output name, so something is still
> different than the Beam test case. I also modified mergePrograms to
> purposely corrupt the output names, that did not cause the test to fail or
> trip the 'assert mergedProg.getOutputRowType() ==
> topProgram.getOutputRowType();' in mergePrograms. I could not find any
> Calcite unit tests for RexProgramBuilder.mergePrograms or
> CoreRules.CALC_MERGE rule so I think it is still probable that the problem
> is in this area.
>
> One minor issue I encountered. It took me a while to get your test case
> running, it doesn't appear there are any calcite gradle rules to run
> CoreQuidemTest and constructing the classpath manually was tedious. Did I
> miss something?
>
> I'm still working on this but I'm out today and Monday, it will probably
> be Wednesday before I make any more progress.
>
> Andrew
>
> On Fri, Jan 27, 2023 at 10:40 AM Talat Uyarer <
> tuya...@paloaltonetworks.com> wrote:
>
>> Hi Andrew,
>>
>> Yes This aligned also with my debugging. In My Kenn's reply you can see a
>> sql test which I wrote in Calcite. Somehow Calcite does not have this issue
>> with the 1.28 version.
>>
>> !use post
>> !set outputformat mysql
>>
>> #Test aliases with with clause
>> WITH tempTable(id, v) AS (select "hr"."emps"."empid" as id, 
>> "hr"."emps"."name" as v from "hr"."emps")
>> SELECT tempTable.id as id, tempTable.v as "value" FROM tempTable WHERE 
>> tempTable.v <> '11' ;
>> +-----+-----------+
>> | ID  | value     |
>> +-----+-----------+
>> | 100 | Bill      |
>> | 110 | Theodore  |
>> | 150 | Sebastian |
>> | 200 | Eric      |
>> +-----+-----------+
>> (4 rows)
>>
>> !ok
>>
>>
>> On Wed, Jan 25, 2023 at 6:08 PM Andrew Pilloud <apill...@google.com>
>> wrote:
>>
>>> Yes, that worked.
>>>
>>> The issue does not occur if I disable all of the following planner
>>> rules: CoreRules.FILTER_CALC_MERGE, CoreRules.PROJECT_CALC_MERGE,
>>> LogicalCalcMergeRule.INSTANCE (which wraps CoreRules.CALC_MERGE),
>>> and BeamCalcMergeRule.INSTANCE (which wraps CoreRules.CALC_MERGE).
>>>
>>> All the rules share a common call to RexProgramBuilder.mergePrograms, so
>>> I suspect the problem lies there. I spent some time looking but wasn't able
>>> to find it by code inspection, it looks like this code path is doing the
>>> right thing with names. I'll spend some time tomorrow trying to reproduce
>>> this on pure Calcite.
>>>
>>> Andrew
>>>
>>>
>>> On Tue, Jan 24, 2023 at 8:24 PM Talat Uyarer <
>>> tuya...@paloaltonetworks.com> wrote:
>>>
>>>> Hi Andrew,
>>>>
>>>> Thanks for writing a test for this use case. Without Where clause it
>>>> works as expected on our test cases also too. Please add where clause on
>>>> second select. With the below query it does not return column names. I
>>>> tested on my local also.
>>>>
>>>> WITH tempTable (id, v) AS (SELECT f_int as id, f_string as v FROM
>>>> PCOLLECTION) SELECT id AS fout_int, v AS fout_string FROM tempTable WHERE
>>>> id > 1
>>>>
>>>> Thanks
>>>>
>>>> On Tue, Jan 24, 2023 at 5:28 PM Andrew Pilloud <apill...@google.com>
>>>> wrote:
>>>>
>>>>> +dev@beam.apache.org <dev@beam.apache.org>
>>>>>
>>>>> I tried reproducing this but was not successful, the output schema was
>>>>> as expected. I added the following to BeamSqlMultipleSchemasTest.java at
>>>>> head. (I did discover that  
>>>>> PAssert.that(result).containsInAnyOrder(output)
>>>>> doesn't validate column names however.)
>>>>>
>>>>>   @Test
>>>>>   public void testSelectAs() {
>>>>>     PCollection<Row> input = pipeline.apply(create(row(1, "strstr")));
>>>>>
>>>>>     PCollection<Row> result =
>>>>>         input.apply(SqlTransform.query("WITH tempTable (id, v) AS
>>>>> (SELECT f_int as id, f_string as v FROM PCOLLECTION) SELECT id AS 
>>>>> fout_int,
>>>>> v AS fout_string FROM tempTable"));
>>>>>
>>>>>     Schema output_schema =
>>>>>
>>>>> Schema.builder().addInt32Field("fout_int").addStringField("fout_string").build();
>>>>>     assertThat(result.getSchema(), equalTo(output_schema));
>>>>>
>>>>>     Row output = Row.withSchema(output_schema).addValues(1,
>>>>> "strstr").build();
>>>>>     PAssert.that(result).containsInAnyOrder(output);
>>>>>     pipeline.run();
>>>>>   }
>>>>>
>>>>> On Tue, Jan 24, 2023 at 8:13 AM Talat Uyarer <
>>>>> tuya...@paloaltonetworks.com> wrote:
>>>>>
>>>>>> Hi Kenn,
>>>>>>
>>>>>> Thank you for replying back to my email.
>>>>>>
>>>>>> I was under the same impression about Calcite. But I wrote a test on
>>>>>> Calcite 1.28 too. It is working without issue that I see on BEAM
>>>>>>
>>>>>> Here is my test case. If you want you can also run on Calcite. Please
>>>>>> put under core/src/test/resources/sql as text file. and Run 
>>>>>> CoreQuidemTest
>>>>>> class.
>>>>>>
>>>>>> !use post
>>>>>> !set outputformat mysql
>>>>>>
>>>>>> #Test aliases with with clause
>>>>>> WITH tempTable(id, v) AS (select "hr"."emps"."empid" as id, 
>>>>>> "hr"."emps"."name" as v from "hr"."emps")
>>>>>> SELECT tempTable.id as id, tempTable.v as "value" FROM tempTable WHERE 
>>>>>> tempTable.v <> '11' ;
>>>>>> +-----+-----------+
>>>>>> | ID  | value     |
>>>>>> +-----+-----------+
>>>>>> | 100 | Bill      |
>>>>>> | 110 | Theodore  |
>>>>>> | 150 | Sebastian |
>>>>>> | 200 | Eric      |
>>>>>> +-----+-----------+
>>>>>> (4 rows)
>>>>>>
>>>>>> !ok
>>>>>>
>>>>>>
>>>>>> On Mon, Jan 23, 2023 at 10:16 AM Kenneth Knowles <k...@apache.org>
>>>>>> wrote:
>>>>>>
>>>>>>> Looking at the code that turns a logical CalcRel into a BeamCalcRel
>>>>>>> I do not see any obvious cause for this:
>>>>>>> https://github.com/apache/beam/blob/b3aa2e89489898f8c760294ba4dba2310ac53e70/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamCalcRule.java#L69
>>>>>>> <https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_apache_beam_blob_b3aa2e89489898f8c760294ba4dba2310ac53e70_sdks_java_extensions_sql_src_main_java_org_apache_beam_sdk_extensions_sql_impl_rule_BeamCalcRule.java-23L69&d=DwMFaQ&c=V9IgWpI5PvzTw83UyHGVSoW3Uc1MFWe5J8PTfkrzVSo&r=BkW1L6EF7ergAVYDXCo-3Vwkpy6qjsWAz7_GD7pAR8g&m=KXc2qSceL6qFbFnQ_2qUOHr9mKuc6zYY8rJTNZC8p_wTcNs4M6mHQoCuoc4JfeaA&s=KjzplEf29oFB6uivvdjixpQiArWtfV-1SXpALL-ugEM&e=>
>>>>>>>
>>>>>>> I don't like to guess that upstream libraries have the bug, but in
>>>>>>> this case I wonder if the alias is lost in the Calcite optimizer rule 
>>>>>>> for
>>>>>>> merging the projects and filters into a Calc.
>>>>>>>
>>>>>>> Kenn
>>>>>>>
>>>>>>> On Mon, Jan 23, 2023 at 10:13 AM Kenneth Knowles <k...@apache.org>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> I am not sure I understand the question, but I do see an issue.
>>>>>>>>
>>>>>>>> Context: "CalcRel" is an optimized relational operation that is
>>>>>>>> somewhat like ParDo, with a small snippet of a single-assignment DSL
>>>>>>>> embedded in it. Calcite will choose to merge all the projects and 
>>>>>>>> filters
>>>>>>>> into the node, and then generates Java bytecode to directly execute 
>>>>>>>> the DSL.
>>>>>>>>
>>>>>>>> Problem: it looks like the CalcRel has output columns with aliases
>>>>>>>> "id" and "v" where it should have output columns with aliases "id" and
>>>>>>>> "value".
>>>>>>>>
>>>>>>>> Kenn
>>>>>>>>
>>>>>>>> On Thu, Jan 19, 2023 at 6:01 PM Ahmet Altay <al...@google.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Adding: @Andrew Pilloud <apill...@google.com> @Kenneth Knowles
>>>>>>>>> <k...@google.com>
>>>>>>>>>
>>>>>>>>> On Thu, Jan 12, 2023 at 12:31 PM Talat Uyarer via user <
>>>>>>>>> u...@beam.apache.org> wrote:
>>>>>>>>>
>>>>>>>>>> Hi All,
>>>>>>>>>>
>>>>>>>>>> I am using Beam 2.43 with Calcite SQL with Java.
>>>>>>>>>>
>>>>>>>>>> I have a query with a WITH clause and some aliasing. Looks like
>>>>>>>>>> Beam Query optimizer after optimizing my query, it drops Select 
>>>>>>>>>> statement's
>>>>>>>>>> aliases. Can you help me to identify where the problem is ?
>>>>>>>>>>
>>>>>>>>>> This is my query
>>>>>>>>>> INFO: SQL:
>>>>>>>>>> WITH `tempTable` (`id`, `v`) AS (SELECT
>>>>>>>>>> `PCOLLECTION`.`f_nestedRow`.`f_nestedInt` AS `id`,
>>>>>>>>>> `PCOLLECTION`.`f_nestedRow`.`f_nestedString` AS `v`
>>>>>>>>>> FROM `beam`.`PCOLLECTION` AS `PCOLLECTION`) (SELECT
>>>>>>>>>> `tempTable`.`id` AS `id`, `tempTable`.`v` AS `value`
>>>>>>>>>> FROM `tempTable` AS `tempTable`
>>>>>>>>>> WHERE `tempTable`.`v` <> '11')
>>>>>>>>>>
>>>>>>>>>> This is Calcite Plan look at LogicalProject(id=[$0], value=[$1])
>>>>>>>>>> in SQL plan.
>>>>>>>>>>
>>>>>>>>>> Jan 12, 2023 12:19:08 PM
>>>>>>>>>> org.apache.beam.sdk.extensions.sql.impl.CalciteQueryPlanner 
>>>>>>>>>> convertToBeamRel
>>>>>>>>>> INFO: SQLPlan>
>>>>>>>>>> LogicalProject(id=[$0], value=[$1])
>>>>>>>>>>   LogicalFilter(condition=[<>($1, '11')])
>>>>>>>>>>     LogicalProject(id=[$1.f_nestedInt], v=[$1.f_nestedString])
>>>>>>>>>>       BeamIOSourceRel(table=[[beam, PCOLLECTION]])
>>>>>>>>>>
>>>>>>>>>> But Beam Plan does not have a LogicalProject(id=[$0], value=[$1])
>>>>>>>>>> or similar.
>>>>>>>>>>
>>>>>>>>>> Jan 12, 2023 12:19:08 PM
>>>>>>>>>> org.apache.beam.sdk.extensions.sql.impl.CalciteQueryPlanner 
>>>>>>>>>> convertToBeamRel
>>>>>>>>>> INFO: BEAMPlan>
>>>>>>>>>> BeamCalcRel(expr#0..1=[{inputs}], expr#2=[$t1.f_nestedInt],
>>>>>>>>>> expr#3=[$t1.f_nestedString], expr#4=['11':VARCHAR], expr#5=[<>($t3, 
>>>>>>>>>> $t4)],
>>>>>>>>>> id=[$t2], v=[$t3], $condition=[$t5])
>>>>>>>>>>   BeamIOSourceRel(table=[[beam, PCOLLECTION]])
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Thanks
>>>>>>>>>>
>>>>>>>>>

Reply via email to