Also this is at very least a Beam bug. You can file a Beam issue if you want, otherwise I will when I get back.
Andrew On Fri, Jan 27, 2023 at 11:27 AM Andrew Pilloud <apill...@google.com> wrote: > Hi Talat, > > I did get your test case running and added some logging to > RexProgramBuilder.mergePrograms. There is only one merge that occurs during > the test and it has an output type of RecordType(JavaType(int) ID, > JavaType(class java.lang.String) V). This does seem like the correct output > name but it doesn't match the final output name, so something is still > different than the Beam test case. I also modified mergePrograms to > purposely corrupt the output names, that did not cause the test to fail or > trip the 'assert mergedProg.getOutputRowType() == > topProgram.getOutputRowType();' in mergePrograms. I could not find any > Calcite unit tests for RexProgramBuilder.mergePrograms or > CoreRules.CALC_MERGE rule so I think it is still probable that the problem > is in this area. > > One minor issue I encountered. It took me a while to get your test case > running, it doesn't appear there are any calcite gradle rules to run > CoreQuidemTest and constructing the classpath manually was tedious. Did I > miss something? > > I'm still working on this but I'm out today and Monday, it will probably > be Wednesday before I make any more progress. > > Andrew > > On Fri, Jan 27, 2023 at 10:40 AM Talat Uyarer < > tuya...@paloaltonetworks.com> wrote: > >> Hi Andrew, >> >> Yes This aligned also with my debugging. In My Kenn's reply you can see a >> sql test which I wrote in Calcite. Somehow Calcite does not have this issue >> with the 1.28 version. >> >> !use post >> !set outputformat mysql >> >> #Test aliases with with clause >> WITH tempTable(id, v) AS (select "hr"."emps"."empid" as id, >> "hr"."emps"."name" as v from "hr"."emps") >> SELECT tempTable.id as id, tempTable.v as "value" FROM tempTable WHERE >> tempTable.v <> '11' ; >> +-----+-----------+ >> | ID | value | >> +-----+-----------+ >> | 100 | Bill | >> | 110 | Theodore | >> | 150 | Sebastian | >> | 200 | Eric | >> +-----+-----------+ >> (4 rows) >> >> !ok >> >> >> On Wed, Jan 25, 2023 at 6:08 PM Andrew Pilloud <apill...@google.com> >> wrote: >> >>> Yes, that worked. >>> >>> The issue does not occur if I disable all of the following planner >>> rules: CoreRules.FILTER_CALC_MERGE, CoreRules.PROJECT_CALC_MERGE, >>> LogicalCalcMergeRule.INSTANCE (which wraps CoreRules.CALC_MERGE), >>> and BeamCalcMergeRule.INSTANCE (which wraps CoreRules.CALC_MERGE). >>> >>> All the rules share a common call to RexProgramBuilder.mergePrograms, so >>> I suspect the problem lies there. I spent some time looking but wasn't able >>> to find it by code inspection, it looks like this code path is doing the >>> right thing with names. I'll spend some time tomorrow trying to reproduce >>> this on pure Calcite. >>> >>> Andrew >>> >>> >>> On Tue, Jan 24, 2023 at 8:24 PM Talat Uyarer < >>> tuya...@paloaltonetworks.com> wrote: >>> >>>> Hi Andrew, >>>> >>>> Thanks for writing a test for this use case. Without Where clause it >>>> works as expected on our test cases also too. Please add where clause on >>>> second select. With the below query it does not return column names. I >>>> tested on my local also. >>>> >>>> WITH tempTable (id, v) AS (SELECT f_int as id, f_string as v FROM >>>> PCOLLECTION) SELECT id AS fout_int, v AS fout_string FROM tempTable WHERE >>>> id > 1 >>>> >>>> Thanks >>>> >>>> On Tue, Jan 24, 2023 at 5:28 PM Andrew Pilloud <apill...@google.com> >>>> wrote: >>>> >>>>> +dev@beam.apache.org <dev@beam.apache.org> >>>>> >>>>> I tried reproducing this but was not successful, the output schema was >>>>> as expected. I added the following to BeamSqlMultipleSchemasTest.java at >>>>> head. (I did discover that >>>>> PAssert.that(result).containsInAnyOrder(output) >>>>> doesn't validate column names however.) >>>>> >>>>> @Test >>>>> public void testSelectAs() { >>>>> PCollection<Row> input = pipeline.apply(create(row(1, "strstr"))); >>>>> >>>>> PCollection<Row> result = >>>>> input.apply(SqlTransform.query("WITH tempTable (id, v) AS >>>>> (SELECT f_int as id, f_string as v FROM PCOLLECTION) SELECT id AS >>>>> fout_int, >>>>> v AS fout_string FROM tempTable")); >>>>> >>>>> Schema output_schema = >>>>> >>>>> Schema.builder().addInt32Field("fout_int").addStringField("fout_string").build(); >>>>> assertThat(result.getSchema(), equalTo(output_schema)); >>>>> >>>>> Row output = Row.withSchema(output_schema).addValues(1, >>>>> "strstr").build(); >>>>> PAssert.that(result).containsInAnyOrder(output); >>>>> pipeline.run(); >>>>> } >>>>> >>>>> On Tue, Jan 24, 2023 at 8:13 AM Talat Uyarer < >>>>> tuya...@paloaltonetworks.com> wrote: >>>>> >>>>>> Hi Kenn, >>>>>> >>>>>> Thank you for replying back to my email. >>>>>> >>>>>> I was under the same impression about Calcite. But I wrote a test on >>>>>> Calcite 1.28 too. It is working without issue that I see on BEAM >>>>>> >>>>>> Here is my test case. If you want you can also run on Calcite. Please >>>>>> put under core/src/test/resources/sql as text file. and Run >>>>>> CoreQuidemTest >>>>>> class. >>>>>> >>>>>> !use post >>>>>> !set outputformat mysql >>>>>> >>>>>> #Test aliases with with clause >>>>>> WITH tempTable(id, v) AS (select "hr"."emps"."empid" as id, >>>>>> "hr"."emps"."name" as v from "hr"."emps") >>>>>> SELECT tempTable.id as id, tempTable.v as "value" FROM tempTable WHERE >>>>>> tempTable.v <> '11' ; >>>>>> +-----+-----------+ >>>>>> | ID | value | >>>>>> +-----+-----------+ >>>>>> | 100 | Bill | >>>>>> | 110 | Theodore | >>>>>> | 150 | Sebastian | >>>>>> | 200 | Eric | >>>>>> +-----+-----------+ >>>>>> (4 rows) >>>>>> >>>>>> !ok >>>>>> >>>>>> >>>>>> On Mon, Jan 23, 2023 at 10:16 AM Kenneth Knowles <k...@apache.org> >>>>>> wrote: >>>>>> >>>>>>> Looking at the code that turns a logical CalcRel into a BeamCalcRel >>>>>>> I do not see any obvious cause for this: >>>>>>> https://github.com/apache/beam/blob/b3aa2e89489898f8c760294ba4dba2310ac53e70/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamCalcRule.java#L69 >>>>>>> <https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_apache_beam_blob_b3aa2e89489898f8c760294ba4dba2310ac53e70_sdks_java_extensions_sql_src_main_java_org_apache_beam_sdk_extensions_sql_impl_rule_BeamCalcRule.java-23L69&d=DwMFaQ&c=V9IgWpI5PvzTw83UyHGVSoW3Uc1MFWe5J8PTfkrzVSo&r=BkW1L6EF7ergAVYDXCo-3Vwkpy6qjsWAz7_GD7pAR8g&m=KXc2qSceL6qFbFnQ_2qUOHr9mKuc6zYY8rJTNZC8p_wTcNs4M6mHQoCuoc4JfeaA&s=KjzplEf29oFB6uivvdjixpQiArWtfV-1SXpALL-ugEM&e=> >>>>>>> >>>>>>> I don't like to guess that upstream libraries have the bug, but in >>>>>>> this case I wonder if the alias is lost in the Calcite optimizer rule >>>>>>> for >>>>>>> merging the projects and filters into a Calc. >>>>>>> >>>>>>> Kenn >>>>>>> >>>>>>> On Mon, Jan 23, 2023 at 10:13 AM Kenneth Knowles <k...@apache.org> >>>>>>> wrote: >>>>>>> >>>>>>>> I am not sure I understand the question, but I do see an issue. >>>>>>>> >>>>>>>> Context: "CalcRel" is an optimized relational operation that is >>>>>>>> somewhat like ParDo, with a small snippet of a single-assignment DSL >>>>>>>> embedded in it. Calcite will choose to merge all the projects and >>>>>>>> filters >>>>>>>> into the node, and then generates Java bytecode to directly execute >>>>>>>> the DSL. >>>>>>>> >>>>>>>> Problem: it looks like the CalcRel has output columns with aliases >>>>>>>> "id" and "v" where it should have output columns with aliases "id" and >>>>>>>> "value". >>>>>>>> >>>>>>>> Kenn >>>>>>>> >>>>>>>> On Thu, Jan 19, 2023 at 6:01 PM Ahmet Altay <al...@google.com> >>>>>>>> wrote: >>>>>>>> >>>>>>>>> Adding: @Andrew Pilloud <apill...@google.com> @Kenneth Knowles >>>>>>>>> <k...@google.com> >>>>>>>>> >>>>>>>>> On Thu, Jan 12, 2023 at 12:31 PM Talat Uyarer via user < >>>>>>>>> u...@beam.apache.org> wrote: >>>>>>>>> >>>>>>>>>> Hi All, >>>>>>>>>> >>>>>>>>>> I am using Beam 2.43 with Calcite SQL with Java. >>>>>>>>>> >>>>>>>>>> I have a query with a WITH clause and some aliasing. Looks like >>>>>>>>>> Beam Query optimizer after optimizing my query, it drops Select >>>>>>>>>> statement's >>>>>>>>>> aliases. Can you help me to identify where the problem is ? >>>>>>>>>> >>>>>>>>>> This is my query >>>>>>>>>> INFO: SQL: >>>>>>>>>> WITH `tempTable` (`id`, `v`) AS (SELECT >>>>>>>>>> `PCOLLECTION`.`f_nestedRow`.`f_nestedInt` AS `id`, >>>>>>>>>> `PCOLLECTION`.`f_nestedRow`.`f_nestedString` AS `v` >>>>>>>>>> FROM `beam`.`PCOLLECTION` AS `PCOLLECTION`) (SELECT >>>>>>>>>> `tempTable`.`id` AS `id`, `tempTable`.`v` AS `value` >>>>>>>>>> FROM `tempTable` AS `tempTable` >>>>>>>>>> WHERE `tempTable`.`v` <> '11') >>>>>>>>>> >>>>>>>>>> This is Calcite Plan look at LogicalProject(id=[$0], value=[$1]) >>>>>>>>>> in SQL plan. >>>>>>>>>> >>>>>>>>>> Jan 12, 2023 12:19:08 PM >>>>>>>>>> org.apache.beam.sdk.extensions.sql.impl.CalciteQueryPlanner >>>>>>>>>> convertToBeamRel >>>>>>>>>> INFO: SQLPlan> >>>>>>>>>> LogicalProject(id=[$0], value=[$1]) >>>>>>>>>> LogicalFilter(condition=[<>($1, '11')]) >>>>>>>>>> LogicalProject(id=[$1.f_nestedInt], v=[$1.f_nestedString]) >>>>>>>>>> BeamIOSourceRel(table=[[beam, PCOLLECTION]]) >>>>>>>>>> >>>>>>>>>> But Beam Plan does not have a LogicalProject(id=[$0], value=[$1]) >>>>>>>>>> or similar. >>>>>>>>>> >>>>>>>>>> Jan 12, 2023 12:19:08 PM >>>>>>>>>> org.apache.beam.sdk.extensions.sql.impl.CalciteQueryPlanner >>>>>>>>>> convertToBeamRel >>>>>>>>>> INFO: BEAMPlan> >>>>>>>>>> BeamCalcRel(expr#0..1=[{inputs}], expr#2=[$t1.f_nestedInt], >>>>>>>>>> expr#3=[$t1.f_nestedString], expr#4=['11':VARCHAR], expr#5=[<>($t3, >>>>>>>>>> $t4)], >>>>>>>>>> id=[$t2], v=[$t3], $condition=[$t5]) >>>>>>>>>> BeamIOSourceRel(table=[[beam, PCOLLECTION]]) >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Thanks >>>>>>>>>> >>>>>>>>>