twalthr commented on code in PR #25707:
URL: https://github.com/apache/flink/pull/25707#discussion_r1862101386
##########
flink-table/flink-table-common/src/main/java/org/apache/flink/table/functions/ProcessTableFunction.java:
##########
@@ -0,0 +1,239 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.table.functions;
+
+import org.apache.flink.annotation.PublicEvolving;
+import org.apache.flink.table.annotation.ArgumentHint;
+import org.apache.flink.table.annotation.DataTypeHint;
+import org.apache.flink.table.annotation.FunctionHint;
+import org.apache.flink.table.catalog.DataTypeFactory;
+import org.apache.flink.table.types.inference.TypeInference;
+import org.apache.flink.util.Collector;
+
+/**
+ * Base class for a user-defined process table function. A process table function (PTF) maps zero,
+ * one, or multiple tables to zero, one, or multiple rows (or structured types). Scalar arguments
+ * are also supported. If the output record consists of only one field, the wrapper can be omitted,
+ * and a scalar value can be emitted that will be implicitly wrapped into a row by the runtime.
+ *
+ * <p>PTFs are the most powerful function kind for Flink SQL and Table API. They enable implementing
+ * user-defined operators that can be as feature-rich as built-in operations. PTFs have access to
+ * Flink's managed state, event-time and timer services, underlying table changelogs, and can take
+ * multiple ordered and/or partitioned tables to produce a new table.
+ *
+ * <h1>Table Semantics and Virtual Processors</h1>
+ *
+ * <p>PTFs can produce a new table by consuming tables as arguments. For scalability, input tables
+ * are distributed into virtual processors. Each virtual processor executes a PTF instance and has
+ * access only to a share of the entire table. The argument declaration decides about the size of
+ * the share and co-location of data. Conceptually, tables can be processed either "as row" (i.e.
+ * with row semantics) or "as set" (i.e. with set semantics).
+ *
+ * <h2>Table Argument with Row Semantics</h2>
+ *
+ * <p>A PTF that takes a table with row semantics assumes that there is no correlation between rows
+ * and each row can be processed independently. The framework is free in how to distribute rows
+ * among virtual processors and each virtual processor has access only to the currently processed
+ * row.
+ *
+ * <h2>Table Argument with Set Semantics</h2>
+ *
+ * <p>A PTF that takes a table with set semantics assumes that there is a correlation between rows.
+ * When calling the function, the PARTITION BY clause defines the columns for correlation. The
+ * framework ensures that all rows belonging to same set are co-located. A PTF instance is able to
+ * access all rows belonging to the same set. In other words: The virtual processor is scoped under
+ * a key context.
+ *
+ * <h1>Basic Implementation</h1>
+ *
+ * <p>The behavior of a {@link ProcessTableFunction} can be defined by implementing a custom
+ * evaluation method. The evaluation method must be declared publicly, not static, and named <code>
+ * eval</code>. Overloading is not supported.

Review Comment:
Overloading does not go well with named parameters.
In the end, we want to support a syntax like:

```
SELECT * FROM f(name => 's', threshold => 12)
```

Supporting an argument whose type depends on the overload (e.g. `name` being `STRING` in one `eval` and `INT` in another) makes the implementation highly complex. We support optional arguments instead, which should cover most use cases, e.g. `f(numbers => ARRAY[1, 2, 3])` or `f(strings => ARRAY['1', '2', '3'])`.
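To make the ambiguity concrete, here is a toy Java model that resolves a named-argument call against candidate `eval` signatures. Everything in it is hypothetical: the class `NamedArgResolution`, its `Param` type, and the matching rules are made up for this sketch and are not Flink's type inference. Assuming an untyped `NULL` literal may bind to any type and that there are no implicit casts, `f(name => NULL, threshold => 12)` matches both overloads, while a single signature with optional `numbers`/`strings` arguments resolves to exactly one candidate.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model (hypothetical, NOT Flink's planner) of resolving a call with named
// arguments against one or more eval() signatures.
public class NamedArgResolution {

    // A declared parameter: name, SQL-like type name, and whether it may be omitted.
    static final class Param {
        final String name;
        final String type;
        final boolean optional;

        Param(String name, String type, boolean optional) {
            this.name = name;
            this.type = type;
            this.optional = optional;
        }
    }

    // True if the call fits the signature. A null argument type models an untyped NULL literal.
    static boolean matches(List<Param> signature, Map<String, String> call) {
        for (Map.Entry<String, String> arg : call.entrySet()) {
            Param p = signature.stream()
                    .filter(candidate -> candidate.name.equals(arg.getKey()))
                    .findFirst()
                    .orElse(null);
            if (p == null) {
                return false; // unknown argument name
            }
            if (arg.getValue() != null && !arg.getValue().equals(p.type)) {
                return false; // type mismatch (this toy model has no implicit casts)
            }
        }
        for (Param p : signature) {
            if (!p.optional && !call.containsKey(p.name)) {
                return false; // a required argument is missing
            }
        }
        return true;
    }

    // Number of signatures accepting the call; anything other than 1 is a resolution error.
    static long countMatches(List<List<Param>> signatures, Map<String, String> call) {
        return signatures.stream().filter(s -> matches(s, call)).count();
    }

    // Builds a named call from alternating (argument name, argument type) pairs.
    static Map<String, String> call(String... nameTypePairs) {
        Map<String, String> args = new LinkedHashMap<>();
        for (int i = 0; i < nameTypePairs.length; i += 2) {
            args.put(nameTypePairs[i], nameTypePairs[i + 1]);
        }
        return args;
    }

    // Overloading: `name` is STRING in one eval() and INT in the other.
    static List<List<Param>> overloadedEvals() {
        return Arrays.asList(
                Arrays.asList(new Param("name", "STRING", false), new Param("threshold", "INT", false)),
                Arrays.asList(new Param("name", "INT", false), new Param("threshold", "INT", false)));
    }

    // A single eval() with optional arguments instead of overloads.
    static List<List<Param>> optionalArgEval() {
        return Arrays.asList(
                Arrays.asList(new Param("numbers", "ARRAY<INT>", true), new Param("strings", "ARRAY<STRING>", true)));
    }

    public static void main(String[] args) {
        // f(name => NULL, threshold => 12): the untyped NULL fits both overloads.
        System.out.println(countMatches(overloadedEvals(), call("name", null, "threshold", "INT"))); // prints 2
        // f(numbers => ARRAY[1, 2, 3]): exactly one signature applies.
        System.out.println(countMatches(optionalArgEval(), call("numbers", "ARRAY<INT>"))); // prints 1
    }
}
```

With a single `eval`, each argument name maps to exactly one declared type, so resolution is a per-name lookup; with overloads, every named call has to be checked against every candidate, and corner cases like `NULL` or implicit casts can leave more than one match.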