Hi Rong, Thanks for your explanation. What I'm wondering when implementing this interface is that, will `projectFields` be called twice in a row? (something like `source.projectFields().projectFields()`) For example if user writes "select c, d from (select a, c, d from my_table)". In this case we might have to consider whether the indices are related to the produced type or to the table schema.
Thanks. Rong Rong <walter...@gmail.com> 于2019年7月13日周六 上午1:23写道: > Hi Caizhi, > > from my understanding, the "ProjectableTableSource" interface is used for > something like predicator push-down scenarios: > where your produced output should be the same as how your SELECT statement > requires. > > For example, in the case of: > SourceSchema: {a: Int, b: Double, c: String, d: Long} > SQL: "select c, d from my_table" > > If implemented ProjectableTableSource, Flink will invoke the projectFields > method to create another TableSource that does not > return the full schema, but only the SelectedFields. (this is particularly > useful for columnar store formats like Parquet where only a subset of files > gets read into Flink). > > So, in short, when Flink invokes the override "projectFields" for your > table source. it wouldn't be passing in the argument as {1,2}, but will be > {2,3} - e.g. field "c" and "d". > > -------- > This also brings to an interesting question: in your example, you mentioned > that the table source is already having a produce type as {a: Int, c: > String, d: Long}. I am assuming you are asking this when considering > something similar to: if two SQLs are acting on the source, "SELECT a, c > ..." and "SELECT c, d ..." > > I am not 100% sure since it has been sometime since I look at the code, but > my understanding is: > the projectFields will be invoked twice. and generate 2 new instances of > the table source, with the same table schema > but one with produced type {a: Int, c: String} and one with {c: String, d: > Long}. So, there will not be a table source with {a: Int, c: String, d: > Long}. > > Thanks, > Rong > > > > On Thu, Jul 11, 2019 at 9:53 PM Caizhi Weng <tsreape...@gmail.com> wrote: > > > Hi Flink developers, > > > > When implementing `JDBCTableSource` with `ProjectableTableSource` > interface > > I'm confused by the `projectFields` method. > > > > The java doc of the `projectFields` states that (It also has a typo... > > poduced -> produced): > > > > > Creates a copy of the {@link TableSource} that projects its output to > the > > > given field indexes. > > > The field indexes relate to the physical poduced data type ({@link > > > TableSource#getProducedDataType()}) and not to the table schema ({@link > > > TableSource#getTableSchema} of the {@link TableSource}. > > > > > > So my understanding of this java doc is that, if the table schema of the > > source is {a: Int, b: Double, c: String, d: Long} and the produced data > > type of the source is {a: Int, c: String, d: Long}. Then if user writes > > "select > > c, d from my_table" then the project field indices should be {1, 2} > instead > > of {2, 3} (because they should be related to the produced type and not to > > the schema). > > > > But the implementation of `CSVTableSource` says otherwise. The field > > indices are related to the schema, not to the produced data type. > > > > I pick the implementation of `CSVTableSource` to implement JDBC table > > source (as `CSVTableSource` obviously passed all the test cases). So > which > > one is correct, my understanding on the java doc or the implementation of > > `CSVTableSource`? > > > > Thanks. > > >