Re: Confused by the `projectFields` Method in `ProjectableTableSource` Interface

Caizhi Weng Fri, 12 Jul 2019 20:34:08 -0700

Hi Rong,

Thanks for your explanation. What I'm wondering, while implementing this
interface, is whether `projectFields` can be called twice in a row
(something like `source.projectFields().projectFields()`). For example, a
user might write "select c, d from (select a, c, d from my_table)". In this
case we would have to consider whether the indices are relative to the
produced type or to the table schema.
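Here is a minimal sketch of the two possible readings. It is plain Java and does not use Flink's API. The class `ProjectionSketch`, its nested `Source`, and the methods `projectBySchemaIndex` and `projectByProducedIndex` are hypothetical names used only for illustration. The sketch models the nested query above as two chained projections. Under the schema-relative reading, the second call receives {2, 3}. Under the produced-type-relative reading, it receives {1, 2}, and the source must compose those indices with the first projection. The sketch does not settle which convention Flink's planner actually uses.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical model of a projectable source (NOT Flink's API).
 * producedFields holds, for each produced column, its index in the table schema.
 */
public class ProjectionSketch {

    static final class Source {
        final List<String> schema;   // full table schema, e.g. [a, b, c, d]
        final int[] producedFields;  // produced column i = schema.get(producedFields[i])

        Source(List<String> schema, int[] producedFields) {
            this.schema = schema;
            this.producedFields = producedFields;
        }

        /** A source whose produced type equals the full table schema. */
        static Source full(List<String> schema) {
            int[] all = new int[schema.size()];
            for (int i = 0; i < all.length; i++) {
                all[i] = i;
            }
            return new Source(schema, all);
        }

        /** Reading A: indices refer to the table schema (as reported for CSVTableSource). */
        Source projectBySchemaIndex(int[] fields) {
            return new Source(schema, fields.clone());
        }

        /** Reading B: indices refer to the current produced type (the Javadoc's wording), so they are composed. */
        Source projectByProducedIndex(int[] fields) {
            int[] composed = new int[fields.length];
            for (int i = 0; i < fields.length; i++) {
                composed[i] = producedFields[fields[i]];
            }
            return new Source(schema, composed);
        }

        /** Names of the produced columns, in produced order. */
        List<String> producedNames() {
            List<String> names = new ArrayList<>();
            for (int idx : producedFields) {
                names.add(schema.get(idx));
            }
            return names;
        }
    }

    public static void main(String[] args) {
        Source base = Source.full(Arrays.asList("a", "b", "c", "d"));

        // "select c, d from (select a, c, d from my_table)" as two chained projections.
        Source viaSchema = base.projectBySchemaIndex(new int[] {0, 2, 3})
                               .projectBySchemaIndex(new int[] {2, 3});
        Source viaProduced = base.projectByProducedIndex(new int[] {0, 2, 3})
                                 .projectByProducedIndex(new int[] {1, 2});

        System.out.println(viaSchema.producedNames());   // [c, d]
        System.out.println(viaProduced.producedNames()); // [c, d]
    }
}
```

Both chains end up producing [c, d], but they pass different arguments on the second call. Suppose a source used the produced-type reading while the caller passed schema indices {2, 3}. In this model, the second call would then read `producedFields[3]` on a three-column projection and throw an `ArrayIndexOutOfBoundsException`. That is why the convention matters once projections can be chained.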


Thanks.

Rong Rong <walter...@gmail.com> wrote on Sat, Jul 13, 2019 at 1:23 AM:

> Hi Caizhi,
>
> from my understanding, the "ProjectableTableSource" interface is used for
> projection push-down scenarios (similar in spirit to predicate push-down),
> where the output your source produces should contain only the fields that
> your SELECT statement requires.
>
> For example, in the case of:
> SourceSchema: {a: Int, b: Double, c: String, d: Long}
> SQL: "select c, d from my_table"
>
> If your source implements ProjectableTableSource, Flink will invoke the
> projectFields method to create another TableSource that does not return
> the full schema, but only the selected fields. (This is particularly
> useful for columnar storage formats like Parquet, where only a subset of
> the columns gets read into Flink.)
>
> So, in short, when Flink invokes your overridden "projectFields" on your
> table source, it would not pass in {1, 2} as the argument, but {2, 3},
> i.e. fields "c" and "d".
>
> --------
> This also brings up an interesting question: in your example, you mentioned
> that the table source already has a produced type of {a: Int, c: String,
> d: Long}. I am assuming you are asking this while considering something
> like two SQL queries acting on the same source: "SELECT a, c ..." and
> "SELECT c, d ...".
>
> I am not 100% sure, since it has been some time since I looked at the code,
> but my understanding is:
> projectFields will be invoked twice and generate two new instances of the
> table source, both with the same table schema, but one with produced type
> {a: Int, c: String} and one with {c: String, d: Long}. So there will not be
> a table source with {a: Int, c: String, d: Long}.
>
> Thanks,
> Rong
>
>
>
> On Thu, Jul 11, 2019 at 9:53 PM Caizhi Weng <tsreape...@gmail.com> wrote:
>
> > Hi Flink developers,
> >
> > When implementing `JDBCTableSource` with the `ProjectableTableSource`
> > interface, I'm confused by the `projectFields` method.
> >
> > The Javadoc of `projectFields` states the following (it also has a typo:
> > poduced -> produced):
> >
> > > Creates a copy of the {@link TableSource} that projects its output to the
> > > given field indexes.
> > > The field indexes relate to the physical poduced data type ({@link
> > > TableSource#getProducedDataType()}) and not to the table schema ({@link
> > > TableSource#getTableSchema} of the {@link TableSource}.
> >
> >
> > So my understanding of this Javadoc is that if the table schema of the
> > source is {a: Int, b: Double, c: String, d: Long} and the produced data
> > type of the source is {a: Int, c: String, d: Long}, then when a user writes
> > "select c, d from my_table", the projected field indices should be {1, 2}
> > instead of {2, 3} (because they should relate to the produced type and not
> > to the schema).
> >
> > But the implementation of `CSVTableSource` says otherwise. The field
> > indices are related to the schema, not to the produced data type.
> >
> > I followed the implementation of `CSVTableSource` to implement the JDBC
> > table source (as `CSVTableSource` obviously passes all the test cases). So
> > which one is correct: my understanding of the Javadoc, or the
> > implementation of `CSVTableSource`?
> >
> > Thanks.
> >
>
