TableFunction is powerful: its input can be a table, so within the function
you can partition the table and apply a computation to each partition. The
only question is what output you expect. If you expect a per-partition result
(so the TableFunction returns a table), then you will likely need a
partition id column in the output table.
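
To make the idea concrete, here is a rough sketch in plain Python rather
than Calcite code. It only illustrates the shape of such a function: a table
goes in, rows are grouped by a partition key, a per-partition function runs
once per group, and every output row carries the partition id. All names
here (map_partitions_table_function, count_and_longest, the tweets data)
are made up for illustration and are not part of any Calcite or Spark API.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

Row = Tuple  # one input or output row, modeled as a plain tuple


def map_partitions_table_function(
    rows: Iterable[Row],
    partition_key: Callable[[Row], object],
    fn: Callable[[List[Row]], List[Row]],
) -> List[Row]:
    """Hypothetical table function: takes a table, returns a table."""
    # Step 1: partition the input table by the key.
    partitions: Dict[object, List[Row]] = defaultdict(list)
    for row in rows:
        partitions[partition_key(row)].append(row)

    # Step 2: run fn once per partition; it may change both the number of
    # rows and the number of columns (h -> h', w -> w').
    output: List[Row] = []
    for pid, part in partitions.items():
        for out_row in fn(part):
            # Step 3: prepend the partition id so results stay attributable.
            output.append((pid,) + tuple(out_row))
    return output


def count_and_longest(part: List[Row]) -> List[Row]:
    """Illustrative per-partition function: many rows in, one row out."""
    longest = max(part, key=lambda r: len(r[1]))
    return [(len(part), longest[1])]


# Made-up input table with columns (id, tweet).
tweets = [(1, "good"), (2, "bad day"), (1, "great news"), (2, "meh")]
result = map_partitions_table_function(tweets, lambda r: r[0], count_and_longest)
print(result)  # [(1, 2, 'great news'), (2, 2, 'bad day')]
```

Here each partition collapses to a single row, so the output has fewer rows
than the input, and the leading column is the partition id mentioned above.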


-Rui

On Thu, Mar 4, 2021 at 12:40 PM Debajyoti Roy <[email protected]> wrote:

> Yes there is definitely some similarity to groupby
>
> What is this used for:
>
> https://calcite.apache.org/javadocAggregate/org/apache/calcite/rel/logical/LogicalTableFunctionScan.html
> Can I model a mapPartitions T -> U as ^ ?
>
> On Thu, Mar 4, 2021 at 12:06 PM Rui Wang <[email protected]> wrote:
>
> > I feel like the mapPartitions can be implemented as a SELECT + GROUP BY,
> > where GROUP BY is to partition the data, then per partition computation
> is
> > handled by the SELECT.
> >
> > -Rui
> >
> > On Thu, Mar 4, 2021 at 11:56 AM Debajyoti Roy <[email protected]>
> wrote:
> >
> > > Thanks again Julian.
> > >
> > > Since mapPartitions is really a specialized map, would it be best to
> > model
> > > it as a SELECT (similar to functions inside an expression) ?
> > > Barring cases where h > h' and mapPartitions acts like a filter.
> > >
> > > On Thu, Mar 4, 2021 at 11:41 AM Julian Hyde <[email protected]>
> > > wrote:
> > >
> > > > SQL has equivalents of many functional programming idioms:
> > > >
> > > >   * map is SELECT
> > > >   * filter is WHERE
> > > >   * flatMap is similar to CROSS APPLY
> > > >
> > > > That said, SQL’s strength is that the operations are not optimized
> for
> > > any
> > > > particular physical organization of data (e.g. working on sorted or
> > > > partitioned data). mapPartitions is in this category. Of course a
> > > physical
> > > > implementation of one of SQL’s logical operators might use
> > mapPartitions.
> > > >
> > > > Julian
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > > On Mar 4, 2021, at 10:44 AM, Debajyoti Roy <[email protected]>
> > > wrote:
> > > > >
> > > > > Thanks for the responses, adding some more color below.
> > > > >
> > > > > Spark's API adopted concepts from the functional programming
> paradigm
> > > > (map,
> > > > > filter, flatmap,...) into data processing. Spark did add several
> > > > relational
> > > > > operators like join, union, select, etc. However, there are certain
> > > APIs
> > > > > that are really hard to model in terms of standard relational
> > > operators.
> > > > > Let me take one example of mapPartitions.
> > > > >
> > > > > mapPartitions( T -> U ):
> > > > > w columns and h rows can turn into totally different w' != w
> columns
> > > and
> > > > h'
> > > > > != h rows. Since processing happens per partition, this API is a
> > great
> > > > > choice for vectorized heavyweight initialization cost operations
> e.g.
> > > > batch
> > > > > inferencing.
> > > > >
> > > > > In terms of relational models, mapPartitions can be modeled just
> > like a
> > > > > function inside an expression operator. However, there can be
> > > interesting
> > > > > > cases e.g. h > h' and mapPartitions starts to feel like a filter.
> Can
> > > > there
> > > > > be other challenges and opportunities in terms of planner and
> > optimizer
> > > > > because mapPartitions is definitely NOT like any other function
> > inside
> > > an
> > > > > expression as shown below:
> > > > >
> > > > > SELECT name, address, mapPartitions(id, tweet, '{threshold: 0.5}',
> > > > > 'sentiment_analysis_4', 10000) FROM my_twitter_data...
> > > > >
> > > > > So what is a better usage example for mapPartitions expressed as
> SQL
> > ?
> > > I
> > > > am
> > > > > really struggling with that part and I agree with Julian that is
> the
> > > key.
> > > > >
> > > > > Regards,
> > > > > Debajyoti Roy
> > > > >
> > > > > On Thu, Mar 4, 2021 at 12:01 AM Julian Hyde <
> [email protected]>
> > > > wrote:
> > > > >
> > > > >> I searched for mapPartitions and flatMapGroupsWithState, and it
> > looks
> > > as
> > > > >> if you are talking about Apache Spark operations. Can you give
> some
> > > > >> examples of typical queries that would use these operations?
> > > > >>
> > > > >> It’s possible that these operations accomplish things that are not
> > > > >> possible in the relational model; but I think it’s more likely
> that
> > > > these
> > > > >> are algorithms that can implement relational operations such as
> > > windowed
> > > > >> aggregate functions. If you give some examples, we can see whether
> > > they
> > > > can
> > > > >> be expressed in SQL or relational algebra.
> > > > >>
> > > > >> Julian
> > > > >>
> > > > >>
> > > > >>> On Mar 3, 2021, at 10:54 PM, Rui Wang <[email protected]>
> > wrote:
> > > > >>>
> > > > >>> Well I think the expected approach is to translate other
> operations
> > > to
> > > > >>> relational operators by yourself ;-)
> > > > >>>
> > > > >>> I think Calcite won't be the place to have extensions for such
> > > > >> translation.
> > > > >>> The main concern is that those non relational operations are
> > > > >> "non-standard".
> > > > >>>
> > > > >>> -Rui
> > > > >>>
> > > > >>> On Wed, Mar 3, 2021 at 10:12 PM Debajyoti Roy <
> [email protected]
> > >
> > > > >> wrote:
> > > > >>>
> > > > >>>> Hi All,
> > > > >>>>
> > > > >>>> For operators like filter, join, union, aggregate, window the
> > > > >>>> logical RelNode choices are obvious within
> > > > >> org.apache.calcite.rel.logical
> > > > >>>> package.
> > > > >>>>
> > > > >>>> However, for more complex applications that require operations
> > like
> > > > >>>> mapPartitions, flatMapGroupsWithState, etc. what would be some
> > > choices
> > > > >> for
> > > > >>>> logical rel node and relational expression classes in Apache
> > Calcite
> > > > >>>> (independent of engine)?
> > > > >>>>
> > > > >>>> What is a good way to model operators that are not traditionally
> > > > >> relational
> > > > >>>> and hence do not exist in Calcite (looking for hooks / extension
> > > > points
> > > > >> /
> > > > >>>> roadmaps)?
> > > > >>>>
> > > > >>>> Thanks in advance for any pointers, (disclaimer: I am new to
> > > Calcite)
> > > > >>>> Debajyoti Roy
> > > > >>>>
> > > > >>
> > > > >>
> > > >
> > > >
> > >
> >
>
