AnishMahto commented on code in PR #50875:
URL: https://github.com/apache/spark/pull/50875#discussion_r2096465030


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala:
##########
@@ -2105,6 +2105,12 @@ class Analyzer(override val catalogManager: CatalogManager) extends RuleExecutor
                   catalog, "table-valued functions")
               }
             }
+            if (u.isStreaming && !resolvedFunc.isStreaming) {

Review Comment:
   These are great questions.
   
   First of all, the intention of the STREAM keyword is to convert a batch relation into a streaming relation; the primary use case is populating downstream streaming tables in pipelines from a batch source. If a relation is _already_ streaming, the STREAM keyword should just be a no-op.
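   
   As a rough Scala sketch of that intent (the helper below is hypothetical, not code from this PR):
   
   ```scala
   import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
   
   // Hypothetical helper, only to illustrate the intended STREAM semantics:
   // a batch relation gets rewritten into a streaming read, while a relation
   // that is already streaming is left untouched (no-op).
   def applyStreamKeyword(
       rel: LogicalPlan,
       toStreaming: LogicalPlan => LogicalPlan): LogicalPlan = {
     if (rel.isStreaming) rel else toStreaming(rel)
   }
   ```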
   
   Second, I _think_ the TVF is responsible for defining whether it's streaming or not on its own. The `TableFunctionRegistry` trait is, at least, a registry of function names -> `LogicalPlan`, and the `LogicalPlan` backing the TVF can define whether it's streaming or not (e.g. `Range` is [not streaming](https://github.com/apache/spark/blob/6ad0bd448b6e9d65cdc3bea9145e8c0bea8a4454/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala#L998)). And it looks like resolving a TVF simply replaces it with the corresponding `LogicalPlan` mapped to that TVF in the registry, which then gets analyzed.
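   
   For example, a leaf plan backing a TVF could declare itself streaming the same way `Range` declares itself batch; the class below is made up purely for illustration:
   
   ```scala
   import org.apache.spark.sql.catalyst.expressions.Attribute
   import org.apache.spark.sql.catalyst.plans.logical.LeafNode
   
   // Hypothetical TVF-backing plan (not part of Spark): the plan itself owns
   // the streaming flag, analogous to Range hard-coding `isStreaming = false`.
   case class MyStreamingSource(output: Seq[Attribute]) extends LeafNode {
     override def isStreaming: Boolean = true
   }
   ```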
   
   With my changes as-is, I'm not converting the `LogicalPlan` that backs a TVF (per the function registry) to be streaming; I'm only validating that when STREAM is applied to a TVF, the resolved `LogicalPlan` is indeed streaming. I'm not sure any default/builtin functions in the registry are backed by a streaming logical plan anyway.
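   
   In other words, the new check boils down to something like this (sketch only; the exception is a placeholder, the actual PR would go through Spark's error framework):
   
   ```scala
   // `u` is the unresolved TVF relation carrying the STREAM flag, and
   // `resolvedFunc` is the LogicalPlan it resolved to via the function registry.
   if (u.isStreaming && !resolvedFunc.isStreaming) {
     // Placeholder failure; real code would raise a proper AnalysisException error class.
     throw new UnsupportedOperationException(
       "STREAM can only be applied to table-valued functions that resolve to a streaming plan")
   }
   ```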
   
   Thinking about it more, I don't believe the pipelines module will need the ability to convert TVFs to streaming in the first place. In theory, users can always define a table that is populated by a batch TVF, and then STREAM from that table.
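   
   Expressed with plain structured-streaming APIs rather than pipeline DDL (table name hypothetical, assuming a `spark` session and a table format that supports streaming reads):
   
   ```scala
   // Illustrative workaround: materialize the batch TVF into a table once...
   spark.range(0, 1000000).write.saveAsTable("ids")
   // ...and stream from that table downstream, instead of streaming the TVF itself.
   val idsStream = spark.readStream.table("ids")
   ```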
   
   Hence I decided to just remove `STREAM <tvf>` support for now; we can revisit adding it back later if a concrete use case comes up.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

