autophagy opened a new pull request, #26414:
URL: https://github.com/apache/flink/pull/26414

   ## What is the purpose of the change
   
   When creating a table using `TableEnvironment.from_elements`, the Table API 
skips type validation on any Row elements that were created using positional 
arguments, rather than keyword arguments. 
   
   For example, take a table with a single column, whose type is an array of 
Rows. These rows have 2 columns, `a VARCHAR` and `b BOOLEAN`. If we create a 
table with elements where one of these rows has columns with incorrect 
datatypes:
   
   ```python
   schema = DataTypes.ROW(
       [
           DataTypes.FIELD(
               "col",
               DataTypes.ARRAY(
                   DataTypes.ROW(
                       [
                           DataTypes.FIELD("a", DataTypes.STRING()),
                           DataTypes.FIELD("b", DataTypes.BOOLEAN()),
                       ]
                   )
               ),
           ),
       ]
   )
   # The third row swaps the types: `a` gets a bool and `b` gets a str.
   elements = [(
       [("pyflink", True), ("pyflink", False), (True, "pyflink")],
   )]
   table = self.t_env.from_elements(elements, schema)
   table_result = list(table.execute().collect())
   ```
   
   This results in a type validation error:
   
   ```
   TypeError: field a in element in array field col: VARCHAR can not accept object True in type <class 'bool'>
   ```
   
   In an example where we use `Row` instead of tuples, with keyword arguments:
   
   ```python
   elements = [(
       [Row(a="pyflink", b=True), Row(a="pyflink", b=False), Row(a=True, b="pyflink")],
   )]
   ```
   
   We also get the same type validation error. However, when we use Row with 
positional arguments:
   
   ```python
   elements = [(
       [Row("pyflink", True), Row("pyflink", False), Row(True, "pyflink")],
   )]
   ```
   
   the type validation is skipped, leading to an unpickling error when 
collecting:
   
   ```
   >           data = pickle.loads(data)
   E           EOFError: Ran out of input 
   ```
   
   The type validator skips this by stating that [the order in the row could be 
different to the order of the datatype 
fields](https://github.com/apache/flink/blob/master/flink-python/pyflink/table/types.py#L2156),
 but I don't think this is true. Rows made from tuples and from lists are both 
type-verified positionally against the positions of the DataType fields, and in 
the case of the `Row` class the order of the row's internal values is preserved. 
Similarly, `Row` class equality compares values by position in cases where both 
of the rows are created with positional arguments.
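
The ordering argument above can be illustrated with a small stand-in. This is only a sketch: `SketchRow` is a hypothetical class, not PyFlink's `Row`, and it assumes nothing more than that positional values are stored in the order they were passed.

```python
class SketchRow:
    """Hypothetical stand-in for a row built from positional arguments."""

    def __init__(self, *values):
        # Values are stored in the order they were passed in.
        self._values = list(values)

    def __iter__(self):
        # Iteration yields the values in construction order.
        return iter(self._values)

    def __eq__(self, other):
        # Equality compares the stored values position by position.
        return isinstance(other, SketchRow) and self._values == other._values


row = SketchRow("pyflink", True)
swapped = SketchRow(True, "pyflink")

values_in_order = list(row)
rows_equal = row == SketchRow("pyflink", True)
swapped_equal = row == swapped
```

If the real class behaves like this stand-in, the positional order is well defined and can be checked against the declared field order in the same way as a tuple or list.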
   
   
   ## Brief change log
   
     - *Change the type validation logic used by 
`TableEnvironment.from_elements` so that `Row`s constructed with positional 
arguments are not skipped.*
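
As a rough illustration of what positional verification means here, the sketch below checks each value against the expected type at the same index. It is not the code changed in this pull request: `verify_row`, `FIELDS` and the use of plain Python types in place of Flink data types are all assumptions made for the example.

```python
def verify_row(values, fields):
    """Check each value against the expected Python type at the same position.

    `fields` is a list of (name, expected_type) pairs. Both the function and
    this field representation are illustrative, not PyFlink internals.
    """
    values = list(values)
    if len(values) != len(fields):
        raise ValueError(f"expected {len(fields)} values, got {len(values)}")
    for value, (name, expected_type) in zip(values, fields):
        # Exact type match, so that a bool is not accepted where a str is expected.
        if type(value) is not expected_type:
            raise TypeError(
                f"field {name}: {expected_type.__name__} can not accept "
                f"object {value!r} in type {type(value)}"
            )


FIELDS = [("a", str), ("b", bool)]

verify_row(("pyflink", True), FIELDS)   # a tuple with matching types passes
verify_row(["pyflink", False], FIELDS)  # a list with matching types passes

try:
    # Swapped values fail at the first field, regardless of the container type.
    verify_row((True, "pyflink"), FIELDS)
    error_message = None
except TypeError as exc:
    error_message = str(exc)
```

Any iterable that yields its values in construction order can go through the same check, which is the behaviour the change aims for with positionally constructed `Row`s.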
   
   ## Verifying this change
   
   This change added tests and can be verified as follows:
   
     - *Added a test to ensure consistent type validation behaviour with rows 
constructed from tuples, lists, `Row`s with keyword arguments and `Row`s with 
positional arguments*
   ## Does this pull request potentially affect one of the following parts:
   
     - Dependencies (does it add or upgrade a dependency): (no)
     - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (no)
     - The serializers: (no)
     - The runtime per-record code paths (performance sensitive): (no)
     - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (no)
     - The S3 file system connector: (no)
   
   ## Documentation
   
     - Does this pull request introduce a new feature? (no)
     - If yes, how is the feature documented? (not applicable)
   

