Hi Mark,

I added a comment to the Jira ticket to clarify its description:

When encountering mixed schema rows, the current error message "{actual} is
not a valid external type for schema of {expected}" lacks sufficient detail
to identify the problematic column. This ambiguity hinders troubleshooting
and increases development time.

To enhance error clarity, we propose incorporating the source column name
into the error message. For example: "Column 'my_column' has an actual type
of {actual} which is not a valid external type for the expected schema of
{expected}."

By providing this additional context, developers can more efficiently
pinpoint and resolve schema mismatches.
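
To illustrate the proposed wording, here is a minimal standalone sketch in
plain Python. It is not Spark's ValidateExternalType code; the helper name
validate_row and the toy schema (f1, f2, f3 mapped to Python types) are
assumptions made purely for illustration. It only shows how including the
column name makes the failing field obvious.

```python
# Hypothetical sketch: NOT Spark's implementation, just the message format.
from typing import Any, Sequence, Tuple, Type


def validate_row(row: Sequence[Any], schema: Sequence[Tuple[str, Type]]) -> None:
    """Raise TypeError naming the first column whose value has the wrong type."""
    for value, (column, expected) in zip(row, schema):
        # None is treated as a permitted null value in this toy example.
        if value is not None and not isinstance(value, expected):
            raise TypeError(
                f"Column '{column}' has an actual type of {type(value).__name__} "
                f"which is not a valid external type for the expected schema of "
                f"{expected.__name__}."
            )


# Toy schema: column name paired with the expected Python type (assumed names).
schema = [("f1", str), ("f2", int), ("f3", str)]

validate_row(("a", 1, "ok"), schema)  # matching row: no error raised

message = ""
try:
    # Third field is bytes where a string is expected.
    validate_row(("a", 1, b"raw"), schema)
except TypeError as err:
    message = str(err)
    print(message)
```

With a message of this shape, the reader sees immediately that f3 is the
mismatched field, without having to inspect each column of the row.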


HTH

Mich Talebzadeh,

Architect | Data Engineer | Data Science | Financial Crime
PhD <https://en.wikipedia.org/wiki/Doctor_of_Philosophy> Imperial College
London <https://en.wikipedia.org/wiki/Imperial_College_London>
London, United Kingdom


View my LinkedIn profile
<https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>


 https://en.everybodywiki.com/Mich_Talebzadeh



*Disclaimer:* The information provided is correct to the best of my
knowledge but of course cannot be guaranteed. It is essential to note
that, as with any advice, "one test result is worth one-thousand
expert opinions" (Wernher von Braun
<https://en.wikipedia.org/wiki/Wernher_von_Braun>).


On Tue, 20 Aug 2024 at 21:59, Mark Andreev <mark.andr...@gmail.com> wrote:

> Hi,
>
> Could you review my small PR [SPARK-49044][SQL] ValidateExternalType
> should return a child in error (
> https://github.com/apache/spark/pull/47522 )?  Changes contain tests that
> verify results.
>
> TL;DR: After the fix, the error message will contain extra information:
> [B is not a valid external type for schema of string at
> getexternalrowfield(assertnotnull(input[0, org.apache.spark.sql.Row,
> true]), 1, f3)
>
> If you need more information, please let me know. If you're busy, please
> let me know the best time to reach you again.
>
> On Mon, 29 Jul 2024 at 18:15, Mark Andreev <mark.andr...@gmail.com> wrote:
>
>> Hi Spark Devs,
>>
>> Please review my PR [ https://github.com/apache/spark/pull/47522 ] that
>> relates to ticket [ https://issues.apache.org/jira/browse/SPARK-49044 ].
>>
>> Context: When we have mixed schema rows, the error message "{actual} is
>> not a valid external type for schema of {expected}" doesn't help to
>> understand the column with the problem. I suggest adding information about
>> the source column.
>>
>> Example:
>> https://github.com/mrk-andreev/example-spark-schema/blob/main/spark_4.0.0/src/test/scala/ErrorMsgSuite.scala
>>
>> Before fix: [B is not a valid external type for schema of string
>> After fix: [B is not a valid external type for schema of string at
>> getexternalrowfield(assertnotnull(input[0, org.apache.spark.sql.Row,
>> true]), 1, f3)
>>
>> --
>> Best regards,
>> Mark Andreev
>>
>
>
> --
> Best regards,
> Mark Andreev
>
