Hi Michael,

I would really appreciate it if you could review my PR [ https://github.com/apache/spark/pull/47522 ], as your expertise in the SQL part of Apache Spark is invaluable. Would you mind taking a look at my changes?
On Sun, 25 Aug 2024 at 18:15, Mark Andreev <mark.andr...@gmail.com> wrote:

> Thank you, Bjørn.
>
> My PR [ https://github.com/apache/spark/pull/47522 ] was updated to
> align with the guideline.
>
> + What changes were proposed in this pull request?
> + Why are the changes needed?
> + Does this PR introduce any user-facing change?
> + How was this patch tested?
> + Was this patch authored or co-authored using generative AI tooling?
>
> On Sun, 25 Aug 2024 at 15:47, Bjørn Jørgensen <bjornjorgen...@gmail.com>
> wrote:
>
>> Apache Spark does have a template for PRs:
>> https://github.com/apache/spark/blob/master/.github/PULL_REQUEST_TEMPLATE
>>
>> On Sun, 25 Aug 2024 at 13:41, Mich Talebzadeh <
>> mich.talebza...@gmail.com> wrote:
>>
>>> Unfortunately, it is not that straightforward:
>>>
>>> 1. Committer Votes: The PR needs a sufficient number of "+1" votes
>>>    from *committers*.
>>> 2. Review Process: Address feedback from the community and
>>>    committers to ensure the PR meets the necessary standards.
>>> 3. Approval: Once approved by committers, the PR can be merged into
>>>    the main codebase.
>>>
>>> HTH
>>>
>>> On Sun, 25 Aug 2024 at 08:17, Mark Andreev <mark.andr...@gmail.com>
>>> wrote:
>>>
>>>> Thank you for your review.
>>>>
>>>> Could you explain how to merge this commit into the upstream? I don't
>>>> want this PR to be abandoned.
>>>>
>>>> Best regards,
>>>> Mark Andreev
>>>>
>>>> On Wed, 21 Aug 2024 at 23:08, Mich Talebzadeh <
>>>> mich.talebza...@gmail.com> wrote:
>>>>
>>>>> Hi Mark,
>>>>>
>>>>> You have already done that and have made the request for review.
>>>>>
>>>>> +1 for me
>>>>>
>>>>> Mich Talebzadeh,
>>>>> Architect | Data Engineer | Data Science | Financial Crime
>>>>> PhD <https://en.wikipedia.org/wiki/Doctor_of_Philosophy> Imperial
>>>>> College London <https://en.wikipedia.org/wiki/Imperial_College_London>
>>>>> London, United Kingdom
>>>>>
>>>>> view my Linkedin profile
>>>>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>>>>>
>>>>> https://en.everybodywiki.com/Mich_Talebzadeh
>>>>>
>>>>> *Disclaimer:* The information provided is correct to the best of my
>>>>> knowledge but of course cannot be guaranteed. It is essential to note
>>>>> that, as with any advice, "one test result is worth one-thousand
>>>>> expert opinions" (Wernher von Braun
>>>>> <https://en.wikipedia.org/wiki/Wernher_von_Braun>).
>>>>>
>>>>> On Wed, 21 Aug 2024 at 22:20, Mark Andreev <mark.andr...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Thank you, Mich.
>>>>>>
>>>>>> What is the correct procedure to request a review?
>>>>>>
>>>>>> On Tue, 20 Aug 2024 at 22:57, Mich Talebzadeh <
>>>>>> mich.talebza...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi Mark,
>>>>>>>
>>>>>>> I added a comment to the Jira ticket to make the description clearer:
>>>>>>>
>>>>>>> When encountering mixed schema rows, the current error message
>>>>>>> "{actual} is not a valid external type for schema of {expected}" lacks
>>>>>>> sufficient detail to identify the problematic column. This ambiguity
>>>>>>> hinders troubleshooting and increases development time.
>>>>>>>
>>>>>>> To enhance error clarity, we propose incorporating the source column
>>>>>>> name into the error message. For example: "Column 'my_column' has an
>>>>>>> actual type of {actual} which is not a valid external type for the
>>>>>>> expected schema of {expected}."
>>>>>>>
>>>>>>> By providing this additional context, developers can more
>>>>>>> efficiently pinpoint and resolve schema mismatches.
>>>>>>>
>>>>>>> HTH
>>>>>>>
>>>>>>> Mich Talebzadeh
>>>>>>>
>>>>>>> On Tue, 20 Aug 2024 at 21:59, Mark Andreev <mark.andr...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> Could you review my small PR [SPARK-49044][SQL]
>>>>>>>> ValidateExternalType should return a child in error (
>>>>>>>> https://github.com/apache/spark/pull/47522 )? The changes include
>>>>>>>> tests that verify the results.
>>>>>>>>
>>>>>>>> TL;DR: After the fix, the error message will contain extra
>>>>>>>> information: [B is not a valid external type for schema of string at
>>>>>>>> getexternalrowfield(assertnotnull(input[0, org.apache.spark.sql.Row,
>>>>>>>> true]), 1, f3)
>>>>>>>>
>>>>>>>> If you need more information, please let me know. If you're busy,
>>>>>>>> please let me know the best time to reach you again.
>>>>>>>>
>>>>>>>> On Mon, 29 Jul 2024 at 18:15, Mark Andreev <mark.andr...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi Spark Devs,
>>>>>>>>>
>>>>>>>>> Please review my PR [ https://github.com/apache/spark/pull/47522 ]
>>>>>>>>> that relates to ticket [
>>>>>>>>> https://issues.apache.org/jira/browse/SPARK-49044 ].
>>>>>>>>>
>>>>>>>>> Context: When we have mixed schema rows, the error message
>>>>>>>>> "{actual} is not a valid external type for schema of {expected}"
>>>>>>>>> doesn't help identify the column with the problem. I suggest adding
>>>>>>>>> information about the source column.
>>>>>>>>>
>>>>>>>>> Example:
>>>>>>>>> https://github.com/mrk-andreev/example-spark-schema/blob/main/spark_4.0.0/src/test/scala/ErrorMsgSuite.scala
>>>>>>>>>
>>>>>>>>> Before fix: [B is not a valid external type for schema of string
>>>>>>>>> After fix: [B is not a valid external type for schema of string
>>>>>>>>> at getexternalrowfield(assertnotnull(input[0,
>>>>>>>>> org.apache.spark.sql.Row, true]), 1, f3)
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Best regards,
>>>>>>>>> Mark Andreev
>>>>>>>>
>>>>>>>> --
>>>>>>>> Best regards,
>>>>>>>> Mark Andreev
>>>>>>
>>>>>> --
>>>>>> Best regards,
>>>>>> Mark Andreev
>>
>> --
>> Bjørn Jørgensen
>> Vestre Aspehaug 4, 6010 Ålesund
>> Norge
>>
>> +47 480 94 297
>
> --
> Best regards,
> Mark Andreev

--
Best regards,
Mark Andreev