Re: [SPARK-48463] Mllib Feature transformer failing with nested dataset (Dot notation)

Chhavi Bansal Sat, 08 Jun 2024 04:05:48 -0700

Hi Someshwar,
Thanks for the response. I have added my comments to the ticket
<https://issues.apache.org/jira/browse/SPARK-48463>.
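
For anyone following the thread: the `RenameColumn` stage in the quoted suggestion below is not a built-in Spark ML transformer, so it has to be written by hand. A minimal sketch of what such a stage could look like, assuming a plain rename through `withColumnRenamed` is all that is needed. The class body, the parameter descriptions and the uid prefix are illustrative assumptions, not code taken from the ticket:

```scala
import org.apache.spark.ml.Transformer
import org.apache.spark.ml.param.{Param, ParamMap}
import org.apache.spark.ml.util.Identifiable
import org.apache.spark.sql.{DataFrame, Dataset}
import org.apache.spark.sql.types.StructType

// Hypothetical custom stage: renames one column so that later stages
// never see a dot in the column name.
class RenameColumn(override val uid: String) extends Transformer {
  def this() = this(Identifiable.randomUID("renameColumn"))

  final val inputCol = new Param[String](this, "inputCol", "existing column name")
  final val outputCol = new Param[String](this, "outputCol", "new column name")

  def setInputCol(value: String): this.type = set(inputCol, value)
  def setOutputCol(value: String): this.type = set(outputCol, value)

  // withColumnRenamed matches the existing name literally against the
  // top-level column names, so a name containing a dot needs no backticks.
  override def transform(dataset: Dataset[_]): DataFrame =
    dataset.withColumnRenamed($(inputCol), $(outputCol))

  // Mirror the rename in the schema so Pipeline.fit can validate later stages.
  override def transformSchema(schema: StructType): StructType =
    StructType(schema.fields.map { f =>
      if (f.name == $(inputCol)) f.copy(name = $(outputCol)) else f
    })

  override def copy(extra: ParamMap): RenameColumn = defaultCopy(extra)
}
```

This sketch does not implement persistence (`MLWritable`), so a pipeline containing it could be fitted and applied but not saved as-is.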



Thanks,
Chhavi Bansal

On Thu, 6 Jun 2024 at 17:28, Someshwar Kale <skale1...@gmail.com> wrote:

> As a fix, you may consider adding a transformer that renames columns (perhaps
> replacing every dot in a column name with an underscore) and using the renamed
> columns in your pipeline, as below:
>
> // RenameColumn is a custom transformer, not a built-in Spark ML stage
> val renameColumn = new RenameColumn().setInputCol("location.longitude").setOutputCol("location_longitude")
> val si = new StringIndexer().setInputCol("location_longitude").setOutputCol("longitutdee")
> val pipeline = new Pipeline().setStages(Array(renameColumn, si))
> pipeline.fit(flattenedDf).transform(flattenedDf).show()
>
>
> Refer to my comment
> <https://issues.apache.org/jira/browse/SPARK-48463?focusedCommentId=17852751&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17852751>
> for elaboration.
> Thanks!!
>
> *Regards,*
> *Someshwar Kale*
>
>
>
>
>
> On Thu, Jun 6, 2024 at 3:24 AM Chhavi Bansal <meetchhavi1...@gmail.com>
> wrote:
>
>> Hello team,
>> I was exploring the feature transformers exposed via MLlib on a nested
>> dataset, and encountered an error when applying any transformer to a
>> column whose name uses dot notation. I have raised a Spark ticket,
>> https://issues.apache.org/jira/browse/SPARK-48463, where I have
>> described the entire scenario.
>>
>> I wanted to get suggestions on the best way to solve the
>> problem while keeping the dot notation. One workaround is to use `_` while
>> flattening the dataframe, but that would add the overhead of
>> converting back to `.` (dot notation), since that is the convention for our
>> other flattened data.
>>
>> I would be happy to make a contribution to the code if someone can shed
>> some light on how this could be solved.
>>
>>
>>
>> --
>> Thanks and Regards,
>> Chhavi Bansal
>>
>
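
On the overhead of converting back to `.` mentioned in the original message: one way to keep it small is to compute the name mapping once and reuse its inverse after the pipeline has run. A rough sketch in plain Scala, where the `ColumnNames` object and its method names are hypothetical helpers invented for illustration:

```scala
object ColumnNames {
  /** Maps every column name to its underscore form, e.g. "a.b" -> "a_b".
    * Fails fast if two different names would end up identical.
    */
  def toUnderscore(columns: Seq[String]): Map[String, String] = {
    val mapping = columns.map(c => c -> c.replace('.', '_')).toMap
    require(mapping.values.toSet.size == mapping.size,
      "two columns would collide after replacing dots with underscores")
    mapping
  }

  /** Inverts the mapping so the original dotted names can be restored. */
  def invert(mapping: Map[String, String]): Map[String, String] =
    mapping.map(_.swap)

  // With a DataFrame `df`, either direction can be applied as:
  //   mapping.foldLeft(df) { case (acc, (from, to)) => acc.withColumnRenamed(from, to) }
}
```

Keeping the explicit mapping, rather than replacing `_` with `.` afterwards, avoids corrupting column names that already contained an underscore.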

-- 
Thanks and Regards,
Chhavi Bansal
