As a workaround, you could add a transformer that renames columns (for example, replacing every dot in a column name with an underscore) and then use the renamed columns in your pipeline, as below:

import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.feature.StringIndexer

// RenameColumn is a custom transformer, not one built into Spark ML; see the linked comment.
val renameColumn = new RenameColumn().setInputCol("location.longitude").setOutputCol("location_longitude")
val si = new StringIndexer().setInputCol("location_longitude").setOutputCol("longitutdee")
val pipeline = new Pipeline().setStages(Array(renameColumn, si))
pipeline.fit(flattenedDf).transform(flattenedDf).show()

Please refer to my comment
<https://issues.apache.org/jira/browse/SPARK-48463?focusedCommentId=17852751&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17852751>
for details.

Thanks!!

*Regards,*
*Someshwar Kale*

On Thu, Jun 6, 2024 at 3:24 AM Chhavi Bansal <meetchhavi1...@gmail.com> wrote:

> Hello team
> I was exploring feature transformation exposed via MLlib on a nested
> dataset, and encountered an error while applying any transformer to a
> column with dot notation naming. I thought of raising a ticket on Spark
> https://issues.apache.org/jira/browse/SPARK-48463, where I have mentioned
> the entire scenario.
>
> I wanted to get suggestions on what would be the best way to solve the
> problem while using the dot notation. One workaround is to use `_` while
> flattening the dataframe, but that would mean having an additional overhead
> to convert back to `.` (dot notation) since that’s the convention for our
> other flattened data.
>
> I would be happy to make a contribution to the code if someone can shed
> some light on how this could be solved.
>
>
>
> --
> Thanks and Regards,
> Chhavi Bansal
>
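
On the concern about converting back to dot notation: one option is to keep a reversible map between the dotted names and their underscored replacements, so the original names can be restored after the pipeline runs. The following is only a minimal sketch under that assumption. `ColumnNameMapping` is a hypothetical helper, not part of Spark, and it works on plain column-name strings only.

```scala
// Hypothetical helper (not part of Spark): a reversible mapping between
// dotted column names and underscore-only names.
object ColumnNameMapping {

  /** Replace every dot in a column name with an underscore. */
  def toSafe(name: String): String = name.replace('.', '_')

  /** Map each original name to its safe name, failing fast if two
    * different columns would collapse to the same safe name
    * (for example "a.b" and "a_b"). */
  def forward(columns: Seq[String]): Map[String, String] = {
    val mapping = columns.map(c => c -> toSafe(c)).toMap
    require(
      mapping.values.toSet.size == mapping.size,
      "two columns map to the same underscored name")
    mapping
  }

  /** Invert the mapping so the original dotted names can be restored. */
  def backward(mapping: Map[String, String]): Map[String, String] =
    mapping.map { case (original, safe) => safe -> original }
}
```

For example, `ColumnNameMapping.forward(Seq("location.longitude", "id"))` maps `location.longitude` to `location_longitude` and leaves the illustrative `id` column unchanged; passing the result to `backward` gives the map needed to rename the columns back. The collision check is there because a blind replacement of `_` with `.` would also rewrite names that contained an underscore to begin with.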