Hi Rico, using SQL saves a lot of time, effort, and budget over the long term. But I guess there is a certain joy in solving self-induced complexities.
Thanks for sharing your findings.

Regards,
Gourav Sengupta

On Fri, Feb 18, 2022 at 7:26 AM Rico Bergmann <i...@ricobergmann.de> wrote:

> I found the reason why it did not work:
>
> When returning the Spark data type I was calling new StringType(). When
> changing it to DataTypes.StringType it worked.
>
> Greets,
> Rico.
>
> On 17.02.2022 at 14:13, Gourav Sengupta <gourav.sengu...@gmail.com> wrote:
>
> Hi,
>
> can you please post a screenshot of the exact CAST statement that you are
> using? Did you use the SQL method mentioned by me earlier?
>
> Regards,
> Gourav Sengupta
>
> On Thu, Feb 17, 2022 at 12:17 PM Rico Bergmann <i...@ricobergmann.de> wrote:
>
>> Hi!
>>
>> Casting another int column that is not a partition column fails with the
>> same error.
>>
>> The schema before the cast (column names are anonymized):
>>
>> root
>>  |-- valueObject: struct (nullable = true)
>>  |    |-- value1: string (nullable = true)
>>  |    |-- value2: string (nullable = true)
>>  |    |-- value3: timestamp (nullable = true)
>>  |    |-- value4: string (nullable = true)
>>  |-- partitionColumn2: string (nullable = true)
>>  |-- partitionColumn3: timestamp (nullable = true)
>>  |-- partitionColumn1: integer (nullable = true)
>>
>> I wanted to cast partitionColumn1 to String, which gives me the described
>> error.
>>
>> Best,
>> Rico
>>
>> On 17.02.2022 at 09:56, ayan guha <guha.a...@gmail.com> wrote:
>>
>> Can you try to cast any other Int field which is NOT a partition column?
>>
>> On Thu, 17 Feb 2022 at 7:34 pm, Gourav Sengupta <gourav.sengu...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> This appears interesting, casting INT to STRING has never been an issue
>>> for me.
>>>
>>> Can you just help us with the output of df.printSchema()?
>>>
>>> I prefer to use SQL, and the method I use for casting is:
>>> CAST(<<column name>> AS STRING) <<alias>>.
>>>
>>> Regards,
>>> Gourav
>>>
>>> On Thu, Feb 17, 2022 at 6:02 AM Rico Bergmann <i...@ricobergmann.de> wrote:
>>>
>>>> Here is the code snippet:
>>>>
>>>> var df = session.read().parquet(basepath);
>>>> for (Column partition : partitionColumnsList) {
>>>>     df = df.withColumn(partition.getName(),
>>>>         df.col(partition.getName()).cast(partition.getType()));
>>>> }
>>>>
>>>> Column is a class containing schema information, for example the
>>>> name of the column and the data type of the column.
>>>>
>>>> Best, Rico.
>>>>
>>>> > On 17.02.2022 at 03:17, Morven Huang <morven.hu...@gmail.com> wrote:
>>>> >
>>>> > Hi Rico, do you have a code snippet? I have no problem casting int to
>>>> > string.
>>>> >
>>>> >> On Feb 17, 2022, at 12:26 AM, Rico Bergmann <i...@ricobergmann.de> wrote:
>>>> >>
>>>> >> Hi!
>>>> >>
>>>> >> I am reading a partitioned DataFrame into Spark using automatic type
>>>> >> inference for the partition columns. For one partition column the data
>>>> >> contains an integer, therefore Spark uses IntegerType for this column.
>>>> >> In general this is supposed to be a StringType column. So I tried to
>>>> >> cast this column to StringType. But this fails with AnalysisException
>>>> >> “cannot cast int to string”.
>>>> >>
>>>> >> Is this a bug? Or is it really not allowed to cast an int to a
>>>> >> string?
>>>> >>
>>>> >> I’m using Spark 3.1.1
>>>> >>
>>>> >> Best regards
>>>> >>
>>>> >> Rico.
>>>> >>
>>>> >> ---------------------------------------------------------------------
>>>> >> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>>
>> --
>> Best Regards,
>> Ayan Guha
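
A note on why new StringType() and DataTypes.StringType might behave differently. This is an assumption, not something confirmed in the thread: if the cast rules recognise the target type by comparing it with the shared StringType instance, and the type class does not define its own equals(), then a freshly constructed instance would not be recognised even though it has the same class. The Java sketch below models only that mechanism. It does not use Spark, and SingletonTypeSketch, StringTypeModel, STRING_TYPE and canCastToString are hypothetical names made up for illustration.

```java
// Simplified model of the suspected mechanism. This is NOT Spark source code;
// every name in this file is hypothetical.
final class SingletonTypeSketch {

    // Stand-in for a data type class that is meant to be used as a singleton.
    static class StringTypeModel {
        // The constructor is reachable, so nothing stops a caller from
        // creating a second instance.
        StringTypeModel() {}
    }

    // Stand-in for the shared instance (the role DataTypes.StringType plays).
    static final StringTypeModel STRING_TYPE = new StringTypeModel();

    // Stand-in for a cast rule that recognises the target type by comparing it
    // with the shared instance. StringTypeModel does not override equals(), so
    // the comparison falls back to reference identity.
    static boolean canCastToString(Object targetType) {
        return STRING_TYPE.equals(targetType);
    }

    public static void main(String[] args) {
        // Shared instance: recognised as a valid cast target.
        System.out.println(canCastToString(STRING_TYPE));           // prints "true"
        // Fresh instance of the same class: not recognised.
        System.out.println(canCastToString(new StringTypeModel())); // prints "false"
    }
}
```

In terms of the loop quoted above, this corresponds to having getType() return DataTypes.StringType instead of a newly constructed StringType, which is the change Rico reported as working.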