jardel-lima commented on issue #3879: URL: https://github.com/apache/hudi/issues/3879#issuecomment-1007434803
Hi @nsivabalan. [HERE](https://drive.google.com/file/d/1RsesivvlLUZ9dZh7WbaGJJpnqIcWNbso/view?usp=sharing) is the dataset used to reproduce this problem. The file is not public, but I will grant access as soon as you request it.

Here is the code that initializes the Spark session, in case it is useful to you:

```
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("Hudi_Data_Processing_Framework")
    .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    .config("spark.sql.hive.convertMetastoreParquet", "false")
    # Hudi bundle for Spark 3 / Scala 2.12, version 0.9.0
    .config("spark.jars.packages", "org.apache.hudi:hudi-spark3-bundle_2.12:0.9.0")
    .config("spark.executor.memory", "4G")
    .config("spark.executor.cores", "2")
    .enableHiveSupport()
    .getOrCreate()
)
```

Here is the code used to load this dataset:

```
# Semicolon-delimited CSV with a header row; column types are inferred
df = spark.read.load(
    '<<dataset_path>>',
    encoding='utf-8',
    format='com.databricks.spark.csv',
    header=True,
    delimiter=';',
    inferSchema=True,
)
```

Sorry for the delay. I hope this helps you identify the problem.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org