jardel-lima commented on issue #3879:
URL: https://github.com/apache/hudi/issues/3879#issuecomment-1007434803


   Hi @nsivabalan,
   
   [HERE](https://drive.google.com/file/d/1RsesivvlLUZ9dZh7WbaGJJpnqIcWNbso/view?usp=sharing) is the dataset used to reproduce this problem. The file is not public, but I will grant access as soon as you request it.
   
   Here is the code that initializes the Spark session, in case it is useful:
   ```python
   from pyspark.sql import SparkSession

   spark = (
       SparkSession.builder.appName("Hudi_Data_Processing_Framework")
       .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
       .config("spark.sql.hive.convertMetastoreParquet", "false")
       .config("spark.jars.packages", "org.apache.hudi:hudi-spark3-bundle_2.12:0.9.0")
       .config("spark.executor.memory", "4G")
       .config("spark.executor.cores", "2")
       .enableHiveSupport()
       .getOrCreate()
   )
   ```
   
   Here is the code used to load this dataset:
   ```python
   df = spark.read.load('<<dataset_path>>',
                        encoding='utf-8',
                        format='com.databricks.spark.csv',
                        header=True,
                        delimiter=';',
                        inferSchema=True)
   ```
   Sorry for the delay. I hope this helps you identify the problem.

