[GitHub] [hudi] jardel-lima commented on issue #3879: [SUPPORT] Incomplete Table Migration

GitBox Sat, 08 Jan 2022 02:23:19 -0800


jardel-lima commented on issue #3879:
URL: https://github.com/apache/hudi/issues/3879#issuecomment-1007434803



   Hi @nsivabalan.
   
   [HERE](https://drive.google.com/file/d/1RsesivvlLUZ9dZh7WbaGJJpnqIcWNbso/view?usp=sharing) is the dataset used to replicate this problem. The file is not public, but I will grant access as soon as you request it.
   
   Here is the code that initiates the Spark session, in case it is useful to you:
   ```
   spark = (
       SparkSession.builder.appName("Hudi_Data_Processing_Framework")
       .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
       .config("spark.sql.hive.convertMetastoreParquet", "false")
       .config("spark.jars.packages", "org.apache.hudi:hudi-spark3-bundle_2.12:0.9.0")
       .config("spark.executor.memory", "4G")
       .config("spark.executor.cores","2")
       .enableHiveSupport()
       .getOrCreate()
   )
   ```
   
   Here is the code used to load this dataset:
   ```
   df = spark.read.load('<<dataset_path>>',
                          encoding='utf-8',
                          format='com.databricks.spark.csv',
                          header=True,
                          delimiter=';',
                          inferSchema=True)
   ```
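
For anyone reproducing this, a small Spark-independent check of the source file can give a baseline row count to compare against the migrated table. This is only a sketch: `count_csv_rows` is a hypothetical helper, not part of the original code, and it assumes the file is readable from the local filesystem, uses `;` as the delimiter, and has a header row, as in the load call above. Files with quoted multi-line fields may be counted differently than Spark counts them with its default settings.

```python
import csv


def count_csv_rows(path, delimiter=";", encoding="utf-8"):
    """Return (column_names, data_row_count) for a delimited file with a header row.

    Hypothetical helper for illustration only; blank lines are not counted.
    """
    with open(path, newline="", encoding=encoding) as handle:
        reader = csv.reader(handle, delimiter=delimiter)
        # The first record is treated as the header, matching header=True above.
        header = next(reader, [])
        # csv.reader yields an empty list for blank lines, so skip those.
        row_count = sum(1 for row in reader if row)
    return header, row_count
```

The returned count could then be compared with `df.count()` on the loaded DataFrame and with the row count read back from the Hudi table, to see at which stage rows go missing.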
   Sorry for the delay. I hope this helps you identify the problem.


