Hi, I have a Teradata table with more than 2.5 billion records and a data size of around 600 GB. I am not able to pull it efficiently using Spark SQL, and the job has been running for more than 11 hours. Here is my code:
val df2 = sparkSession.read.format("jdbc")
  .option("url", "jdbc:teradata://PROD/DATABASE=XXXX101")
  .option("user", "HDFS_TD")
  .option("password", "CCCCC")
  .option("dbtable", "XXXX")
  .option("driver", "com.teradata.jdbc.TeraDriver")
  .option("numPartitions", partitions)
  .option("partitionColumn", "itm_bloon_seq_no")
  .option("lowerBound", config.getInt("lowerBound"))
  .option("upperBound", config.getInt("upperBound"))
  .load()  // .load() was missing from my original snippet

lowerBound is 0 and upperBound is 300. Spark is using multiple executors, and most of them finish quickly, but a few take much longer, maybe due to shuffling or data skew on the partition column. I also tried repartitioning on the column but no luck (a rough sketch of that attempt is in the P.S. below). Is there a better way to load this faster? Note that the object in Teradata is a view, not a table.

Thanks,
Asmath
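
P.S. For reference, the repartition attempt looked roughly like this; the partition count of 200 is illustrative, not the exact value I used:

import org.apache.spark.sql.functions.col

// Hash-redistribute rows across 200 partitions by the partition column.
// This only rebalances data after the JDBC read has completed, so it
// does not change how the read itself is split across executors.
val df3 = df2.repartition(200, col("itm_bloon_seq_no"))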