Hi,

I have a Teradata table with more than 2.5 billion records, about 600 GB of
data. I am not able to pull it efficiently using Spark SQL; the job has been
running for more than 11 hours. Here is my code:

      val df2 = sparkSession.read.format("jdbc")
        .option("url", "jdbc:teradata://PROD/DATABASE=XXXX101")
        .option("driver", "com.teradata.jdbc.TeraDriver")
        .option("user", "HDFS_TD")
        .option("password", "CCCCC")
        .option("dbtable", "XXXX")
        .option("numPartitions", partitions)
        .option("partitionColumn", "itm_bloon_seq_no")
        .option("lowerBound", config.getInt("lowerBound"))
        .option("upperBound", config.getInt("upperBound"))
        .load()

The lower bound is 0 and the upper bound is 300. Spark uses multiple
executors, and most of them finish quickly, but a few take much longer.
Since a plain JDBC read involves no shuffle, I suspect the slow executors
are stuck on skewed ranges of itm_bloon_seq_no rather than on shuffling.
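
As far as I understand the JDBC source, Spark turns numPartitions,
lowerBound, and upperBound into range predicates on the partition column,
roughly like the sketch below (the partition count is illustrative, and the
exact SQL Spark generates is my assumption, not taken from the logs):

      // Rough sketch of how Spark derives JDBC partition predicates from
      // lowerBound/upperBound/numPartitions. With bounds 0..300 and 30
      // partitions the stride is (300 - 0) / 30 = 10. Rows below 0 or at
      // or above 300, and any hot values of itm_bloon_seq_no, all pile
      // into a few partitions, which would explain the stragglers.
      val numPartitions = 30              // illustrative
      val (lower, upper) = (0L, 300L)     // my configured bounds
      val stride = (upper - lower) / numPartitions
      val predicates = (0 until numPartitions).map { i =>
        val lo = lower + i * stride
        val hi = lo + stride
        if (i == 0) s"itm_bloon_seq_no < $hi OR itm_bloon_seq_no IS NULL"
        else if (i == numPartitions - 1) s"itm_bloon_seq_no >= $lo"
        else s"itm_bloon_seq_no >= $lo AND itm_bloon_seq_no < $hi"
      }
      predicates.foreach(println)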

I also tried repartitioning on a column, but no luck. As far as I can tell,
repartition only reshuffles the data after it has already been fetched over
JDBC, so it cannot speed up the read itself. Is there a better way to load
this faster?

Note that the object in Teradata is actually a view, not a table.
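
One workaround I have seen suggested is to push a derived, evenly
distributed bucket column into the query via a dbtable subquery and
partition on that instead; since it is just a pushed-down subquery, it
should work against a view too. The bucket count, alias names, and the MOD
expression below are illustrative and untested against my schema
(Teradata's MOD syntax may vary by version):

      // Sketch (untested): derive a uniform bucket column inside Teradata
      // so every Spark partition pulls a similar share of rows.
      val buckets = 100                   // illustrative bucket count
      val query =
        s"""(SELECT v.*, (v.itm_bloon_seq_no MOD $buckets) AS part_bucket
           |   FROM XXXX v) AS t""".stripMargin

      val df2 = sparkSession.read.format("jdbc")
        .option("url", "jdbc:teradata://PROD/DATABASE=XXXX101")
        .option("driver", "com.teradata.jdbc.TeraDriver")
        .option("user", "HDFS_TD")
        .option("password", "CCCCC")
        .option("dbtable", query)
        .option("partitionColumn", "part_bucket")
        .option("lowerBound", "0")
        .option("upperBound", buckets.toString)
        .option("numPartitions", buckets.toString)
        .load()

With one bucket per partition (upperBound == numPartitions == buckets),
each task should read roughly an equal share of the rows regardless of how
itm_bloon_seq_no itself is distributed.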

Thanks,
Asmath
