Sorry, the formatting of my last mail was not good; re-sending it here.

println("Going to talk to mySql")

// Read table from mySQL.
val mysqlDF = spark.read.jdbc(jdbcUrl, table, properties)
println("I am back from mySql")

mysqlDF.show()
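
(jdbcUrl, table and properties are not shown in this snippet; they are set up beforehand, roughly along the lines of the sketch below. The host, database, credentials and table name here are placeholder values, not the real ones.)

import java.util.Properties

// Placeholder connection details - substitute the real host/db/credentials.
val jdbcUrl = "jdbc:mysql://localhost:3306/testdb"
val table = "cities"
val properties = new Properties()
properties.setProperty("user", "sparkuser")
properties.setProperty("password", "********")
properties.setProperty("driver", "com.mysql.jdbc.Driver")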

// Create a new DataFrame with column 'id' increased by 10 to avoid duplicate primary keys.
val newDF = mysqlDF.select((col("id") + 10).as("id"), col("country"), col("city"))
newDF.printSchema()
newDF.show()
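
(Just for reference, an equivalent way to write the same transformation, using a hypothetical name shiftedDF so it does not clash with newDF above; withColumn replaces id in place and leaves country and city untouched.)

// Equivalent alternative: replace id in place instead of re-selecting every column.
val shiftedDF = mysqlDF.withColumn("id", col("id") + 10)
shiftedDF.show()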

// Insert records into the table.
newDF.write
  .mode(SaveMode.Append)
  .jdbc(jdbcUrl, table, properties)
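
(As an aside, the JDBC writer also accepts tuning options if the append ever needs them; batchsize and numPartitions are standard Spark JDBC options, and the values below are only illustrative.)

newDF.write
  .mode(SaveMode.Append)
  .option("batchsize", "1000")       // rows sent per JDBC batch insert
  .option("numPartitions", "2")      // cap on concurrent JDBC connections
  .jdbc(jdbcUrl, table, properties)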

// Write to Hive - this creates a new table.
newDF.write.saveAsTable("cities")
newDF.show()

Output:

Going to talk to mySql
I am back from mySql

+---+--------------+---------+
| id|       country|     city|
+---+--------------+---------+
|  1|           USA|Palo Alto|
|  2|Czech Republic|     Brno|
|  3|           USA|Sunnyvale|
|  4|          null|     null|
+---+--------------+---------+

root
 |-- id: long (nullable = false)
 |-- country: string (nullable = true)
 |-- city: string (nullable = true)

+---+--------------+---------+
| id|       country|     city|
+---+--------------+---------+
| 11|           USA|Palo Alto|
| 12|Czech Republic|     Brno|
| 13|           USA|Sunnyvale|
| 14|          null|     null|
+---+--------------+---------+

+---+--------------+---------+
| id|       country|     city|
+---+--------------+---------+
| 11|           USA|Palo Alto|
| 12|Czech Republic|     Brno|
| 13|           USA|Sunnyvale|
| 14|          null|     null|
| 24|          null|     null|
| 23|           USA|Sunnyvale|
| 22|Czech Republic|     Brno|
| 21|           USA|Palo Alto|
+---+--------------+---------+
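
Could the extra rows in the last table be because newDF is a lazy plan rather than materialized data, so the final newDF.show() re-reads the MySQL table after the append and re-applies the +10? If that is the case, I guess caching newDF before the writes would pin the original four rows; a rough sketch of what I mean (same variables as above):

// Force newDF to be evaluated and cached while the source table still has
// only the original 4 rows, then reuse the cached result for the writes.
newDF.cache()
newDF.count()     // materializes the cached plan

newDF.write
  .mode(SaveMode.Append)
  .jdbc(jdbcUrl, table, properties)

newDF.write.saveAsTable("cities")
newDF.show()      // should still show ids 11 to 14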

 

Thanks,

Ravi

 

From: ryanda...@gmail.com <ryanda...@gmail.com> 
Sent: Wednesday, August 29, 2018 8:19 PM
To: user@spark.apache.org
Subject: Spark code to write to MySQL and Hive

 

Hi,

Can anyone help me understand what is happening with my code?

I wrote a Spark application that reads from a MySQL table (which already has 4 records) and creates a new DF by adding 10 to the ID field. Then I wanted to write the new DF to MySQL as well as to Hive.

I am surprised to see an additional set of records in Hive! I cannot understand how newDF ends up with records whose IDs are 21 to 24. I know that a DF is immutable; if so, how can it have 4 records at one point and 8 records at a later point?

 


// Read table from mySQL.
val mysqlDF = spark.read.jdbc(jdbcUrl, table, properties)
println("I am back from mySql")

mysqlDF.show()

// Create a new DataFrame with column 'id' increased by 10 to avoid duplicate primary keys.
val newDF = mysqlDF.select((col("id") + 10).as("id"), col("country"), col("city"))
newDF.printSchema()
newDF.show()

// Insert records into the MySQL table.
newDF.write
  .mode(SaveMode.Append)
  .jdbc(jdbcUrl, table, properties)

// Write to Hive - this creates a new table.
newDF.write.saveAsTable("cities")
newDF.show()

 

 

Records already existing in MySQL:

+---+--------------+---------+
| id|       country|     city|
+---+--------------+---------+
|  1|           USA|Palo Alto|
|  2|Czech Republic|     Brno|
|  3|           USA|Sunnyvale|
|  4|          null|     null|
+---+--------------+---------+

root
 |-- id: long (nullable = false)
 |-- country: string (nullable = true)
 |-- city: string (nullable = true)

newDF.show():

+---+--------------+---------+
| id|       country|     city|
+---+--------------+---------+
| 11|           USA|Palo Alto|
| 12|Czech Republic|     Brno|
| 13|           USA|Sunnyvale|
| 14|          null|     null|
+---+--------------+---------+

+---+--------------+---------+
| id|       country|     city|
+---+--------------+---------+
| 11|           USA|Palo Alto|
| 12|Czech Republic|     Brno|
| 13|           USA|Sunnyvale|
| 14|          null|     null|
| 24|          null|     null|
| 23|           USA|Sunnyvale|
| 22|Czech Republic|     Brno|
| 21|           USA|Palo Alto|
+---+--------------+---------+

 

 

Thanks for your time.

Ravi
