dongjoon-hyun commented on code in PR #58:
URL: https://github.com/apache/spark-connect-swift/pull/58#discussion_r2043483481


##########
Tests/SparkConnectTests/DataFrameReaderTests.swift:
##########
@@ -85,4 +85,27 @@ struct DataFrameReaderTests {
     })
     await spark.stop()
   }
+
+  @Test
+  func schema() async throws {
+    let spark = try await SparkSession.builder.getOrCreate()
+    let path = "../examples/src/main/resources/people.json"
+    #expect(try await spark.read.schema("age SHORT").json(path).dtypes.count == 1)
+    #expect(try await spark.read.schema("age SHORT").json(path).dtypes[0] == ("age", "smallint"))
+    #expect(try await spark.read.schema("age SHORT, name STRING").json(path).dtypes[0] == ("age", "smallint"))
+    #expect(try await spark.read.schema("age SHORT, name STRING").json(path).dtypes[1] == ("name", "string"))

Review Comment:
I thought it was supported, but according to the Apache Spark 4.0.0 RC4 it seems there are limitations: a `NOT NULL` qualifier in a user-specified read schema is not reflected in the resulting schema, and the column stays nullable.

**spark-shell**
```
$ bin/spark-shell
WARNING: Using incubator modules: jdk.incubator.vector
Using Spark's default log4j profile: org/apache/spark/log4j2-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /___/ .__/\_,_/_/ /_/\_\   version 4.0.0
      /_/

Using Scala version 2.13.16 (OpenJDK 64-Bit Server VM, Java 17.0.14)
Type in expressions to have them evaluated.
Type :help for more information.
25/04/15 12:32:47 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Spark context Web UI available at http://localhost:4040
Spark context available as 'sc' (master = local[*], app id = local-1744687967546).
Spark session available as 'spark'.

scala> spark.read.schema("name STRING NOT NULL").json("examples/src/main/resources/people.json").printSchema
warning: 1 deprecation (since 2.13.3); for details, enable `:setting -deprecation` or `:replay -deprecation`
root
 |-- name: string (nullable = true)
```

**spark-connect-shell**
```
$ bin/spark-connect-shell --remote sc://localhost:15002
25/04/15 12:28:48 INFO DefaultAllocationManagerOption: allocation manager type not specified, using netty as the default type
25/04/15 12:28:48 INFO CheckAllocator: Using DefaultAllocationManager at memory/netty/DefaultAllocationManagerFactory.class
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /___/ .__/\_,_/_/ /_/\_\   version 4.0.0
      /_/

Type in expressions to have them evaluated.
Spark connect server version 4.0.0. Spark session available as 'spark'.

scala> spark.read.schema("name STRING").json("../examples/src/main/resources/people.json").printSchema
root
 |-- name: string (nullable = true)


scala> spark.read.schema("name STRING NOT NULL").json("../examples/src/main/resources/people.json").printSchema
root
 |-- name: string (nullable = true)


scala> spark.read.schema("name STRING NOT NULL").json("../examples/src/main/resources/people.json").show()
+-------+
|   name|
+-------+
|Michael|
|   Andy|
| Justin|
+-------+
```

For that part, let me dig more, @yaooqinn.
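If we want to pin down the behavior on the Swift side as well, a test along the following lines could sit next to the assertions in this diff inside `DataFrameReaderTests`. This is only a sketch: the test name is hypothetical, it assumes the Swift client's DDL schema parser accepts the `NOT NULL` qualifier the same way the Scala shells above do, and since `dtypes` does not expose nullability it can only verify that the declared column name and type come back.

```swift
  @Test
  func schemaWithNotNull() async throws {
    // Sketch only: assumes the DDL parser accepts the NOT NULL qualifier,
    // mirroring the Spark 4.0.0 RC4 shells shown above.
    let spark = try await SparkSession.builder.getOrCreate()
    let path = "../examples/src/main/resources/people.json"
    // `dtypes` does not surface nullability, so these checks only confirm that
    // the declared column name and type survive the round trip.
    #expect(try await spark.read.schema("name STRING NOT NULL").json(path).dtypes.count == 1)
    #expect(try await spark.read.schema("name STRING NOT NULL").json(path).dtypes[0] == ("name", "string"))
    await spark.stop()
  }
```

If the parser turns out to reject `NOT NULL`, the expectations would instead need to assert the thrown error, which would document the limitation explicitly.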
########## Tests/SparkConnectTests/DataFrameReaderTests.swift: ########## @@ -85,4 +85,27 @@ struct DataFrameReaderTests { }) await spark.stop() } + + @Test + func schema() async throws { + let spark = try await SparkSession.builder.getOrCreate() + let path = "../examples/src/main/resources/people.json" + #expect(try await spark.read.schema("age SHORT").json(path).dtypes.count == 1) + #expect(try await spark.read.schema("age SHORT").json(path).dtypes[0] == ("age", "smallint")) + #expect(try await spark.read.schema("age SHORT, name STRING").json(path).dtypes[0] == ("age", "smallint")) + #expect(try await spark.read.schema("age SHORT, name STRING").json(path).dtypes[1] == ("name", "string")) Review Comment: I thought it's supported But, according to the Apache Spark 4.0.0 RC4, it seems there are limitations. **spark-shell** ``` $ bin/spark-shell WARNING: Using incubator modules: jdk.incubator.vector Using Spark's default log4j profile: org/apache/spark/log4j2-defaults.properties Setting default log level to "WARN". To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel). Welcome to ____ __ / __/__ ___ _____/ /__ _\ \/ _ \/ _ `/ __/ '_/ /___/ .__/\_,_/_/ /_/\_\ version 4.0.0 /_/ Using Scala version 2.13.16 (OpenJDK 64-Bit Server VM, Java 17.0.14) Type in expressions to have them evaluated. Type :help for more information. 25/04/15 12:32:47 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable Spark context Web UI available at http://localhost:4040 Spark context available as 'sc' (master = local[*], app id = local-1744687967546). Spark session available as 'spark'. scala> spark.read.schema("name STRING NOT NULL").json("examples/src/main/resources/people.json").printSchema warning: 1 deprecation (since 2.13.3); for details, enable `:setting -deprecation` or `:replay -deprecation` root |-- name: string (nullable = true) ``` **spark-connect-shell** ``` $ bin/spark-connect-shell --remote sc://localhost:15002 25/04/15 12:28:48 INFO DefaultAllocationManagerOption: allocation manager type not specified, using netty as the default type 25/04/15 12:28:48 INFO CheckAllocator: Using DefaultAllocationManager at memory/netty/DefaultAllocationManagerFactory.class Welcome to ____ __ / __/__ ___ _____/ /__ _\ \/ _ \/ _ `/ __/ '_/ /___/ .__/\_,_/_/ /_/\_\ version 4.0.0 /_/ Type in expressions to have them evaluated. Spark connect server version 4.0.0. Spark session available as 'spark'. scala> spark.read.schema("name STRING").json("../examples/src/main/resources/people.json").printSchema root |-- name: string (nullable = true) scala> spark.read.schema("name STRING NOT NULL").json("../examples/src/main/resources/people.json").printSchema root |-- name: string (nullable = true) scala> spark.read.schema("name STRING NOT NULL").json("../examples/src/main/resources/people.json").show() +-------+ | name| +-------+ |Michael| | Andy| | Justin| +-------+ ``` For that part, let me dig more, @yaooqinn . -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org