dongjoon-hyun commented on code in PR #58:
URL: https://github.com/apache/spark-connect-swift/pull/58#discussion_r2043483481


##########
Tests/SparkConnectTests/DataFrameReaderTests.swift:
##########
@@ -85,4 +85,27 @@ struct DataFrameReaderTests {
     })
     await spark.stop()
   }
+
+  @Test
+  func schema() async throws {
+    let spark = try await SparkSession.builder.getOrCreate()
+    let path = "../examples/src/main/resources/people.json"
+    #expect(try await spark.read.schema("age SHORT").json(path).dtypes.count == 1)
+    #expect(try await spark.read.schema("age SHORT").json(path).dtypes[0] == ("age", "smallint"))
+    #expect(try await spark.read.schema("age SHORT, name STRING").json(path).dtypes[0] == ("age", "smallint"))
+    #expect(try await spark.read.schema("age SHORT, name STRING").json(path).dtypes[1] == ("name", "string"))

Review Comment:
I thought it was supported, but according to the Apache Spark 4.0.0 RC4 it seems there are limitations: a `NOT NULL` qualifier in a user-specified read schema is not reflected in the resulting schema, and the column stays nullable.

**spark-shell**
```
$ bin/spark-shell
WARNING: Using incubator modules: jdk.incubator.vector
Using Spark's default log4j profile: org/apache/spark/log4j2-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /___/ .__/\_,_/_/ /_/\_\   version 4.0.0
      /_/

Using Scala version 2.13.16 (OpenJDK 64-Bit Server VM, Java 17.0.14)
Type in expressions to have them evaluated.
Type :help for more information.
25/04/15 12:32:47 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Spark context Web UI available at http://localhost:4040
Spark context available as 'sc' (master = local[*], app id = local-1744687967546).
Spark session available as 'spark'.

scala> spark.read.schema("name STRING NOT NULL").json("examples/src/main/resources/people.json").printSchema
warning: 1 deprecation (since 2.13.3); for details, enable `:setting -deprecation` or `:replay -deprecation`
root
 |-- name: string (nullable = true)
```

**spark-connect-shell**
```
$ bin/spark-connect-shell --remote sc://localhost:15002
25/04/15 12:28:48 INFO DefaultAllocationManagerOption: allocation manager type not specified, using netty as the default type
25/04/15 12:28:48 INFO CheckAllocator: Using DefaultAllocationManager at memory/netty/DefaultAllocationManagerFactory.class
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /___/ .__/\_,_/_/ /_/\_\   version 4.0.0
      /_/

Type in expressions to have them evaluated.
Spark connect server version 4.0.0. Spark session available as 'spark'.

scala> spark.read.schema("name STRING").json("../examples/src/main/resources/people.json").printSchema
root
 |-- name: string (nullable = true)


scala> spark.read.schema("name STRING NOT NULL").json("../examples/src/main/resources/people.json").printSchema
root
 |-- name: string (nullable = true)


scala> spark.read.schema("name STRING NOT NULL").json("../examples/src/main/resources/people.json").show()
+-------+
|   name|
+-------+
|Michael|
|   Andy|
| Justin|
+-------+
```

For that part, let me dig more, @yaooqinn.
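If we want to pin down the behavior on the Swift side as well, a test along the following lines could sit next to the assertions in this diff inside `DataFrameReaderTests`. This is only a sketch: the test name is hypothetical, it assumes the Swift client's DDL schema parser accepts the `NOT NULL` qualifier the same way the Scala shells above do, and since `dtypes` does not expose nullability it can only verify that the declared column name and type come back.

```swift
  @Test
  func schemaWithNotNull() async throws {
    // Sketch only: assumes the DDL parser accepts the NOT NULL qualifier,
    // mirroring the Spark 4.0.0 RC4 shells shown above.
    let spark = try await SparkSession.builder.getOrCreate()
    let path = "../examples/src/main/resources/people.json"
    // `dtypes` does not surface nullability, so these checks only confirm that
    // the declared column name and type survive the round trip.
    #expect(try await spark.read.schema("name STRING NOT NULL").json(path).dtypes.count == 1)
    #expect(try await spark.read.schema("name STRING NOT NULL").json(path).dtypes[0] == ("name", "string"))
    await spark.stop()
  }
```

If the parser turns out to reject `NOT NULL`, the expectations would instead need to assert the thrown error, which would document the limitation explicitly.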
########## Tests/SparkConnectTests/DataFrameReaderTests.swift: ########## @@ -85,4 +85,27 @@ struct DataFrameReaderTests { }) await spark.stop() } + + @Test + func schema() async throws { + let spark = try await SparkSession.builder.getOrCreate() + let path = "../examples/src/main/resources/people.json" + #expect(try await spark.read.schema("age SHORT").json(path).dtypes.count == 1) + #expect(try await spark.read.schema("age SHORT").json(path).dtypes[0] == ("age", "smallint")) + #expect(try await spark.read.schema("age SHORT, name STRING").json(path).dtypes[0] == ("age", "smallint")) + #expect(try await spark.read.schema("age SHORT, name STRING").json(path).dtypes[1] == ("name", "string")) Review Comment: I thought it's supported But, according to the Apache Spark 4.0.0 RC4, it seems there are limitations. **spark-shell** ``` $ bin/spark-shell WARNING: Using incubator modules: jdk.incubator.vector Using Spark's default log4j profile: org/apache/spark/log4j2-defaults.properties Setting default log level to "WARN". To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel). Welcome to ____ __ / __/__ ___ _____/ /__ _\ \/ _ \/ _ `/ __/ '_/ /___/ .__/\_,_/_/ /_/\_\ version 4.0.0 /_/ Using Scala version 2.13.16 (OpenJDK 64-Bit Server VM, Java 17.0.14) Type in expressions to have them evaluated. Type :help for more information. 25/04/15 12:32:47 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable Spark context Web UI available at http://localhost:4040 Spark context available as 'sc' (master = local[*], app id = local-1744687967546). Spark session available as 'spark'. scala> spark.read.schema("name STRING NOT NULL").json("examples/src/main/resources/people.json").printSchema warning: 1 deprecation (since 2.13.3); for details, enable `:setting -deprecation` or `:replay -deprecation` root |-- name: string (nullable = true) ``` **spark-connect-shell** ``` $ bin/spark-connect-shell --remote sc://localhost:15002 25/04/15 12:28:48 INFO DefaultAllocationManagerOption: allocation manager type not specified, using netty as the default type 25/04/15 12:28:48 INFO CheckAllocator: Using DefaultAllocationManager at memory/netty/DefaultAllocationManagerFactory.class Welcome to ____ __ / __/__ ___ _____/ /__ _\ \/ _ \/ _ `/ __/ '_/ /___/ .__/\_,_/_/ /_/\_\ version 4.0.0 /_/ Type in expressions to have them evaluated. Spark connect server version 4.0.0. Spark session available as 'spark'. scala> spark.read.schema("name STRING").json("../examples/src/main/resources/people.json").printSchema root |-- name: string (nullable = true) scala> spark.read.schema("name STRING NOT NULL").json("../examples/src/main/resources/people.json").printSchema root |-- name: string (nullable = true) scala> spark.read.schema("name STRING NOT NULL").json("../examples/src/main/resources/people.json").show() +-------+ | name| +-------+ |Michael| | Andy| | Justin| +-------+ ``` For that part, let me dig more, @yaooqinn . -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org