Hi,
I am new to Spark SQL.
I want to read only specific columns from a Parquet file, not all the columns
defined in it.
For instance, the schema of the Parquet file looks like this:
{
  "type": "record",
  "name": "ElectricPowerUsage",
  "namespace": "jcascalog.parquet.example",
  "fields": [
    {
      "name": "addressCode",
      "type": ["null", "string"]
    },
    {
      "name": "timestamp",
      "type": ["null", "long"]
    },
    {
      "name": "devicePowerEventList",
      "type": {
        "type": "array",
        "items": {
          "type": "record",
          "name": "DevicePowerEvent",
          "fields": [
            {
              "name": "power",
              "type": ["null", "double"]
            },
            {
              "name": "deviceType",
              "type": ["null", "int"]
            },
            {
              "name": "deviceId",
              "type": ["null", "int"]
            },
            {
              "name": "status",
              "type": ["null", "int"]
            }
          ]
        }
      }
    }
  ]
}
To read only two of these columns (addressCode and devicePowerEventList,
keeping just the power field of each event) from this Parquet file, I would
use the following schema, which defines only those columns:
{
  "type": "record",
  "name": "ElectricPowerUsage",
  "namespace": "jcascalog.parquet.example",
  "fields": [
    {
      "name": "addressCode",
      "type": ["null", "string"]
    },
    {
      "name": "devicePowerEventList",
      "type": {
        "type": "array",
        "items": {
          "type": "record",
          "name": "DevicePowerEvent",
          "fields": [
            {
              "name": "power",
              "type": ["null", "double"]
            }
          ]
        }
      }
    }
  ]
}
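To make the intent concrete, here is a minimal Python sketch of the projection
I mean: it derives the reduced schema above from the full one by keeping only
the named fields. It uses only the standard library and does not touch Spark
or Parquet at all. The helper name project_schema and the nested dict used to
name the wanted fields are my own assumptions for illustration, not part of
any Spark or Parquet API.

```python
import json

# Record type of each element in devicePowerEventList (from the full schema).
DEVICE_EVENT = {
    "type": "record",
    "name": "DevicePowerEvent",
    "fields": [
        {"name": "power", "type": ["null", "double"]},
        {"name": "deviceType", "type": ["null", "int"]},
        {"name": "deviceId", "type": ["null", "int"]},
        {"name": "status", "type": ["null", "int"]},
    ],
}

# The full schema of the Parquet file, as shown earlier in this message.
FULL_SCHEMA = {
    "type": "record",
    "name": "ElectricPowerUsage",
    "namespace": "jcascalog.parquet.example",
    "fields": [
        {"name": "addressCode", "type": ["null", "string"]},
        {"name": "timestamp", "type": ["null", "long"]},
        {"name": "devicePowerEventList",
         "type": {"type": "array", "items": DEVICE_EVENT}},
    ],
}


def project_schema(schema, wanted):
    """Hypothetical helper: keep only the fields named in `wanted`.

    `wanted` maps a field name to None (keep the field as-is) or to a
    nested dict that selects sub-fields of an array-of-records field.
    """
    # Copy record-level attributes (type, name, namespace) unchanged.
    projected = {k: v for k, v in schema.items() if k != "fields"}
    projected["fields"] = []
    for field in schema["fields"]:
        if field["name"] not in wanted:
            continue  # column not requested, drop it
        sub = wanted[field["name"]]
        ftype = field["type"]
        is_array = isinstance(ftype, dict) and ftype.get("type") == "array"
        if sub is not None and is_array:
            # Recurse into the record type of the array elements.
            ftype = {"type": "array",
                     "items": project_schema(ftype["items"], sub)}
        projected["fields"].append({"name": field["name"], "type": ftype})
    return projected


projection = project_schema(
    FULL_SCHEMA,
    {"addressCode": None, "devicePowerEventList": {"power": None}},
)
print(json.dumps(projection, indent=2))
```

The printed result should have the same structure as the reduced schema above:
timestamp is dropped, and each DevicePowerEvent keeps only its power field.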
I have not yet found anything in the Spark docs that explains how to do this.
- Kidong Lee.