Hi Team,
Say I have a test.json file: {"c1":[1,2,3]}
I can create a Parquet file like this:
val df = sqlContext.load("/tmp/test.json", "json")
val df_c = df.repartition(1)
df_c.select("*").save("/tmp/testjson_spark", "parquet")
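(For reference, on newer Spark versions the same steps can be written with the DataFrame reader/writer API; this is just a sketch assuming a SparkSession named `spark`, e.g. from spark-shell:)

```scala
// Sketch of the equivalent pipeline with the DataFrameReader/Writer API.
// Assumes `spark` is an existing SparkSession (as in spark-shell).
val df = spark.read.json("/tmp/test.json")
df.repartition(1)          // collapse to a single output file
  .write
  .parquet("/tmp/testjson_spark")
```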
The output Parquet file's schema looks like:
c1: OPTIONAL F:1
.bag: REPEATED F:1
..array: OPTIONAL INT64 R:1 D:3
Is there any way to avoid the ".bag" wrapper and instead create the
Parquet file with column type "REPEATED INT64"?
The expected data type is:
c1: REPEATED INT64 R:1 D:1
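(In Parquet schema-definition syntax, the two layouts would look roughly like the following; this is a sketch reconstructed from the schema dump above, so the exact group/field names are assumptions:)

```
// What Spark currently writes: a 3-level LIST structure
message spark_schema {
  optional group c1 (LIST) {
    repeated group bag {
      optional int64 array;
    }
  }
}

// What we would like instead: a bare repeated primitive
message expected_schema {
  repeated int64 c1;
}
```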
Thanks!
--
Thanks,
www.openkb.info
(Open KnowledgeBase for Hadoop/Database/OS/Network/Tool)