Hi Team,
Say I have a test.json file: {"c1":[1,2,3]}
I can create a Parquet file like this:
val df = sqlContext.load("/tmp/test.json", "json")
val df_c = df.repartition(1)
df_c.select("*").save("/tmp/testjson_spark", "parquet")
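(For reference, on newer Spark versions the same steps can be written with the DataFrame reader/writer API; this is just a sketch assuming a SparkSession named `spark`, e.g. from spark-shell:)

```scala
// Sketch of the equivalent pipeline with the DataFrameReader/Writer API.
// Assumes `spark` is an existing SparkSession (as in spark-shell).
val df = spark.read.json("/tmp/test.json")
df.repartition(1)          // collapse to a single output file
  .write
  .parquet("/tmp/testjson_spark")
```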
The output Parquet file's schema looks like:
c1: OPTIONAL F:1
.bag: REPEATED F:1
..array: OPTIONAL INT64 R:1 D:3
Is there any way to avoid the ".bag" wrapper and instead create the
Parquet file with column type "REPEATED INT64"?
The expected data type is:
c1: REPEATED INT64 R:1 D:1
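(In Parquet schema-definition syntax, the two layouts would look roughly like the following; this is a sketch reconstructed from the schema dump above, so the exact group/field names are assumptions:)

```
// What Spark currently writes: a 3-level LIST structure
message spark_schema {
  optional group c1 (LIST) {
    repeated group bag {
      optional int64 array;
    }
  }
}

// What we would like instead: a bare repeated primitive
message expected_schema {
  repeated int64 c1;
}
```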
Thanks!
--
Thanks,
www.openkb.info
(Open KnowledgeBase for Hadoop/Database/OS/Network/Tool)