I'm trying to convert XML to Avro, but I'm getting an Avro SchemaParseException for 'Rules', an element that exists in two separate containers (Pool and Offer). Any thoughts?
The XML is attached (reproduced below).

>>> df = sqlContext.read.format('com.databricks.spark.xml').options(rowTag='GGLResponse', attributePrefix='').load('GGL.xml')
>>> df.show()
+--------------------+--------------------+---+--------------------+
|     ResponseDataset|      ResponseHeader|ns2|               xmlns|
+--------------------+--------------------+---+--------------------+
|[[[[[1,1],[SD2000...|[2016-07-26T16:28...|GGL|http://www.xxxx.c...|
+--------------------+--------------------+---+--------------------+

>>> df.printSchema()
root
 |-- ResponseDataset: struct (nullable = true)
 |    |-- ResponseFileGGL: struct (nullable = true)
 |    |    |-- OfferSets: struct (nullable = true)
 |    |    |    |-- OfferSet: struct (nullable = true)
 |    |    |    |    |-- OfferSetHeader: struct (nullable = true)
 |    |    |    |    |    |-- OfferSetIdentifier: long (nullable = true)
 |    |    |    |    |    |-- TotalOffersProcessed: long (nullable = true)
 |    |    |    |    |-- Offers: struct (nullable = true)
 |    |    |    |    |    |-- Identifier: string (nullable = true)
 |    |    |    |    |    |-- Offer: struct (nullable = true)
 |    |    |    |    |    |    |-- Rules: struct (nullable = true)
 |    |    |    |    |    |    |    |-- Rule: array (nullable = true)
 |    |    |    |    |    |    |    |    |-- element: struct (containsNull = true)
 |    |    |    |    |    |    |    |    |    |-- BorrowerIdentifier: long (nullable = true)
 |    |    |    |    |    |    |    |    |    |-- RuleIdentifier: long (nullable = true)
 |    |    |    |    |    |-- PartyRoleIdentifier: long (nullable = true)
 |    |    |    |    |    |-- SuffixIdentifier: string (nullable = true)
 |    |    |    |    |    |-- UCP: string (nullable = true)
 |    |    |    |    |-- Pool: struct (nullable = true)
 |    |    |    |    |    |-- Identifier: string (nullable = true)
 |    |    |    |    |    |-- PartyRoleIdentifier: long (nullable = true)
 |    |    |    |    |    |-- Rules: struct (nullable = true)
 |    |    |    |    |    |    |-- Rule: array (nullable = true)
 |    |    |    |    |    |    |    |-- element: struct (containsNull = true)
 |    |    |    |    |    |    |    |    |-- BIdentifier: long (nullable = true)
 |    |    |    |    |    |    |    |    |-- RIdentifier: long (nullable = true)
 |    |    |    |    |    |-- SuffixIdentifier: string (nullable = true)
 |    |    |    |    |    |-- UCP: string (nullable = true)
 |    |    |-- ResultHeader: struct (nullable = true)
 |    |    |    |-- RequestDateTime: string (nullable = true)
 |    |    |    |-- ResultDateTime: string (nullable = true)
 |    |-- ResponseFileUUID: string (nullable = true)
 |    |-- ResponseFileVersion: double (nullable = true)
 |-- ResponseHeader: struct (nullable = true)
 |    |-- ResponseDateTime: string (nullable = true)
 |    |-- SessionIdentifier: string (nullable = true)
 |-- ns2: string (nullable = true)
 |-- xmlns: string (nullable = true)

>>> df.write.format('com.databricks.spark.avro').save('ggl_avro')
16/09/08 17:07:20 INFO MemoryStore: Block broadcast_73 stored as values in memory (estimated size 233.5 KB, free 772.4 KB)
16/09/08 17:07:20 INFO MemoryStore: Block broadcast_73_piece0 stored as bytes in memory (estimated size 28.2 KB, free 800.6 KB)
16/09/08 17:07:20 INFO BlockManagerInfo: Added broadcast_73_piece0 in memory on localhost:29785 (size: 28.2 KB, free: 511.4 MB)
16/09/08 17:07:20 INFO SparkContext: Created broadcast 73 from newAPIHadoopFile at XmlFile.scala:39
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/hdp/2.4.2.0-258/spark/python/pyspark/sql/readwriter.py", line 397, in save
    self._jwrite.save(path)
  File "/usr/hdp/2.4.2.0-258/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 813, in __call__
  File "/usr/hdp/2.4.2.0-258/spark/python/pyspark/sql/utils.py", line 45, in deco
    return f(*a, **kw)
  File "/usr/hdp/2.4.2.0-258/spark/python/lib/py4j-0.9-src.zip/py4j/protocol.py", line 308, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o426.save.
: org.apache.avro.SchemaParseException: Can't redefine: Rules
    at org.apache.avro.SchemaBuilder$NameContext.put(SchemaBuilder.java:936)
    at org.apache.avro.SchemaBuilder$NameContext.access$600(SchemaBuilder.java:884)
    at org.apache.avro.SchemaBuilder$NamespacedBuilder.completeSchema(SchemaBuilder.java:470)
    at org.apache.avro.SchemaBuilder$RecordBuilder.fields(SchemaBuilder.java:1734)
    ....
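One plausible reading of "Can't redefine: Rules": Avro requires every named type (record, enum, fixed) to have a unique full name (namespace plus name) within one schema, and the Avro writer appears to name each nested record after its struct field in a single shared namespace, so the two different `Rules` structs collide. The plain-Python sketch below checks that idea. `SCHEMA` is a hypothetical, hand-trimmed model of the printSchema() output above (dict = struct, one-element list = array of structs, string = leaf type), not a Spark or Avro API, and the sketch assumes array-of-struct elements are also named after their field.

```python
from collections import defaultdict

# Hypothetical, hand-trimmed model of df.printSchema() above (not a Spark API):
# dict = struct, one-element list = array of structs, str = leaf type.
SCHEMA = {
    "ResponseDataset": {
        "ResponseFileGGL": {
            "OfferSets": {
                "OfferSet": {
                    "OfferSetHeader": {"OfferSetIdentifier": "long",
                                       "TotalOffersProcessed": "long"},
                    "Offers": {
                        "Identifier": "string",
                        "Offer": {
                            "Rules": {"Rule": [{"BorrowerIdentifier": "long",
                                                "RuleIdentifier": "long"}]},
                        },
                        "UCP": "string",
                    },
                    "Pool": {
                        "Identifier": "string",
                        "Rules": {"Rule": [{"BIdentifier": "long",
                                            "RIdentifier": "long"}]},
                        "UCP": "string",
                    },
                },
            },
        },
    },
}


def record_paths(node, path=()):
    """Yield the field path of every struct, i.e. every would-be Avro record."""
    for name, child in node.items():
        child_path = path + (name,)
        if isinstance(child, list):  # array: look at its element type
            child = child[0]
        if isinstance(child, dict):
            yield child_path
            yield from record_paths(child, child_path)


def duplicate_record_names(schema):
    """Map each record name defined more than once to the paths that define it."""
    by_name = defaultdict(list)
    for p in record_paths(schema):
        by_name[p[-1]].append(p)
    return {name: paths for name, paths in by_name.items() if len(paths) > 1}


for name, paths in duplicate_record_names(SCHEMA).items():
    print(name, "is defined at:")
    for p in paths:
        print("   ", ".".join(p))
```

Under that assumption the `Rule` element records would collide too, so renaming only one of the outer `Rules` fields might simply move the error to `Rule`.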
<GGLResponse xmlns:ns2="GGL" xmlns="http://www.xxxx.com/webservices/yyyy">
  <ResponseHeader>
    <ResponseDateTime>2016-07-26T16:28:30.965-04:00</ResponseDateTime>
    <SessionIdentifier>435ererewr-fdsfdsf-dfdsfdsf</SessionIdentifier>
  </ResponseHeader>
  <ResponseDataset>
    <ResponseFileVersion>0.0</ResponseFileVersion>
    <ResponseFileUUID>43243243242343423ddd1qsq31323</ResponseFileUUID>
    <ResponseFileGGL>
      <ResultHeader>
        <RequestDateTime>2016-07-26T16:28:27.000</RequestDateTime>
        <ResultDateTime>20160726 16:28</ResultDateTime>
      </ResultHeader>
      <OfferSets>
        <OfferSet>
          <OfferSetHeader>
            <OfferSetIdentifier>1</OfferSetIdentifier>
            <TotalOffersProcessed>1</TotalOffersProcessed>
          </OfferSetHeader>
          <Pool>
            <Identifier>SD2000000</Identifier>
            <SuffixIdentifier>S</SuffixIdentifier>
            <PartyRoleIdentifier>3123232131231233</PartyRoleIdentifier>
            <UCP>4a00278e-915b-44e8-ac88-f954fbce5f2a</UCP>
            <Rules>
              <Rule>
                <RIdentifier>33</RIdentifier>
                <BIdentifier>0</BIdentifier>
              </Rule>
              <Rule>
                <RIdentifier>612</RIdentifier>
                <BIdentifier>0</BIdentifier>
              </Rule>
            </Rules>
          </Pool>
          <Offers>
            <Identifier>SD2000000</Identifier>
            <SuffixIdentifier>S</SuffixIdentifier>
            <PartyRoleIdentifier>3123232131231233</PartyRoleIdentifier>
            <UCP>4a00278e-915b-44e8-ac88-f954fbce5f2a</UCP>
            <Offer>
              <Rules>
                <Rule>
                  <RuleIdentifier>1000</RuleIdentifier>
                  <BorrowerIdentifier>0</BorrowerIdentifier>
                </Rule>
                <Rule>
                  <RuleIdentifier>1728</RuleIdentifier>
                  <BorrowerIdentifier>0</BorrowerIdentifier>
                </Rule>
                <Rule>
                  <RuleIdentifier>1959</RuleIdentifier>
                  <BorrowerIdentifier>0</BorrowerIdentifier>
                </Rule>
                <Rule>
                  <RuleIdentifier>1991</RuleIdentifier>
                  <BorrowerIdentifier>1</BorrowerIdentifier>
                </Rule>
                <Rule>
                  <RuleIdentifier>1991</RuleIdentifier>
                  <BorrowerIdentifier>2</BorrowerIdentifier>
                </Rule>
              </Rules>
            </Offer>
          </Offers>
        </OfferSet>
      </OfferSets>
    </ResponseFileGGL>
  </ResponseDataset>
</GGLResponse>
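The clash is also visible in the XML itself: `Rules` appears under both `Pool` and `Offer`, but its `Rule` children carry different fields, so spark-xml infers two different struct types that share the name `Rules`. A small standard-library sketch (`xml.etree.ElementTree`), run on a trimmed copy of the XML above that keeps only the nesting down to the two `Rules` blocks, lists the fields each one carries. `rules_shapes` and `local` are hypothetical helper names for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of GGL.xml: same nesting, most sibling elements dropped.
GGL_XML = """<GGLResponse xmlns:ns2="GGL" xmlns="http://www.xxxx.com/webservices/yyyy">
  <ResponseDataset><ResponseFileGGL><OfferSets><OfferSet>
    <Pool>
      <Rules>
        <Rule><RIdentifier>33</RIdentifier><BIdentifier>0</BIdentifier></Rule>
      </Rules>
    </Pool>
    <Offers>
      <Offer>
        <Rules>
          <Rule><RuleIdentifier>1000</RuleIdentifier><BorrowerIdentifier>0</BorrowerIdentifier></Rule>
        </Rules>
      </Offer>
    </Offers>
  </OfferSet></OfferSets></ResponseFileGGL></ResponseDataset>
</GGLResponse>"""


def local(tag):
    """Strip the '{namespace}' prefix ElementTree adds to namespaced tags."""
    return tag.rsplit("}", 1)[-1]


def rules_shapes(xml_text):
    """Map the parent of each <Rules> element to the sorted child tags of its <Rule>s."""
    root = ET.fromstring(xml_text)
    shapes = {}
    for parent in root.iter():  # document order: Pool is visited before Offer
        for child in parent:
            if local(child.tag) == "Rules":
                fields = {local(f.tag) for rule in child for f in rule}
                shapes[local(parent.tag)] = sorted(fields)
    return shapes


print(rules_shapes(GGL_XML))
```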