I'm trying to convert XML to Avro, but I am getting a SchemaParseException
for 'Rules', which exists in two separate containers (Pool and Offers/Offer).
Any thoughts?

XML is attached.

 df = sqlContext.read.format('com.databricks.spark.xml').options(rowTag='GGLResponse',attributePrefix='').load('GGL.xml')
 df.show()
 +--------------------+--------------------+---+--------------------+
 |     ResponseDataset|      ResponseHeader|ns2|               xmlns|
 +--------------------+--------------------+---+--------------------+
 |[[[[[1,1],[SD2000...|[2016-07-26T16:28...|GGL|http://www.xxxx.c...|
 +--------------------+--------------------+---+--------------------+

 >>> df.printSchema()
 root
  |-- ResponseDataset: struct (nullable = true)
  |    |-- ResponseFileGGL: struct (nullable = true)
  |    |    |-- OfferSets: struct (nullable = true)
  |    |    |    |-- OfferSet: struct (nullable = true)
  |    |    |    |    |-- OfferSetHeader: struct (nullable = true)
  |    |    |    |    |    |-- OfferSetIdentifier: long (nullable = true)
  |    |    |    |    |    |-- TotalOffersProcessed: long (nullable = true)
  |    |    |    |    |-- Offers: struct (nullable = true)
  |    |    |    |    |    |-- Identifier: string (nullable = true)
  |    |    |    |    |    |-- Offer: struct (nullable = true)
  |    |    |    |    |    |    |-- Rules: struct (nullable = true)
  |    |    |    |    |    |    |    |-- Rule: array (nullable = true)
  |    |    |    |    |    |    |    |    |-- element: struct (containsNull = true)
  |    |    |    |    |    |    |    |    |    |-- BorrowerIdentifier: long (nullable = true)
  |    |    |    |    |    |    |    |    |    |-- RuleIdentifier: long (nullable = true)
  |    |    |    |    |    |-- PartyRoleIdentifier: long (nullable = true)
  |    |    |    |    |    |-- SuffixIdentifier: string (nullable = true)
  |    |    |    |    |    |-- UCP: string (nullable = true)
  |    |    |    |    |-- Pool: struct (nullable = true)
  |    |    |    |    |    |-- Identifier: string (nullable = true)
  |    |    |    |    |    |-- PartyRoleIdentifier: long (nullable = true)
  |    |    |    |    |    |-- Rules: struct (nullable = true)
  |    |    |    |    |    |    |-- Rule: array (nullable = true)
  |    |    |    |    |    |    |    |-- element: struct (containsNull = true)
  |    |    |    |    |    |    |    |    |-- BIdentifier: long (nullable = true)
  |    |    |    |    |    |    |    |    |-- RIdentifier: long (nullable = true)
  |    |    |    |    |    |-- SuffixIdentifier: string (nullable = true)
  |    |    |    |    |    |-- UCP: string (nullable = true)
  |    |    |-- ResultHeader: struct (nullable = true)
  |    |    |    |-- RequestDateTime: string (nullable = true)
  |    |    |    |-- ResultDateTime: string (nullable = true)
  |    |-- ResponseFileUUID: string (nullable = true)
  |    |-- ResponseFileVersion: double (nullable = true)
  |-- ResponseHeader: struct (nullable = true)
  |    |-- ResponseDateTime: string (nullable = true)
  |    |-- SessionIdentifier: string (nullable = true)
  |-- ns2: string (nullable = true)
  |-- xmlns: string (nullable = true)


df.write.format('com.databricks.spark.avro').save('ggl_avro')

16/09/08 17:07:20 INFO MemoryStore: Block broadcast_73 stored as values in memory (estimated size 233.5 KB, free 772.4 KB)
16/09/08 17:07:20 INFO MemoryStore: Block broadcast_73_piece0 stored as bytes in memory (estimated size 28.2 KB, free 800.6 KB)
16/09/08 17:07:20 INFO BlockManagerInfo: Added broadcast_73_piece0 in memory on localhost:29785 (size: 28.2 KB, free: 511.4 MB)
16/09/08 17:07:20 INFO SparkContext: Created broadcast 73 from newAPIHadoopFile at XmlFile.scala:39
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/hdp/2.4.2.0-258/spark/python/pyspark/sql/readwriter.py", line 397, in save
    self._jwrite.save(path)
  File "/usr/hdp/2.4.2.0-258/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 813, in __call__
  File "/usr/hdp/2.4.2.0-258/spark/python/pyspark/sql/utils.py", line 45, in deco
    return f(*a, **kw)
  File "/usr/hdp/2.4.2.0-258/spark/python/lib/py4j-0.9-src.zip/py4j/protocol.py", line 308, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o426.save.
: org.apache.avro.SchemaParseException: Can't redefine: Rules
        at org.apache.avro.SchemaBuilder$NameContext.put(SchemaBuilder.java:936)
        at org.apache.avro.SchemaBuilder$NameContext.access$600(SchemaBuilder.java:884)
        at org.apache.avro.SchemaBuilder$NamespacedBuilder.completeSchema(SchemaBuilder.java:470)
        at org.apache.avro.SchemaBuilder$RecordBuilder.fields(SchemaBuilder.java:1734)
        ....
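
To narrow this down, here is a minimal sketch that lists struct field names occurring more than once in a schema. It rests on an assumption I have not verified against the spark-avro source: that the writer names each Avro record after the struct's field name, so two structs both called 'Rules' would land in the same Avro name space and trigger "Can't redefine". The helper names (struct_paths, duplicate_struct_names, and the small schema builders) are hypothetical, and the schema is a hand-trimmed copy of the printSchema() output above in the JSON shape that Spark uses for schemas. Map types are not handled.

```python
from collections import defaultdict


def struct_paths(dtype, path=()):
    """Yield (field_name, dotted_path) for every struct-typed field,
    looking through arrays to their element type."""
    if not isinstance(dtype, dict) or dtype.get("type") != "struct":
        return  # primitives such as "long" or "string" have no children
    for f in dtype["fields"]:
        inner = f["type"]
        while isinstance(inner, dict) and inner.get("type") == "array":
            inner = inner["elementType"]
        child_path = path + (f["name"],)
        if isinstance(inner, dict) and inner.get("type") == "struct":
            yield f["name"], ".".join(child_path)
            yield from struct_paths(inner, child_path)


def duplicate_struct_names(schema_json):
    """Map each struct field name used more than once to its paths."""
    seen = defaultdict(list)
    for name, dotted in struct_paths(schema_json):
        seen[name].append(dotted)
    return {name: paths for name, paths in seen.items() if len(paths) > 1}


# Hypothetical builders for the Spark schema JSON shape.
def struct(*fields):
    return {"type": "struct", "fields": list(fields)}


def field(name, dtype):
    return {"name": name, "type": dtype, "nullable": True, "metadata": {}}


def array(elem):
    return {"type": "array", "elementType": elem, "containsNull": True}


# Trimmed to the two 'Rules' containers from the schema above.
offer_rule = struct(field("BorrowerIdentifier", "long"), field("RuleIdentifier", "long"))
pool_rule = struct(field("BIdentifier", "long"), field("RIdentifier", "long"))
schema = struct(
    field("Offers", struct(field("Offer", struct(field("Rules", struct(field("Rule", array(offer_rule)))))))),
    field("Pool", struct(field("Rules", struct(field("Rule", array(pool_rule)))))),
)

duplicates = duplicate_struct_names(schema)
for name, paths in sorted(duplicates.items()):
    print(name, "->", ", ".join(paths))
```

On the real DataFrame the same check could presumably be run as duplicate_struct_names(df.schema.jsonValue()). In the trimmed schema both 'Rules' and the nested 'Rule' element appear under Offers.Offer and under Pool with different fields, which matches the name in the exception.
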
<GGLResponse xmlns:ns2="GGL" xmlns="http://www.xxxx.com/webservices/yyyy">
	<ResponseHeader>
		<ResponseDateTime>2016-07-26T16:28:30.965-04:00</ResponseDateTime>
		<SessionIdentifier>435ererewr-fdsfdsf-dfdsfdsf</SessionIdentifier>
	</ResponseHeader>
	<ResponseDataset>
		<ResponseFileVersion>0.0</ResponseFileVersion>
		<ResponseFileUUID>43243243242343423ddd1qsq31323</ResponseFileUUID>
		<ResponseFileGGL>
			<ResultHeader>
				<RequestDateTime>2016-07-26T16:28:27.000</RequestDateTime>
				<ResultDateTime>20160726 16:28</ResultDateTime>
			</ResultHeader>
			<OfferSets>
				<OfferSet>
					<OfferSetHeader>
						<OfferSetIdentifier>1</OfferSetIdentifier>
						<TotalOffersProcessed>1</TotalOffersProcessed>
					</OfferSetHeader>
					<Pool>
						<Identifier>SD2000000</Identifier>
						<SuffixIdentifier>S</SuffixIdentifier>
						<PartyRoleIdentifier>3123232131231233</PartyRoleIdentifier>
						<UCP>4a00278e-915b-44e8-ac88-f954fbce5f2a</UCP>
						<Rules>
							<Rule>
								<RIdentifier>33</RIdentifier>
								<BIdentifier>0</BIdentifier>
							</Rule>
							<Rule>
								<RIdentifier>612</RIdentifier>
								<BIdentifier>0</BIdentifier>
							</Rule>
						</Rules>
					</Pool>
					<Offers>
						<Identifier>SD2000000</Identifier>
						<SuffixIdentifier>S</SuffixIdentifier>
						<PartyRoleIdentifier>3123232131231233</PartyRoleIdentifier>
						<UCP>4a00278e-915b-44e8-ac88-f954fbce5f2a</UCP>
						<Offer>
							<Rules>
								<Rule>
									<RuleIdentifier>1000</RuleIdentifier>
									<BorrowerIdentifier>0</BorrowerIdentifier>
								</Rule>
								<Rule>
									<RuleIdentifier>1728</RuleIdentifier>
									<BorrowerIdentifier>0</BorrowerIdentifier>
								</Rule>
								<Rule>
									<RuleIdentifier>1959</RuleIdentifier>
									<BorrowerIdentifier>0</BorrowerIdentifier>
								</Rule>
								<Rule>
									<RuleIdentifier>1991</RuleIdentifier>
									<BorrowerIdentifier>1</BorrowerIdentifier>
								</Rule>
								<Rule>
									<RuleIdentifier>1991</RuleIdentifier>
									<BorrowerIdentifier>2</BorrowerIdentifier>
								</Rule>
							</Rules>
						</Offer>
					</Offers>
				</OfferSet>
			</OfferSets>
		</ResponseFileGGL>
	</ResponseDataset>
</GGLResponse>