The classic XML configuration files like hive-site.xml have the following patterns:
<property><name></name><value></value><description></description></properly> In order to read these xml into Hive, you need to do the following: ### 1. Brings each record to one line (deletes the root element configuration tags): ### 2. Need to get rid of space between tags ### <property><name></name><value></value><description></description></properly> ### 3. Each xml record needs to be in one line! ### 4. The xml file needs the same start and end tag for each line like property tags. Since you have only one root tag for the whole document, that is not going to work. Also ### need to get rid of root tag at the beginning and end plus the xml information at the beginning (i.e ?xml version) This simple shell command does work (hive-site.xml, hdfs-site.xml etc) cat ${HIVE_HOME}/conf/hive-site.xml | tr -d '&' | tr '\n' ' ' | tr '\r' ' ' | sed 's|</property>|</property>\n|g'| sed 's/<configuration>//g'|sed 's|</configuration>||g' \ | sed -e 's/^[ ]*//g' | sed -e 's/>[ ]*/>/g' | grep -v "<?xml version=" > ${XML_FILE} replace ${XML_FILE} with any name The below code will work for hive-ste.xml drop table if exists xml_temp; -- create a load table create table xml_temp (xmlData string); -- load data from local xml file load data local inpath "${XML_FILE}" into table xml_temp; select * from xml_temp; drop table if exists hive_site_temp; create table if not exists hive_site_temp( rownum int, Pname array<string>, Pvalue array<string>, Pdescription array<string>) row format delimited fields terminated by '|' ; --desc hive_site_temp; insert overwrite table hive_site_temp select row_number() over() , xpath(xmlData,'property/name/text()') , xpath(xmlData,'property/value/text()') , xpath(xmlData,'property/description/text()') from xml_temp ; drop table if exists hive_site; create table if not exists hive_site ( rownum int, name string, value string, description string ) STORED AS PARQUET TBLPROPERTIES ("parquet.compression"="SNAPPY") ; insert overwrite table hive_site select rownum , name , value , description from hive_site_temp *lateral view explode(Pname) a as namelateral view explode(Pvalue) a as valuelateral view explode(Pdescription) a as description*; select * from hive_site order by rownum However, for hdfs-site.xml, by default there is no "description" tag. Still you can read data in no problem. However, "lateral view explode(Pdescription) a as description" will not work and table will have no rows! One way is to exclude that column from insert. However, I was wondering if explode function can check if column has any value before populating it! For example for hdfs-site.xml these are my values: 0: jdbc:hive2://rhes75:10099/default> select xpath(xmlData,'property/description/text()') as description from xml_temp; +--------------+ | description | +--------------+ | [] | | [] | | [] | | [] | | [] | | [] | | [] | | [] | | [] | +--------------+ Thanks Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>* http://talebzadehmich.wordpress.com *Disclaimer:* Use it at your own risk. Any and all responsibility for any loss, damage or destruction of data or any other property which may arise from relying on this email's technical content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction.