Hi, One way of avowing null values in an xml tag when displaying the results is the use of Outer Lateral Views <https://cwiki.apache.org/confluence/display/Hive/LanguageManual+LateralView#LanguageManualLateralView-OuterLateralViews> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+LateralView#LanguageManualLateralView-OuterLateralViews
The explanation below summarises it The user can specify the optional OUTER keyword to generate rows even when a LATERAL VIEW usually would not generate a row. This happens when the UDTF used does not generate any rows which happens easily with explode when the column to explode is empty. In this case the source row would never appear in the results. *OUTER can be used to prevent that and rows will be generated with NULL values in the columns coming from the UDTF.* So for hdfs-site where description is null one can do a test for null value for every column insert overwrite table hdfs_site select rownum , name , value , description from hdfs_site_temp lateral view outer explode(Pname) a as name lateral view outer explode(Pvalue) a as value lateral view outer explode(Pdescription) a as description ; 0: jdbc:hive2://rhes75:10099/default> select * from hdfs_site . . . . . . . . . . . . . . . . . . > order by rownum . . . . . . . . . . . . . . . . . . > ; +-------------------+----------------------------------------+----------------------------------------------------+------------------------+ | hdfs_site.rownum | hdfs_site.name | hdfs_site.value | hdfs_site.description | +-------------------+----------------------------------------+----------------------------------------------------+------------------------+ | 1 | dfs.namenode.ec.system.default.policy | RS-6-3-1024k | NULL | | 2 | dfs.safemode.threshold.pct | 0 | NULL | | 3 | hadoop.tmp.dir | /tmp | NULL | | 4 | dfs.image.transfer.timeout | 600000 | NULL | | 5 | dfs.permissions | false | NULL | | 6 | dfs.block.size | 134217728 | NULL | | 7 | dfs.balance.bandwidthPerSec | 10 | NULL | | 8 | dfs.datanode.data.dir | file:/d4T/hduser/hadoop/hadoop_store/hdfs/datanode | NULL | | 9 | dfs.namenode.name.dir | file:/d4T/hduser/hadoop/hadoop_store/hdfs/namenode | NULL | +-------------------+----------------------------------------+----------------------------------------------------+------------------------+ HTH Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>* http://talebzadehmich.wordpress.com *Disclaimer:* Use it at your own risk. Any and all responsibility for any loss, damage or destruction of data or any other property which may arise from relying on this email's technical content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction. On Thu, 28 Jun 2018 at 09:40, Mich Talebzadeh <mich.talebza...@gmail.com> wrote: > The classic XML configuration files like hive-site.xml have the following > patterns: > > > <property><name></name><value></value><description></description></properly> > > In order to read these xml into Hive, you need to do the following: > > ### 1. Brings each record to one line (deletes the root element > configuration tags): > ### 2. Need to get rid of space between tags > ### > <property><name></name><value></value><description></description></properly> > ### 3. Each xml record needs to be in one line! > ### 4. The xml file needs the same start and end tag for each line like > property tags. Since you have only one root tag for the whole document, > that is not going to work. Also > ### need to get rid of root tag at the beginning and end plus the xml > information at the beginning (i.e ?xml version) > > This simple shell command does work (hive-site.xml, hdfs-site.xml etc) > > cat ${HIVE_HOME}/conf/hive-site.xml | tr -d '&' | tr '\n' ' ' | tr '\r' ' > ' | sed 's|</property>|</property>\n|g'| sed 's/<configuration>//g'|sed > 's|</configuration>||g' \ > | sed -e 's/^[ ]*//g' | sed -e 's/>[ ]*/>/g' | grep > -v "<?xml version=" > ${XML_FILE} > > replace ${XML_FILE} with any name > > The below code will work for hive-ste.xml > > drop table if exists xml_temp; > -- create a load table > create table xml_temp (xmlData string); > -- load data from local xml file > load data local inpath "${XML_FILE}" into table xml_temp; > select * from xml_temp; > drop table if exists hive_site_temp; > create table if not exists hive_site_temp( > rownum int, > Pname array<string>, > Pvalue array<string>, > Pdescription array<string>) > row format delimited > fields terminated by '|' > ; > --desc hive_site_temp; > insert overwrite table hive_site_temp > select > row_number() over() > , xpath(xmlData,'property/name/text()') > , xpath(xmlData,'property/value/text()') > , xpath(xmlData,'property/description/text()') > from xml_temp > ; > drop table if exists hive_site; > create table if not exists hive_site > ( > rownum int, > name string, > value string, > description string > ) > STORED AS PARQUET > TBLPROPERTIES ("parquet.compression"="SNAPPY") > ; > insert overwrite table hive_site > select > rownum > , name > , value > , description > from > hive_site_temp > > > > *lateral view explode(Pname) a as namelateral view explode(Pvalue) a as > valuelateral view explode(Pdescription) a as description*; > select * from hive_site > order by rownum > > > However, for hdfs-site.xml, by default there is no "description" tag. > Still you can read data in no problem. However, "lateral view > explode(Pdescription) a as description" will not work and table will have > no rows! > > One way is to exclude that column from insert. However, I was wondering if > explode function can check if column has any value before populating it! > > For example for hdfs-site.xml these are my values: > > 0: jdbc:hive2://rhes75:10099/default> select > xpath(xmlData,'property/description/text()') as description from xml_temp; > +--------------+ > | description | > +--------------+ > | [] | > | [] | > | [] | > | [] | > | [] | > | [] | > | [] | > | [] | > | [] | > +--------------+ > > > Thanks > > Dr Mich Talebzadeh > > > > LinkedIn * > https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw > <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>* > > > > http://talebzadehmich.wordpress.com > > > *Disclaimer:* Use it at your own risk. Any and all responsibility for any > loss, damage or destruction of data or any other property which may arise > from relying on this email's technical content is explicitly disclaimed. > The author will in no case be liable for any monetary damages arising from > such loss, damage or destruction. > > >