Hi Roshan,
The following snippet summarizes the delimiters for your Hive table:
        colelction.delim        \u0002              
        field.delim             \u0001              
        mapkey.delim            \u0003              
        serialization.format    \u0001   


Your fields are delimited by \u0001, collections are delimited by \u0002 and 
the delimiter between the key and value in any maps is \u0003. Can you verify 
that your XML content doesn't contain any of these characters?
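
One way to run that check is sketched below in Python. It is only a rough illustration: the sample XML values are made up, and the helper name `find_delims` is hypothetical. In practice you would feed it the bytes of the files Sqoop wrote to HDFS.

```python
# Delimiters reported by `describe formatted messagetemplate`.
# The property names are copied verbatim from the Hive output.
DELIMS = {
    b"\x01": "field.delim",
    b"\x02": "colelction.delim",
    b"\x03": "mapkey.delim",
}

def find_delims(data: bytes) -> dict:
    """Return {property_name: count} for each delimiter byte found in data."""
    return {name: data.count(ch) for ch, name in DELIMS.items() if ch in data}

# Hypothetical XML values: one clean, one containing a stray \u0001.
clean = b"<msg><body>hello</body></msg>"
dirty = b"<msg><body>hel\x01lo</body></msg>"

print(find_delims(clean))
print(find_delims(dirty))
```

An empty result means none of the three delimiters occur in the value; any non-empty result names the delimiter that would break the row apart.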

If this still doesn't help, could you pick an affected row and share how the 
XML appears in Hive and what you expected it to be?

Good luck!
Mark

Mark Grover, Business Intelligence Analyst
OANDA Corporation 

www: oanda.com www: fxtrade.com 

----- Original Message -----
From: "mperformer" <codevally.mail.l...@gmail.com>
To: user@hive.apache.org
Sent: Sunday, May 6, 2012 11:34:55 PM
Subject: Re: Data are not displayed correctly on hive tables

Hi Mark 

Many thanks for your reply. Please find the output below. 

hive> describe formatted messagetemplate; 
OK 
# col_name                    data_type    comment 

messagetemplateid             bigint       None 
messagetemplatename           string       None 
datacol                       string       None 
messagetemplatetype           string       None 
messagetype                   string       None 
messagetemplatedescription    string       None 
originatingtemplateid         bigint       None 
edited                        boolean      None 
userid                        bigint       None 
projectid                     bigint       None 
responsetemplateid            bigint       None 

# Detailed Table Information 
Database: default 
Owner: root 
CreateTime: Mon May 07 12:06:59 EST 2012 
LastAccessTime: UNKNOWN 
Protect Mode: None 
Retention: 0 
Location: hdfs://app6:9100/mnt/hive-test/warehouse/messagetemplate 
Table Type: MANAGED_TABLE 
Table Parameters: 
comment This is the messagetemplate table 
transient_lastDdlTime 1336356473 

# Storage Information 
SerDe Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe 
InputFormat: org.apache.hadoop.mapred.TextInputFormat 
OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat 
Compressed: No 
Num Buckets: -1 
Bucket Columns: [] 
Sort Columns: [] 
Storage Desc Params: 
colelction.delim \u0002 
field.delim \u0001 
mapkey.delim \u0003 
serialization.format \u0001 
Time taken: 3.2 seconds 

Thanks again. 

./Roshan. 


On Mon, May 7, 2012 at 1:06 PM, Mark Grover < mgro...@oanda.com > wrote: 


Could you share the output of the following command in Hive: 
describe formatted messagetemplate 

My hunch is that your Hive table is using a delimiter (e.g. '\t') that appears 
in the content of your XML. 
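
To illustrate the hunch, here is a simplified Python model of a delimited-text reader. It is not Hive's actual SerDe code; the function `split_row`, the tab delimiter, and the sample row are assumptions made for the example.

```python
def split_row(line: str, field_delim: str, num_cols: int) -> list:
    """Simplified model of a delimited-text reader: split on the delimiter,
    drop surplus fields, and pad missing ones with None."""
    fields = line.split(field_delim)[:num_cols]
    return fields + [None] * (num_cols - len(fields))

# Hypothetical 4-column, tab-delimited row whose XML value itself contains a tab.
xml = "<msg>\t<body>hello</body></msg>"
line = "\t".join(["1", "welcome", xml, "email"])

row = split_row(line, "\t", 4)
print(row)
```

In this model the XML column keeps only the text before the embedded tab, the next column receives the rest of the XML, and the real last value is dropped, so every column after the XML ends up holding the wrong data.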

----- Original Message ----- 
From: "mperformer" < codevally.mail.l...@gmail.com > 
To: user@hive.apache.org 
Sent: Sunday, May 6, 2012 8:34:27 PM 
Subject: Data are not displayed correctly on hive tables 

Hi 

I am using 

• Hadoop 0.20.2 
• Hive 0.8.1 
• Sqoop 1.4.1-incubating 



in my sample project. Currently I am importing data from PostgreSQL into a Hive 
table using Sqoop. My database table in PostgreSQL has 4 columns, and one column 
stores a fairly large XML file as the TEXT data type. The same column is defined 
in Hive as string, but the data after that column is not imported and shows as null. 

Table structure in PostgreSQL 

CREATE TABLE public.messagetemplate ( 
messagetemplateid BIGSERIAL, 
messagetemplatename TEXT, 
data TEXT, 
messagetemplatetype TEXT, 
CONSTRAINT pk_messagetemplate PRIMARY KEY(messagetemplateid) 
) WITHOUT OIDS; 

Table structure in Hive 

hive> desc messagetemplate; 
OK 
messagetemplateid      bigint 
messagetemplatename    string 
data                   string 
messagetemplatetype    string 


The data column stores the XML file as text, and during the import to Hive all 
the data is imported properly (I checked the files in HDFS). But a Hive select 
statement shows only a small part of the XML text, and the remaining column (the 
last column) is null. 

Could someone please help me sort this out? Thanks. 




