[ https://issues.apache.org/jira/browse/SQOOP-1600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14216025#comment-14216025 ]

Daniel Lanza García commented on SQOOP-1600:
--------------------------------------------

If you want to write timestamp fields in the format that Impala reads 
timestamps from Parquet files, you will have to implement it yourself, 
because it is not implemented yet.

However, you are in luck: I have already implemented it, and it is available 
in my GitHub repository (https://github.com/dlanza1). You should clone and 
compile my three repos and add them to the classpath.

Another, easier option is to cast the bigint to a timestamp in the following 
way: select cast((hiredate / 1000) as TIMESTAMP) from aaa;
The drawback is that you have to apply the cast in every query. If you do not 
have a lot of data, you can instead generate a new table with an INSERT... 
SELECT statement, using the cast to produce a column of timestamp type, as 
sketched below.
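A minimal sketch of that second approach, assuming the bigint column stores 
epoch milliseconds as in the example above (the table name aaa_ts is 
illustrative, not from the original report):

  -- Create a table with a proper timestamp column, then fill it once,
  -- so later queries no longer need the cast.
  CREATE TABLE aaa_ts (hiredate TIMESTAMP);
  INSERT INTO aaa_ts SELECT cast((hiredate / 1000) AS TIMESTAMP) FROM aaa;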

I hope it helps you.

> Exception when importing data using the Data Connector for Oracle with a 
> TIMESTAMP column type to Parquet files
> ------------------------------------------------------------------------------------------------------
>
>                 Key: SQOOP-1600
>                 URL: https://issues.apache.org/jira/browse/SQOOP-1600
>             Project: Sqoop
>          Issue Type: Bug
>    Affects Versions: 1.4.6
>         Environment: Hadoop version: 2.5.0-cdh5.2.0
> Sqoop: 1.4.5
>            Reporter: Daniel Lanza García
>              Labels: Connector, Oracle, Parquet, Timestamp
>             Fix For: 1.4.6
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> An error is thrown in each mapper when an import job is run using the Quest 
> data connector for Oracle (-direct argument), the source table has a column 
> of type TIMESTAMP, and the destination files are in Parquet format.
> The mapper's log shows that the error is the following:
> WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : 
> org.apache.avro.UnresolvedUnionException: Not in union ["long","null"]: 
> 2012-7-1 0:4:44. 403000000
> This means the data obtained by the mapper (via the connector) is not of 
> the type that the schema describes for this field. As we can read in the 
> error, the problem is related to the column UTC_STAMP (the only column in 
> the source table that stores a timestamp).
> If we check the generated schema for this column, we can observe that the 
> column is of type long with SQL data type TIMESTAMP (93), which is correct.
> Schema: {"name" : "UTC_STAMP","type" : [ "long", "null" ],"columnName" : 
> "UTC_STAMP","sqlType" : "93"}
> If we debug the method where the exception is thrown 
> (org.apache.avro.generic.GenericData.resolveUnion(GenericData.java:605)), we 
> can see that the problem arises because the data obtained by the mapper is 
> of type String, which does not correspond to the type described by the 
> schema (long).
> The exception is not thrown when the destination files are text files. The 
> reason is that when you import to text files, no schema is generated.
> Solution
> In the documentation, there is a section that describes how to manage date 
> and timestamp data when you use the Data Connector for Oracle and Hadoop. 
> As we can read in this section, this connector manages this type of data 
> differently. However, as that section describes, this behavior can be 
> disabled with the parameter below.
> -Doraoop.timestamp.string=false
> Although the problem is solved with this parameter (mandatory under these 
> conditions; see the example invocation below), the software should handle 
> this type of column and not throw an exception.
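> A minimal example invocation, assuming a generic Oracle connection (the 
> host, service name, credentials, table, and target directory below are 
> placeholders, not from the original report). The -D property must come 
> right after the import tool, before the tool-specific arguments:
>
>   sqoop import -Doraoop.timestamp.string=false \
>     --direct \
>     --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
>     --username SCOTT --password secret \
>     --table SCOTT.AAA \
>     --as-parquetfile \
>     --target-dir /user/daniel/aaa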



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
