Shaik Meeran created SQOOP-3452:
-----------------------------------

             Summary: SQOOP Upper bound value issue when import from 
MSSQLServer using TOP n
                 Key: SQOOP-3452
                 URL: https://issues.apache.org/jira/browse/SQOOP-3452
             Project: Sqoop
          Issue Type: Bug
          Components: connectors/sqlserver, op
    Affects Versions: 1.4.6
         Environment: CENTOS

SQOOP - 1.4.6

Hadoop - 2.6.0
            Reporter: Shaik Meeran
         Attachments: SqoopImportLog.txt

I am trying to import data from MSSQL Server to hadoop using SQOOP with a fixed 
number of rows in each batch using top 'N' in the query. 
Below is the code :

sqoop import --connect "jdbc:sqlserver://myserverIp;database=organization"  \
--connection-manager org.apache.sqoop.manager.SQLServerManager \
--driver com.microsoft.sqlserver.jdbc.SQLServerDriver \
--username hadoop --password hadoop123 \
--target-dir /user/hive/warehouse/organization.db/employees \
--query "SELECT TOP 5 [employee_id] \
      ,[first_name] \
      ,[last_name] \
      ,[email] \
      ,[phone_number] \
      ,[hire_date] \
      ,[job_id] \
      ,[salary] \
      ,[commission_pct] \
      ,[manager_id] \
      ,[department_id] \
  FROM [dbo].[employees] where \$CONDITIONS" \
--m 1 --incremental append --check-column employee_id --last-value 99;

when i execute, SQOOP is fetching the upper bound value with the below query

 SELECT MAX([employee_id]) FROM (SELECT TOP 5 [employee_id]       ,[first_name] 
      ,[last_name]       ,[email]       ,[phone_number]       ,[hire_date]      
 ,[job_id]       ,[salary]       ,[commission_pct]       ,[manager_id]       
,[department_id]   FROM [dbo].[employees] where (1 = 1)) 
sqoop_import_query_alias

which is returning 206, this is the highest value in employees table
But I am reading only five rows so the returned value should be max of returned 
5 rows that is 104 in this case. therefore this will be an issue when I run the 
job again.

I have posting this issue with MAX function on MSDN forum and I got some 
responses asking to use order by clause in the sub query.

[MSDN 
Issue|https://social.msdn.microsoft.com/Forums/en-US/f1b3ad8f-7936-419b-b10d-8e2c19e4d266/aggregate-max-on-sub-query-with-quottop-nquot-is-not-working-correctly?forum=transactsql]

I believe this should be a fix at SQOOP when using with MSSQL





--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to