[ https://issues.apache.org/jira/browse/SQOOP-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16639637#comment-16639637 ]
Boglarka Egyed commented on SQOOP-1312: --------------------------------------- I have updated the Affects Version to include 1.4.6 and 1.4.7 too. Patch provided in [https://reviews.apache.org/r/25621/] seems to be abandoned, last update was 4 years ago, I think the fix is up for grabs now. > One of mappers does not load data from mySql if double column is used as > split key > ---------------------------------------------------------------------------------- > > Key: SQOOP-1312 > URL: https://issues.apache.org/jira/browse/SQOOP-1312 > Project: Sqoop > Issue Type: Bug > Affects Versions: 1.4.4, 1.4.6, 1.4.7 > Reporter: Jong Ho Lee > Assignee: Jong Ho Lee > Priority: Major > Attachments: splitter.patch, splitter.patch > > > When we used Sqoop to load data from mySQL using one double column as > split-key in Samsung SDS, > the last mapper did not load data from mySQL at all. > The number of mappers was sometimes increased by 1. > I think they were caused by some bugs in FloatSplitter.java > For the last split, lowClausePrefix + Double.toString(curUpper), may be > lowClausePrefix + Double.toString(curLower). > In while (curUpper < maxVal) loop, because of round-off error, > minVal + splitSize * numSplits can be smaller than maxVal. > Therefore, using for-loop would be better. > Attached is a proposed new FloatSplitter.java > {code} > /** > * Licensed to the Apache Software Foundation (ASF) under one > * or more contributor license agreements. See the NOTICE file > * distributed with this work for additional information > * regarding copyright ownership. The ASF licenses this file > * to you under the Apache License, Version 2.0 (the > * "License"); you may not use this file except in compliance > * with the License. You may obtain a copy of the License at > * > * http://www.apache.org/licenses/LICENSE-2.0 > * > * Unless required by applicable law or agreed to in writing, software > * distributed under the License is distributed on an "AS IS" BASIS, > * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. > * See the License for the specific language governing permissions and > * limitations under the License. > */ > // modified by Jongho Lee at Samsung SDS. > package org.apache.sqoop.mapreduce.db; > import java.sql.ResultSet; > import java.sql.SQLException; > import java.util.ArrayList; > import java.util.List; > import org.apache.commons.logging.Log; > import org.apache.commons.logging.LogFactory; > import org.apache.hadoop.conf.Configuration; > import org.apache.hadoop.mapreduce.InputSplit; > import com.cloudera.sqoop.config.ConfigurationHelper; > import com.cloudera.sqoop.mapreduce.db.DBSplitter; > import com.cloudera.sqoop.mapreduce.db.DataDrivenDBInputFormat; > /** > * Implement DBSplitter over floating-point values. > */ > public class FloatSplitter implements DBSplitter { > private static final Log LOG = LogFactory.getLog(FloatSplitter.class); > private static final double MIN_INCREMENT = 10000 * Double.MIN_VALUE; > public List<InputSplit> split(Configuration conf, ResultSet results, > String colName) throws SQLException { > LOG.warn("Generating splits for a floating-point index column. Due to > the"); > LOG.warn("imprecise representation of floating-point values in Java, > this"); > LOG.warn("may result in an incomplete import."); > LOG.warn("You are strongly encouraged to choose an integral split > column."); > List<InputSplit> splits = new ArrayList<InputSplit>(); > if (results.getString(1) == null && results.getString(2) == null) { > // Range is null to null. Return a null split accordingly. > splits.add(new DataDrivenDBInputFormat.DataDrivenDBInputSplit( > colName + " IS NULL", colName + " IS NULL")); > return splits; > } > double minVal = results.getDouble(1); > double maxVal = results.getDouble(2); > // Use this as a hint. May need an extra task if the size doesn't > // divide cleanly. > int numSplits = ConfigurationHelper.getConfNumMaps(conf); > double splitSize = (maxVal - minVal) / (double) numSplits; > if (splitSize < MIN_INCREMENT) { > splitSize = MIN_INCREMENT; > } > String lowClausePrefix = colName + " >= "; > String highClausePrefix = colName + " < "; > double curLower = minVal; > double curUpper = curLower + splitSize; > for (int i = 0; i < numSplits - 1; i++) { > // while (curUpper < maxVal) { // changed to for loop > splits.add(new DataDrivenDBInputFormat.DataDrivenDBInputSplit( > lowClausePrefix + Double.toString(curLower), > highClausePrefix + Double.toString(curUpper))); > curLower = curUpper; > curUpper += splitSize; > } > // Catch any overage and create the closed interval for the last split. > if (curLower <= maxVal || splits.size() == 1) { > splits.add(new DataDrivenDBInputFormat.DataDrivenDBInputSplit( > lowClausePrefix + Double.toString(curLower), > colName + " <= " + Double.toString(maxVal))); > // curUpper -> curLower // changed > } > if (results.getString(1) == null || results.getString(2) == null) { > // At least one extrema is null; add a null split. > splits.add(new DataDrivenDBInputFormat.DataDrivenDBInputSplit( > colName + " IS NULL", colName + " IS NULL")); > } > return splits; > } > } > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)