[ https://issues.apache.org/jira/browse/SQOOP-1273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jarek Jarcec Cecho updated SQOOP-1273:
--------------------------------------
    Attachment: SQOOP-1273.patch

> Multiple append jobs can easily end up sharing directories
> ----------------------------------------------------------
>
>                 Key: SQOOP-1273
>                 URL: https://issues.apache.org/jira/browse/SQOOP-1273
>             Project: Sqoop
>          Issue Type: Bug
>    Affects Versions: 1.4.4
>            Reporter: Jarek Jarcec Cecho
>            Assignee: Jarek Jarcec Cecho
>             Fix For: 1.4.5
>
>         Attachments: SQOOP-1273.patch
>
>
> I've noticed in multiple user deployments that when running Sqoop in append
> mode ({{--append}}), two separate jobs can end up using the same temporary
> directory. This is a disaster, as those jobs will then start interfering
> with each other and may even cause data loss.
>
> Currently we are using the following code to generate the temporary directory
> ([AppendUtils.java|https://github.com/apache/sqoop/blob/trunk/src/java/org/apache/sqoop/util/AppendUtils.java#L269]):
> {code}
> public static Path getTempAppendDir(String tableName) {
>   String timeId = DATE_FORM.format(new Date(System.currentTimeMillis()));
>   String tempDir = TEMP_IMPORT_ROOT + Path.SEPARATOR + timeId + tableName;
>   return new Path(tempDir);
> }
> {code}
>
> There are three different parts that we are currently using to generate the
> temporary directory:
> * {{TEMP_IMPORT_ROOT}}: Constant. It can be changed by the user if needed,
> but as we do not have this documented, most users are using the default
> constant value.
> * {{timeId}}: Current time with millisecond precision.
> * {{tableName}}: Name of the transferred table, or {{null}} for a query
> ({{--query}}) based import.
>
> The problem mainly surfaces in {{--query}} based imports, where two of the
> three parts are constants. If two Sqoop jobs then happen to start at the
> same time, they generate the same temporary directory.
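
The collision described in the report can be reproduced outside Sqoop with a small standalone sketch. It is not the attached patch and does not use Hadoop's Path class. The class name, the "_sqoop" root, the "ddHHmmssSSS" date pattern and the withRandomSuffix() helper are all assumptions made for illustration. The random suffix is only one possible way to make the name unique, not necessarily what SQOOP-1273.patch does.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.UUID;

class TempAppendDirSketch {
    // Assumed stand-ins for the constants in AppendUtils; the real values may differ.
    static final String TEMP_IMPORT_ROOT = "_sqoop";
    static final String DATE_PATTERN = "ddHHmmssSSS";

    // Same shape as the quoted getTempAppendDir(): root + separator + timeId + tableName.
    // The timestamp is passed in so that two "simultaneous" jobs can be simulated.
    static String currentScheme(String tableName, long nowMillis) {
        String timeId = new SimpleDateFormat(DATE_PATTERN).format(new Date(nowMillis));
        // A null tableName (the --query case) is concatenated as the literal "null".
        return TEMP_IMPORT_ROOT + "/" + timeId + tableName;
    }

    // Hypothetical mitigation: append a random UUID so that the name no longer
    // depends only on the clock and the table name.
    static String withRandomSuffix(String tableName, long nowMillis) {
        String suffix = UUID.randomUUID().toString().replace("-", "");
        return currentScheme(tableName, nowMillis) + "_" + suffix;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();

        // Two --query imports started in the same millisecond.
        String jobA = currentScheme(null, now);
        String jobB = currentScheme(null, now);
        System.out.println(jobA + " vs " + jobB + " -> same directory: " + jobA.equals(jobB));

        // With a random suffix the two names differ even for identical inputs.
        String safeA = withRandomSuffix(null, now);
        String safeB = withRandomSuffix(null, now);
        System.out.println("with suffix -> same directory: " + safeA.equals(safeB));
    }
}
```

With identical inputs the first comparison is always a match, which is the situation the report describes for {{--query}} imports. With the random suffix, a match would require two random UUIDs to coincide.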