[ https://issues.apache.org/jira/browse/SQOOP-2920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Attila Szabo reassigned SQOOP-2920:
-----------------------------------

    Assignee: Attila Szabo

> sqoop performance deteriorates significantly on wide datasets; sqoop 100% on cpu
> ---------------------------------------------------------------------------------
>
>                 Key: SQOOP-2920
>                 URL: https://issues.apache.org/jira/browse/SQOOP-2920
>             Project: Sqoop
>          Issue Type: Bug
>          Components: connectors/oracle, hive-integration, metastore
>    Affects Versions: 1.4.5
>         Environment: - sqoop export on a very wide dataset (over 700 columns)
>                      - sqoop export to Oracle
>                      - a subset of columns is exported (using the --columns argument)
>                      - parquet files
>                      - the --table, --hcatalog-database, and --hcatalog-table options are used
>            Reporter: Ruslan Dautkhanov
>            Assignee: Attila Szabo
>            Priority: Critical
>              Labels: columns, hive, oracle, perfomance
>         Attachments: jstack.zip, top - sqoop mappers hog cpu.png
>
>
> We sqoop export from our datalake to Oracle quite often.
> Whenever we sqoop "narrow" datasets, it is Oracle that has the scalability issues: our 3-node all-flash Oracle RAC normally can't keep up with more than 45-55 sqoop mappers, while the map-reduce framework shows the sqoop mappers are lightly loaded.
> On wide datasets the picture is the opposite: Oracle shows 95% of its sessions idle, waiting for new INSERTs, even when we go over a hundred mappers. Sqoop has serious scalability issues on very wide datasets (our company's datasets are normally very wide).
> For example, on the last sqoop export, started ~2.5 hours ago, the 95 mappers have already accumulated:
> CPU time spent (ms): 1,065,858,760
> (looking at this metric through the map-reduce framework stats)
> That is roughly 1.07 million seconds of CPU time, or 11,219.57 seconds per mapper - about 3.11 hours of CPU time per mapper in ~2.5 hours of wall-clock time. So they are running at 100% CPU.
> Will also attach jstack files.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
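For reproduction purposes, a minimal sketch of the kind of invocation the Environment section describes. The connection string, credentials, table and database names, column list, and mapper count below are placeholders; only the option names come from the report:

    sqoop export \
      --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
      --username scott \
      --password-file /user/scott/.password \
      --table TARGET_TABLE \
      --hcatalog-database default \
      --hcatalog-table wide_parquet_table \
      --columns "COL1,COL2,COL3" \
      --num-mappers 95

With the HCatalog integration the Parquet storage format should be picked up from the Hive table definition, so no separate file-format option is expected on the command line.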
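A hedged sketch of how the attached thread dumps can be captured on a worker node, assuming the mapper JVMs are found via jps (any equivalent PID-discovery method works; <pid> is a placeholder):

    # list the map-reduce task JVMs running on this node manager host
    jps -l | grep YarnChild

    # take several dumps of one mapper a few seconds apart,
    # so the hot (CPU-bound) frames show up repeatedly across dumps
    for i in 1 2 3; do
        jstack <pid> > jstack.<pid>.$i.txt
        sleep 5
    done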