your keys *must* fit in RAM, period
your key buffers must be large enough
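
A minimal sketch of checking and raising that buffer (the value below is only an illustrative example, not a recommendation):

  SHOW VARIABLES LIKE 'key_buffer_size';
  -- raise the MyISAM key cache for the running server instance
  SET GLOBAL key_buffer_size = 4 * 1024 * 1024 * 1024;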

On 12.02.2014 11:10, Pierre wrote:
> You don't understand, I already did this. Read the thread from the beginning:
> 
> https://lists.launchpad.net/maria-discuss/msg01338.html <= load data in empty table with index
> https://lists.launchpad.net/maria-discuss/msg01361.html <= load the data in empty table THEN add the index (what you are suggesting).
> 
> On 12/02/2014 11:03, Justin Swanhart wrote:
>> Load the data THEN add the index.  This will do the unique check once 
>> instead of on every insertion.
>>
>> On a bloom filter miss, select from the table and insert if the row isn't there.
>> If this is multithreaded, use InnoDB or TokuDB and SELECT ... FOR UPDATE to prevent a race.
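
A minimal sketch of that check-then-insert pattern, assuming an InnoDB table `mytable` with a unique column `p1` (both names are placeholders):

  START TRANSACTION;
  -- lock the (possibly empty) index range for this key so concurrent inserts wait
  SELECT p1 FROM mytable WHERE p1 = 'some-value' FOR UPDATE;
  -- if the SELECT returned no row, the application issues the insert
  INSERT INTO mytable (p1) VALUES ('some-value');
  COMMIT;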
>>
>> Sent from my iPhone
>>
>>> On Feb 12, 2014, at 12:49 AM, Pierre <pi...@hotmail.it> wrote:
>>>
>>> OK, I understand, but this doesn't solve the initialization problem; I'm sure there is a bug, or something which can be improved a lot. I can't use the UNIQUE constraint when I have too many keys, which don't fit in RAM.
>>>
>>> Because even if I have memcache/a bloom filter in front, I still need to create my initial dataset with the unique constraint, and even if my initial dataset is unique, I can't insert the rows into my table because of the bug explained before.
>>>
>>> If MySQL can do a sort so quickly (CREATE INDEX), it should be trivial to reuse this sorted data, eliminate the duplicates and create the UNIQUE index.
>>>
>>>
>>> On 12/02/2014 09:24, Justin Swanhart wrote:
>>>> Hi,
>>>>
>>>> This is not a bug, but how B-tree indexes work.  For them to be efficient they must fit in RAM.
>>>> There are buffering mechanisms that can be used for secondary indexes in some cases, because the write can be done without a read, but ONLY when the index is not unique.  If it is unique, then the index dive is necessary, and a B-tree traversal can take multiple random IOs to get to the leaf node.
>>>>
>>>> Faster IO can help, so you can look into flash storage.
>>>>
>>>> Consider fronting your lookups with memcache (a hash index) or a bloom 
>>>> filter (there are many
>>>> implementations on the net) to reduce lookups.
>>>>
>>>> --Justin
>>>
>>> On 12/02/2014 09:04, Pierre wrote:
>>>> Using this technique I have the same issue. It has now been running for several hours; I'm at 40%, and looking at SHOW FULL PROCESSLIST, it's getting slower and slower. It will never finish.
>>>>
>>>> I think there is a bug here.
>>>>
>>>> Firstly, regarding the memory usage, it doesn't correctly use the buffers I set; I think it's only using key_buffer_size. myisam_sort_buffer_size and bulk_insert_buffer_size are not used for this task.
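
For reference, a sketch of how those two variables would normally be raised for the session before the ALTER (values are arbitrary examples); the point of the report above is that they do not appear to be honored in the unique-index case:

  SET SESSION myisam_sort_buffer_size = 1024 * 1024 * 1024;
  SET SESSION bulk_insert_buffer_size = 256 * 1024 * 1024;
  ALTER TABLE mytable ADD UNIQUE INDEX (p1);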
>>>>
>>>> So what's happening? When the RAM is full, MariaDB does random access on the disk to sort and filter GB of data! That's why my CPU was only a few percent used; the process was in I/O wait most of the time.
>>>>
>>>> So what I am saying here is: MariaDB can't create UNIQUE keys/indexes if the rows don't fit in RAM.
>>>>
>>>> However, if I try to create a standard index (i.e. non-unique), it works well and is done in less than an hour.
>>>>
>>>> ALTER IGNORE TABLE mytable ADD INDEX (c1), ADD INDEX(c2);
>>>> Query OK, 349086532 rows affected (44 min 25.21 sec)
>>>> Records: 349086532  Duplicates: 0  Warnings: 0
>>>>
>>>> In this second case, MariaDB does a good job of using myisam_sort_buffer_size. I think it's doing something like an external/merge sort: splitting the rows into parts that fit in RAM, sorting them, merging them, and creating the index.
>>>>
>>>> It was at 100% CPU most of the time; when it was not, it was because MySQL was loading rows into RAM from the hard disk (and not doing random access on the disk, as in CREATE UNIQUE INDEX). So why does the UNIQUE index not behave the same way?
>>>>
>>>> It's easy to reproduce the bug: just create a data file of 2 or 3 times the size of RAM, then LOAD DATA INFILE it and try to create a UNIQUE index on it. It will never end.
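
A sketch of that reproduction (table, column, and file path are placeholders):

  CREATE TABLE t (c1 BIGINT NOT NULL) ENGINE=MyISAM;
  -- load a data file two to three times the size of RAM
  LOAD DATA INFILE '/tmp/bigdata.tsv' INTO TABLE t;
  ALTER TABLE t ADD INDEX (c1);        -- sort-based build, finishes
  -- ALTER TABLE t ADD UNIQUE (c1);    -- per the report above, effectively never finishes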
>>>>
>>>>> On 10.02.2014 13:45, Pierre wrote:
>>>>>> MariaDB is getting slower and slower when I'm inserting a massive amount of data. I'm trying to insert 166,507,066 rows (12 GB) using LOAD DATA INFILE into an empty table. I split my file into 13 parts of the same size because it took too long to insert in one shot. Once I had inserted more than 100M rows, it started to become extremely slow.
>>>>>
>>>>> The reason is the permanently updated keys.
>>>>> In general, for large inserts into a new table with
>>>>>
>>>>> UNIQUE KEY `p1` (`p1`),
>>>>> UNIQUE KEY `p2` (`p2`),
>>>>>
>>>>> * remove keys
>>>>> * insert data
>>>>> * add keys
>>>>>
>>>>> https://dev.mysql.com/doc/refman/5.5/en/insert-speed.html
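
A sketch of that workflow for the table above (the file path is a placeholder; repeat the LOAD for each file part):

  ALTER TABLE mytable DROP INDEX p1, DROP INDEX p2;                      -- remove keys
  LOAD DATA INFILE '/path/part01.tsv' INTO TABLE mytable;                -- insert data
  ALTER TABLE mytable ADD UNIQUE KEY p1 (p1), ADD UNIQUE KEY p2 (p2);    -- add keys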
