your keys *must* fit in RAM, period; your key buffers must be large enough to hold them
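for example (just a sketch, the sizes are placeholders for a 64-bit box with RAM to spare, adjust them to your machine), the relevant MyISAM knobs can be raised at runtime before the load:

SET GLOBAL key_buffer_size = 4 * 1024 * 1024 * 1024;         -- index block cache; this is what the keys must fit into
SET GLOBAL myisam_sort_buffer_size = 2 * 1024 * 1024 * 1024; -- buffer used when (re)building indexes by sorting
SET GLOBAL bulk_insert_buffer_size = 1 * 1024 * 1024 * 1024; -- MyISAM tree cache for LOAD DATA into a non-empty table

or put the same values in my.cnf so they survive a restart
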
On 12.02.2014 11:10, Pierre wrote:
> You don't understand, I already did this. Read the thread from the beginning:
>
> https://lists.launchpad.net/maria-discuss/msg01338.html <= load data into an empty table with index
> https://lists.launchpad.net/maria-discuss/msg01361.html <= load the data into an empty table THEN add the index (what you are suggesting).
>
> On 12/02/2014 11:03, Justin Swanhart wrote:
>> Load the data THEN add the index. This will do the unique check once instead of on every insertion.
>>
>> On a bloom filter miss, select from the table and insert if it isn't there. If this is multithreaded, use InnoDB or TokuDB and SELECT ... FOR UPDATE to prevent a race.
>>
>> Sent from my iPhone
>>
>>> On Feb 12, 2014, at 12:49 AM, Pierre <pi...@hotmail.it> wrote:
>>>
>>> OK, I understand, but this doesn't solve the initialization problem. I'm sure there is a bug, or at least something that can be improved a lot. I can't use the UNIQUE constraint when I have too many keys to fit in RAM.
>>>
>>> Because even if I have memcache or a bloom filter in front, I still need to create my initial dataset with the unique constraint, and even if my initial dataset is unique, I can't insert it into my table because of the bug explained before.
>>>
>>> If MySQL can do a sort so quickly (create index), it should be trivial to reuse this sorted data, eliminate the duplicates and create the UNIQUE index.
>>>
>>> On 12/02/2014 09:24, Justin Swanhart wrote:
>>>> Hi,
>>>>
>>>> This is not a bug, but how B-tree indexes work. For them to be efficient they must fit in RAM. There are buffering mechanisms that can be used for secondary indexes in some cases, because the write can be done without a read, but ONLY when the index is not unique. If it is unique, then the index dive is necessary, and a B-tree traversal can take multiple random IOs to get to the leaf node.
>>>>
>>>> Faster IO can help, so you can look into flash storage.
>>>>
>>>> Consider fronting your lookups with memcache (a hash index) or a bloom filter (there are many implementations on the net) to reduce lookups.
>>>>
>>>> --Justin
>>>
>>> On 12/02/2014 09:04, Pierre wrote:
>>>> Using this technique I have the same issue. It has now been running for several hours, I'm at about 40%, and looking at SHOW FULL PROCESSLIST it's getting slower and slower. It will never finish.
>>>>
>>>> I think there is a bug here.
>>>>
>>>> Firstly, regarding memory usage, it doesn't correctly use the buffers I set; I think it's only using key_buffer_size. myisam_sort_buffer_size and bulk_insert_buffer_size are not used for this task.
>>>>
>>>> So what's happening? When the RAM is full, MariaDB does random access on the disk to sort and filter GBs of data! That's why my CPU was only a few percent used; the process was in I/O wait most of the time.
>>>>
>>>> So what I am saying here is: MariaDB can't create UNIQUE keys/indexes if the rows don't fit in RAM.
>>>>
>>>> However, if I try to create a standard (i.e. non-unique) index, it works well and is done in less than an hour:
>>>>
>>>> ALTER IGNORE TABLE mytable ADD INDEX (c1), ADD INDEX(c2);
>>>> Query OK, 349086532 rows affected (44 min 25.21 sec)
>>>> Records: 349086532  Duplicates: 0  Warnings: 0
>>>>
>>>> In this second case, MariaDB does a good job of using myisam_sort_buffer_size.
>>>> I think it's doing something like an external/merge sort: splitting the rows into parts that fit in RAM, sorting them, merging them and creating the index.
>>>>
>>>> It was at 100% CPU most of the time; when it was not, it was because MySQL was loading rows into RAM from the hard disk (and not doing random access on the disk like in the unique-index case). So why doesn't the UNIQUE index behave the same way?
>>>>
>>>> It's easy to reproduce the bug: just create a file 2 or 3 times the size of RAM, LOAD DATA INFILE it, and try to create a UNIQUE index on it. It will never finish.
>>>>
>>>>> On 10.02.2014 13:45, Pierre wrote:
>>>>>> MariaDB is getting slower and slower when I insert a massive amount of data. I'm trying to insert 166 507 066 rows (12 GB) using LOAD DATA INFILE '' into an empty table. I split my file into 13 parts of the same size because it took too long to insert in one shot. Once I had inserted more than 100M rows, it became extremely slow.
>>>>>
>>>>> the reason is the permanently updated keys;
>>>>> in general, for large inserts into a new table with
>>>>>
>>>>> UNIQUE KEY `p1` (`p1`),
>>>>> UNIQUE KEY `p2` (`p2`),
>>>>>
>>>>> * remove keys
>>>>> * insert data
>>>>> * add keys
>>>>>
>>>>> https://dev.mysql.com/doc/refman/5.5/en/insert-speed.html
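spelled out as statements, that remove keys / insert data / add keys recipe looks roughly like this (a sketch only: the file path is a placeholder, the buffer size is just an example, and the final ADD UNIQUE still has to verify every key, so it is not a guaranteed fix for the unique case discussed above):

ALTER TABLE mytable DROP INDEX `p1`, DROP INDEX `p2`;                        -- remove keys

LOAD DATA INFILE '/path/to/part01' INTO TABLE mytable;                       -- insert data
-- ... repeat for the remaining parts ...

SET SESSION myisam_sort_buffer_size = 2 * 1024 * 1024 * 1024;                -- give the index build room to sort
ALTER TABLE mytable ADD UNIQUE KEY `p1` (`p1`), ADD UNIQUE KEY `p2` (`p2`);  -- add keys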