Hi Sergei, On Tue, Dec 22, 2020 at 7:58 PM Sergei Golubchik <s...@mariadb.org> wrote: > > Hi, Aleksey! > > On Nov 03, Aleksey Midenkov wrote: > > > > the updated patches are here: > > > > https://github.com/MariaDB/server/commits/5f6bcd6cd0145513974b0dfc5b2cefba28b72f1a > > I started writing my comments there, but realized that it'd be clearer > if I reply here, more context. That is, see below. > > One comment about the new test case, you have 16K lines in the result > file, are all 16K necessary? Can you do something like > > select x, left(b,10), left(v,10) > > ?
Yes, these shorter result cases reproduce as well. Updated. > > And please, pretty please, don't use non-constant references as function > arguments. Why? With pointers we can pass NULL and we don't need NULL there, do we? > > > On Mon, Nov 2, 2020 at 11:33 PM Sergei Golubchik <s...@mariadb.org> wrote: > > > > > > > diff --git a/sql/field.cc b/sql/field.cc > > > > index bdaaecc20269..0fd40c979d2c 100644 > > > > --- a/sql/field.cc > > > > +++ b/sql/field.cc > > > > @@ -8310,6 +8310,7 @@ int Field_blob::store(const char *from,uint > > > > length,CHARSET_INFO *cs) > > > > copy_length= copier.well_formed_copy(field_charset, > > > > (char*) value.ptr(), new_length, > > > > cs, from, length); > > > > + value.length(copy_length); > > > > > > good! Could this be a distinct bug with its own test case? > > > > Do we need to spend time on that bureaucracy? There are real problems to > > solve... Tickets drain time from the management people and generally from > > anyone who sees them. > > I didn't mean opening an MDEV. Only writing a test case and extracting > it into a separate commit. > > If this is a genuine bug fix and you don't create a test case - how can > you be sure the fix will stay? It may disappear in a merge or a rebase, > manually or automatically. Or in a refactoring that "looks fine, all > tests pass". If you can create a test case - please, do. It is tested by federated.federated_partition test case. Why do we have to extract this side-fix to a separate commit? Looks anyway as a bureaucracy to me. The main bug is critical, the test case tests both bugs nicely. If the partitioning code is refactored, the test case still tests that side-fix. > > > > > Field_blob::store_length(copy_length); > > > > bmove(ptr+packlength,(uchar*) &tmp,sizeof(char*)); > > > > > > > > diff --git a/sql/ha_partition.cc b/sql/ha_partition.cc > > > > index 9d82df0235c7..7c658082d397 100644 > > > > --- a/sql/ha_partition.cc > > > > +++ b/sql/ha_partition.cc > > > > @@ -5125,14 +5128,14 @@ bool ha_partition::init_record_priority_queue() > > > > uint alloc_len; > > > > uint used_parts= bitmap_bits_set(&m_part_info->read_partitions); > > > > /* Allocate record buffer for each used partition. */ > > > > - m_priority_queue_rec_len= m_rec_length + PARTITION_BYTES_IN_POS; > > > > + m_priority_queue_rec_len= m_rec_length + ORDERED_REC_OFFSET; > > > > if (!m_using_extended_keys) > > > > m_priority_queue_rec_len += m_file[0]->ref_length; > > > > alloc_len= used_parts * m_priority_queue_rec_len; > > > > /* Allocate a key for temporary use when setting up the scan. */ > > > > alloc_len+= table_share->max_key_length; > > > > > > > > - if (!(m_ordered_rec_buffer= (uchar*)my_malloc(alloc_len, > > > > MYF(MY_WME)))) > > > > + if (!(m_ordered_rec_buffer= (uchar*) alloc_root(&m_ordered_root, > > > > alloc_len))) > > > > > > I don't see why you need a memroot here. memroot is needed when you > > > allocate later at difeerent points in time some initially unpredictable > > > amount of memory that has the same lifetime and needs to be > > > freed all at once. > > > > I know... This is just another convenient way of allocation with presumably > > minimized number of mallocs (probably pre_alloc_size should be tweaked to a > > better value). > > memroot wasn't created for that. There are approaches with much less > overhead Using memory pools like that is a nice design pattern for non-critical sections. At least it is nicer than manual pointer management. I believe I've seen the similar pattern in InnoDB with its memory pool mem_heap_t even without preallocation! It would be nice to have a special class that preallocates and manages pointers, but I do not see the reason to develop that for this specific use case. I don't believe in *much* overhead: mem_root is a simplest algorithm, a couple of ifs and one loop. Initialization call init_record_priority_queue() is not a critical section, the major execution is mostly busy by a data copy in handle_ordered_index_scan(), handle_ordered_next(). Let's see gprof stats collected for 100k SELECTs attached in stats.txt init_record_priority_queue() takes 0.3%; handle_ordered_index_scan() takes 2.3%; handle_ordered_next() takes 12.3%. Let's see unpatched stats for the same 100k SELECTs in stats_vanilla.txt init_record_priority_queue() takes 0.1%; handle_ordered_index_scan() takes 2.2%; handle_ordered_next() takes 11.4%. My version of init_record_priority_queue() lost 0.2%. From details we see that mem_root took 6 out of 13 which is 0.15%. One should keep in mind this is an artificial usage of the prio queue. In diversified real-world usage the share of such kind SELECT is say 0.1%. But even if it is 1% we divide our stats by 100. So, do you really care about 0.0015%? > > > > > > > But here it seems that you only allocate once. You could just make > > > > > > m_priority_queue_rec_len = m_rec_length + PARTITION_BYTES_IN_POS + > > > table->s->blob_fields * sizeof(Ordered_blob_storage) > > > > I don't wnat this data serialized. It is inconvenient to read the dump of > > the record. Why we should make it harder to code, read the code and debug > > while we can make it easier? If it is only for guaranteed 1 malloc, I don't > > think it's worth it. > > What do you mean "serialized"? A sort of memory organization when data is placed into a single contiguous buffer. > What do you mean "read the dump"? x command in GDB. > > Note that you can also use my_multi_malloc, that allocates many buffers > together and hides address arithmetcs from the caller. Like > > my_multi_malloc(MYF(MY_WME), > &m_ordered_rec_buffer, m_rec_length + PARTITION_BYTES_IN_POS, > &blob_storage, table->s->blob_fields * sizeof(Ordered_blob_storage), > NULL) > > (but with error checking). And then, perhaps, with placement new[] > you could construct all Ordered_blob_storage objects in one line. Sorry, I don't understand how my version is much worse. > > > > > @@ -6178,7 +6203,11 @@ int > > > > ha_partition::handle_ordered_index_scan(uchar *buf, bool reverse_order) > > > > */ > > > > error= file->read_range_first(m_start_key.key? &m_start_key: > > > > NULL, > > > > end_range, eq_range, TRUE); > > > > - memcpy(rec_buf_ptr, table->record[0], m_rec_length); > > > > + if (!error) > > > > + { > > > > + memcpy(rec_buf_ptr, table->record[0], m_rec_length); > > > > + } > > > > > > did you have a bug because of that? something didn't work? > > > > No, I just saw worthless memcpy and avoided it. > > You're right that memcopy is unnecessary in case of an error. > But most of the time there is no error. You made an exceptional code path > faster by making a common code path slower. I think it's generally > better to do the opposite, making the common, no-error code path > faster, fewer conditionals, in particular. At the cost of making the error > path slower, if needed. I'm usually aware of such things in my decisions. The slowdown is close to 0, it is not even 0.0001%, it is much lower. OTOH memcpy() slowdown is much higher and this is not a fully exceptional code path: HA_ERR_KEY_NOT_FOUND continues the loop. > > > > > @@ -6310,6 +6349,43 @@ int > > > > ha_partition::handle_ordered_index_scan_key_not_found() > > > > + if (cached) > > > > + { > > > > + cached->swap(s.blob); > > > > + s.set_read_value= set_read_value; > > > > > > is it indeed possible here for a value to be either in `value` or in > > > `read_value` ? > > > > > > When happens what? > > > > Yes, it uses `value` for plain fields and `read_value` for virtual fields. > > Oh, I see. > > This read_value was introduced in ea1b25046c81db8fdf to solve the case > of UPDATE, where a row is moved with store_record(table, record[1]) and > blob pointers aren't updated. This is very similar to what you're > fixing, perhaps read_value should go away and both bugs should have a > one unified solution that allows to "park" a record somewhere out of > record[0] and restore later, all with proper blob handling? Do you think this is necessary now? I believe there are a lot of *much* more important tasks. > > > > > > > > + } > > > > + } > > > > + } > > > > + table->move_fields(table->field, table->record[0], rec_buf); > > > > +} > > Regards, > Sergei > VP of MariaDB Server Engineering > and secur...@mariadb.org -- All the best, Aleksey Midenkov @midenok
0.02 0.13 100000/100000 ha_partition::index_init(unsigned int, bool) <cycle 3> [262] [289] 0.3 0.02 0.13 100000 ha_partition::init_record_priority_queue() [289] 0.00 0.04 800000/5602760 Sql_alloc::operator new(unsigned long, st_mem_root*) [176] 0.00 0.02 500000/15013727 alloc_root [109] 0.00 0.02 800000/800000 Ordered_blob_storage::Ordered_blob_storage() [883] 0.00 0.01 100000/100000 init_prealloc_root [1101] 0.00 0.01 100000/100003 init_queue [1120] 0.00 0.01 400000/10400882 bitmap_get_next_set [225] 0.00 0.00 100000/1700086 bitmap_get_first_set [691] 0.00 0.00 400000/218144857 _db_pargs_ [86] 0.00 0.00 100000/237505826 _db_return_ [42] 0.00 0.00 100000/237505806 _db_enter_ [43] 0.00 0.00 100000/200000 bitmap_bits_set [4743] 0.05 1.12 100000/100000 ha_partition::common_index_read(unsigned char*, bool) [82] [87] 2.3 0.05 1.12 100000 ha_partition::handle_ordered_index_scan(unsigned char*, bool) [87] 0.02 1.04 400000/400000 handler::read_range_first(st_key_range const*, st_key_range const*, bool, bool) [94] 0.01 0.02 300000/5800000 ha_partition::swap_blobs(unsigned char*, Ordered_blob_storage**, bool) [125] 0.00 0.01 100000/2900000 ha_partition::return_top_record(unsigned char*) [195] 0.00 0.01 399999/10400882 bitmap_get_next_set [225] 0.00 0.00 299999/5799999 ha_myisam::position(unsigned char const*) [456] 0.00 0.00 100000/100000 queue_fix [1668] 0.00 0.00 600000/218144857 _db_pargs_ [86] 0.00 0.00 100000/1700086 bitmap_get_first_set [691] 0.00 0.00 100000/237505826 _db_return_ [42] 0.00 0.00 100000/237505806 _db_enter_ [43] 0.00 0.00 100000/20700271 bitmap_is_set [766] 0.06 6.19 2899999/2899999 ha_partition::read_range_next() [26] [27] 12.3 0.06 6.19 2899999 ha_partition::handle_ordered_next(unsigned char*, bool) [27] 0.10 5.26 2900000/2900000 handler::read_range_next() [36] 0.02 0.26 2800000/2900000 ha_partition::return_top_record(unsigned char*) [195] 0.07 0.17 2600000/5800000 ha_partition::swap_blobs(unsigned char*, Ordered_blob_storage**, bool) [125] 0.07 0.03 2600000/2700000 _downheap [397] 0.03 0.04 300000/300000 queue_remove [486] 0.02 0.03 2900000/237505826 _db_return_ [42] 0.02 0.03 2899999/237505806 _db_enter_ [43] 0.02 0.02 2600000/5799999 ha_myisam::position(unsigned char const*) [456] 0.01 0.01 2800000/218144857 _db_pargs_ [86]
0.01 0.04 100000/100000 ha_partition::index_init(unsigned int, bool) <cycle 3> [417] [544] 0.1 0.01 0.04 100000 ha_partition::init_record_priority_queue() [544] 0.01 0.01 100000/100003 init_queue [853] 0.01 0.00 400000/10000879 bitmap_get_next_set [246] 0.00 0.01 100000/4010970 my_malloc [187] 0.00 0.00 400000/215344917 _db_pargs_ [84] 0.00 0.00 100000/236005936 _db_enter_ [42] 0.00 0.00 100000/236005953 _db_return_ [45] 0.00 0.00 100000/1600086 bitmap_get_first_set [847] 0.00 0.00 100000/200000 bitmap_bits_se 0.03 1.06 100000/100000 ha_partition::common_index_read(unsigned char*, bool) [90] [92] 2.2 0.03 1.06 100000 ha_partition::handle_ordered_index_scan(unsigned char*, bool) [92] 0.00 1.04 400000/400000 handler::read_range_first(st_key_range const*, st_key_range const*, bool, bool) [98] 0.01 0.00 400000/10000879 bitmap_get_next_set [246] 0.00 0.00 300000/5800000 ha_myisam::position(unsigned char const*) [416] 0.00 0.00 600000/215344917 _db_pargs_ [84] 0.00 0.00 100000/100000 queue_fix [1724] 0.00 0.00 100000/236005936 _db_enter_ [42] 0.00 0.00 100000/236005953 _db_return_ [45] 0.00 0.00 100000/1600086 bitmap_get_first_set [847] 0.00 0.00 100000/2900000 ha_partition::return_top_record(unsigned char*) [1207] 0.00 0.00 100000/9100265 bitmap_is_set [870] 0.13 5.46 2900000/2900000 ha_partition::read_range_next() [29] [30] 11.4 0.13 5.46 2900000 ha_partition::handle_ordered_next(unsigned char*, bool) [30] 0.05 5.19 2900000/2900000 handler::read_range_next() [35] 0.03 0.03 2600000/2700000 _downheap [516] 0.02 0.03 2900000/236005936 _db_enter_ [42] 0.01 0.03 2900000/236005953 _db_return_ [45] 0.00 0.03 2600000/5800000 ha_myisam::position(unsigned char const*) [416] 0.01 0.01 300000/300000 queue_remove [834] 0.01 0.01 2800000/215344917 _db_pargs_ [84] 0.01 0.00 2800000/2900000 ha_partition::return_top_record(unsigned char*) [1207]
_______________________________________________ Mailing list: https://launchpad.net/~maria-developers Post to : maria-developers@lists.launchpad.net Unsubscribe : https://launchpad.net/~maria-developers More help : https://help.launchpad.net/ListHelp