> -----Original Message-----
> From: Hyrum K Wright [mailto:hy...@hyrumwright.org]
> Sent: maandag 16 mei 2011 11:18
> To: Bert Huijben
> Cc: Branko Čibej; dev@subversion.apache.org
> Subject: Re: SQLite and the LIKE operator
> 
> On Mon, May 16, 2011 at 8:28 AM, Bert Huijben <b...@qqmail.nl> wrote:
> >
> >
> >> -----Original Message-----
> >> From: Hyrum K Wright [mailto:hy...@hyrumwright.org]
> >> Sent: maandag 16 mei 2011 9:39
> >> To: Branko Čibej
> >> Cc: dev@subversion.apache.org
> >> Subject: Re: SQLite and the LIKE operator
> >>
> >> 2011/5/16 Branko Čibej <br...@e-reka.si>:
> >> > On 16.05.2011 03:13, Hyrum K Wright wrote:
> >> >> Several places in wc_db we use the following pattern to select all
> >> >> nodes with a common tree ancestor:
> >> >>  WHERE wc_id = ?1 AND (local_relpath = ?2 OR local_relpath LIKE ?3
> >> ESCAPE '#')
> >> >>
> >> >> While this works, there was some concern about whether or not
> SQLite
> >> >> was using the proper indicies when executing this query.  By examining
> >> >> the output for 'EXPLAIN QUERY PLAN' on some of the relevant SELECT
> >> >> statements, I believe it does use the indicies as intended.
> >> >>
> >> >> However, I stumbled across an alternate implementation which I
> believe
> >> >> has some merit.  Instead of the above clause, we could use:
> >> >>  WHERE wc_id = ?1 AND substr(local_relpath, 1, length(?2)) = ?2
> >
> > This also needs a table scan, as SQLite can't look through the substr to 
> > find
> that it can use the index for the result.
> 
> The SQLite query analyzer states that this executes a SEARCH, not a
> SCAN, which indicates the use of the index.
> 
> > My guess is that
> > WHERE wc_id = ?1 AND ((local_relpath = ?2) OR (local_relpath > ?2 || '/'
> AND local_relpath < ?2 || '0'))
> > is most likely the most efficient form we can get in SQLite as the constant
> string directly map to an index, but we should really create a few tests to
> verify these guesses.
> 
> I haven't done any tests, either, but I'm interested in an expression
> which doesn't require us to construct an additional parameter to the
> SQL query in C.
> 
> > I'm going to the Elego office now. See you there? :)

Okay, the numbers are in... I wrote a test on a DB with more than 1 million 
nodes.
1024 directories, with 1024 files. 1024 files added to the root dir and before 
I started there was the normal greek tree, so that is in the same db

-- STMT_SELECT_LIKE_ESCAPED
 SELECT local_relpath FROM NODES
 WHERE wc_id=?1 AND op_depth=0
   AND ((local_relpath=?2)
        OR (local_relpath LIKE ?3 ESCAPE '#'))

-- STMT_SELECT_LIKE_NO_ESCAPE
 SELECT local_relpath FROM NODES
 WHERE wc_id=?1 AND op_depth=0
   AND ((local_relpath=?2)
        OR (local_relpath LIKE ?3))

-- STMT_SELECT_SUBSTR
SELECT local_relpath FROM NODES
 WHERE wc_id=?1 AND op_depth=0
   AND ((local_relpath=?2)
        OR (SUBSTR(local_relpath, 1, LENGTH(?2)+1) = ?2 || '/'))

-- STMT_SELECT_BETWEEN
SELECT local_relpath FROM NODES
 WHERE wc_id=?1 AND op_depth=0
   AND ((local_relpath=?2)
        OR (local_relpath > ?2 || '/' AND local_relpath < ?2 || '0'))

-- STMT_SELECT_BETWEEN2
SELECT local_relpath FROM NODES
 WHERE wc_id=?1 AND op_depth=0
   AND ((local_relpath=?2)
        OR (local_relpath > ?3 AND local_relpath < ?4))

All the queries returned the same 1025 answers when I looked for all nodes at 
and below "300".

I ran the tests 100 times to remove some jitter and the results are (in 
microseconds, on my Windows 7 notebook, SQLite 3.7.5):
STMT_SELECT_LIKE_ESCAPED:  743252
STMT_SELECT_LIKE_NO_ESCAPE: 683169
STMT_SELECT_SUBSTR: 885900
STMT_SELECT_BETWEEN: 555341
STMT_SELECT_BETWEEN2: 557631

So the select with > and < is the fastest query style. (Not surprising as 
SQLite in some cases optimizes like to this).

Generating the compare keys in SQLite is slightly faster than using static c 
strings. So creating them in cwould be even slower.

I added the rough patch I used to create these results to this mail. (I don't 
intend to commit it in this style)

        Bert
> 
> Yes. :)
> 
> -Hyrum
Index: dev/build.conf
===================================================================
--- dev/build.conf      (revision 1103708)
+++ dev/build.conf      (working copy)
@@ -396,6 +396,11 @@ type = sql-header
 path = subversion/libsvn_subr
 sources = internal_statements.sql
 
+[db_test_statements]
+description = Some test sql for the db tests
+type = sql-header
+path = subversion/tests/libsvn_wc
+sources = db_test_statements.sql
 
 # ----------------------------------------------------------------------------
 #
Index: dev/subversion/tests/libsvn_wc/db-test.c
===================================================================
--- dev/subversion/tests/libsvn_wc/db-test.c    (revision 1103708)
+++ dev/subversion/tests/libsvn_wc/db-test.c    (working copy)
@@ -47,7 +47,6 @@
 
 #include "../svn_test.h"
 
-
 #define ROOT_ONE "http://example.com/one";
 #define ROOT_TWO "http://example.com/two";
 #define ROOT_THREE "http://example.com/three";
@@ -318,7 +317,13 @@ static const char * const TESTING_DATA = (
 
 WC_QUERIES_SQL_DECLARE_STATEMENTS(statements);
 
+#define SVN_DB_TESTS__SQLITE_PERFORMANCE
+#ifdef SVN_DB_TESTS__SQLITE_PERFORMANCE
+#include "db_test_statements.h"
+DB_TEST_STATEMENTS_SQL_DECLARE_STATEMENTS(test_statements);
+#endif
 
+
 static svn_error_t *
 create_fake_wc(const char *subdir, int format, apr_pool_t *scratch_pool)
 {
@@ -1579,6 +1584,214 @@ test_externals_store(apr_pool_t *pool)
   return SVN_NO_ERROR;
 }
 
+static svn_error_t *
+insert_1M_paths(void *baton, svn_sqlite__db_t *sdb, apr_pool_t *scratch_pool)
+{
+  svn_sqlite__stmt_t *stmt;
+  apr_pool_t *iterpool = svn_pool_create(scratch_pool);
+  apr_int64_t wc_id = 1;
+  int i;
+  
+  SVN_ERR(svn_sqlite__get_statement(&stmt, sdb, STMT_INSERT_STUPID_NODE));
+
+  for (i = 0; i < 1024; i++)
+    {
+      const char *dir_relpath;
+      const char *file_relpath;
+      int j;
+
+      svn_pool_clear(iterpool);
+      dir_relpath = apr_psprintf(iterpool, "%d", i);
+      file_relpath = apr_psprintf(iterpool, "%d_file", i);
+
+      SVN_ERR(svn_sqlite__bindf(stmt, "iss",
+                                wc_id,
+                                dir_relpath,
+                                ""));
+      SVN_ERR(svn_sqlite__insert(NULL, stmt));
+
+      SVN_ERR(svn_sqlite__bindf(stmt, "iss",
+                                wc_id,
+                                file_relpath,
+                                ""));
+      SVN_ERR(svn_sqlite__insert(NULL, stmt));
+
+      for (j = 0; j < 1024; j++)
+        {
+          const char *item_name = apr_psprintf(iterpool, "%d", j);
+          const char *item_relpath = svn_relpath_join(dir_relpath,
+                                                      item_name, iterpool);
+
+          SVN_ERR(svn_sqlite__bindf(stmt, "iss",
+                                    wc_id,
+                                    item_relpath,
+                                    dir_relpath));
+
+          SVN_ERR(svn_sqlite__insert(NULL, stmt));
+        }
+    }
+  return SVN_NO_ERROR;
+}
+
+static svn_error_t *
+test_sqlite_select_performance(apr_pool_t *pool)
+{
+  svn_wc__db_t *db;
+  svn_sqlite__db_t *sdb;
+  svn_sqlite__stmt_t *stmt;
+  const char *local_abspath;
+  apr_int64_t wc_id=1;
+  apr_time_t t_init, t_inserted, t_0, t_1, t_2, t_3, t_4, t_5;
+  apr_int64_t s_1=0, s_2=0, s_3=0, s_4=0, s_5=0;
+  int i;
+
+  SVN_ERR(create_open(&db, &local_abspath,
+                      "sqlite_select_performance", SVN_WC__VERSION, pool));
+
+  /*SVN_ERR(svn_wc__db_init(db, local_abspath, "", "htp://q", "uui-d", 0,
+                          svn_depth_infinity, pool));*/
+
+  SVN_DBG(("Path = %s\n", svn_dirent_local_style(local_abspath, pool)));
+  SVN_ERR(svn_sqlite__open(&sdb,
+                           svn_dirent_join(local_abspath, ".svn/wc.db", pool),
+                           svn_sqlite__mode_readwrite, test_statements,
+                           0, NULL, pool, pool));
+
+  apr_time_clock_hires(pool);
+
+  t_init = apr_time_now();
+  SVN_ERR(svn_sqlite__with_lock(sdb, insert_1M_paths, NULL, pool));
+
+  t_inserted = apr_time_now();
+
+  for(i = 0; i < 100; i++)
+  {
+    t_0 = apr_time_now();
+  {
+    int n = -1;
+    svn_boolean_t have_row;
+
+    SVN_ERR(svn_sqlite__get_statement(&stmt, sdb,
+                                      STMT_SELECT_LIKE_ESCAPED));
+    SVN_ERR(svn_sqlite__bindf(stmt, "iss", wc_id, "300", "300/%"));
+
+    do
+    {
+      n++;
+      SVN_ERR(svn_sqlite__step(&have_row, stmt));
+    }
+    while (have_row);
+
+    SVN_TEST_ASSERT(n == 1025);
+    SVN_ERR(svn_sqlite__reset(stmt));
+  }
+  t_1 = apr_time_now();
+
+  {
+    int n = -1;
+    svn_boolean_t have_row;
+
+    SVN_ERR(svn_sqlite__get_statement(&stmt, sdb,
+                                      STMT_SELECT_LIKE_NO_ESCAPE));
+    SVN_ERR(svn_sqlite__bindf(stmt, "iss", wc_id, "300", "300/%"));
+
+    do
+    {
+      n++;
+      SVN_ERR(svn_sqlite__step(&have_row, stmt));
+    }
+    while (have_row);
+
+    SVN_TEST_ASSERT(n == 1025);
+    SVN_ERR(svn_sqlite__reset(stmt));
+  }
+  t_2 = apr_time_now();
+
+  {
+    int n = -1;
+    svn_boolean_t have_row;
+
+    SVN_ERR(svn_sqlite__get_statement(&stmt, sdb,
+                                      STMT_SELECT_SUBSTR));
+    SVN_ERR(svn_sqlite__bindf(stmt, "is", wc_id, "300"));
+
+    do
+    {
+      n++;
+      SVN_ERR(svn_sqlite__step(&have_row, stmt));
+    }
+    while (have_row);
+
+    SVN_TEST_ASSERT(n == 1025);
+    SVN_ERR(svn_sqlite__reset(stmt));
+  }
+  t_3 = apr_time_now();
+
+  {
+    int n = -1;
+    svn_boolean_t have_row;
+
+    SVN_ERR(svn_sqlite__get_statement(&stmt, sdb,
+                                      STMT_SELECT_BETWEEN));
+    SVN_ERR(svn_sqlite__bindf(stmt, "is", wc_id, "300"));
+
+    do
+    {
+      n++;
+      SVN_ERR(svn_sqlite__step(&have_row, stmt));
+    }
+    while (have_row);
+
+    SVN_TEST_ASSERT(n == 1025);
+    SVN_ERR(svn_sqlite__reset(stmt));
+  }
+  t_4 = apr_time_now();
+
+  {
+    int n = -1;
+    svn_boolean_t have_row;
+
+    SVN_ERR(svn_sqlite__get_statement(&stmt, sdb,
+                                      STMT_SELECT_BETWEEN2));
+    SVN_ERR(svn_sqlite__bindf(stmt, "isss", wc_id, "300", "300/", "3000"));
+
+    do
+    {
+      n++;
+      SVN_ERR(svn_sqlite__step(&have_row, stmt));
+    }
+    while (have_row);
+
+    SVN_TEST_ASSERT(n == 1025);
+    SVN_ERR(svn_sqlite__reset(stmt));
+  }
+  t_5 = apr_time_now();
+
+  s_1 += (t_1 - t_0);
+  s_2 += (t_2 - t_1);
+  s_3 += (t_3 - t_2);
+  s_4 += (t_4 - t_3);
+  s_5 += (t_5 - t_4);
+
+  SVN_DBG(("%d: %d, %d, %d, %d, %d\n", (int)(t_inserted - t_init),
+                                   (int)(t_1 - t_0),
+                                   (int)(t_2 - t_1),
+                                   (int)(t_3 - t_2),
+                                   (int)(t_4 - t_3),
+                                   (int)(t_5 - t_4)));
+  }
+
+  SVN_DBG(("AVG: %d, %d, %d, %d, %d\n",
+                                   (int)(s_1 / 100),
+                                   (int)(s_2 / 100),
+                                   (int)(s_3 / 100),
+                                   (int)(s_4 / 100),
+                                   (int)(s_5 / 100)));
+
+  return SVN_NO_ERROR;
+}
+
+
 struct svn_test_descriptor_t test_funcs[] =
   {
     SVN_TEST_NULL,
@@ -1604,5 +1817,9 @@ struct svn_test_descriptor_t test_funcs[] =
                    "work queue processing"),
     SVN_TEST_PASS2(test_externals_store,
                    "externals store"),
+#ifdef SVN_DB_TESTS__SQLITE_PERFORMANCE
+    SVN_TEST_PASS2(test_sqlite_select_performance,
+                   "sqlite select performance scenarios"),
+#endif
     SVN_TEST_NULL
   };
Index: dev/subversion/tests/libsvn_wc/db_test_statements.sql
===================================================================
--- dev/subversion/tests/libsvn_wc/db_test_statements.sql       (revision 0)
+++ dev/subversion/tests/libsvn_wc/db_test_statements.sql       (revision 0)
@@ -0,0 +1,57 @@
+/* db_test_statements.sql -- queries use by the WC-DB test application
+ *     This is intended for use with SQLite 3
+ *
+ * ====================================================================
+ *    Licensed to the Apache Software Foundation (ASF) under one
+ *    or more contributor license agreements.  See the NOTICE file
+ *    distributed with this work for additional information
+ *    regarding copyright ownership.  The ASF licenses this file
+ *    to you under the Apache License, Version 2.0 (the
+ *    "License"); you may not use this file except in compliance
+ *    with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ *    Unless required by applicable law or agreed to in writing,
+ *    software distributed under the License is distributed on an
+ *    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ *    KIND, either express or implied.  See the License for the
+ *    specific language governing permissions and limitations
+ *    under the License.
+ * ====================================================================
+ */
+
+-- STMT_INSERT_STUPID_NODE
+INSERT INTO NODES
+  (wc_id, local_relpath, parent_relpath, presence, op_depth, kind)
+VALUES (?1, ?2, ?3, 'normal', 0, 'dir')
+
+-- STMT_SELECT_LIKE_ESCAPED
+ SELECT local_relpath FROM NODES
+ WHERE wc_id=?1 AND op_depth=0
+   AND ((local_relpath=?2)
+        OR (local_relpath LIKE ?3 ESCAPE '#'))
+
+-- STMT_SELECT_LIKE_NO_ESCAPE
+ SELECT local_relpath FROM NODES
+ WHERE wc_id=?1 AND op_depth=0
+   AND ((local_relpath=?2)
+        OR (local_relpath LIKE ?3))
+
+-- STMT_SELECT_SUBSTR
+SELECT local_relpath FROM NODES
+ WHERE wc_id=?1 AND op_depth=0
+   AND ((local_relpath=?2)
+        OR (SUBSTR(local_relpath, 1, LENGTH(?2)+1) = ?2 || '/'))
+
+-- STMT_SELECT_BETWEEN
+SELECT local_relpath FROM NODES
+ WHERE wc_id=?1 AND op_depth=0
+   AND ((local_relpath=?2)
+        OR (local_relpath > ?2 || '/' AND local_relpath < ?2 || '0'))
+
+-- STMT_SELECT_BETWEEN2
+SELECT local_relpath FROM NODES
+ WHERE wc_id=?1 AND op_depth=0
+   AND ((local_relpath=?2)
+        OR (local_relpath > ?3 AND local_relpath < ?4))
\ No newline at end of file
Index: dev/subversion/include/private/svn_sqlite.h
===================================================================
--- dev/subversion/include/private/svn_sqlite.h (revision 1103708)
+++ dev/subversion/include/private/svn_sqlite.h (working copy)
@@ -113,7 +113,7 @@ svn_sqlite__set_schema_version(svn_sqlite__db_t *d
    The statements will be finalized and the SQLite database will be closed
    when RESULT_POOL is cleaned up. */
 svn_error_t *
-svn_sqlite__open(svn_sqlite__db_t **db, const char *repos_path,
+svn_sqlite__open(svn_sqlite__db_t **db, const char *path,
                  svn_sqlite__mode_t mode, const char * const statements[],
                  int latest_schema, const char * const *upgrade_sql,
                  apr_pool_t *result_pool, apr_pool_t *scratch_pool);
Index: dev/subversion/libsvn_client/externals.c
Index: dev/build/transform_sql.py
===================================================================
--- dev/build/transform_sql.py  (revision 1103708)
+++ dev/build/transform_sql.py  (working copy)
@@ -53,31 +53,31 @@ class Processor(object):
   re_include = re.compile('-- *include: *([-a-z]+)')
   re_define = re.compile('-- *define: *([A-Z_0-9]+)')
 
-  def _sub_format(self, match):
+  def _sub_format(self, match, prefix):
     vsn = match.group(1)
 
     self.close_define()
     self.output.write('#define %s_%s \\\n' % (self.var_name, match.group(1)))
     self.var_printed = True
 
-  def _sub_statement(self, match):
+  def _sub_statement(self, match, prefix):
     name = match.group(1)
 
     self.close_define()
     self.output.write('#define STMT_%s %d\n' % (match.group(1),
                                                 self.stmt_count))
-    self.output.write('#define STMT_%d \\\n' % (self.stmt_count,))
+    self.output.write('#define STMT_%s_%d \\\n' % (prefix, self.stmt_count,))
     self.var_printed = True
 
     self.stmt_count += 1
 
-  def _sub_include(self, match):
+  def _sub_include(self, match, prefix):
     filepath = os.path.join(self.dirpath, match.group(1) + '.sql')
 
     self.close_define()
-    self.process_file(open(filepath).read())
+    self.process_file(open(filepath).read(), prefix)
 
-  def _sub_define(self, match):
+  def _sub_define(self, match, prefix):
     define = match.group(1)
 
     self.output.write('  APR_STRINGIFY(%s) \\\n' % define)
@@ -97,7 +97,7 @@ class Processor(object):
         self.re_define      : self._sub_define,
       }
 
-  def process_file(self, input):
+  def process_file(self, input, prefix):
     input = self.re_comments.sub('', input)
 
     for line in input.split('\n'):
@@ -109,7 +109,7 @@ class Processor(object):
         for regex, handler in self._directives.iteritems():
           match = regex.match(line)
           if match:
-            handler(match)
+            handler(match, prefix)
             handled = True
             break
 
@@ -141,6 +141,8 @@ def main(input_filepath, output):
 
   var_name = re.sub('[-.]', '_', filename).upper()
 
+  prefix = re.sub('[^A-Z]','', var_name)
+
   output.write(
     '/* This file is automatically generated from %s.\n'
     ' * Do not edit this file -- edit the source and rerun gen-make.py */\n'
@@ -148,7 +150,7 @@ def main(input_filepath, output):
     % (filename,))
 
   proc = Processor(os.path.dirname(input_filepath), output, var_name)
-  proc.process_file(input)
+  proc.process_file(input, prefix)
 
   ### the STMT_%d naming precludes *multiple* transform_sql headers from
   ### being used within the same .c file. for now, that's more than fine.
@@ -159,7 +161,8 @@ def main(input_filepath, output):
     output.write(
       '#define %s_DECLARE_STATEMENTS(varname) \\\n' % (var_name,)
       + '  static const char * const varname[] = { \\\n'
-      + ', \\\n'.join('    STMT_%d' % (i,) for i in range(proc.stmt_count))
+      + ', \\\n'.join('    STMT_%s_%d' % (prefix, i,)
+                      for i in range(proc.stmt_count))
       + ', \\\n    NULL \\\n  }\n')
 
 

Reply via email to