Hi, Julian,

You wrote:

> Hi Markus.  I just noticed there doesn't seem to be a patch attached to your 
> last email.
I did re-send the mail with the patch attached, but maybe that duplicate 
got filtered by some (mental) spam filters.

Here's it again:

Summary of test results:
  1619 tests PASSED
  85 tests SKIPPED
  38 tests XFAILED (1 WORK-IN-PROGRESS)

[[
Optimize merge_file_trivial() by avoiding to read the files twice by using a
new comparison function which compares 3 files at once.

* subversion/include/svn_io.h
  (svn_io_filesizes_three_different_p): Add new declaration
  (svn_io_files_contents_three_same_p): Add new declaration

* subversion/libsvn_subr/io.c
  (svn_io_filesizes_three_different_p): Add new function in analogy to 
svn_io_filesizes_different_p().
  (contents_three_identical_p): Add new function in analogy to 
contents_identical_p().
  (svn_io_files_contents_three_same_p): Add new function in analogy to 
svn_io_files_contents_same_p.

* subversion/libsvn_wc/merge.c
  (merge_file_trivial): Use the new three-file comparison functions to avoid 
reading files twice.

Patch by: Markus Schaber <m.scha...@3s-software.com>
]]

I also intend to add test cases for the new as well as the existing filesizes 
and files_contents functions, but it seems I could need some help to get 
kickstarted.

Best regards

Markus Schaber
--

We software Automation.

3S-Smart Software Solutions GmbH
Markus Schaber | Developer
Memminger Str. 151 | 87439 Kempten | Germany | Tel. +49-831-54031-0 | Fax 
+49-831-54031-50

Email: m.scha...@3s-software.com | Web: http://www.3s-software.com
CoDeSys internet forum: http://forum.3s-software.com
Download CoDeSys sample projects: 
http://www.3s-software.com/index.shtml?sample_projects

Managing Directors: Dipl.Inf. Dieter Hess, Dipl.Inf. Manfred Werner | Trade 
register: Kempten HRB 6186 | Tax ID No.: DE 167014915

________________________________________
Von: Julian Foad [julianf...@btopenworld.com]
Gesendet: Freitag, 15. Juni 2012 11:14
Bis: Markus Schaber
Cc: dev@subversion.apache.org
Betreff: Re: AW: [PATCH]: Optimize merge_file_trivial()

Markus Schaber wrote:

> Here's the second round. I hope I did catch all issues.

Hi Markus.  I just noticed there doesn't seem to be a patch attached to your 
last email.

> As there were no comments about the names and the parameter order, I guess
> they're okay.

I think they are OK.

- Julian


> The tests still look well:
>
> Summary of test results:
>   1619 tests PASSED
>   85 tests SKIPPED
>   38 tests XFAILED (1 WORK-IN-PROGRESS)
>
> [[
> Optimize merge_file_trivial() by avoiding to read the files twice by using a
> new comparison function which compares 3 files at once.
>
> * subversion/include/svn_io.h
>   (svn_io_filesizes_three_different_p): Add new declaration
>   (svn_io_files_contents_three_same_p): Add new declaration
>
> * subversion/libsvn_subr/io.c
>   (svn_io_filesizes_three_different_p): Add new function in analogy to
> svn_io_filesizes_different_p().
>   (contents_three_identical_p): Add new function in analogy to
> contents_identical_p().
>   (svn_io_files_contents_three_same_p): Add new function in analogy to
> svn_io_files_contents_same_p.
>
> * subversion/libsvn_wc/merge.c
>   (merge_file_trivial): Use the new three-file comparison functions to avoid
> reading files twice.
>
> Patch by: Markus Schaber <m.scha...@3s-software.com>
> ]]
>
>
> Best regards
>
> Markus Schaber
> --
>
> We software Automation.
>
> 3S-Smart Software Solutions GmbH
> Markus Schaber | Developer
> Memminger Str. 151 | 87439 Kempten | Germany | Tel. +49-831-54031-0 | Fax
> +49-831-54031-50
>
> Email: m.scha...@3s-software.com | Web: http://www.3s-software.com
> CoDeSys internet forum: http://forum.3s-software.com
> Download CoDeSys sample projects:
> http://www.3s-software.com/index.shtml?sample_projects
>
> Managing Directors: Dipl.Inf. Dieter Hess, Dipl.Inf. Manfred Werner | Trade
> register: Kempten HRB 6186 | Tax ID No.: DE 167014915
>
> ________________________________________
> Von: Julian Foad [julianf...@btopenworld.com]
> Gesendet: Mittwoch, 13. Juni 2012 16:52
> Bis: Markus Schaber
> Cc: dev@subversion.apache.org
> Betreff: Re: AW: [PATCH]: Optimize merge_file_trivial() (Was: Fix issue #4128)
>
> Markus Schaber wrote:
>
>>>  Please use 'TRUE' and 'FALSE' for Boolean values in the
> code and in the doc strings, not zero and non-zero.
>>
>>  I did just copy this from the 2-file functions that existed. I'll
> update the patch accordingly.
>
>
> Ah, I see.  I've fixed those in r1349889.
>
> As for auto source formatting, I have settings for the Vim text editor, but I
> don't know about VS or a stand-alone tool.
>
> - Julian
>
>
>>>  Please replace 'rsp' with 'and' or 'or' in the
> doc strings.  See
> <http://www.transblawg.eu/index.php?/archives/870-Resp.-and-other-non-existent-English-wordsNicht-existente-englische-Woerter.html> 
> :-)
>>
>>>  Please don't initialize variables that are going to be
> unconditionally initialized later: variables 'file1_h',
> 'file2_h', 'file3_h' in contents_three_identical_p().
>>
>>  I'll fix both issues, too.
>>
>>>  Please make all the indentation consistent, and put operators on the
> beginning of each continuation line indented to just inside the relevant 
> opening
> parenthesis, so instead of this:
>>
>>  Is there some reformatter script or (even better) visual studio plugin that
> implements all the formatting rules?
>>
>>  Best regards
>>
>>  Markus Schaber
>
Index: subversion/include/svn_io.h
===================================================================
--- subversion/include/svn_io.h (revision 1349990)
+++ subversion/include/svn_io.h (working copy)
@@ -607,6 +607,25 @@
                              const char *file2,
                              apr_pool_t *pool);
 
+/** Set @a *different_p12 to non-zero if @a file1 and @a file2 have different
+ * sizes, else set to zero. Do the similar for @a *different_p23 with
+ * @a file2 and @a file3, and @a *different_p13 for @a file1 and @a file3.
+ * All three of @a file1, @a file2 and @a file3 are utf8-encoded.
+ *
+ * Setting @a *different_p12 to zero does not mean the files definitely
+ * have the same size, it merely means that the sizes are not
+ * definitely different.  That is, if the size of one or both files
+ * cannot be determined, then the sizes are not known to be different,
+ * so @a *different_p12 is set to 0.
+ */
+svn_error_t *
+svn_io_filesizes_three_different_p(svn_boolean_t *different_p12,
+                             svn_boolean_t *different_p23,
+                             svn_boolean_t *different_p13,
+                             const char *file1,
+                             const char *file2,
+                             const char *file3,
+                             apr_pool_t *pool);
 
 /** Return in @a *checksum the checksum of type @a kind of @a file
  * Use @a pool for temporary allocations and to allocate @a *checksum.
@@ -642,6 +661,20 @@
                              const char *file2,
                              apr_pool_t *pool);
 
+/** Set @a *same12 to TRUE if @a file1 and @a file2 have the same
+ * contents, else set it to FALSE. Do the similar for @a *same23 
+ * with @a file2 and @a file3, and @a *same13 for @a file1 and @a 
+ * file3. Use @a pool for temporary allocations.
+ */
+svn_error_t *
+svn_io_files_contents_three_same_p(svn_boolean_t *same12,
+                             svn_boolean_t *same23,
+                             svn_boolean_t *same13,
+                             const char *file1,
+                             const char *file2,
+                             const char *file3,
+                             apr_pool_t *pool);
+
 /** Create file at utf8-encoded @a file with contents @a contents.
  * @a file must not already exist.
  * Use @a pool for memory allocations.
Index: subversion/libsvn_subr/io.c
===================================================================
--- subversion/libsvn_subr/io.c (revision 1349990)
+++ subversion/libsvn_subr/io.c (working copy)
@@ -1311,6 +1311,43 @@
 
 
 svn_error_t *
+svn_io_filesizes_three_different_p(svn_boolean_t *different_p12,
+                             svn_boolean_t *different_p23,
+                             svn_boolean_t *different_p13,
+                             const char *file1,
+                             const char *file2,
+                             const char *file3,
+                             apr_pool_t *pool)
+{
+  apr_finfo_t finfo1, finfo2, finfo3;
+  apr_status_t status1, status2, status3;
+  const char *file1_apr, *file2_apr, *file3_apr;
+
+  /* Not using svn_io_stat() because don't want to generate
+     svn_error_t objects for non-error conditions. */
+
+  SVN_ERR(cstring_from_utf8(&file1_apr, file1, pool));
+  SVN_ERR(cstring_from_utf8(&file2_apr, file2, pool));
+  SVN_ERR(cstring_from_utf8(&file3_apr, file3, pool));
+
+  /* Stat all three files */
+  status1 = apr_stat(&finfo1, file1_apr, APR_FINFO_MIN, pool);
+  status2 = apr_stat(&finfo2, file2_apr, APR_FINFO_MIN, pool);  
+  status3 = apr_stat(&finfo3, file3_apr, APR_FINFO_MIN, pool);
+
+  /* If we got an error stat'ing a file, it could be because the
+     file was removed... or who knows.  Whatever the case, we
+     don't know if the filesizes are definitely different, so
+     assume that they're not. */
+  *different_p12 = !status1 && !status2 && finfo1.size != finfo2.size;
+  *different_p23 = !status2 && !status3 && finfo2.size != finfo3.size;
+  *different_p13 = !status1 && !status3 && finfo1.size != finfo3.size;
+
+  return SVN_NO_ERROR;
+}
+
+
+svn_error_t *
 svn_io_file_checksum2(svn_checksum_t **checksum,
                       const char *file,
                       svn_checksum_kind_t kind,
@@ -4072,6 +4109,130 @@
 
 
 
+/* Do a byte-for-byte comparison of FILE1, FILE2 and FILE3. */
+static svn_error_t *
+contents_three_identical_p(svn_boolean_t *identical_p12,
+                           svn_boolean_t *identical_p23,
+                           svn_boolean_t *identical_p13,
+                           const char *file1,
+                           const char *file2,
+                           const char *file3,
+                           apr_pool_t *pool)
+{
+  svn_error_t *err;
+  apr_size_t bytes_read1, bytes_read2, bytes_read3;
+  char *buf1 = apr_palloc(pool, SVN__STREAM_CHUNK_SIZE);
+  char *buf2 = apr_palloc(pool, SVN__STREAM_CHUNK_SIZE);  
+  char *buf3 = apr_palloc(pool, SVN__STREAM_CHUNK_SIZE);
+  apr_file_t *file1_h;
+  apr_file_t *file2_h;
+  apr_file_t *file3_h;
+  svn_boolean_t eof1 = FALSE;
+  svn_boolean_t eof2 = FALSE;
+  svn_boolean_t eof3 = FALSE;
+
+  SVN_ERR(svn_io_file_open(&file1_h, file1, APR_READ, APR_OS_DEFAULT,
+                           pool));
+
+  err = svn_io_file_open(&file2_h, file2, APR_READ, APR_OS_DEFAULT,
+                         pool);
+
+  if (err)
+    return svn_error_trace(
+               svn_error_compose_create(err,
+                                        svn_io_file_close(file1_h, pool)));
+
+  err = svn_io_file_open(&file3_h, file3, APR_READ, APR_OS_DEFAULT,
+                         pool);
+
+  if (err)    
+      return svn_error_trace(
+               svn_error_compose_create(
+                    err,
+                    svn_error_compose_create(svn_io_file_close(file1_h, pool),
+                                             svn_io_file_close(file2_h, 
pool))));
+
+  /* assume TRUE, until disproved below */
+  *identical_p12 = *identical_p23 = *identical_p13 = TRUE;  
+  /* We need to read as long as no error occurs, and as long as one of the
+   * flags could still change due to a read operation */
+  while (!err 
+        && (*identical_p12 && !eof1 && !eof2) 
+        || (*identical_p23 && !eof2 && !eof3) 
+        || (*identical_p13 && !eof1 && !eof3))
+    {
+      /* As long as a file is not at the end yet, and it is still
+       * potentially identical to another file, we read the next chunk.*/
+      if (!eof1 && (identical_p12 || identical_p13))
+        {
+          err = svn_io_file_read_full2(file1_h, buf1,
+                                   SVN__STREAM_CHUNK_SIZE, &bytes_read1,
+                                   &eof1, pool);
+          if (err)
+              break;
+        }
+
+      if (!eof2 && (identical_p12 || identical_p23))
+        {
+          err = svn_io_file_read_full2(file2_h, buf2,
+                                   SVN__STREAM_CHUNK_SIZE, &bytes_read2,
+                                   &eof2, pool);
+          if (err)
+              break;
+        }
+
+      if (!eof3 && (identical_p13 || identical_p23))
+        {
+          err = svn_io_file_read_full2(file3_h, buf3,
+                                   SVN__STREAM_CHUNK_SIZE, &bytes_read3,
+                                   &eof3, pool);
+          if (err)
+              break;
+        }
+
+      /* If the files are still marked identical, and at least one of them
+       * is not at the end of file, we check whether they differ, and set
+       * their flag to false then. */
+      if (*identical_p12 
+          && !(eof1 && eof2)
+          && ((eof1 != eof2) 
+              || (bytes_read1 != bytes_read2)
+              || memcmp(buf1, buf2, bytes_read1)))
+        {
+          *identical_p12 = FALSE;
+        }
+
+      if (*identical_p23 
+          && !(eof2 && eof3) 
+          && ((eof2 != eof3)
+              || (bytes_read2 != bytes_read3)
+              || memcmp(buf2, buf3, bytes_read2)))
+        {
+          *identical_p23 = FALSE;
+        }
+
+      if (*identical_p13 
+          && !(eof1 && eof3) 
+          && ((eof1 != eof3) 
+              || (bytes_read1 != bytes_read3) 
+              || memcmp(buf1, buf3, bytes_read3)))
+        {
+          *identical_p13 = FALSE;
+        }
+    }
+
+  return svn_error_trace(
+           svn_error_compose_create(
+                err,
+                svn_error_compose_create(
+                    svn_io_file_close(file1_h, pool),
+                    svn_error_compose_create(
+                        svn_io_file_close(file2_h, pool),
+                        svn_io_file_close(file3_h, pool)))));
+}
+
+
+
 svn_error_t *
 svn_io_files_contents_same_p(svn_boolean_t *same,
                              const char *file1,
@@ -4098,6 +4259,53 @@
   return SVN_NO_ERROR;
 }
 
+svn_error_t *
+svn_io_files_contents_three_same_p(svn_boolean_t *same12,
+                             svn_boolean_t *same23,
+                             svn_boolean_t *same13,
+                             const char *file1,
+                             const char *file2,
+                             const char *file3,
+                             apr_pool_t *pool)
+{
+  svn_boolean_t diff_size12, diff_size23, diff_size13;
+
+  SVN_ERR(svn_io_filesizes_three_different_p(&diff_size12, 
+                          &diff_size23, 
+                          &diff_size13, 
+                          file1, 
+                          file2, 
+                          file3,
+                          pool));
+
+  if (diff_size12 && diff_size23 && diff_size13)
+    {
+      *same12 = *same23 = *same13 = FALSE;
+    }
+  else if (diff_size12 && diff_size23)
+    { 
+      *same12 = *same23 = FALSE;     
+      SVN_ERR(contents_identical_p(same13, file1, file3, pool));
+    }
+  else if (diff_size23 && diff_size13)
+    { 
+      *same23 = *same13 = FALSE;     
+      SVN_ERR(contents_identical_p(same12, file1, file2, pool));
+    }
+  else if (diff_size12 && diff_size13)
+    { 
+      *same12 = *same13 = FALSE;     
+      SVN_ERR(contents_identical_p(same23, file2, file3, pool));
+    }
+  else
+    {
+      SVN_ERR_ASSERT(!diff_size12 && !diff_size23 && !diff_size13);
+      SVN_ERR(contents_three_identical_p(same12, same23, same13, file1, file2, 
file3, pool));
+    }
+
+  return SVN_NO_ERROR;
+}
+
 #ifdef WIN32
 /* Counter value of file_mktemp request (used in a threadsafe way), to make
    sure that a single process normally never generates the same tempname
Index: subversion/libsvn_wc/merge.c
===================================================================
--- subversion/libsvn_wc/merge.c        (revision 1349990)
+++ subversion/libsvn_wc/merge.c        (working copy)
@@ -948,7 +948,9 @@
                    apr_pool_t *scratch_pool)
 {
   svn_skel_t *work_item;
-  svn_boolean_t same_contents = FALSE;
+  svn_boolean_t same_left_right;
+  svn_boolean_t same_right_target;
+  svn_boolean_t same_left_target;
   svn_node_kind_t kind;
   svn_boolean_t is_special;
 
@@ -961,19 +963,23 @@
       return SVN_NO_ERROR;
     }
 
+  /* Check the files */
+  SVN_ERR(svn_io_files_contents_three_same_p(&same_left_right,
+                                       &same_right_target,
+                                       &same_left_target,
+                                       left_abspath,
+                                       right_abspath,
+                                       detranslated_target_abspath,
+                                       scratch_pool));
+
   /* If the LEFT side of the merge is equal to WORKING, then we can
    * copy RIGHT directly. */
-  SVN_ERR(svn_io_files_contents_same_p(&same_contents, left_abspath,
-                                       detranslated_target_abspath,
-                                       scratch_pool));
-  if (same_contents)
+  if (same_left_target)
     {
       /* Check whether the left side equals the right side.
        * If it does, there is no change to merge so we leave the target
        * unchanged. */
-      SVN_ERR(svn_io_files_contents_same_p(&same_contents, left_abspath,
-                                           right_abspath, scratch_pool));
-      if (same_contents)
+      if (same_left_right)
         {
           *merge_outcome = svn_wc_merge_unchanged;
         }
@@ -1002,10 +1008,7 @@
        * conflicted them needlessly, while merge_text_file figured it out 
        * eventually and returned svn_wc_merge_unchanged for them, which
        * is what we do here. */
-      SVN_ERR(svn_io_files_contents_same_p(&same_contents,
-                                           detranslated_target_abspath,
-                                           right_abspath, scratch_pool));
-      if (same_contents)
+      if (same_right_target)
         {
           *merge_outcome = svn_wc_merge_unchanged;
           return SVN_NO_ERROR;

Reply via email to