On 13. 1. 26 09:57, Timofei Zhakov wrote:
On Mon, Jan 12, 2026 at 9:17 PM Branko Čibej <[email protected]> wrote:
On 12. 1. 26 18:31, Timofei Zhakov wrote:
On Mon, Jan 12, 2026 at 3:29 PM Branko Čibej <[email protected]>
wrote:
On 12. 1. 26 14:30, Timofei Zhakov wrote:
On Mon, Jan 12, 2026 at 1:22 PM Branko Čibej
<[email protected]> wrote:
On 12. 1. 26 12:52, [email protected] wrote:
Author: brane
Date: Mon Jan 12 11:52:07 2026
New Revision: 1931262
Log:
Fix warnings in the new checksum code.
* subversion/libsvn_subr/checksum.c
(svn_checksum): Remove unused variable.
* subversion/libsvn_subr/checksum_apr.c: Include <limits.h>.
(svn_checksum__sha1, svn_checksum__sha1_ctx_update): Do not blindly cast
or implicitly narrow the data length. Check the limits first.
Modified:
subversion/trunk/subversion/libsvn_subr/checksum.c
subversion/trunk/subversion/libsvn_subr/checksum_apr.c
Modified: subversion/trunk/subversion/libsvn_subr/checksum.c
==============================================================================
--- subversion/trunk/subversion/libsvn_subr/checksum.c Mon Jan 12 11:51:21 2026 (r1931261)
+++ subversion/trunk/subversion/libsvn_subr/checksum.c Mon Jan 12 11:52:07 2026 (r1931262)
@@ -475,8 +475,6 @@ svn_checksum(svn_checksum_t **checksum,
apr_size_t len,
apr_pool_t *pool)
{
- apr_sha1_ctx_t sha1_ctx;
-
SVN_ERR(validate_kind(kind));
*checksum = svn_checksum_create(kind, pool);
Modified: subversion/trunk/subversion/libsvn_subr/checksum_apr.c
==============================================================================
--- subversion/trunk/subversion/libsvn_subr/checksum_apr.c Mon Jan 12 11:51:21 2026 (r1931261)
+++ subversion/trunk/subversion/libsvn_subr/checksum_apr.c Mon Jan 12 11:52:07 2026 (r1931262)
@@ -21,6 +21,8 @@
*
====================================================================
*/
+#include <limits.h>
+
#include "svn_private_config.h"
#ifdef SVN_CHECKSUM_BACKEND_APR
@@ -89,6 +91,10 @@ svn_checksum__sha1(unsigned char *digest
apr_size_t len)
{
apr_sha1_ctx_t sha1_ctx;
+
+ /* Do not blindly truncate the data length. */
+ SVN_ERR_ASSERT(len < UINT_MAX);
+
apr_sha1_init(&sha1_ctx);
apr_sha1_update(&sha1_ctx, data, (unsigned int)len);
apr_sha1_final(digest, &sha1_ctx);
@@ -121,7 +127,9 @@ svn_checksum__sha1_ctx_update(svn_checks
const void *data,
apr_size_t len)
{
- apr_sha1_update(&ctx->apr_ctx, data, len);
+ /* Do not blindly truncate the data length. */
+ SVN_ERR_ASSERT(len < UINT_MAX);
+ apr_sha1_update(&ctx->apr_ctx, data, (unsigned int)len);
return SVN_NO_ERROR;
}
Just a quick note: it's generally not a good idea to use
unchecked typecasts anywhere. In this case there was a
borderline chance of Subversion quietly computing
invalid hashes. The assert at least removes the "quiet"
part, although I suspect the better solution is to call
apr_sha1_update in UINT_MAX-sized chunks until all data
are consumed.
I agree. It would be great if APR accepted the size as a size_t, but
it's not a big deal. Anyway, I'm pretty sure the issue existed there
before.
Maybe this function could feed it blocks of UINT_MAX bytes if the
length exceeds the limit. But I don't think we should bother to
support such edge cases, because if that happens, the issue is on the
caller's side -- it should not be allocating 4 GB buffers in memory.
Is that a fact? Sorry, but these days 4 GiB is laughably small in
certain applications. Our API promises "apr_size_t" chunks, which on
64-bit platforms can be larger than 4 GiB. We can either change our
API or fix the bug. It doesn't matter whether the bug was there
before; we know it's there now.
-- Brane
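As a minimal standalone illustration of the narrowing being discussed
(not Subversion code; it assumes a 64-bit platform where apr_size_t is
the platform's size_t): a length above UINT_MAX silently loses its high
bits when cast to unsigned int, so a backend taking 'unsigned int'
would hash only part of the buffer and still report success.
[[[
#include <limits.h>
#include <stddef.h>
#include <stdio.h>

int main(void)
{
  /* A length slightly larger than what fits in an unsigned int;
     representable in a 64-bit size_t. */
  size_t len = (size_t)UINT_MAX + 42;

  /* This is what an unchecked cast does: the high bits are dropped,
     so only 41 bytes out of UINT_MAX + 42 would be hashed. */
  unsigned int narrowed = (unsigned int)len;

  printf("len = %llu, narrowed = %u\n",
         (unsigned long long)len, narrowed);
  return 0;
}
]]]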
Okay, that makes sense. I made a patch to support that behaviour.
What do you think?
[[[
Index: subversion/libsvn_subr/checksum_apr.c
===================================================================
--- subversion/libsvn_subr/checksum_apr.c (revision 1931267)
+++ subversion/libsvn_subr/checksum_apr.c (working copy)
@@ -91,12 +91,19 @@
apr_size_t len)
{
apr_sha1_ctx_t sha1_ctx;
+ const char *ptr = data;
- /* Do not blindly truncate the data length. */
- SVN_ERR_ASSERT(len < UINT_MAX);
+ apr_sha1_init(&sha1_ctx);
- apr_sha1_init(&sha1_ctx);
- apr_sha1_update(&sha1_ctx, data, (unsigned int)len);
+ while (len > 0)
+ {
+ unsigned int block = (len < UINT_MAX) ? (unsigned int)len : UINT_MAX;
+ apr_sha1_update(&sha1_ctx, ptr, block);
+ len -= block;
+ ptr += block;
+ }
+
apr_sha1_final(digest, &sha1_ctx);
return SVN_NO_ERROR;
}
@@ -127,9 +134,17 @@
const void *data,
apr_size_t len)
{
- /* Do not blindly truncate the data length. */
- SVN_ERR_ASSERT(len < UINT_MAX);
- apr_sha1_update(&ctx->apr_ctx, data, (unsigned int)len);
+ const char *ptr = data;
+
+ while (len > 0)
+ {
+ unsigned int block = (len < UINT_MAX) ? (unsigned int)len : UINT_MAX;
+ apr_sha1_update(&ctx->apr_ctx, ptr, block);
+ len -= block;
+ ptr += block;
+ }
+
return SVN_NO_ERROR;
}
]]]
I don't really like casting 'void *' to 'char *' but it seems
like the only option.
That cast isn't a problem; it already happens implicitly when
apr_sha1_update is called. This is from apr_sha1.h:
APU_DECLARE(void) apr_sha1_update(apr_sha1_ctx_t *context, const char *input,
                                  unsigned int inputLen);
Oh, good to know.
[...]
It would be nice, or at least it would satisfy my sense of
aesthetics, if the loop weren't copy-pasted twice ... I'd make a
helper function that will almost certainly be inlined. As a bonus,
that helper function can take a 'const char *' argument, so you
don't need the cast or a second variable. Something like this?
static void update_sha1(apr_sha1_ctx_t *ctx, const char *data, apr_size_t len)
{
  while (len > UINT_MAX)
    {
      apr_sha1_update(ctx, data, UINT_MAX);
      len -= UINT_MAX;
      data += UINT_MAX;
    }
  if (len > 0)
    apr_sha1_update(ctx, data, (unsigned int)len);  /* This cast is safe. */
}
then just call update_sha1() with no casts instead of
apr_sha1_update(). Nice.
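For illustration, with such a helper in place the first caller would
reduce to something like the following (a sketch pieced together from
the hunks quoted above, not the actual committed change; the signature
is reconstructed from the diffs):
[[[
svn_error_t *
svn_checksum__sha1(unsigned char *digest,
                   const void *data,
                   apr_size_t len)
{
  apr_sha1_ctx_t sha1_ctx;

  apr_sha1_init(&sha1_ctx);
  update_sha1(&sha1_ctx, data, len);  /* const void * converts implicitly */
  apr_sha1_final(digest, &sha1_ctx);
  return SVN_NO_ERROR;
}
]]]
and svn_checksum__sha1_ctx_update() would similarly become a single
update_sha1(&ctx->apr_ctx, data, len) call.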
I thought about making a helper function. I wasn't sure it would be
better for readability than having everything inline, since it adds an
extra abstraction, but thinking about it now, it is probably worth it.
It is essentially a wrapper around APR that allows for bigger buffers,
and it also removes the risk of copy-paste mistakes.
It's also a nice way to take advantage of the implicit conversion to
'const char *' at the call site.
Could you please elaborate on why you are suggesting rewriting the
loop in this way? I think it was much simpler before. I would prefer a
single invocation of the function, because that guarantees we get the
same behaviour no matter what. Imagine mistyping the parameters in the
first call (the one inside the loop), which would almost never
actually be executed, and then trying to figure out where that bug
comes from (and all of this would most likely happen while performing
some unbounded operation, which is itself a bug).
It gets rid of this
    unsigned int block = (len < UINT_MAX) ? (unsigned int)len : UINT_MAX;
happening on every iteration, something the compiler can't just ignore
or optimise out. It also avoids
    len -= UINT_MAX;
    data += UINT_MAX;
when it isn't necessary, though that would probably be optimised out.
*shrug* Compared to the actual SHA-1 calculation, all of that is
trivial. I just find it cleaner not to have to recalculate the block
length every time. It's really a matter of taste.
I see that it removes the ternary operator, but the ternary was good
at explaining what this code does: feed it UINT_MAX bytes, or whatever
amount we have left. On the other hand, after your rewrite, two
completely separate branches are executed depending on whether we are
iterating over UINT_MAX-sized blocks or feeding the remainder.
As I said, it's a matter of taste. I find it more obvious what's
happening when the while() condition tests against an explicit, constant
upper bound.
I'm planning to do a similar thing for the bcrypt backend. A tiny
note: on Windows, long is only 32 bits, even on 64-bit platforms.
+1. Yes, I remembered later that ULONG == DWORD == not something
I'd use for object sizes on a 64-bit platform. Same problem as
using unsigned int for the length in APR.
-- Brane
--
Timofei Zhakov
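For reference, a minimal sketch of how the same chunking idea could
look in a Windows CNG ("bcrypt") hash backend, where BCryptHashData()
takes the input length as a ULONG (32 bits even on 64-bit Windows).
The helper name and the use of apr_size_t for the length are
assumptions for illustration, not actual Subversion code:
[[[
#include <limits.h>     /* ULONG_MAX */
#include <windows.h>
#include <bcrypt.h>     /* BCryptHashData(), BCRYPT_SUCCESS() */
#include <apr.h>        /* apr_size_t */

/* Hypothetical helper: feed DATA to a CNG hash object in ULONG-sized
   chunks so that lengths above ULONG_MAX are not silently truncated. */
static NTSTATUS
update_hash(BCRYPT_HASH_HANDLE hash, const unsigned char *data,
            apr_size_t len)
{
  NTSTATUS status;

  while (len > ULONG_MAX)
    {
      status = BCryptHashData(hash, (PUCHAR)data, ULONG_MAX, 0);
      if (!BCRYPT_SUCCESS(status))
        return status;
      len -= ULONG_MAX;
      data += ULONG_MAX;
    }

  /* The remaining length now fits in a ULONG, so this cast is safe. */
  return BCryptHashData(hash, (PUCHAR)data, (ULONG)len, 0);
}
]]]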