Re: [h2] Re: Implement restore points (vaguely similar to Oracle)

Enno Thieleke Sat, 09 Nov 2024 03:59:47 -0800

Hello,

thanks for the valuable feedback and time you've taken. I will try to go 
from top to bottom.

> Don't use labdas in unit tests at all unless it is strongly necessary.

Understood.

> I changed [...] to re-open connection and tests started to fail.

Correct me if I'm wrong, but for in-memory databases failures are expected, 
because they are dropped when the last connection is closed (unless 
prevented via flags in the connection string). That's one of the things my 
lambdas tried to address. However, I've replaced the lambda based tests 
with simpler, more straight forward ones which allow faster reasoning about 
the test behavior.

> This code throws OOME unless restore point creation is commented out

Thanks for pointing that out and good catch. For the time being I will not 
address it this, but I would address it once you (as a group) are convinced 
the feature goes in.

> This code throws General Error

Same as above, but an even better catch. In my opinion database compaction 
should not erase restore points. That would defeat the purpose of restore 
points (the way I intended them) entirely.

> Because SCRIPT command can't export these restore points, it must be 
documented.

Good point.

> You also need to document this functionality as experimental only.

Of course. If it goes in at all though.

> I think content of INFORMATION_SCHEMA.RESTORE_POINTS shouldn't be visible 
for regular users.

I disagree. The information schema contains all available information of a 
database and restore points are a part of that information, much like 
tables and columns and constraints. If nothing else, it simply enables 
users to realize that there are still restore points, i.e. more is kept on 
disk than might be necessary. Or maybe I misunderstood: I'm thinking of 
users who have access to the information schema and you'd like to restrict 
access to the table `restore_points` to admins?

> Don't use MVCC in its documentation, this term only confuses people.

I disagree again. MVCC is a widely used concept and restore points simply 
expose an implementation detail in H2: that it uses MVCC. I think this is 
comparable to the Oracle documentation, which is pretty thorough. Even more 
so than PostgreSQL (which is mind blowing to me). But I also get your point 
that it would be great if we were able to hide that information from users 
in case H2 ever switches to another storage which is not MVCC. But we could 
always adjust the docs if that happens.

> What is a purpose of OLDEST_DATABASE_VERSION_TO_KEEP and why this field 
isn't changed in newer restore points if older restore points were already 
dropped?

The field tracks the `database_version` of the very first restore point 
across all restore points. The reason (for better or worse) is that if one 
creates two restore points A and B, the latter one includes the creation of 
A, because they share the same storage as the rest of the database. Ergo, 
if one drops restore point A but returns to point B (at which the drop of A 
hasn't happened yet) and then returns to A, we expect the database to be in 
the state of A. And that's why we need to track the version of A (the 
oldest restore point in a chain) in every subsequent restore point. I'd be 
lieing if I said that I like this detail.

> [...] a [...] question, but why this feature is so badly needed?

It is 100% *not* badly needed. It would merely be a nice to have and, to be 
completely honest, I don't think it would ever go beyond that. However, if 
somebody wants to be able to restore an H2 database to an earlier state 
without the hassle of requiring persistent storage and copying files 
around, that's where it would shine.

> What's wrong with populating database to a desired state, shutting down, 
copying the file, and starting H2 with a file copy.

Nothing. It's just a wee bit more work.

> Why do you need this functionality for in-memory (non-persistent) 
database? If performance is the reason, you can use RAM-based file system 
instead.

Performance is not the reason. At least not when it comes to H2 directly. 
Database population tools like Liquibase on the other hand can be a 
bottleneck. And I actually tried a RAM-based filesystem. Jimfs. But I 
didn't get H2 to work on it, because it doesn't use `Paths.get(URI)` to 
look up files, but `Paths.get(String, String...)`. As for "real" RAM-based 
filesystems: That's simply too much work in my opinion.
But to answer your question why I would like this functionality for 
in-memory databases: Convenience. There's simply no other and no good or 
better reason.
But this kinda brings me back to the point I mentioned already: I think H2 
shouldn't know if it's running in-memory or not and it should work with 
Java based in-memory filesystems. And maybe it even does and I was not able 
to figure out how, but I think it doesn't.

> I am struggling to understand what is going to happen with all other 
concurrent connections and their transactions at the moment when RESTORE TO 
POINT is executed.

Gone. Simple as that. Concurrent connections are dropped. The connection 
that executes `restore to point` (which requires admin permissions) goes 
into exclusive mode and drops all other connections. It then rolls the 
storage back to the desired version and re-initializes the database 
(reloads META, rolls back transactions that were ongoing concurrently at 
the time of restore point creation it is restoring to). It is as if nothing 
ever happened after the restore point, that's being returned to. That's the 
idea at least.
Not gonna lie, it is complicated.

> I also agree with Evgenij, that implementation based on MVStore will 
prevent a lot of intermediate versions from being dropped and heedlessly 
bloat database file.

Which shouldn't be a problem once all restore points have been dropped, 
because then the MVStore GC can work its magic, or if a database is being 
restored to a point, because then at least some GC can happen.

> So if I was interested in this feature, I would implement SAVEPOINT as 
"export to script" and RESTORE as "drop all schemas and execute script". 
Only questions I have: how to lock the DB while this happens and 
queue/stall any pending sessions -- in case of multi-user access.

I like that idea *a lot*. It is much, much simpler than my approach. 
Although it's more than drop all schemas, because it would require a wiped 
database. Nothing bad though. Let's maybe go even further and simply create 
a script export and then create a new database with `INIT=RUNSCRIPT FROM 
'~/create.sql'`. That should do the trick, no? Why didn't I come up with 
it? :D
Yes, all connections would have to be dropped to the existing database, but 
it would be wiped clean anyway, so that's not an issue.

Guys, thanks a lot for the thoughts exchange. I'm still willing to spend 
time on "my approach" if you'd accept it into H2, but I'd also be fine with 
dropping it. As I said, what I've written so far was a good exercise and 
I've learned more about H2 in the past few days than I have over the last 5 
or more years.

Regards,
Enno

On Saturday, November 9, 2024 at 3:53:30 AM UTC+1 Andreas Reichel wrote:

Dear All.

On Fri, 2024-11-08 at 18:47 -0800, Andrei Tokar wrote:

What's wrong with populating database to a desired state, shutting down, 
copying the file, and starting H2 with a file copy.

Am I right to assume, that shutdown is not even needed: you can run export 
to script/backup on an open instance without interruptions, right?
(Tiny detail, but it could matter.)

So if I was interested in this feature, I would implement SAVEPOINT as 
"export to script" and RESTORE as "drop all schemas and execute script". 
Only questions I have: how to lock the DB while this happens and 
queue/stall any pending sessions -- in case of multi-user access.

Cheers
Andreas

-- 
You received this message because you are subscribed to the Google Groups "H2 
Database" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to h2-database+unsubscr...@googlegroups.com.
To view this discussion visit 
https://groups.google.com/d/msgid/h2-database/fd7ce920-1596-4767-aab4-c1495287b1bfn%40googlegroups.com.

Re: [h2] Re: Implement restore points (vaguely similar to Oracle)

Reply via email to