Re: [h2] Re: Implement restore points (vaguely similar to Oracle)

Andrei Tokar Sat, 09 Nov 2024 11:18:29 -0800

Here is my quick and dirty simulation of the "restore point" with copying 
of in-memory files:


public class RestorePointSimulation
{
    public static void main(String[] args) throws Exception {
        Class.forName("org.h2.Driver");
        String goldenCopy = "test";
        // populate database with required data
        try(Connection conn = DriverManager.getConnection("jdbc:h2:memFS:" 
+ goldenCopy)) {
            try (Statement stat = conn.createStatement()) {
                stat.execute("CREATE TABLE TEST(ID INT PRIMARY KEY, V INT) 
AS SELECT X, X FROM SYSTEM_RANGE(1, 1_000)");
            }
        }

        String workingCopy = "work";

        copyDatabaseFile(goldenCopy, workingCopy); // @BeforeTest
        // run first test
        try(Connection conn = 
DriverManager.getConnection("jdbc:h2:memFS:"+workingCopy)) {
            try (Statement stat = conn.createStatement()) {
                verifyRowCount(stat, 1_000);
                stat.execute("DELETE FROM TEST");
                verifyRowCount(stat, 0);
            }
        }

        copyDatabaseFile(goldenCopy, workingCopy); // @BeforeTest
        // run second test
        try(Connection conn = 
DriverManager.getConnection("jdbc:h2:memFS:"+workingCopy)) {
            try (Statement stat = conn.createStatement()) {
                verifyRowCount(stat, 1_000);
            }
        }
    }

    private static void copyDatabaseFile(String goldenCopy, String 
workingCopy) throws IOException {
        String workFileName = "memFS:" + workingCopy + ".mv.db";
        FilePath workFilePath = FilePath.get(workFileName);
        try (OutputStream out = workFilePath.newOutputStream(false)) {
            IOUtils.copy(FilePath.get("memFS:"+ goldenCopy 
+".mv.db").newInputStream(), out);
        }
        assert workFilePath.exists();
        assert workFilePath.isRegularFile();
        assert workFilePath.size() > 0;
    }

    private static void verifyRowCount(Statement stat, int expectedCount) 
throws SQLException {
        try (ResultSet resultSet = stat.executeQuery("SELECT COUNT(*) FROM 
TEST")) {
            assert resultSet.next();
            assert resultSet.getInt(1) == expectedCount;
        }
    }
}

On Saturday, November 9, 2024 at 6:59:39 AM UTC-5 Enno Thieleke wrote:

> Hello,
>
> thanks for the valuable feedback and time you've taken. I will try to go 
> from top to bottom.
>
> > Don't use labdas in unit tests at all unless it is strongly necessary.
>
> Understood.
>
> > I changed [...] to re-open connection and tests started to fail.
>
> Correct me if I'm wrong, but for in-memory databases failures are 
> expected, because they are dropped when the last connection is closed 
> (unless prevented via flags in the connection string). That's one of the 
> things my lambdas tried to address. However, I've replaced the lambda based 
> tests with simpler, more straight forward ones which allow faster reasoning 
> about the test behavior.
>
> > This code throws OOME unless restore point creation is commented out
>
> Thanks for pointing that out and good catch. For the time being I will not 
> address it this, but I would address it once you (as a group) are convinced 
> the feature goes in.
>
> > This code throws General Error
>
> Same as above, but an even better catch. In my opinion database compaction 
> should not erase restore points. That would defeat the purpose of restore 
> points (the way I intended them) entirely.
>
> > Because SCRIPT command can't export these restore points, it must be 
> documented.
>
> Good point.
>
> > You also need to document this functionality as experimental only.
>
> Of course. If it goes in at all though.
>
> > I think content of INFORMATION_SCHEMA.RESTORE_POINTS shouldn't be 
> visible for regular users.
>
> I disagree. The information schema contains all available information of a 
> database and restore points are a part of that information, much like 
> tables and columns and constraints. If nothing else, it simply enables 
> users to realize that there are still restore points, i.e. more is kept on 
> disk than might be necessary. Or maybe I misunderstood: I'm thinking of 
> users who have access to the information schema and you'd like to restrict 
> access to the table `restore_points` to admins?
>
> > Don't use MVCC in its documentation, this term only confuses people.
>
> I disagree again. MVCC is a widely used concept and restore points simply 
> expose an implementation detail in H2: that it uses MVCC. I think this is 
> comparable to the Oracle documentation, which is pretty thorough. Even more 
> so than PostgreSQL (which is mind blowing to me). But I also get your point 
> that it would be great if we were able to hide that information from users 
> in case H2 ever switches to another storage which is not MVCC. But we could 
> always adjust the docs if that happens.
>
> > What is a purpose of OLDEST_DATABASE_VERSION_TO_KEEP and why this field 
> isn't changed in newer restore points if older restore points were already 
> dropped?
>
> The field tracks the `database_version` of the very first restore point 
> across all restore points. The reason (for better or worse) is that if one 
> creates two restore points A and B, the latter one includes the creation of 
> A, because they share the same storage as the rest of the database. Ergo, 
> if one drops restore point A but returns to point B (at which the drop of A 
> hasn't happened yet) and then returns to A, we expect the database to be in 
> the state of A. And that's why we need to track the version of A (the 
> oldest restore point in a chain) in every subsequent restore point. I'd be 
> lieing if I said that I like this detail.
>
> > [...] a [...] question, but why this feature is so badly needed?
>
> It is 100% *not* badly needed. It would merely be a nice to have and, to 
> be completely honest, I don't think it would ever go beyond that. However, 
> if somebody wants to be able to restore an H2 database to an earlier state 
> without the hassle of requiring persistent storage and copying files 
> around, that's where it would shine.
>
> > What's wrong with populating database to a desired state, shutting down, 
> copying the file, and starting H2 with a file copy.
>
> Nothing. It's just a wee bit more work.
>
> > Why do you need this functionality for in-memory (non-persistent) 
> database? If performance is the reason, you can use RAM-based file system 
> instead.
>
> Performance is not the reason. At least not when it comes to H2 directly. 
> Database population tools like Liquibase on the other hand can be a 
> bottleneck. And I actually tried a RAM-based filesystem. Jimfs. But I 
> didn't get H2 to work on it, because it doesn't use `Paths.get(URI)` to 
> look up files, but `Paths.get(String, String...)`. As for "real" RAM-based 
> filesystems: That's simply too much work in my opinion.
> But to answer your question why I would like this functionality for 
> in-memory databases: Convenience. There's simply no other and no good or 
> better reason.
> But this kinda brings me back to the point I mentioned already: I think H2 
> shouldn't know if it's running in-memory or not and it should work with 
> Java based in-memory filesystems. And maybe it even does and I was not able 
> to figure out how, but I think it doesn't.
>
> > I am struggling to understand what is going to happen with all other 
> concurrent connections and their transactions at the moment when RESTORE TO 
> POINT is executed.
>
> Gone. Simple as that. Concurrent connections are dropped. The connection 
> that executes `restore to point` (which requires admin permissions) goes 
> into exclusive mode and drops all other connections. It then rolls the 
> storage back to the desired version and re-initializes the database 
> (reloads META, rolls back transactions that were ongoing concurrently at 
> the time of restore point creation it is restoring to). It is as if nothing 
> ever happened after the restore point, that's being returned to. That's the 
> idea at least.
> Not gonna lie, it is complicated.
>
> > I also agree with Evgenij, that implementation based on MVStore will 
> prevent a lot of intermediate versions from being dropped and heedlessly 
> bloat database file.
>
> Which shouldn't be a problem once all restore points have been dropped, 
> because then the MVStore GC can work its magic, or if a database is being 
> restored to a point, because then at least some GC can happen.
>
> > So if I was interested in this feature, I would implement SAVEPOINT as 
> "export to script" and RESTORE as "drop all schemas and execute script". 
> Only questions I have: how to lock the DB while this happens and 
> queue/stall any pending sessions -- in case of multi-user access.
>
> I like that idea *a lot*. It is much, much simpler than my approach. 
> Although it's more than drop all schemas, because it would require a wiped 
> database. Nothing bad though. Let's maybe go even further and simply create 
> a script export and then create a new database with `INIT=RUNSCRIPT FROM 
> '~/create.sql'`. That should do the trick, no? Why didn't I come up with 
> it? :D
> Yes, all connections would have to be dropped to the existing database, 
> but it would be wiped clean anyway, so that's not an issue.
>
>
>
> Guys, thanks a lot for the thoughts exchange. I'm still willing to spend 
> time on "my approach" if you'd accept it into H2, but I'd also be fine with 
> dropping it. As I said, what I've written so far was a good exercise and 
> I've learned more about H2 in the past few days than I have over the last 5 
> or more years.
>
> Regards,
> Enno
>
> On Saturday, November 9, 2024 at 3:53:30 AM UTC+1 Andreas Reichel wrote:
>
> Dear All.
>
> On Fri, 2024-11-08 at 18:47 -0800, Andrei Tokar wrote:
>
> What's wrong with populating database to a desired state, shutting down, 
> copying the file, and starting H2 with a file copy.
>
>
> Am I right to assume, that shutdown is not even needed: you can run export 
> to script/backup on an open instance without interruptions, right?
> (Tiny detail, but it could matter.)
>
> So if I was interested in this feature, I would implement SAVEPOINT as 
> "export to script" and RESTORE as "drop all schemas and execute script". 
> Only questions I have: how to lock the DB while this happens and 
> queue/stall any pending sessions -- in case of multi-user access.
>
> Cheers
> Andreas
>
>

-- 
You received this message because you are subscribed to the Google Groups "H2 
Database" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to h2-database+unsubscr...@googlegroups.com.
To view this discussion visit 
https://groups.google.com/d/msgid/h2-database/55d7c48a-6558-408b-9f66-9601a3423b63n%40googlegroups.com.

Re: [h2] Re: Implement restore points (vaguely similar to Oracle)

Reply via email to