DL Consulting wrote:
I did a quick search but couldn't find anything about this little problem.

I have an X4100 production machine (called monster) that has a J4200 full of 
500GB drives attached. It's running OpenSolaris 2009.06 and fully up to date.

It takes daily snapshots and sends them to another machine as a backup. The 
sending and receiving is scripted and run from a cronjob. The problem is that 
some of the snapshots disappear from monster after they've been sent to the 
backup machine.

Example:
[i]sh...@monster:/$ zfs list -t snapshot | grep local@

...
mpool/lo...@zfs-auto-snap:daily-2009-07-02-00:00                  64.5K      -   176K  -
mpool/lo...@zfs-auto-snap:daily-2009-07-03-00:00                  76.5K      -   171K  -
mpool/lo...@zfs-auto-snap:daily-2009-07-05-00:00                  59.8K      -   173K  -
mpool/lo...@zfs-auto-snap:daily-2009-07-06-00:00                  59.8K      -   173K  -

sh...@chucky[11:53:46]:/$ zfs list -t snapshot | grep local@

....
mpool/lo...@zfs-auto-snap:daily-2009-07-01-00:00                    35K      -    92K  -
mpool/lo...@zfs-auto-snap:daily-2009-07-02-00:00                    36K      -    93K  -
mpool/lo...@zfs-auto-snap:daily-2009-07-03-00:00                  43.5K      -    89K  -
mpool/lo...@zfs-auto-snap:daily-2009-07-04-00:00                      0      -    90K  -[/i]

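As you can see, the snapshot for 2009-07-04 exists on chucky (the backup machine) but is missing from monster.
The zpool history on monster shows that the snapshot was taken, and that it was destroyed shortly after midnight the next day: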

[i]sh...@monster:/$ pfexec zpool history mpool | grep 2009-07-04

2009-07-04.00:00:02 zfs snapshot mpool/lo...@zfs-auto-snap:daily-2009-07-04-00:00
2009-07-04.00:00:04 zfs snapshot -r mpool/local/vmwaremachi...@zfs-auto-snap:daily-2009-07-04-00:00
2009-07-04.00:00:05 zfs snapshot -r mpool/local/cvsr...@zfs-auto-snap:daily-2009-07-04-00:00
2009-07-04.00:00:06 zfs snapshot -r mpool/proje...@zfs-auto-snap:daily-2009-07-04-00:00
2009-07-05.00:05:09 zfs destroy mpool/lo...@zfs-auto-snap:daily-2009-07-04-00:00
2009-07-05.00:05:12 zfs destroy mpool/local/cvsr...@zfs-auto-snap:daily-2009-07-04-00:00[/i]

and the script did not produce any errors:

[i]pfexec /usr/sbin/zfs send -I mpool/lo...@zfs-auto-snap:daily-2009-07-03-00:00 mpool/lo...@zfs-auto-snap:daily-2009-07-04-00:00 | ssh sh...@chucky pfexec /usr/sbin/zfs recv mpool/local[/i]

Actually, you can't tell from this script if an error has occurred because
you do not check the return value of zfs receive.

Any ideas?

For some reason, the receive failed.  Since receives are an all-or-nothing
event, the snapshot would not exist on the remote site.  You must check
the return codes.
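A minimal sketch of what checking the return codes could look like, assuming the script runs under bash. In a pipeline, $? only reports the last stage (here the ssh/recv side), while bash's PIPESTATUS array holds the exit status of every stage. The function name send_incremental, its arguments, and the host name are hypothetical; the zfs send/recv and ssh invocations mirror the script above.

```shell
#!/bin/bash
# Sketch only: send_incremental and its arguments are hypothetical.
# Runs one incremental send/receive and fails if either side fails.
send_incremental() {
    local fs=$1 from=$2 to=$3 remote=$4
    local -a rc
    pfexec /usr/sbin/zfs send -I "$fs@$from" "$fs@$to" |
        ssh "$remote" pfexec /usr/sbin/zfs recv "$fs"
    # $? would only reflect the ssh/recv stage; PIPESTATUS has both.
    # ssh passes through the remote command's exit status (255 on ssh errors).
    rc=("${PIPESTATUS[@]}")
    if [ "${rc[0]}" -ne 0 ] || [ "${rc[1]}" -ne 0 ]; then
        echo "sync of $fs@$to failed: send=${rc[0]} recv=${rc[1]}" >&2
        return 1
    fi
    return 0
}
```

A cron job could then call something like send_incremental mpool/local <from> <to> backuphost (backuphost being a placeholder) and act on a non-zero status, for example by mailing the error, instead of carrying on as if the snapshot had arrived.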

But... your script should also sync with the last common snapshot, so
it shouldn't matter if a transient event caused a disruption in the snapshot
sequence.  I have written such code, and it isn't particularly hard, just a
bit tedious.
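A rough sketch of syncing from the last common snapshot, under several assumptions. It needs bash and a POSIX awk (on Solaris that may mean nawk or /usr/xpg4/bin/awk rather than /usr/bin/awk). It assumes the dataset has the same name on both machines, as mpool/local does in the example, and it compares snapshots by name only. The helper names snapshot_names, last_common_snapshot and sync_fs are hypothetical, and this is not the code mentioned above. The sketch lists snapshots on both sides, picks the newest one they share, and sends everything from there up to the newest local snapshot with zfs send -I.

```shell
#!/bin/bash
# Sketch only: helper names are hypothetical; the dataset is assumed to
# have the same name on both hosts.

# Read "zfs list -H -o name" output on stdin and print the part after "@"
# for snapshots of dataset $1 itself, skipping child datasets.
snapshot_names() {
    awk -v p="$1@" 'index($0, p) == 1 { print substr($0, length(p) + 1) }'
}

# Print the newest name present in both lists; exit status 1 if none.
# $1: local names, oldest first, one per line.  $2: remote names.
last_common_snapshot() {
    # printf always emits at least one line, so NR == FNR safely marks
    # the first input (the remote list).
    awk 'NR == FNR { remote[$0] = 1; next }
         $0 != "" && ($0 in remote) { last = $0 }
         END { if (last == "") exit 1; print last }' \
        <(printf '%s\n' "$2") <(printf '%s\n' "$1")
}

# $1: dataset, $2: backup host (placeholder).
sync_fs() {
    local fs=$1 remote=$2 local_snaps remote_snaps base newest
    local -a rc
    local_snaps=$(pfexec /usr/sbin/zfs list -H -o name -t snapshot \
        -s creation -r "$fs" | snapshot_names "$fs")
    remote_snaps=$(ssh "$remote" pfexec /usr/sbin/zfs list -H -o name \
        -t snapshot -r "$fs" | snapshot_names "$fs")
    newest=${local_snaps##*$'\n'}     # last line = newest local snapshot
    if ! base=$(last_common_snapshot "$local_snaps" "$remote_snaps"); then
        echo "no common snapshot for $fs; a full send is needed" >&2
        return 1
    fi
    [ "$base" = "$newest" ] && return 0   # backup is already current
    pfexec /usr/sbin/zfs send -I "$fs@$base" "$fs@$newest" |
        ssh "$remote" pfexec /usr/sbin/zfs recv "$fs"
    rc=("${PIPESTATUS[@]}")
    [ "${rc[0]}" -eq 0 ] && [ "${rc[1]}" -eq 0 ]
}
```

If the backup holds snapshots newer than the common one, or its copy has been modified, zfs recv will most likely refuse the incremental stream unless it is allowed to roll back (zfs recv -F). This sketch deliberately leaves that decision to the caller.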
-- richard

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss