If you’ve decided to use ZFS on your storage devices, congratulations! You’re using one of the most feature-rich filesystems available. And if you ever decide to store long-term data, such as family photos and videos, seriously consider ZFS. In a redundant setup, such as a pool of mirrored hard drives, it protects your data against bit rot and other forms of silent corruption over time. Every block is checksummed, the checksums are organized in a Merkle tree, and when ZFS detects a damaged block, it repairs it automatically from a good copy.
This tutorial does not cover why ZFS is a good choice for archiving long-term data, however. Instead, it looks at what snapshots and clones can do for you.
What Are ZFS Snapshots and Clones?
A snapshot is simply an exact picture of the state of your data at a certain point in time. For example, let’s say you’re working on a complex website. You store all code, databases, and images on your ZFS dataset. You change the design of the website, modify some images, change some layout dimensions and modify some code to make all this fit. If you then wanted to revert to the previous design, you would have to undo all those changes individually. With ZFS, you can simply take a snapshot of your current design, make all the changes you want to make, and if you’re unhappy with the new design, roll back to the snapshot. And yes, it’s true, Git, GitHub and even some code editors also let you save a state and roll back to it. But ZFS snapshots offer the following advantages as well:
- Snapshots are global. They create a snapshot of absolutely all data included in your project.
- Snapshots and rollbacks are almost instantaneous, no matter how big your project is (even if it has hundreds of gigabytes).
- There’s no practical limit to the number of snapshots. You can have “Design 1,” “Design 2,” and “Design 3” and switch freely between them, make changes, and create a new snapshot: “Design 2 – Improved.”
Clones
While snapshots are basically frozen data states that you can return to, clones are like branches that start from a common point. To understand it better, imagine this scenario: You create a video for an advertising campaign. Then, you take a snapshot of this video (actually of the ZFS dataset where you store your video). Now, you clone this snapshot three times. You give “Clone 1” to one employee, “Clone 2” to another employee and “Clone 3” to the third employee. Now they can each work in their own individual space and make their desired changes.
Why is this useful? Videos can occupy huge amounts of disk space. High-resolution raw film can require hundreds or thousands of GB of storage. If the main video needs 500GB of storage and three people each need their own copy to work on divergent changes, ordinary copies would require over 1500GB of storage.
With ZFS, the snapshot and three (or more) clones initially require barely more than the original 500GB of storage. Blocks of data that don’t change (which all clones have in common) are only stored once. This way, only the differences that each editor adds are stored as additional data. In a real-world scenario, you may end up needing something like 650GB in total once all three editors have made their changes. It’s an efficient use of storage and resources, and data is properly isolated so that each editor can work to their heart’s content.
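The arithmetic behind these figures can be sketched in a few lines of shell. The 500GB base and the three editors come from the scenario above; the 50GB of new data per editor is only an assumed figure, chosen so that the total matches the 650GB estimate.

```shell
# Rough storage estimate for the three-editor scenario.
# base_gb and editors come from the text; delta_gb (new data written by
# each editor) is an assumption made for this example.
base_gb=500
editors=3
delta_gb=50

# Ordinary copies: every editor stores the full video plus their changes.
copies_gb=$(( editors * (base_gb + delta_gb) ))

# Clones: the original blocks are stored once; only the changes are added.
clones_gb=$(( base_gb + editors * delta_gb ))

echo "Independent copies: ${copies_gb}GB"
echo "Snapshot + clones:  ${clones_gb}GB"
```

Real usage depends on how much each editor actually rewrites, so treat the result as an illustration rather than a prediction.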
Of course, it’s useful for many other scenarios where you need to branch the same content in multiple different directions, even if disk space requirements are not a concern.
Commands Used to Work with ZFS Snapshots
While other Linux distributions can use this filesystem and volume manager, Ubuntu has offered particularly convenient ZFS support, with packages available in its official repositories.
Since not all users have a whole disk to dedicate to ZFS, it may be useful to know that you can also create a pool on an empty partition with a command such as sudo zpool create pool_name /dev/sda3, where /dev/sda3 is the device name of the third partition on your first disk.
After you install the proper packages and create your first ZFS dataset, this is how you create a snapshot.
First, find out the name of your ZFS dataset that you want to snapshot.
zfs list
In this example, the name of the dataset is data and the name of the snapshot will be snap1. Replace these values in the next command with what applies in your case. To create a snapshot, enter:
sudo zfs snapshot data@snap1
If in your case the dataset is named videos, and you want to call your snapshot first, the command would be:
sudo zfs snapshot videos@first
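Rolling back restores your dataset to the exact contents it had when you took the snapshot and discards every change made since. By default, zfs rollback only accepts the most recent snapshot of the dataset; going back to an older one requires the -r option, which also destroys every snapshot taken after it. To roll back, use: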
sudo zfs rollback data@snap1
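The snapshot and rollback commands above can be combined into one small script. This is only a sketch: run and snapshot_cycle are hypothetical helpers written for this example, and DRY_RUN is a made-up switch that, when left at its default of 1, prints the commands instead of executing them.

```shell
# Sketch of the snapshot -> change -> rollback cycle.
# DRY_RUN is a hypothetical switch for this example: with the default
# value of 1, commands are only printed and nothing is changed.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

snapshot_cycle() {
    dataset="$1"
    snap="$2"
    run sudo zfs snapshot "${dataset}@${snap}"   # freeze the current state
    # ... edit the files in the dataset here ...
    run sudo zfs rollback "${dataset}@${snap}"   # discard changes made since
}

snapshot_cycle data snap1
```

Printing the commands first is a cheap way to check the dataset and snapshot names before letting a rollback discard any work.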
When you no longer need a snapshot, delete it with:
sudo zfs destroy data@snap1
Commands Used to Work with ZFS Clones
Assuming you have a snapshot called “data@snap1,” clone it with:
sudo zfs clone data@snap1 data/clone1
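For the three-editor scenario described earlier, the same command can be generated once per clone. The sketch below uses a hypothetical helper, clone_commands, that only prints the commands so that you can review them first; the clone1, clone2, clone3 naming is an assumption borrowed from that scenario.

```shell
# Print one "zfs clone" command per editor, without executing anything.
# Arguments: snapshot to clone, parent dataset for the clones, clone count.
clone_commands() {
    snapshot="$1"
    parent="$2"
    count="$3"
    i=1
    while [ "$i" -le "$count" ]; do
        printf 'sudo zfs clone %s %s/clone%d\n' "$snapshot" "$parent" "$i"
        i=$((i + 1))
    done
}

clone_commands data@snap1 data 3
```

Once the printed commands look right, they can be run by piping the output to a shell, for example clone_commands data@snap1 data 3 | sh.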
To delete a clone you no longer need (here, one named clone3):
sudo zfs destroy data/clone3
You can also snapshot clones:
sudo zfs snapshot data/clone2@snap_of_clone
Later, when you want to see all the datasets, snapshots and clones you have created, use:
zfs list -t all
Conclusion
This covers the basic operations you can do with ZFS snapshots and clones. It may be useful to know that each dataset has a hidden directory within called “.zfs.” With a command like ls /data/.zfs/snapshot/snap1/, you are able to see the state of files in a snapshot. Since it acts like a regular (read-only) directory, you can also copy individual files from a snapshot in case you don’t need to roll back the entire dataset.
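Copying a single file back out of a snapshot might look like the sketch below. restore_from_snapshot is a hypothetical helper written for this example, and it assumes you know where the dataset is mounted (here /data, as in the ls command above).

```shell
# Sketch: restore one file from a snapshot via the hidden .zfs directory.
# Arguments: dataset mountpoint, snapshot name, file path relative to
# the mountpoint.
restore_from_snapshot() {
    mountpoint="$1"
    snap="$2"
    relpath="$3"
    src="${mountpoint}/.zfs/snapshot/${snap}/${relpath}"

    if [ ! -f "$src" ]; then
        echo "not found in snapshot: $src" >&2
        return 1
    fi
    # -p keeps the mode and timestamps the file had in the snapshot.
    cp -p "$src" "${mountpoint}/${relpath}"
}

# Example call (css/style.css is a made-up path that must have existed
# when snap1 was taken):
# restore_from_snapshot /data snap1 css/style.css
```

Depending on file ownership, the copy may need to be run with sudo. Only the named file is overwritten; everything else in the dataset stays as it is.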