Why ZFS on Proxmox
ZFS brings enterprise-grade features to Proxmox VE: data integrity via checksumming (detects silent corruption and, on redundant vdevs, repairs it), transparent compression, instant snapshots and clones, and flexible RAID-like pooling. It replaces the traditional volume-manager and filesystem layers with a single, unified stack.
Creating a Pool
A pool is built from virtual devices (vdevs). Common vdev types:
| Vdev type | Minimum disks | Usable capacity (at minimum disks) | Description |
|---|---|---|---|
| Mirror | 2 | 50% | Data mirrored across two disks |
| RAIDZ1 | 3 | 67% | Single parity (max 1 disk failure) |
| RAIDZ2 | 4 | 50% | Double parity (max 2 failures) |
| RAIDZ3 | 5 | 40% | Triple parity (max 3 failures) |
| Stripe | 1 | 100% | No redundancy (not recommended) |
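The percentages in the table follow from simple arithmetic: the disks that hold data divided by all disks in the vdev. The helper below is a minimal sketch of that calculation; `vdev_usable_pct` is a hypothetical name, not a ZFS tool, and the result ignores padding, metadata, and reserved space, so a real pool reports somewhat less.

```shell
# Rough usable-capacity estimate for a single vdev, as a whole percentage.
# Hypothetical helper, not part of ZFS.
#   $1 = number of disks in the vdev
#   $2 = disks' worth of redundancy
#        (RAIDZ1/2/3 -> 1/2/3; an N-way mirror -> N-1; stripe -> 0)
vdev_usable_pct() {
    disks=$1
    redundancy=$2
    if [ "$redundancy" -ge "$disks" ]; then
        echo "redundancy must be smaller than the disk count" >&2
        return 1
    fi
    # Integer arithmetic, rounded to the nearest percent.
    echo $(( ((disks - redundancy) * 100 + disks / 2) / disks ))
}
```

For example, `vdev_usable_pct 6 2` estimates a six-disk RAIDZ2 vdev, which shows how the ratio improves as a RAIDZ vdev grows beyond its minimum size.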
Create a mirrored pool:
zpool create -o ashift=12 tank mirror /dev/sda /dev/sdb
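The ashift=12 option tells ZFS to use 4 KiB (2^12-byte) sectors, matching the physical sector size of most modern SSDs and HDDs. It is fixed per vdev at creation time, and a value that is too small forces read-modify-write cycles on every small write. When given whole disks, ZFS creates aligned partitions itself; short-stroking HDDs (using only the faster outer tracks) can further reduce seek times at the cost of capacity. Stable /dev/disk/by-id/ paths are generally preferable to /dev/sdX names, which can change between boots.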
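The ashift value is the base-2 logarithm of the sector size in bytes. As a small sketch of that relationship, the hypothetical helper below (`ashift_for_sector_size` is an illustrative name) converts a sector size into the matching ashift.

```shell
# Convert a sector size in bytes to the matching ashift (log2 of the size).
# Hypothetical helper for illustration; not part of ZFS.
ashift_for_sector_size() {
    size=$1
    ashift=0
    # Reject zero, negative values, and sizes that are not a power of two.
    if [ "$size" -le 0 ] || [ $(( size & (size - 1) )) -ne 0 ]; then
        echo "sector size must be a positive power of two" >&2
        return 1
    fi
    while [ "$size" -gt 1 ]; do
        size=$(( size / 2 ))
        ashift=$(( ashift + 1 ))
    done
    echo "$ashift"
}

# Possible usage: read the physical sector size first, e.g. with
#   lsblk -o NAME,PHY-SEC,LOG-SEC
# and then pass the PHY-SEC value to the helper.
```

Some drives report 512-byte sectors while using 4 KiB internally, so treating the reported size as a lower bound is a reasonable precaution.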
Adding Cache and Log Devices
ZFS supports separate devices for performance acceleration:
- SLOG (separate log device for the ZFS Intent Log): absorbs synchronous writes (NFS, database transactions). A small, power-loss-protected SSD (NVMe or Optane) can substantially reduce sync write latency. It does not speed up reads or asynchronous writes.
zpool add tank log /dev/nvme0n1p1
- L2ARC (Level 2 ARC): caches frequently read data on an SSD, extending the read cache beyond RAM. Useful when ARC eviction is high, but every block cached in L2ARC needs a header in RAM, so an oversized L2ARC shrinks the ARC itself.
zpool add tank cache /dev/nvme0n1p2
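Before adding an L2ARC, it can help to check how well the in-RAM ARC is already doing. The sketch below assumes the Linux OpenZFS statistics file /proc/spl/kstat/zfs/arcstats with its `hits` and `misses` counters; `arc_hit_pct` is a hypothetical helper name, and the counters are cumulative since module load, so the figure is only meaningful on a system that has been running its normal workload for a while.

```shell
# Print the ARC hit ratio as a whole percentage (truncated), or "n/a"
# if no lookups have been recorded yet.
# Hypothetical helper; $1 optionally overrides the statistics file path.
arc_hit_pct() {
    stats_file=${1:-/proc/spl/kstat/zfs/arcstats}
    # Each data line has three columns: name, type, value.
    awk '$1 == "hits"   { hits = $3 }
         $1 == "misses" { misses = $3 }
         END {
             total = hits + misses
             if (total == 0) { print "n/a"; exit }
             printf "%d\n", (hits * 100) / total
         }' "$stats_file"
}
```

A consistently high hit ratio suggests an L2ARC would add little; a low one on a warmed-up system suggests the working set exceeds RAM, in which case more RAM or an L2ARC may help.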
Compression
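ZFS offers transparent, block-level compression; the lighter algorithms add little CPU overhead on modern hardware. Achievable ratios depend heavily on the data, so the figures below are rough guides:
| Algorithm | Typical ratio | CPU cost | Best for |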
|---|---|---|---|
| lz4 | ~2x | Very low | General purpose (what compression=on selects) |
| zstd | ~2–5x | Low–Medium | Archival, mixed data |
| gzip | ~3–6x | High | Maximum compression |
| zle | ~1x | Negligible | Already compressed data |
Enable compression on a dataset:
zfs set compression=lz4 tank/vmdata
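Compression applies only to data written after the property is set, so it is worth checking what a dataset actually achieves. The sketch below reads the `compression` and `compressratio` properties with `zfs get`; `compression_summary` is a hypothetical wrapper name.

```shell
# Print a dataset's compression setting and achieved ratio on one line.
# Hypothetical wrapper around `zfs get`; -H drops headers, -o value
# prints only the property value.
compression_summary() {
    dataset=$1
    algo=$(zfs get -H -o value compression "$dataset") || return 1
    ratio=$(zfs get -H -o value compressratio "$dataset") || return 1
    echo "$dataset: compression=$algo ratio=$ratio"
}

# Possible usage:
#   compression_summary tank/vmdata
```

A ratio close to 1.00x on a dataset that has been in use for some time indicates data that is already compressed or encrypted, where a heavier algorithm is unlikely to pay off.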
ARC Tuning and atime
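The ARC (Adaptive Replacement Cache) uses system RAM for data caching. By default, ZFS on Linux lets the ARC grow to about 50% of total RAM; new Proxmox VE installations from version 8.1 onward set a lower limit (10% of RAM, capped at 16 GiB). On a hypervisor this memory competes with guests, so set explicit limits via kernel module parameters. Values are in bytes; the example below sets a maximum of 8 GiB and a minimum of 4 GiB: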
# /etc/modprobe.d/zfs.conf
options zfs zfs_arc_max=8589934592
options zfs zfs_arc_min=4294967296
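Because the parameters take raw byte counts, it is easy to mistype a digit. The sketch below generates both lines from sizes given in GiB; `arc_modprobe_lines` is a hypothetical helper name, and it assumes a shell with 64-bit arithmetic.

```shell
# Print modprobe option lines for the ARC limits from sizes in GiB.
# Hypothetical helper; $1 = maximum in GiB, $2 = minimum in GiB.
arc_modprobe_lines() {
    max_gib=$1
    min_gib=$2
    if [ "$min_gib" -gt "$max_gib" ]; then
        echo "minimum must not exceed maximum" >&2
        return 1
    fi
    # 1 GiB = 1024 * 1024 * 1024 bytes.
    echo "options zfs zfs_arc_max=$(( max_gib * 1024 * 1024 * 1024 ))"
    echo "options zfs zfs_arc_min=$(( min_gib * 1024 * 1024 * 1024 ))"
}

# Possible usage (overwrites the file, so review the output first):
#   arc_modprobe_lines 8 4 > /etc/modprobe.d/zfs.conf
```

The file is read when the module loads, so the change takes effect after a reboot. If the root filesystem is on ZFS, the initramfs also needs to be regenerated (`update-initramfs -u -k all`). To change the limit on a running system, the same byte value can be written to /sys/module/zfs/parameters/zfs_arc_max.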
Disable atime so that reads no longer trigger access-time metadata writes, which reduces write load on SSDs:
zfs set atime=off tank/vmdata
Trimming SSDs
Enable automated TRIM for SSD pools to maintain performance over time:
zpool set autotrim=on tank
Manual trim is available as a one-shot operation:
zpool trim tank
Snapshots and Clones
ZFS snapshots are near-instantaneous and, thanks to copy-on-write, consume space only as the live data diverges from them:
zfs snapshot tank/vmdata@pre-update
zfs list -t snapshot
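Roll back to a snapshot. This discards every change made since the snapshot; rolling back past newer snapshots requires -r, which destroys them: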
zfs rollback tank/vmdata@pre-update
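Clones are writable datasets created from a snapshot. They share unchanged blocks with it, which makes them useful for testing without duplicating data; the origin snapshot cannot be destroyed while a clone depends on it: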
zfs clone tank/vmdata@pre-update tank/clone-test
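Fixed names such as pre-update collide the second time they are used. One common convention, sketched below, appends a timestamp to the label; `timestamped_snapshot_name` is a hypothetical helper and the name format is only an assumption, not something ZFS requires.

```shell
# Build a snapshot name of the form dataset@label-YYYYmmdd-HHMMSS.
# Hypothetical helper; $3 optionally supplies the timestamp, which is
# otherwise taken from the current local time.
timestamped_snapshot_name() {
    dataset=$1
    label=$2
    stamp=${3:-$(date +%Y%m%d-%H%M%S)}
    echo "${dataset}@${label}-${stamp}"
}

# Possible usage:
#   zfs snapshot "$(timestamped_snapshot_name tank/vmdata pre-update)"
```

Names built this way sort chronologically in `zfs list -t snapshot`, which also makes it easier to pick the pair of snapshots for an incremental send.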
Send/Recv Replication
ZFS send/recv streams snapshots to another pool, enabling off-site backups and replication:
zfs send tank/vmdata@pre-update | ssh backup-server "zfs recv backup-pool/vmdata"
Incremental sends transfer only the blocks that changed between two snapshots; the receiving side must already hold the earlier snapshot:
zfs send -i tank/vmdata@pre-update tank/vmdata@post-update | \
ssh backup-server "zfs recv backup-pool/vmdata"
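For repeated use, the incremental command can be wrapped in a small function. This is a minimal sketch under the same assumptions as the examples above (SSH access to the backup host, and the earlier snapshot already present on the target); `replicate_incremental` is a hypothetical name.

```shell
# Send the changes between two snapshots of a dataset to a remote pool.
# Hypothetical wrapper around zfs send -i and zfs recv.
#   $1 = source dataset     $2 = earlier snapshot   $3 = later snapshot
#   $4 = SSH host           $5 = target dataset on the remote side
replicate_incremental() {
    dataset=$1
    from_snap=$2
    to_snap=$3
    host=$4
    target=$5
    if [ "$from_snap" = "$to_snap" ]; then
        echo "source and destination snapshots must differ" >&2
        return 1
    fi
    # The exit status is that of the receiving side; a failed send
    # normally surfaces there as a truncated or missing stream.
    zfs send -i "${dataset}@${from_snap}" "${dataset}@${to_snap}" |
        ssh "$host" "zfs recv ${target}"
}

# Possible usage:
#   replicate_incremental tank/vmdata pre-update post-update \
#       backup-server backup-pool/vmdata
```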
Scrubbing and Monitoring
Regular scrubs read every block, verify its checksum, and repair silent corruption wherever a redundant copy exists:
zpool scrub tank
zpool status -v
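Schedule scrubs with cron; on Debian-based systems such as Proxmox VE, the zfsutils-linux package normally installs a monthly scrub job in /etc/cron.d/zfsutils-linux. The Proxmox GUI shows pool status under the node's Disks > ZFS view. Monitor pool health alongside each disk's SMART data: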
smartctl -a /dev/sda
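Health checks are easier to automate when they return an exit code. The sketch below reads the pool's `health` property with `zpool list`; `pool_is_healthy` is a hypothetical helper, and treating only ONLINE as healthy is an assumption that suits a simple cron alert.

```shell
# Return 0 if the pool's health is ONLINE, 1 if it is anything else
# (for example DEGRADED or FAULTED), 2 if the pool cannot be queried.
# Hypothetical helper around `zpool list`.
pool_is_healthy() {
    pool=$1
    health=$(zpool list -H -o health "$pool") || return 2
    if [ "$health" = "ONLINE" ]; then
        return 0
    fi
    echo "pool $pool is $health" >&2
    return 1
}

# Possible usage in a cron job:
#   pool_is_healthy tank || zpool status -v tank
```

An ONLINE pool can still have accumulated read, write, or checksum errors on individual devices, so this check complements rather than replaces `zpool status -v`.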
Replacing Failed Disks
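Replace a failed disk in a redundant vdev. zpool replace attaches the new disk and starts the resilver automatically, so no separate zpool online is needed; watch progress with zpool status:
zpool offline tank /dev/sda
zpool replace tank /dev/sda /dev/sdc
zpool status tank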
Common Troubleshooting
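- Pool import failure: run zpool import with no arguments to list importable pools, zpool import -D to list destroyed pools that can still be recovered, or zpool import -a to import every pool found.
- Device offline: check cables, then run zpool online tank /dev/sda.
- Pool performance degrades: check free-space fragmentation (the FRAG column) with zpool list -v tank. Trim if the pool is on SSDs.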
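The fragmentation check can be scripted as well. The sketch below reads the pool's `fragmentation` property; `pool_frag_high` is a hypothetical helper, and the default threshold of 50 percent is an arbitrary assumption rather than an official limit.

```shell
# Return 0 if free-space fragmentation is at or above the threshold,
# 1 if it is below, 2 if the value is unavailable.
# Hypothetical helper; $1 = pool, $2 = threshold in percent (default 50).
pool_frag_high() {
    pool=$1
    threshold=${2:-50}
    frag=$(zpool list -H -o fragmentation "$pool") || return 2
    frag=${frag%\%}    # strip a trailing percent sign: "37%" -> "37"
    case "$frag" in
        ''|*[!0-9]*)
            echo "fragmentation unavailable for $pool" >&2
            return 2
            ;;
    esac
    [ "$frag" -ge "$threshold" ]
}

# Possible usage:
#   pool_frag_high tank 60 && echo "tank: free space is heavily fragmented"
```

The value describes fragmentation of free space, not of files, so it mainly signals that new writes are getting harder to place; keeping pools well below full capacity is the usual countermeasure.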
