Introduction
Git was designed for text files, not large binaries. When large files enter a Git repository, clone times increase, repository size balloons, and operations like git log slow down. Git LFS (Large File Storage) solves this by replacing large files with text pointer files in the repository while storing the actual content on a remote server.
A pointer file is a small text file that maps to content on the LFS server:
version https://git-lfs.github.com/spec/v1
oid sha256:4a7e9a5b7b7b7b7b7b7b7b7b7b7b7b7b7b7b7b7b7b7b7b7b7b7b7b7b7b7b7b
size 12345
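The oid and size fields can be reproduced by hand, which is useful when checking that a pointer really matches a file. The sketch below assumes GNU coreutils (sha256sum) and a throwaway file with the hypothetical name sample.bin; in normal use the clean filter writes the pointer for you.

```shell
# Build a pointer by hand for a throwaway file (illustration only).
tmpdir=$(mktemp -d)
printf 'hello lfs\n' > "$tmpdir/sample.bin"

# oid is the SHA-256 of the file content; size is its length in bytes.
oid=$(sha256sum "$tmpdir/sample.bin" | cut -d' ' -f1)
size=$(wc -c < "$tmpdir/sample.bin" | tr -d ' ')

pointer=$(printf 'version https://git-lfs.github.com/spec/v1\noid sha256:%s\nsize %s\n' "$oid" "$size")
printf '%s\n' "$pointer"
```

The printed text has the same three-line shape as the pointer above, with a 64-character hexadecimal digest after sha256:.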
Real-world impact: a design team’s repository went from 12GB to 140MB after migrating to LFS. Git LFS is now the standard solution for large files in Git, supported by GitHub, GitLab, Bitbucket, and self-hosted solutions.
1. LFS Architecture and Installation
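Git LFS works through Git's clean and smudge filters. When you check out a branch, the smudge filter replaces pointer files with the actual content, downloading it from the LFS server if it is not already in the local cache. When you stage a file with git add, the clean filter replaces the large file with a pointer, so only the pointer is ever committed.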
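The filter mechanism itself is plain Git and can be observed without LFS. The following is a minimal sketch using a hypothetical toy filter named demo, which upper-cases content on the way in and lower-cases it on the way out; it is not what Git LFS runs, only an illustration of when the clean filter fires.

```shell
# Throwaway repository with a toy clean/smudge filter (hypothetical name: demo).
repo=$(mktemp -d)
cd "$repo"
git init -q .

git config filter.demo.clean 'tr a-z A-Z'    # runs when a file is staged
git config filter.demo.smudge 'tr A-Z a-z'   # runs when a file is checked out
echo '*.txt filter=demo' > .gitattributes

echo 'hello' > note.txt
git add note.txt

# The working tree keeps the original; the staged blob holds the cleaned form.
staged=$(git show :note.txt)
echo "working tree: $(cat note.txt)"
echo "staged blob:  $staged"
```

Git LFS plugs into the same two hooks: its clean step emits a pointer instead of upper-cased text, and its smudge step fetches the real content.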
# Install Git LFS
git lfs install
# Verify installation
git lfs version
Running git lfs install adds a filter.lfs section to your global .gitconfig that configures the clean, smudge, and process filters. Git LFS 3.0 and later also support a pure SSH transfer protocol, in which the client invokes git-lfs-transfer on the server over SSH instead of relying on the HTTPS API.
Installation varies by platform:
- Windows: git lfs install (Git LFS is included with Git for Windows)
- macOS: brew install git-lfs && git lfs install
- Linux: apt install git-lfs (Debian/Ubuntu) or yum install git-lfs (RHEL/Fedora)
2. Tracking Patterns and Configuration
Configure LFS to track specific file patterns using git lfs track. This adds entries to your .gitattributes file, which must be committed for team-wide adoption.
# Track common binary file types
git lfs track "*.psd"
git lfs track "*.zip"
git lfs track "*.tar.gz"
git lfs track "*.h5"
git lfs track "*.onnx"
The resulting .gitattributes entries look like this:
*.psd filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
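You can also scope patterns to specific directories. Negative patterns (a leading !) are not allowed in .gitattributes, so to exclude a subdirectory, add a later line that resets the attributes for those paths:
assets/**/*.png filter=lfs diff=lfs merge=lfs -text
assets/thumbnails/*.png !filter !diff !merge !text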
View all tracked patterns with git lfs track and inspect the .gitattributes file directly. Remember: patterns must be committed for the rest of the team to inherit them.
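To confirm which paths a set of patterns actually covers, git check-attr can be queried against sample paths without creating any files. The sketch below uses a throwaway repository and hypothetical paths; the second .gitattributes line is an assumption about how a subdirectory might be excluded, namely by resetting the attributes on a later line.

```shell
# Throwaway repository; only .gitattributes is needed for the check.
repo=$(mktemp -d)
cd "$repo"
git init -q .

cat > .gitattributes <<'EOF'
assets/**/*.png filter=lfs diff=lfs merge=lfs -text
assets/thumbnails/*.png !filter !diff !merge !text
EOF

# Ask Git which filter applies to two hypothetical paths.
tracked=$(git check-attr filter -- assets/icons/logo.png)
excluded=$(git check-attr filter -- assets/thumbnails/small.png)
echo "$tracked"
echo "$excluded"
```

A path that reports filter: lfs will be stored through LFS when staged; one that reports unspecified stays a regular Git blob.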
3. File Size Limits and Bandwidth Management
LFS storage and bandwidth are metered by hosting providers:
| Provider | Free Storage | Free Bandwidth / Month |
|---|---|---|
| GitHub | 1 GB | 1 GB |
| GitLab | Varies by tier | Varies by tier |
| Bitbucket | 1.8 GB | 1.8 GB |
To estimate costs, calculate total LFS storage based on current tracked files and monthly bandwidth based on team size and file update frequency. Strategies for reducing costs include:
# Prune old LFS revisions from local cache
git lfs prune
# Deduplicate working-tree files against the local LFS cache (needs a copy-on-write filesystem)
git lfs dedup
# Limit what is fetched on clone via .lfsconfig
[lfs]
fetchinclude = "assets/**"
fetchexclude = "archive/**"
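Git LFS normally sets lfs.<url>.access itself after an endpoint demands authentication; setting it manually (for example, to basic) makes the client request credentials before its first call instead of attempting an anonymous request. Transfers are negotiated through the Batch API, which asks for many objects in a single call and is more efficient than handling files one at a time.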
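The cost estimate described above comes down to a few multiplications. The sketch below uses made-up inputs (team size, update rate, and file size are all hypothetical) and a deliberately pessimistic assumption that every team member downloads every new version once.

```shell
# All inputs are hypothetical; replace them with your own measurements.
team_size=8              # people who pull regularly
files_changed=20         # LFS files updated per month
avg_file_mb=50           # average size of one updated file, in MB
current_storage_mb=3000  # LFS storage used today, in MB

# Every new version is stored once on the server.
storage_growth_mb=$((files_changed * avg_file_mb))
# Worst case: each team member downloads each new version once.
bandwidth_mb=$((team_size * files_changed * avg_file_mb))
storage_after_year_mb=$((current_storage_mb + 12 * storage_growth_mb))

echo "storage growth per month: ${storage_growth_mb} MB"
echo "bandwidth per month:      ${bandwidth_mb} MB"
echo "storage after 12 months:  ${storage_after_year_mb} MB"
```

CI jobs that pull LFS content count toward bandwidth as well, so a fuller estimate would add a term for pipeline runs.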
4. Migration from Regular Git Tracking
For existing repositories with large files already committed, use git lfs migrate to rewrite history and replace large files with LFS pointers.
# Backup first — this rewrites history
git lfs migrate import --include="*.psd,*.zip" --everything
The migration process:
- Back up the repository
- Run git lfs migrate import with appropriate --include patterns
- Verify with git lfs ls-files --all
- Coordinate with the team, then force push
- Everyone reclones
# Verify migrated files
git lfs ls-files --all
# Force push (coordinate with team first)
git push --force --all
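For the backup step, one option is a mirror clone, which copies every branch and tag. Before migration the large files are still ordinary Git blobs, so the mirror contains them too. The sketch below runs against a throwaway repository; the paths and the identity passed to git commit are placeholders.

```shell
# Stand-in for the repository to be migrated.
src=$(mktemp -d)
git init -q "$src"
git -C "$src" -c user.name=Example -c user.email=example@example.invalid \
  commit -q --allow-empty -m "initial commit"

# Mirror clone: a bare copy of all refs and objects, kept outside the work area.
backup="$(mktemp -d)/repo-backup.git"
git clone -q --mirror "$src" "$backup"

# Sanity check: both repositories should resolve HEAD to the same commit.
src_head=$(git -C "$src" rev-parse HEAD)
backup_head=$(git -C "$backup" rev-parse HEAD)
echo "source: $src_head"
echo "backup: $backup_head"
```

If the migration goes wrong, the pre-migration history can be recovered from the mirror rather than from teammates' clones.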
Important: rewriting history changes commit hashes, requires force push, and breaks open pull requests. For a less disruptive approach, start tracking future files only with git lfs track — this avoids history rewriting at the cost of leaving past large files in the history.
Case study: a 5GB Unity game repository was reduced to 280MB after migrating historical .psd, .fbx, and .unitypackage files to LFS.
5. CI/CD Integration
Integrating LFS with CI/CD pipelines requires explicit configuration.
GitHub Actions: use the lfs: true option in actions/checkout@v4, or run git lfs pull manually after checkout.
- uses: actions/checkout@v4
  with:
    lfs: true
GitLab CI: set GIT_LFS_SKIP_SMUDGE to 1 to skip downloading LFS content at checkout, then run git lfs pull selectively for the paths a job actually needs.
variables:
  GIT_LFS_SKIP_SMUDGE: "1"
before_script:
  - apt-get update && apt-get install -y git-lfs
  - git lfs install
  - git lfs pull --include="models/**"
Optimize CI by:
- Skipping LFS pull for jobs that don’t need large files
- Using --include/--exclude to fetch only what is needed
- Caching the .git/lfs directory between CI runs
- Using a self-hosted runner near the LFS server to reduce bandwidth costs
6. Repository Cleanup and Maintenance
Keep LFS repositories healthy with regular maintenance commands.
# Remove old LFS revisions from local storage
git lfs prune --recent --verify-remote
# List all tracked LFS files
git lfs ls-files --all
# Find the largest files
git lfs migrate info --everything --top=10
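Large blobs that are not yet in LFS can also be found with plain Git, which helps when deciding which patterns to track. The following is a sketch against a throwaway repository with hypothetical file names; it assumes paths without spaces, since awk splits on whitespace.

```shell
# Throwaway repository with one large and one small file.
repo=$(mktemp -d)
cd "$repo"
git init -q .
head -c 2048 /dev/zero > big.bin
echo 'small' > small.txt
git add big.bin small.txt
git -c user.name=Example -c user.email=example@example.invalid commit -q -m "add files"

# List every object in history, keep blobs, and sort by size (largest first).
largest=$(git rev-list --objects --all |
  git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
  awk '$1 == "blob" { print $2, $3 }' |
  sort -rn | head -n 1)
echo "$largest"
```

Raising the head -n value lists more candidates; the file extensions that dominate the top of the list are the natural patterns to pass to git lfs track or git lfs migrate import.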
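Clean up orphaned LFS objects on the remote via GitHub’s Storage settings or equivalent controls on other platforms. Manage .gitattributes carefully across branches: it is an ordinary text file, so conflicting pattern changes must be merged by hand. Conflicts in LFS-tracked binaries are not resolved automatically either; pick one side with git checkout --ours or git checkout --theirs, then stage the result. The lfs.allowincompletepush setting lets a push proceed even when some LFS objects are missing from the local cache, which can help with a damaged clone but may leave the remote holding pointers that have no content.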
7. Alternatives Comparison
Git LFS is not always the right choice. Consider these alternatives:
| Tool | Best For | Storage Backend | Complexity |
|---|---|---|---|
| Git LFS | Design assets, binaries | Native Git hosting | Low |
| DVC | ML datasets, model files | S3, GCS, SSH, SMB | Medium |
| Git Annex | Large media production | Any remote (S3, Glacier, etc.) | High |
| Perforce | 100GB+ game assets | Centralized server | High |
- DVC (Data Version Control): designed for ML workflows with dataset versioning, pipeline tracking, and cloud storage backends. Better than LFS for ML, but more complex for simple binary management.
- Git Annex: supports flexible storage backends, partial file access, and encryption. More powerful but significantly more complex.
- Perforce: for very large binary assets (100GB+) in game development, Perforce’s centralized model can outperform Git LFS.
8. Troubleshooting and Common Issues
| Problem | Solution |
|---|---|
| git-lfs not found | Install git-lfs and run git lfs install |
| Object does not exist on server | Run git lfs push --all <remote> from a clone that still has the objects |
| LFS merge conflicts | Pick one version with git checkout --ours or --theirs, then git add the file |
| Authentication errors | Configure credential.helper and lfs.<url>.access |
| Bandwidth exceeded | Upgrade plan or wait for billing cycle reset |
Conclusion
Git LFS is the de facto standard for large files in Git. Key takeaways: track patterns early, migrate carefully with team coordination, monitor storage and bandwidth quotas, optimize CI with selective fetching, and clean up regularly with git lfs prune. Establish LFS guidelines early in project setup rather than retrofitting later. For ML-specific workflows, evaluate DVC; for massive binary assets, consider Perforce or Git Annex.
