
Git LFS: Managing Large Files in Your Repository

Complete guide to Git LFS for large file management: setup, tracking patterns, file size limits, bandwidth costs, migration, CI/CD integration, and alternatives like DVC.

Introduction

Git was designed for text files, not large binaries. When large files enter a Git repository, clone times increase, repository size balloons, and operations that have to process every object, such as git fetch, git gc, and git repack, slow down. Git LFS (Large File Storage) solves this by replacing large files with small text pointer files in the repository while storing the actual content on a separate LFS server.

A pointer file is a small text file that maps to content on the LFS server:

version https://git-lfs.github.com/spec/v1
oid sha256:4a7e9a5b7b7b7b7b7b7b7b7b7b7b7b7b7b7b7b7b7b7b7b7b7b7b7b7b7b7b7b
size 12345
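The pointer format is simple enough to produce by hand. As a minimal sketch (the function name make_pointer is hypothetical, not part of Git LFS), this builds the pointer text for a blob the way the clean filter conceptually does: a SHA-256 of the content plus its size in bytes. The real filter also copies the content into the local LFS cache; git lfs pointer --file=<path> prints the pointer that Git LFS itself would generate.

```python
import hashlib

SPEC_VERSION = "https://git-lfs.github.com/spec/v1"


def make_pointer(data: bytes) -> str:
    """Return the Git LFS pointer text for `data` (illustrative sketch).

    The version line comes first, the remaining keys follow in sorted order,
    and every line, including the last, ends with a newline.
    """
    oid = hashlib.sha256(data).hexdigest()
    return f"version {SPEC_VERSION}\noid sha256:{oid}\nsize {len(data)}\n"


if __name__ == "__main__":
    print(make_pointer(b"hello"), end="")
```

Because the oid depends only on the content, identical files anywhere in the history map to a single LFS object.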

Real-world impact: a design team’s repository went from 12GB to 140MB after migrating to LFS. Git LFS is now the standard solution for large files in Git, supported by GitHub, GitLab, Bitbucket, and self-hosted solutions.


1. LFS Architecture and Installation

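Git LFS works through Git's clean and smudge filters. When you stage a tracked file with git add, the clean filter stores its content in the local LFS cache (.git/lfs/objects) and hands Git a pointer instead; the content is uploaded to the LFS server when you push. When you check out a branch, the smudge filter replaces each pointer with the actual content, taking it from the local cache or downloading it from the LFS server.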
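The checkout side can be sketched the same way. Assuming the default cache layout, in which Git LFS keeps each object at .git/lfs/objects/<first two hex digits>/<next two hex digits>/<oid>, a simplified smudge step parses the pointer, reads the object from that path, and verifies its hash and size. The function names are hypothetical, and unlike the real filter this sketch never contacts the LFS server: a missing object simply raises FileNotFoundError.

```python
import hashlib
from pathlib import Path


def parse_pointer(text: str) -> dict:
    """Split pointer text into key/value fields ("version", "oid", "size")."""
    fields = {}
    for line in text.splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    if not fields.get("version", "").startswith("https://git-lfs.github.com/spec/"):
        raise ValueError("not a Git LFS pointer")
    return fields


def smudge_from_cache(pointer_text: str, lfs_dir: Path) -> bytes:
    """Resolve a pointer against a local LFS cache only (no server download)."""
    fields = parse_pointer(pointer_text)
    algo, oid = fields["oid"].split(":", 1)
    if algo != "sha256":
        raise ValueError(f"unsupported hash algorithm: {algo}")
    # Assumed default layout: objects/<oid[0:2]>/<oid[2:4]>/<oid>
    path = Path(lfs_dir) / "objects" / oid[0:2] / oid[2:4] / oid
    data = path.read_bytes()  # raises FileNotFoundError if not cached
    # Verify content before it would be written to the working tree.
    if hashlib.sha256(data).hexdigest() != oid or len(data) != int(fields["size"]):
        raise ValueError(f"corrupt LFS object: {oid}")
    return data
```

In a real repository, lfs_dir would be the repository's .git/lfs directory.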

# Configure Git LFS for your user account (run once, after installing the git-lfs package)
git lfs install

# Verify installation
git lfs version

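Running git lfs install adds a filter.lfs section to your global .gitconfig that configures the clean and smudge filters; when run inside a repository, it also installs the Git LFS pre-push hook. Recent Git LFS versions can also transfer objects over a pure SSH protocol, which relies on the git-lfs-transfer command on the server, so authenticated transfers need not go through HTTPS.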

Installation varies by platform. Install the package, then run git lfs install once per user account:

  • Windows: included with Git for Windows
  • macOS: brew install git-lfs
  • Linux: apt install git-lfs (Debian/Ubuntu) or yum install git-lfs (RHEL/Fedora)

2. Tracking Patterns and Configuration

Configure LFS to track specific file patterns using git lfs track. This adds entries to your .gitattributes file, which must be committed for team-wide adoption.

# Track common binary file types
git lfs track "*.psd"
git lfs track "*.zip"
git lfs track "*.tar.gz"
git lfs track "*.h5"
git lfs track "*.onnx"

The resulting .gitattributes entries look like this:

*.psd filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text

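You can also scope patterns to specific directories. gitattributes does not support negative patterns (Git ignores a line that starts with ! and prints a warning), so to exclude a subset, add a later line that unsets the LFS attributes for it; when several lines match a path, later lines take precedence:

assets/**/*.png filter=lfs diff=lfs merge=lfs -text
assets/thumbnails/*.png !filter !diff !merge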

Run git lfs track with no arguments to list the tracked patterns, or inspect the .gitattributes file directly. Remember: .gitattributes must be committed for the rest of the team to inherit the patterns.
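When several lines in .gitattributes match a path, the last one wins for each attribute, and lines that begin with ! are ignored. The following sketch (hypothetical helper names) approximates that lookup for the filter attribute so you can reason about which files end up in LFS. It handles *, ?, and ** but not bracket expressions, quoted patterns, or nested .gitattributes files, so treat git check-attr filter -- <path> as the authoritative answer.

```python
import re


def _glob_to_regex(pattern: str) -> re.Pattern:
    """Translate a gitattributes-style glob into an anchored regex (simplified)."""
    out, i = [], 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            out.append("(?:.*/)?")  # zero or more directories
            i += 3
        elif pattern.startswith("/**", i) and i + 3 == len(pattern):
            out.append("/.*")  # everything below this directory
            i += 3
        elif pattern[i] == "*":
            out.append("[^/]*")  # a single * never crosses a /
            i += 1
        elif pattern[i] == "?":
            out.append("[^/]")
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("^" + "".join(out) + "$")


def _matches(pattern: str, path: str) -> bool:
    if "/" not in pattern:
        # A pattern without a slash is matched against the file name at any depth.
        return bool(_glob_to_regex(pattern).match(path.rsplit("/", 1)[-1]))
    # Otherwise it is anchored at the directory containing .gitattributes.
    return bool(_glob_to_regex(pattern.lstrip("/")).match(path))


def is_lfs_tracked(gitattributes: str, path: str) -> bool:
    """Return True if the last matching line leaves filter=lfs set for `path`."""
    filter_value = None
    for line in gitattributes.splitlines():
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue
        pattern, attrs = fields[0], fields[1:]
        if pattern.startswith("!"):
            continue  # Git ignores negative patterns in gitattributes
        if not _matches(pattern, path):
            continue
        for attr in attrs:
            if attr.startswith("filter="):
                filter_value = attr.split("=", 1)[1]
            elif attr in ("-filter", "!filter"):
                filter_value = None  # unset or unspecified: not handled by LFS
    return filter_value == "lfs"
```

Running it against a rules file that uses a leading ! line shows the exclusion silently has no effect, while the !filter form does.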


3. File Size Limits and Bandwidth Management

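LFS storage and bandwidth are metered by hosting providers. Free-tier quotas change over time, so confirm current figures against each provider's billing documentation: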

| Provider | Free storage | Free bandwidth per month |
|---|---|---|
| GitHub | 1 GB | 1 GB |
| GitLab | Varies by tier | Varies by tier |
| Bitbucket | 1.8 GB | 1.8 GB |

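To estimate costs, keep the two meters apart: storage covers every version of every LFS file ever pushed, not just the current checkout, while bandwidth covers every download, so it grows with team size, CI clones, and how often large files change. The following commands and settings reduce local disk usage and download volume: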

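# Prune old LFS revisions from the local cache (frees local disk; server-side storage is unchanged)
git lfs prune

# Deduplicate working-tree files against the local LFS cache (requires a copy-on-write filesystem)
git lfs dedup

# .lfsconfig at the repository root: limit which LFS files are downloaded
[lfs]
  fetchinclude = "assets/**"
  fetchexclude = "archive/**"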

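lfs.<url>.access controls whether credentials are sent up front (for example, basic); Git LFS normally sets it itself after the server answers with a 401, so you rarely need to configure it by hand. Transfers go through the batch API, which negotiates many objects in a single request, and lfs.concurrenttransfers sets how many objects are transferred in parallel.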
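A back-of-the-envelope estimate makes the storage/bandwidth distinction concrete. The model below is an assumption, not any provider's billing formula: server storage is the sum of every pushed version of every LFS file, each fresh clone without a cache downloads one checkout's worth of LFS content, and each routine pull downloads only what changed. All names are illustrative.

```python
from __future__ import annotations

from dataclasses import dataclass

GIB = 1024 ** 3


@dataclass
class LfsUsage:
    storage_gib: float
    monthly_bandwidth_gib: float


def estimate_lfs_usage(
    version_sizes: list[int],
    checkout_bytes: int,
    fresh_clones_per_month: int,
    pulls_per_month: int,
    changed_bytes_per_pull: int,
) -> LfsUsage:
    """Rough LFS usage under a simple, assumed model."""
    # Storage: the server keeps every version that was ever pushed.
    storage = sum(version_sizes)
    # Bandwidth: full downloads for fresh clones plus incremental pulls.
    bandwidth = (fresh_clones_per_month * checkout_bytes
                 + pulls_per_month * changed_bytes_per_pull)
    return LfsUsage(storage / GIB, bandwidth / GIB)


# Example: 3 GiB of history, a 2 GiB checkout, 100 uncached CI clones,
# and 200 developer pulls that each fetch about 50 MiB of changes.
usage = estimate_lfs_usage([GIB] * 3, 2 * GIB, 100, 200, 50 * 1024 ** 2)
print(usage)
```

In this example the uncached CI clones account for 200 of the roughly 210 GiB of monthly bandwidth, which is why caching .git/lfs between CI runs pays off.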


4. Migration from Regular Git Tracking

For existing repositories with large files already committed, use git lfs migrate to rewrite history and replace large files with LFS pointers.

# Backup first — this rewrites history
git lfs migrate import --include="*.psd,*.zip" --everything

The migration process:

  1. Back up the repository (for example, with git clone --mirror)
  2. Run git lfs migrate import with appropriate --include patterns
  3. Verify with git lfs ls-files --all
  4. Coordinate with the team, then force push
  5. Everyone re-clones, since old clones still contain the pre-migration history

# Verify migrated files
git lfs ls-files --all

# Force push rewritten branches and tags (coordinate with team first)
git push --force --all
git push --force --tags

Important: rewriting history changes commit hashes, requires a force push, and breaks open pull requests. For a less disruptive approach, track only future changes with git lfs track: this avoids rewriting history at the cost of leaving past large files in it. Files that are already committed stay regular Git blobs until they are modified, or until you run git add --renormalize . and commit the result.

Case study: a 5GB Unity game repository was reduced to 280MB after migrating historical .psd, .fbx, and .unitypackage files to LFS.
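git lfs migrate info --everything is the right tool for choosing --include patterns, because it accounts for history. For a quick first look at just the current working tree, a sketch like the one below groups files above a size threshold by extension and joins them into the comma-separated form that --include accepts. The helper names and any threshold you pass are illustrative assumptions, not Git LFS defaults.

```python
import os
from collections import defaultdict
from pathlib import PurePath


def walk_tree(root: str = "."):
    """Yield (path, size) for every file under root, skipping .git."""
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames[:] = [d for d in dirnames if d != ".git"]  # don't descend into Git's storage
        for name in filenames:
            full = os.path.join(dirpath, name)
            yield full, os.lstat(full).st_size


def suggest_include(files, min_bytes: int) -> str:
    """Return a --include value covering extensions of files >= min_bytes.

    Extensions are ordered by total size, largest first. Only the final
    suffix is used (archive.tar.gz counts as *.gz), and files without an
    extension are skipped; list those by path if they need migrating.
    """
    totals = defaultdict(int)
    for path, size in files:
        suffix = PurePath(path).suffix
        if size >= min_bytes and suffix:
            totals[f"*{suffix}"] += size
    return ",".join(sorted(totals, key=lambda p: (-totals[p], p)))
```

For example, suggest_include(walk_tree("."), 10 * 1024 ** 2) might return a value such as *.psd,*.fbx, which you would then pass as git lfs migrate import --include="*.psd,*.fbx" --everything.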


5. CI/CD Integration

Integrating LFS with CI/CD pipelines requires explicit configuration.

GitHub Actions: use the lfs: true option in actions/checkout@v4, or run git lfs pull manually after checkout.

- uses: actions/checkout@v4
  with:
    lfs: true

GitLab CI: set GIT_LFS_SKIP_SMUDGE to "1" so the checkout leaves pointer files in place instead of downloading LFS content, then run git lfs pull only for the paths a job needs.

variables:
  GIT_LFS_SKIP_SMUDGE: "1"

before_script:
  - apt-get update && apt-get install -y git-lfs
  - git lfs install
  - git lfs pull --include="models/**"

Optimize CI by:

  • Skipping LFS pull for jobs that don’t need large files
  • Using --include/--exclude to fetch only what is needed
  • Caching the .git/lfs directory between CI runs
  • Using a self-hosted runner with a persistent LFS cache, or on the same network as a self-hosted LFS server, to reduce download costs
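When smudging is skipped, a job that forgets to run git lfs pull sees pointer text where it expected a model or asset, and often fails later with a confusing parse error. A small guard can fail fast instead. This sketch relies on two properties of the pointer format: the file starts with the version line, and pointer files are smaller than 1024 bytes. It is a heuristic, and the function names are hypothetical.

```python
from __future__ import annotations

from pathlib import Path

POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"
MAX_POINTER_BYTES = 1024  # pointer files are always smaller than this


def is_lfs_pointer(path: Path) -> bool:
    """Return True if `path` still contains pointer text instead of real content."""
    if path.stat().st_size >= MAX_POINTER_BYTES:
        return False
    with path.open("rb") as fh:
        return fh.read(len(POINTER_PREFIX)) == POINTER_PREFIX


def find_pointers(paths) -> list[str]:
    """List paths left as pointers (e.g. GIT_LFS_SKIP_SMUDGE without git lfs pull)."""
    return [str(p) for p in map(Path, paths) if is_lfs_pointer(p)]
```

A CI step could call find_pointers on the files it needs (for example, everything under models/) and exit with an error if the returned list is not empty.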

6. Repository Cleanup and Maintenance

Keep LFS repositories healthy with regular maintenance commands.

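# Remove old LFS revisions from local storage (--recent also drops objects kept for recent refs; --verify-remote checks the server has a copy before deleting)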
git lfs prune --recent --verify-remote

# List all tracked LFS files
git lfs ls-files --all

# Show which file types use the most space across all history
git lfs migrate info --everything --top=10

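Cleaning up orphaned LFS objects on the remote is provider-specific (on GitHub through its storage settings, with equivalent controls on other platforms); git lfs prune only affects your local cache. Keep .gitattributes consistent across branches: it is an ordinary text file, so conflicting pattern edits show up as normal merge conflicts. LFS-tracked binaries themselves cannot be merged line by line, so resolve a conflict by keeping one side (git checkout --ours or --theirs on the path, then git add), or avoid it with file locking (git lfs lock). Finally, lfs.allowincompletepush lets a push continue when some LFS objects are missing from the local cache; it defaults to false, and enabling it can leave pointers on the server with no content behind them.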
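To see how much local disk the LFS cache occupies before and after git lfs prune, you can total the files in the object store. This sketch assumes the default location, <git dir>/lfs/objects (the lfs.storage setting can move it, and git lfs env shows the active path); the function name is hypothetical.

```python
from __future__ import annotations

from pathlib import Path


def lfs_cache_usage(git_dir: Path) -> tuple[int, int]:
    """Return (object_count, total_bytes) for the LFS objects under git_dir."""
    objects = Path(git_dir) / "lfs" / "objects"
    count = total = 0
    if objects.is_dir():
        for path in objects.rglob("*"):
            if path.is_file():
                count += 1
                total += path.stat().st_size
    return count, total


if __name__ == "__main__":
    count, total = lfs_cache_usage(Path(".git"))
    print(f"{count} LFS objects, {total / 1024 ** 2:.1f} MiB")
```

Comparing the totals before and after git lfs prune shows how much local space was reclaimed; it says nothing about storage on the server.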


7. Alternatives Comparison

Git LFS is not always the right choice. Consider these alternatives:

| Tool | Best for | Storage backend | Complexity |
|---|---|---|---|
| Git LFS | Design assets, binaries | Native Git hosting | Low |
| DVC | ML datasets, model files | S3, GCS, Azure, SSH, and others | Medium |
| Git Annex | Large media production | Any remote (S3, Glacier, etc.) | High |
| Perforce | 100GB+ game assets | Centralized server | High |

  • DVC (Data Version Control): designed for ML workflows with dataset versioning, pipeline tracking, and cloud storage backends. Better than LFS for ML, but more complex for simple binary management.
  • Git Annex: supports flexible storage backends, partial file access, and encryption. More powerful but significantly more complex.
  • Perforce: for very large binary assets (100GB+) in game development, Perforce’s centralized model can outperform Git LFS.

8. Troubleshooting and Common Issues

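| Problem | Solution |
|---|---|
| git-lfs not found | Install git-lfs and run git lfs install |
| Object does not exist on server | The object was never uploaded; whoever has it locally runs git lfs push --all origin |
| LFS merge conflicts | Binary files cannot be merged: keep one side with git checkout --ours or --theirs, then git add; use git lfs lock to prevent concurrent edits |
| Authentication errors | Configure credential.helper and lfs.<url>.access |
| Bandwidth exceeded | Upgrade plan or wait for billing cycle reset |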

Conclusion

Git LFS is the de facto standard for large files in Git. Key takeaways: track patterns early, migrate carefully with team coordination, monitor storage and bandwidth quotas, optimize CI with selective fetching, and clean up regularly with git lfs prune. Establish LFS guidelines early in project setup rather than retrofitting later. For ML-specific workflows, evaluate DVC; for massive binary assets, consider Perforce or Git Annex.