Open-source version control for game development
It's time for a free and open source collaboration system for game developers. Current closed-source systems, like Perforce and Plastic/Unity DevOps, are expensive, complex, and limit their users. Current open-source solutions, like Git or SVN, are difficult to use and do not scale with large projects containing large files. Game developers want to focus on their game, not their version control.
Jamsync is a free and open source version control system that uses modern Content Defined Chunking strategies to efficiently track changes to large files and large projects. Currently it's around 2x-10x faster than Git. You can try out a hosted alpha version on this site by logging in or check out the AGPL-licensed source on GitHub.
You can join the Discord for updates or to get support. If you have any questions or a problem I can help solve, please email me at [email protected].
Problem
Third-generation version control systems have enabled code collaboration and sharing on a massive scale. Most notably, Git in combination with hosting platforms like GitHub has improved our ability to reuse and share code. However, game developers have mostly been left out of this movement due to Git's poor support for large files and large projects. Several of these problems will also continue to get worse.
- Files will get larger
Game developers are continuing to create larger files with more data. Developers want to version their data and assets along with their source code and not have to interact with another layer on top of their current system.
- Repositories will get larger
Game developers are continuing to create more complex projects with more files, distributed across their team. Monorepos have become the de facto way to manage large amounts of code in other industries, but even those systems have little support for this. Large companies like Google and Facebook have created their own internal solutions to monorepo problems, with Google's Piper and Facebook's Sapling, but there are almost no options for companies outside these two.
- Closed-source systems restrict sharing and innovation
Perforce is widely used in the professional games industry, but it's expensive, closed-source, and still has many issues. Game developers, like developers in other industries, are looking for easy ways to collaborate with other people without having to go through a salesperson.
- Build and deploy times will approach zero
Long feedback cycles and deploy times crush engineering productivity. Eventually, the most productive companies will find ways to minimize the time to get feedback and deploy -- companies like Vercel have already begun this process for the web, but we're still a few years away from having truly "instant" previews for other industries. When build and deploy times approach zero, there will be little separation between local and remote development environments. Developers will begin to expect to deploy and collaborate in "realtime", rather than waiting to commit and push their changes.
- Networks will get much faster and more reliable
Current version control systems do not take advantage of how fast our networks can be. Downloading an entire monorepo with years of history for each file no longer makes sense and developers are nearly always online.
Fourth Generation Version Control
- Seamless large file support
Git and other systems currently rely on an additional layer to support large files. Ultimately, large files are no different than small files, and our version control system should be able to scale efficiently between the two.
- Monorepo support
Versioning and deploying a monorepo efficiently should be built into the version control system. Changes in one project (like adding a large binary) should not slow down the rest of the company. There needs to be support for a large number of files (>100 million), built-in permissions, file locking, and dependency resolution to enable instant deploys and collaboration.
- Efficient realtime syncing between local and remote
If build times approach zero, developers should be able to collaborate and ship code in real time, rather than waiting for other developers to manually commit and push their code. With syncing built into the version control system, we'll be able to detect merge conflicts as they occur and constantly merge remote changes into the local version.
- Fully-featured virtual file system
To enable efficient local development and fast deploys, our version control system should be able to fetch files as-needed over a network, rather than cloning an entire repository every time.
- Direct API Access
Many systems, such as game engines, CAD software, or file storage solutions, need ways to store and version large amounts of data. We should enable these systems to integrate into the version control system through an open API.
Jamsync
Jamsync is an in-development fourth-generation version control system that is being built for game developers. You can currently do most things that you would expect from a current version control system, like pulling, pushing, branching, and merging, but it's not quite ready for production use -- expect some bugs. The core algorithm has been implemented, but there's more work remaining to build out the features that developers expect from a full collaboration platform. Please join the Discord and star the repo on GitHub to follow development.
Terminology
Jamsync works a little differently from most version control systems, so some terms have slightly different meanings here than they do elsewhere.
- Mainline - The production history of the project. Made up of a series of "commits" that represent good versions of the project.
- Branch - A workspace for developers to make changes in. Developers will make "changes" in their branch and merge into the "mainline" when approved/ready. "Changes" will be tracked while in the branch, but will be squashed into a single "commit" when merged into the mainline. Eventually, changes will be able to be synced live between a developer's local machine and their branch.
- Change - A snapshot of a branch while developers are working on their project, made by doing a `jam push`.
- Commit - A snapshot of the production version of the project, made by merging in a "branch".
- Merge - Occurs when a branch is squashed and committed to the "mainline".
Benchmarks
This section compares Jamsync upload and download speed for a directory to Git. Note that these numbers are not final and future features will give Jamsync ways to make typical workflows faster, like directory mounting over NFS. Also, these are raw file measurements, meaning no previous versions are uploaded or downloaded (which is to Git's advantage).
- Git Source - 4287 files, 77MB

| | Upload | Download |
|---|---|---|
| Git | 19.583s | 47.616s |
| Jamsync | 8.357s | 4.265s |

- Linux Source - 78351 files, 1.4G

| | Upload | Download |
|---|---|---|
| Git | 8m32.365s | 5m18.194s |
| Jamsync | 56.401s | 28.531s |

- Celeba Dataset - 202599 files, 1.8G

| | Upload | Download |
|---|---|---|
| Git | 1hr6m46s | 1m52.665s |
| Jamsync | 11m0.868s | 4m39.101s |
Algorithm
The idea behind Jamsync is based on the rsync algorithm and Content Defined Chunking (CDC). If you haven't read about these, I would highly recommend them!
How Jamsync uses Rsync and CDC
The main idea behind Jamsync is that we can store the operations sent by the sender in an rsync-like stream to track changes to a file. This means we treat rsync operations like a delta chain that we can use later to regenerate the file. The storage of deltas and their usage to regenerate a file is similar to the Mercurial concept of a Revlog. However, the advantage of using rsync blocks is that we can efficiently store changes to, and regenerate, arbitrarily large files since these blocks can be streamed and regenerated independently.
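As a rough illustration of treating rsync-style operations as a delta chain, the Go sketch below replays a list of changes to regenerate a file. The `Op` type and the `apply` and `regenerate` functions are hypothetical names invented for this example, and the in-memory byte slices stand in for what would really be streamed blocks; it is not Jamsync's actual format.

```go
package main

import "fmt"

// Op is one operation in an rsync-like stream (hypothetical shape).
// It either carries literal Data, or it copies Len bytes starting at
// Offset from the previous version of the file.
type Op struct {
	Data   []byte // literal bytes; nil means "copy from the previous version"
	Offset int
	Len    int
}

// apply regenerates the next version from the previous version and one change.
func apply(prev []byte, change []Op) ([]byte, error) {
	var out []byte
	for _, op := range change {
		if op.Data != nil {
			out = append(out, op.Data...)
			continue
		}
		if op.Offset < 0 || op.Len < 0 || op.Offset+op.Len > len(prev) {
			return nil, fmt.Errorf("copy [%d,%d) is out of range for a %d-byte file",
				op.Offset, op.Offset+op.Len, len(prev))
		}
		out = append(out, prev[op.Offset:op.Offset+op.Len]...)
	}
	return out, nil
}

// regenerate replays a whole chain of changes, starting from an empty file.
func regenerate(chain [][]Op) ([]byte, error) {
	var file []byte
	for _, change := range chain {
		next, err := apply(file, change)
		if err != nil {
			return nil, err
		}
		file = next
	}
	return file, nil
}

func main() {
	chain := [][]Op{
		{{Data: []byte("hello world")}},               // version 1: "hello world"
		{{Offset: 0, Len: 6}, {Data: []byte("jam")}},  // version 2: "hello jam"
		{{Data: []byte("oh, ")}, {Offset: 0, Len: 9}}, // version 3: "oh, hello jam"
	}
	file, err := regenerate(chain)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s\n", file) // prints "oh, hello jam"
}
```

Each operation only needs the previous version's bytes at a known offset, so in principle the operations can be processed one block at a time rather than loading a whole file.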
Data pointers
In each block, we can store the location of the last data block to regenerate the file efficiently. By using blocks instead of an xdelta approach, we can store pointers in each block to find the last actual data block to use in the file, rather than regenerating the file through a delta chain, which is what Mercurial does. Mercurial essentially caches the entire file at certain points and uses this later to have a smaller regeneration length.
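One way to read the pointer idea is sketched below in Go. This is an interpretation of the paragraph above, not Jamsync's storage layout: the `Loc`, `Block`, and `Store` types and the `reuse` and `regenerate` methods are hypothetical. The point it tries to show is that a reused block copies the location of the real data instead of pointing at another pointer, so any version can be rebuilt with at most one lookup per block.

```go
package main

import "fmt"

// Loc names a block by the change it was stored in and its index there.
type Loc struct{ Change, Block int }

// Block either holds literal data or points at the block that does.
type Block struct {
	Data []byte // non-nil: the data was stored with this change
	Ptr  Loc    // used when Data is nil: where the actual data lives
}

// Store holds every change as a list of blocks: Store[change][block].
type Store [][]Block

// reuse builds a block for a new change that refers to block b of change c.
// If that block is itself a pointer, its target is copied, so pointers
// always lead straight to data and never form a chain.
func (s Store) reuse(c, b int) Block {
	old := s[c][b]
	if old.Data != nil {
		return Block{Ptr: Loc{c, b}}
	}
	return Block{Ptr: old.Ptr}
}

// regenerate rebuilds the file at change c with one lookup per block,
// without replaying the earlier changes.
func (s Store) regenerate(c int) []byte {
	var out []byte
	for _, blk := range s[c] {
		if blk.Data != nil {
			out = append(out, blk.Data...)
			continue
		}
		out = append(out, s[blk.Ptr.Change][blk.Ptr.Block].Data...)
	}
	return out
}

func main() {
	var s Store
	// Change 0 stores two data blocks.
	s = append(s, []Block{{Data: []byte("aaa")}, {Data: []byte("bbb")}})
	// Change 1 keeps the first block and replaces the second.
	s = append(s, []Block{s.reuse(0, 0), {Data: []byte("ccc")}})
	// Change 2 keeps both blocks of change 1 and appends a new one.
	s = append(s, []Block{s.reuse(1, 0), s.reuse(1, 1), {Data: []byte("ddd")}})

	fmt.Printf("%s\n", s.regenerate(2)) // prints "aaacccddd"
	fmt.Println(s[2][0].Ptr)            // prints "{0 0}": points directly at change 0
}
```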
Branches
A chain of changes, formed by the process above, can be used to regenerate every file in a project. Branches can be automatically rebased on top of the mainline. This means that every branch will always be up-to-date. If conflicts occur during the rebase, a branch will need manual merging.
Limitations
The goal is to be able to handle over 100M files and files over 1TB in size in a single repository. We're not there yet in the current implementation (~1M files, with files up to 16GB) but should be there in the next couple of months.
Implementation
Jamsync is being written from scratch in Go and uses mattn/go-sqlite3 to store project and change information. gRPC and Protocol Buffers are used for service definitions and data serialization.
Acknowledgements
This awesome site theme is made by @panr and adapted to this site.