Jamsync

About

Jamsync is an open-source version control system that enables software engineers to develop and deploy faster. It is still under development, but feel free to preview the system on this site. The AGPL-licensed source and client binaries are available to download here.

Algorithm

The idea behind Jamsync is the same as the one behind rsync. In fact, Jamsync currently uses jbreiding/rsync-go under the hood. If you haven't read the rsync algorithm paper, it's highly recommended: it's very approachable for anyone with a computer science background and has some great information.

Rsync

Essentially, rsync allows a file on two separate computers, a sender and a receiver, to be synced efficiently over a network. It does this by chopping the file into blocks and reusing blocks that already exist on the receiver to regenerate the file. The sender scans over its copy of the file with a rolling hash and sends operations to the receiver, each telling it either to reuse a block (if the rolling hash matches) or to use new data.
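To make the "rolling" part concrete, here is a minimal sketch of an rsync-style weak checksum in Go. It is not the code from jbreiding/rsync-go; the type and function names (rollingSum, newRollingSum, roll, sum) are made up for illustration, and the 8-byte window is far smaller than a real block size. The point is that sliding the window by one byte only needs the byte that leaves and the byte that enters, not a full recomputation.

```go
package main

import "fmt"

// rollingSum is a hypothetical type holding the two 16-bit halves of an
// rsync-style weak checksum over a fixed-size window.
type rollingSum struct {
	a, b   uint16 // a: sum of bytes; b: position-weighted sum of bytes
	window int    // number of bytes in the window
}

// newRollingSum computes the checksum of a window from scratch.
func newRollingSum(block []byte) rollingSum {
	var a, b uint16
	n := len(block)
	for i, x := range block {
		a += uint16(x)
		b += uint16(n-i) * uint16(x) // uint16 arithmetic wraps mod 2^16
	}
	return rollingSum{a: a, b: b, window: n}
}

// roll slides the window one byte to the right in constant time:
// out is the byte leaving on the left, in is the byte entering on the right.
func (r *rollingSum) roll(out, in byte) {
	r.a = r.a - uint16(out) + uint16(in)
	r.b = r.b - uint16(r.window)*uint16(out) + r.a
}

// sum packs both halves into a single 32-bit value.
func (r rollingSum) sum() uint32 {
	return uint32(r.a) | uint32(r.b)<<16
}

func main() {
	data := []byte("the quick brown fox jumps over the lazy dog")
	const window = 8 // tiny window, for illustration only

	r := newRollingSum(data[:window])
	for i := 1; i+window <= len(data); i++ {
		r.roll(data[i-1], data[i+window-1])
		fresh := newRollingSum(data[i : i+window])
		if r.sum() != fresh.sum() {
			panic("rolled and recomputed checksums differ")
		}
	}
	fmt.Println("rolled checksum matched a fresh computation at every offset")
}
```

Because the checksum is cheap to update, the sender can test every byte offset of its file against the receiver's block hashes, which is what lets rsync find reusable blocks even after data has been inserted or deleted earlier in the file.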

More detailed steps are below:

  1. Sender and receiver establish a connection
  2. Receiver hashes each existing block of its copy of the file with both a rolling hash and a strong hash, and sends the hashes to the sender
  3. Sender receives the hashes and uses the rolling hash to scan over its copy of the file
    1. If the rolling hash matches a block on the receiver, the strong hash is used to verify the match and an operation is sent to reuse the block
    2. If the data does not match any block on the receiver, an operation containing the data is sent to the receiver
  4. Receiver receives operations in a stream to regenerate the file by either reusing existing blocks or writing new data
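
The steps above can be sketched end to end in Go. This is an illustration under stated assumptions, not Jamsync's or rsync-go's actual code: the names (op, blockSig, signatures, delta, patch) are hypothetical, the 4-byte block size is unrealistically small, SHA-256 stands in for the strong hash, everything runs in one process instead of over a connection, and for brevity the weak checksum is recomputed at each offset rather than rolled in constant time.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

const blockSize = 4 // tiny, for illustration only

// op is a hypothetical operation: reuse receiver block Index, or write Data.
type op struct {
	Reuse bool
	Index int
	Data  []byte
}

// blockSig is the strong hash of one receiver block plus its position.
type blockSig struct {
	Index  int
	Strong [sha256.Size]byte
}

// weak computes an rsync-style weak checksum from scratch.
func weak(p []byte) uint32 {
	var a, b uint16
	for i, x := range p {
		a += uint16(x)
		b += uint16(len(p)-i) * uint16(x)
	}
	return uint32(a) | uint32(b)<<16
}

// signatures is step 2: the receiver hashes each full block of its old copy.
func signatures(old []byte) map[uint32][]blockSig {
	sigs := make(map[uint32][]blockSig)
	for i := 0; i+blockSize <= len(old); i += blockSize {
		blk := old[i : i+blockSize]
		w := weak(blk)
		sigs[w] = append(sigs[w], blockSig{Index: i / blockSize, Strong: sha256.Sum256(blk)})
	}
	return sigs
}

// delta is step 3: the sender scans its new copy and emits operations.
func delta(newData []byte, sigs map[uint32][]blockSig) []op {
	var ops []op
	var lit []byte // unmatched bytes waiting to be sent as new data
	flush := func() {
		if len(lit) > 0 {
			ops = append(ops, op{Data: lit})
			lit = nil
		}
	}
	i := 0
	for i+blockSize <= len(newData) {
		win := newData[i : i+blockSize]
		matched := false
		for _, s := range sigs[weak(win)] { // weak hash narrows candidates
			if s.Strong == sha256.Sum256(win) { // strong hash confirms
				flush()
				ops = append(ops, op{Reuse: true, Index: s.Index})
				i += blockSize
				matched = true
				break
			}
		}
		if !matched {
			lit = append(lit, newData[i])
			i++
		}
	}
	lit = append(lit, newData[i:]...) // trailing bytes shorter than a block
	flush()
	return ops
}

// patch is step 4: the receiver rebuilds the file from old blocks and new data.
func patch(old []byte, ops []op) []byte {
	var out []byte
	for _, o := range ops {
		if o.Reuse {
			out = append(out, old[o.Index*blockSize:(o.Index+1)*blockSize]...)
		} else {
			out = append(out, o.Data...)
		}
	}
	return out
}

func main() {
	old := []byte("hello world, this is jam")
	updated := []byte("hello brave world, this is jam!")
	ops := delta(updated, signatures(old))
	rebuilt := patch(old, ops)
	fmt.Printf("%d operations, rebuilt matches: %v\n", len(ops), string(rebuilt) == string(updated))
}
```

Only the operations cross the network, so blocks the receiver already has cost a small index instead of their full contents.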

How Jamsync uses Rsync

The main idea behind Jamsync is that we can store the operations sent by the sender to track changes to a file. This means we treat rsync operations like a delta chain that we can use later to regenerate the file. The storage of deltas and their usage to regenerate a file is similar to the Mercurial concept of a Revlog. However, the advantage of using rsync blocks is that we can efficiently store changes to, and regenerate, arbitrarily large files since these blocks can be streamed and regenerated independently.
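
As a rough sketch of what a delta chain looks like, the Go example below stores each change as a list of operations and regenerates any version by replaying the chain from an empty file. This is an assumption about the general shape, not Jamsync's actual storage format: the names (op, change, apply, regenerate) are hypothetical, and operations copy byte ranges from the previous version rather than referencing rsync block indexes, to keep the example short.

```go
package main

import "fmt"

// op is a hypothetical stored operation: copy a byte range from the
// previous version of the file, or insert literal data.
type op struct {
	Copy       bool
	Start, Len int
	Data       []byte
}

// change is one link in the chain: the operations that turn the previous
// version of a file into the next one.
type change []op

// apply produces the next version from the previous one.
func apply(prev []byte, c change) []byte {
	var out []byte
	for _, o := range c {
		if o.Copy {
			out = append(out, prev[o.Start:o.Start+o.Len]...)
		} else {
			out = append(out, o.Data...)
		}
	}
	return out
}

// regenerate replays the first n changes, starting from an empty file.
func regenerate(chain []change, n int) []byte {
	var file []byte
	for _, c := range chain[:n] {
		file = apply(file, c)
	}
	return file
}

func main() {
	chain := []change{
		{{Data: []byte("hello world\n")}},
		{{Copy: true, Start: 0, Len: 6}, {Data: []byte("jam\n")}},
		{{Copy: true, Start: 0, Len: 9}, {Data: []byte("sync\n")}},
	}
	for n := 1; n <= len(chain); n++ {
		fmt.Printf("version %d: %q\n", n, regenerate(chain, n))
	}
}
```

Each stored change holds only the new data plus references into the previous version, so the chain keeps every version recoverable without storing a full copy of the file each time.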

Changes and Conflicts

A chain of changes, formed by the process above, can be used to regenerate every file in a project. Branches off of the main chain can be used to prevent conflicts from occurring when editing the same part of a file; however, whenever a branch is merged in, every other branch is automatically rebased on top. This means that every branch will always be up-to-date. If conflicts occur during the rebase, the branch will be marked as "stale" and will need manual merging.

Limitations

The goal is to handle over 10M files, and individual files over 1TB in size, in a single repository. The current implementation isn't there yet (probably ~10K files, with files around 1GB in size), but it should be in the next couple of months.

Implementation

Jamsync is being written from scratch in Go and uses mattn/go-sqlite3 to store project and change information. gRPC and Protocol Buffers are used for service definitions and data serialization.

Current state

This site is a preview version of the system. The features here will be available over the next few months.

Developers

Jamsync is being developed by Zachary Geier. If you have any thoughts, please send me an email at [email protected].