Next: Building darcs Up: Darcs 0.9.12 David's advanced Previous: Contents   Contents

Introduction

Darcs is a revision control system, along the lines of CVS or arch. That means that it keeps track of various revisions and branches of your project, and allows changes to propagate from one branch to another. Darcs is intended to be an ``advanced'' revision control system. Two particularly distinctive features set darcs apart from other revision control systems: 1) each copy of the source is a fully functional branch, and 2) underlying darcs is a consistent and powerful theory of patches.

Every source tree a branch

The primary simplifying notion of darcs is that every copy of your source code is a full repository. This is dramatically different from CVS, in which the normal usage is for there to be one central repository from which source code will be checked out. It is closer to the notion of arch, since the `normal' use of arch is for each developer to create his own repository. However, darcs makes it even easier, since simply checking out the code is all it takes to create a new repository. This has several advantages, since you can harness the full power of darcs in any scratch copy of your code, without committing your possibly destabilizing changes to a central repository.

Theory of patches

The development of a simplified theory of patches is what originally motivated me to create darcs. This patch formalism means that darcs patches have a set of properties that make possible manipulations that couldn't be done in other revision control systems. First, every patch is invertible. Secondly, sequential patches (i.e. patches that are created in sequence, one after the other) can be reordered, although this reordering can fail, which means the second patch is dependent on the first. Thirdly, patches which are in parallel (i.e. both patches were created by modifying identical trees) can be merged, and the result of a set of merges is independent of the order in which the merges are performed. This last property is critical to darcs' philosophy, as it means that a particular version of a source tree is fully defined by the list of patches that are in it. That is, there is no issue regarding the order in which merges are performed. For a more thorough discussion of darcs' theory of patches, see Appendix A.
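The first two properties can be illustrated with a deliberately tiny model. The sketch below is not darcs' implementation or patch format: the SetFile class and the commute and apply_all helpers are hypothetical names invented for this example, and a tree is assumed to be a plain mapping from file name to contents. In this toy model patches on different files commute unchanged, whereas in general a reordering may also have to alter the patches themselves.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SetFile:
    """Toy patch: change file `name` from contents `old` to `new`.

    None stands for "the file does not exist". This is a hypothetical
    model for illustration, not a darcs patch type.
    """
    name: str
    old: Optional[str]
    new: Optional[str]

    def invert(self):
        # Every patch is invertible: swap the before and after states.
        return SetFile(self.name, self.new, self.old)

    def apply(self, tree):
        # A patch only applies to a tree that matches its "before" state.
        if tree.get(self.name) != self.old:
            raise ValueError(f"patch does not apply to {self.name}")
        result = dict(tree)
        if self.new is None:
            result.pop(self.name, None)
        else:
            result[self.name] = self.new
        return result


def commute(first, second):
    """Try to reorder the sequence first;second.

    Returns the reordered pair, or None when `second` depends on `first`.
    In this toy model, patches touching different files commute unchanged.
    """
    if first.name == second.name:
        return None
    return (second, first)


def apply_all(tree, patches):
    """Apply a sequence of patches, in order, to a tree."""
    for patch in patches:
        tree = patch.apply(tree)
    return tree


add_a = SetFile("a.txt", None, "hello")
add_b = SetFile("b.txt", None, "world")
edit_a = SetFile("a.txt", "hello", "hello!")

# Invertibility: a patch followed by its inverse restores the tree.
assert apply_all({}, [add_a, add_a.invert()]) == {}

# Independent patches can be reordered without changing the result.
assert apply_all({}, [add_a, add_b]) == apply_all({}, list(commute(add_a, add_b)))

# edit_a depends on add_a, so the reordering fails.
assert commute(add_a, edit_a) is None
```

The failed commute in the last line is what "the second patch is dependent on the first" amounts to in this model.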

A simple advanced tool

Besides being ``advanced'' as discussed above, darcs is actually also quite simple. Versioning tools can be seen as three layers. At the foundation is the ability to manipulate changes. On top of that must be placed some kind of database system to keep track of the changes. Finally, at the very top is some sort of distribution system for getting changes from one place to another.

Really, only the first of these three layers is of particular interest to me, so the other two are done as simply as possible. At the database layer, darcs just has an ordered list of patches along with the patches themselves, each stored as an individual file. Darcs' distribution system is strongly inspired by that of arch. Like arch, darcs uses a dumb server, typically apache or just a local or network file system. Unlike arch, darcs currently has no write ability to a remote file system. This means that darcs currently only supports ``pulling'' patches from a remote repository to a local one, but not ``pushing'' them. While this does simplify matters by eliminating issues of user permissions, it isn't really adequate, as it doesn't address the needs of users who lack a server with a permanent net connection to host their repositories. I do have plans for supporting a push mechanism (which, by the way, will be accelerated if I hear there is demand for such a thing, as I personally have no use for it).

Keeping track of changes rather than versions

In the previous section, I explained revision control systems in terms of three layers. One can also look at them as having two distinct uses. One is to provide a history of previous versions. The other is to keep track of changes that are made to the repository, and to allow these changes to be merged and moved from one repository to another. These two uses are distinct, and almost orthogonal, in the sense that a tool can support one of the two uses optimally while providing no support for the other. Darcs is not intended to maintain a history of versions, although it is possible to kludge together such a revision history, either by making each new patch depend on all previous patches, or by tagging regularly. In a sense, this is what the tag feature is for, but the intention is that tagging will be used only to mark particularly notable versions (e.g. released versions, or perhaps versions that pass a time-consuming test suite).

As I understand them (and I certainly may be wrong), previous revision control systems originated with their purpose being to keep track of a history of versions, with the ability to merge changes being added as it was seen that this would be desirable. But the fundamental object remained the versions themselves.

In such a system, a patch (I am using patch here to mean an encapsulated set of changes) is uniquely determined by two trees. Merging changes that are in two trees consists of finding a common parent tree, computing the diffs of each tree with their parent, and then cleverly combining those two diffs and applying the combined diff to the parent tree, possibly at some point in the process allowing human intervention, to allow for fixing up problems in the merge such as conflicts.
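The tree-based merge described above can be sketched at whole-file granularity. This is a generic three-way merge written for illustration, not the algorithm of any particular tool: the three_way_merge function is a hypothetical name, trees are assumed to be mappings from path to contents, and the common parent is assumed to have been found already.

```python
def three_way_merge(base, ours, theirs):
    """Merge two trees against their common parent, one file at a time.

    Returns (merged_tree, conflicting_paths). A missing file is
    represented by None, so additions and deletions are handled too.
    """
    merged, conflicts = {}, []
    for path in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        if o == t:
            chosen = o          # both sides agree (or neither changed it)
        elif o == b:
            chosen = t          # only their side changed this file
        elif t == b:
            chosen = o          # only our side changed this file
        else:
            conflicts.append(path)  # both changed it differently
            continue                # left for a human to resolve
        if chosen is not None:
            merged[path] = chosen
    return merged, conflicts


base = {"a": "1", "b": "2"}
ours = {"a": "1x", "b": "2"}
theirs = {"a": "1", "b": "2y", "c": "3"}

merged, conflicts = three_way_merge(base, ours, theirs)
assert merged == {"a": "1x", "b": "2y", "c": "3"}
assert conflicts == []
```

Note that every decision in the loop is made relative to the parent tree, which is why finding that parent matters so much in this style of system.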

Finding this parent tree poses problems. This is where DAGs (Directed Acyclic Graphs) come in. A DAG is a convenient way to assure that two trees (tree here meaning a source tree) have just one closest parent. This means that one needs to keep track of the relationships of the trees, a most imposing task as the number of trees becomes large! It also may put an artificial constraint on the users, by not allowing them to create cyclic relationships between their versions. Since I don't particularly understand these DAGs (or, for that matter, other revision control systems), I'll leave it here and just trust those who have gone before me that they are difficult.

In the world of darcs, the source tree is not the fundamental object, but rather the patch is the fundamental object. Rather than a patch being defined in terms of the difference between two trees, a tree is defined as the result of applying a given set of patches to an empty tree. Moreover, these patches may be reordered (unless there are dependencies between the patches involved) without changing the tree. Thus there is no need to find a common parent when performing a merge. Or, if you like, their common parent is defined by the set of common patches, and may not correspond to any version in the version history (if we kept track of a history).
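A minimal sketch of this view, under strong simplifying assumptions: each patch is a hypothetical (patch_id, path, contents) tuple that writes one file, and every patch touches a different file, so all of them commute. The build_tree, common_patches and merge helpers are invented for this example and are not darcs functions.

```python
def build_tree(patches):
    """Apply each (patch_id, path, contents) patch to an empty tree."""
    tree = {}
    for _patch_id, path, contents in patches:
        tree[path] = contents
    return tree


def common_patches(ours, theirs):
    """The 'common parent' is just the set of patch ids both sides share."""
    return {p[0] for p in ours} & {p[0] for p in theirs}


def merge(ours, theirs):
    """Append every patch from `theirs` that `ours` does not already have."""
    have = {p[0] for p in ours}
    return ours + [p for p in theirs if p[0] not in have]


shared = [("p1", "README", "hello")]
alice = shared + [("p2", "main.c", "int main(void) { return 0; }")]
bob = shared + [("p3", "Makefile", "all: main")]

# No parent tree is searched for; the shared patches play that role.
assert common_patches(alice, bob) == {"p1"}

# The patch order differs between the two merges, but the tree does not.
assert merge(alice, bob) != merge(bob, alice)
assert build_tree(merge(alice, bob)) == build_tree(merge(bob, alice))
```

Patches that touch the same content would need the commutation and merge rules of Appendix A, which this sketch sidesteps entirely.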

One useful consequence of darcs' patch-oriented philosophy is that since a patch need not be uniquely defined by a pair of trees (old and new), we can have several ways of representing the same change, which differ only in how they commute and what the result of merging them is. Of course, creating such a patch will require some sort of user input. This is a Good Thing, since the user creating the patch should be the one forced to think about what they really want to change, rather than the user merging the patch. An example of this is the token replace patch (see Section A.5). This feature makes it possible to create a patch, for example, which replaces every instance of the variable ``stupidly_named_var'' with ``better_var_name'', while leaving ``other_stupidly_named_var'' untouched. When this patch is merged with any other patch involving ``stupidly_named_var'', that instance will also be modified to ``better_var_name''. This is in contrast to a more conventional merging method, which would not only fail to change new instances of the variable, but would also involve conflicts when merging with any patch that modifies lines containing the variable. By using additional information about the programmer's intent, darcs is thus able to make the process of changing a variable name the trivial task that it really is: a search and replace, modulo tokenizing the code appropriately.
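The idea of a token-level replace can be sketched with a regular expression. This is only an illustration of the tokenizing step, not how darcs stores or applies such a patch: the token_replace function is a hypothetical name, and the default set of token characters (letters, digits and underscore) is an assumption of this example.

```python
import re


def token_replace(text, old, new, token_chars="A-Za-z0-9_"):
    """Replace `old` with `new` only where `old` is a complete token.

    A match is rejected when the character just before or just after it
    is itself a token character, so longer names containing `old` survive.
    """
    pattern = re.compile(
        rf"(?<![{token_chars}]){re.escape(old)}(?![{token_chars}])"
    )
    # A function replacement avoids backslash interpretation in `new`.
    return pattern.sub(lambda match: new, text)


source = "x = stupidly_named_var + other_stupidly_named_var\n"
renamed = token_replace(source, "stupidly_named_var", "better_var_name")
assert renamed == "x = better_var_name + other_stupidly_named_var\n"

# A line added in parallel by someone else is renamed as well once the
# replace is applied to the merged text (a crude stand-in for a merge).
merged = source + "y = stupidly_named_var * 2\n"
assert token_replace(merged, "stupidly_named_var", "better_var_name") == (
    "x = better_var_name + other_stupidly_named_var\n"
    "y = better_var_name * 2\n"
)
```

Because the change is recorded as "rename this token" rather than as a list of edited lines, it has something sensible to say about lines that did not exist when it was written.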

The patch formalism discussed in Appendix A is what makes darcs' approach possible. In order for a tree to consist of a set of patches, there must be a deterministic merge of any set of patches, regardless of the order in which they must be merged. This requires that one be able to reorder patches. While I don't know that the patches are required to be invertible as well, my implementation certainly requires invertibility. In particular, invertibility is required to make use of Theorem 2, which is used extensively in the manipulation of merges.

To summarize, I believe that darcs has a solid theoretical foundation leagues beyond what anyone else has developed. On the down side, darcs is currently slow and, taken as a whole, certainly still buggy; even its core may still be buggy. Moreover, it lacks many features. However, I believe that the theory behind darcs will be the foundation of the next generation of revision control systems.


David Roundy 2003-07-30