Go 1.5 and newer include support for vendoring. Vendoring is a way of managing dependencies where, instead of relying on the install or build process to find the dependent libraries (as when building with C, or using system installation directories with a dynamic language), or using some kind of “virtual installation” (e.g., Python’s virtualenv, perlbrew, etc.), you include the packages you depend on under a path in your own project’s source tree.
In Go 1.5, you needed to set an environment variable (GO15VENDOREXPERIMENT=1) to enable the feature. In newer Go versions, you need to set GO15VENDOREXPERIMENT=0 to disable the feature: one might conclude that the experiment was successful, or at least that it merited further exploration.
Since Go came out, there has been an abundance of systems written to solve the vendoring and dependency management problem. Here I’d like to put forward an alternative system that’s so simple that in the simplest cases, it doesn’t even need separate tooling.
Git Submodules
Submodules in git were designed as a solution to this problem for projects managed using git. The way they work is that in your tree, you check in a commit object, which represents a checked-out repository at that location. The superproject also carries a file called .gitmodules, which specifies where the repositories can be cloned from.
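To see both pieces without touching a real project, here is a minimal sketch that builds a throwaway superproject and dependency under a temporary directory. The repository names, the example.com/demo/dep path and the demo identity are made up for illustration; the protocol.file.allow setting is only there because the "remote" is a local path, which recent git versions refuse for submodules by default.

```shell
#!/bin/sh
# Sketch: create a throwaway superproject with one submodule and inspect it.
set -e
work=$(mktemp -d)
cd "$work"

# Identity for the throwaway commits (hypothetical values).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# A stand-in for an upstream dependency.
git init -q dep
(cd dep && echo 'package dep' > dep.go && git add dep.go && git commit -q -m 'Initial dep')

# The superproject that vendors it.
git init -q super
cd super
git -c protocol.file.allow=always submodule --quiet add \
    --name example.com/demo/dep "$work/dep" vendor/example.com/demo/dep
git commit -q -m 'Vendor example.com/demo/dep'

# The tree holds a "commit" entry with mode 160000, not the dependency's files.
git ls-tree -r HEAD
# .gitmodules maps the symbolic name to a path and a clone URL.
cat .gitmodules
```

The ls-tree listing should contain one blob for .gitmodules and one commit entry for the vendored path; none of the dependency's own files are stored in the superproject.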
Adding a new vendor dependency
To add a new vendor dependency, I can use git submodule add; don’t worry, I’ll explain what the options all mean:
$ git submodule add --name github.com/lib/pq \
git@github.com:lib/pq vendor/github.com/lib/pq
Cloning into 'vendor/github.com/lib/pq'...
remote: Counting objects: 1377, done.
remote: Total 1377 (delta 0), reused 0 (delta 0), pack-reused 1377
Receiving objects: 100% (1377/1377), 598.03 KiB | 162.00 KiB/s, done.
Resolving deltas: 100% (841/841), done.
Checking connectivity... done.
$
What this will do is clone the repository at git@github.com:lib/pq to the path vendor/github.com/lib/pq. The --name part is somewhat important. It doesn’t matter what you use here, but it shouldn’t change over the lifetime of your project.
You can see what is due to be committed with git status, and commit normally:
$ git status
On branch master
Changes to be committed:
(use "git reset HEAD <file>..." to unstage)
modified: .gitmodules
new file: vendor/github.com/lib/pq
$ git commit -m "Vendor github.com/lib/pq"
[readme-update 5e2afb1] Add dependency on go-randomdata
2 files changed, 4 insertions(+)
create mode 160000 vendor/github.com/Pallinder/go-randomdata
$
Under the hood: git submodule mechanics
Git’s better understood with reference to its very simple inner workings, so let’s look at the .gitmodules file that gave us:
[submodule "github.com/lib/pq"]
path = vendor/github.com/lib/pq
url = git@github.com:lib/pq
OK, that’s just recorded the options we gave. Now let’s look at the checkout:
$ cd vendor/github.com/lib/pq/
$ ls -a
. bench_test.go encode.go ssl_test.go
.. buf.go encode_test.go url.go
.git certs error.go url_test.go
.gitignore conn.go hstore user_posix.go
.travis.yml conn_test.go listen_example user_windows.go
CONTRIBUTING.md copy.go notify.go
LICENSE.md copy_test.go notify_test.go
README.md doc.go oid
$
The .git path is a regular file, not a directory! Let’s look at it:
$ cat .git
gitdir: ../../../../.git/modules/github.com/lib/pq
Sure enough, if you follow that path, you’ll arrive at the git repository for this path:
$ cd ../../../../.git/modules/github.com/lib/pq
$ ls -a
. HEAD description hooks info objects refs
.. config gitdir index logs packed-refs
$
The path under modules is exactly what you specified to --name on the git submodule add command. This path is the local, symbolic name for the dependency. Even if you switch to another fork of a dependency, or move the checkout to a different location in your tree, it’s worth keeping this the same.
The initial clone
With vendored submodules, when you first clone, you’ll need to either use --recursive, or run git submodule update --init afterwards (git submodule init on its own only registers the submodules, it does not clone them), to fetch the pinned versions of all the dependencies; here it is, assuming that you are testing using a branch called git-vendoring instead of master:
$ git clone --recursive -b git-vendoring git@github.com:cutesyname/yourproject
Cloning into 'yourproject'...
remote: Counting objects: 35331, done.
remote: Compressing objects: 100% (12/12), done.
remote: Total 35331 (delta 8), reused 3 (delta 3), pack-reused 35316
Receiving objects: 100% (35331/35331), 22.23 MiB | 300.00 KiB/s, done.
Resolving deltas: 100% (24082/24082), done.
Checking connectivity... done.
Submodule 'github.com/lib/pq' (git@github.com:lib/pq) registered for path 'vendor/github.com/lib/pq'
Cloning into 'vendor/github.com/lib/pq'...
remote: Counting objects: 1377, done.
remote: Total 1377 (delta 0), reused 0 (delta 0), pack-reused 1377
Receiving objects: 100% (1377/1377), 598.03 KiB | 509.00 KiB/s, done.
Resolving deltas: 100% (841/841), done.
Checking connectivity... done.
Submodule path 'vendor/github.com/lib/pq': checked out 'dc50b6ad2d3ee836442cf3389009c7cd1e64bb43'
$
This new clone also has the dependency checked out and ready to use.
Switching branches (with different dependency versions)
As you switch branches, the dependencies are generally not automatically switched. However, you can easily see that this is the case with git status, and switch to the recorded one using git submodule update:
$ git checkout olderversion
On branch olderversion
Your branch is up-to-date with 'origin/olderversion'.
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git checkout -- <file>..." to discard changes in working directory)
modified: vendor/github.com/lib/pq (new commits)
no changes added to commit (use "git add" and/or "git commit -a")
$ git submodule update
Submodule path 'vendor/github.com/lib/pq': checked out '5e3230b4aee4ae51bfd11634f6592e12936b6145'
$ git status
On branch olderversion
Your branch is up-to-date with 'origin/olderversion'.
nothing to commit, working directory clean
$
There’s currently no way to make this automatic, but adding a git submodule update command to a post-checkout hook should be relatively safe. Continuous Integration builds where you don’t care about changes in your local checkout should probably use git submodule update --force.
One of the great things about this approach is that the above commands will typically execute in well under a second if run again on the same checkout, which works well with (for example) Circle CI build directory caching.
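The post-checkout hook suggested above could look something like the following. This is a sketch rather than a tested production hook: the helper name should_update_submodules is my own, and adding --init (so that submodules new to the branch get cloned as well) is an assumption about what you want. Git calls the hook with the previous HEAD, the new HEAD, and a flag that is 1 for a branch checkout and 0 for a file checkout.

```shell
#!/bin/sh
# Sketch of .git/hooks/post-checkout; the file must be executable.
# Arguments from git: $1 = previous HEAD, $2 = new HEAD,
# $3 = 1 for a branch checkout, 0 for a file checkout.

# Succeeds only for a branch checkout that actually moved HEAD.
# (Hypothetical helper name, not part of git.)
should_update_submodules() {
    [ "$3" = "1" ] && [ "$1" != "$2" ]
}

if should_update_submodules "$@"; then
    # Check out the commits recorded by the branch we just switched to.
    git submodule update --init
fi
```

Hooks are not copied by git clone, so everyone who wants this behaviour has to install the file in their own checkout.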
Updating a vendored dependency
To see how up to date your dependencies are, first fetch them all using git submodule foreach, and then use git submodule status:
$ git submodule foreach git fetch
Entering 'vendor/github.com/lib/pq'
$ git submodule status
dc50b6ad2d3ee836442cf3389009c7cd1e64bb43 vendor/github.com/lib/pq (go1.0-cutoff-56-gdc50b6a)
$
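If you would rather have a number than a git describe string, the same foreach mechanism can count how many commits each dependency is behind its remote. The sketch below sets up a throwaway upstream and superproject (all names are invented, and protocol.file.allow is only needed because the remotes are local paths), lets the upstream advance by two commits, and then reports the gap. It assumes origin/HEAD points at the branch you care about, which is what a fresh clone gives you.

```shell
#!/bin/sh
# Sketch: report how far each submodule is behind its remote's default branch.
set -e
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Throwaway upstream and a superproject that vendors it.
git init -q upstream
(cd upstream && echo 'package dep' > dep.go && git add dep.go && git commit -q -m 'Initial')
git init -q super
cd super
git -c protocol.file.allow=always submodule --quiet add \
    --name example.com/demo/dep "$work/upstream" vendor/example.com/demo/dep
git commit -q -m 'Vendor dep'

# Upstream moves on by two commits after we pinned it.
(cd "$work/upstream" &&
    git commit -q --allow-empty -m 'One' &&
    git commit -q --allow-empty -m 'Two')

# Fetch in every submodule, then count commits we do not have pinned yet.
# $name is set by git submodule foreach to the submodule's symbolic name.
git submodule --quiet foreach 'git -c protocol.file.allow=always fetch -q'
behind=$(git submodule --quiet foreach \
    'echo "$name $(git rev-list --count HEAD..origin/HEAD)"')
echo "$behind"
```

In a real project only the last two commands matter, and the -c protocol.file.allow=always part can be dropped when the remotes are ordinary network URLs.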
Want to switch to a release version? Check it out, and git add the dependency:
$ cd vendor/github.com/lib/pq
$ git checkout go1.0-cutoff
Previous HEAD position was dc50b6a... Also send prepared statements' parameters over in binary
HEAD is now at 5da8732... Add Jonathan Rudenberg to the list of contributors
$ cd ..
$ git add pq
$ git commit -m "Pin libpq at 'go1.0-cutoff' tag"
[vendor-experiment ab799ab] Pin libpq at 'go1.0-cutoff' tag
1 file changed, 1 insertion(+), 1 deletion(-)
$
Investigating changes by submodule updates
If you set diff.submodule to log, then you can see the changes in submodules when you use git log -p:
$ git config --global diff.submodule log
$ git log -1 -p | head
commit ab799abfc61ccdddfe4cd8d1cad4327ffd6a9cc7
Author: Sam Vilain <sam@vilain.net>
Date: Mon Jan 25 15:53:44 2016 -0800
Pin libpq at 'go1.0-cutoff' tag
Submodule vendor/github.com/lib/pq dc50b6a..5da8732 (rewind):
< Also send prepared statements' parameters over in binary
< Add Chris Gilling to the list of contributors
< Implement driver option binary_parameters
$
Really, git log is just scratching the surface of what could be done here (as of Git 2.7.0, anyway). There’s enough information here for GUIs to do clever things like overlay the submodule project history with the superproject. It’s easy to imagine improvements to this, such as options showing a git diff --stat of the submodule, etc. Of course, vendoring-specific tooling could also do this, but by using submodules you benefit from any generic, non-Go-vendoring-based software that is written.
Switching forks
If you need to move a dependency to a different fork, then just change the URL in the .gitmodules file, and use git submodule sync to fix up the remote; then you can git submodule update as before and git add the dependency which contains the fix you need.
This isn’t seamless; the git submodule sync command will need to be issued by people who switch branches to a version with a new fork the first time (but not thereafter, so long as your fork also tracks the original fork). But it does work, and you have resilience from things like the original repository disappearing: if the version your project needs is already in the clone, it does not need to use the network at all.
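Spelled out as commands, the fork switch might go as in the sketch below. It uses throwaway local repositories in place of the real upstream and fork, so every path and name in it is hypothetical, and protocol.file.allow appears only because the remotes are local paths. Here the fix commit is fetched and checked out inside the submodule by hand, since the superproject does not yet record it.

```shell
#!/bin/sh
# Sketch: repoint a vendored submodule at a fork and pin a commit from it.
set -e
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
allow='protocol.file.allow=always'   # only needed for local-path remotes
sub=vendor/example.com/demo/dep

# An upstream, and a fork of it carrying one extra commit.
git init -q upstream
(cd upstream && echo 'package dep' > dep.go && git add dep.go && git commit -q -m 'Initial')
git clone -q upstream fork
(cd fork && echo '// fix' >> dep.go && git commit -q -a -m 'Fix')
fix=$(git -C fork rev-parse HEAD)

# A superproject that vendors the upstream.
git init -q super
cd super
git -c "$allow" submodule --quiet add --name example.com/demo/dep "$work/upstream" "$sub"
git commit -q -m 'Vendor dep'

# 1. Point the symbolic name at the fork.
git config -f .gitmodules submodule.example.com/demo/dep.url "$work/fork"
# 2. Copy the new URL into .git/config and the submodule's origin remote.
git submodule --quiet sync
# 3. Fetch from the fork and check out the commit carrying the fix.
git -C "$sub" fetch -q origin
git -C "$sub" checkout -q "$fix"
# 4. Record both the URL change and the newly pinned commit.
git add .gitmodules "$sub"
git commit -q -m 'Switch dep to fork with fix'
```

Anyone who later checks out this commit runs git submodule sync once, followed by git submodule update, to end up on the same pinned commit.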
Adding missing dependencies
You can spot local dependencies which are not vendored by looking in your $GOPATH for packages which were checked out there (this is where go get puts them). You can also use go build -a -v to call out any dependencies which are not already vendored:
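$ eval "$(go env)" # sets GOROOT, among other variables
$ go build -a -v 2>&1 | grep -v 'github.com/cutesyname/yourproject' | while read -r dir
do
# anything that is neither under vendor/ nor in the standard library is flagged
[ -d "vendor/$dir" ] || [ -d "$GOROOT/src/$dir" ] ||
echo "$dir is not stdlib, nor vendored"
done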
At least one person on the project has to know not to use go get but instead to use the git submodule add command.
If the dependencies also have dependencies, all you have to do is fork the upstream, add vendoring to the project as you did to your own, and then make a pull request against the original upstream. (Just kidding. Go finds your dependencies’ imported modules if they are in your vendor/ tree.)
In Summary…
Using git submodule alone without any extra scripts or tooling is not currently for the faint of heart, but unlike the dim and distant past of the future, does basically work.
Historically, in the early days of git, a lot of people used to write wrappers around core git functionality: things like making branches, fetching, updating branches and copying. The cogito tool was an early example of this. While cogito moved the needle forward and invented many concepts and features later added to core git (remote tracking branches and history rewriting, to name but two), most of these wrappers are just dead-end scripts, serving only to illustrate the conceptual model of authoring software that the writer possesses. They rarely add anything, and have tended to become less and less necessary as people become more familiar with distributed version control and as git usability features have been added.
The lesson is, if a git feature sucks, but works, then use it anyway and hopefully someone will eventually contribute code to core to make it better. Perhaps that person will be you!