1# echo "Hi."
Hi.
This tutorial is an in-depth look at how Git works, performing
a lot of sometimes unusual steps to walk through interesting
details. You will have to pay attention closely, or you will
get lost on the way. But do not despair; you can run this
tutorial on your computer, at the speed you want, skip to
any step you want, and investigate the state of things in
another terminal window at all times.
In fact, you are looking at an HTML file generated from the output of that tutorial. (That's why there is that "echo Hi" thing above: it is a hack, because the tutorial script only allows comments after commands. :) )
The code of the tutorial is here: github.com/bakkenbaeck/a-random-walk-through-git - clone it and run it on your machine!
This tutorial is NOT for absolute beginners, nor is it a
collection of "cookbook recipes". Recipes will not help you
understand the broad picture, nor will they get you out
of tricky situations.
Some deeper understanding by experimentation and investigation
will, though. So let's get started.
2# echo "Terms"
Terms
First, a quick recap of Git-related terms.
tree: set of files (filenames, perms, pointers to subtrees/file blobs, NOT timestamps)
commit: metadata (time, author, pointer to tree, possibly pointer to parent commit(s))
HEAD: last commit hash/parent of next commit (local only, modified by, e.g., git checkout)
index/staging/cache: HEAD plus "changes to be committed" (local only, modified by, e.g., git add/reset, stored in .git)
working directory/WIP: index plus "changes not added for commit" (plain files, local only, modified by, e.g., git checkout/reset --hard)
Ok? Then let's init a Git repository... and have a look at the
files in the .git/ folder.
3# git init --initial-branch=master . && git config --local user.name "Ijon Tichy" && git config --local user.email "ijon@beteigeuze.space" && rm -rf .git/hooks/ && find .git -type f
Initialized empty Git repository in example/.git/ .git/info/exclude .git/description .git/HEAD .git/config
Then, let's commit a README.
4# echo "This is not a README yet" > README && git add README && git commit -m "first commit"
[master (root-commit) 6ecf002] first commit 1 file changed, 1 insertion(+) create mode 100644 README
What files were created by the commit in the .git/ folder?
5# find . -type f
./.git/info/exclude ./.git/description ./.git/refs/heads/master ./.git/HEAD ./.git/objects/5b/6c6cb672dc1c3e3f38da4cc819c07da510fb59 ./.git/objects/b3/5c99875f5758f64e9348c05dac14848a046f59 ./.git/objects/6e/cf00219d83579a35e3a1daae2615f753c0ec0f ./.git/config ./.git/index ./.git/COMMIT_EDITMSG ./.git/logs/HEAD ./.git/logs/refs/heads/master ./README
Now there are three objects: commit, tree, blob (file). What file type do the Git object files use?
6# file .git/objects/*/*
.git/objects/5b/6c6cb672dc1c3e3f38da4cc819c07da510fb59: zlib compressed data .git/objects/6e/cf00219d83579a35e3a1daae2615f753c0ec0f: zlib compressed data .git/objects/b3/5c99875f5758f64e9348c05dac14848a046f59: zlib compressed data
All Git objects (not just file blobs) get zlib-compressed. That saves space and keeps grep results clean. Yay!
More details on these files later.
7# cat .git/refs/heads/master
6ecf00219d83579a35e3a1daae2615f753c0ec0f
This is the hash of HEAD of the master branch.
8# cat .git/HEAD
ref: refs/heads/master
This is a pointer to the ref of the currently checked-out branch (or a plain commit hash when in "detached HEAD" state).
9# cat .git/logs/refs/heads/master
0000000000000000000000000000000000000000 6ecf00219d83579a35e3a1daae2615f753c0ec0f Ijon Tichy <ijon@beteigeuze.space> 1620831874 +0200 commit (initial): first commit
This is the reflog of master HEAD (cf. git reflog).
It is not part of the repo itself; it exists for local convenience only.
We'll look at it later.
10# file .git/index
.git/index: Git index, version 2, 1 entries
That's the file Git uses to keep track of the current index (local only).
It is basically an uncommitted commit, or rather the 'tree' part of that.
This file is one of the few Git files that is a bit magic, mostly because
of speed optimization considerations: In order for "git status" to be able
to run really fast, some data additional to the data kept in the actual repo
has to be available. This is why .git/index is not just a standard tree
object (which doesn't have the additional metadata).
We will not go into details here. Further reading:
https://github.com/git/git/blob/master/Documentation/technical/index-format.txt
https://mirrors.edge.kernel.org/pub/software/scm/git/docs/technical/racy-git.txt
https://stackoverflow.com/questions/4084921/what-does-the-git-index-contain-exactly
11# git log
commit 6ecf00219d83579a35e3a1daae2615f753c0ec0f
Author: Ijon Tichy <ijon@beteigeuze.space>
Date:   Wed May 12 17:04:33 2021 +0200
    first commit
Note the commit hash. It's basically
sha1sum(commit metadata including pointer to hash of tree)
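To make that formula concrete, here is a minimal sketch that recomputes a commit hash by hand. It assumes a throwaway repository in a temporary directory, the default SHA-1 object format, and that sha1sum is installed; the variable names are only for illustration.

```shell
set -eu
# Throwaway repository; path and identity are placeholders.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Ijon Tichy"
git config user.email "ijon@beteigeuze.space"
echo "This is not a README yet" > README
git add README
git commit -q -m "first commit"

# Raw commit content: tree pointer, author, committer, blank line, message.
size=$(git cat-file commit HEAD | wc -c | tr -d ' ')
# Git hashes the header "commit <size>" plus a null byte, followed by that content.
manual=$( { printf 'commit %s\0' "$size"; git cat-file commit HEAD; } | sha1sum | cut -d' ' -f1 )
actual=$(git rev-parse HEAD)
echo "manual: $manual"
echo "actual: $actual"
```

Both lines should print the same hash. Changing anything in the commit content (message, dates, tree pointer) changes the hash.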
12# sleep 1 && git commit --amend -m "first commit"
[master 6ecf002] first commit Date: Wed May 12 17:04:33 2021 +0200 1 file changed, 1 insertion(+) create mode 100644 README
We just amended the last commit but didn't actually change anything:
same commit message, author, tree, and time.
But the commit hash has changed. Why?
13# git log --pretty=fuller
commit 6ecf00219d83579a35e3a1daae2615f753c0ec0f
Author:     Ijon Tichy <ijon@beteigeuze.space>
AuthorDate: Wed May 12 17:04:33 2021 +0200
Commit:     Ijon Tichy <ijon@beteigeuze.space>
CommitDate: Wed May 12 17:04:34 2021 +0200
    first commit
Because there's more metadata than git log shows by default.
There's an author date and a commit date. Amending a commit
keeps the author date but updates the commit date.
Note that Git has separate author and committer to account
for the traditional Linux email based patch workflow.
Authors would send in patches by mail, maintainers pick up
patches and commit (or reject).
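The effect of amending on the two dates can be checked in isolation. The following is a sketch under the assumption of a throwaway repository; the dates are arbitrary and only pinned so that the result is easy to compare.

```shell
set -eu
# Throwaway repository; path and identity are placeholders.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Ijon Tichy"
git config user.email "ijon@beteigeuze.space"
echo "This is not a README yet" > README
git add README

# Pin both dates for the initial commit.
GIT_AUTHOR_DATE="2000-01-01 12:00:00 +0000" \
GIT_COMMITTER_DATE="2000-01-01 12:00:00 +0000" \
  git commit -q -m "first commit"
before=$(git rev-parse HEAD)

# Amend with a later committer date; message and tree stay the same.
GIT_COMMITTER_DATE="2001-01-01 12:00:00 +0000" \
  git commit -q --amend --no-edit
after=$(git rev-parse HEAD)

author_ts=$(git log -1 --format=%at)     # author date as Unix timestamp
committer_ts=$(git log -1 --format=%ct)  # committer date as Unix timestamp
echo "hash before: $before"
echo "hash after:  $after"
echo "author date: $author_ts, committer date: $committer_ts"
```

The author timestamp survives the amend, the committer timestamp does not, and that alone is enough to produce a new commit hash.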
14# GIT_COMMITTER_DATE="Jan 1 12:00 2000 +0000" git commit --amend --date="Jan 1 12:00 2000 +0000" -m "first commit"
[master c8d9b9c] first commit Date: Sat Jan 1 12:00:00 2000 +0000 1 file changed, 1 insertion(+) create mode 100644 README
rewrite last commit with fixed times (--date sets author date)
15# GIT_COMMITTER_DATE="Jan 1 12:00 2000 +0000" git commit --amend --date="Jan 1 12:00 2000 +0000" -m "first commit"
[master c8d9b9c] first commit Date: Sat Jan 1 12:00:00 2000 +0000 1 file changed, 1 insertion(+) create mode 100644 README
THAT works: commit hash stays the same.
16# git log --pretty=fuller
commit c8d9b9c01eea11fb1032903b0dd2bea3eeb46f48
Author:     Ijon Tichy <ijon@beteigeuze.space>
AuthorDate: Sat Jan 1 12:00:00 2000 +0000
Commit:     Ijon Tichy <ijon@beteigeuze.space>
CommitDate: Sat Jan 1 12:00:00 2000 +0000
    first commit
17# export GIT_COMMITTER_DATE="Jan 1 12:00 2000 +0000" && export GIT_AUTHOR_DATE="Jan 1 12:00 2000 +0000"
Let us fix dates so that we have deterministic hashes.
For the purposes of this demo only; don't do this at home.
18# file .git/objects/*/*
.git/objects/5b/6c6cb672dc1c3e3f38da4cc819c07da510fb59: zlib compressed data .git/objects/6e/cf00219d83579a35e3a1daae2615f753c0ec0f: zlib compressed data .git/objects/b3/5c99875f5758f64e9348c05dac14848a046f59: zlib compressed data .git/objects/c8/d9b9c01eea11fb1032903b0dd2bea3eeb46f48: zlib compressed data
That's one tree (we didn't change files so far), one file, three commits (original, hash test, fixed time).
19# git branch test
20# file .git/objects/*/*
.git/objects/5b/6c6cb672dc1c3e3f38da4cc819c07da510fb59: zlib compressed data .git/objects/6e/cf00219d83579a35e3a1daae2615f753c0ec0f: zlib compressed data .git/objects/b3/5c99875f5758f64e9348c05dac14848a046f59: zlib compressed data .git/objects/c8/d9b9c01eea11fb1032903b0dd2bea3eeb46f48: zlib compressed data
Just creating a new branch doesn't create any new trees or commits or blobs.
21# cat .git/HEAD
ref: refs/heads/master
Right, we're still on master.
22# git checkout test
Switched to branch 'test'
23# cat .git/HEAD
ref: refs/heads/test
24# cat .git/refs/heads/test
c8d9b9c01eea11fb1032903b0dd2bea3eeb46f48
Note that this file is all that Git needs to handle (local) branches.
25# file .git/objects/c8/d9b9c01eea11fb1032903b0dd2bea3eeb46f48
.git/objects/c8/d9b9c01eea11fb1032903b0dd2bea3eeb46f48: zlib compressed data
we have an object with that commit hash, let's have a look
26# git cat-file -t c8d9b9c01eea11fb1032903b0dd2bea3eeb46f48
commit
cat-file is low level Git ('plumbing'); -t prints the object type...
27# git cat-file -p c8d9b9c01eea11fb1032903b0dd2bea3eeb46f48
tree b35c99875f5758f64e9348c05dac14848a046f59 author Ijon Tichy <ijon@beteigeuze.space> 946728000 +0000 committer Ijon Tichy <ijon@beteigeuze.space> 946728000 +0000 first commit
...and -p pretty prints that object's content. Let's look at the referenced tree.
28# git cat-file -t b35c99875f5758f64e9348c05dac14848a046f59
tree
well that was obvious
29# git cat-file -p b35c99875f5758f64e9348c05dac14848a046f59
100644 blob 5b6c6cb672dc1c3e3f38da4cc819c07da510fb59 README
Note file metadata (file mode bits, filename) is found in the tree's data.
There's no file date. git checkout etc. always write files with the current
date, because many tools (GNU make etc.) rely on file dates for their operation.
For example, make only rebuilds an artifact if it is older than its source
file. If checking out an older project version restored the 'correct' old
file dates, the sources would look older than the existing artifacts and
no rebuild would be triggered.
Let's look at the referenced blob.
30# git cat-file -t 5b6c6cb672dc1c3e3f38da4cc819c07da510fb59
blob
31# git cat-file -p 5b6c6cb672dc1c3e3f38da4cc819c07da510fb59
This is not a README yet
But how much magic does cat-file do?
32# zlib-flate -uncompress < .git/objects/5b/6c6cb672dc1c3e3f38da4cc819c07da510fb59 | hexdump -C
00000000 62 6c 6f 62 20 32 35 00 54 68 69 73 20 69 73 20 |blob 25.This is | 00000010 6e 6f 74 20 61 20 52 45 41 44 4d 45 20 79 65 74 |not a README yet| 00000020 0a |.| 00000021
It really is just zlib compressed type+length header, null byte, data. No magic!
33# zlib-flate -uncompress < .git/objects/5b/6c6cb672dc1c3e3f38da4cc819c07da510fb59 | sha1sum
5b6c6cb672dc1c3e3f38da4cc819c07da510fb59 -
...and the object filename really is just its hash.
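If zlib-flate is not at hand, the same check works without touching the object file at all. This is a small sketch, assuming sha1sum is installed and Git uses its default SHA-1 object format: it builds the header by hand and compares the result to what git hash-object computes.

```shell
set -eu
content='This is not a README yet'
# The payload is the text plus the trailing newline that echo wrote.
size=$(( ${#content} + 1 ))
# Header "blob <size>", a null byte, then the payload.
manual=$(printf 'blob %s\0%s\n' "$size" "$content" | sha1sum | cut -d' ' -f1)
# hash-object without -w only computes the hash, it stores nothing.
via_git=$(printf '%s\n' "$content" | git hash-object --stdin)
echo "manual:  $manual"
echo "via git: $via_git"
```

Both values should match the README blob hash seen in the steps above.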
34# zlib-flate -uncompress < .git/objects/b3/5c99875f5758f64e9348c05dac14848a046f59 | hexdump -C
00000000 74 72 65 65 20 33 34 00 31 30 30 36 34 34 20 52 |tree 34.100644 R| 00000010 45 41 44 4d 45 00 5b 6c 6c b6 72 dc 1c 3e 3f 38 |EADME.[ll.r..>?8| 00000020 da 4c c8 19 c0 7d a5 10 fb 59 |.L...}...Y| 0000002a
Same for the tree object. The 'garbage' in the ASCII representation is actually the README's blob hash in binary.
35# echo "The hard way" > test.txt
Let's create a commit that adds this new file just using Git plumbing commands (git add etc. are 'porcelain').
36# git hash-object -w test.txt
3b85187168e709784298f3f62ea2aed5f496e5eb
hash-object calculates the hash of the file (and, with -w, adds it to Git objects).
So we have the blob, but no corresponding tree or commit yet. Actually, that file
is not even staged...
37# git update-index --add --cacheinfo 100644 3b85187168e709784298f3f62ea2aed5f496e5eb test.txt
hash-object and update-index are the plumbing behind git add. The --cacheinfo arguments are the file mode (permissions), the blob hash, and the path.
38# git ls-files --stage
100644 5b6c6cb672dc1c3e3f38da4cc819c07da510fb59 0 README 100644 3b85187168e709784298f3f62ea2aed5f496e5eb 0 test.txt
This is the content of the .git/index file.
39# git status
On branch test Changes to be committed: (use "git restore --staged <file>..." to unstage) new file: test.txt
It worked! test.txt is a "new file". However, we still have no dedicated tree object yet - it's still all in the index.
40# git write-tree
9240cdb2b8598f50cb8b66328b5c31d077d14470
This took the index and created a tree object from it. We still need the commit object.
41# echo "a commit, done the hard way" | git commit-tree 9240cdb2b8598f50cb8b66328b5c31d077d14470 -p c8d9b9c01eea11fb1032903b0dd2bea3eeb46f48
5350fa43e7e3a6263c85e47d24b3351f84be9a22
We have to reference the parent here.
42# git cat-file -p 5350fa43e7e3a6263c85e47d24b3351f84be9a22
tree 9240cdb2b8598f50cb8b66328b5c31d077d14470 parent c8d9b9c01eea11fb1032903b0dd2bea3eeb46f48 author Ijon Tichy <ijon@beteigeuze.space> 946728000 +0000 committer Ijon Tichy <ijon@beteigeuze.space> 946728000 +0000 a commit, done the hard way
Looks fine!
43# git log
commit c8d9b9c01eea11fb1032903b0dd2bea3eeb46f48
Author: Ijon Tichy <ijon@beteigeuze.space>
Date:   Sat Jan 1 12:00:00 2000 +0000
    first commit
...but the new commit doesn't show up in the log yet since our HEAD is still the previous commit, and .git/refs/heads/test (we are on the test branch) still needs to get updated.
44# echo 5350fa43e7e3a6263c85e47d24b3351f84be9a22 > .git/refs/heads/test
45# git log --format=fuller
commit 5350fa43e7e3a6263c85e47d24b3351f84be9a22
Author:     Ijon Tichy <ijon@beteigeuze.space>
AuthorDate: Sat Jan 1 12:00:00 2000 +0000
Commit:     Ijon Tichy <ijon@beteigeuze.space>
CommitDate: Sat Jan 1 12:00:00 2000 +0000
    a commit, done the hard way
commit c8d9b9c01eea11fb1032903b0dd2bea3eeb46f48
Author:     Ijon Tichy <ijon@beteigeuze.space>
AuthorDate: Sat Jan 1 12:00:00 2000 +0000
Commit:     Ijon Tichy <ijon@beteigeuze.space>
CommitDate: Sat Jan 1 12:00:00 2000 +0000
    first commit
Great! This concludes a 'manual' commit using Git plumbing commands.
You can see that going full manual, i.e., creating the files
needed to represent a commit in the .git/objects directory just
using echo etc., would not be a big problem either.
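For reference, the whole plumbing sequence fits into a few lines. The sketch below assumes a fresh throwaway repository (so the commit has no parent) and uses git update-ref instead of writing the ref file with echo; update-ref is the plumbing command for moving a ref, and it also records the move in the reflog.

```shell
set -eu
# Throwaway repository; path and identity are placeholders.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Ijon Tichy"
git config user.email "ijon@beteigeuze.space"
branch=$(git symbolic-ref --short HEAD)   # name of the still unborn branch

echo "The hard way" > test.txt
blob=$(git hash-object -w test.txt)                          # store the blob
git update-index --add --cacheinfo 100644 "$blob" test.txt   # stage it
tree=$(git write-tree)                                       # index -> tree object
commit=$(echo "a commit, done the hard way" | git commit-tree "$tree")

# Point the branch at the new commit and log the move.
git update-ref -m "commit (plumbing): the hard way" "refs/heads/$branch" "$commit"

git log --format='%h %s'
git reflog show "$branch"
```

Compared to echo, update-ref also works when refs are packed and can refuse the update if the ref does not hold an expected old value.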
But isn't what we saw so far horribly inefficient once it comes to
file changes? No diffs are saved ever, and each file version gets
compressed to a new object file?
That's right, but there's another layer of object storage in Git
called 'packfiles'.
Let's create a new empty branch for testing that.
46# git checkout --orphan packfile_demo && git rm --cached -r . && rm *
Switched to a new branch 'packfile_demo' rm 'README' rm 'test.txt'
Then, let's create a large file.
47# for i in {1..10000}; do echo $i >> largefile.txt; done && tail -v largefile.txt && git add largefile.txt && git commit -m "a large file"
==> largefile.txt <== 9991 9992 9993 9994 9995 9996 9997 9998 9999 10000 [packfile_demo (root-commit) cec918b] a large file 1 file changed, 10000 insertions(+) create mode 100644 largefile.txt
There's our large file (10000 numbered lines).
48# find .git/objects -type f && du -h --max-depth=0 .git/objects
.git/objects/5b/6c6cb672dc1c3e3f38da4cc819c07da510fb59 .git/objects/b3/5c99875f5758f64e9348c05dac14848a046f59 .git/objects/6e/cf00219d83579a35e3a1daae2615f753c0ec0f .git/objects/c8/d9b9c01eea11fb1032903b0dd2bea3eeb46f48 .git/objects/3b/85187168e709784298f3f62ea2aed5f496e5eb .git/objects/92/40cdb2b8598f50cb8b66328b5c31d077d14470 .git/objects/53/50fa43e7e3a6263c85e47d24b3351f84be9a22 .git/objects/98/12045fd898ce41f5a4019dc2c1e4fff5884566 .git/objects/55/33519bed1c0129ebd0909a43686f9b735d0e29 .git/objects/ce/c918bfb2ed2e03c8add9c9b2f6529cae1216e5 56K .git/objects
Note we have just a handful of files in the objects Git directory
that take up little space.
Let's add stuff to the one large file and commit the change;
repeat that a hundred times.
49# { for i in {1..100}; do echo "Adding more... $i" >> largefile.txt; git commit -m "adding to largefile.txt, $i" largefile.txt; done } | tail --l 15
1 file changed, 1 insertion(+) [packfile_demo a5cd302] adding to largefile.txt, 94 1 file changed, 1 insertion(+) [packfile_demo 04265c8] adding to largefile.txt, 95 1 file changed, 1 insertion(+) [packfile_demo 2cb1a0d] adding to largefile.txt, 96 1 file changed, 1 insertion(+) [packfile_demo c80d763] adding to largefile.txt, 97 1 file changed, 1 insertion(+) [packfile_demo 245005d] adding to largefile.txt, 98 1 file changed, 1 insertion(+) [packfile_demo e9fa7d0] adding to largefile.txt, 99 1 file changed, 1 insertion(+) [packfile_demo ddd7a4e] adding to largefile.txt, 100 1 file changed, 1 insertion(+)
Now, let's have a look at the Git internal objects.
50# echo -n "Number of files in objects dir: " && find .git/objects -type f | wc -l && du -h --max-depth=0 .git/objects
Number of files in objects dir: 310 2,8M .git/objects
That storage ballooned quite a bit.
Modifying and committing one file 100 times resulted in
100*3 (commit, tree, blob) files, and we have 100
near-identical (compressed) copies of the large file
in object storage now.
51# git gc
Garbage collection (which is a bit of a misnomer, as it includes repacking) takes the individual object files and repacks them into packfiles, storing only differences for object files that are similar.
52# find .git/objects -type f && du -h --max-depth=0 .git/objects
.git/objects/pack/pack-512fa896d13897b83753e4401f2204d0c4908516.pack .git/objects/pack/pack-512fa896d13897b83753e4401f2204d0c4908516.rev .git/objects/pack/pack-512fa896d13897b83753e4401f2204d0c4908516.mtimes .git/objects/pack/pack-512fa896d13897b83753e4401f2204d0c4908516.idx .git/objects/pack/pack-cab01d1c43db861740e4cd83ad7e2351b42f449a.pack .git/objects/pack/pack-cab01d1c43db861740e4cd83ad7e2351b42f449a.rev .git/objects/pack/pack-cab01d1c43db861740e4cd83ad7e2351b42f449a.idx .git/objects/info/packs .git/objects/info/commit-graph 156K .git/objects
The objects directory is much smaller again.
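Instead of counting files with find, git count-objects reports loose and packed objects directly. The following sketch repeats the experiment on a smaller scale (20 changes instead of 100) in a throwaway repository; the awk patterns simply pick fields out of the count-objects -v output.

```shell
set -eu
# Throwaway repository; path and identity are placeholders.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Ijon Tichy"
git config user.email "ijon@beteigeuze.space"

seq 1 10000 > largefile.txt
git add largefile.txt
git commit -q -m "a large file"
for i in $(seq 1 20); do
  echo "Adding more... $i" >> largefile.txt
  git commit -q -m "adding to largefile.txt, $i" largefile.txt
done

# "count" is the number of loose objects, "in-pack" the number of packed ones.
loose_before=$(git count-objects -v | awk '/^count:/ {print $2}')
git gc -q
loose_after=$(git count-objects -v | awk '/^count:/ {print $2}')
packed_after=$(git count-objects -v | awk '/^in-pack:/ {print $2}')
echo "loose before gc: $loose_before"
echo "loose after gc:  $loose_after"
echo "packed after gc: $packed_after"
```

To see which packed objects are stored as deltas against others, git verify-pack -v on the .idx file lists every object in the pack together with its delta base, if any.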
53# find .git/refs -type f
But where did our branch references go?
54# cat .git/packed-refs
# pack-refs with: peeled fully-peeled sorted c8d9b9c01eea11fb1032903b0dd2bea3eeb46f48 refs/heads/master ddd7a4eaded2729e3bd83f4bf74b549d39460402 refs/heads/packfile_demo 5350fa43e7e3a6263c85e47d24b3351f84be9a22 refs/heads/test
Similar to the object packfile format, Git may
manage references in an optimized manner.
Some projects have thousands of branches (and tags),
and managing those in individual files is a waste.
See git-pack-refs for details.
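Packing references can also be triggered directly, without a full gc. A minimal sketch, assuming a throwaway repository that uses the classic files-based ref storage (the one shown throughout this tutorial):

```shell
set -eu
# Throwaway repository; path and identity are placeholders.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Ijon Tichy"
git config user.email "ijon@beteigeuze.space"
echo "This is not a README yet" > README
git add README
git commit -q -m "first commit"
git branch test

ls .git/refs/heads            # one plain file per branch
git pack-refs --all           # move all refs into .git/packed-refs
find .git/refs -type f        # prints nothing: the loose ref files are gone
cat .git/packed-refs

# Resolving the branch still works, Git consults packed-refs as needed.
resolved=$(git rev-parse refs/heads/test)
echo "test -> $resolved"
```

As soon as a packed branch moves (e.g., by a new commit), Git writes a loose ref file for it again; the loose file takes precedence over the packed entry.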
Do the plumbing commands (cat-file etc.) still work?
55# git cat-file -p ddd7a4e
tree aaee22c35ff84b15b28b1baa0ef121c9bb217b69 parent e9fa7d0e1ad0936501b478ec962e84b8412cac82 author Ijon Tichy <ijon@beteigeuze.space> 946728000 +0000 committer Ijon Tichy <ijon@beteigeuze.space> 946728000 +0000 adding to largefile.txt, 100
The packfile layer is transparent to plumbing commands,
e.g., cat-file will work as before, accessing packfiles
instead of plain object files if necessary.
If you want to know more about packfiles:
https://git-scm.com/book/en/v2/Git-Internals-Packfiles
On to something completely different.
Some notes on the differences between
git checkout, git reset --soft, git reset (--mixed), git reset --hard...
56# git checkout master && git status && head -v .git/HEAD
Switched to branch 'master' On branch master nothing to commit, working tree clean ==> .git/HEAD <== ref: refs/heads/master
checkout updates index and working directory.
checkout does not alter any branch HEAD (just .git/HEAD).
After checkout, the index and working directory (tree) will be identical
to the chosen commit (tree) (with default options).
57# git checkout c8d9b9c01eea11fb1032903b0dd2bea3eeb46f48 && git status
Note: switching to 'c8d9b9c01eea11fb1032903b0dd2bea3eeb46f48'. You are in 'detached HEAD' state. You can look around, make experimental changes and commit them, and you can discard any commits you make in this state without impacting any branches by switching back to a branch. If you want to create a new branch to retain commits you create, you may do so (now or later) by using -c with the switch command. Example: git switch -c <new-branch-name> Or undo this operation with: git switch - Turn off this advice by setting config variable advice.detachedHead to false HEAD is now at c8d9b9c first commit HEAD detached at c8d9b9c nothing to commit, working tree clean
Specifying a commit hash for checkout will result in "detached HEAD" state.
58# cat .git/HEAD
c8d9b9c01eea11fb1032903b0dd2bea3eeb46f48
Note HEAD is just a hash now, not a ref:... reference to some
.git/refs/heads/BRANCH pointer.
You can even commit things...
59# echo "commit in detached head" > detached.txt && git add detached.txt && git commit -m "detached.txt"
[detached HEAD 4bffad8] detached.txt 1 file changed, 1 insertion(+) create mode 100644 detached.txt
60# git checkout test
Warning: you are leaving 1 commit behind, not connected to any of your branches: 4bffad8 detached.txt If you want to keep it by creating a new branch, this may be a good time to do so with: git branch <new-branch-name> 4bffad8 Switched to branch 'test'
...and Git will helpfully warn you when moving away that, without a branch or tag pointing to the last commit, that commit is left dangling (an unreachable object; not to be confused with a "loose object", which is just an object that has not been packed yet). It'll be retrievable by hash only, and might get removed by garbage collection in a while (see gc.pruneExpire, default is two weeks).
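Rescuing such a commit is just a matter of pointing a branch at it. The sketch below replays the situation in a throwaway repository; the branch name "rescued" is made up for this example.

```shell
set -eu
# Throwaway repository; path and identity are placeholders.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Ijon Tichy"
git config user.email "ijon@beteigeuze.space"
echo "This is not a README yet" > README
git add README
git commit -q -m "first commit"
branch=$(git symbolic-ref --short HEAD)

# Commit in detached HEAD state, then move away from it.
git checkout -q --detach
echo "commit in detached head" > detached.txt
git add detached.txt
git commit -q -m "detached.txt"
orphan=$(git rev-parse HEAD)
git checkout -q "$branch"

# No branch contains the commit now...
before=$(git branch --contains "$orphan" | wc -l | tr -d ' ')
# ...until we create one that points to it.
git branch rescued "$orphan"
after=$(git branch --format='%(refname:short)' --contains "$orphan")
echo "branches containing $orphan before: $before, after: $after"
```

If the hash has scrolled out of the terminal, the reflog usually still has it.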
61# git reflog | head
5350fa4 HEAD@{0}: checkout: moving from 4bffad870e307173ee175f4f3929bc39ba0fb772 to test
4bffad8 HEAD@{1}: commit: detached.txt
c8d9b9c HEAD@{2}: checkout: moving from master to c8d9b9c01eea11fb1032903b0dd2bea3eeb46f48
c8d9b9c HEAD@{3}: checkout: moving from packfile_demo to master
The reflog is a local log of the HEAD pointer and other references.
Whenever you do a commit/checkout/reset, a line will be written to this log.
The log isn't part of the actual repo and will not be shared by "git push"
and the like.
Note that the fun we had with the plumbing commands (writing to
.git/refs/heads/test directly) didn't update the reflog.
It's a handy thing to look at if you got lost at any point, or are
working with detached HEAD and the like.
62# git reflog --date=iso | head
5350fa4 HEAD@{2000-01-01 12:00:00 +0000}: checkout: moving from 4bffad870e307173ee175f4f3929bc39ba0fb772 to test
4bffad8 HEAD@{2000-01-01 12:00:00 +0000}: commit: detached.txt
c8d9b9c HEAD@{2000-01-01 12:00:00 +0000}: checkout: moving from master to c8d9b9c01eea11fb1032903b0dd2bea3eeb46f48
c8d9b9c HEAD@{2000-01-01 12:00:00 +0000}: checkout: moving from packfile_demo to master
reflog can also show the time the reflog entry was created along with
the other data. Note that in this demo, the date is broken due to us
fixing the dates using env variables.
Note the reflog entries expire (see gc.reflogExpire, default 90 days).
Also, the reflog provides functionality such as the master@{one.week.ago}
notation, which really looks at the reflog (i.e., "what did master point
to one week ago on this machine") and NOT at the commit log.
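The difference between reflog notation and ancestry notation shows up nicely after an amend. A sketch, assuming a throwaway repository: HEAD@{1} is "where HEAD pointed before its last move", whereas HEAD~1 is "the parent of the current commit".

```shell
set -eu
# Throwaway repository; path and identity are placeholders.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Ijon Tichy"
git config user.email "ijon@beteigeuze.space"

echo one > file.txt
git add file.txt
git commit -q -m "first commit"
first=$(git rev-parse HEAD)

echo two >> file.txt
git commit -q -am "second commit"
pre_amend=$(git rev-parse HEAD)

# Amending replaces the second commit with a new one.
git commit -q --amend -m "second commit, amended"

from_reflog=$(git rev-parse 'HEAD@{1}')   # previous position of HEAD
from_ancestry=$(git rev-parse 'HEAD~1')   # parent of the current commit
echo "HEAD@{1}: $from_reflog (the replaced commit)"
echo "HEAD~1:   $from_ancestry (the first commit)"
```

The replaced commit is no longer part of any branch history, but the reflog still knows it.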
63# git log -g | head --l 16
commit 5350fa43e7e3a6263c85e47d24b3351f84be9a22
Reflog: HEAD@{0} (Ijon Tichy <ijon@beteigeuze.space>)
Reflog message: checkout: moving from 4bffad870e307173ee175f4f3929bc39ba0fb772 to test
Author: Ijon Tichy <ijon@beteigeuze.space>
Date:   Sat Jan 1 12:00:00 2000 +0000
    a commit, done the hard way
commit 4bffad870e307173ee175f4f3929bc39ba0fb772
Reflog: HEAD@{1} (Ijon Tichy <ijon@beteigeuze.space>)
Reflog message: commit: detached.txt
Author: Ijon Tichy <ijon@beteigeuze.space>
Date:   Sat Jan 1 12:00:00 2000 +0000
    detached.txt
Note git log also has a variant that walks the reflog instead of the
commit ancestry.
On to git reset...
64# git checkout test && echo "...plus more text" >> test.txt && git add test.txt && git commit -m "changing test.txt" && git log --pretty=oneline
Already on 'test' [test 5395c9c] changing test.txt 1 file changed, 1 insertion(+) 5395c9ce4bd2ccb14dd3f7b847694fe87b2c2d94 changing test.txt 5350fa43e7e3a6263c85e47d24b3351f84be9a22 a commit, done the hard way c8d9b9c01eea11fb1032903b0dd2bea3eeb46f48 first commit
To recap, in the test branch, we started with one commit adding the README, then one commit adding test.txt, and we just committed a change to test.txt.
65# git reset --soft 5350fa4 && git status
On branch test Changes to be committed: (use "git restore --staged <file>..." to unstage) modified: test.txt
reset --soft moves the HEAD of the current branch to the selected
tree/commit.
It does *not* touch the index nor the working directory.
In consequence, after soft reset, git status will show differences
of your (unchanged) working directory and index to the branch HEAD
that has been reset.
That means that if you soft reset to any commit, then git commit
again immediately, the resulting tree of the new commit will be
identical to your starting working directory. One thing you can
easily do with that is squashing commits within a branch, but
probably rebase --interactive (we will look at that later) is
better suited for that.
If you want to get rid of changes of a commit, reset --soft
is not what you want.
66# git reset 5350fa4 && git status
Unstaged changes after reset: M test.txt On branch test Changes not staged for commit: (use "git add <file>..." to update what will be committed) (use "git restore <file>..." to discard changes in working directory) modified: test.txt no changes added to commit (use "git add" and/or "git commit -a")
reset --mixed (the default) changes current branch HEAD *and* index
to the selected tree/commit.
It does not touch the working directory.
This command is good for reworking commit(s), e.g., splitting
changes that have been accidentally put into one commit,
but similar to reset --soft, probably rebase --interactive
will be the better choice for this.
Again, if you want to get rid of changes of a commit,
reset --mixed is not what you want.
67# git reset --hard 5350fa4 && git status && head -v test.txt
HEAD is now at 5350fa4 a commit, done the hard way On branch test nothing to commit, working tree clean ==> test.txt <== The hard way
reset --hard additionally overwrites the working directory with
the index. Any uncommitted changes of the working directory will be lost.
This is the go-to command to get rid of commits completely,
to switch branches around (e.g., if you want to swap master
and dev branches), or to get rid of any local changes (e.g.,
git reset --hard origin/master).
Note that all reset commands potentially move the branch HEAD back in
history (or even to some commit that has no common ancestor with
the previous state). If you do that to a branch that has already been
pushed to a remote repository, you will need to force push.
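The three reset modes can be compared side by side. The following sketch uses a throwaway repository and a made-up helper, state, which reports whether the index differs from HEAD and whether the working directory differs from the index.

```shell
set -eu
# Throwaway repository; path and identity are placeholders.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Ijon Tichy"
git config user.email "ijon@beteigeuze.space"

echo "The hard way" > test.txt
git add test.txt
git commit -q -m "a commit, done the hard way"
base=$(git rev-parse HEAD)
echo "...plus more text" >> test.txt
git commit -q -am "changing test.txt"
tip=$(git rev-parse HEAD)

# Hypothetical helper: "dirty" means there is a difference.
state() {
  staged=clean
  unstaged=clean
  git diff --cached --quiet || staged=dirty    # index vs. HEAD
  git diff --quiet || unstaged=dirty           # working directory vs. index
  echo "index=$staged worktree=$unstaged"
}

git reset -q --soft "$base"
soft=$(state)
git reset -q --hard "$tip"      # back to the starting point
git reset -q --mixed "$base"
mixed=$(state)
git reset -q --hard "$tip"      # back to the starting point
git reset -q --hard "$base"
hard=$(state)

echo "soft:  $soft"
echo "mixed: $mixed"
echo "hard:  $hard"
```

After --soft the change is still staged, after --mixed it is only in the working directory, and after --hard it is gone from both.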
Time to dive into remote repositories.
68# ls -1 ../fakeremote
config description HEAD hooks info objects refs
For the purposes of this demo, we use a pre-initialized
local bare repository as remote. A bare repository is
basically just the contents of the .git/ folder, without
any working directory.
This highlights a key aspect of what remotes
are: They're basically just pointers to a separate .git/
directory, regardless of whether they're reachable
via SSH, HTTP, or directly via filesystem access.
69# git clone ../fakeremote git-playground
Cloning into 'git-playground'... done.
Just pretend this was something like
git clone git@someserver:git-playground.git
Cloning a remote repository basically sets up a local
empty .git/ repository and adds the remote repository
as a remote called 'origin'. When using defaults, git clone
then connects to the origin, fetches its Git object files,
creates remote-tracking branches for the branches of the
remote, then creates a local master branch, sets its HEAD
to origin/master and checks it out.
Note that if you connect to an actual remote server,
it will output "Enumerating objects" etc. messages during
clone; that's the remote server repacking (only) those
object files that are needed to finish the operation.
I.e., any unreachable objects etc. are not transmitted,
and in case you used the --depth or --single-branch
options with git clone, just a fraction of the remote's
objects will be transmitted typically.
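The steps that git clone performs can also be done one by one. This is a rough sketch, not a full equivalent of clone: it assumes a temporary directory with a bare repository named fakeremote.git that gets one commit pushed to its master branch from a helper repository called seed (both names are made up here).

```shell
set -eu
work=$(mktemp -d)
cd "$work"

# Prepare a bare "remote" with one commit on master.
git init -q --bare fakeremote.git
git init -q seed
(
  cd seed
  git config user.name "Ijon Tichy"
  git config user.email "ijon@beteigeuze.space"
  echo "hello from the remote" > README
  git add README
  git commit -q -m "first commit"
  git push -q ../fakeremote.git HEAD:refs/heads/master
)

# The manual "clone": empty repo, add remote, fetch, create tracking branch.
git init -q manual-clone
cd manual-clone
git remote add origin ../fakeremote.git
git fetch -q origin                                # objects + remote-tracking branches
git checkout -q -b master --track origin/master    # local branch, index, working directory

git branch -a
git config branch.master.remote
```

The last command prints the remote that the new local master branch tracks.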
70# cat git-playground/.git/config
[core] repositoryformatversion = 0 filemode = true bare = false logallrefupdates = true [remote "origin"] url = example/../fakeremote fetch = +refs/heads/*:refs/remotes/origin/* [branch "master"] remote = origin merge = refs/heads/master
.git/config is used to keep track of the fact that the local master branch is tracking a remote repository branch.
71# find git-playground/.git/refs -type f && tail -v git-playground/.git/refs/remotes/origin/HEAD
git-playground/.git/refs/heads/master git-playground/.git/refs/remotes/origin/HEAD ==> git-playground/.git/refs/remotes/origin/HEAD <== ref: refs/remotes/origin/master
No magic: Remote branches are just text files
containing commit references, just as are local branches.
There's no .git/refs/remotes/origin/master though...?
72# cat git-playground/.git/packed-refs
# pack-refs with: peeled fully-peeled sorted 27c0b46416b5c6ed7b0d75b835c06cabefb8c044 refs/remotes/origin/master
Remember references may get packed instead of put in their own file.
Let's go back to the previous local example repository and do some cleanup.
73# rm -rf git-playground && git checkout master && git branch -D packfile_demo && git branch -D test
Switched to branch 'master' Deleted branch packfile_demo (was ddd7a4e). Deleted branch test (was 5350fa4).
...and add the remote under the name 'playground':
74# git remote add playground ../fakeremote && git remote -v
playground ../fakeremote (fetch) playground ../fakeremote (push)
There's no need to start by cloning; you can add a remote to an existing local repository as well.
75# git branch -a
* master
No change is visible yet, even with the new remote added.
76# git fetch playground && git branch -a
From ../fakeremote * [new branch] master -> playground/master * master remotes/playground/HEAD -> playground/master remotes/playground/master
After git fetch, we see the remote branches. fetch doesn't change local branches nor the index nor the working directory.
77# cat .git/refs/remotes/playground/master && grep "master" .git/packed-refs
27c0b46416b5c6ed7b0d75b835c06cabefb8c044 c8d9b9c01eea11fb1032903b0dd2bea3eeb46f48 refs/heads/master
Note that remotes/playground/master is completely
different from our local master, as these repositories
currently have nothing in common, which Git was
also pointing out nicely during fetch.
By the way, you probably don't want to use grep and
cat to resolve references, especially with references
getting stored in two different ways possibly.
78# git show-ref master
c8d9b9c01eea11fb1032903b0dd2bea3eeb46f48 refs/heads/master 27c0b46416b5c6ed7b0d75b835c06cabefb8c044 refs/remotes/playground/master
...is probably easier.
79# cat .git/config
[core] repositoryformatversion = 0 filemode = true bare = false logallrefupdates = true [user] name = Ijon Tichy email = ijon@beteigeuze.space [remote "playground"] url = ../fakeremote fetch = +refs/heads/*:refs/remotes/playground/*
Note that adding a remote didn't make any of our
local branches track a remote one, in contrast to
when cloning a repo.
Say we want to push the local master to the remote.
Does simple git push work?
80# git push playground master
To ../fakeremote ! [rejected] master -> master (non-fast-forward) error: failed to push some refs to '../fakeremote' hint: Updates were rejected because the tip of your current branch is behind hint: its remote counterpart. If you want to integrate the remote changes, hint: use 'git pull' before pushing again. hint: See the 'Note about fast-forwards' in 'git push --help' for details.
No. Our local master branch has no reference to the
current remote master HEAD in its history, i.e.,
the remote master HEAD is not any ancestor of our
local master, so standard push will fail.
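Whether a plain push would be accepted can be checked up front with git merge-base --is-ancestor. The sketch below is self-contained and uses made-up names (make_repo, is_fast_forward, seed, local); it builds one repository with history unrelated to the remote and one that has a commit on top of the remote history.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
git init -q --bare fakeremote.git

# Hypothetical helper: repo with one root commit and the remote added.
make_repo() {
  git init -q "$1"
  git -C "$1" config user.name "Ijon Tichy"
  git -C "$1" config user.email "ijon@beteigeuze.space"
  echo "$2" > "$1/README"
  git -C "$1" add README
  git -C "$1" commit -q -m "$2"
  git -C "$1" remote add playground ../fakeremote.git
}

# Hypothetical helper: is the remote-tracking branch $2 an ancestor of $3 in repo $1?
is_fast_forward() {
  git -C "$1" merge-base --is-ancestor "$2" "$3"
}

make_repo seed "remote history"
git -C seed push -q playground HEAD:refs/heads/master

make_repo local "local history"
git -C local fetch -q playground
if is_fast_forward local playground/master HEAD; then unrelated=accepted; else unrelated=rejected; fi

git -C seed fetch -q playground
echo "more" >> seed/README
git -C seed commit -q -am "on top of remote history"
if is_fast_forward seed playground/master HEAD; then ontop=accepted; else ontop=rejected; fi

echo "unrelated history: push would be $unrelated"
echo "commit on top:     push would be $ontop"
```

Keep in mind the check is only as current as the last fetch, since it looks at the local remote-tracking branch.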
For the time being, let's push the local master
to the remote, under another branch name.
81# git push playground master:master_in_playground
To ../fakeremote * [new branch] master -> master_in_playground
git push supports LOCALBRANCH:REMOTEBRANCH syntax for pushing a local branch to a remote under a different name.
82# cat .git/config
[core] repositoryformatversion = 0 filemode = true bare = false logallrefupdates = true [user] name = Ijon Tichy email = ijon@beteigeuze.space [remote "playground"] url = ../fakeremote fetch = +refs/heads/*:refs/remotes/playground/*
Note that just pushing our branch does *not* make our local master track the remote master_in_playground branch...
83# git pull
There is no tracking information for the current branch.
Please specify which branch you want to rebase against.
See git-pull(1) for details.
    git pull <remote> <branch>
If you wish to set tracking information for this branch you can do so with:
    git branch --set-upstream-to=playground/<branch> master
...which means that git pull does not know what to do.
84# git branch -u playground/master_in_playground
branch 'master' set up to track 'playground/master_in_playground'.
branch -u (shorthand for branch --set-upstream-to) makes the current branch track a remote branch. When pushing a branch to a remote for the first time, git push accepts -u (--set-upstream) as well, which sets up tracking in the same step.
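To see the push variant of -u at work, here is a small sketch under the same kind of assumptions as before: a temporary bare repository plays the remote, and the paths are hypothetical. It pushes once with -u and then reads back what Git recorded.

```shell
set -e
work=$(mktemp -d)                       # scratch area (hypothetical layout)
git init -q --bare "$work/remote.git"
git init -q --initial-branch=master "$work/local"
cd "$work/local"
git config user.name "Ijon Tichy"
git config user.email "ijon@beteigeuze.space"
git commit -q --allow-empty -m "first commit"
git remote add playground "$work/remote.git"

# -u pushes and records the tracking information in one step
git push -q -u playground master

# read back the upstream of master and the raw config entries
upstream=$(git rev-parse --abbrev-ref 'master@{upstream}')
tracked_remote=$(git config --get branch.master.remote)
tracked_ref=$(git config --get branch.master.merge)
echo "$upstream ($tracked_remote, $tracked_ref)"
```

The two config values are the same `remote` and `merge` entries that `git branch -u` writes into the `[branch "master"]` section of .git/config.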
85# cat .git/config
[core] repositoryformatversion = 0 filemode = true bare = false logallrefupdates = true [user] name = Ijon Tichy email = ijon@beteigeuze.space [remote "playground"] url = ../fakeremote fetch = +refs/heads/*:refs/remotes/playground/* [branch "master"] remote = playground merge = refs/heads/master_in_playground
The tracking info has been added to .git/config...
86# git pull
Already up to date.
...and git pull works just as expected.
87# git push playground --delete master_in_playground
To ../fakeremote - [deleted] master_in_playground
That was only for demo purposes, so we delete the
remote master_in_playground branch again.
Previously, the default push to the remote master failed.
Let's force push, which simply overwrites the remote master
branch without any checks.
88# git push --force playground master
To ../fakeremote + 27c0b46...c8d9b9c master -> master (forced update)
That works. It might not on real remotes that have
branch protection enabled. Branch protection rejects pushes
(forced or not) unless the current tip of the remote branch is
part of the pushed branch's history; i.e., for protected
branches you are limited to adding commits on top.
Protected branches are a feature of Git services such as
GitHub and GitLab, and can be configured in their web UIs.
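Plain Git also offers a more careful variant of force pushing, git push --force-with-lease, which refuses to overwrite the remote branch if it has moved since we last saw it. The sketch below is an illustration only: "alice" and "bob" are hypothetical users, and a temporary bare repository plays the remote.

```shell
set -e
work=$(mktemp -d)   # scratch directory; all names below are illustrative
git init -q --bare --initial-branch=master "$work/remote.git"

# "alice" publishes a first commit
git init -q --initial-branch=master "$work/alice"
cd "$work/alice"
git config user.name "Alice"
git config user.email "alice@example.invalid"
echo "v1" > file.txt && git add file.txt && git commit -q -m "first commit"
git remote add playground "$work/remote.git"
git push -q playground master

# "bob" clones, adds a commit, and pushes it
git clone -q "$work/remote.git" "$work/bob"
git -C "$work/bob" config user.name "Bob"
git -C "$work/bob" config user.email "bob@example.invalid"
echo "bob" > "$work/bob/bob.txt"
git -C "$work/bob" add bob.txt
git -C "$work/bob" commit -q -m "bob's commit"
git -C "$work/bob" push -q origin master
bob_tip=$(git -C "$work/bob" rev-parse HEAD)

# alice rewrites her commit without fetching first
git commit -q --amend -m "first commit, reworded"

# --force-with-lease only overwrites if the remote branch is still where
# alice's remote-tracking ref (playground/master) says it is
if git push -q --force-with-lease playground master 2>/dev/null; then
  lease_result="accepted"
else
  lease_result="rejected"
fi
echo "push with lease: $lease_result"
remote_tip=$(git ls-remote playground refs/heads/master | cut -f1)
```

In this scenario the push is refused and bob's commit survives on the remote, whereas a plain --force would have discarded it.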
Let's check if the push actually worked.
89# git show-ref master
c8d9b9c01eea11fb1032903b0dd2bea3eeb46f48 refs/heads/master c8d9b9c01eea11fb1032903b0dd2bea3eeb46f48 refs/remotes/playground/master
It did. The local master HEAD and the remote playground/master
are identical now.
Let's not forget to set up remote tracking.
90# git branch -u playground/master
branch 'master' set up to track 'playground/master'.
Now, let's play around with commits, pushes, merges, and rebasing.
91# echo "Commit A" > commit_a && git add commit_a && git commit -m 'commit_a' && git push playground
To ../fakeremote c8d9b9c..21d78fc master -> master [master 21d78fc] commit_a 1 file changed, 1 insertion(+) create mode 100644 commit_a
...so now we have a file 'commit_a' both locally and on the remote. Let's undo that commit locally.
92# git reset --hard HEAD~ && git status
HEAD is now at c8d9b9c first commit On branch master Your branch is behind 'playground/master' by 1 commit, and can be fast-forwarded. (use "git pull" to update your local branch) nothing to commit, working tree clean
As expected, since we 'forgot' the last commit locally, the remote is ahead of us now. Let's ignore that and add another file locally, just as would happen if we kept developing while someone else pushed new commits to the server.
93# echo "Commit B" > commit_b && git add commit_b && git commit -m 'commit_b' && git status
[master 40a1d20] commit_b 1 file changed, 1 insertion(+) create mode 100644 commit_b On branch master Your branch and 'playground/master' have diverged, and have 1 and 1 different commits each, respectively. (use "git pull" if you want to integrate the remote branch with yours) nothing to commit, working tree clean
Local and remote master have diverged. push will fail now; force push
would overwrite commit A in the remote repo.
Time for a bit of visualization, finally.
94# git config --local --add alias.graph "log --graph --all --pretty=format:'%Cred%h%Creset -%C(yellow)%d%Creset %s %Cgreen(%cr) %C(bold blue)<%an>%Creset' --abbrev-commit --date=relative --date-order" && git graph
* 40a1d20 - (HEAD -> master) commit_b (26 years ago) <Ijon Tichy> | * 21d78fc - (playground/master, playground/HEAD) commit_a (26 years ago) <Ijon Tichy> |/ * c8d9b9c - first commit (26 years ago) <Ijon Tichy>
Now, what happens if we merge the remote master?
Merge never changes existing commits (but may create a new commit and a new tree).
Typically, this means that other branches' changes are put *on top* of the current branch commits.
But actually Git doesn't track diffs, so a merge commit is just a marker that two trees have been joined.
Any merge conflicts get resolved 'within' the merge commit.
That's nasty if there have been large conflicts, as errors made while resolving them are difficult to spot.
Also, merging creates a bit of a convoluted git history:
95# git merge --no-edit playground/master && git graph
Merge made by the 'ort' strategy. commit_a | 1 + 1 file changed, 1 insertion(+) create mode 100644 commit_a * 208e599 - (HEAD -> master) Merge remote-tracking branch 'playground/master' (26 years ago) <Ijon Tichy> |\ * | 40a1d20 - commit_b (26 years ago) <Ijon Tichy> | * 21d78fc - (playground/master, playground/HEAD) commit_a (26 years ago) <Ijon Tichy> |/ * c8d9b9c - first commit (26 years ago) <Ijon Tichy>
We could push now, but that history is a bit convoluted
for no good reason, right? It's not like the merge commit
adds a lot of information here; it rather complicates things.
Rebase takes another branch and replays the current branch's changes
on top of it, one by one.
This of course changes the commit hashes of the current branch
(the history of a commit is part of what gets hashed) but makes
the local branch a straightforward continuation of the remote.
Let's reset to commit_b and rebase onto the remote master.
96# git reset --hard 40a1d20 && git rebase playground/master && git graph
Rebasing (1/1) Successfully rebased and updated refs/heads/master. HEAD is now at 40a1d20 commit_b * 2d78b2c - (HEAD -> master) commit_b (26 years ago) <Ijon Tichy> * 21d78fc - (playground/master, playground/HEAD) commit_a (26 years ago) <Ijon Tichy> * c8d9b9c - first commit (26 years ago) <Ijon Tichy>
Much clearer. Note that the commit hash of our local commit_b has changed because now it has commit_a as parent instead of the first commit.
97# git status
On branch master Your branch is ahead of 'playground/master' by 1 commit. (use "git push" to publish your local commits) nothing to commit, working tree clean
But that's okay as we can push now without any further complications.
98# git push && git graph
To ../fakeremote 21d78fc..2d78b2c master -> master * 2d78b2c - (HEAD -> master, playground/master, playground/HEAD) commit_b (26 years ago) <Ijon Tichy> * 21d78fc - commit_a (26 years ago) <Ijon Tichy> * c8d9b9c - first commit (26 years ago) <Ijon Tichy>
There.
Of course, merge commits have their uses. For example, they are a
good way to document the development process if the merge is the
result of a (non-trivial) Pull Request: Without the merge commit,
there would be no link to the Pull Request in the commit history.
However, if you want to merge trivial things from a PR, do rebase your
changes onto the destination branch first, then merge using
git merge --ff-only MYBRANCH. This is what "Rebase and merge" in GitHub
does as well (only they know why they don't indicate clearly that
there'll be no merge commit in that case though, which might not
always be desirable).
And IF you want to merge nontrivial things from a PR, do rebase your
changes, then do a normal merge creating a merge commit so that
the history has a pointer to the original PR. Rebasing first
makes sure you don't have to resolve conflicts in the merge commit,
which would be a nasty thing (e.g., mistakes introduced in merge
commit conflict resolution are really hard to find later).
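The two variants described above can be compared side by side. This is a minimal sketch in a throwaway repository; the branch names feature, dest_ff and dest_merge are invented for the example, and the two dest_* branches stand in for "the destination branch" in each variant.

```shell
set -e
work=$(mktemp -d)
git init -q --initial-branch=master "$work/repo"
cd "$work/repo"
git config user.name "Ijon Tichy"
git config user.email "ijon@beteigeuze.space"
echo "base" > README && git add README && git commit -q -m "first commit"

# a feature branch and the destination branch diverge
git checkout -q -b feature
echo "feature" > feature.txt && git add feature.txt && git commit -q -m "feature work"
git checkout -q master
echo "other" > other.txt && git add other.txt && git commit -q -m "other work"

# step 1: rebase the feature branch onto the destination branch
git checkout -q feature
git rebase -q master

# variant for trivial changes: fast-forward only, no merge commit
git checkout -q -b dest_ff master
git merge -q --ff-only feature
parents_ff=$(git cat-file -p HEAD | grep -c '^parent ')

# variant for nontrivial changes: force a merge commit with --no-ff
git checkout -q -b dest_merge master
git merge -q --no-ff --no-edit feature
parents_merge=$(git cat-file -p HEAD | grep -c '^parent ')

echo "ff-only: $parents_ff parent(s), no-ff: $parents_merge parent(s)"
```

Both variants end with the same files; they differ only in whether the tip commit has one parent (fast-forward) or two (merge commit).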
Instead of a rebase, we could also reset --hard to the branch we
want to rebase onto, then git cherry-pick all the commits we want
to add.
What is the difference between cherry-pick and merge?
git merge looks for the common ancestor, then does a diff between
that ancestor and the specified commit, applies the diff to the
current index, then commits the result, giving the specified commit as
an additional parent of the merge commit (easy, isn't it?).
git cherry-pick doesn't look for an ancestor; it just takes the
diff between the specified commit's parent and that commit, then
applies and commits that diff. So cherry-pick is really only about
changes introduced in single commits whereas merge is concerned
with "everything up to" the specified commit.
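The "diff against the parent" view of cherry-pick can be checked by hand. The sketch below is a simplification (the real cherry-pick performs a three-way merge, which matters once conflicts appear); it uses a throwaway repository with invented branch names picked and manual, and compares the resulting trees.

```shell
set -e
work=$(mktemp -d)
git init -q --initial-branch=master "$work/repo"
cd "$work/repo"
git config user.name "Ijon Tichy"
git config user.email "ijon@beteigeuze.space"
echo "base" > base.txt && git add base.txt && git commit -q -m "base"
base=$(git rev-parse HEAD)
echo "A" > a.txt && git add a.txt && git commit -q -m "commit_a"
echo "B" > b.txt && git add b.txt && git commit -q -m "commit_b"
commit_b=$(git rev-parse HEAD)

# variant 1: cherry-pick commit_b directly onto base (skipping commit_a)
git checkout -q -b picked "$base"
git cherry-pick "$commit_b" >/dev/null
tree_picked=$(git rev-parse 'HEAD^{tree}')

# variant 2: by hand, apply the diff parent..commit_b and commit it
git checkout -q -b manual "$base"
git diff "$commit_b~1" "$commit_b" | git apply --index
git commit -q -m "commit_b (applied by hand)"
tree_manual=$(git rev-parse 'HEAD^{tree}')

ls
```

Both branches contain base.txt and b.txt but not a.txt: only the change introduced by commit_b was carried over, not "everything up to" it.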
Let's reset to an older commit, then cherry pick commits.
99# git reset --hard c8d9b9c && echo "---" && git cherry-pick 21d78fc 2d78b2c && echo "---" && git log --pretty=oneline
HEAD is now at c8d9b9c first commit --- [master 21d78fc] commit_a Date: Sat Jan 1 12:00:00 2000 +0000 1 file changed, 1 insertion(+) create mode 100644 commit_a [master 2d78b2c] commit_b Date: Sat Jan 1 12:00:00 2000 +0000 1 file changed, 1 insertion(+) create mode 100644 commit_b --- 2d78b2cfec3aa09041ccf4772453003692fa69ec commit_b 21d78fce05d66a5c99b56dc77a511d1bc28706e1 commit_a c8d9b9c01eea11fb1032903b0dd2bea3eeb46f48 first commit
Note the hashes stayed the same! That only happens because parents, trees, and messages are identical, and because we pinned the commit and author dates at the beginning. In practice (with unpinned dates, or if you rearrange commits, for example) the hashes will change.
100# git reset --hard c8d9b9c && echo "---" && git cherry-pick 2d78b2c 21d78fc && echo "---" && git log --pretty=oneline
HEAD is now at c8d9b9c first commit --- [master 40a1d20] commit_b Date: Sat Jan 1 12:00:00 2000 +0000 1 file changed, 1 insertion(+) create mode 100644 commit_b [master b43bd9c] commit_a Date: Sat Jan 1 12:00:00 2000 +0000 1 file changed, 1 insertion(+) create mode 100644 commit_a --- b43bd9c7eec7d1d2bb7ac3a3641ff408bace8f5d commit_a 40a1d20f7a96609e8767ef9c8da9a29d2244fb88 commit_b c8d9b9c01eea11fb1032903b0dd2bea3eeb46f48 first commit
We changed the order of the two commits, and now their commit hashes have changed because their "parent" commit has changed, and that metadata is part of the commit hash.
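That the parent is part of what gets hashed can be shown with the low-level git commit-tree command. This is a sketch in a scratch repository; the dates are pinned through environment variables (the timestamp is the one visible in the tutorial's output), and the commits are never attached to a branch.

```shell
set -e
work=$(mktemp -d)
git init -q --initial-branch=master "$work/repo"
cd "$work/repo"
git config user.name "Ijon Tichy"
git config user.email "ijon@beteigeuze.space"
# pin both dates so that identical inputs produce identical commit objects
export GIT_AUTHOR_DATE="946728000 +0000"
export GIT_COMMITTER_DATE="946728000 +0000"

tree=$(git write-tree)   # tree object for the (empty) index
root_1=$(git commit-tree "$tree" -m "root one")
root_2=$(git commit-tree "$tree" -m "root two")

# same tree, message, author and dates; only the parent differs
child_of_1=$(git commit-tree "$tree" -p "$root_1" -m "child")
child_of_2=$(git commit-tree "$tree" -p "$root_2" -m "child")
# identical inputs, including the parent, give the identical hash
child_of_1_again=$(git commit-tree "$tree" -p "$root_1" -m "child")

echo "$child_of_1"
echo "$child_of_2"
echo "$child_of_1_again"
```

The first and third hashes are equal; the second differs although tree, message, author, and dates are the same.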
A more comfortable and versatile way of rearranging commits is using interactive rebase. In standard usage, it opens a list of commits since the specified commit and lets you rework those commits; this includes rearranging/amending/editing/merging/dropping commits.
101# GIT_SEQUENCE_EDITOR=cat git rebase --interactive c8d9b9c
Successfully rebased and updated refs/heads/master. pick 40a1d20 commit_b pick b43bd9c commit_a # Rebase c8d9b9c..b43bd9c onto c8d9b9c (2 commands) # # Commands: # p, pick <commit> = use commit # r, reword <commit> = use commit, but edit the commit message # e, edit <commit> = use commit, but stop for amending # s, squash <commit> = use commit, but meld into previous commit # f, fixup [-C | -c] <commit> = like "squash" but keep only the previous # commit's log message, unless -C is used, in which case # keep only this commit's message; -c is same as -C but # opens the editor # x, exec <command> = run command (the rest of the line) using shell # b, break = stop here (continue rebase later with 'git rebase --continue') # d, drop <commit> = remove commit # l, label <label> = label current HEAD with a name # t, reset <label> = reset HEAD to a label # m, merge [-C <commit> | -c <commit>] <label> [# <oneline>] # create a merge commit using the original merge commit's # message (or the oneline, if no original merge commit was # specified); use -c <commit> to reword the commit message # u, update-ref <ref> = track a placeholder for the <ref> to be updated # to this position in the new commits. The <ref> is # updated at the end of the rebase # # These lines can be re-ordered; they are executed from top to bottom. # # If you remove a line here THAT COMMIT WILL BE LOST. # # However, if you remove everything, the rebase will be aborted. #
(note the GIT_SEQUENCE_EDITOR=cat thing here is just to make the
command non-interactive for the sake of this presentation)
Git is nice and displays a rather comprehensive help along with
the commit list as well.
So, for rearranging commits in the style we did above using
reset plus cherry-pick, we can just edit that list as well.
102# git reset --hard 2d78b2
HEAD is now at 2d78b2c commit_b
...first, reset back to the "commit A first, then commit B" version...
103# GIT_SEQUENCE_EDITOR="../reverse_file" git rebase --interactive c8d9b9c && git log --pretty=oneline
Rebasing (1/2) Rebasing (2/2) Successfully rebased and updated refs/heads/master. b43bd9c7eec7d1d2bb7ac3a3641ff408bace8f5d commit_a 40a1d20f7a96609e8767ef9c8da9a29d2244fb88 commit_b c8d9b9c01eea11fb1032903b0dd2bea3eeb46f48 first commit
And behold, it's the same result as with the cherry picks:
Now commit B comes first, and commit A is second.
Note conflicts occurring during rebase may need some concentration
to resolve. If you want to do something complex, consider issuing
multiple rebase --interactive commands, rearranging and squashing
commits in different runs. Take a look at git status often. Remember
there's always git rebase --abort.
In general, a good practice is to do a rebase --interactive on your
PR branches just before merging them in order to clean up the
branch (not needed if you squash the PR branch commits in the merge
anyways).
104# git reset --hard 2d78b2
HEAD is now at 2d78b2c commit_b
...again, reset back to "commit A first, then commit B".
105# echo "Fixed commit A" > commit_a && git commit -m "fixup! commit_a" commit_a
[master a6a7ad1] fixup! commit_a 1 file changed, 1 insertion(+), 1 deletion(-)
A quick look at a nice goodie built into rebase --interactive: When using the syntax "fixup! (some previous commit message)" as a commit message, that commit will be squashed into the referenced previous commit on a rebase --interactive --autosquash.
106# GIT_SEQUENCE_EDITOR=cat git rebase --interactive c8d9b9c --autosquash
Rebasing (2/3) Rebasing (3/3) Successfully rebased and updated refs/heads/master. pick 21d78fc commit_a fixup a6a7ad1 fixup! commit_a pick 2d78b2c commit_b # Rebase c8d9b9c..a6a7ad1 onto c8d9b9c (3 commands) # # Commands: # p, pick <commit> = use commit # r, reword <commit> = use commit, but edit the commit message # e, edit <commit> = use commit, but stop for amending # s, squash <commit> = use commit, but meld into previous commit # f, fixup [-C | -c] <commit> = like "squash" but keep only the previous # commit's log message, unless -C is used, in which case # keep only this commit's message; -c is same as -C but # opens the editor # x, exec <command> = run command (the rest of the line) using shell # b, break = stop here (continue rebase later with 'git rebase --continue') # d, drop <commit> = remove commit # l, label <label> = label current HEAD with a name # t, reset <label> = reset HEAD to a label # m, merge [-C <commit> | -c <commit>] <label> [# <oneline>] # create a merge commit using the original merge commit's # message (or the oneline, if no original merge commit was # specified); use -c <commit> to reword the commit message # u, update-ref <ref> = track a placeholder for the <ref> to be updated # to this position in the new commits. The <ref> is # updated at the end of the rebase # # These lines can be re-ordered; they are executed from top to bottom. # # If you remove a line here THAT COMMIT WILL BE LOST. # # However, if you remove everything, the rebase will be aborted. #
Nice: The fixup commit has been moved immediately after the commit
it references, and the action has been changed to "fixup" as well.
Let's have a look at the history...
107# git log --pretty=oneline
db259018e09184c90d2996a0df7c2c1f7805d827 commit_b c401edd2ebc00980cb8ee1298777909de1733793 commit_a c8d9b9c01eea11fb1032903b0dd2bea3eeb46f48 first commit
The fixup commit is gone and has been melded into the original commit.
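Instead of typing the "fixup! ..." message by hand, git commit --fixup=COMMIT generates it from the subject of the referenced commit. The following sketch replays the steps above in a throwaway repository; using "true" as the sequence editor is an assumption made here to accept the autosquashed list unchanged.

```shell
set -e
work=$(mktemp -d)
git init -q --initial-branch=master "$work/repo"
cd "$work/repo"
git config user.name "Ijon Tichy"
git config user.email "ijon@beteigeuze.space"
echo "first" > README && git add README && git commit -q -m "first commit"
echo "Commit A" > commit_a && git add commit_a && git commit -q -m "commit_a"
target=$(git rev-parse HEAD)
echo "Commit B" > commit_b && git add commit_b && git commit -q -m "commit_b"

# --fixup writes the "fixup! <subject of target>" message for us
echo "Fixed commit A" > commit_a
git add commit_a
git commit -q --fixup="$target"
fixup_subject=$(git log -1 --pretty=%s)

# 'true' as sequence editor accepts the autosquashed todo list as-is
GIT_SEQUENCE_EDITOR=true git rebase -q --interactive --autosquash HEAD~3
commit_count=$(git rev-list --count HEAD)
git log --pretty=%s
```

After the rebase, three commits remain, and commit_a already contains the fixed content.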
Let's reset to a cleaner state for the next steps.
108# git reset --hard 2d78b2c
HEAD is now at 2d78b2c commit_b
Often, you will want to work on several branches that will never
be identical, e.g., a development and a production branch that
will diverge with regard to configuration and some code (debugging, etc.).
You don't want production-only commits winding up in development;
you don't want development-only commits getting merged into production.
How can you do that?
109# git branch production && git checkout production && echo "Production config" > production.conf && git add production.conf && git commit -m "production config"
Switched to branch 'production' [production caae1d1] production config 1 file changed, 1 insertion(+) create mode 100644 production.conf
110# git checkout master && echo "Development config" > development.conf && git add development.conf && git commit -m "development config"
Switched to branch 'master' Your branch is up to date with 'playground/master'. [master 0f12049] development config 1 file changed, 1 insertion(+) create mode 100644 development.conf
Ok great, so let's do some development in the master branch.
111# echo -e "#!/bin/bash\necho 'Hello world'" > hello_world.sh && chmod a+x hello_world.sh && git add hello_world.sh && git commit -m "add hello_world.sh"
[master 0a5cb55] add hello_world.sh 1 file changed, 2 insertions(+) create mode 100755 hello_world.sh
Now, let's merge that into production.
112# git checkout production && git merge --no-edit master
Switched to branch 'production' Merge made by the 'ort' strategy. development.conf | 1 + hello_world.sh | 2 ++ 2 files changed, 3 insertions(+) create mode 100644 development.conf create mode 100755 hello_world.sh
That's not great. The development config was merged into production as well.
We'll have to undo that.
But while we're at it, what does a merge commit look like?
113# git log -1
commit 655b1cfd043d03966c5efcd5862535e7397edc35
Merge: caae1d1 0a5cb55
Author: Ijon Tichy <ijon@beteigeuze.space>
Date:   Sat Jan 1 12:00:00 2000 +0000
    Merge branch 'master' into production
Ok, and what does a merge commit look like internally?
114# git cat-file -p 655b1cfd043d03966c5efcd5862535e7397edc35
tree bf9910b30216b4aca40c9f6c253f6e1880529399 parent caae1d1f565d8d0f370d7db95af481e88b72f253 parent 0a5cb5530949f158d2e02f0ca8d6755bf90cce27 author Ijon Tichy <ijon@beteigeuze.space> 946728000 +0000 committer Ijon Tichy <ijon@beteigeuze.space> 946728000 +0000 Merge branch 'master' into production
A merge commit has several parent commits instead of one
parent.
Note that in the low level commit view this is nothing
special at all - it does not spell out "merge"
anywhere, and any commit might have 100 parent commits
just as well as one or two parents, and yes,
git merge actually supports merging more than one
branch at once.
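Also, note that no parent is flagged as the "main" one.
The parents are simply listed in order (the first one is the
commit we merged into, which options such as
git log --first-parent rely on). All the parent entries say
is that the content of those parent commits is "taken care of"
in this commit's tree, and that's just the same for
commits that have only one parent.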
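The parent entries of a merge commit can be addressed individually with the ^1 and ^2 suffixes. A minimal sketch in a scratch repository (the branch name topic and the file names are invented for the example):

```shell
set -e
work=$(mktemp -d)
git init -q --initial-branch=master "$work/repo"
cd "$work/repo"
git config user.name "Ijon Tichy"
git config user.email "ijon@beteigeuze.space"
echo "base" > README && git add README && git commit -q -m "first commit"

git checkout -q -b topic
echo "topic" > topic.txt && git add topic.txt && git commit -q -m "topic work"
topic_tip=$(git rev-parse HEAD)

git checkout -q master
echo "main" > main.txt && git add main.txt && git commit -q -m "master work"
master_tip=$(git rev-parse HEAD)

# merge topic into master; the branches diverged, so a merge commit is created
git merge -q --no-edit topic

first_parent=$(git rev-parse 'HEAD^1')    # the commit we were on
second_parent=$(git rev-parse 'HEAD^2')   # the commit we merged in
git cat-file -p HEAD | grep '^parent '
```

The order of the two parent lines reflects the direction of the merge; apart from that order, the commit object treats both parents alike.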
Anyways, we didn't want to merge development config.
Let's quickly undo that.
115# git reset --hard caae1d1 && git graph
HEAD is now at caae1d1 production config * 0a5cb55 - (master) add hello_world.sh (26 years ago) <Ijon Tichy> | * caae1d1 - (HEAD -> production) production config (26 years ago) <Ijon Tichy> * | 0f12049 - development config (26 years ago) <Ijon Tichy> |/ * 2d78b2c - (playground/master, playground/HEAD) commit_b (26 years ago) <Ijon Tichy> * 21d78fc - commit_a (26 years ago) <Ijon Tichy> * c8d9b9c - first commit (26 years ago) <Ijon Tichy>
We have to tell Git to ignore the commit that added the development
config when merging.
This can be done by changing the "merge strategy".
The default merge strategy is "ort" (as the merge output above
shows; it replaced "recursive" as the default in Git 2.34), which
does merges as we all know them.
There are other strategies as well, including the "ours"
strategy, which actually ignores the things it is told
to merge. That means it essentially marks things as
merged (on commit/Git history level) when they are not
(on file level). Great! That's what we want.
116# git merge -s ours -m 'fake merge: ignore dev config' 0f12049 && ls -1
Merge made by the 'ours' strategy. commit_a commit_b production.conf README
Looking good. Now merge the rest of the dev branch.
117# git merge --no-edit master && git graph && echo -e "\n---" && ls -1
Merge made by the 'ort' strategy. hello_world.sh | 2 ++ 1 file changed, 2 insertions(+) create mode 100755 hello_world.sh * f695c37 - (HEAD -> production) Merge branch 'master' into production (26 years ago) <Ijon Tichy> |\ * \ ef4c8ba - fake merge: ignore dev config (26 years ago) <Ijon Tichy> |\ \ | | * 0a5cb55 - (master) add hello_world.sh (26 years ago) <Ijon Tichy> | |/ * | caae1d1 - production config (26 years ago) <Ijon Tichy> | * 0f12049 - development config (26 years ago) <Ijon Tichy> |/ * 2d78b2c - (playground/master, playground/HEAD) commit_b (26 years ago) <Ijon Tichy> * 21d78fc - commit_a (26 years ago) <Ijon Tichy> * c8d9b9c - first commit (26 years ago) <Ijon Tichy> --- commit_a commit_b hello_world.sh production.conf README
That worked! We don't have the dev config, but we do
have the hello world file introduced in the dev branch.
By the way, the same outcome can be reached by doing all this
manually using git commit-tree and giving the commit we want
to "fake merge" as its parent. We leave this as an exercise to the
reader.
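For readers who want to compare notes, here is one possible sketch of that exercise, not necessarily the way the authors had in mind. It rebuilds a small version of the production/master situation in a scratch repository and creates the "fake merge" with git commit-tree by reusing the current tree and listing the unwanted commit as a second parent.

```shell
set -e
work=$(mktemp -d)
git init -q --initial-branch=master "$work/repo"
cd "$work/repo"
git config user.name "Ijon Tichy"
git config user.email "ijon@beteigeuze.space"
echo "base" > README && git add README && git commit -q -m "first commit"

git checkout -q -b production
echo "Production config" > production.conf
git add production.conf && git commit -q -m "production config"

git checkout -q master
echo "Development config" > development.conf
git add development.conf && git commit -q -m "development config"
dev_config=$(git rev-parse HEAD)
echo "echo 'Hello world'" > hello_world.sh
git add hello_world.sh && git commit -q -m "add hello_world.sh"

git checkout -q production
# same tree as before, but dev_config is recorded as an additional parent,
# which marks it as merged without taking its content
fake=$(git commit-tree 'HEAD^{tree}' -p HEAD -p "$dev_config" \
       -m "fake merge: ignore dev config")
git reset -q --hard "$fake"

# the real merge now only brings in what came after dev_config
git merge -q --no-edit master
ls
```

As with the "ours" strategy, production ends up with hello_world.sh but without development.conf.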
Now, a quick look at some things worth knowing.
118# for i in {100..200}; do echo "config_$i=false" >> production.conf; done && git commit -m "some more conf" production.conf
[production d0bb103] some more conf 1 file changed, 101 insertions(+)
We add some more lines to the production config.
119# sed -i -E 's/config_(..)0=false/config_\10=true/' production.conf && tail -v --l 20 production.conf
==> production.conf <== config_181=false config_182=false config_183=false config_184=false config_185=false config_186=false config_187=false config_188=false config_189=false config_190=true config_191=false config_192=false config_193=false config_194=false config_195=false config_196=false config_197=false config_198=false config_199=false config_200=true
...then we change some lines in that config.
For quickly reviewing and staging changes, there's the
"--patch" (-p) option available for git add and commit:
120# yes | git add -p production.conf
diff --git a/production.conf b/production.conf index cd1dd95..8654cee 100644 --- a/production.conf +++ b/production.conf @@ -1,5 +1,5 @@ Production config -config_100=false +config_100=true config_101=false config_102=false config_103=false (1/11) Stage this hunk [y,n,q,a,d,j,J,g,/,e,p,?]? @@ -9,7 +9,7 @@ config_106=false config_107=false config_108=false config_109=false -config_110=false +config_110=true config_111=false config_112=false config_113=false (2/11) Stage this hunk [y,n,q,a,d,K,j,J,g,/,e,p,?]? @@ -19,7 +19,7 @@ config_116=false config_117=false config_118=false config_119=false -config_120=false +config_120=true config_121=false config_122=false config_123=false (3/11) Stage this hunk [y,n,q,a,d,K,j,J,g,/,e,p,?]? @@ -29,7 +29,7 @@ config_126=false config_127=false config_128=false config_129=false -config_130=false +config_130=true config_131=false config_132=false config_133=false (4/11) Stage this hunk [y,n,q,a,d,K,j,J,g,/,e,p,?]? @@ -39,7 +39,7 @@ config_136=false config_137=false config_138=false config_139=false -config_140=false +config_140=true config_141=false config_142=false config_143=false (5/11) Stage this hunk [y,n,q,a,d,K,j,J,g,/,e,p,?]? @@ -49,7 +49,7 @@ config_146=false config_147=false config_148=false config_149=false -config_150=false +config_150=true config_151=false config_152=false config_153=false (6/11) Stage this hunk [y,n,q,a,d,K,j,J,g,/,e,p,?]? @@ -59,7 +59,7 @@ config_156=false config_157=false config_158=false config_159=false -config_160=false +config_160=true config_161=false config_162=false config_163=false (7/11) Stage this hunk [y,n,q,a,d,K,j,J,g,/,e,p,?]? @@ -69,7 +69,7 @@ config_166=false config_167=false config_168=false config_169=false -config_170=false +config_170=true config_171=false config_172=false config_173=false (8/11) Stage this hunk [y,n,q,a,d,K,j,J,g,/,e,p,?]? 
@@ -79,7 +79,7 @@ config_176=false config_177=false config_178=false config_179=false -config_180=false +config_180=true config_181=false config_182=false config_183=false (9/11) Stage this hunk [y,n,q,a,d,K,j,J,g,/,e,p,?]? @@ -89,7 +89,7 @@ config_186=false config_187=false config_188=false config_189=false -config_190=false +config_190=true config_191=false config_192=false config_193=false (10/11) Stage this hunk [y,n,q,a,d,K,j,J,g,/,e,p,?]? @@ -99,4 +99,4 @@ config_196=false config_197=false config_198=false config_199=false -config_200=false +config_200=true (11/11) Stage this hunk [y,n,q,a,d,K,g,/,e,p,?]?
Normally this happens interactively; here, the "yes" tool answers "y" to every prompt. git checkout and reset support -p, too, so for partially unstaging a file we can use reset HEAD -p:
121# yes | git reset HEAD -p production.conf
diff --git a/production.conf b/production.conf index cd1dd95..8654cee 100644 --- a/production.conf +++ b/production.conf @@ -1,5 +1,5 @@ Production config -config_100=false +config_100=true config_101=false config_102=false config_103=false (1/11) Unstage this hunk [y,n,q,a,d,j,J,g,/,e,p,?]? @@ -9,7 +9,7 @@ config_106=false config_107=false config_108=false config_109=false -config_110=false +config_110=true config_111=false config_112=false config_113=false (2/11) Unstage this hunk [y,n,q,a,d,K,j,J,g,/,e,p,?]? @@ -19,7 +19,7 @@ config_116=false config_117=false config_118=false config_119=false -config_120=false +config_120=true config_121=false config_122=false config_123=false (3/11) Unstage this hunk [y,n,q,a,d,K,j,J,g,/,e,p,?]? @@ -29,7 +29,7 @@ config_126=false config_127=false config_128=false config_129=false -config_130=false +config_130=true config_131=false config_132=false config_133=false (4/11) Unstage this hunk [y,n,q,a,d,K,j,J,g,/,e,p,?]? @@ -39,7 +39,7 @@ config_136=false config_137=false config_138=false config_139=false -config_140=false +config_140=true config_141=false config_142=false config_143=false (5/11) Unstage this hunk [y,n,q,a,d,K,j,J,g,/,e,p,?]? @@ -49,7 +49,7 @@ config_146=false config_147=false config_148=false config_149=false -config_150=false +config_150=true config_151=false config_152=false config_153=false (6/11) Unstage this hunk [y,n,q,a,d,K,j,J,g,/,e,p,?]? @@ -59,7 +59,7 @@ config_156=false config_157=false config_158=false config_159=false -config_160=false +config_160=true config_161=false config_162=false config_163=false (7/11) Unstage this hunk [y,n,q,a,d,K,j,J,g,/,e,p,?]? @@ -69,7 +69,7 @@ config_166=false config_167=false config_168=false config_169=false -config_170=false +config_170=true config_171=false config_172=false config_173=false (8/11) Unstage this hunk [y,n,q,a,d,K,j,J,g,/,e,p,?]? 
@@ -79,7 +79,7 @@ config_176=false config_177=false config_178=false config_179=false -config_180=false +config_180=true config_181=false config_182=false config_183=false (9/11) Unstage this hunk [y,n,q,a,d,K,j,J,g,/,e,p,?]? @@ -89,7 +89,7 @@ config_186=false config_187=false config_188=false config_189=false -config_190=false +config_190=true config_191=false config_192=false config_193=false (10/11) Unstage this hunk [y,n,q,a,d,K,j,J,g,/,e,p,?]? @@ -99,4 +99,4 @@ config_196=false config_197=false config_198=false config_199=false -config_200=false +config_200=true (11/11) Unstage this hunk [y,n,q,a,d,K,g,/,e,p,?]?
...let's check...
122# git status
On branch production Changes not staged for commit: (use "git add <file>..." to update what will be committed) (use "git restore <file>..." to discard changes in working directory) modified: production.conf no changes added to commit (use "git add" and/or "git commit -a")
Correct.
Let's get rid of the changes for the next part about git bisect.
123# git checkout production.conf
Updated 1 path from the index
If your project has a bug that you know wasn't there a year ago, but there are about 1000 commits to check, git bisect is there to help you. It runs a binary search on the commits, finding the commit that introduced the bug very quickly, and it can do that in an automated way.
124# git bisect start
status: waiting for both good and bad commits
...to start the process. Then, you have to mark a known bad and a known good commit.
125# git bisect bad && git bisect good caae1d1
status: waiting for good commit(s), bad commit known Bisecting: 2 revisions left to test after this (roughly 1 step) [0a5cb5530949f158d2e02f0ca8d6755bf90cce27] add hello_world.sh
Git now tells you how many revisions are left for testing, and how many steps this will take. Test, then mark, as appropriate.
126# git bisect bad
Bisecting: 0 revisions left to test after this (roughly 0 steps) [0f12049d595ccbd4a5f20e4a84d94118eef2465d] development config
etc. etc. - if you cannot test the current commit, you can skip:
127# git bisect skip
There are only 'skip'ped commits left to test. The first bad commit could be any of: 0f12049d595ccbd4a5f20e4a84d94118eef2465d 0a5cb5530949f158d2e02f0ca8d6755bf90cce27 We cannot bisect more!
...of course bisect might not be able to tell the exact commit
that broke things if it doesn't have complete information.
To end the bisect session once you are done, reset:
128# git bisect reset
Previous HEAD position was 0f12049 development config Switched to branch 'production'
If you have tests ready that can just be run from command line,
git bisect run SCRIPT is your friend.
Note that instead of "bad" and "good", other terms can be
used (for example, "new" and "old").
For more information on git bisect, see
https://git-scm.com/docs/git-bisect
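The git bisect run mode mentioned above can be sketched as follows. Everything here is a made-up scenario: a scratch repository with ten commits, a "bug" (the word BROKEN in status.txt) that sneaks in with commit 7, and a one-line test command that exits non-zero when the bug is present.

```shell
set -e
work=$(mktemp -d)
git init -q --initial-branch=master "$work/repo"
cd "$work/repo"
git config user.name "Ijon Tichy"
git config user.email "ijon@beteigeuze.space"

# ten commits; the "bug" is introduced by commit 7 and stays in the file
for i in 1 2 3 4 5 6 7 8 9 10; do
  if [ "$i" -eq 7 ]; then
    echo "BROKEN" >> status.txt
  else
    echo "ok $i" >> status.txt
  fi
  git add status.txt
  git commit -q -m "commit $i"
done
expected=$(git rev-parse HEAD~3)   # commit 7

# bad = HEAD (commit 10), good = HEAD~9 (commit 1)
git bisect start HEAD HEAD~9 >/dev/null

# exit status 0 marks a commit as good, 1 marks it as bad
git bisect run sh -c '! grep -q BROKEN status.txt' >/dev/null

found=$(git rev-parse refs/bisect/bad)   # first bad commit
git bisect reset >/dev/null 2>&1
git log -1 --pretty='first bad commit: %s' "$found"
```

In a real project the one-liner would be replaced by the test suite or a script wrapping it; an exit status of 125 tells bisect to skip a commit that cannot be tested.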
Another goodie: If you frequently use long lived topic branches,
you probably struggle with recurring merge conflicts.
git rerere can help you with that.
rerere means "Reuse recorded resolution of conflicted merges". Basically, rerere keeps a database of conflict resolutions and applies those resolutions if it sees the exact conflict again in any merge or rebase. Let's reset the current branch (production) to the commit with that nice long configuration file, and create a new topic branch.
129# git remote rm playground && git reset --hard d0bb103 && git branch topic && git checkout topic && tail production.conf
Switched to branch 'topic' HEAD is now at d0bb103 some more conf config_191=false config_192=false config_193=false config_194=false config_195=false config_196=false config_197=false config_198=false config_199=false config_200=false
(of course, never branch off production for a topic branch
in reality...)
Ok! Now we change some bits in the topic branch.
130# sed -i -E 's/config_(..)0=false/config_\10=true/' production.conf && tail -v --l 20 production.conf && git commit -m "changed config" production.conf
==> production.conf <== config_181=false config_182=false config_183=false config_184=false config_185=false config_186=false config_187=false config_188=false config_189=false config_190=true config_191=false config_192=false config_193=false config_194=false config_195=false config_196=false config_197=false config_198=false config_199=false config_200=true [topic ddb58ff] changed config 1 file changed, 11 insertions(+), 11 deletions(-)
Say that in the production branch some unrelated fix is made.
131# git checkout production && sed -i 's/config_100=false/config_100=file_not_found/' production.conf && git commit -m "fix config_100" production.conf
Switched to branch 'production' [production 1fc94bb] fix config_100 1 file changed, 1 insertion(+), 1 deletion(-)
Say we keep developing in the topic branch.
132# git checkout topic
Switched to branch 'topic'
At some point, we want to check if merging with the main branch still works, so we do a "test merge" (that, once it's done, we'll roll back, since we don't really want that merge in our topic branch).
133# git merge production
Auto-merging production.conf CONFLICT (content): Merge conflict in production.conf Automatic merge failed; fix conflicts and then commit the result.
This results in a merge conflict.
134# git diff
diff --cc production.conf index 8654cee,593d505..0000000 --- a/production.conf +++ b/production.conf @@@ -1,5 -1,5 +1,9 @@@ Production config ++<<<<<<< HEAD +config_100=true ++======= + config_100=file_not_found ++>>>>>>> production config_101=false config_102=false config_103=false
We could fix it and move on, but since in this development
model we'd be doing that merge again later, we'd run into
the same conflict again.
This is where rerere comes into play. We have to enable it first.
135# git config --local rerere.enabled true
You might want to use --global instead of --local on your machine. Now, we roll back and trigger the merge again.
136# git merge --abort && git merge production
Recorded preimage for 'production.conf' Auto-merging production.conf CONFLICT (content): Merge conflict in production.conf Automatic merge failed; fix conflicts and then commit the result.
There's the conflict again, but note the "Recorded preimage" line;
that one comes from rerere.
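To see what rerere has noted at this point, you can ask it directly. The sketch below is not part of the tutorial script: it recreates a comparable conflict in a throwaway repository (the two-line production.conf and the variable names are assumptions for illustration), then uses git rerere status and a look into .git/rr-cache, which is where rerere keeps its records.

```shell
#!/bin/sh
# Sketch only: a throwaway repo whose branch names mirror the tutorial.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/production
git config --local user.name "Ijon Tichy"
git config --local user.email "ijon@beteigeuze.space"
git config --local rerere.enabled true

# Base commit on 'production'.
printf 'Production config\nconfig_100=false\n' > production.conf
git add production.conf
git commit -q -m "production config"

# Both branches change the same line in different ways.
git checkout -q -b topic
printf 'Production config\nconfig_100=true\n' > production.conf
git commit -q -m "changed config" production.conf
git checkout -q production
printf 'Production config\nconfig_100=file_not_found\n' > production.conf
git commit -q -m "fix config_100" production.conf

# The merge stops with a conflict; rerere records the preimage.
git checkout -q topic
git merge production >/dev/null 2>&1 || true

# Paths whose resolution rerere is going to record.
rerere_status=$(git rerere status)
# Recorded conflict preimages on disk.
preimage_count=$(find .git/rr-cache -type f -name preimage | wc -l | tr -d ' ')

echo "paths rerere is tracking: $rerere_status"
echo "preimage files in .git/rr-cache: $preimage_count"
```

The preimage is the conflicted file as rerere first saw it. Once the merge is committed, a matching postimage is stored next to it, and that pair is what gets replayed later.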
Let's fix that conflict now.
137# git checkout topic production.conf && sed -i 's/config_100=true/config_100=file_not_found/' production.conf
Updated 1 path from cd06154
rerere can tell us about the current state of the resolution:
138# git rerere diff
--- a/production.conf +++ b/production.conf @@ -1,9 +1,5 @@ Production config -<<<<<<< config_100=file_not_found -======= -config_100=true ->>>>>>> config_101=false config_102=false config_103=false
Let's finalize and commit the merge.
139# git add production.conf && git commit --no-edit
Recorded resolution for 'production.conf'. [topic 47898f3] Merge branch 'production' into topic
Note the conflict resolution has been recorded by rerere.
If we roll back, then do the merge again, the conflict will get
resolved by rerere without further manual intervention.
140# git reset --hard ddb58ff && git merge --no-edit production
Resolved 'production.conf' using previous resolution. HEAD is now at ddb58ff changed config Auto-merging production.conf CONFLICT (content): Merge conflict in production.conf Automatic merge failed; fix conflicts and then commit the result.
The merge still stops and reports a conflict, but the content of the
file is already resolved, so all that is left is to add and commit it.
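If the remaining git add bothers you, Git has a setting for that: with rerere.autoupdate enabled, a file resolved by rerere is also staged. The following is a minimal sketch in a throwaway repository and not part of the tutorial script (the two-line production.conf and the variable names are assumptions for illustration). Note that the merge still stops, so the final commit remains a deliberate step.

```shell
#!/bin/sh
# Sketch only: a throwaway repo whose branch names mirror the tutorial.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/production
git config --local user.name "Ijon Tichy"
git config --local user.email "ijon@beteigeuze.space"
git config --local rerere.enabled true

# Base commit, then conflicting changes on 'topic' and 'production'.
printf 'Production config\nconfig_100=false\n' > production.conf
git add production.conf
git commit -q -m "production config"
git checkout -q -b topic
printf 'Production config\nconfig_100=true\n' > production.conf
git commit -q -m "changed config" production.conf
git checkout -q production
printf 'Production config\nconfig_100=file_not_found\n' > production.conf
git commit -q -m "fix config_100" production.conf
git checkout -q topic

# First merge: resolve by hand once, so rerere records the resolution.
git merge production >/dev/null 2>&1 || true
printf 'Production config\nconfig_100=file_not_found\n' > production.conf
git add production.conf
git commit -q --no-edit 2>/dev/null
git reset -q --hard HEAD~1

# Second merge: rerere resolves the file and, with autoupdate, stages it.
git config --local rerere.autoupdate true
git merge --no-edit production >/dev/null 2>&1 || true
unmerged=$(git diff --name-only --diff-filter=U)

# No "git add" needed this time; just conclude the merge.
git commit -q --no-edit
parents=$(git rev-list --parents -n 1 HEAD | wc -w | tr -d ' ')
resolved_line=$(sed -n '2p' production.conf)

echo "unmerged paths after the second merge: '$unmerged'"
echo "line 2 after the merge: $resolved_line"
```

Whether you want this is a matter of taste: without autoupdate, the extra git add is a natural moment to check that the replayed resolution is still the right one.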
Conflict resolutions will be used in rebase, too.
141# git reset --hard ddb58ff && git rebase production
Rebasing (1/1) error: could not apply ddb58ff... changed config hint: Resolve all conflicts manually, mark them as resolved with hint: "git add/rm", then run "git rebase --continue". hint: You can instead skip this commit: run "git rebase --skip". hint: To abort and get back to the state before "git rebase", run "git rebase --abort". hint: Disable this message with "git config set advice.mergeConflict false" Resolved 'production.conf' using previous resolution. Could not apply ddb58ff... changed config HEAD is now at ddb58ff changed config Auto-merging production.conf CONFLICT (content): Merge conflict in production.conf 
This looks bad, but DON'T PANIC.
142# git status
interactive rebase in progress; onto 1fc94bb Last command done (1 command done): pick ddb58ff changed config No commands remaining. You are currently rebasing branch 'topic' on '1fc94bb'. (fix conflicts and then run "git rebase --continue") (use "git rebase --skip" to skip this patch) (use "git rebase --abort" to check out the original branch) Unmerged paths: (use "git restore --staged <file>..." to unstage) (use "git add <file>..." to mark resolution) both modified: production.conf no changes added to commit (use "git add" and/or "git commit -a")
This looks fine, doesn't it?
143# git diff
diff --cc production.conf index 593d505,8654cee..0000000 --- a/production.conf +++ b/production.conf
...and this looks even better, so just add and continue the rebase.
144# git add production.conf && GIT_EDITOR=true git rebase --continue
Successfully rebased and updated refs/heads/topic. [detached HEAD 02b9030] changed config 1 file changed, 10 insertions(+), 10 deletions(-)
Let's look at the diff to production.
145# git diff production | head --l 20
diff --git a/production.conf b/production.conf index 593d505..34f7a09 100644 --- a/production.conf +++ b/production.conf @@ -9,7 +9,7 @@ config_106=false config_107=false config_108=false config_109=false -config_110=false +config_110=true config_111=false config_112=false config_113=false @@ -19,7 +19,7 @@ config_116=false config_117=false config_118=false config_119=false -config_120=false +config_120=true config_121=false
Such nice diff! Note there's no trace of the conflict.
146# git log --pretty=oneline
02b9030adf3a837f19ad9b635a8c9165ee993c8a changed config 1fc94bb6c197b00a594f9c9996957ab838867d87 fix config_100 d0bb103972bd7407175de2ac1a25f9b9b7fea24b some more conf f695c37d7477387dfbb84e6d3f05e6aa9bfe3b26 Merge branch 'master' into production ef4c8baa9b59c8d50b2bbb1429a613be81e444c0 fake merge: ignore dev config 0a5cb5530949f158d2e02f0ca8d6755bf90cce27 add hello_world.sh caae1d1f565d8d0f370d7db95af481e88b72f253 production config 0f12049d595ccbd4a5f20e4a84d94118eef2465d development config 2d78b2cfec3aa09041ccf4772453003692fa69ec commit_b 21d78fce05d66a5c99b56dc77a511d1bc28706e1 commit_a c8d9b9c01eea11fb1032903b0dd2bea3eeb46f48 first commit
...and such nice history.
For more information on git rerere, see
https://git-scm.com/docs/git-rerere
https://git-scm.com/book/en/v2/Git-Tools-Rerere
147# echo Thanks go to...
Thanks go to...
Pro Git book https://git-scm.com/book/en/v2
Git plumbing https://medium.com/@shalithasuranga/how-does-git-work-internally-7c36dcb1f2cf
Fellow B&Bers for input