RNG

thoughts on software development and everything else

Retroactive git history

2021-08-27

While backing up important stuff from my laptop in preparation for a fresh OS install, I came across a folder full of Go solutions I’d written for some HackerRank puzzles years ago. I thought it would be worthwhile turning it into a git repo, not only as an easily accessible backup but also as a simple portfolio to show I can do some stuff in Go.

But I wanted to have an accurate history: rather than pretending that all the code was written simultaneously on 2021-08-26, I wanted to have the real dates that I wrote each solution.

So what to do? Off to Google and Stack Overflow to find out how to do such a thing.

Note: if you’re planning to do this yourself, don’t touch anything in the folder before you have recorded the dates. This approach relies on the “last modified date” of files and folders in Linux, and those will change if you alter the files or folders in any way.

Getting the last modified date for files

There are a few ways to get the access/modify/change dates of files in Linux:

  • stat
  • date -r
  • ls -l
  • find
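As a rough illustration, each of these tools can print the modification time of a single file. This is only a sketch and assumes the GNU versions of the tools; example.txt is a hypothetical file created in a temporary directory for the demonstration.

```shell
#!/bin/bash
# Sketch: four ways to read a file's modification time (GNU tools assumed).
demo_dir=$(mktemp -d)
demo_file="$demo_dir/example.txt"             # hypothetical file for the demo
touch -d "2020-01-01 01:02:03" "$demo_file"   # give it a known modification time

stat -c '%y' "$demo_file"                     # full modification timestamp
ls -l --time-style=long-iso "$demo_file"      # timestamp as part of the listing

# date -r and find -printf both let you choose the output format.
mtime_from_date=$(date -r "$demo_file" '+%Y-%m-%dT%H:%M:%S')
mtime_from_find=$(find "$demo_file" -printf '%TY-%Tm-%TdT%TH:%TM:%.2TS\n')
echo "$mtime_from_date"
echo "$mtime_from_find"
```

Only find walks a whole directory tree in one command, which is why the rest of this post uses it.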

find is handy because it searches recursively with a single command. According to man find, the -printf flag takes a format string that can include timestamps. %Tk gives the last modification time, in a format specified by the value of k.

%TY-%Tm-%TdT%TH:%TM:%.2TS gives us the ISO 8601 standard 2020-01-01T01:02:03, which is one of the formats git accepts.

%p gives us the path of the file.

The following will give us timestamps and filepaths for every file in the directory, recursively, sorted by date:

find . -type f -printf "%TY-%Tm-%Td %TH:%TM:%.2TS %p\n" | sort -n

[Image: all files with their timestamps]

But I don’t want to have a single commit for every file. It’s easier to have a commit for each folder, under the assumption that 1 folder = 1 puzzle. But how do I find only the leaf folders in this directory?

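Thanks to this Stack Overflow answer I now know you can use the -links flag. On filesystems that follow the traditional Unix convention, a directory’s link count is made up of

  • its entry in its parent directory
  • its own . entry
  • the .. entry of each subdirectory

Therefore, if a directory has exactly 2 links, it has no subdirectories: it is a bottom-level or leaf directory. (Some filesystems, Btrfs for example, do not keep directory link counts this way, so check the output before relying on it.)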

find . -type d -links 2 -printf "%TY-%Tm-%TdT%TH:%TM:%.2TS %p\n" | sort -n
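The link-count trick depends on how the filesystem counts directory links. If it does not work on your system, one alternative is to test each directory directly for subdirectories. The following is a minimal sketch of that idea, not what I used for this repo; list_leaf_dirs is a hypothetical helper name and GNU find is assumed.

```shell
#!/bin/bash
# Sketch: list leaf directories without relying on link counts.
# list_leaf_dirs is a hypothetical helper name, not part of find or git.
list_leaf_dirs() {
  local root=$1 dir
  find "$root" -type d -print0 | while IFS= read -r -d '' dir; do
    # A leaf directory has no subdirectories directly inside it.
    if [ -z "$(find "$dir" -mindepth 1 -maxdepth 1 -type d -print -quit)" ]; then
      # Print the directory's own timestamp and path, same format as above.
      find "$dir" -maxdepth 0 -printf '%TY-%Tm-%TdT%TH:%TM:%.2TS %p\n'
    fi
  done | sort
}
```

Calling list_leaf_dirs . from the top of the folder should give the same kind of "timestamp path" lines as the -links 2 command. It runs one extra find per directory, so it is slower on large trees.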

Note: Put this output in a file somewhere and save it before you start messing around with git. Git checkout and other commands will delete/recreate/modify files and therefore change the created/modified dates for those files.

Setting dates for commits

Git associates two dates with each commit:

  • GIT_AUTHOR_DATE : The date when the change was originally written
  • GIT_COMMITTER_DATE : The date when the commit was last created or rewritten, for example by --amend, rebase, or other history-rewriting git commands (see GitHub docs)

Among others, git accepts ISO 8601 standard date format: 2005-04-07T22:13:13

You can set the date variables inline when you commit or amend:

GIT_COMMITTER_DATE="2020-01-01T01:02:03" GIT_AUTHOR_DATE="2020-01-01T01:02:03" git commit -m "testing"

[Image: resulting commit with past date]

So far so good.
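To confirm that both dates were really set, you can read them back with git log. The sketch below does this in a throwaway repository; the temporary directory, the main.go file and the "Example" identity are all made up for the demonstration.

```shell
#!/bin/bash
# Sketch: commit with backdated timestamps in a throwaway repo, then read them back.
repo=$(mktemp -d)                      # temporary repo, only for this demo
git -C "$repo" init -q
echo "package main" > "$repo/main.go"  # hypothetical file
git -C "$repo" add main.go

# The identity is passed inline so the demo does not depend on global git config.
GIT_COMMITTER_DATE="2020-01-01T01:02:03" GIT_AUTHOR_DATE="2020-01-01T01:02:03" \
  git -C "$repo" -c user.name="Example" -c user.email="example@example.com" \
  commit -q -m "testing"

# %ad is the author date, %cd is the committer date.
dates=$(git -C "$repo" log -1 --date=format:'%Y-%m-%dT%H:%M:%S' --format='%ad %cd')
echo "$dates"
```

git log --pretty=fuller shows the same two dates in a more readable layout.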

Making it automatic

Now we can put these two steps together with a bash script to create all the necessary commits automatically. We take the output of the find command above and for each line we add files to git from that filepath, and commit using that date.

Originally I was executing the find command in the script itself:

IFS=$'\n'
for res in $(find . -type d -links 2 -printf "%TY-%Tm-%TdT%TH:%TM:%.2TS %p\n" | sort -n);
do
  # git add
  # git commit
done;

(It is necessary to set the Internal Field Separator so that the output of find is split on lines rather than all whitespace)
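Another way to get line-by-line handling, without changing IFS for the whole script, is a while read loop. This is a small sketch of that pattern; split_lines is a hypothetical helper name and the sample line is invented to show a path containing a space.

```shell
#!/bin/bash
# Sketch: split each "timestamp path" line without touching the global IFS.
# split_lines is a hypothetical helper name.
split_lines() {
  local date path
  # read -r puts the first word in "date" and the rest of the line in "path",
  # so a path containing spaces stays in one piece.
  while read -r date path; do
    printf 'date=%s path=%s\n' "$date" "$path"
  done
}

# Invented sample line; in practice the input would be piped in from find.
sample=$(printf '%s\n' '2020-01-01T01:02:03 ./two words/solution' | split_lines)
echo "$sample"
```

In real use you would pipe the find command into the loop in place of the sample line.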

But I quickly realised that saving the output of find into a file and then reading from that in the script was much better.

Using a file as input has two advantages:

  1. The dates returned by find will no longer be correct if you make some git commits and then have to undo or change things. Keeping a record of what the timestamps originally were is safer.
  2. You can make manual changes to the dates and paths, for example if you want to put two subfolders together into one commit.
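If you do edit the file by hand, it is worth checking it before making any commits. The following is a rough sketch of such a check and was not part of my original process; check_dates_file is a hypothetical helper name and GNU date is assumed.

```shell
#!/bin/bash
# Sketch: sanity-check a saved "timestamp path" list before committing anything.
# check_dates_file is a hypothetical helper name.
check_dates_file() {
  local file=$1 date path status=0
  while read -r date path; do
    # GNU date -d fails if it cannot parse the timestamp.
    if ! date -d "$date" >/dev/null 2>&1; then
      echo "bad date: $date ($path)" >&2
      status=1
    fi
    # Paths are relative, so run this from the top of the folder.
    if [ ! -d "$path" ]; then
      echo "missing folder: $path" >&2
      status=1
    fi
  done < "$file"
  return "$status"
}
```

Running check_dates_file folder-modified-dates.txt from the top of the folder returns a non-zero status if any line has an unparseable date or a folder that does not exist.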

We also want to avoid committing any special folders like .git itself or the .vscode and .history folders created by VSCode and plugins. I put in a simple filter for folders starting with a dot. You could also do this step as part of the find command, by using the -regex flag or by passing it through grep.
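The grep variant could look something like this. It is only a sketch: filter_dot_folders is a hypothetical helper name, and the sample lines are invented.

```shell
#!/bin/bash
# Sketch: drop every line whose path has a component starting with a dot.
# filter_dot_folders is a hypothetical helper name.
filter_dot_folders() {
  # '/\.' matches a slash followed by a literal dot, as in ./.git or ./.vscode.
  # The leading ./ of a normal path does not match, because its slash is
  # followed by the folder name rather than by a dot.
  grep -v '/\.'
}

# Invented sample lines in the same "timestamp path" format.
filtered=$(printf '%s\n' \
  '2020-01-01T01:02:03 ./.git/hooks' \
  '2020-01-02T01:02:03 ./arrays/left-rotation' \
  '2020-01-03T01:02:03 ./.vscode' | filter_dot_folders)
echo "$filtered"
```

Placed between find and sort, this would keep dot folders out of the saved file altogether.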

The final code:

#!/bin/bash

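# Each line of the input file is "<ISO 8601 timestamp> <folder path>".
while read -r date filepath; do
    echo "$date $filepath"
    if [[ $filepath == *"/."* ]]; then
        # Skip .git, .vscode, .history and other dot folders.
        echo "dot folder, skipping"
    else
        git add "$filepath"
        # ${filepath#./} strips the leading "./" for the commit message.
        GIT_COMMITTER_DATE="$date" GIT_AUTHOR_DATE="$date" git commit -m "added ${filepath#./}"
    fi
done < folder-modified-dates.txt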

And the result:

[Image: commit script output - making commits]

[Image: commit script output - skipping dot folders]

[Image: resulting git history]

You can check out the code and the history on GitHub: https://github.com/ronniegane/hackerrank-solutions