While backing up important stuff from my laptop in preparation for a fresh new OS install, I came across a folder full of Go solutions I’d written for some Hackerrank puzzles years ago. I thought it would be worthwhile turning it into a git repo, not only for an easily accessible backup but to act as a simple portfolio to show I can do some stuff in Go.
But I wanted to have an accurate history: rather than pretending that all the code was written simultaneously on 2021-08-26, I wanted to have the real dates that I wrote each solution.
So what to do? Off to Google and StackOverflow to find how to do such a thing.
Note: if you’re planning to do this yourself, don’t touch anything in the folder until you have recorded the dates. This approach relies on the “last modified date” of files and folders in Linux, and those will change if you alter the files or folders in any way.
Getting the last modified date for files
There are a few ways to get the access/modify/change dates of files in Linux:
- stat
- date -r
- ls -l
- find
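For a single file, the first two options can be tried out as in the sketch below. It assumes GNU coreutils (the -c option of stat and the -r option of date differ on other systems), and the temporary file is only a stand-in so that no real timestamps get touched.

```shell
#!/bin/bash
# Demo on a throwaway file; nothing in your real folder is modified.
export TZ=UTC                            # fixed timezone so the output is predictable
demo_file=$(mktemp)                      # hypothetical sample file
touch -t 202001010102.03 "$demo_file"    # set mtime to 2020-01-01 01:02:03

# stat: %y is the human-readable modification time, %n the file name
stat -c '%y %n' "$demo_file"

# date -r: print the file's modification time in a format of your choosing
mtime=$(date -r "$demo_file" +%Y-%m-%dT%H:%M:%S)
echo "$mtime"

rm -f "$demo_file"
```

Both work one file at a time, so you would still need a loop or find to cover a whole tree.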
find is handy because it allows recursive search with just the one command. Checking man find, you see that we can provide a formatted string for the -printf flag to get timestamps. %Tk gets the last modification time in a format specified by the value of k.
%TY-%Tm-%TdT%TH:%TM:%.2TS gives us the ISO 8601 standard 2020-01-01T01:02:03, which is one of the formats git accepts.
%p gives us the path of the file.
The following will give us timestamps and filepaths for every file in the directory, recursively, sorted by date:
find . -type f -printf "%TY-%Tm-%TdT%TH:%TM:%.2TS %p\n" | sort -n
But I don’t want to have a single commit for every file. It’s easier to have a commit for each folder, under the assumption that 1 folder = 1 puzzle. But how do I find only the leaf folders in this directory?
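Thanks to this SO answer I now know you can use the -links flag. On traditional Unix filesystems a directory's link count is made up of
- the entry for it in its parent
- its own . entry
- the .. entry of each subdirectory
Therefore, if there are only 2 links, it is a bottom-level or leaf directory. Some filesystems (btrfs, for example) do not keep directory link counts this way, so check that the output looks right before relying on it.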
find . -type d -links 2 -printf "%TY-%Tm-%TdT%TH:%TM:%.2TS %p\n" | sort -n
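If the link-count trick does not behave on your filesystem, a slower fallback is to ask find directly whether each directory contains a subdirectory. The following is a sketch of that idea, not the method I used for the repo: list_leaf_dirs is a hypothetical helper name, the demo tree is made up, and it assumes GNU find and directory names without newlines.

```shell
#!/bin/bash
# list_leaf_dirs ROOT: print "timestamp path" for every directory under ROOT
# that has no subdirectories, without relying on link counts.
list_leaf_dirs() {
    find "$1" -type d | while IFS= read -r dir; do
        # -quit stops at the first subdirectory found, so empty output means "leaf"
        if [ -z "$(find "$dir" -mindepth 1 -maxdepth 1 -type d -print -quit)" ]; then
            find "$dir" -maxdepth 0 -printf "%TY-%Tm-%TdT%TH:%TM:%.2TS %p\n"
        fi
    done | sort
}

# Throwaway demo tree with known modification times (hypothetical folder names).
export TZ=UTC
demo=$(mktemp -d)
mkdir -p "$demo/warmup/solve-me-first" "$demo/warmup/staircase" "$demo/sorting"
touch "$demo/sorting/main.go"
touch -t 201801020304.05 "$demo/warmup/solve-me-first"
touch -t 201802030405.06 "$demo/warmup/staircase"
touch -t 201906011000.00 "$demo/sorting"

leaves=$(cd "$demo" && list_leaf_dirs .)
echo "$leaves"
rm -rf "$demo"
```

The output has the same "date path" shape as the -links 2 command, so it can be saved to a file and used in the same way.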
Note: Put this output in a file somewhere and save it before you start messing around with git. Git checkout and other commands will delete/recreate/modify files and therefore change the created/modified dates for those files.
Setting dates for commits
Git associates two dates with each commit:
- GIT_AUTHOR_DATE : The date when the change was originally written
- GIT_COMMITTER_DATE : The date when the commit was created or last rewritten, for example by --amend, a rebase, or other history-rewriting git commands (see GitHub docs)
Among others, git accepts ISO 8601 standard date format: 2005-04-07T22:13:13
You can set the date variables inline when you commit or amend:
GIT_COMMITTER_DATE="2020-01-01T01:02:03" GIT_AUTHOR_DATE="2020-01-01T01:02:03" git commit -m "testing"
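To convince yourself that both dates were really set, you can try it in a scratch repository first. This is a minimal sketch: the temporary repo, the Demo identity and the main.go file are placeholders, and TZ is pinned to UTC because a date without a timezone is interpreted in local time.

```shell
#!/bin/bash
export TZ=UTC
# Placeholder identity so the commit works without any global git config.
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

repo=$(mktemp -d)                        # scratch repository, deleted at the end
git init -q "$repo"
echo "package main" > "$repo/main.go"
git -C "$repo" add main.go

GIT_COMMITTER_DATE="2020-01-01T01:02:03" GIT_AUTHOR_DATE="2020-01-01T01:02:03" \
    git -C "$repo" commit -q -m "testing"

# %ad is the author date, %cd the committer date
dates=$(git -C "$repo" log -1 --date=format:%Y-%m-%dT%H:%M:%S \
    --format='author=%ad committer=%cd')
echo "$dates"
rm -rf "$repo"
```

A plain git log only shows the author date; git log --format=fuller prints both.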
So far so good.
Making it automatic
Now we can put these two steps together with a bash script to create all the necessary commits automatically.
We take the output of the find command above and, for each line, we add files to git from that filepath and commit using that date.
Originally I was executing the find command in the script itself:
IFS=$'\n'
for res in $(find . -type d -links 2 -printf "%TY-%Tm-%TdT%TH:%TM:%.2TS %p\n" | sort -n); do
    # git add
    # git commit
done
(It is necessary to set the Internal Field Separator so that the output of find is split on lines rather than on all whitespace.)
But I quickly realised that saving the output of find into a file and then reading from that in the script was much better.
Using a file as input has two advantages:
- The dates returned by find will no longer be correct if you make some git commits and then have to undo or change things. Keeping a record of what the timestamps originally were is safer.
- You can make manual changes to the dates and paths, for example if you want to put two subfolders together into one commit.
We also want to avoid committing any special folders like .git itself or the .vscode and .history folders created by VSCode and plugins. I put in a simple filter for folders starting with a dot. You could also do this step as part of the find command, by using the -regex flag or by passing it through grep.
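The grep variant might look like the sketch below. The sample listing is made up for illustration; in practice the input would be the output of the find command.

```shell
#!/bin/bash
# Hypothetical sample of what the leaf-directory listing might contain.
sample='2019-03-01T10:00:00 ./.vscode
2019-03-02T11:00:00 ./warmup/staircase
2019-03-03T12:00:00 ./.git/refs/tags
2019-03-04T13:00:00 ./sorting/.history'

# Drop every line whose path has a component starting with a dot,
# i.e. a "/" immediately followed by a ".".
filtered=$(printf '%s\n' "$sample" | grep -v '/\.')
echo "$filtered"
```

Only the ./warmup/staircase line survives. The leading ./ is not affected because the pattern needs a slash followed by a dot, not the other way round. With GNU find, adding -not -path '*/.*' to the command should give the same result without the extra pipe.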
The final code:
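#!/bin/bash
# Each line of the input file is "<ISO 8601 date> <path>"
while read -r date filepath; do
    echo "$date $filepath"
    if [[ $filepath == *"/."* ]]; then
        # Path contains a dot folder such as .git or .vscode
        echo "dot folder, skipping"
    else
        git add "$filepath"
        # cut -c3- strips the leading "./" from the path for the commit message
        GIT_COMMITTER_DATE="$date" GIT_AUTHOR_DATE="$date" git commit -m "added $(echo "$filepath" | cut -c3-)"
    fi
done < folder-modified-dates.txt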
And the result:
You can check out the code and the history on GitHub: https://github.com/ronniegane/hackerrank-solutions
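If you want to try this on your own folder, it may be worth doing a dry run before letting the script make real commits. The sketch below is not part of my script: preview_commits is a hypothetical helper that reads the same kind of file and only prints what would happen, and the two sample lines are made up.

```shell
#!/bin/bash
# preview_commits FILE: print the commit that would be made for each line,
# without touching git.
preview_commits() {
    while read -r date filepath; do
        if [[ $filepath == *"/."* ]]; then
            echo "skip   $filepath"
        else
            # ${filepath#./} strips a leading "./", like cut -c3- in the script
            echo "commit $date added ${filepath#./}"
        fi
    done < "$1"
}

# Made-up listing, including a path with spaces and a dot folder.
listing=$(mktemp)
printf '%s\n' \
    '2018-01-02T03:04:05 ./warmup/solve me first' \
    '2018-02-03T04:05:06 ./.vscode' > "$listing"

preview=$(preview_commits "$listing")
echo "$preview"
rm -f "$listing"
```

Because read puts everything after the first space into the last variable, paths containing spaces come through in one piece.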