Catching API Keys With Git Hooks

Fri 28th Oct 16

Here is a little trick I just learned about to help prevent things like API keys from ending up in your Git repo. I've mentioned it to a few Git-loving developers, who all claimed that it is obvious and that loads of people are already using it but, as we regularly see keys in GitHub, I'd guess that it's a case of what people know they should be doing versus what they are actually doing.

So, the trick... Git has a system of hooks which let you run scripts on key events, such as when code is committed or pushed to a remote repository. These can be used to check the content of files or commit messages and, if necessary, block the action. They can run on both the local and remote side and are set up by simply dropping an executable with the right name in the right directory. You can read more about these in Customizing Git - Git Hooks.
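
As a rough illustration of "dropping an executable with the right name in the right directory", here is a minimal sketch of an installer. The helper name install_hook and the script name used in the usage note are made up for this example, and it assumes a normal checkout where .git is a directory in the repository root.

```ruby
require "fileutils"

# Hypothetical helper: copy a hook script into a repository's hooks
# directory under the name Git expects, and mark it executable.
# Assumes a standard layout where <repo_root>/.git is a directory.
def install_hook(script_path, repo_root, hook_name = "pre-commit")
  git_dir = File.join(repo_root, ".git")
  unless Dir.exist?(git_dir)
    raise ArgumentError, "#{repo_root} does not contain a .git directory"
  end

  hooks_dir = File.join(git_dir, "hooks")
  FileUtils.mkdir_p(hooks_dir)

  target = File.join(hooks_dir, hook_name)
  FileUtils.cp(script_path, target)
  # Git silently ignores hooks that are not executable.
  File.chmod(0o755, target)
  target
end
```

Calling install_hook("check_keys.rb", Dir.pwd) from the root of a repository would put the script in place as .git/hooks/pre-commit.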

The defence I'm suggesting is to check the files for bad content before they are allowed to be committed into the repository. This type of check is performed by the .git/hooks/pre-commit script, which is run on a commit but before the files actually make it into the repository. The script can ask Git for a list of all the files that are being committed; it is then a simple case of grepping through them looking for bad content, such as API keys or anything else you don't want to end up in your repo. The script is able to send messages out to the user by writing to standard out. If the script exits with zero the commit is allowed; otherwise it is rejected.

The following script uses two regexes to check for Amazon and Google private keys. It makes a call out to grep with these for each file that is to be committed. If grep returns content then there is a match, so a message is printed and the count of bad files is increased. At the end, the script exits with the bad count value: if there were no matches then the count is zero and the commit is allowed, otherwise the commit is rejected.

#!/usr/bin/env ruby

# This script can be bypassed by using the --no-verify
# parameter when checking in

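# Only list files that are added, copied or modified by this commit;
# deleted files no longer exist on disk and so cannot be grepped.
files_modified = `git diff-index --cached --name-only --diff-filter=ACM HEAD`
files_modified_arr = files_modified.split("\n")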

# puts "Checking files: #{files_modified_arr.inspect}"

bad_files = 0

# Build a hash of all the keys and things you don't want
# checked in here.
# Note the pair of slashes before the slash quote, this
# is to ensure a slash quote is built into the string
# to be passed to grep.

regexs = {
    "AWS Key" => "['\\\"][a-z0-9\/+]{40}['\\\"]",
    "Google Key" => "['\\\"][a-z0-9_]{39}['\\\"]",
}

files_modified_arr.each do |file|
    regexs.each_pair do |key_name, regex|
        # Quote the file name so paths containing spaces survive the shell
        grep_command = "grep -iE \"#{regex}\" \"#{file}\""
        # puts grep_command
        res = `#{grep_command}`
        # puts res.inspect
        unless res == ""
            bad_files += 1
            puts "Match rule for #{key_name} on file: #{file}"
        end
    end
end

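# Exit statuses wrap around at 256, so cap the count to make sure
# a large number of matches can never end up reported as zero.
exit([bad_files, 255].min)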
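
To get a feel for what the two patterns catch without going through a commit, the same rules can be tried directly in Ruby. This is only a sketch: it uses Ruby's own Regexp instead of shelling out to grep, RULES and matching_rules are names invented for the example, and the sample strings are made up rather than real keys.

```ruby
# The same two patterns as the hook, written as Ruby regexes.
# grep's -i flag becomes the /i modifier.
RULES = {
  "AWS Key"    => %r{['"][a-z0-9/+]{40}['"]}i,
  "Google Key" => %r{['"][a-z0-9_]{39}['"]}i,
}.freeze

# Return the names of all rules that match somewhere in the text.
def matching_rules(text)
  RULES.select { |_name, regex| text.match?(regex) }.keys
end

# Made-up samples: 40 and 39 characters between the quotes.
fake_aws    = "secret = '" + ("aB3/+" * 8) + "'"
fake_google = 'key = "' + ("a_9" * 13) + '"'
# The safe pattern: the key is loaded from a file at run time.
safe        = "key = File.read('api.key').strip"

p matching_rules(fake_aws)     # => ["AWS Key"]
p matching_rules(fake_google)  # => ["Google Key"]
p matching_rules(safe)         # => []
```

Note that both rules only fire when the value sits between quotes and has exactly the expected length, which is why the line that reads the key from a file passes.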

This first example shows a safe use of API keys where they will be loaded in from a file before use.

[Screenshot: Git hook allowing the safe use of API keys]

The file is allowed through the Git commit process as normal.

This second example stores the API key directly in the file and, as can be seen, the hook spots this and blocks the commit:

[Screenshot: Git hook rejecting commit]

Finally, as with any pattern-matching rule, the occasional false positive is likely to be picked up. If you want to override the hook and its checks, you can do this with the --no-verify option when committing:

[Screenshot: Bypass Git hook using --no-verify]

Conclusion

This technique can be used to detect any type of content and prevent it from ending up in the repo. Obviously there is still the chance of things slipping through if the regex isn't well written. Similarly, a poorly written regex can result in false positives, so work needs to be done to ensure the regex list is as tight as possible.
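
One way to keep the regex list honest is to run each rule against samples that should and should not match. The sketch below does that for the Google rule from the hook and for a tighter variant. The helper rule_failures and the sample values are invented for the example, and the tighter pattern rests on my assumption that Google API keys are 39 characters long and start with "AIza", so check that against your own keys before relying on it.

```ruby
# Hypothetical helper: report which samples a rule gets wrong.
def rule_failures(regex, should_match:, should_not_match:)
  {
    missed:          should_match.reject { |s| s.match?(regex) },
    false_positives: should_not_match.select { |s| s.match?(regex) },
  }
end

# The Google rule from the hook, and a tighter variant.
# ASSUMPTION: keys start with "AIza" followed by 35 more characters.
original = %r{['"][a-z0-9_]{39}['"]}i
tighter  = %r{['"]AIza[A-Za-z0-9_-]{35}['"]}

# Made-up samples: a fake key, and a harmless 39 character identifier.
fake_key        = '"AIza' + ("x" * 35) + '"'
long_identifier = "'" + ("some_long_config_name_" * 2)[0, 39] + "'"

samples = { should_match: [fake_key], should_not_match: [long_identifier] }

p rule_failures(original, **samples)
p rule_failures(tighter, **samples)
```

With these samples the original rule reports the long identifier as a false positive, while the tighter variant reports no failures in either list.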

Also, as was shown, the checks can be easily bypassed, so it is still recommended that regular source code audits are performed to look for keys or other information which developers have forced through.
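
As a starting point for that kind of audit, the same patterns can be run over the lines that commits have added. This is a rough sketch rather than a finished tool: AUDIT_RULES and audit_patch are names invented here, and it only looks at patch text handed to it, for example the output of git log -p --all.

```ruby
# The hook's two patterns again, as Ruby regexes.
AUDIT_RULES = {
  "AWS Key"    => %r{['"][a-z0-9/+]{40}['"]}i,
  "Google Key" => %r{['"][a-z0-9_]{39}['"]}i,
}.freeze

# Scan patch text and return [rule_name, line] pairs for every
# added line that matches one of the rules.
def audit_patch(patch_text, rules = AUDIT_RULES)
  hits = []
  patch_text.each_line do |raw_line|
    # Replace invalid bytes so binary noise cannot raise an error.
    line = raw_line.scrub.chomp
    # Added lines start with "+"; "+++" marks a file header.
    next unless line.start_with?("+") && !line.start_with?("+++")

    rules.each do |name, regex|
      hits << [name, line] if line.match?(regex)
    end
  end
  hits
end
```

Inside a repository, audit_patch(`git log -p --all`) would return the suspicious added lines across all branches, including anything that was committed with --no-verify.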