In a previous post, I discussed how to do repeatable Python builds. I also mentioned it was too much of a hassle to check dependency hashes at install time. According to the pip documentation on the subject, hash checking does the following:
This protects against a compromise of PyPI or the HTTPS certificate chain. It also guards against a package changing without its version number changing (on indexes that allow this).
For the most part, I’m not too concerned about these things, but while researching this I was very confused by how difficult the pip hash command is to use. pip hash requires you to have already downloaded a dependency with pip download, and it only gives you a single hash for the exact dependency file you downloaded. That means you can’t easily use it to generate a single constraints.txt file that will work on different architectures or operating systems, since each could have a different hash, especially for packages written in C or Fortran. The way to deal with this is to provide multiple allowed --hash arguments to pip install, but there doesn’t seem to be an easy way to automate this using pip.
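To make the limitation concrete, here is a minimal Python sketch of the two pieces involved: hashing one downloaded file (roughly what pip hash reports with its default sha256 algorithm) and writing a pinned requirement that lists several allowed hashes. The function names sha256_of_file and requirement_line are hypothetical, and the sketch assumes the hashes end up as --hash options on the requirement's line in a requirements or constraints file.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of one downloaded archive."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        # Read in chunks so large wheels are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def requirement_line(name, version, hashes):
    """Format one pinned requirement with every allowed hash (hypothetical helper)."""
    parts = [f"{name}=={version}"]
    # Deduplicate and sort so the generated file is stable between runs.
    parts += [f"--hash=sha256:{h}" for h in sorted(set(hashes))]
    # A trailing backslash continues the requirement onto the next line.
    return " \\\n    ".join(parts)
```

The catch is that sha256_of_file only ever sees the one file that was downloaded on the current machine, so the list passed to requirement_line would have to be collected by hand on every platform.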
I was curious about this, so I did a little digging and discovered that there is a convenient way to find all the hashes for a given package version using the PyPI JSON API.
For any given package version, you can hit https://pypi.org/pypi/$package/$version/json and get a response that includes the sha256 hash of each wheel or tarball available via PyPI. The hashes are inside the various releases keys in the JSON, under sha256. Using this information it’s straightforward to emit a --hash=sha256:$mysha argument for each of those and generate a fully hashed requirements.txt that works across operating systems and architectures.
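A rough Python sketch of that idea follows, using only the standard library. The function names fetch_release and hash_arguments are hypothetical, and the response layout is an assumption: the sketch looks for the file list under a top-level urls key first and falls back to releases[version], with each file entry carrying a digests mapping that contains sha256. It is worth checking that against a real response before relying on it.

```python
import json
import urllib.request


def fetch_release(package, version):
    """Download the JSON metadata for one pinned release from PyPI."""
    url = f"https://pypi.org/pypi/{package}/{version}/json"
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)


def hash_arguments(payload, version):
    """Return a sorted list of --hash arguments, one per published file."""
    # Assumed layout: files listed under "urls", or under "releases"
    # keyed by version string; each entry has digests["sha256"].
    files = payload.get("urls") or payload.get("releases", {}).get(version, [])
    digests = {entry["digests"]["sha256"] for entry in files}
    return [f"--hash=sha256:{digest}" for digest in sorted(digests)]
```

Calling hash_arguments(fetch_release(package, version), version) and appending the result to the package==version pin gives one requirements.txt entry that accepts any of the published wheels or the source tarball.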
Given this, I might re-think how much of a hassle doing hash-checking is.