Most, if not all, ML projects depend, either implicitly or explicitly, on external packages
such as numpy, scikit-learn, pandas, matplotlib, tensorflow and pytorch.
While you can install these manually and individually into your local environment with pip install,
it is very easy to lose track of exactly which packages, and which versions of them, you’ve installed.
In turn, that makes it very difficult for your colleagues (or even yourself) to replicate the
set of packages that you had installed while developing your project. This could result in your code
simply not working due to missing packages or displaying unexpected bugs because of an updated dependency.
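To make that failure mode concrete, here is a minimal sketch that compares the versions a project expects against what is actually installed in the current environment. The function name check_pins and the example pins are hypothetical and chosen only for illustration; the lookup relies on the standard library's importlib.metadata.

```python
from importlib import metadata


def check_pins(pins):
    """Compare expected versions against what is installed.

    `pins` maps a distribution name to the version the project expects.
    Returns a dict of name -> (expected, installed), where `installed`
    is None for a missing package. Matching packages are left out.
    """
    problems = {}
    for name, expected in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != expected:
            problems[name] = (expected, installed)
    return problems


if __name__ == "__main__":
    # Hypothetical pins for illustration; substitute your project's own.
    example_pins = {"numpy": "1.26.4", "pandas": "2.2.2"}
    for name, (expected, installed) in check_pins(example_pins).items():
        print(f"{name}: expected {expected}, found {installed or 'nothing'}")
```

A check like this only helps if the expected versions were written down somewhere in the first place, which is exactly what a dependency manager automates.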
Proper dependency management is thus important for the maintainability and reproducibility of your project,
yet research on open-source ML projects has shown that very few ML applications
actually manage their dependencies correctly. Many projects use basic requirements.txt files, often generated using pip freeze, but these have three drawbacks.
They have a high tendency to include unrelated packages or packages that cannot be resolved from PyPI (pip’s standard package index).
They are hard to maintain, as they make no distinction between run-time and development-time dependencies, nor between direct and indirect dependencies.
And they may hamper the reproducibility of your ML project by underspecifying exact versions and checksums.
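As a rough illustration of what underspecified means here, the sketch below flags requirement lines that lack an exact == pin or a --hash= checksum. It is a simple heuristic under stated assumptions, not a replacement for pip's own parser: the helper names logical_lines and audit_requirements are hypothetical, option lines such as -r or --index-url are skipped, and the sample hash value is a placeholder.

```python
def logical_lines(lines):
    """Join physical lines that end in a backslash into one logical line."""
    buffer = ""
    for raw in lines:
        stripped = raw.rstrip()
        if stripped.endswith("\\"):
            buffer += stripped[:-1] + " "
            continue
        yield buffer + stripped
        buffer = ""
    if buffer:
        yield buffer


def audit_requirements(lines):
    """Return (requirement, reason) pairs for underspecified requirements."""
    findings = []
    for logical in logical_lines(lines):
        # Drop trailing comments, then surrounding whitespace.
        line = logical.split(" #", 1)[0].strip()
        # Skip blanks, full-line comments and option lines (-r, --index-url).
        if not line or line.startswith("#") or line.startswith("-"):
            continue
        if "==" not in line:
            findings.append((line, "no exact version pin"))
        elif "--hash=" not in line:
            findings.append((line, "no checksum"))
    return findings


if __name__ == "__main__":
    # Hypothetical requirements.txt content; the hash is a placeholder.
    sample = [
        "# example requirements",
        "numpy",
        "pandas>=2.0",
        "scipy==1.11.4",
        "requests==2.31.0 \\",
        "    --hash=sha256:abc",
    ]
    for requirement, reason in audit_requirements(sample):
        print(f"{requirement}: {reason}")
```

Note that even a file that passes this check still does not separate run-time from development-time dependencies, or direct from indirect ones.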
Managing your project’s packages with a setup.py file is similarly flawed and thus also not recommended,
except if there is a direct need to build your project into a platform-specific pip package.
The Python Packaging User Guide recommends using either Poetry or Pipenv as dependency managers. The recommendation is to use Pipenv if your project is an application and to use Poetry if it is a library or otherwise needs to be built into a Python package.
If you’re seeing this in a report, it means your project is currently not using a dependency manager, or is using one that is not recommended.
Learn more about Poetry and Pipenv using the links below, pick the one that most suits you, your project and your team, then start managing your dependencies with it.