Rule — Dependency Management — Project places its development dependencies in dev-dependencies

Development dependencies are dependencies of your project that are only necessary for development purposes, but are not required for your software to actually run. Examples of this are code quality linters, unit testing frameworks and other project analysis tools, including mllint. This rule is only checked when your project uses Poetry or Pipenv, since these support having development dependencies. When mllint detects one of the following dependencies in your project, but it is not in your development dependencies, then it will fail this rule....

1 min · Bart van Oort (bvobart)

Rule — Dependency Management — Project properly keeps track of its dependencies

Most, if not all, ML projects either implicitly or explicitly depend on external packages such as numpy, scikit-learn, pandas, matplotlib, tensorflow, pytorch, etc. While you can install these manually and individually onto your local environment with pip install, it is very easy to lose track of which exact packages and versions thereof you’ve installed. In turn, that makes it very difficult for your colleagues (or even yourself) to replicate the set of packages that you had installed while developing your project....

2 min · Bart van Oort (bvobart)

Rule — Dependency Management — Project should only use one dependency manager

In most cases, using multiple different dependency managers only creates confusion in your team regarding which manager to install, which to use for installing the project’s dependencies, and in what order. It can also be confusing for your team to figure out where a new dependency should be added, or where an existing dependency should be updated: in just one dependency manager (and if so, which one?) or in both? We therefore recommend using only one dependency manager, preferably either Poetry or Pipenv....

1 min · Bart van Oort (bvobart)

Rule — Testing — Project has automated tests

Every ML project should have a set of automated tests to assess the quality, consistency and correctness of its application in a repeatable and reproducible manner. This rule checks how many test files your project contains. In accordance with pytest’s conventions for Python tests, test files are Python files starting with test_ or ending with _test.py. By default, mllint expects at least one test file to be implemented in your project (i....

2 min · Bart van Oort (bvobart)

Rule — Testing — Project passes all of its automated tests

Of course, the point of having automated tests is to ensure that they pass. While mllint will not run your tests as part of its static analysis, mllint expects you to run these on your own terms and provide the filenames of a JUnit-compatible XML test report and a Cobertura-compatible XML coverage report in your project’s mllint configuration. Specifically for this rule, the JUnit test report is analysed. When using pytest to run your project’s tests, use the --junitxml=<filename> option to generate such a test report, e....

1 min · Bart van Oort (bvobart)

Rule — Testing — Project provides a test coverage report

One way of measuring the effectiveness of automated tests is to measure how many lines of code are touched while the tests are being executed. This is called test coverage. The idea is that the more lines are executed by your tests, the more of your code’s behaviour is exercised, thus yielding a greater probability of bugs surfacing and being detected or prevented. Note, however, that line test coverage only measures whether a line of code is executed....

2 min · Bart van Oort (bvobart)

Rule — Testing — Tests should be placed in the tests folder

In accordance with pytest’s conventions for Python tests and recommendations on test layout, test files are Python files starting with test_ or ending with _test.py and should be placed in a folder called tests at the root of your project. This rule therefore simply checks whether all test files in your project are indeed in this tests folder at the root of your project.
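
The check described above can be illustrated with a minimal Python sketch. This is not mllint's actual implementation; the function names is_test_file and misplaced_test_files are hypothetical, and the sketch assumes paths are given relative to the project root with forward slashes.

```python
from pathlib import PurePosixPath


def is_test_file(path: str) -> bool:
    """True if the file name follows the naming convention described above:
    a Python file starting with test_ or ending with _test.py."""
    name = PurePosixPath(path).name
    return name.endswith(".py") and (
        name.startswith("test_") or name.endswith("_test.py")
    )


def misplaced_test_files(paths: list[str]) -> list[str]:
    """Return the test files that are NOT inside the top-level tests folder.

    Assumption: every path is relative to the project root.
    """
    return [
        p
        for p in paths
        if is_test_file(p) and PurePosixPath(p).parts[:1] != ("tests",)
    ]


# Hypothetical project layout, for illustration only.
example_paths = [
    "tests/test_model.py",
    "tests/unit/data_test.py",
    "src/test_utils.py",
    "src/train.py",
    "model_test.py",
]
misplaced = misplaced_test_files(example_paths)
```

Under these assumptions, an empty result would correspond to the rule passing, while any returned path points at a test file that should be moved into the tests folder.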

1 min · Bart van Oort (bvobart)

Rule — Version Control — DVC: File 'dvc.lock' should be committed to Git

When using DVC to define pipelines in a dvc.yaml file, DVC maintains a dvc.lock file to record the state of your pipeline(s) and help track its outputs. As with any .lock file, it is highly recommended to commit your dvc.lock to your project’s Git repository. Learn more about dvc.lock files here. If you’re seeing this in a report, then your project contains a dvc.lock file, but it has not been added to Git....

1 min · Bart van Oort (bvobart)

Rule — Version Control — DVC: Folder '.dvc' should be committed to Git

DVC uses the .dvc folder to keep records of and information about all your DVC-tracked files and where they are hosted. This folder must be committed to your Git repository in order to work with DVC correctly. Learn more about the .dvc directory here. If you’re seeing this in a report, then your project’s Git repository is not tracking the ‘.dvc’ folder. To fix this, you may use the following commands:...

1 min · Bart van Oort (bvobart)

Rule — Version Control — DVC: Is installed

To be able to use DVC, it must be installed correctly. If you’re seeing this as part of an mllint report, then it means that mllint was unable to find ‘dvc’ on your PATH. This could either indicate that DVC is not installed in your project, or that it is not included on your PATH. See DVC’s installation instructions to learn more about installing DVC, or simply add it to your project as a Pip package, e....

1 min · Bart van Oort (bvobart)