mllint — Linter for Machine Learning projects

Rule — Code Quality — Pylint reports no issues with this project

Pylint is a static analysis tool for finding generic programming errors. This rule checks whether Pylint returns any errors when running it on all Python files in this project. The score for this rule is determined as a function of the number of messages Pylint returns and the lines of Python code that your project has. In the ideal case, Pylint does not recognise any code smells in your project, in which case the score is 100%....

Rule — Continuous Integration — Project uses Continuous Integration (CI)

This rule checks if your project is using Continuous Integration (CI). To learn more about what CI is, does and entails, see the description of category ci. Implementing CI requires picking a CI provider that will run the automated builds and tests. There are many CI providers available and you will have to make your own decision on which fits you best, but mllint currently recognises four CI providers, namely:...

Rule — Dependency Management — Project places its development dependencies in dev-dependencies

Development dependencies are dependencies of your project that are only necessary for development purposes, but are not required for your software to actually run. Examples of this are code quality linters, unit testing frameworks and other project analysis tools, including mllint. This rule is only checked when your project uses Poetry or Pipenv, since these support having development dependencies. When mllint detects one of the following dependencies in your project, but it is not in your development dependencies, then it will fail this rule....

Rule — Dependency Management — Project properly keeps track of its dependencies

Most, if not all, ML projects either implicitly or explicitly depend on external packages such as numpy, scikit-learn, pandas, matplotlib, tensorflow, pytorch, etc. While you can install these manually and individually onto your local environment with pip install, it is very easy to lose track of which exact packages and versions thereof you’ve installed. In turn, that makes it very difficult for your colleagues (or even yourself) to replicate the set of packages that you had installed while developing your project....

Rule — Dependency Management — Project should only use one dependency manager

In most cases, using multiple dependency managers only creates confusion in your team regarding which manager to install, which to use for installing the project’s dependencies, and in what order. It can also be unclear where a new dependency should be added or an existing one updated: in just one dependency manager (and if so, which one?), or in both? We therefore recommend using only one dependency manager, preferably either Poetry or Pipenv....

Rule — Testing — Project has automated tests

Every ML project should have a set of automated tests to assess the quality, consistency and correctness of its application in a repeatable and reproducible manner. This rule checks how many test files your project contains. In accordance with pytest’s conventions for Python tests, test files are Python files starting with test_ or ending with _test.py. By default, mllint expects at least one test file to be implemented in your project (i....
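The naming convention described above can be sketched in a few lines of Python. This is only an illustration of the convention, not mllint's own implementation; the function names is_test_file and count_test_files are hypothetical.

```python
from pathlib import PurePath


def is_test_file(path: str) -> bool:
    """Return True if the file name follows the test_*.py / *_test.py convention."""
    name = PurePath(path).name
    # Only Python files count; the name must start with test_ or end with _test.py.
    return name.endswith(".py") and (
        name.startswith("test_") or name.endswith("_test.py")
    )


def count_test_files(paths) -> int:
    """Count how many of the given paths look like test files (hypothetical helper)."""
    return sum(1 for p in paths if is_test_file(p))
```

For example, count_test_files(["tests/test_model.py", "src/model.py"]) counts only the first path, since model.py matches neither pattern.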

Rule — Testing — Project passes all of its automated tests

Of course, the point of having automated tests is to ensure that they pass. While mllint will not run your tests as part of its static analysis, mllint expects you to run these on your own terms and provide the filenames of a JUnit-compatible XML test report and a Cobertura-compatible XML coverage report in your project’s mllint configuration. Specifically for this rule, the JUnit test report is analysed. When using pytest to run your project’s tests, use the --junitxml=<filename> option to generate such a test report, e....
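To get a feel for what a JUnit-compatible report contains, the following minimal sketch counts passed, failed and skipped test cases with Python's standard library. It assumes the common layout in which each testcase element carries an optional failure, error or skipped child; the function name summarise_junit and the sample report are made up for illustration, and how mllint itself scores the report is not shown here.

```python
import xml.etree.ElementTree as ET


def summarise_junit(xml_text: str) -> dict:
    """Count test outcomes in a JUnit-style XML report (illustrative helper)."""
    root = ET.fromstring(xml_text)
    # The root may be <testsuites> or a single <testsuite>; iter() covers both.
    cases = list(root.iter("testcase"))
    failed = sum(
        1
        for case in cases
        if case.find("failure") is not None or case.find("error") is not None
    )
    skipped = sum(1 for case in cases if case.find("skipped") is not None)
    passed = len(cases) - failed - skipped
    return {"total": len(cases), "passed": passed, "failed": failed, "skipped": skipped}


# Hand-written sample report, not real output from any project.
SAMPLE_REPORT = """<testsuites>
  <testsuite name="pytest" tests="3">
    <testcase classname="tests.test_a" name="test_ok"/>
    <testcase classname="tests.test_a" name="test_bad"><failure message="boom"/></testcase>
    <testcase classname="tests.test_a" name="test_skip"><skipped/></testcase>
  </testsuite>
</testsuites>"""

print(summarise_junit(SAMPLE_REPORT))
```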

Rule — Testing — Project provides a test coverage report

One way of measuring the effectiveness of automated tests is by measuring how many lines of code are touched while the tests are being executed. This is called test coverage. The idea is that the more lines are being executed by your tests, the more of your code’s behaviour is being exercised, thus yielding a greater probability of bugs surfacing and being detected or prevented. Note, however, that line test coverage only measures whether a line of code is executed....
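As a rough sketch of how a line coverage figure can be read from a Cobertura-compatible XML report: such reports usually store the overall fraction of executed lines in a line-rate attribute on the root coverage element. The function name line_coverage_percent and the sample report below are hypothetical, and this is not necessarily how mllint reads the report.

```python
import xml.etree.ElementTree as ET


def line_coverage_percent(xml_text: str) -> float:
    """Return the overall line coverage of a Cobertura-style report as a percentage."""
    root = ET.fromstring(xml_text)
    # line-rate is a fraction between 0 and 1: executed lines / executable lines.
    return float(root.attrib["line-rate"]) * 100.0


# Hand-written sample: 3 of 4 executable lines were hit by the tests.
SAMPLE_COVERAGE = """<coverage line-rate="0.75" branch-rate="0">
  <packages/>
</coverage>"""

print(line_coverage_percent(SAMPLE_COVERAGE))
```

A high percentage only says that the lines ran during the tests, not that their results were checked.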

Rule — Testing — Tests should be placed in the tests folder

In accordance with pytest’s conventions for Python tests and recommendations on test layout, test files are Python files starting with test_ or ending with _test.py and should be placed in a folder called tests at the root of your project. This rule therefore simply checks whether all test files in your project are indeed in this tests folder at the root of your project.
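The check described above can be approximated as follows. This is a minimal sketch that assumes paths are given relative to the project root with forward slashes; the function name misplaced_test_files is hypothetical and the code is not taken from mllint.

```python
from pathlib import PurePosixPath


def misplaced_test_files(relative_paths):
    """Return the test files that are not under the top-level tests/ folder."""
    misplaced = []
    for raw in relative_paths:
        path = PurePosixPath(raw)
        name = path.name
        # Same naming convention as above: test_*.py or *_test.py.
        is_test = name.endswith(".py") and (
            name.startswith("test_") or name.endswith("_test.py")
        )
        # The first path component must be the tests folder at the project root.
        if is_test and path.parts[0] != "tests":
            misplaced.append(raw)
    return misplaced


print(misplaced_test_files(["tests/test_a.py", "src/test_b.py", "src/model.py"]))
```

Nested folders such as tests/unit/ are accepted by this sketch, since only the first path component is compared.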

Rule — Version Control — DVC: File 'dvc.lock' should be committed to Git

When using DVC to define pipelines in a dvc.yaml file, DVC maintains a dvc.lock file to record the state of your pipeline(s) and help track its outputs. As with any .lock file, it is highly recommended to commit your dvc.lock to your project’s Git repository. See the DVC documentation to learn more about dvc.lock files. If you’re seeing this in a report, then your project contains a dvc.lock file, but it has not been added to Git....