Category — Code Quality

This category assesses your project’s code quality by running several static analysis tools on your project. Static analysis tools analyse your code without actually running it, in an attempt to find potential bugs, refactoring opportunities and/or coding style violations. The linter for this category will check whether your project is using the configured set of code quality linters. mllint supports (and by default requires) the following linters: pylint, mypy, black, isort and bandit. For your project to be considered to be using a linter…
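The set of required linters can be configured. As a sketch, an `.mllint.yml` along these lines selects which linters the category checks for; the exact schema here is an assumption, so consult mllint's configuration documentation for the authoritative format:

```yaml
# Sketch of an .mllint.yml — schema assumed, see mllint's own docs.
code-quality:
  linters:
    - pylint
    - mypy
    - black
    - isort
    - bandit
```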

1 min · Bart van Oort (bvobart)

Category — Continuous Integration

This category checks whether your project uses Continuous Integration (CI) and how you are using it. Continuous Integration is the practice of automating the integration (merging) of all changes that multiple developers make to a software project. This is done by running an automated process for every commit to your project’s Git repository. This process then downloads your project’s source code at that commit, builds it, runs the linters configured for the project—we hope you include mllint—and runs the project’s tests against the system…
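As an illustration, the automated process described above could be a GitHub Actions workflow roughly like the following. This is a hypothetical sketch: the action versions, Python version and file names are assumptions to adapt to your own project.

```yaml
# .github/workflows/ci.yml — hypothetical sketch of a CI workflow.
name: CI
on: [push, pull_request]

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4            # download the source at this commit
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install -r requirements.txt  # build / install dependencies
      - run: mllint .                         # run the configured linters
      - run: pytest                           # run the project's tests
```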

2 min · Bart van Oort (bvobart)

Category — Custom Rules

This category enables you to write your own custom evaluation rules for mllint. Custom rules can be useful for enforcing team, company or organisational practices, as well as implementing checks and analyses of how your proprietary / closed-source tools are being used. Custom rules may also be useful for creating ‘plugins’ for mllint that implement checks for tools that mllint does not yet have built-in rules for. mllint will pick up these custom rules from your configuration and automatically run their checks during its analysis…
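A custom rule definition in your configuration might look roughly like the sketch below. The field names, the slug format and the contract of the `run` command are assumptions here; the Custom Rules documentation specifies the real schema and what the command must output.

```yaml
# Hypothetical sketch of a custom rule in .mllint.yml — field names assumed.
rules:
  custom:
    - name: Models are stored in the model registry
      slug: custom/model-registry          # assumed slug convention
      details: Checks that trained models are uploaded to our internal registry.
      weight: 1
      run: python ./scripts/check_model_registry.py   # hypothetical script
```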

4 min · Bart van Oort (bvobart)

Category — Data Quality

This category assesses your project’s data quality. It is not implemented yet. The idea is that this will contain rules on whether you have proper cleaning scripts, and it may also include dynamic checks on the data that is currently in the repository, e.g. whether it is complete (no missing values) and whether the types of each value are consistent, perhaps using data-linter or tensorflow-data-validation.
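The kind of dynamic checks meant here can be sketched in a few lines of pandas. The example data is hypothetical; in practice this would run over your repository's dataset.

```python
import pandas as pd

# Hypothetical example dataset; in practice, load the data from your repository.
df = pd.DataFrame({
    "age": [34, 28, None, 45],
    "income": [52000, 48000, 61000, 58000],
})

# Completeness check: which columns contain missing values?
missing = df.isna().sum()
incomplete_columns = missing[missing > 0].index.tolist()
print(incomplete_columns)  # ['age']

# Type-consistency check: all values in a column should share one type.
for column in df.columns:
    types = df[column].dropna().map(type).unique()
    assert len(types) == 1, f"column {column!r} mixes types: {types}"
```

Tools such as data-linter or tensorflow-data-validation automate checks like these at scale.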

1 min · Bart van Oort (bvobart)

Category — Dependency Management

This category deals with how your project manages its dependencies: the Python packages that your project uses to make it work, such as scikit-learn, pandas, tensorflow and pytorch. Proper dependency management, i.e., properly specifying which packages your project uses and which exact versions of those packages are being used, is important for being able to recreate the environment that your project was developed in. This allows other developers, automated deployment systems, or even just yourself, to install exactly those Python packages that you had installed while developing your project....
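One simple way of specifying exact versions is a fully pinned `requirements.txt`; the package versions below are illustrative, not recommendations, and tools such as Poetry or Pipenv achieve the same with lock files.

```
# requirements.txt — pin exact versions so the environment can be recreated.
# Version numbers are illustrative.
scikit-learn==1.3.2
pandas==2.1.3
```

With this in place, `pip install -r requirements.txt` reproduces the same package versions on any machine.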

2 min · Bart van Oort (bvobart)

Category — Deployment

This category evaluates your project’s ability to be deployed in the real world. It is not yet implemented, but may contain rules about Dockerfiles and configurability, among other things. Recommendations: SeldonCore - an open source platform to deploy your machine learning models on Kubernetes at massive scale. Seldon handles scaling to thousands of production machine learning models and provides advanced machine learning capabilities out of the box including Advanced Metrics, Request Logging, Explainers, Outlier Detectors, A/B Tests, Canaries and more…
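As a rough idea of what a Dockerfile rule might look for, a minimal container for serving a model could be sketched as follows. The file names and entry point are hypothetical.

```dockerfile
# Hypothetical Dockerfile sketch for serving a trained model.
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
CMD ["python", "serve.py"]   # serve.py is a hypothetical model-serving script
```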

1 min · Bart van Oort (bvobart)

Category — File Structure

This category deals with the file and folder structure of your ML project. It is not implemented yet. Examples of rules you might see here in the future: the project keeps its data in the ‘./data’ folder; maintains documentation in a ‘./docs’ folder; and keeps its source code in a ‘./src’ folder, or a folder with the same name as the project / package.

1 min · Bart van Oort (bvobart)

Category — Testing

Testing in the context of Software Engineering refers to the practice of writing automated checks to ensure that something works as intended. Testing ML systems is, however, different from testing traditional software systems. In traditional software systems, humans write all the logic that processes whatever data the system handles, whereas in ML systems, humans provide examples (training data) of what we want the desired behaviour to be and the machine learns the logic required to produce this behaviour....
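Even though the learned logic is hard to test directly, the deterministic parts around it can still be tested like traditional software. A minimal sketch of such a test, for a hypothetical feature-normalisation step:

```python
import numpy as np

def normalize(features: np.ndarray) -> np.ndarray:
    """Scale each column to zero mean and unit variance."""
    return (features - features.mean(axis=0)) / features.std(axis=0)

def test_normalize_statistics():
    # A fixed seed keeps the test reproducible across runs.
    rng = np.random.default_rng(seed=42)
    features = rng.normal(loc=5.0, scale=2.0, size=(100, 3))
    normalized = normalize(features)
    # The behavioural contract: each column ends up with ~zero mean, unit variance.
    assert np.allclose(normalized.mean(axis=0), 0.0, atol=1e-8)
    assert np.allclose(normalized.std(axis=0), 1.0, atol=1e-8)

test_normalize_statistics()
```

In a real project this would live in a `tests/` folder and be run by pytest rather than called directly.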

2 min · Bart van Oort (bvobart)

Category — Version Control

This category contains rules relating to version controlling the code and data. Version control software allows you to track changes to your project and helps to work collaboratively with other people within the same project. It also allows you to easily return to an earlier version of your project or merge two versions together. Git is the ubiquitously used tool for version controlling code, but Git is not very efficient at handling large or binary files....
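One common way to handle such large or binary files is Git LFS, configured via a `.gitattributes` file; the file patterns below are illustrative, and tools like DVC are a popular alternative for versioning data specifically.

```
# .gitattributes — track large binary data files with Git LFS instead of plain Git.
# Patterns are illustrative; adapt them to your project's data formats.
*.csv filter=lfs diff=lfs merge=lfs -text
*.h5  filter=lfs diff=lfs merge=lfs -text
```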

1 min · Bart van Oort (bvobart)