Python ML skeleton project#

A generic skeleton for a machine learning project with Python, Hydra, pytest, Sphinx, GitHub Actions, etc., with dummy functionality. It is mostly oriented toward geospatial projects.


Why this project?#

The goal of this project is to present a standard architecture for a Python repository/package, including a full CI/CD pipeline to document, test, and deploy your project with the standard methods of 2022. It can be used as a starting point for any project without reinventing the wheel.

The code is of no interest!#

The code of this project is a pure placeholder: it performs simple arithmetic operations such as addition and subtraction. The next iteration will make the operations more interesting by using a multi-layer perceptron, and will try to add a complete example of a Hydra configuration.

In the near future, it will serve as a worked example of a standard ML pipeline for experimentation and production.

Installation#

Install requirements#

Because GDAL dependencies are present, it is preferable to install the dependencies via conda before installing the package:

  git clone https://github.com/samysung/python_ml_project_skeleton
  cd python_ml_project_skeleton/packaging
  conda env create -f package_env.yml

From pip:#

  pip install pmps

or, for a specific version:

  pip install pmps==vx.x

Other installation options:

From source:#

  python setup.py install

From source using pip:#

  pip install git+https://github.com/samysung/python_ml_project_skeleton
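
Whichever route you use, a quick way to confirm the install is to ask Python for the version of the installed distribution. The sketch below relies only on the standard library's importlib.metadata. The helper name installed_version is illustrative, and it assumes the distribution is named pmps, as in the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "pmps") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # The distribution is not installed in the current environment.
        return None


if __name__ == "__main__":
    found = installed_version()
    if found is None:
        print("pmps is not installed in this environment")
    else:
        print(f"pmps {found} is installed")
```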

Project Architecture#

├── CHANGELOG.rst
├── .codecov.yml
├── deploy
│   └── dockerfile
├── docs
│   ├── add.rst
│   ├── build.sh
│   ├── changelog.rst
│   ├── conf.py
│   ├── deploy.sh
│   ├── index.rst
│   ├── Makefile
│   ├── readme_link.md
│   └── _static
│       └── img
├── .github
│   └── workflows
│       ├── publish.yml
│       ├── test_code.yml
│       ├── test_docs.yml
│       ├── test_packaging.yml
│       └── test_publish.yml
├── .gitignore
├── LICENSE
├── packaging
│   ├── doc_env.yml
│   ├── doc_requirements.txt
│   ├── package_env.yml
│   ├── requirements.txt
│   ├── test_env.yml
│   └── test_requirements.txt
├── pmps
│   ├── api
│   │   ├── add.py
│   │   ├── __init__.py
│   │   └── subtract.py
│   ├── core
│   │   ├── add.py
│   │   ├── __init__.py
│   │   └── subtract.py
│   └── __init__.py
├── .pre-commit-config.yaml
├── .pylintrc
├── README.md
├── readthedocs.yml
├── setup.cfg
├── setup.py
├── tests
│   ├── api
│   │   ├── __init__.py
│   │   ├── test_add.py
│   │   └── test_subtract.py
│   └── __init__.py
└── VERSION
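
The tree keeps the arithmetic in pmps/core and the public entry points in pmps/api. A minimal sketch of how such a split might look is given below, collapsed into a single file so that it runs on its own. The function names, signatures, and the input check are assumptions made for illustration, not the package's actual code.

```python
from numbers import Real


# --- would live in pmps/core/add.py (hypothetical) ---
def _core_add(a, b):
    """Internal implementation: plain addition, no validation."""
    return a + b


# --- would live in pmps/core/subtract.py (hypothetical) ---
def _core_subtract(a, b):
    """Internal implementation: plain subtraction, no validation."""
    return a - b


def _require_numbers(*values):
    """Reject anything that is not a real number (bool is excluded on purpose)."""
    for value in values:
        if isinstance(value, bool) or not isinstance(value, Real):
            raise TypeError(f"expected a real number, got {type(value).__name__}")


# --- would live in pmps/api/add.py (hypothetical) ---
def add(a, b):
    """Public entry point: validate inputs, then delegate to the core layer."""
    _require_numbers(a, b)
    return _core_add(a, b)


# --- would live in pmps/api/subtract.py (hypothetical) ---
def subtract(a, b):
    """Public entry point: validate inputs, then delegate to the core layer."""
    _require_numbers(a, b)
    return _core_subtract(a, b)


if __name__ == "__main__":
    print(add(2, 3))
    print(subtract(5, 3))
```

The point of the layering is that clients only import from the api side, so the core functions can change freely as long as the public functions keep their behaviour.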

Architecture component overview#

| Component | Path | Description |
|---|---|---|
| Python Package | pmps/ | Where the Python executable code lives. It is your root package, as it is the first directory to contain an `__init__.py`, and its name is generally the one you choose for your published package (the one built and published on forges like PyPI, conda, etc.). Don't forget to add an `__init__.py` module to every subpackage to declare it as a Python package. NB: separating core and api into different subpackages is a design choice, not a standard. It comes from the Java world, whereas many Python projects prefer declaring private Python modules. |
| Documentation | docs/ | The source of your documentation: conf.py is where you configure your Sphinx docs, and _static/ holds your additional static files (images, text, icons, video, etc.). The docs are built under docs/_build/html, but this can be changed in the Makefile. |
| Tests Package | tests/ | Where you organize the test code for your executable code. Your unit tests (pytest is the library used) should at least test what you expose to your clients. You can add static analysis of your test code with extensions like mypy and flake8. Use the pytest-cov extension to produce test coverage reports. |
| Python Env | packaging/ | Place for your conda environment files and requirements files. |
| Deployment | deploy/ | Place for Dockerfiles or any other deployment solution. |
| CI/CD workflows | .github/ | GitHub workflow configuration files (details below). |
| CD (documentation publishing) | readthedocs.yml | Configuration of the documentation publication on Read the Docs (see readthedocs link). |
| CI (test coverage publishing) | .codecov.yml | Configuration of the code coverage publication on codecov (see codecov). |
| CI (static analysis publishing) | .pre-commit-config.yaml | Configuration of the pre-commit publication (see pre-commit). |
| CD (packaging) | setup.cfg and setup.py | Configuration files for packaging on PyPI, locally, etc. (see python doc). |
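
The repository root holds both a VERSION file and setup.py. One common arrangement, assumed here rather than taken from the project, is for setup.py to read the version string from that file so that it is declared in a single place. The helper name read_version and the setup() arguments below are illustrative.

```python
from pathlib import Path


def read_version(version_file: Path) -> str:
    """Return the version string stored in a one-line VERSION file."""
    return version_file.read_text(encoding="utf-8").strip()


if __name__ == "__main__":
    # Imported here so the helper above can be reused without setuptools.
    from setuptools import find_packages, setup

    setup(
        name="pmps",
        # Assumption: VERSION sits next to setup.py, as in the project tree.
        version=read_version(Path(__file__).parent / "VERSION"),
        # Ship the package itself, not the test suite.
        packages=find_packages(exclude=("tests", "tests.*")),
    )
```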

CI/CD pipeline#

The first and essential goal is to provide a skeleton that can be quickly adapted to many kinds of projects, with a strong emphasis on continuous integration and continuous deployment. Here is a schematic view of the CI/CD pipeline targeted at open source Python projects, largely inspired by other well-known projects:

DIAGRAM#

[Figure: CI/CD diagram]

GitHub workflows#

test code workflow (.github/workflows/test_code.yml):#

Used to run unit tests (and functional tests, if implemented) on pull request events or pushes to the main branch. It publishes coverage results to codecov.io. It uses the packaging/test_env.yml conda environment file, the GitHub cache action, and codecov/codecov-action.
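
As an illustration of what this workflow runs, here is a sketch of a test module in the style of tests/api/test_add.py. The add function is a local stand-in defined so that the snippet runs by itself. In the real project the test would import it from the package, and both the import path and the signature are assumptions.

```python
def add(a, b):
    """Stand-in for the package's public add function (assumed signature)."""
    return a + b


def test_add_integers():
    assert add(2, 3) == 5


def test_add_is_commutative():
    assert add(1.5, 2) == add(2, 1.5)


def test_add_negative_numbers():
    assert add(-4, 1) == -3


if __name__ == "__main__":
    # pytest discovers test_* functions on its own; this fallback runs them directly.
    for test in (test_add_integers, test_add_is_commutative, test_add_negative_numbers):
        test()
    print("all tests passed")
```

With pytest-cov installed, running pytest --cov=pmps --cov-report=xml would produce an XML coverage report of the kind that can be uploaded to codecov.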

test docs workflow (.github/workflows/test_docs.yml):#

Used to test the build of the Sphinx documentation. It runs on pull request events or pushes to the main branch. It uses the packaging/doc_env.yml conda environment file and the GitHub cache action.
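
For orientation, a Sphinx conf.py is an ordinary Python module whose top-level variables Sphinx reads at build time. The sketch below is not the project's actual configuration: the extension list and the theme are assumptions, and it uses only extensions and a theme that ship with Sphinx.

```python
# docs/conf.py (illustrative sketch, not the project's real file)

# Name shown in the generated pages.
project = "pmps"

# Built-in Sphinx extensions: pull API docs from docstrings and link to source.
extensions = [
    "sphinx.ext.autodoc",
    "sphinx.ext.viewcode",
]

# Extra static assets (images, icons, ...) are served from docs/_static.
html_static_path = ["_static"]

# Directories Sphinx should skip when collecting source files.
exclude_patterns = ["_build"]

# "alabaster" is Sphinx's default theme; swap in another one if it is installed.
html_theme = "alabaster"
```

A CI job could then build the pages with sphinx-build -W -b html docs docs/_build/html, where -W turns warnings into errors so that a broken reference fails the build.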

publish workflow (.github/workflows/publish.yml):#

Used to publish the package to PyPI when a new tagged version or release is published. It uses the packaging/package_env.yml conda environment file, the GitHub cache action, the GitHub upload and download artifact actions, and gh-action-pypi-publish.

test publish workflow (.github/workflows/test_publish.yml):#

Same workflow as above, but on a test branch and the test.pypi forge, for testing improvements to the deployment recipes.

test packaging workflow (.github/workflows/test_packaging.yml):#

Workflow triggered by a cron event (see crontab-guru) every n hours. Used to test that the package has been published and that the latest version works.
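
One way to script such a check, sketched here under assumptions, is to ask PyPI's JSON API for the latest published version and compare it with the one you expect. The helper names are hypothetical, and the workflow in this repository may perform the check differently. The parsing step is kept separate from the network call so that it can be exercised offline.

```python
import json
from urllib.request import urlopen


def latest_version_from_payload(payload: dict) -> str:
    """Extract the latest release version from a PyPI JSON API response."""
    return payload["info"]["version"]


def fetch_latest_version(package: str = "pmps", timeout: float = 10.0) -> str:
    """Query https://pypi.org/pypi/<package>/json for the latest version."""
    url = f"https://pypi.org/pypi/{package}/json"
    with urlopen(url, timeout=timeout) as response:
        return latest_version_from_payload(json.load(response))


if __name__ == "__main__":
    # Needs network access; prints whatever version PyPI currently reports.
    print(fetch_latest_version())
```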

GitHub workflows based on GitHub webhooks or GitHub Apps#

Some workflow tasks are handled by third-party applications, such as the Read the Docs publication or the online pre-commit static analysis.

Pre-commit#

The pre-commit action is launched via a GitHub App (pre-commit.ci) on every commit pushed to the remote. It is configured via the file .pre-commit-config.yaml.

Read-the-docs publication#

Read the Docs publishes a new documentation version via a webhook subscribed to push and commit events. You can configure which type of push triggers the process in the configuration section on readthedocs.org. See the Read the Docs documentation for more detail.