Let’s create an open-source project in Python from scratch
06 Feb 2020
In the summer of 2018, I had the privilege of creating a plotting library in Ruby from scratch (have patience, we will get to Python soon). That is when I realised that creating a package in a programming language is quite different from simply using the language. You need to make sure that your code is readable, that its behaviour is reproducible, and that it is robust, so that you can make changes to the code-base without affecting the prior functionality.
To solidify the software development concepts that I learned, I decided to use them with Python. In this post we will create a simple open-source Python package from scratch that should work with Python 2.7 and Python 3.
The code used in this blog post can be found here.
Let us name our package sampkg (named after the Greek god of sample packages). Our package is going to have a module, sammod (which is actually named after the Nordic god of sample modules).
The module will contain a class (Samclass). Each instance will have a string (samstring) and a function (sam_inst_func) that returns the string with “hello ” prepended; the class itself will have an integer class attribute (counter) that counts the number of instances.
Along with Samclass, sammod will also contain a function (sam_func) that takes a Samclass object and returns a string containing the natural logarithm of counter (calculated using numpy) and the reversed samstring.
Now that we have chalked out plans for our package, we will look into the software development aspect.
First, we will create a project hosted on GitHub for sampkg (make sure you include a .gitignore for Python) and clone it.
We will now set up the package directory, followed by tests, code linters, the setup script and continuous integration, and then we will put our nifty package on PyPI.
So we will create the package directory to look like this:
sampkg/                  <-- In my case I cloned it to my user's home folder
├── LICENSE
├── README.md
├── sampkg               <-- our package
│   ├── __init__.py      <-- contains info about the package
│   └── sammod           <-- module in our package
│       ├── __init__.py  <-- imports the contents of the module
│       └── sammod.py
├── .gitignore
├── setup.py             <-- our setup script
└── tests                <-- we will put all our tests here
    └── sammod
        ├── test_samclass.py
        └── test_sam_func.py
Before we write any code, we need to plan out what that code is going to do. For that, we will write tests that our code needs to pass.
After all, standardised testing is the best measure of capability. /s
We will be using pytest as our testing tool. First, we need to install pytest.
pip install pytest
Since we have a class and a function in the module sammod, we shall write two tests. Make sure that the names of the files containing the tests, and the names of the test functions, start with “test_”.
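The two test files are not reproduced here, so the following is only a sketch of what tests/sammod/test_samclass.py might contain. The stand-in Samclass at the top is an assumption added so that the block runs on its own; in the project it would be replaced by an import from sampkg.sammod, and the tests would fail until that module exists.

```python
# tests/sammod/test_samclass.py (sketch)
# In the real project, replace the stand-in below with:
#     from sampkg.sammod import Samclass


class Samclass(object):
    """Stand-in for the class described above (an assumption)."""

    counter = 0  # class attribute: number of instances created

    def __init__(self, samstring):
        self.samstring = samstring
        type(self).counter += 1

    def sam_inst_func(self):
        return "hello " + self.samstring


def test_sam_inst_func_prepends_hello():
    obj = Samclass("world")
    assert obj.sam_inst_func() == "hello world"


def test_counter_increases_with_each_instance():
    before = Samclass.counter
    obj = Samclass("world")  # keep a reference so the instance stays alive
    assert obj.samstring == "world"
    assert Samclass.counter == before + 1
```

Comparing the counter against its value at the start of the test keeps the check independent of how many instances other tests have already created.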
We will run the tests from the project’s base directory.
python -m pytest
As expected, the tests are failing, as we still need to write the code to make them pass.
You might be wondering why we did not call pytest to run the tests. You can read about it here.
Let’s write the code for sammod:
And edit ~/sampkg/sampkg/sammod/__init__.py:
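The embedded code is not reproduced here, so this is a minimal sketch of a first draft of sammod.py that follows the description above. Two things are assumptions: the exact layout of the string returned by sam_func, and the use of math.log from the standard library in place of numpy's log (swapped in so the sketch needs no third-party install; the post itself uses numpy). The __init__.py line is shown as a comment at the end.

```python
# sampkg/sammod/sammod.py (first-draft sketch)
import math  # stand-in for numpy; the post computes the logarithm with numpy


class Samclass:
    """Holds a string and counts how many instances have been created."""

    counter = 0  # class attribute shared by all instances

    def __init__(self, samstring):
        self.samstring = samstring
        type(self).counter += 1

    def sam_inst_func(self):
        """Return the stored string with "hello " prepended."""
        return "hello " + self.samstring


def sam_func(samobj):
    """Return ln(counter) followed by the reversed samstring.

    The "<log> <reversed>" layout is an assumed format.
    """
    log_of_counter = math.log(samobj.counter)
    return "{} {}".format(log_of_counter, samobj.samstring[::-1])


# sampkg/sammod/__init__.py would then re-export the public names:
#     from .sammod import Samclass, sam_func
```

With a single instance created, counter is 1 and its natural logarithm is 0.0, so sam_func would return "0.0" followed by a space and the reversed string.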
Running the tests again we get:
Seems like I have forgotten to reduce the counter when an object of Samclass is destructed.
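One way to make that fix, sketched here as an assumption about what the change looks like, is a __del__ method that decrements the counter. The sketch relies on CPython's reference counting, where __del__ runs as soon as the last reference disappears; other interpreters may delay it.

```python
class Samclass:
    counter = 0  # number of live instances

    def __init__(self, samstring):
        self.samstring = samstring
        type(self).counter += 1

    def __del__(self):
        # Called when the instance is about to be destroyed.
        type(self).counter -= 1


first = Samclass("a")
second = Samclass("b")
live_before = Samclass.counter  # both instances are alive here
del second  # drops the last reference, so __del__ runs (on CPython)
live_after = Samclass.counter
```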
Fixing the code: And running the tests again, we get:
Linting your code
Why do you need to write beautiful code?
To quote John Keats:
A thing of beauty is a joy for ever:
Its loveliness increases; it will never
Pass into nothingness;
The same is true for your code. A well-written code-base is easy to read, maintain and expand.
However, beauty is in the eye of the beholder. I might prefer to indent with a tab and you might prefer to indent with two spaces. This will not only lead to inconsistencies in the project, it will also lead to those nasty TabErrors.
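To see the kind of error meant here, the sketch below compiles two tiny snippets held in strings. The helper name mixes_tabs_and_spaces is hypothetical; compile and TabError are built into Python 3.

```python
def mixes_tabs_and_spaces(source):
    """Return True if Python 3 rejects `source` with a TabError."""
    try:
        compile(source, "<example>", "exec")
    except TabError:
        return True
    return False


# Second line is indented with a tab, third with eight spaces.
mixed = "if True:\n\tx = 1\n        y = 2\n"
# Both body lines are indented with four spaces.
consistent = "if True:\n    x = 1\n    y = 2\n"

print(mixes_tabs_and_spaces(mixed))       # True
print(mixes_tabs_and_spaces(consistent))  # False
```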
Hence, it is better to follow a fixed set of rules. Most projects follow the standard PEP8 guidelines.
To check whether our code is on par with the desired guidelines, we will use a code linter named Flake8.
To install flake8, simply type:
pip install flake8
Now we run it from the project base:
This will go through all the .py files in the package and point out all the PEP8 violations.
This is what our code looks like after cleaning:
See, our code is much better and more readable now!
Alternatively, you can use flake8 with some text editors and get real-time feedback while typing your code.
Whenever a user downloads our package, we would like all the dependencies for the package to be installed (which is numpy in this case) and the tests to be run. We create a setup.py file to facilitate that.
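The original setup script is not reproduced here, so the following is a minimal sketch rather than the author's file. The name, the version 0.1.0, the two package paths and the numpy dependency come from this post; the description text is a placeholder. The guard around setup() is an addition so that the block does nothing when it is run without a command.

```python
# setup.py (sketch)
import sys

SETUP_KWARGS = dict(
    name="sampkg",
    version="0.1.0",
    description="A sample package",  # placeholder text (assumption)
    packages=["sampkg", "sampkg.sammod"],
    install_requires=["numpy"],  # installed together with the package
)

# Only call setup() when a command such as `install` or `sdist` is given,
# e.g. `python setup.py install`.
if __name__ == "__main__" and len(sys.argv) > 1:
    from setuptools import setup

    setup(**SETUP_KWARGS)
```

A typical setup.py calls setup() directly at module level; the guard here only exists to keep the sketch harmless when executed on its own.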
What is the use of setup.py?
Now the package can be installed in the active Python environment by using:
python setup.py install
It can be simply uninstalled by:
pip uninstall sampkg
We will also create a setup.cfg:
So that we can run the tests when we type:
python setup.py test
You can go through the official Python documentation for creating the setup file here.
Since we are building an open source project, we can expect others interested in the project to be able to contribute to it. Before merging their pull requests, we need to be sure that their code changes integrate well with the existing codebase for different environments supported by us. This is where Continuous integration (CI) comes in.
As the name suggests, it continuously runs integration tests for every commit/pull request to your repository. You can find a list of Continuous Integration services that you can use here.
I will be using Travis-CI, a continuous integration service which is free for open source projects on GitHub.
First, you need to authorise Travis-CI to access your GitHub account and select the repository that you want Travis-CI to build. Follow steps one to three here (https://docs.travis-ci.com/user/tutorial/#to-get-started-with-travis-ci). Spoiler Alert! Make sure that you do not read step number four.
After selecting the repository, it will trigger a build which will fail with this error:
This is because we haven’t pushed a file that will configure Travis-CI to our repository yet, so it reverts to a default config file.
You read step four, didn’t you? I am not angry, just disappointed.
Sigh, now you know that to instruct Travis-CI about the build specifications we need to create a configuration file named “.travis.yml”. We will create the following file in the root of our project:
This .yml file tells our continuous integration service how to install our package, what scripts to run and what environments we want to test it for.
Now that we have written it, we commit the file to the repository. Travis-CI automatically picks up the commit and runs the integration tests in the environments we defined. Look at the tests run!
It seems like our tests are failing for two environments, Python 2.7 and nightly builds.
Since we want our package to work with the stable releases, we will remove the “nightly” environment from our .travis.yml file.
However, we do want our package to work with Python 2.7. We need to investigate the logs for this environment. It seems like type(self) within Samclass is returning “instance” instead of “Samclass” in Python 2.7. After some effective Googling, I came across this answer. We need to explicitly inherit from the “object” class.
Fixing the code:
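In Python 2.7 a class without an explicit base is an old-style class, and type() of its instances is the generic instance type rather than the class. A sketch of the fix, showing only the counter logic:

```python
class Samclass(object):  # explicit base: new-style class on Python 2.7 too
    counter = 0

    def __init__(self, samstring):
        self.samstring = samstring
        type(self).counter += 1  # type(self) is now Samclass itself


obj = Samclass("world")
print(type(obj).__name__)  # Samclass
print(Samclass.counter)    # 1
```

In Python 3 every class already inherits from object, so the explicit base changes nothing there.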
Rejoice! Everything is passing.
Whenever we push a new commit now, Travis-CI will tell us whether it builds successfully in all the desired environments!
PS: You can also specify the OS and a lot more; make sure to go through their documentation.
Uploading your project on PyPI
We have worked really hard to create our project; now we need a way to distribute it so that people around the world can easily download it. We want the community to be able to download sampkg using pip install sampkg, just like we easily downloaded pytest.
To do so first we need to create a PyPI account.
After creating the account, we will pack our code for distribution. We can use either a source distribution (sdist) or a wheel to do so. I have a crazy idea, we can use both!
pip install setuptools wheel
python setup.py sdist bdist_wheel
This will add two files (along with other artefacts) to a newly created dist folder:
What is the difference between these two? Read about that for yourself.
Our code is now ready for distribution. To upload it to PyPI we will use twine:
pip install twine
twine upload dist/*
After running the above commands you will be prompted to enter your PyPI credentials, after which you will receive a response similar to:
View at: https://pypi.org/project/sampkg/0.1.0/
Congratulations! Your package can now be downloaded and used by anyone by using:
pip install sampkg
If you want to update your package and publish the changes in PyPI, update the version number in setup.py and run:
python setup.py sdist bdist_wheel
twine upload --skip-existing dist/*
You can also update the package you downloaded using pip by:
pip install --upgrade sampkg
If you have been following this blog post, your package should look something like this:
├── LICENSE
├── README.md
├── dist
│   ├── sampkg-0.1.0-py3-none-any.whl
│   └── sampkg-0.1.0.tar.gz
├── sampkg
│   ├── __init__.py
│   └── sammod
│       ├── __init__.py
│       └── sammod.py
├── sampkg.egg-info
│   ├── PKG-INFO
│   ├── SOURCES.txt
│   ├── dependency_links.txt
│   ├── requires.txt
│   └── top_level.txt
├── .gitignore
├── .travis.yml
├── setup.cfg
├── setup.py
└── tests
    └── sammod
        ├── test_sam_func.py
        └── test_samclass.py
I have included dist and .egg-info for your viewing pleasure; they will (and should) be ignored by your .gitignore file when you commit your changes.
Now you are familiar with the tools you need to create an open-source Python project. What are you waiting for? Go create the next disruptive package!
Make sure to tweet to me @pgtgrly if this post helped you or if you have any suggestions. I would love to hear from you!