Python for Scientific Computing
I am a signal processing engineer, and most of my work involves creating algorithms that make sense of data. The data I work with usually comes from an automotive radar, but it doesn't necessarily have to; for me, the data could be from any source, and as long as there is information in it, I can find a way to extract it. The complication in my line of work is that the data is usually noisy. Therefore, I end up doing quite a bit of statistical simulation to verify my work.
Most engineers and scientists in my field work with MATLAB®. I, however, prefer Python. So if you are doing something similar to what I have just described, you might find this post useful. What follows are some lessons from 2-3 years of consistent use of Python for scientific computing. This is of course a subjective account, but I am happy to discuss any of the points below via email.
Filtering Noise (Source: Catchpoint)
The don’ts
I will start with a few don'ts — practices I believe are counterproductive if you want to be good at scientific computing.
Don’t reinvent the wheel
If you are a software engineer, you can ignore this. But if you are trying to do signal processing, do not reinvent the wheel. Do not write your own fourier_transform() function. If you are learning to optimize the Fourier transform or trying to understand it, by all means, go ahead. But most of the time, borrowing someone else's implementation will be much more productive. Most importantly, your own implementation will be slower, and that will inflate your simulation time. Learn to use numpy and scipy.
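As a minimal sketch of this point, the snippet below uses numpy's built-in FFT to recover a tone buried in noise; the signal parameters (a 50 Hz tone, 1 kHz sampling) are illustrative, not from the original post:

```python
import numpy as np

# Synthetic signal: a 50 Hz tone buried in Gaussian noise, sampled at 1 kHz for 1 s.
fs = 1000
t = np.arange(0, 1, 1 / fs)
rng = np.random.default_rng(0)
signal = np.sin(2 * np.pi * 50 * t) + 0.5 * rng.standard_normal(t.size)

# Library FFT instead of a hand-rolled fourier_transform().
spectrum = np.abs(np.fft.rfft(signal))
freqs = np.fft.rfftfreq(signal.size, d=1 / fs)

peak = freqs[np.argmax(spectrum)]
print(peak)  # prints 50.0 -- the injected tone dominates the spectrum
```

The library routine is an O(N log N) FFT implemented in compiled code; a naive hand-written DFT loop would be O(N²) in interpreted Python, which is exactly the simulation-time penalty the paragraph above warns about.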
Don’t use the latest version of Python
Python is a work in progress. Several versions of Python exist, and I have found that the sweet spot is one version below the latest. Most packages on PyPI will support it, and you will not run into issues where a dependency of a dependency of your package does not yet support the latest version. Python is notoriously bad at handling this.
Don’t over engineer your code
I have noticed that most of the code that actually helps reach scientific conclusions can be achieved with very little. Keep your code and data simple. You don't need inheritance, and you don't need that shiny new data structure. Stick to the basics and things will be much easier to manage in the long run. It is not about how many lines of code, but how few. Reduce the number of moving parts. A side note: if you are working with a team on a large project, prove your idea first in a smaller repository with a simplified test bench before shipping it to the larger system.
The do’s
Use version control
Even if you are the only person working on a project, use git. This is crucial. If you want to work with others, it is doubly important. git is not restricted to software projects; it suits any text files that require frequent revision. Even this blog is under version control. Do not have 10 versions of your code floating around. Work on two branches, main and develop, and keep them regularly in sync.
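The two-branch workflow above might look like the following sketch; the file name and commit message are placeholders:

```shell
# Day-to-day work happens on develop.
git checkout develop
git add analysis.py
git commit -m "Improve detector threshold"

# Once the results are verified, fold develop back into main
# so the two branches stay in sync.
git checkout main
git merge develop
```

Keeping main as the "known good" branch means you can always reproduce your last trusted results, while develop absorbs the day-to-day churn.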
Use a virtual environment
As mentioned earlier, Python is known to be difficult to handle with respect to dependencies. Dependencies in Python are both a boon and a curse. I find PyPI to be a democracy of sorts, with the best packages rising to the top. This allows programmers to help each other out and reuse things. However, the version of a package that you use might not be the same as the one installed on your colleague's machine. Therefore, maintain a venv for every git repository on your machine, and keep a requirements.txt file in the root folder so that others can easily install the dependencies your code needs.
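A minimal per-repository setup using the standard library's venv module might look like this (the `.venv` directory name is a common convention, not a requirement):

```shell
# Create and activate an isolated environment inside the repository.
python3 -m venv .venv
source .venv/bin/activate

# Record the exact installed versions so colleagues can reproduce them.
pip freeze > requirements.txt

# On another machine, restore the same environment with:
#   pip install -r requirements.txt
```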
Automate the testing
Most of the time, simulations require generating synthetic data, or reading stored data, and running it through your code. In my experience this is the most time-consuming part of scientific computing. Learn to automate it and it will improve your workflow tremendously. Bonus points if you can make this run on every push to the git server via Jenkins or some other automated testing platform.
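One way to automate this is to wrap the synthetic-data generation in a pytest test; a runner like Jenkins can then invoke `pytest` on every push. The detector below is a hypothetical stand-in for whatever algorithm is under test:

```python
# test_detector.py -- run automatically with `pytest`.
import numpy as np


def detect(signal, threshold):
    """Toy detector: return indices where |signal| exceeds the threshold."""
    return np.flatnonzero(np.abs(signal) > threshold)


def test_detects_known_spike():
    # Generate reproducible synthetic data: low-level noise plus one known spike.
    rng = np.random.default_rng(42)
    noise = 0.1 * rng.standard_normal(100)
    noise[30] = 5.0  # inject a spike at a known position

    hits = detect(noise, threshold=1.0)
    assert 30 in hits
```

Seeding the random generator keeps the statistical simulation reproducible, so a failing run on the CI server points at a code change rather than an unlucky noise draw.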
Document the code
At a minimum, maintain a README.md for your repository. Explain briefly what your code does, how to install its dependencies, and how to run it. Start with this, and when the experiment starts showing results, begin maintaining a longer-form report containing the ins and outs of the code and the mathematics behind it. I prefer LaTeX for its superior typesetting and its ubiquity in academia. It can also export the same report as an HTML document, which you can serve as a static page alongside your code.