Reading DDIA and Solving Gossip Glomers in Python: Part 1
While I was writing my yearly goals for 2024, I knew in the back of my mind I wanted some kind of technical challenge. I didn’t write anything about it in the post since I couldn’t come up with an idea on the spot. In hindsight, the goal should have been to come up with a goal. I’ve since come up with, and started, such a goal.
One thing I’ve learned about learning is that it sticks best when it relates to your day-to-day activities. At Parse.ly, we deal with distributed systems daily. One of the canonical references on distributed systems is Designing Data-Intensive Applications (DDIA). In fact, our internal Parse.ly docs contained this book reading map, with DDIA at the center. The first part of my goal was to finish reading the book (completed Jun 10, 2024).
Reading a book is one thing; understanding it is another. That’s when I decided to apply what I had learned to fly.io’s Gossip Glomers. The Gossip Glomers are a series of distributed systems challenges that use Maelstrom under the hood. Maelstrom is a slick framework with a bunch of goodies baked in for testing distributed systems. So, the second part of this goal is to complete all the Gossip Glomers (still in progress).
As of right now, I’m on the fourth challenge. I’ve found these challenges well designed for dipping your toes into distributed waters. Most of the solutions you’ll find are, understandably, written in Golang. Python is heavy-handed, but I’m a Parseltongue at heart. If you are also a Pythonista and want to get started on these challenges, this post is for you. I’ll walk through getting things installed and solving the first problem: Echo.
Getting Started
First, we need to be able to run Maelstrom. Maelstrom is built with Clojure, so we’ll need Java. Maelstrom also provides some plotting/graphing, which is clutch when you are trying to debug your code. You’ll need to install the following software:
- JDK (I’m using jdk11_headless)
- gnuplot
- graphviz
Next you will need the Maelstrom code:
wget https://github.com/jepsen-io/maelstrom/releases/download/v0.2.3/maelstrom.tar.bz2
tar -xvf maelstrom.tar.bz2
The Maelstrom executable is located at ./maelstrom/maelstrom after you untar the files.
I leave the executable at that location and use a Makefile to automate the setup and run commands.
The final touch is getting the Python code in order. First, you’ll want to download maelstrom.py. This is a library that provides the constructs for building your Maelstrom code in Python.
Second is echo.py, which is the first solution. It’s more of a “hello world” than an actual challenge.
You also need to make sure that you chmod +x echo.py. If the file is not executable, Maelstrom will complain and error out.
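To get a feel for what the echo workload asks of a node, here is a minimal sketch that speaks Maelstrom's JSON protocol directly. It is not the echo.py mentioned above and it does not use maelstrom.py; the names handle and main are my own, chosen for illustration. It relies only on the documented message shapes: Maelstrom sends an init message first, then echo requests, one JSON object per line on stdin, and expects init_ok and echo_ok replies on stdout.

```python
#!/usr/bin/env python3
"""Echo node sketch that talks the Maelstrom JSON protocol without a helper library."""
import json
import sys


def handle(state, msg):
    """Return the reply for one incoming message, or None if we ignore it.

    `state` is a plain dict (an assumption of this sketch) holding our node ID.
    """
    body = msg["body"]
    if body["type"] == "init":
        # Maelstrom tells each node its own ID before the workload starts.
        state["node_id"] = body["node_id"]
        reply_body = {"type": "init_ok"}
    elif body["type"] == "echo":
        # Send the payload back unchanged.
        reply_body = {"type": "echo_ok", "echo": body["echo"]}
    else:
        return None
    # Replies point back at the request through in_reply_to.
    reply_body["in_reply_to"] = body["msg_id"]
    return {"src": state["node_id"], "dest": msg["src"], "body": reply_body}


def main(stdin=sys.stdin, stdout=sys.stdout):
    """Read one JSON message per line and write one JSON reply per line."""
    state = {"node_id": None}
    for line in stdin:
        if not line.strip():
            continue
        reply = handle(state, json.loads(line))
        if reply is not None:
            # Flush so the reply is not stuck in a buffer.
            print(json.dumps(reply), file=stdout, flush=True)
```

To run something like this under Maelstrom you would add a call to main() at the bottom of the file and mark it executable. Keeping handle as a pure function makes it easy to poke at in a REPL before involving Maelstrom at all.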
After gathering all your files, your directory layout should look like the following:
├── echo.py
├── maelstrom
│ ├── ...
│ ├── maelstrom
├── maelstrom.py
You should now be able to run:
./maelstrom/maelstrom test -w echo --bin echo.py --node-count 1 --time-limit 10
The command has verbose messaging while it runs. You’ll want to look for the following message at the bottom of the output:
...
Everything looks good! ヽ(‘ー`)ノ
...
That’s it! You now have all the building blocks to start progressing on the next challenges.
Extra Tips
Now that you’ve got your feet wet, here are a few essential tips.
Each run of Maelstrom generates results in a store folder, separated by workload and then by run.
The latest run is symlinked to store/latest.
This folder contains your results and is where you do all your debugging.
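If you would rather poke at the latest run from Python than from the shell, something like the following could work. It is only a sketch: the helper names node_log_paths and tail are made up for this post, and it assumes the per-node logs sit under store/latest/node-logs with a .log extension, as in the n0.log example in the Logging section.

```python
from pathlib import Path


def node_log_paths(store_dir="store"):
    """Return the per-node log files of the most recent run, sorted by name."""
    # store/latest is a symlink to the newest run; resolve() follows it.
    latest = (Path(store_dir) / "latest").resolve()
    return sorted((latest / "node-logs").glob("*.log"))


def tail(path, n=20):
    """Return the last n lines of a log file as a list of strings."""
    return Path(path).read_text().splitlines()[-n:]
```

A typical use would be looping over node_log_paths() and printing tail(path) for each node right after a failed run.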
Logging
Most developers I know still use the tried-and-true method of print statements.
The problem here is that Maelstrom uses stdin and stdout for its network calls, so a stray print to stdout gets mixed in with your protocol messages.
Our maelstrom.py has all the necessary methods built in to help you out.
node.spawn(node.log("I am here"))
Calling node.log writes messages to a separate log file for each node.
Node 0’s log, for instance, can be viewed with cat store/latest/node-logs/n0.log.
I also wrap node.log in node.spawn so that if you write bad code (who me? never) you don’t block your logging.
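If you are not using maelstrom.py, the same idea can be sketched in a few lines. Maelstrom captures whatever a node writes to stderr into that node's log file, so the trick is simply to keep debug output off stdout. The log function below is a hypothetical stand-in written for this post, not the helper from maelstrom.py.

```python
import sys


def log(node_id, *args, stream=None):
    """Write a debug line to stderr, leaving stdout free for protocol messages.

    `stream` exists only so the function can be pointed elsewhere in tests.
    """
    stream = sys.stderr if stream is None else stream
    # Flush right away so the line is not lost if the node crashes.
    print(f"[{node_id}]", *args, file=stream, flush=True)
```

Calling log("n0", "I am here") from inside a handler would then show up in the node's log under store/latest/node-logs rather than corrupting the message stream.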
Viewing Network Communication
Here’s where the graphing feature of Maelstrom comes in handy.
Each time you run Maelstrom, it generates an SVG of all the network traffic being sent.
You can view this at store/latest/messages.svg.
This is invaluable in the later challenges, when the code gets more complex, and again when you start performance tuning.
Writing distributed code in your head is tough; it’s better to have a picture to go along with it.
Now that you have all the parts, you can move onto Challenge #2: Unique ID Generation. Good luck!