Python is a powerful language, but setting the workflow up can be a hassle. Let’s make it simple.
In this series, we’re going to set up Python from scratch and use it to test whether Zipf’s Law holds on a given data set. We’re going to break this tutorial into several articles: one to set up Python, a few to cover the features of Python we’ll use, and one to actually write the program.
This article will set you up to write Python to your heart’s content.
This Article is for Beginners
If you already know Python and how to set things up, this series is not for you. It is aimed at people who want to get into Python programming, not people who already know the language. Later on, I’ll do more advanced stuff with Python and other programming languages that you might find interesting.
Other Ways of Getting Started
There are plenty of other ways of getting started with Python, from setting up things like Anaconda to using Jupyter notebooks. I won’t talk about those ways because overwhelming people with choice is a bad way to teach them something. If you write an article that contains another way of getting started with Python or you feel that I’ve left out some details, let me know in the responses below.
An Unusual Pattern
Zipf’s Law predicts that the nth most common item in a data set shows up 1/n times as often as the most common item. For example, the second most common item should show up half as often as the most common item, and the twentieth most common item should show up one twentieth as often as the most common item. This empirical law was originally derived from word counts, but it seems to apply to many other things, like cities and their populations (the second most populous city has half as many people as the most populous city, etc.). In this tutorial series, we’re going to try to verify this law for a large body of text (specifically, Moby Dick, because it’s public domain and uses mostly ASCII characters).
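To make the prediction concrete, here is a minimal sketch of the idea in Python. It is not the program we’ll build in this series; the function names (zipf_expected, rank_counts) and the tiny sample sentence are made up for illustration, and splitting on whitespace is a simplifying assumption about what counts as a word.

```python
from collections import Counter


def zipf_expected(top_count, n_ranks):
    """Counts Zipf's Law predicts for ranks 1..n_ranks,
    given how often the most common item appears."""
    return [top_count / rank for rank in range(1, n_ranks + 1)]


def rank_counts(text, n_ranks):
    """The n_ranks most common words in text, with their counts.
    Assumption: words are whatever is separated by whitespace."""
    words = text.lower().split()
    return Counter(words).most_common(n_ranks)


# A made-up sample, far too small to say anything about the law itself.
sample = "the whale the sea the ship the whale a sea the whale"

observed = rank_counts(sample, 3)
predicted = zipf_expected(observed[0][1], 3)

# Print each word's actual count next to the count Zipf's Law predicts.
for (word, count), expected in zip(observed, predicted):
    print(word, count, round(expected, 2))
```

Comparing the two columns, rank by rank, is essentially what it means to test the law. A real test needs a much larger text, which is where Moby Dick comes in.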