A public, interactive map to explore Black-owned businesses in the Greater Boston Area


Introduction

The COVID-19 pandemic has taken a serious toll on small businesses across the United States, and Black-owned businesses continue to be hit the hardest. In the past year, many have watched the escalation of ongoing economic and racial justice crises compound with a crisis of public health. Calls to support local and Black-owned businesses can hardly begin to accomplish all that’s needed to put an end to these concurrent crises; such calls can, however, be a small step in the right direction toward narrowing the racial wealth gap and repairing local economies.

The map discussed in this article is intended…



Using the Fibonacci Sequence

As a devout Python fan, I’ve regrettably shied away from exploring many data visualization tools beyond the scope of libraries like Matplotlib, Seaborn, and Plotly. While perusing the diverse capabilities of D3.js in this visualization gallery, however, I was excited to dip my toes in the water.

D3 is a JavaScript library that supports highly customizable and interactive web-based data visualizations. It’s short for “Data Driven Documents,” and it allows developers to create and manipulate web documents based on data. Having some prior familiarity with HTML, CSS and JavaScript will make it easier to get started with…


An Analysis of StreetEasy Rental Listings


Introduction

When I think about the ever-rising cost of rent in New York City, I deeply lament the fact that the Rent is Too Damn High Party has failed to get Jimmy McMillan elected as Governor. There’s no question that New Yorkers need him in office, but until that day comes, you can rely on this analysis to learn how to estimate rental costs in the city.

Specifically, we’ll be working with a dataset of 3,500 StreetEasy rental listings in Manhattan. Linear regression is an appropriate model to consider when predicting the value of a continuous dependent…


An exploration of three years of dating app messages with NLP


Introduction

Valentine’s Day is around the corner, and many of us have romance on the mind. I’ve avoided dating apps recently in the interest of public health, but as I was reflecting on which dataset to dive into next, it occurred to me that Tinder could hook me up (pun intended) with years’ worth of my past personal data. If you’re curious, you can request yours, too, through Tinder’s Download My Data tool.

Not long after submitting my request, I received an e-mail granting access to a zip file with the following contents:


Using Built-In Functions and Datatypes


Introduction

I’m a sucker for puzzles, and my current obsession is the New York Times Spelling Bee. In this daily puzzle, players are given a ‘hive’ of seven letters. They’re tasked with creating words which may only contain those letters, and which must include the letter at the center of the hive (known as the ‘center letter’). Valid words must also be at least four letters long. In the imitation puzzle below, for instance, ‘accolade’ would be a valid word, but ‘load’ (no center letter) and ‘cap’ (too short) would not be.
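
The three rules above (only hive letters, center letter required, at least four letters) can be sketched as a small Python check. This is a minimal illustration rather than the approach taken in the full article: the function name `is_valid_word`, the hive letters "acdelop", and the center letter "c" are all assumptions chosen to be consistent with the three example words, since the puzzle image itself isn't reproduced here.

```python
def is_valid_word(word, hive, center, min_length=4):
    """Return True if `word` satisfies the three Spelling Bee rules."""
    word = word.lower()
    return (
        len(word) >= min_length      # valid words are at least four letters long
        and center in word           # the center letter must appear
        and set(word) <= set(hive)   # every letter must come from the hive
    )


# Hypothetical hive and center letter (assumed, not taken from the puzzle image).
HIVE = "acdelop"
CENTER = "c"

for candidate in ("accolade", "load", "cap"):
    print(candidate, is_valid_word(candidate, HIVE, CENTER))
```

Comparing sets with `<=` tests whether the word's letters are a subset of the hive, which also allows letters to be reused any number of times.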


Examples of Inner, Left, Right and Full Outer Joins Using the World Database


Introduction

Structured Query Language (SQL) is the standard programming language for communicating with relational databases, which organize related data in the form of tables. Understanding the basics of relationships and joins is necessary for working with any relational database management system.
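
To make the idea of a join concrete, here is a minimal sketch using Python's built-in `sqlite3` module and an in-memory database. The two tiny tables (`country` and `city`) and their rows are hypothetical stand-ins invented for illustration; they are not the actual schema or contents of any real database. The sketch contrasts an inner join, which keeps only matching rows, with a left join, which keeps every row from the left table.

```python
import sqlite3

# In-memory database with two small, made-up tables (illustrative only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE country (code TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE city (name TEXT, country_code TEXT);
    INSERT INTO country VALUES ('USA', 'United States'), ('FRA', 'France'), ('ISL', 'Iceland');
    INSERT INTO city VALUES ('Boston', 'USA'), ('Paris', 'FRA'), ('Atlantis', NULL);
""")

# Inner join: only cities whose country_code matches a country code.
inner_rows = conn.execute("""
    SELECT city.name, country.name
    FROM city
    INNER JOIN country ON city.country_code = country.code
    ORDER BY city.name
""").fetchall()

# Left join: every city, with NULL (None in Python) where no country matches.
left_rows = conn.execute("""
    SELECT city.name, country.name
    FROM city
    LEFT JOIN country ON city.country_code = country.code
    ORDER BY city.name
""").fetchall()

print(inner_rows)
print(left_rows)
conn.close()
```

In this toy data, the fictional city with no country code drops out of the inner join but survives the left join with an empty country name, while the country with no cities appears in neither result because it sits on the right side of both joins.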

This article covers different types of relationships and joins in SQLite, which supports many of the features of standard SQL but with lower memory requirements, using the world database and SQLiteStudio. The world database contains three tables: ‘City,’ ‘Country,’ and ‘CountryLanguage.’ These tables are all related to each other through a shared country code variable, which allows data across…


Cumulative US Cases by Region


Introduction

As countries around the world continue their efforts to combat the COVID-19 pandemic, data on the virus is tirelessly reported every day. The US is currently facing a surge of new cases in nearly all of its states. While the volume and significance of COVID-19 data can be overwhelming, it’s important to stay informed about recent developments as we all do our part to help fight the spread of the virus. This short article presents some visualizations of case count data by region, and explains the steps behind their creation.

About the Data

The data presented in this article comes from two sources:


An Analysis of NYC 311 Service Requests


Introduction

NYC Open Data is a vast trove of City government datasets that have been made available to the public. One such dataset, 311 Service Requests from 2010 to Present, will be the focus of this article. This 311 data is updated daily and contains information about more than 24 million service requests made since 2010. For those who aren’t familiar, 311 is a phone number used in the U.S. that allows callers to access non-emergency municipal services, report problems to government agencies, and request information. This article discusses my process for exploring trends in a recent subset of the data…



Introduction

The analysis of social media posts may be able to tell us just as much about a user’s political views as voting records or traditional polling. In this article, I’ll explain how, with the help of natural language processing, I built classifiers to predict the partisan bias and message of political posts. The data used for this project is available on Kaggle, and contains 5,000 Facebook and Twitter posts from politicians that were collected in 2015. The full code for this project is available on GitHub.

Exploring the Data

After loading the data, I plotted value counts for the two dependent variables…


Using results from the 2018 General Social Survey


Introduction

The upcoming presidential election is widely considered to be among the most important elections in recent US history. Voter outreach organizations are working tirelessly to ensure a high turnout, especially amid new voter suppression concerns related to the COVID-19 pandemic. Past research has suggested that socioeconomic variables such as education, wealth, and occupation are strong predictors of voter turnout. In this article, I’ll present how I used a selection of the most recent data from the General Social Survey (GSS) to build a binary classifier to predict voter participation. …

Avonlea Fisher

Brooklyn-based data scientist and political scientist, interested in using the power of data for public good. https://www.linkedin.com/in/avonlea-fisher/
