If I can do it - so can you!
Here are some notes on an anecdotal path towards becoming a data visualization practitioner - someone who does data vis. These steps are in rough sequential order, with overlap and repetition expected, of course. Each resource list is a tiny subset of what could or should be included. Such is the nature of resource lists.
Step 1: Ponder
Ponder what type of practitioner you would like to be.
Some options:
Technically focused, typically with computer science background
Examples:
Design focused, with entry points into data vis
Examples:
Domain focused, providing expertise to a narrow field
Examples:
- Biology - Bang Wong
- Music - Alexander Chen
General Advocate, preaching to the masses.
Examples:
- Jon Schwabish (also a domain expert on vis in government)
- Noah Iliinsky
Academic focused, do data vis in a research setting (often overlaps with domain expert)
Examples:
Journalism focused, some of the best vis out there comes from newspapers
Examples:
Of course many people expand, overlap, and break out of these categories.
Moritz Stefaner is an example of someone who has combined technical, design, and academic focuses.
Also, you don’t have to use any of these generalized “templates” or even decide which type to become now. It’s just useful to keep these broad categories in your head.
Step 2: Read
You need to get a baseline of existing data visualization examples, practices, and research. Along the way you will start to develop a vocabulary for talking about data vis and a sense of what has already been done. This comes in the form of books, blogs, and research papers.
Suggested reading list:
Books:
- The Visual Display of Quantitative Information - Edward Tufte
- Visualize This - Nathan Yau
- The Functional Art - Alberto Cairo
Research Papers:
Blog Posts:
Step 3: Monitor
Get a sense of what has been happening in the world of data visualization recently.
The readings address the history of data vis, but you also want to see what is going on RIGHT NOW.
Blogs to scroll through:
- Flowing Data
- 2014 New York Times Interactives
- 2015 New York Times Interactives
- Mike Bostock’s Visual Creations
- Visualizing Economics
- Junk Charts
- Eagereyes
- The Functional Art
Additional Twitter People to follow:
- Lynn Cherny - vis, data science & Python
- Nadieh Bremer - d3, style & interactives
- Elijah Meeks - networks & opinions
- Kim Rees - do good with data
- Martin Wattenberg - Google and famous
- Jonathan Corum - science and news
- Hadley Wickham - R, data science
- Mike Bostock - D3
Audio:
- Data Stories - podcast
- PolicyViz - podcast
Step 4: Learn
Should be done concurrently with Step 3 and Step 5.
In order to create visualizations or talk about the creation of visualizations, you are going to need to learn how to use some tools.
These tools could be programming languages, code libraries or toolkits, or even GUI-based desktop applications.
Some suggestions follow. The important thing is not to try to learn them all at once, but to pick one or two to explore based on your current strengths:
Programming Languages
R - if you want to do static images and data exploration, it’s hard to beat R. The ‘tidyverse’ (ggplot2, dplyr, etc.), a set of tools originally built by Hadley Wickham, is really a full stack for researching and visualizing data. The R language is weird and not the easiest, but with a few concepts under your belt you become very powerful.
Python - if you want a more standard scripting language to work with, try Python. Pandas, Matplotlib, and Seaborn go a long way toward munging and visualizing data.
Processing - for a more design- or education-oriented tool, Processing is great. Many of the tutorials are moving to Processing.js - a JavaScript port of the tool. This allows for the creation of web-based interactive visualizations as well. But standard Processing is still a powerful and interesting option.
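To give a feel for the Python route mentioned above, here is a minimal sketch of munging a small table with Pandas and drawing it with Matplotlib. The data, the column names (city, year, rides), and the helper function names are all made up for illustration; treat it as one possible starting point rather than the way to do it.

```python
import io

import matplotlib
matplotlib.use("Agg")  # render off-screen so the script runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up sample data (an assumption for this sketch), inlined so the
# example is self-contained. In practice this would be a real CSV file.
RAW_CSV = """city,year,rides
Seattle,2014,120
Seattle,2015,150
New York,2014,300
New York,2015,360
"""


def load_rides(csv_text):
    """Parse CSV text into a DataFrame."""
    return pd.read_csv(io.StringIO(csv_text))


def rides_by_city(df):
    """Munging step: total rides per city, largest first."""
    return df.groupby("city")["rides"].sum().sort_values(ascending=False)


def plot_totals(totals):
    """Visualizing step: one bar per city, with labeled axes."""
    fig, ax = plt.subplots(figsize=(5, 3))
    totals.plot.bar(ax=ax, color="steelblue")
    ax.set_xlabel("City")
    ax.set_ylabel("Total rides")
    ax.set_title("Rides by city (sample data)")
    fig.tight_layout()
    return fig


df = load_rides(RAW_CSV)
totals = rides_by_city(df)
fig = plot_totals(totals)
```

Calling `fig.savefig("rides.png")` afterwards would write the chart to disk. Swapping the inlined text for a file from one of your own data sources is the natural next step.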
Code Libraries & Toolkits
D3 - a JavaScript library, still the czar of interactive data visualization. If you want to make interactives, and want them on the web, you should probably take a look at D3.
Processing.js - as mentioned, Processing.js is a great tool for ‘painting with data’. It takes a different paradigm than D3, which some might find easier to reason about.
Desktop Applications
Excel - Still the workhorse of the non-programmer data world. It’s good to have some basic knowledge of Excel just to be able to share spreadsheets and quick analyses with others. And you can make some pretty interesting visualizations with it (as long as you stay away from the defaults).
Tableau - Success story of the Business Intelligence (BI) world, Tableau has matured into a robust and fast data exploration and visualization suite. I would pick this over Excel for non-coding exploration. Their flagship product is expensive, but they also have Tableau Public which is free and the way to start.
Step 5: Create
Should be done concurrently with Step 3 and Step 4.
Creations can be in the form of:
- a visualization (static or interactive)
- an analysis of a topic or dataset
- an opinion on a hot data visualization topic
- an aggregation or collection of visualizations around a central theme
- a visualization book review
- a code tutorial or process
Start by creating a website to hold your creations.
I would hands down recommend using GitHub Pages with a custom domain name. Jekyll (the tool powering these sites) is a powerful and useful tool to learn and use.
You can keep JavaScript interactives in separate repos. It’s awesome. And it only costs ~$15 a year (for the domain name).
Hopefully you have seen visualizations that resonate with you, or visual forms you might want to explore more. Alternatively, you might have seen or heard or thought about interesting topics with data you might want to visualize.
Follow one of these routes down a path to capitalize on your learning and start the creation process. Creations can be simple but should be well done - the best you can do at this point in time. And always cite inspirations and data sources in an ‘about’ section!
Data to explore:
- Data is Plural - hand-curated and always interesting.
- Open data portals: New York, Seattle, and more!
- My Pinboard of data