Photo by Burak K from Pexels

As an aspiring master of all things data, I often think about my favorite aspect of the many things one can do with data: visualizations (data viz)! If you’ve ever taken the time to visit one of the many websites that use data visualizations (like https://pudding.cool/ or https://setosa.io/#/) and play with their charts and graphs, you can quickly get a sense of the potential and power that data visualizations can unlock. In my personal work with data I’ve used a combination of Matplotlib and Tableau to make visualizations, but I’ve always been intrigued by the flexibility and power of the JavaScript library D3.js. This library is an amazing tool, but it also comes with a reputation for being very dense and having a steep learning curve. While that may be true, I’ve decided to finally dive into the deep end and see if my knowledge of JavaScript and data will provide me with enough intuition to keep me afloat. I intend to use this, and future blog entries, as a learning journal to share with others who might be thinking of taking that same leap. This post is meant as a basic intro and setup, plus a first look at how to use selectors. If I can run into a few walls so you don’t have to, then I’ve done my job.

The first question before starting any quest for knowledge is “Why?”. There’s more information in this world than any one person could ever hope to consume, so I think it’s a good idea to lay out why we need data visualizations and why D3.js. Visualizing data allows us to do two very important things. First, it lets us see things that raw data or even summary statistics can’t show. Trying to hold and draw insight from more than a few numbers in your head is a losing game. We need ways to summarize data, but the ways we choose to summarize can lead to very different insights. An example often used in the data viz community is Anscombe’s Quartet. This dataset is made up of 4 tables with different x and y values. If we view the summary statistics of these tables (means, variances, and the correlation between x and y) they look nearly identical. But if we plot the points, we see wildly different distributions.

From Wikimedia Commons, the free media repository
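To check the “same summary statistics” claim for myself, here is a small sketch in plain JavaScript (no D3 needed). The numbers are the quartet’s values as they are commonly published; the helper names `mean`, `correlation`, and `summarize` are just my own for illustration.

```javascript
// Anscombe's quartet, as commonly published (Anscombe, 1973).
// Tables I-III share the same x values; table IV has its own.
const x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5];
const quartet = {
  I:   { x: x123, y: [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68] },
  II:  { x: x123, y: [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74] },
  III: { x: x123, y: [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73] },
  IV:  { x: [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
         y: [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89] },
};

// Arithmetic mean of an array of numbers.
const mean = (values) => values.reduce((sum, v) => sum + v, 0) / values.length;

// Pearson correlation coefficient between two equal-length arrays.
function correlation(xs, ys) {
  const mx = mean(xs);
  const my = mean(ys);
  let sxy = 0;
  let sxx = 0;
  let syy = 0;
  for (let i = 0; i < xs.length; i++) {
    sxy += (xs[i] - mx) * (ys[i] - my);
    sxx += (xs[i] - mx) ** 2;
    syy += (ys[i] - my) ** 2;
  }
  return sxy / Math.sqrt(sxx * syy);
}

// Summary statistics for one table of the quartet.
function summarize({ x, y }) {
  return { meanX: mean(x), meanY: mean(y), r: correlation(x, y) };
}

// Print one line of rounded statistics per table.
for (const [name, table] of Object.entries(quartet)) {
  const s = summarize(table);
  console.log(name, s.meanX.toFixed(2), s.meanY.toFixed(2), s.r.toFixed(2));
}
```

Rounded to two decimals, all four lines should look the same, which is exactly why the plot is the thing that tells the tables apart.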

Second, how we choose to display our visualizations can reveal very different truths. For example, Aurélien Géron’s book Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems (2nd Edition) shows us that plotting a dataset in 3 versus 2 dimensions can reveal drastically different results regarding where to set a decision boundary when splitting data (for more on this topic and dimensionality reduction in general, check out my other blog post: SHAMELESS PLUG).

Now that we have a sense of why we visualize data, we can ask, “Why D3?”. D3 seems to hit the sweet spot between customization and convenience. While we could probably build out a quick visualization in Tableau faster than we could in D3, there are more ways to customize in D3. While we could draw an SVG line path by writing the code out from scratch, D3 gives us a line generator to save us some time. D3 essentially allows us a great deal of customization while providing enough functions and methods to make coding it approachable.
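To make the “from scratch” side of that trade-off concrete, here is a rough sketch of building the `d` attribute of an SVG path by hand. The function name `linePath` and the sample points are my own, purely for illustration.

```javascript
// Hand-rolled version: turn [x, y] pairs into the "d" attribute of an SVG <path>.
// "M" moves the pen to the first point; "L" draws a straight line to each later point.
function linePath(points) {
  return points
    .map(([x, y], i) => (i === 0 ? "M" : "L") + x + "," + y)
    .join("");
}

const points = [[0, 80], [50, 20], [100, 60]];
const d = linePath(points);
console.log(d); // M0,80L50,20L100,60
```

With D3 loaded on the page, `d3.line()(points)` should give you the same kind of path string without the manual string building, and the generator can also take custom x and y accessors and curve types, which is where it starts to save real time.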

If you’re not quite comfortable with the web-dev holy trinity of HTML, CSS, and JavaScript, you may want to take some time to get a working understanding of how each of these interacts with the Document Object Model (DOM) to render a webpage. If you need a refresher, or have some programming background and just need a quick “how do these work”, you can take advantage of the freeCodeCamp.org video Data Visualization with D3.js - Full Tutorial Course. This video is almost 13 hours long, so you can just skip to the 0:29:31 mark and watch from there for a quick and practical guide.

Assuming you have some basic knowledge, we can set up D3. To do this, create a new project folder and in it create a file called “index.html”. Then create another file in the same folder called “app.js”. In “index.html” we can set up a basic HTML boilerplate. Now that we have our boilerplate we can navigate to https://d3js.org/ and copy the script tag.

This should be pasted between the `<head> </head>` tags. If you visit the URL in that tag’s src attribute you can see that it links to the source of all of the D3 functions. So when we run our page it loads in that library for us to use. We are also going to add a script tag at the end of our body with the source equal to “app.js”, which is where we will be writing our actual code. So far, this is what we should have:
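A minimal sketch of that setup is below. The exact D3 URL and version (v7 here) are an assumption on my part, so copy whatever script tag https://d3js.org/ currently gives you. The page title is also just a placeholder. I’ve put the “app.js” tag as the last thing inside the body, which keeps the HTML valid and means the page’s elements already exist when our code runs.

```html
<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="UTF-8" />
    <title>D3 Setup</title>
    <!-- D3 library. The v7 URL is an assumption; use the tag copied from https://d3js.org/ -->
    <script src="https://d3js.org/d3.v7.min.js"></script>
  </head>
  <body>
    <!-- Page content goes here -->

    <!-- Our own code, loaded last so the elements above it already exist -->
    <script src="app.js"></script>
  </body>
</html>
```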

Now that we have the basic setup, I want to end this post with one last concept: selections. If you’re familiar with JavaScript you probably know DOM methods for selecting elements, like getElementById() or getElementsByClassName(). D3 makes selections a little easier by accepting CSS-style selector strings. According to Mike Bostock, one of the developers of D3.js, these are the most common selectors you should know, five simple and two compound:

SIMPLE SELECTORS:
#foo // <any id="foo">
foo // <foo>
.foo // <any class="foo">
[foo=bar] // <any foo="bar">
foo bar // <foo><bar></bar></foo>

COMPOUND SELECTORS:
foo.bar // <foo class="bar">
foo#bar // <foo id="bar">
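To get a feel for these, here is a small sketch that counts how many elements on the current page each selector matches. It relies on `d3.selectAll()`, which selects every match (whereas `d3.select()` only takes the first), and on the selection’s `size()` method. The function name `countMatches` is my own, and “foo” and “bar” are placeholder names, so swap in tags, ids, and classes that actually exist on your page.

```javascript
// The selector strings from the list above; "foo" and "bar" are placeholders.
const selectors = ["#foo", "foo", ".foo", "[foo=bar]", "foo bar", "foo.bar", "foo#bar"];

// Count how many elements each selector matches.
// `lib` is the D3 object, which in the browser is the global `d3`.
function countMatches(lib, selectorList) {
  return selectorList.map((selector) => ({
    selector,
    matches: lib.selectAll(selector).size(),
  }));
}

// Only run when D3 has actually been loaded by its script tag.
if (typeof d3 !== "undefined") {
  console.table(countMatches(d3, selectors));
}
```

On a page with no matching elements every count will simply be 0, which is a handy way to spot a mistyped selector.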

To see how we can select and manipulate elements, let’s create a div containing a heading tag (an h2, for example) in the body of our “index.html” page and give the heading an id of “text-test” and text of your choosing. In our “app.js” file we can write `d3.select()`. If we look at our list of selectors, ids are prefixed with a hash (#), so we pass “#text-test” in as the argument: `d3.select("#text-test")`. In HTML we can change the color like so: `<h2 style="color: blue">testing</h2>`. Similarly, in D3 we can manipulate the color by chaining the style method and passing in “color” and “red” as the arguments, which should look like this: `d3.select("#text-test").style("color", "red");`.
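Put together, “app.js” might look something like the sketch below. The wrapper function `colorText` and the guard around it are my own additions rather than anything D3 requires. The guard is there because one of the first walls I’d expect to hit is a “d3 is not defined” error when the library’s script tag is missing or mistyped.

```javascript
// app.js
// Assumes index.html contains: <div><h2 id="text-test">testing</h2></div>

// Select the element with id "text-test" and set its CSS color.
// `lib` is the D3 object, which in the browser is the global `d3`.
function colorText(lib, color) {
  return lib.select("#text-test").style("color", color);
}

// The D3 script tag defines a global `d3`. Check for it first so a missing
// or mistyped tag gives a readable warning instead of a ReferenceError.
if (typeof d3 !== "undefined") {
  colorText(d3, "red");
} else {
  console.warn("d3 is not defined: check the D3 script tag in index.html");
}
```

Once the wiring works, the one-liner from the paragraph above is all you really need; try swapping “red” for another color or “color” for another CSS property like “font-size”.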

Using this same methodology I encourage you to play around with building some elements and trying to manipulate them using D3. This setup introduction should be enough to get you up and running. Check back for more additions to this D3 series of posts to learn how to start working with SVG shapes.

Former English teacher turned Data Scientist/Analyst interested in data, design, and storytelling.