Little-Known Ways to Make Your Data Visualization Awesome

A few months back, while surfing Instagram, I saw a comment on a post that read, “You have used data-ink ratio in good order.” I immediately looked up the term “data-ink” and learned that it was coined by Edward Tufte. I had come across his name a few months earlier in some research papers for a Human-Computer Interaction course. Once I had made the connection between the author and the term, I began to explore the data-ink ratio and apply it in my own data visualization journey.

This blog is dedicated to those who want to dive into, or are already working in, the mesmerizing field of data visualization.

Data visualization is the mapping of values to visuals or, put another way, the turning of numbers into pictures and stories. It lets us explore, explain, and understand data in a different way, and it gives us a window of opportunity to spot hidden patterns and insights.

Designed by the Author. Illustration from unDraw.


Whether you have previous experience in data visualization or are a neophyte in this field, I am sure you have played with visual encoding or at least have a basic idea of it.

Generally, when we create a visualization, we want to pack in all the information the audience might need, so we are tempted to add more colors, facts, images, and so on. These sometimes look good, but most of the time they lead to visual clutter, which makes the audience loathe our work.

Designed by Author. Illustration from unDraw.

Today, people are so used to images that they skip the text accompanying them. This is one reason Instagram became popular in such a short time: the power of images. Because people won’t read through the text, our graphs have to be self-explanatory and leave no room for misinterpretation. They shouldn’t distract from the information with lots of bright colors and bold fonts, which place a high cognitive load on the user. Thus, a naked visualization is the better option for our users.

How can we make our visualization naked?

For this, we can rely on the following three concepts:

Data-ink ratio:

Above all else, show the data. — Edward R. Tufte

Information visualization expert Edward R. Tufte is a statistician, artist, and professor emeritus at Yale University who has written, designed, and published four books dedicated to data visualization. In 1983, Tufte published the first of them, The Visual Display of Quantitative Information, which focuses on the theory and practice of designing data graphics. It is probably the most important book ever written on data visualization. In this book, he introduces two interesting graphical heuristics: the data-ink ratio and chartjunk.

He defines data-ink as the “non-erasable core of a graphic.” In other words, data-ink is the portion of the ink (or, on screen, the pixels) that presents data-information in the view.

Tufte defines the data-ink ratio as the amount of data-ink divided by the total ink required to print the graphic.
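Written as a formula, the definition above reads as follows. The second form is the equivalent one Tufte gives in the same book: one minus the share of the graphic that could be erased without losing any data-information.

```latex
\[
\text{data-ink ratio}
  = \frac{\text{data-ink}}{\text{total ink used to print the graphic}}
  = 1 - \text{proportion of the graphic that can be erased without loss of data-information}
\]
```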

In layman’s terms, Tufte suggests removing the elements that don’t add new information to the graphic: a chart that uses less ink (fewer chart elements) is more effective, attractive, and impactful for viewers.

He puts forward the following five principles related to data-ink:

  • Above all else, show the data.
  • Maximize the data-ink ratio.
  • Erase non-data-ink.
  • Erase redundant data-ink.
  • Revise and edit.

Let’s take an example of the data-ink ratio.

The chart below represents the penguin species and their body mass in grams.

Low Data-Ink Ratio. Image by Author. Data by Dr. Kristen Gorman.

So how can we achieve a higher data-ink ratio here?

  1. Remove the gray background behind the bars, since it adds no informational value.
  2. Get rid of the legend, since each bar can be labeled directly along the x-axis.
  3. Drop the y-axis title, since the image already makes it clear (see the title) that it is about penguins and their body mass.
  4. Keep the bar colors uniform. Coloring each species differently is unnecessary, because the x-axis labels are enough to tell them apart.
  5. Drop the bolding throughout.
  6. Remove the gridlines, since it is not clear what value they add here.
  7. Get rid of the y-axis values and label each bar directly in the graphic.
  8. Add the word “Species” to the title for better understanding.

After applying the steps above, we arrive at the chart below:

High Data-Ink Ratio. Image by Author. Data by Dr. Kristen Gorman.

By increasing the data-ink ratio, we have made the graphic simpler and more readable, so a larger share of what the viewer sees is actual information.
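The eight decluttering steps listed earlier can be sketched in code. This is a minimal matplotlib sketch, not the code behind the charts in this post: the choice of matplotlib, the function name `draw_high_data_ink_chart`, and the body mass values are all assumptions made for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, so the sketch runs without a display
import matplotlib.pyplot as plt

# Placeholder values for illustration only (assumed, not read from the post's chart).
species = ["Adelie", "Chinstrap", "Gentoo"]
body_mass_g = [3700, 3730, 5080]


def draw_high_data_ink_chart(names, values):
    """Draw a bar chart with most of the non-data ink removed."""
    fig, ax = plt.subplots(figsize=(6, 4))

    # Step 4: one uniform color for every bar.
    bars = ax.bar(names, values, color="gray")

    # Step 1: plain white background instead of a gray panel.
    ax.set_facecolor("white")

    # Step 6: no gridlines. The frame lines are hidden as well.
    ax.grid(False)
    for side in ("top", "right", "left"):
        ax.spines[side].set_visible(False)

    # Step 7: drop the y-axis values and label each bar directly.
    ax.set_yticks([])
    ax.bar_label(bars, fmt="%d g", padding=3)

    # Steps 2 and 3: no legend and no y-axis title are ever added.
    # Step 5: normal font weight; step 8: "Species" appears in the title.
    ax.set_title("Body Mass of Penguin Species", fontweight="normal")
    ax.tick_params(axis="x", length=0)  # hide the small x-axis tick marks
    return fig, ax


fig, ax = draw_high_data_ink_chart(species, body_mass_g)
```

To look at the result, the figure can be written to disk with `fig.savefig("penguins.png")` (the file name is arbitrary). Whether the frame lines count as non-data ink is a judgment call; they are hidden here only to push the idea as far as it goes.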

Rule of Thumb
Extraneous elements, such as background grids and tick lines, should either be removed or be just visible enough to serve their purpose without detracting from the relevant information.

Chartjunk

Kill the frills and get to the point! — Edward R. Tufte

When we make a visualization, we tend to think about what to put into the graph rather than what to leave out. This gives rise to chartjunk.

Chartjunk is the second heuristic Tufte introduces in The Visual Display of Quantitative Information (1983). He uses the term for the excessive and unnecessary use of graphical effects in graphs. He specifically calls out moiré vibration, heavy grids, and self-promoting graphics that demonstrate the designer’s graphic ability rather than display the data.

In simple terms, chartjunk refers to every visual element that is not needed to understand the chart or that distracts the viewer, such as superfluous labels or decorations that do not convey data.

Three Types of Chartjunk:

  1. Useful Junk
    This type includes all the parts of a graphic that do not convey data but carry useful or interesting context, such as annotations, info text, and subheadings.
  2. Harmless Junk
    This type does not convey data and does not help clarify the concept presented in the data, but overall it doesn’t get in the way either. Examples include adding a border to a chart or adding a logo. Knowingly or unknowingly, most work falls into this category.
  3. Harmful Junk
    I like to call this “The Danger Junk.” It is the junk that actively interferes with reading and understanding the visualization, such as starting the bars of a bar chart from a baseline other than zero or using a distracting background image.
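To see why a bar chart baseline other than zero is harmful, it helps to compare the ratio of the values with the ratio of the bar heights that are actually drawn. The sketch below does that arithmetic; the function name `drawn_height_ratio` and the two values are hypothetical and chosen only for illustration.

```python
def drawn_height_ratio(value_a, value_b, baseline=0.0):
    """Return height(b) / height(a) for bars drawn from `baseline`."""
    if min(value_a, value_b) <= baseline:
        raise ValueError("baseline must be below both values")
    return (value_b - baseline) / (value_a - baseline)


# Hypothetical values, for illustration only.
small, large = 4000, 5000

honest = drawn_height_ratio(small, large)                      # axis starts at 0
truncated = drawn_height_ratio(small, large, baseline=3000)    # axis starts at 3000

print(honest)     # 1.25: the taller bar is 25% taller, matching the data
print(truncated)  # 2.0: the taller bar now looks twice as tall
```

The data has not changed, but the truncated axis turns a 25 percent difference into bars where one is double the other, which is exactly the kind of misreading harmful junk invites.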

Let’s take an example of chartjunk.

High Chartjunk. Image by the Author. Data by Dr. Kristen Gorman.

So how can we remove the chartjunk here?

  1. The legend is unnecessary, as the penguin names already appear on the labels.
  2. The penguin image is not necessary, as the word “Penguin” is already written. (Some of us might disagree with this, but I used it here for illustration purposes only.)
  3. The title has “(g)” denoting grams, but the unit is already on the labels, so it is junk too.
  4. Remove the word “penguin” from the labels.
  5. Add “Species” to the title for better clarity.

After removing the chartjunk, we have the following graphic.

Low Chartjunk. Image by the Author. Data by Dr. Kristen Gorman.

Thus, adding extra “stuff” is not always informative, and it sometimes adds cognitive load. There is still no exact rule for what counts as chartjunk and what does not. So we should also think about how to clarify the message, how we want people to read our data, and what we want them to take away, because adding design elements can sometimes help clarify the story we want to tell.

1+1 = 3

White space is to be regarded as an active element, not a passive background — Jan Tschichold.

1 + 1 = 3. Image by the Author.

There is a saying: “You can’t design when you can’t see.” So, in the design below, how many elements do you see?

Designed by the Author. Image by Buzzle.
Gif by Giphy.com.

Did you also say 2?

Designed by the Author. Image by Buzzle.
Gif by Giphy.com.

Oh, sorry, my friends, there is also a third element, called white space.

Designed by the Author. Image by Buzzle.

Tufte writes about 1 + 1 = 3 in his next book, Envisioning Information. Don’t be sad if you missed it: white space, or negative space, is one of the most ignored design elements, yet it is the one that controls every other design element in our visualization. The expression 1 + 1 = 3 is popular in data visualization because it reminds us that the empty space between two elements acts as a third element of the design.

Why Does White Space Matter in Data Visualization?

  • White space improves readability by ensuring that our graphics and texts are clear and legible.
  • Effective use of white space has been shown to improve content comprehension by up to 20 percent.
  • White space helps readers understand the content better by creating a sense of order and flow between graphic elements.
  • It is a creative tool that helps convey a message more effectively.
  • And much more.