Creating Effective Data Visualizations in Excel 2016: Some Basics

From E-Learning Faculty Modules


Contents

Module Summary

This short module introduces some methods for creating effective data visualizations from structured data in Excel 2016 using the “Recommended Charts” and built-in charts features. Some common built-in data visualizations: column, line, pie, bar, area, X Y (scatter), stock, surface, radar, treemap, sunburst, histogram, box & whisker, waterfall, combo, and 3D maps. A number of the data visualizations are available in both 2D and 3D versions. This will introduce the work of creating data visualizations in Excel 2016.

Takeaways

Learners will...

  • review the basics of the structure of “structured data” and consider how data structure (and data amounts) affects the possible data visualizations
  • conceptualize the core descriptive functions of data visualizations from structured data
  • think about whether data visualizations should be linked to their underlying data tables or not, and also when data visualizations should be direct copied or when they should be rendered as image files
  • explore what to consider when selecting possible data visualizations
  • consider the pros and cons to the “Recommended Charts” feature in Excel 2016

Module Pretest

1. What are the basic structures of “structured data”? How does data structure affect possible data visualizations? What is the importance of the amount of data being visualized?

2. What are some of the core descriptive functions of data visualizations from structured data?

3. When should data visualizations be linked to their underlying data tables? When shouldn’t data visualizations be linked to their underlying data tables? Also, when should data visualizations be represented as copied files vs. image files?'

4. What are important points to consider when selecting possible data visualizations?

5. What are some pros and cons to using the “Recommended Charts” feature in Excel 2016? What are some add-ins to Excel 2016 that enhance the data visualization capabilities of the tool?


ExcelScreenshot.jpg

Main Contents

1. What are the basic structures of “structured data”? How does data structure affect possible data visualizations? What is the importance of the amount of data being visualized?

“Structured data” refers to data in data tables and worksheets. The typical structure is that variables are labeled in the column headers, and the rows consist of records, with unique identifiers in the row headers down the far left column. The data is “structured” because every piece of information is identified. Every cell has a column header and a row header and so is “labeled”. Also, the placement of the data in respective cells enables the study of various interrelationships between the table data. Most software tools used to analyze structured data can identify the basic data types: dates, string data, latitude and longitude, and others.

The “structure” of the data (their columnar order, from left to right; the “type” data format for each column; the proper columnar label) has to be correct in order for a software tool to create a corresponding data visualization. For example, an OHLC chart has to be labeled as open, high, low, and close to properly represent the stock chart data. Without those labels and without proper information in the respective cells, a coherent data visualization will not be possible.

The amount of data in a data visualization is important in several ways. First, a 2D and a 3D space—the x and y axes, or the x, y, and z axes—has limited “real estate” to convey information. While these visualizations can often handle millions of lines of row data, only a few columns of data may be addressed in a clear way. Many data visualizations exist in a context, with surrounding descriptive information. Many data visualizations are part of a data visualization sequence, so the necessary information is conveyed in greater completeness over time. Also, many data visualizations are linked to underlying dataset(s), so access to fuller informational understandings is available.


2. What are some of the core descriptive functions of data visualizations from structured data?

One way to conceptualize data visualizations is as core descriptive functions. Data visualizations communicate the following:

  • proportionality
  • frequency (and “intensity”)
  • changes over time
  • hierarchical relationships (in terms of various groupings and relationships)
  • descriptive statistics of central tendency and distributions
  • social relationships
  • physical –spatial relationships, and others

(For more on this, please see the slideshow “Creating Effective Data Visualizations in Excel: Some Basics” hosted on SlideShare. The slideshow lists various data visualizations linked to the particular core descriptive functions listed above.)


3. When should data visualizations be linked to their underlying data tables? When shouldn’t data visualizations be linked to their underlying data tables? Also, when should data visualizations be represented as copied files vs. image files?

By default, a data visualization that is copied out from Excel will be linked to the underlying data. If the underlying data is changed, the data visualization automatically updates to accommodate that change in value in the respective cell (or the changes in formulas or some other change). This connection continues as long as the file are in the same folders and with the same local relationships. If files are moved or links break, then the data visualization will not update.

Copying a data visualization into a slideshow and such for presentation is generally a good idea because the data visualization will be machine (and screen reader-) readable. This means that the data visualization will be more accessible to a broader range of people, including many who have some visual acuity or other perceptual or symbolic processing challenge.

A small downside, though, is that such data tables do not resize very well with proper maintenance of aspect ratio. Also, the data charts have a transparent background (alpha channel), which means that preset page numbers and backgrounds of the receiving documents will show through.

A data chart may be screen-captured and pasted into a receiving document or slideshow. In this case, there is a given non-transparent background. Resizing with proper aspect ratios is easy. However, the image is not automatically machine readable or screen reader readable. Images have to be annotated with “alt text” with informational equivalency of the original chart.


4. What are important points to consider when selecting possible data visualizations?

The depicted data has to fit the data visualization. There are conventions to different data visualizations. Data visualizations are understood to convey certain types of data, and if the underlying data does not meet the requirements for the data visualization, then researchers should proceed with care and / or caveat their use of a particular visualization for non-conventional data types. For example, histograms are understood to represent continuous data. So if a histogram is used for qualitative and categorical data, that should be noted.

With the ease of data access, and the ease of running statistical analyses over data, and the ease of creating a data visualization, some researchers will blitz through the work and unthinkingly output meaningless (or seriously compromised) data. On the Web are a number of sites and videos that give wrong information on how to output various data visualizations. Such sites make the issue worse. And then, the speed at which people read data visualizations also compounds the problem—of having information mis-shared widely and feeding a sense of alternative facts (untruths).

This is to say that those who would conduct research need to do so in highly professional and careful ways. They need to handle data with care and not introduce error by how they handle data. They need to continue that professional care when structuring data for data visualizations and then outputting and publishing those data visualizations. The speed of technology is impressive and a net positive, but these capabilities benefit researchers who do their homework and take the time needed to do something right. (One additional note: Researchers need to constantly refresh on their statistical understandings to be effective.)


5. What are some pros and cons to using the “Recommended Charts” feature in Excel 2016? What are some add-ins to Excel 2016 that enhance the data visualization capabilities of the tool?

Finally, what are some pros and cons to using the “Recommended Charts” feature in Excel? This feature provides generalist “cognitive scaffolding” for those who are pretty new to data visualizations. This provides a sense of how the highlighted data may be visualized and may help users get into the mental space for creating data visualizations.

What this doesn’t do is provide higher-level insights by checking the fit of the data to the data visualization. This doesn’t offer deeper insights to other data visualizations that may be possible with the same data. This tool will sometimes glitch and may require a reboot of the software in order for it not to kick up a message that it cannot find data visualizations for a certain set of highlighted data. This is a good tool feature if used properly and with the right expectations.

Finally, for those who are interested in creating more complex data visualizations, they may want to explore add-ins to Excel 2016 from the Office Store, third-party software tools, and other trusted spaces. Add-ins enable complex visualizations like 3D maps, streamgraphs, network graphs, and others. (There is more on this in the slideshow below in the References section.)

Examples

For a wide range of examples, please see the link to the SlideShare slideshow on which this module is based.

How To

The slideshow below in the References area walks users through how to structure data for the various types of data visualizations available in Excel 2016. Because the data is already publicly available and because this module is already quite extensive, readers are asked to download or access the slideshow on SlideShare for their purposes.

Possible Pitfalls

Creating effective data visualizations can be somewhat fraught. This is because data visualizations are by necessity summary data, and they cannot fully represent the underlying dataset. They are also built off selected data, not all the underlying data. What this means is that there has to be mindfulness in the creation of such data visualizations, and the context and data labels and other customization elements should help mitigate some of the potential misunderstandings from the data visualizations. There are a number of free add-ins to Excel 2016 that enable data visualizations of other types—streamsgraphs, networks, and others, to add a wider variety of options for users of this software. Also, properly designed data visualizations will draw viewers in, so that they spend sufficient time to understand the underlying data instead of just making some fast assumptions and thinking that they understand the information.

Releasing underlying datasets? When should data visualizations be linked to underlying data tables? For many researchers in the “hard sciences” conducting research using quantitative data, they are required to share their datasets backing up their research. The requirement may come from the grant funders. For others, publishing datasets is a condition of the publishing entity, which may require “reproducible” research. In these cases, such datasets are released along with the R-code, which enables various queries and dynamic visualizations.

Before datasets are released as stand-alone files, they should be properly cleaned and de-identified. The data should not be re-identifiable. If metadata rides with the dataset, that should also be cleaned off (or should at least not unintentionally or accidentally leak data).

When shouldn’t underlying datasets be released? If the researcher lacks legal rights to release the data (such as if he or she or they is / are using others’ data), then he or she should not release the data. If third-party data is being used and it is publicly available or open-source, then he or she can link to the data source instead. If the audience receiving the data visualization does not need more data and may lack the sufficient background to understand the data, then it may be better to not release the datasets. (In qualitative research, most datasets contain some private data. There is not yet a clear way forward on how qualitative research data may be shared constructively given the inherent subjectivity and framing to qualitative research.)

Module Post-Test

1. What are the basic structures of “structured data”? How does data structure affect possible data visualizations? What is the importance of the amount of data being visualized?

2. What are some of the core descriptive functions of data visualizations from structured data?

3. When should data visualizations be linked to their underlying data tables? When shouldn’t data visualizations be linked to their underlying data tables? Also, when should data visualizations be represented as copied files vs. image files?'

4. What are important points to consider when selecting possible data visualizations?

5. What are some pros and cons to using the “Recommended Charts” feature in Excel 2016? What are some add-ins to Excel 2016 that enhance the data visualization capabilities of the tool?

References

Hai-Jew, S. (2017, Apr. 26). Creating Effective Data Visualizations in Excel 2016: Some Basics. SlideShare. https://www.slideshare.net/ShalinHaiJew/creating-effective-data-visualizations-in-excel-2016-some-basics.

Extra Resources

Hai-Jew, S. (2017, Apr. 26). Creating Effective Data Visualizations in Excel 2016: Some Basics. SlideShare. https://www.slideshare.net/ShalinHaiJew/creating-effective-data-visualizations-in-excel-2016-some-basics