Intensity Matrices for Data Analytics

From E-Learning Faculty Modules




Module Summary

Intensity matrices are a common type of data visualization for frequency counts, in which higher frequencies are indicated by darker shades of color (as in choropleth maps). What are the parts of an intensity matrix? How may they be interpreted? This short module provides some examples of intensity matrices representing different types of data and discusses some ways to use intensity matrices in analyses. There is also a short section on how to create intensity matrices using Microsoft Excel.

Takeaways

Learners will...

  • explore what an “intensity matrix” is, its parts, the choropleth aspects of an intensity matrix, the importance of data labeling in this visualization, and the need for other labels (and information)
  • consider the sorts of data an intensity matrix contains, how this data is obtained, the importance of frequencies and counting, and some limits of frequency counts
  • list some substitute types of data visualizations used to convey data in intensity matrices (including bar charts, line graphs, sunburst diagrams, treemap diagrams, 3D bar charts, and others)
  • consider some of the most common types of intensity matrix data from computational data analysis
  • describe how to make an intensity matrix in Microsoft Excel

Module Pretest

1. What is an “intensity matrix”? What are the parts of an “intensity matrix”? What are the choropleth aspects of an intensity matrix? Why is it important to label the cells of an intensity matrix with numbers? What other labels are necessary for a clearly defined intensity matrix?

2. What sorts of data does an “intensity matrix” contain? How is this data obtained? What is the power of frequency (and counting) in data? What are some limits of frequency counts? (Why is it also important to consider low frequencies in counts? Anomalies? Rarities? Long tails in power-law frequency curves?)

3. What are some “substitute” types of data visualizations used to convey the data in intensity matrices? Bar charts? Line graphs? Sunburst diagrams? Treemap diagrams? 3D bar charts? (How are 3D diagrams, with volume, interpreted in terms of intensities?)

4. What are some of the most common types of intensity matrix data from computational data analysis? Sentiment analysis? Topic modeling? Word frequency counts? What are some examples of manually created intensity matrix data?

5. How does one make an intensity matrix using Excel?

Main Contents

1. What is an “intensity matrix”? What are the parts of an “intensity matrix”? What are the choropleth aspects of an intensity matrix? Why is it important to label the cells of an intensity matrix with numbers? What other labels are necessary for a clearly defined intensity matrix?

An intensity matrix is simply a data table with labels across the top (column headers) and down the leftmost column (row headers), and with representational numbers (frequency counts, intensities) in the cells. In some intensity matrices, there are equal numbers of columns and rows, so the matrix is a square; in others, the numbers of columns and rows differ, so the matrix is a rectangle.
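As a minimal sketch of that structure, the Python below counts hypothetical coded observations, each a (row label, column label) pair, into a table with row headers down the left and column headers across the top. The function name, labels, and counts are invented for illustration; with two row labels and three column labels, the result is a rectangular (2-by-3) matrix.

```python
from collections import Counter


def build_intensity_matrix(observations, rows, cols):
    """Count (row_label, col_label) pairs into a rows-by-cols table of frequencies."""
    counts = Counter(observations)
    # Counter returns 0 for missing pairs, so cells with no observations show 0.
    return [[counts[(r, c)] for c in cols] for r in rows]


# Hypothetical coded observations: (message type, theme) pairs.
observations = [
    ("Tweets", "data"), ("Tweets", "data"), ("Tweets", "competitions"),
    ("Replies", "data"), ("Replies", "kernels"), ("Replies", "kernels"),
]
rows = ["Tweets", "Replies"]
cols = ["data", "competitions", "kernels"]
matrix = build_intensity_matrix(observations, rows, cols)

# Column headers across the top, row headers down the left.
print("".ljust(10) + "".join(c.rjust(14) for c in cols))
for label, row in zip(rows, matrix):
    print(label.ljust(10) + "".join(str(v).rjust(14) for v in row))
```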

A sparse intensity matrix is one in which many of the cells are empty, null, or zero. A dense intensity matrix is one in which most of the cells are filled, often with high numbers (indicating both the presence of the observed phenomenon and its intensity).
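One simple way to put a number on sparseness, sketched here under the assumption that empty cells are stored as `None` or `0`, is the share of cells with no count. There is no fixed cutoff for calling a matrix sparse, and the two matrices below are hypothetical.

```python
def sparsity(matrix):
    """Return the share of cells that are empty (None) or zero."""
    cells = [v for row in matrix for v in row]
    empty = sum(1 for v in cells if v is None or v == 0)
    return empty / len(cells)


sparse = [[0, 0, 3], [0, None, 0], [1, 0, 0]]
dense = [[12, 7, 9], [5, 14, 8], [11, 6, 10]]
print(f"sparse: {sparsity(sparse):.2f}")  # 7 of 9 cells are empty or zero
print(f"dense:  {sparsity(dense):.2f}")
```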

Intensity matrices should not rely on color alone to communicate, because not all people can use color information (due to color-blindness or difficulty distinguishing hues and saturation levels), and the numerical data in each cell is informative in its own right. For an intensity matrix to be fully defined, it helps to have a table name, a caption, and lead-up and lead-away text that describe where the data came from, how it was handled, how the intensity matrix was arrived at, why the intensity matrix is relevant, and other pertinent details. If the intensity matrix is published as an HTML table, it should be marked up (for example, with a caption and properly identified header cells) so that a screen reader can read it accurately, making clear what each cell represents as the reader passes over it.
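The sketch below illustrates these points under some assumptions: a hypothetical `intensity_table_html` function renders a matrix as an HTML table that keeps the count visible in each cell, adds a `<caption>`, and uses `<th scope="col">` and `<th scope="row">` header cells, which is standard HTML table markup that screen readers use to associate each data cell with its headers. The gray shading ramp is an arbitrary choice, and the counts are made up.

```python
from html import escape


def intensity_table_html(caption, rows, cols, matrix):
    """Render an intensity matrix as an accessible HTML table.

    The count stays visible in every cell, and header cells carry scope
    attributes so screen readers can announce the row and column for each count.
    """
    # Guard against an all-zero matrix so the shading math never divides by zero.
    peak = max((v for row in matrix for v in row), default=0) or 1
    parts = ["<table>", f"<caption>{escape(caption)}</caption>", "<tr><td></td>"]
    parts += [f'<th scope="col">{escape(c)}</th>' for c in cols]
    parts.append("</tr>")
    for label, row in zip(rows, matrix):
        parts.append(f'<tr><th scope="row">{escape(label)}</th>')
        for v in row:
            # Gray ramp: 255 (white) for zero down to 55 for the largest count.
            level = 255 - round(200 * v / peak)
            text_color = "#fff" if level < 128 else "#000"
            parts.append(
                f'<td style="background: rgb({level},{level},{level}); '
                f'color: {text_color}">{v}</td>'
            )
        parts.append("</tr>")
    parts.append("</table>")
    return "\n".join(parts)


html_table = intensity_table_html(
    "Theme mentions by message type (hypothetical counts)",
    ["Tweets", "Replies"], ["data", "kernels"], [[2, 0], [1, 2]],
)
print(html_table)
```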


2. What sorts of data does an “intensity matrix” contain? How is this data obtained? What is the power of frequency (and counting) in data? What are some limits of frequency counts? (Why is it also important to consider low frequencies in counts? Anomalies? Rarities? Long tails in power-law frequency curves?)

An intensity matrix generally contains frequency data about a particular dimension, phenomenon, or concept.

The data may be obtained in any number of ways. Manual coding involves a human coder. For example, a human coder may mark the locations of avatars (human-embodied ones) at set intervals in a designed virtual space in Second Life™ and log the presences by frequency. Alternatively, such data may be autocoded, that is, coded by a computer. For example, text (whether from speech or writing) may be coded for intensities of sentiment (How positive or negative is the sentiment?). Another type of autocoding involves the use of algorithms to identify themes in text (topic modeling); instances of these theme mentions are then counted and may be represented as data in intensity matrices.
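To make the idea of autocoding concrete, here is a toy, lexicon-based sentiment coder. It is not how NVivo or any particular tool codes sentiment; the word list, the four category names, and the sample messages are all hypothetical. Each message becomes a row and each sentiment category a column, so the output is itself a small intensity matrix.

```python
import re
from collections import Counter

# Hypothetical, tiny sentiment lexicon (illustration only).
LEXICON = {
    "great": "very positive", "love": "very positive",
    "good": "moderately positive", "useful": "moderately positive",
    "slow": "moderately negative", "bug": "moderately negative",
    "terrible": "very negative", "broken": "very negative",
}
CATEGORIES = ["very negative", "moderately negative", "moderately positive", "very positive"]


def code_sentiment(texts):
    """Autocode each text by looking up its words in LEXICON; one row of counts per text."""
    matrix = []
    for text in texts:
        words = re.findall(r"[a-z']+", text.lower())
        counts = Counter(LEXICON[w] for w in words if w in LEXICON)
        matrix.append([counts[c] for c in CATEGORIES])
    return matrix


messages = [
    "Great dataset, love it",
    "The kernel is slow and a bit broken",
    "Useful and good discussion",
]
for message, row in zip(messages, code_sentiment(messages)):
    print(row, message)
```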

Frequency counts generally focus on the more “intense” (high-count) occurrences of particular phenomena. Intensity matrices generally offer summary data about a dataset, based on raw counts. Focusing on such summary data may shed light on only one facet of the data. Single and low counts may shed light on other aspects of a dataset. For example, word frequency counts are interesting not only for the high counts but also for the low ones (in the long tail). Anomalies and rarities may highlight other issues of interest.
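A minimal sketch of looking at both ends of a word frequency distribution: the most frequent words (the head) and the words that occur exactly once (the long tail). The tokenizer is a simple lowercase, letters-only regular expression, an assumption that ignores numbers, hashtags, and stemming; the sample text is invented.

```python
import re
from collections import Counter


def head_and_tail(text, top_n=3):
    """Return the top_n most frequent words and the words that occur exactly once."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    head = counts.most_common(top_n)  # ties keep first-encountered order
    tail = sorted(w for w, n in counts.items() if n == 1)
    return head, tail


sample = ("data data data model model kaggle kaggle competition "
          "data model kaggle notebook leaderboard")
head, tail = head_and_tail(sample)
print("head:", head)
print("tail:", tail)
```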


3. What are some “substitute” types of data visualizations used to convey the data in intensity matrices? Bar charts? Line graphs? Sunburst diagrams? Treemap diagrams? 3D bar charts? (How are 3D diagrams, with volume, interpreted in terms of intensities?)

The same data in intensity matrices may be represented as bar charts, line graphs (where the phenomenon is continuous, or at least is not misrepresented by a line), sunburst diagrams and treemap diagrams (both area-based charts), 3D bar charts (which plot bars along x, y, and z axes), and others.
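Most of these alternatives take the same counts in a different shape. As an illustrative sketch (the function names are hypothetical), the Python below flattens a matrix into (row, column, count) records, a long format that many charting tools accept, and prints a rough text bar chart from them.

```python
def to_long_format(rows, cols, matrix):
    """Flatten a matrix into (row label, column label, count) records."""
    return [(r, c, v) for r, row in zip(rows, matrix) for c, v in zip(cols, row)]


def text_bar_chart(records):
    """Print one horizontal bar per record, one '#' per counted occurrence."""
    labels = [f"{r} / {c}" for r, c, _ in records]
    width = max(len(label) for label in labels)
    for label, (_, _, v) in zip(labels, records):
        print(label.ljust(width), "#" * v, v)


records = to_long_format(["Tweets", "Replies"], ["data", "kernels"], [[2, 0], [1, 2]])
text_bar_chart(records)
```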

Deciding which data visualization to use for the underlying data depends on several factors, including:

  • the data sharing context
  • the clarity of the data visualization
  • the individual’s data visualization preferences and desired aesthetics
  • how the users will be using the data and their expectations for the visualizations


4. What are some of the most common types of intensity matrix data from computational data analysis? Sentiment analysis? Topic modeling? Word frequency counts? What are some examples of manually created intensity matrix data?

Some common types of intensity matrices from computational data analysis include the following data:

  • sentiment analysis
  • topic modeling
  • spatial frequencies
  • word frequency counts
  • popularity data (such as preference data from surveys), and others
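For spatial frequencies, such as the avatar-location counts described earlier, one common approach is to bin positions into a grid and count the points in each cell. The sketch assumes integer area dimensions and cell size in arbitrary units, with row 0 as the lowest y band; the function name and positions are hypothetical.

```python
def spatial_counts(points, width, height, cell_size):
    """Bin (x, y) positions into square grid cells and count points per cell.

    Assumes integer width, height, and cell_size. Row 0 is the lowest y band;
    points outside the area are ignored.
    """
    n_cols = -(-width // cell_size)   # ceiling division
    n_rows = -(-height // cell_size)
    grid = [[0] * n_cols for _ in range(n_rows)]
    for x, y in points:
        if 0 <= x < width and 0 <= y < height:
            grid[int(y // cell_size)][int(x // cell_size)] += 1
    return grid


# Hypothetical logged positions in a 30 x 20 space, binned into 10 x 10 cells.
positions = [(2, 3), (4, 1), (12, 15), (13, 16), (14, 14), (28, 19), (31, 5)]
grid = spatial_counts(positions, width=30, height=20, cell_size=10)
for row in reversed(grid):  # print the highest y band first, like a map
    print(row)
```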

Manually created intensity matrices may be built from human counts of particular categories of a thing, based on a shared concept, dimension, or phenomenon.


5. How does one make an intensity matrix using Excel?

Two online tutorials describe ways to create heat maps in Excel: “Heat Maps in Excel” by Geocenter (https://geocenter.github.io/resources/2016/10/12/heatmap.html) and “Heat Maps in Excel” by Excel University (https://www.excel-university.com/heat-maps-in-excel/). The process involves a fairly complex sequence of steps, so the linked articles will have to suffice here.
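In Excel, this kind of shading is typically applied with Home > Conditional Formatting > Color Scales. To show roughly what a two-color scale does, the sketch below assumes linear interpolation in RGB between a start color for the lowest value and an end color for the highest; Excel's exact rendering may differ, and the function name and colors are arbitrary choices.

```python
def two_color_scale(value, lo, hi, start=(255, 255, 255), end=(31, 78, 121)):
    """Map value linearly from [lo, hi] onto the RGB segment start->end; return hex."""
    t = 0.0 if hi == lo else (value - lo) / (hi - lo)
    t = min(max(t, 0.0), 1.0)  # clamp values outside the range
    r, g, b = (round(s + t * (e - s)) for s, e in zip(start, end))
    return f"#{r:02X}{g:02X}{b:02X}"


counts = [0, 5, 10]
print([two_color_scale(v, min(counts), max(counts)) for v in counts])
```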

Examples

The @kaggle account on Twitter was used to create some intensity matrix examples. At the time of the data capture, the @kaggle account (https://twitter.com/kaggle) had 3,225 Tweets, 181 following, 101,283 followers, 1,372 likes, and 2 lists. There were 2,868 records captured for the visualizations.

The first intensity matrix shows the topic-modeling of the messaging (via NVivo 11 Plus and portrayed in Excel). Note that a related bar chart has also been created from the same data.


AutoExtractedThemesIntensityMatrix.jpg


The second intensity matrix shows the sentiment coding of the words used in the messaging (also via NVivo 11 Plus and portrayed in Excel).


SentimentAnalysisIntensityMatrix.jpg


Possible Pitfalls

What are some possible pitfalls in using intensity matrices? One possible pitfall is an intensity matrix that is not properly labeled (for example, lacking numbers in the respective cells) and that relies only on differences in color saturation. Worse, the background information about how the intensity matrix was arrived at may not be described. It is important to know what is behind the data visualization, how the data was collected, and how the data was processed. There are risks to using data visualizations without understanding their origins and how to present the information accurately.

Another risk is to stop at the intensity matrix and not study the issue further. For example, a frequency count captures the most common terms for study, but sometimes there is value in looking at the “long tail” of word occurrences (single-incidence topics and terms) because different insights may be captured from that data. Intensity matrices usually focus on high-frequency occurrences and are not typically used for single-instance occurrences.

Module Post-Test

1. What is an “intensity matrix”? What are the parts of an “intensity matrix”? What are the choropleth aspects of an intensity matrix? Why is it important to label the cells of an intensity matrix with numbers? What other labels are necessary for a clearly defined intensity matrix?

2. What sorts of data does an “intensity matrix” contain? How is this data obtained? What is the power of frequency (and counting) in data? What are some limits of frequency counts? (Why is it also important to consider low frequencies in counts? Anomalies? Rarities? Long tails in power-law frequency curves?)

3. What are some “substitute” types of data visualizations used to convey the data in intensity matrices? Bar charts? Line graphs? Sunburst diagrams? Treemap diagrams? 3D bar charts? (How are 3D diagrams, with volume, interpreted in terms of intensities?)

4. What are some of the most common types of intensity matrix data from computational data analysis? Sentiment analysis? Topic modeling? Word frequency counts? What are some examples of manually created intensity matrix data?

5. How does one make an intensity matrix using Excel?

References

Extra Resources