Sentiment Analysis with NVivo 11 Plus

From E-Learning Faculty Modules


Module Summary

Takeaways

Learners will...

  • Gain a general view of what sentiment analysis is and how it is applied to human awareness and decision-making
  • Explore the sentiment analysis tool in NVivo 11 Plus, including its capabilities and its four categories (which may be collapsed into two)
  • Review the sorts of data that may be automatically coded for sentiment (and how this data may be pre-processed and post-processed by the researcher)
  • Consider the auto coding of the sentiment-extracted text sets with NVivo 11 Plus
  • Consider some of the supportable assertions that may be made with sentiment coding from text sets in NVivo 11 Plus

Module Pretest

1. What is sentiment analysis? How is it generally conceptualized? How is it applied to human awareness and decision-making?

2. What is the sentiment analysis tool in NVivo 11 Plus? How does it generally work? What are the four categories of sentiment captured using NVivo 11 Plus? How can these categories be collapsed into either positive or negative sentiment?

3. What sorts of data may be automatically coded for sentiment? How is this data pre-processed? Why is pre-processing necessary? (How does the structure of the ingested texts in NVivo 11 Plus affect the sentiment processing?) How is this data post-processed? Why is post-processing necessary?

4. What are the auto coded text sets in NVivo 11 Plus? What are some additional computational analyses that may be run on these text sets?

5. What are some types of supportable assertions that may be made with sentiment coding from text sets in NVivo 11 Plus?

Main Contents

1. What is sentiment analysis? How is it generally conceptualized? How is it applied to human awareness and decision-making?


“Sentiment analysis” (through computational means) has been pursued in earnest since around 2005. The intuition behind this research is that attitude leads to behavior: whether people feel positively or negatively towards something will affect how they think and act. Put simply, sentiment is conceptualized as a positive-negative polarity, with particular terms indicating positive or negative leanings. Some algorithms enable close analyses of sentiment in particular types of texts (such as Tweetstreams or formal published articles in particular learning domains). Others are generalist algorithms that strive to capture sentiment even in texts that contain double negatives, sarcasm, irony, or humor.

Among the tools on the market, the NVivo 11 Plus sentiment analysis tool seems to be a mid-range one: it captures sentiment, but its output still contains a lot of “noise.”


2. What is the sentiment analysis tool in NVivo 11 Plus? How does it generally work? What are the four categories of sentiment captured using NVivo 11 Plus? How can these categories be collapsed into either positive or negative sentiment?


The sentiment analysis tool in NVivo 11 Plus compares selected text or text sets in a project against a pre-labeled sentiment dictionary, and it auto codes text into four basic sentiment categories: very negative, moderately negative, moderately positive, and very positive. These categories may be viewed in the Nodes area of the tool, and the specific texts coded to each category may be viewed in the four categories, or they may be viewed as either positive or negative (with the “very” and “moderately” sets collapsed). The coding may be applied at various units of analysis: paragraph, sentence, or cell. The proper choice depends on the general type of text data in the analyzed set(s), and it may require some experimentation to see what works best.
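The dictionary-based coding and the collapsing of four categories into two can be illustrated with a minimal Python sketch. The word list, the scores, the thresholds, and the function names below are all hypothetical and chosen only for illustration; this is not the dictionary or scoring method that NVivo 11 Plus uses.

```python
# Hypothetical mini-lexicon: word -> score from -2 (very negative) to 2 (very positive).
# This is an illustrative assumption, not NVivo's sentiment dictionary.
LEXICON = {
    "excellent": 2, "love": 2, "good": 1, "helpful": 1,
    "bad": -1, "slow": -1, "terrible": -2, "hate": -2,
}


def code_unit(text):
    """Assign one unit of text (paragraph, sentence, or cell) to a category.

    Returns one of the four labels, or None when no lexicon word is found
    or when the positive and negative scores cancel out exactly.
    """
    words = [w.strip(".,!?;:\"'()").lower() for w in text.split()]
    scores = [LEXICON[w] for w in words if w in LEXICON]
    if not scores:
        return None
    mean = sum(scores) / len(scores)
    if mean == 0:
        return None
    if mean <= -1.5:
        return "very negative"
    if mean < 0:
        return "moderately negative"
    if mean < 1.5:
        return "moderately positive"
    return "very positive"


def collapse(label):
    """Collapse a four-category label into plain 'positive' or 'negative'."""
    return None if label is None else label.split()[-1]
```

For instance, `collapse(code_unit("The upload was slow."))` yields "negative" in this sketch. A simple word-matching approach like this one also shows why sarcasm or double negatives are easily miscoded.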

The general steps to the sentiment analysis are as follows:

1. Collect a set of text [to answer a research question(s) and its(their) subquestions]. Text may be acquired in a number of ways. Texts may be formal or informal; they may be born digital, extracted from multimedia, digitized from analog files, or obtained by other methods.

2. Process and clean the text. (Make a note of how the text was processed, if at all.) Ensure that the text is in one main language (since NVivo 11 Plus works with one base language at a time). Multiple autocoding runs and multiple types of autocoding may be done on a particular text set.

3. Start an NVivo 11 Plus project.

4. Ingest the text(s) into the project in the proper form (as a synthesized individual text set, as small text subsets, as single documents, or in another form that suits the research).

5. Run the “sentiment analysis” auto coding tool.

6. Review the underlying texts in each of the categories to ensure that they are properly coded. If not, remove the text from the coded category, or re-code to another category.

7. Finalize the sentiment analysis.

8. Analyze the created nodes with the coded text linked to each node.

9. Create data visualizations from the extracted sentiment labels for further insights.

10. Run additional analyses (and related data visualizations). In NVivo 11 Plus, these analyses may include the following: text frequency counts, text searches, matrix queries, theme and sub-theme extraction, and others.

11. Use “close reading” to understand the data at a more granular level.

12. Use due care when making assertions from the sentiment analysis.


Depending on the research needs of the researcher and the research context (and likely other factors), these steps may differ.

An Example

To see how this might work, an example was run based on a set of Tweets (including retweets) from the @ISS_Research tweetstream. This account, which is a shared one for astronauts conducting #research and #science on the International Space Station, started in September 2010. At the time of the data extraction, the account had 9,306 Tweets, 428 following, 327,317 followers, 3,017 likes, and 5 lists. The data extraction itself resulted in the capture of 2,275 messages through the Twitter API (application programming interface).

Image:Screenshot@ISS_ResearchonTwitterLandingPage.jpg

The text set was coded at paragraph level. An intensity matrix of the extracted sentiments from the text set is shown below.

Image:SentimentIntensityMatrix.jpg

The same sentiment extractions may be visualized as a bar chart.


Image:SentimentBarChart.jpg

The visualization below shows the text set coded to the Very Positive category for the @ISS_Research Tweetstream (with the most recent Tweets listed first).

Image:ExtractedSentimentVeryPositiveTextSet.jpg


3. What sorts of data may be automatically coded for sentiment? How is this data pre-processed? Why is pre-processing necessary? (How does the structure of the ingested texts in NVivo 11 Plus affect the sentiment processing?) How is this data post-processed? Why is post-processing necessary?


In NVivo 11 Plus, many common types of text files, spreadsheets, audio, video, and other files may be ingested and manually coded, but auto coding may be applied to text only. In other words, the various “data queries” and “autocoding” tools work on machine-readable text. The multimedia contents may be of any kind that is usable in NVivo, but any non-text files need text equivalents that carry the same information as those multimedia files: alt text for imagery, transcription for audio and video, and so on. Scanned PDFs have to be “searchable” or readable by screen readers (and other computer software programs).

How a researcher builds the sets of data is important. For example, a set of topically related articles may be downloaded from various databases for sentiment analysis. A person who runs a sentiment analysis on (1) the individual articles as a collection will get different results from a person who (2) synthesizes the articles into one text set and then runs the sentiment analysis. In the first case, the individual articles are treated as separate documents, a sentiment analysis is conducted on each article, and the results may be seen in an intensity matrix with the articles listed in alphabetical order in the left menu (as single records). In the second case, the sentiment analysis treats the full set as one entity, and the result is a summary extraction from the full set as one large “bag of words.”
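The difference between the two setups can be sketched in Python, assuming the text units have already been coded for sentiment. The article names, labels, and function names are made up for illustration, and the summing in `combined` is only a rough stand-in for how a synthesized text set is treated as a single record.

```python
from collections import Counter

# Hypothetical coded units: (source document, sentiment label).
CODED_UNITS = [
    ("article_b", "very positive"),
    ("article_a", "moderately negative"),
    ("article_a", "moderately positive"),
    ("article_b", "moderately positive"),
    ("article_a", "moderately negative"),
]


def per_document(coded_units):
    """Case 1: one row of category counts per document, rows in alphabetical order."""
    matrix = {}
    for doc, label in coded_units:
        matrix.setdefault(doc, Counter())[label] += 1
    return dict(sorted(matrix.items()))


def combined(coded_units):
    """Case 2: the whole set treated as one entity, giving a single summary row."""
    return Counter(label for _, label in coded_units)
```

In the first form, it remains visible that article_a carries both of the negative units; in the second form, that detail is lost in the summary.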

As with other forms of research, the text sets may have to be cleaned before processing. The necessary cleaning depends on the type of data captured. A common pre-processing step is de-duplication (the removal of repeated documents, for example). Text files may also have to be transcoded into other formats.
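A de-duplication pass done outside of NVivo, before ingestion, might look like the following Python sketch. The function name and the matching rule (ignoring case and extra whitespace) are assumptions for illustration; whether near-duplicates such as retweets should be dropped at all is a research decision that should be documented.

```python
def deduplicate(texts):
    """Keep the first occurrence of each text, ignoring case and extra whitespace.

    Empty or whitespace-only entries are dropped as well.
    """
    seen = set()
    kept = []
    for text in texts:
        key = " ".join(text.lower().split())  # normalized form used only for matching
        if key and key not in seen:
            seen.add(key)
            kept.append(text)  # the original text is kept unchanged
    return kept
```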

Another decision that researchers have to make is the level at which the texts should be analyzed. There are three general choices: (1) paragraph, (2) sentence, or (3) cell. The paragraph level is coarser and less granular than the sentence level. The cell level may be the proper choice for microblogging extractions, which come out in .csv, .xls, and .xlsx formats (and do not have traditional sentences). The cell level may be inappropriate, however, if the data is in data table format but comes from online surveys and includes full paragraph and sentence responses in the cells. Cell-level analysis in that case may be too coarse and not sufficiently granular. It is helpful to experiment with the various text sets and text formats before deciding how to set the parameters of the sentiment analysis. (And of course, it is important to document the approaches in order to report them in presentations and publications.)
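To make the difference in granularity concrete, here is a small Python sketch that splits the same text at paragraph level and at sentence level. The splitting rules are deliberately naive assumptions (blank lines for paragraphs, end punctuation for sentences) and are not how NVivo 11 Plus segments text; at cell level, each spreadsheet cell would simply be one unit.

```python
import re


def split_units(text, level):
    """Split text into units of analysis at the 'paragraph' or 'sentence' level."""
    if level == "paragraph":
        parts = re.split(r"\n\s*\n", text)  # paragraphs separated by blank lines
    elif level == "sentence":
        parts = re.split(r"(?<=[.!?])\s+", text)  # break after ., !, or ?
    else:
        raise ValueError("level must be 'paragraph' or 'sentence'")
    return [p.strip() for p in parts if p.strip()]


sample = "The crew was great. The delay was awful.\n\nResults look good."
paragraph_units = split_units(sample, "paragraph")  # 2 units
sentence_units = split_units(sample, "sentence")    # 3 units
```

At paragraph level, the first unit mixes a positive and a negative statement and would receive a single code; at sentence level, the two statements can be coded separately.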

Finally, there is also data post-processing. At minimum, researchers will need to review the extracted text sets in each of the four sentiment categories. Because of the limitations of the software tool, some texts will have been coded to categories where they do not belong, because the program has “misunderstood” the messaging. This noise in the data has to be removed for an accurate representation. Additional analyses are described in the next section.
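The review step amounts to applying human corrections on top of the automatic codes. In NVivo this is done by uncoding or re-coding in the interface; the Python sketch below only illustrates the logic, with hypothetical unit identifiers and a hypothetical function name.

```python
def apply_review(auto_coded, corrections):
    """Return a reviewed copy of the automatic coding.

    auto_coded:  dict mapping a unit id to its automatically assigned label.
    corrections: dict mapping a unit id to a new label, or to None to remove
                 the unit from the sentiment coding altogether.
    """
    reviewed = dict(auto_coded)  # leave the automatic result untouched
    for unit_id, new_label in corrections.items():
        if unit_id not in reviewed:
            raise KeyError(f"unknown unit id: {unit_id}")
        if new_label is None:
            del reviewed[unit_id]
        else:
            reviewed[unit_id] = new_label
    return reviewed
```

Keeping the automatic coding and the corrections as separate records makes it easier to report later how much post-processing was done.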


4. What are the auto coded text sets in NVivo 11 Plus? What are some additional computational analyses that may be run on these text sets?


The auto coded text sets in NVivo 11 Plus usually fall into one of four categories in terms of the sentiment analysis tool: very negative, moderately negative, moderately positive, and very positive. These may be seen as nodes. (The nodes may be converted back into basic text files.)

  • These nodes may be analyzed using the matrix coding query
  • Theme extractions may be run against the respective sentiment text sets
  • Word frequency counts may be run against the respective sentiment text sets
  • Word searches may be run against the respective sentiment text sets (and word trees extracted)
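As one illustration of the analyses listed above, a word frequency count per sentiment category could be sketched in Python as follows, assuming the node contents have been exported as plain text. The stopword list, the tokenizing rule, and the function name are assumptions for illustration, not NVivo's own word frequency query.

```python
import re
from collections import Counter

# A tiny illustrative stopword list; a real analysis would use a fuller one.
STOPWORDS = {"the", "a", "an", "is", "was", "of", "and", "to", "in"}


def word_frequencies(coded_texts, top_n=3):
    """Return the top_n most frequent non-stopwords for each sentiment category.

    coded_texts: dict mapping a category label to the list of texts coded to it.
    """
    result = {}
    for label, texts in coded_texts.items():
        words = re.findall(r"[a-z']+", " ".join(texts).lower())
        counts = Counter(w for w in words if w not in STOPWORDS)
        result[label] = counts.most_common(top_n)
    return result
```

Comparing the top words across the positive and negative sets can suggest which topics attract which kind of sentiment, which can then be checked through close reading.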

There are other, more sophisticated ways to conduct analyses on text sets, using other software tools and distant reading techniques. These are beyond the scope of this article, though.


5. What are some types of supportable assertions that may be made with sentiment coding from text sets in NVivo 11 Plus?


It depends. Some questions to ask include the following:

  • Where did the text sets come from?
  • How were the text sets handled and pre-processed?
  • What were the parameters of the sentiment analysis run in NVivo 11 Plus?
  • What post-processing of data occurred?

And so on.

There are a number of domain issues that affect the assertability of sentiment-based insights as well.

Examples

Please see above in the Main Contents area.

How To

Briefly put, the steps to the process are as follows:

1. Collect a set of text [to answer a research question(s) and its(their) subquestions]. Text may be acquired in a number of ways. Texts may be formal or informal; they may be born digital, extracted from multimedia, digitized from analog files, or obtained by other methods.

2. Process and clean the text. (Make a note of how the text was processed, if at all.) Ensure that the text is in one main language (since NVivo 11 Plus works with one base language at a time). Multiple autocoding runs and multiple types of autocoding may be done on a particular text set.

3. Start an NVivo 11 Plus project.

4. Ingest the text(s) into the project in the proper form (as a synthesized individual text set, as small text subsets, as single documents, or in another form that suits the research).

5. Run the “sentiment analysis” auto coding tool.

6. Review the underlying texts in each of the categories to ensure that they are properly coded. If not, remove the text from the coded category, or re-code to another category.

7. Finalize the sentiment analysis.

8. Analyze the created nodes with the coded text linked to each node.

9. Create data visualizations from the extracted sentiment labels for further insights.

10. Run additional analyses (and related data visualizations). In NVivo 11 Plus, these analyses may include the following: text frequency counts, text searches, matrix queries, theme and sub-theme extraction, and others.

11. Use “close reading” to understand the data at a more granular level.

12. Use due care when making assertions from the sentiment analysis.


Depending on the research needs of the researcher and the research context (and likely other factors), these steps may differ.

Possible Pitfalls

A sentiment analysis tool is not often used alone in a research context. It enables some early insights and may offer leads for further exploration, but its findings are not usually cited on their own. In other words, sentiment analysis complements other research approaches but generally is not the primary focus.

Also, per QSR International’s page on “How auto coding sentiment works”, various types of expressions are not capturable with the sentiment analysis tool. These include the following: “sarcasm, double negatives, slang, dialect variations, idioms, (and) ambiguity”. In other words, texts are analyzed in a fairly simplistic way, without tuning to cultural and other nuances.

Module Post-Test

1. What is sentiment analysis? How is it generally conceptualized? How is it applied to human awareness and decision-making?

2. What is the sentiment analysis tool in NVivo 11 Plus? How does it generally work? What are the four categories of sentiment captured using NVivo 11 Plus? How can these categories be collapsed into either positive or negative sentiment?

3. What sorts of data may be automatically coded for sentiment? How is this data pre-processed? Why is pre-processing necessary? (How does the structure of the ingested texts in NVivo 11 Plus affect the sentiment processing?) How is this data post-processed? Why is post-processing necessary?

4. What are the auto coded text sets in NVivo 11 Plus? What are some additional computational analyses that may be run on these text sets?

5. What are some types of supportable assertions that may be made with sentiment coding from text sets in NVivo 11 Plus?

References

How auto coding sentiment works. (n.d.). QSR International. Retrieved Feb. 2, 2016, from http://help-nv11.qsrinternational.com/desktop/concepts/How_auto_coding_sentiment_works.htm

Extra Resources

Hai-Jew, S. (2015). "Autocoding" through sentiment extraction and analysis. In "Using NVivo: An Unofficial and Unauthorized Primer".