Publication Metrics

From E-Learning Faculty Modules


Module Summary

Publishing new research is an important aspect of career advancement in academia. Publications are taken as signs of a faculty member’s creativity and professionalism, and they contribute to a university’s reputation for attracting talent and grant funds and for supporting important work. Academic publishing is a culminating step of research work, and it puts a public face on the university’s efforts. With the digitization of so many academic works, various publishing companies, content creators, and other organizations have created computational methods to gauge the “influence” of academic publications and authors, among others. Influence can be an elusive concept to quantify, especially in contexts that enable some level of manipulation (such as self-citation). This module offers an early look at publication metrics, how they are calculated, how they are used in academia, and some challenges in interpreting them.

Takeaways

Learners will...

  • explore publication metrics and what they are supposed to represent
  • consider common publication metrics for academic periodicals and journals (and how they work)
  • consider common publication metrics for authors and authoring teams (and how they work)
  • review common ways that publication metrics are made more reliable and are validated
  • describe how publication metrics are used by authors and publishers / publications to burnish credentials and advance interests


Module Pretest

1. What are publication metrics? What are they supposed to represent?

2. What are common publication metrics for academic periodicals and journals? How do they work?

3. What are common publication metrics for authors and authoring teams? How do they work?

4. What are some common ways that those who run publication metrics ensure that these metrics are accurate? What are some ways that people try to bypass the checks in publication metrics? What are some identified “misleading” publication metrics?

5. How are publication metrics used by authors to burnish credentials and advance careers? How are publication metrics used by publishers and publications to burnish credentials, promote sales, and attract authors?


Main Contents

1. What are publication metrics? What are they supposed to represent?

“Publication metrics” are quantitative measures about various aspects of academic publications. These quantitative measures are supposed to represent the “influence” and “prestige” of the respective publications…and the “influence” (and value and capabilities) of individual researchers and research teams, based on their public works.

Some common metrics for periodicals include the number of incoming citations to their works, the relative ranking of the publications in which those citing articles appear, and the periodical’s relative ranking among competitor publications in a particular domain. Some common publication metrics for authors include the number of publications they have and how many citations each has received. The publications are tracked over time, so it is possible to see a bar chart of an author’s publication output, which is most typically a unimodal curve…or sometimes a bimodal or multimodal curve.

Publicly available publications in English-language journals are currently the most common sources of such metrics. Non-English works are not captured in many of the well-known publication metrics, nor is embargoed (legally held or secret) research.


2. What are common publication metrics for academic periodicals and journals? How do they work?

There are a number of publication metrics applied to academic periodicals. Each has its own strengths and weaknesses, depending both on the underlying data sources and on the methods for calculating the summary scores.

Broadly speaking, an “impact factor” (IF) or “journal impact factor” (JIF) is a summary score based on the two complete prior years of a journal’s publications and the number of citations those published works received. The IF or JIF is the average number of citations per work in the target periodical over the prior two years. The recent two-year window matters because “influence” is a decaying phenomenon and has to be maintained over time to be relevant; at the same time, the window allows enough time for people to read the works and to include them in other research. (It’s not enough to have been a publication has-been.) Said another way, the “impact factor” is the “yearly average number of citations to recent articles published in that journal” (“Impact factor,” Oct. 12, 2017). This is a simple measure that controls for journals that publish high numbers of articles by taking an average of the number of citations per article. Eugene Garfield, founder of the Institute for Scientific Information, originated this measure. The impact factor for a particular year is calculated by dividing the number of citations made in that year to items published in the two prior years by the number of items published in those two prior years; the result is the average number of citations per article.
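
The arithmetic above can be illustrated with a short sketch. The journal and the citation and publication counts below are hypothetical, used only to show how the two-year average is formed:

```python
# Minimal sketch of the two-year journal impact factor calculation.
# All counts below are hypothetical, for illustration only.

def impact_factor(citations_to_prior_two_years, items_published_prior_two_years):
    """Average citations per citable item over the two complete prior years."""
    return citations_to_prior_two_years / items_published_prior_two_years

# Hypothetical 2017 impact factor for an invented journal:
citations_in_2017_to_2015_2016_items = 150 + 210   # citations received in 2017
items_published_2015_2016 = 60 + 75                # citable items from 2015 and 2016

jif_2017 = impact_factor(citations_in_2017_to_2015_2016_items,
                         items_published_2015_2016)
print(round(jif_2017, 2))  # -> 2.67 citations per article, on average
```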

As with every type of publication metric, there are strengths and weaknesses. For example, not all citations are equal, but in this count they are: who cites a work, and in what publication, is not taken into account directly, and the quality of the citation (and which parts of the original work are cited) is not directly considered. Some suggest there are ways people try to game this score: some publishers have apparently engaged in “coercive citation,” requiring authors to cite certain works before their manuscripts would be published (“Impact factor,” Oct. 12, 2017), in order to skew counts. There are reports that some publishers favor lesser works like reviews (“lesser” because these require less effort and investment than original primary research) because reviews are more frequently cited. Another critique is that the scores are not reproducible in a stable way: the counts come out differently depending on who is doing the counting, probably partly because of noise in the data and the variant handling of different types of articles.

There are risks of negative externalities as well, such as people writing to the “impact factor” to improve their standing. For example, researchers might ride the bandwagon of hot topics that are exciting but likely faddish. Others may choose lesser open-access and open-source publications so that their works are not behind paywalls and are therefore more findable and possibly more citable. (Worse, some will release their published works on open-access platforms and break copyright in order to try to gain more citations. Some research-sharing sites are structured to encourage sharing, but the ethical ones offer easy opt-outs based on copyright, which is often fully purchased by the commercial publishers.)

Reputations, though, are not made by the individual alone but by a range of individuals in the world. With the speed of communications, the ease of idea sharing, and some degree of possible anonymity, people’s strengths and weaknesses are readily aired.

Another type of journal-level publication metric is the “eigenfactor” which is described as follows:

“A rating of the total importance of a scientific journal according to the number of incoming citations, with citations from highly ranked journals weighted to make a larger contribution to the eigenfactor than those from poorly ranked journals” (“Journal ranking,” May 25, 2017).

In other words, the popularity of the citing journals makes a difference in the value of the citation because more popular publications are more respected and more widely read; more prestigious publications also have lower acceptance rates for articles, so the articles they do publish are considered more valuable (“Journal ranking,” May 25, 2017).

Another popular publication metric is the “SCImago Journal Rank” (or “SJR indicator”). This is defined as follows:

“a measure of scientific influence of scholarly journals that accounts for both the number of citations received by a journal and the importance or prestige of the journals where such citations come from” (“SCImago Journal Rank,” July 20, 2017).

This one is “a variant of the eigenvector centrality measure used in network theory” (“SCImago Journal Rank,” July 20, 2017). The influence of a journal within its domain is represented in a rank order or hierarchy.
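
To make the weighted-citation idea behind the eigenfactor and the SJR indicator more concrete, the sketch below runs a simple power iteration over a tiny, invented journal citation matrix. This illustrates the general eigenvector-centrality notion that citations from influential journals count for more; it is not the actual Eigenfactor or SJR algorithm, which add further normalizations (for article counts, self-citations, damping, and so on):

```python
# Illustrative eigenvector-style journal ranking via power iteration.
# The citation matrix is invented; real Eigenfactor / SJR calculations
# apply additional normalizations and adjustments.
import numpy as np

journals = ["Journal A", "Journal B", "Journal C"]

# C[i, j] = citations from journal j to journal i
C = np.array([[0.0, 4.0, 2.0],
              [3.0, 0.0, 6.0],
              [1.0, 2.0, 0.0]])

# Normalize each column so each citing journal distributes one unit of "influence."
P = C / C.sum(axis=0)

# Power iteration: a journal is influential if influential journals cite it.
scores = np.ones(len(journals)) / len(journals)
for _ in range(100):
    scores = P @ scores
    scores = scores / scores.sum()

for name, score in sorted(zip(journals, scores), key=lambda pair: -pair[1]):
    print(f"{name}: {score:.3f}")
```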

There is a wide range of ways to count, sort, and rank…as with all data, and each approach has its insights and strengths, as well as its skews and weaknesses.


3. What are common publication metrics for authors and authoring teams? How do they work?

At the author (and authoring team) levels, there are a number of publication metrics as well.

Coined by Alan Pritchard in 1969, “bibliometrics” refers to the “statistical analysis of written publications.” The application of statistical methods to books and other communications media has resulted in citation networks and citation analysis, analyses of learning domains, studies of reader usage of resources, and other applications (“Bibliometrics,” May 9, 2017). There have been studies of how ideas propagate through researchers in respective fields.

One of the most publicly accessible sets of publication metrics comes from the Google Scholar web search engine, which indexes “scholarly literature” and patents, with full text for some works and only metadata for others. Google Scholar was released in November 2004. It indexes not only journal articles but also dissertations, theses, conference papers, technical reports, patents, and other works. It reportedly includes some 80–90% of the academic articles published in English. The original impetus for this resource was to make problems easier to solve by increasing access to science-based knowledge.

Google Scholar allows individuals to create scholar profiles, from which an h-index and an i10-index can be extracted. The h-index here is “the largest number h such that h publications have at least h citations,” with a “recent” variant defined as the largest number h such that h publications have at least h new citations in the last 5 years; the i10-index is the number of publications with at least 10 citations. There are, however, risks of over-counting.

As of May 2014, Google Scholar was thought to contain approximately 160 million “unique records” (Orduña-Malea, Ayllón, Martín-Martín, & Delgado López-Cózar, 2014, p. 28). Google Scholar supports researcher disambiguation through Google accounts and enables researchers to activate “Google Scholar Citations” profiles. Within this tool is a capture of the author’s h-index, a composite score that combines productivity (number of publications per year over time) and impact (number of citations of those works in the indexed literature) (“h-index,” Sept. 19, 2017).

The “h-index” (or “Hirsch index” or “Hirsch number”) was originated by Jorge E. Hirsch of the University of California San Diego (UCSD). The h-index is calculated from the body of work of an individual author, and scores can only be compared within a field, given the high variance in publishing patterns between unrelated fields (which makes cross-field comparisons invalid). Within fields, it is possible to set typical h-scores for tenure, advancement, and so on. As noted earlier, the h-number is based on an author’s number of published papers and the number of times each paper is cited in published works. In other words, it captures the author’s productivity (number of articles published by year) and influence or impact (number of citations per article). The h-index is the value “where h number of articles have been cited h or more times,” so an author with a score of 7 has at least 7 published articles with 7 or more citations each. A higher-achieving author would have an h-score higher than 7, indicating that the researcher has done better on both productivity and influence / impact. An open-source visual representation of the h-index follows:


HIndexVisualization.jpg


This “h-index” image was created by AEL 2, vectorized by Vulpecula, and released into the public domain via Wikimedia Commons at https://commons.wikimedia.org/wiki/File:H-index-en.svg.
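
Following the definition above, an h-index (and, for comparison, Google Scholar’s i10-index mentioned earlier) can be computed directly from a list of per-paper citation counts. The counts in this minimal sketch are invented for illustration:

```python
# Minimal sketch: computing an author's h-index and i10-index from
# per-paper citation counts (the counts below are invented).

def h_index(citation_counts):
    """Largest h such that h papers have at least h citations each."""
    counts = sorted(citation_counts, reverse=True)
    h = 0
    for rank, citations in enumerate(counts, start=1):
        if citations >= rank:
            h = rank
        else:
            break
    return h

def i10_index(citation_counts):
    """Number of papers with at least 10 citations."""
    return sum(1 for c in citation_counts if c >= 10)

papers = [25, 18, 12, 11, 9, 8, 7, 4, 2, 1]  # hypothetical citations per paper
print(h_index(papers))    # -> 7 (seven papers have 7 or more citations each)
print(i10_index(papers))  # -> 4 (four papers have 10 or more citations)
```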

Google Scholar Citations profiles provide a listing of the author’s works in descending order from most cited to least cited, a bar chart showing productivity over time, a computed h-index score, and other information. There are also links to the cited works if these are hosted on servers that Google’s search spiders have crawled. These summary views provide a sense of the interests of the respective researchers.

One critique of Google Scholar is that it can be sensitive to “spam” manipulations. Another is that the algorithms have mistaken machine-generated texts as citable articles, according to some researchers (“Google Scholar,” Oct. 3, 2017).

Google Scholar offers other metrics of interest, such as rankings of the “top 100 publications in several languages” and the ability to explore particular research areas; such rankings point toward venues whose research is widely read and cited (“Google Scholar Metrics,” Oct. 3, 2017).

Another class of publication metrics applied at the article and author levels is “altmetrics” (or “alternative metrics” or “non-traditional metrics”). These metrics are based on social media mentions and “word of mouth” on the Social Web (Web 2.0), where buzz may affect readership (to some degree). Altmetrics are intended to be used in a complementary way alongside more traditional publication metrics like the impact factor and the h-index. Altmetrics are actually built around articles but can be applied to “people, journals, books, data sets, presentations, videos, source code repositories, web pages, etc.” (“Altmetrics,” Aug. 3, 2017). Altmetrics are available from various entities: Plum Analytics (https://plumanalytics.com/), Altmetric.com (https://www.altmetric.com/), and ImpactStory (https://impactstory.org/), with access to 29.7 million, 5 million, and 1 million papers, respectively (“Altmetrics,” Aug. 3, 2017). The measures captured may include views, discussions, saved documents, cited documents, recommended documents, and other signals (“Altmetrics,” Aug. 3, 2017).

There are varying critiques of alternate metrics. One is that these are highly volatile because there are insufficient data sources to provide “a balanced picture of impact for the majority of papers” (“Altmetrics,” Aug. 3, 2017).


4. What are some common ways that those who run publication metrics ensure that these metrics are accurate? What are some ways that people try to bypass the checks in publication metrics? What are some identified “misleading” publication metrics?

There are a number of ways to shore up the value of publication metrics.

  • Use reliable sources for respected publications, and get as close to an N of all as possible.
  • Avoid duplication of sources (or de-duplicate).
  • Differentiate between authors with similar names.
  • Capture all citations even if they are somewhat variant in presentation.
  • Set up the algorithms to correctly capture information.
  • Build algorithms around real-world publishing behaviors…and around the data that are actually available.
  • Test the algorithms for reliability and validity.
  • Update the publication metrics as needed.
  • Control for people’s manipulations and self-grooming behaviors.
  • Go multilingual for better coverage of a field (and related fields) and to be as accurate as possible.
  • Go broadly geographical for better coverage of a field (and related fields) and to be as accurate as possible.

Of course, this is all easier said than done, but these give a sense of ways to strengthen the capture of such metrics.
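
Two of the steps above, de-duplicating records and disambiguating author names, can be sketched in a simple way. The records, field names, and matching rules below are invented for illustration; production systems rely on much more robust approaches (persistent identifiers such as ORCID iDs, fuzzy matching, affiliation data):

```python
# Minimal sketch of de-duplication and crude author-name normalization.
# The records and matching rules are invented, for illustration only.

records = [
    {"author": "Smith, J.",   "title": "On Citation Networks ",  "year": 2015},
    {"author": "J. Smith",    "title": "On citation networks",   "year": 2015},
    {"author": "Smith, Jane", "title": "Altmetrics in Practice", "year": 2016},
]

def normalize_author(name):
    """Reduce common name variants to a crude 'Surname, Initial' key."""
    name = name.replace(".", "").strip()
    if "," in name:
        surname, given = [part.strip() for part in name.split(",", 1)]
    else:
        parts = name.split()
        surname, given = parts[-1], " ".join(parts[:-1])
    initial = given[:1].upper() if given else ""
    return f"{surname.title()}, {initial}"

def dedupe(records):
    """Collapse records that share a normalized (author, title, year) key."""
    seen = {}
    for rec in records:
        key = (normalize_author(rec["author"]),
               rec["title"].strip().lower(),
               rec["year"])
        seen.setdefault(key, rec)
    return list(seen.values())

for rec in dedupe(records):
    print(rec)  # the first two records collapse into one
```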

It is important to describe the methods behind such data…and to encourage human users to correctly understand these when using them.


5. How are publication metrics used by authors to burnish credentials and advance careers? How are publication metrics used by publishers and publications to burnish credentials, promote sales, and attract authors?

At this moment, publication metrics still seem fairly new in terms of their usage in professional academic biographies. While the prestige, influence, and number of a researcher’s publications have always played a role in how academic professionals evaluate each other and decide whether to collaborate, the actual summary scores are just one data point. These scores may appear in Google Scholar Citations profiles (by individual) and so be broadly available online (if made public), but in curricula vitae, grant applications, employment applications, and other contexts, they seem to be fairly new and not yet directly harnessed.


Examples

The WikiJournal of Medicine has made some of its citation metrics available on a public page at https://en.wikiversity.org/wiki/WikiJournal_of_Medicine/Citation_metrics.

There are many other examples on the landing pages and other websites of academic publications (both print and digital).


How To

The general “how to” regarding this topic involves methods for raising one’s publication metrics. A simple answer is to research, write, and publish important work in alignment with the professional ethics of the field. If the publication metrics are set correctly, and people respond to the work and cite it / use it, then the publication metrics will reflect that reality.

As for those who would…

  • self-cite (to inflate citation numbers),
  • collaborate with others to acquire more citations (I cite you; you cite me),
  • join social groups to increase the awareness of their work,
  • select a topic because it is popular and might draw attention,
  • create controversy to draw attention to their work,
  • publish in open-source ways in order to make their work more available (accessible) for citation, and so on,…

these all seem to be fairly superficial and ineffective efforts. It would seem that doing the work the right way would benefit researchers and the field in the long run and would not introduce unnecessary and problematic noise into the academic research environment. Researching and writing to the metric seems like a waste of time.


Possible Pitfalls

A major pitfall in using publication metrics is to (mis)understand them in an unthinking, unanalytical way. After all, counts of citations may not really capture potential “influence.” A construct such as “influence” would be most accurately understood as the ability to positively change a field, and the number of citations might be one small aspect of influence, but it is certainly not all of it. High numbers of citations may come from a variety of factors (name recognition, being telegenic, having a book promoted by a large publisher, and so on) that may have nothing to do with actual influence in a field; real influence may have more to do with the subject of the research and its effect on industries. Real influence may not be accurately assessed until a number of years into the future, when people in a field can gain perspective on actual contributions. It is hard to know in the present what will be found relevant in the future, and historically, some contributors to various fields were not recognized until after they had passed from the scene. To use publication metrics accurately, it is important to understand the finer points of how the respective metrics are arrived at, how they are used in the field, and how to apply them practically.

Technical papers have been created through computational means, such as through SCIgen (created by students at MIT and available online: https://pdos.csail.mit.edu/archive/scigen/). Various conference organizers and publishers have been fooled by these papers, and their reputations have been harmed through the inclusion of such auto-generated nonsense works. Respectable publishers from Springer to IEEE have had to remove over 120 gibberish papers (Van Noorden, Feb. 24, 2014). In other words, auto-generated papers have been out there collecting publication credits and racking up “points” for faux authors and real publications. This cautionary tale is an important one. Reading requires actual attention, and research and writing require hard work…and consideration for relevance and benefit for the larger world. A focus on a summary statistic like an “influence” metric may over-simplify what goes into the work.

Another potential pitfall is that researchers will write to the metric, which may short-change their actual work, which should instead be driven by relevance, the researchers’ strengths, their access and resources, and other factors.


Module Post-Test

1. What are publication metrics? What are they supposed to represent?

2. What are common publication metrics for academic periodicals and journals? How do they work?

3. What are common publication metrics for authors and authoring teams? How do they work?

4. What are some common ways that those who run publication metrics ensure that these metrics are accurate? What are some ways that people try to bypass the checks in publication metrics? What are some identified “misleading” publication metrics?

5. How are publication metrics used by authors to burnish credentials and advance careers? How are publication metrics used by publishers and publications to burnish credentials, promote sales, and attract authors?


References

“Altmetrics.” (2017, Aug. 3). Wikipedia. https://en.wikipedia.org/wiki/Altmetrics.

“Bibliometrics.” (2017, May 9). Wikipedia. https://en.wikipedia.org/wiki/Bibliometrics.

“Citation metrics.” (2017, Oct. 8). Wikiversity. https://en.wikiversity.org/wiki/WikiJournal_of_Medicine/Citation_metrics.

“Google Scholar.” (2017, Oct. 3). Wikipedia. https://en.wikipedia.org/wiki/Google_Scholar.

“Google Scholar Metrics.” (2017). Google Scholar. https://scholar.google.com/intl/en/scholar/metrics.html.

“h-index.” (2017, Sept. 19). Wikipedia. https://en.wikipedia.org/wiki/H-index.

“Impact factor.” (2017, Oct. 12). Wikipedia. https://en.wikipedia.org/wiki/Impact_factor.

“Journal ranking.” (2017, May 25). Wikipedia. https://en.wikipedia.org/wiki/Journal_ranking.

“Misleading metrics.” (n.d.). Beall’s List of Predatory Journals and Publishers. http://beallslist.weebly.com/misleading-metrics.html.

“SCImago Journal Rank.” (2017, July 20). Wikipedia. https://en.wikipedia.org/wiki/SCImago_Journal_Rank.

Van Noorden, R. (2014, Feb. 24). Publishers withdraw more than 120 gibberish papers. Nature: International Weekly Journal of Science. https://www.nature.com/news/publishers-withdraw-more-than-120-gibberish-papers-1.14763.


Extra Resources

Orduña-Malea, E.; Ayllón, J.M.; Martín-Martín, A.; Delgado López-Cózar, E. (2014). About the size of Google Scholar: playing the numbers. Granada: EC3 Working Papers, 18: 23 July 2014. https://arxiv.org/ftp/arxiv/papers/1407/1407.6239.pdf.

Publish or Perish (for Mac) (download to be used with Google Scholar). Harzing.com. https://harzing.com/resources/publish-or-perish/os-x/crossover-mac-10 (for Mac and OS X).

Publish or Perish (for Windows) (download to be used with Google Scholar). Harzing.com. https://harzing.com/resources/publish-or-perish/windows (for Windows).