v22 #4 What We Don’t Know We Don’t Know

Nov 22, 2010

by Gregory J. Gordon (Social Science Research Network President & CEO)

Gregory J. Gordon is President and CEO of Social Science Research Network (SSRN), a leading multi-disciplinary online repository of working and accepted paper research in the social sciences and, recently, humanities. In addition, SSRN provides a variety of electronic distribution and conference management services.

Do you read everything in your field today? Do you even know what everything means any more? Readers of scholarly research are faced with an overabundance of information due to interdisciplinary subject areas, access to research at earlier and multiple stages, and simply more research from more scholars. My simple definition of innovation is the ability to create new things by being exposed to a broader and deeper set of existing things, but broader and deeper have their limits. There is no substitute for reading and truly comprehending a specific article, but there aren’t enough hours in the day to read everything. We need better tools to know what research we need to read. We need to know what we don’t know.

While I work with a large number of librarians, scholars, faculty, and administrators around the world, my primary experiences come from helping create and manage the Social Science Research Network (SSRN). SSRN has grown over the past 16 years from an online repository of scholarly research in Finance to a multi-disciplinary scholarly community spanning 20 distinct subject areas in the social sciences and humanities. Our eLibrary database currently has close to 300,000 papers from 140,000 authors. In the last 12 months, we received 56,000 submissions, and users downloaded 8.6 million full-text PDFs.

SSRN supports Open Access and was founded to provide an alternative distribution vehicle for scholarly research, enabling work to be shared as quickly and efficiently as possible and at the lowest possible cost, in effect providing tomorrow’s research today. Most content providers are not focused on efficient access to their research. They want to aggregate content and restrict access, such that searching becomes a futile exercise in not finding, or not being able to get, what you want. The problem is that this approach doesn’t address the concerns of the author, who wants to be read, or of the reader, who wants to know what to read.

Scholarly research is divided into social science and humanities (SSH) and science, technical, and medical (STM), and most of us realize there are core differences between them. SSH researchers have a shotgun blast approach. Looking at this data, I observed X. They observe activities and apply the fundamentals of their discipline to them. They browse the literature looking for trends or patterns that can be applied currently. STM researchers have a rifle shot approach. What cures X? What is the cause of Y? They are searching for an answer to a question. They are often externally funded to address specific questions or problems.

SSH benefits from, and arguably needs, detailed and varied measures because of its overall approach to research. General publication differences between journals in each area and the longer average useful life of an SSH article further heighten these needs. Article-level metrics are a set of measures that evaluate individual articles, as opposed to journal-level metrics, which evaluate a journal as a whole.

Impact Factor (IF), a citation-based journal-level metric, has been criticized since shortly after Eugene Garfield created the measure in 1955. Despite a few known ways to manipulate this measure, such as publishing more review articles, reducing the percentage of citable material, and timing publication, it is arguably the most important measure in academia today. As Garfield himself noted in 1999:

Like nuclear energy, the impact factor has become a mixed blessing. I expected that it would be used constructively while recognizing that in the wrong hands it might be abused.

A significant abuse is to use a journal’s IF to represent every article published in that journal. For most journals, the 80/20 rule applies: 80% of the IF is the result of 20% of the articles published. Yet 100% of the articles receive the benefit of a high IF, which understates the impact of a few articles and overstates that of many others.
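To make the 80/20 point concrete, here is a minimal sketch using made-up citation counts. The numbers and the function name `top_share` are illustrative assumptions, not data from any real journal.

```python
def top_share(citations, fraction=0.2):
    """Share of all citations earned by the most-cited `fraction` of articles."""
    ranked = sorted(citations, reverse=True)
    top_n = max(1, round(len(ranked) * fraction))
    total = sum(ranked)
    return sum(ranked[:top_n]) / total if total else 0.0

# Hypothetical citation counts for ten articles in one journal.
counts = [50, 30, 5, 4, 3, 3, 2, 1, 1, 1]

journal_mean = sum(counts) / len(counts)  # 10.0 citations per article
share = top_share(counts)                 # 0.8: two articles earn 80% of citations
below_mean = sum(1 for c in counts if c < journal_mean)  # 8 of the 10 articles
```

In this toy journal the average of 10 citations per article describes none of the articles well: two sit far above it and the other eight sit below it.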

While there are several known and very real issues with each article-level metric, together they provide a broader, more objective view of an article’s impact from different perspectives. Citations, views, downloads, comments, trackbacks/blog posts, social bookmarks, and reader ratings are the more common metrics. They are available from a few publishers but more often from online repositories and open access journals.

As discussed in detail below, SSRN provides downloads, citations, and Eigenfactor™ metrics to its users. We are involved in and support the PIRUS2 project, which is working to create standards for certain article-level metrics to be consolidated across multiple organizations. Providing valid, verifiable statistics across a wide variety of organizations is a long road, but creating high quality standards is the critical first step.


Downloads are a more timely indicator of interest than citations, especially for new ideas and younger scholars. The importance of scholarship cannot, of course, be captured by a single ranking, but downloads certainly generate a lot of discussion.

Downloads provide information about scholarly impact in a way that differs from other measures. They are a measure of the number of times a paper has been delivered to an interested party. SSRN takes great care to ensure that download counts are an accurate measure of usage and expends a significant amount of resources to maintain their integrity.

First, we distribute complete abstracts of every paper, ensuring that interested readers make informed decisions about whether or not to download the full text of a particular paper, rather than uninformed explorations triggered only by a catchy or vague title. An SSRN download starts with the reader visiting the paper’s “abstract page.” Readers who still want to read the paper can then download it. In our and others’ experiences, approximately one out of four abstract views results in a download.

Second, we do not count multiple downloads of the same paper by the same person or machine, nor “robot” downloads. If SSRN permitted a single click to download a paper from another source, such as a search engine or a blog, and counted all mechanical downloads, this would inflate its download counts by a factor that has been increasing over time and is now close to six. This would degrade download counts as a signal of paper quality and substantially increase the ability of users to manipulate them.
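The counting rule described above can be sketched in a few lines. This is only an illustration, not SSRN’s actual implementation: the event layout `(paper_id, reader_id, is_robot)` and the function name `count_downloads` are assumptions, and identifying a “person or machine” or a robot in practice is much harder than reading a flag.

```python
from collections import defaultdict

def count_downloads(events):
    """Count at most one download per (paper, reader), skipping robots.

    `events` is an iterable of (paper_id, reader_id, is_robot) tuples;
    this record layout is an assumption made for the example.
    """
    readers_by_paper = defaultdict(set)
    for paper_id, reader_id, is_robot in events:
        if is_robot:
            continue  # mechanical downloads are never counted
        readers_by_paper[paper_id].add(reader_id)  # a set ignores repeats
    return {paper: len(readers) for paper, readers in readers_by_paper.items()}

# Hypothetical download log: five raw events.
events = [
    ("paper-1", "alice", False),
    ("paper-1", "alice", False),   # repeat by the same reader
    ("paper-1", "crawler", True),  # robot
    ("paper-1", "bob", False),
    ("paper-2", "alice", False),
]
counts = count_downloads(events)  # {"paper-1": 2, "paper-2": 1}
```

Five raw events become three counted downloads, which shows on a small scale how unfiltered counts can overstate real readership.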

In the last several years, download counts have taken on a higher level of importance and are used in a variety of ways. Anecdotally, we are aware of download counts being included in tenure committees’ submission packages, in checklists used during the faculty hiring process, and in law school annual reviews, and of dissertation downloads being used in grant funding evaluations.


As noted above, IF has an inherent 80/20 limitation, and unless citations are provided for a specific paper, it is very difficult to predict them. In simple terms, a citation is a reference from one paper to another that helps indicate the influence of the cited paper.

SSRN’s CiteReader technology, developed with ITX Corp., scans a full-text PDF file and captures the references found in it. Those references are then verified through a combination of technology and human review. The verified references are parsed into smaller metadata fields and then matched against other articles in the SSRN eLibrary. It not only provides interesting data on who is citing whom and how often, but it also provides a research timeline allowing readers to easily go backward and forward in a subject matter. The References and Citations pages are freely available for the reader to follow the flow of the literature within and across multiple disciplines.

Interestingly, approximately 13% of SSRN’s 3.9 million Citations are linked to working papers within the SSRN eLibrary.


The Eigenfactor™ Algorithm provides a methodology for determining the most important or influential authors and papers in a network. The algorithm computes a modified form of the eigenvector centrality of each node in the network, on the premise that important nodes are connected to other important nodes. This is the basic concept behind Google’s PageRank algorithm.

Eigenfactor™ Scores have previously been used to rank scholarly journals, and the scores are freely available at http://www.eigenfactor.org. Within SSRN, we use article-level citation data to extend the Eigenfactor™ Algorithm to the author level and will apply it to the paper level in the near future. CiteReader calculates the number of times each paper in the SSRN eLibrary database has been cited by other papers in the eLibrary. This data is then used to construct an author citation network, where each author is a node.
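As a rough illustration of how paper-level citations might be turned into an author citation network, consider the sketch below. The data, the function name `author_network`, and the handling of co-authors and self-citations are all assumptions made for the example; the article does not describe how SSRN treats these cases.

```python
from collections import Counter

def author_network(paper_authors, paper_citations):
    """Build weighted author-to-author edges from paper-to-paper citations.

    Assumptions for this sketch: every author of the citing paper is
    treated as citing every author of the cited paper, and an author
    citing herself is ignored.
    """
    edges = Counter()
    for citing, cited in paper_citations:
        for a in paper_authors[citing]:
            for b in paper_authors[cited]:
                if a != b:
                    edges[(a, b)] += 1
    return edges

# Hypothetical papers and citations (citing paper, cited paper).
paper_authors = {"p1": ["Ann"], "p2": ["Bob", "Cy"], "p3": ["Ann", "Bob"]}
paper_citations = [("p2", "p1"), ("p3", "p1"), ("p3", "p2")]

edges = author_network(paper_authors, paper_citations)
# edges[("Bob", "Ann")] is 2: Bob cites Ann through both p2 and p3.
```

Each key is a (citing author, cited author) pair, so the result can be read directly as a directed, weighted graph with one node per author.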

At a more technical level, the Eigenfactor™ Scores can be seen as the outcome of two conceptually different, but mathematically equivalent, stochastic processes. The first process is a simple model of research in which a hypothetical reader follows chains of citations as she moves from node to node ad infinitum. An author’s Eigenfactor™ Score is the percentage of the time that she spends with this author’s work in her random walk through the literature.
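The random-reader process can be simulated directly. The sketch below is a simplified illustration under stated assumptions: the three-author network is made up, every author cites at least one other author, and the reader picks uniformly among the authors cited by the current one. The published Eigenfactor™ method includes refinements that are not modeled here.

```python
import random

def random_walk_scores(cites, steps=200_000, seed=0):
    """Percentage of steps a random reader spends on each author.

    `cites` maps each author to the list of authors she cites.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    visits = {author: 0 for author in cites}
    current = next(iter(cites))  # start with the first author listed
    for _ in range(steps):
        current = rng.choice(cites[current])  # follow one citation
        visits[current] += 1
    return {author: 100.0 * n / steps for author, n in visits.items()}

# Hypothetical author citation network.
cites = {"Ann": ["Bob", "Cy"], "Bob": ["Ann"], "Cy": ["Ann", "Bob"]}
scores = random_walk_scores(cites)
```

For this small network the long-run shares work out to about 44.4% for Ann, 33.3% for Bob, and 22.2% for Cy, and the simulated percentages should land close to those values.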

The second process is an iterated voting procedure. Each author divides one vote equally among those authors she cites. In subsequent rounds, each author divides her current vote total, as received in the previous round, equally among those authors whom she cites. This process is iterated until we reach a steady state where each author’s number of votes no longer changes. An author’s Eigenfactor™ Score is her percentage of the total votes at that steady state.
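The voting procedure translates almost line for line into code. This is a minimal sketch of the description above, not the production Eigenfactor™ calculation: the network is hypothetical, the function name `iterate_votes` is an assumption, and the sketch assumes every author cites at least one other author and that the network is connected well enough for the votes to settle.

```python
def iterate_votes(cites, max_rounds=1000, tol=1e-12):
    """Iterate the voting procedure until vote totals stop changing.

    `cites` maps each author to the list of authors she cites.
    Returns each author's percentage of the total votes.
    """
    votes = {author: 1.0 for author in cites}  # everyone starts with one vote
    for _ in range(max_rounds):
        new_votes = {author: 0.0 for author in cites}
        for author, cited in cites.items():
            share = votes[author] / len(cited)  # split votes equally
            for other in cited:
                new_votes[other] += share
        change = max(abs(new_votes[a] - votes[a]) for a in cites)
        votes = new_votes
        if change < tol:
            break  # steady state reached
    total = sum(votes.values())
    return {author: 100.0 * v / total for author, v in votes.items()}

# Hypothetical author citation network.
cites = {"Ann": ["Bob", "Cy"], "Bob": ["Ann"], "Cy": ["Ann", "Bob"]}
scores = iterate_votes(cites)  # roughly Ann 44.4, Bob 33.3, Cy 22.2
```

Ann ends up with the largest share because both other authors cite her, and Bob passes his entire vote to her each round.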

A more detailed discussion of Eigenfactor™ usage within the SSRN Community is available at: http://papers.ssrn.com/abstract=1636719.

There are numerous methods for determining which articles you should read, and they have varying levels of success. Article-level metrics, especially in SSH, provide the best opportunity for finding the latest, most impactful research. For example, you can use downloads when you need currency, citations for more established areas, and Eigenfactor™ for broader impact on a community. No one measure is perfect, and having a variety to choose from will allow you to use the best one in each situation. Approaching any measure with a reasonable degree of skepticism and a minimal amount of cynicism is also a good thing.

When I think about the benefits of article-level metrics and the focus that many circles still place on IF, I remember a quote from Max Planck:

A new scientific truth does not triumph by convincing its opponents and making them see the light, but rather because its opponents eventually die, and a new generation grows up that is familiar with it.

Or as a scholar reminded me the other day, new ideas progress forward funeral by funeral …
