Nearly ten years ago, in November 2004, Elsevier released Scopus, “the world’s largest abstract and indexing database,” intending to provide citation information on the full spectrum of science-technology-medicine (STM) literature and more limited coverage of the social sciences (SS) and arts and humanities (AH). Today, March 27, 2014, Elsevier will publicly announce the launch of an expansive upgrade that will bring Scopus and citation research to new levels of depth, breadth, and functionality.
Today, Scopus is the world’s largest subscription abstract and citation database for peer-reviewed literature, with cover-to-cover coverage of more than 21,000 titles from more than 5,000 publishers across the globe. “Over the course of 10 years,” Cameron Ross, Elsevier Vice President of Product Management for Scopus, explains, “Scopus has increasingly become the abstract and citation database of choice, not only among researchers but also among those who evaluate researcher performance and the impact of scientific output.” Elsevier is investing massive resources in adding new depth and strengthening coverage in areas of weakness through two major efforts: one focused on adding archival content to the database, and the other on using full-text book content in a way that will rival the value of the Google Books project for mining citation data from monographs.
The Backfile Project
Many critics have questioned the lack of depth in the Scopus database, whose coverage has only gone back to 1996 with any certainty. Journal coverage has long been an issue for researchers, leading to a strong perception that the SS and AH are more ‘fringe’ elements of Scopus. As an example, taking the Top Level Social Sciences journals as a set and looking at just the journals whose titles begin with “A,” I identified those with coverage of 1996-ongoing or longer (going back further). Of the 885 total journals, only 280 met this criterion, just 31.6% of the journals. Looking through the Scopus journal list has often been depressing, with results like:
Abacus 2002-ongoing, 1996-1999, 1993, 1984-1987
Academic Leadership 2005-ongoing, 2000-2003
Why is this? Dr. Wim Meester, Scopus’ Senior Product Manager for Content, explains that “some scattered coverage is due to the fact that some issues of titles were available in databases already existing at Elsevier, and these databases may not have had the policy of comprehensive indexing. For example, a database like Embase, which is focused on health care, may not cover every article in every publication if those articles are not relevant to medical research.”
The backfile project, actively promoted by Meester over the past year, will bring the journal files back to 1970. “It is the general policy of Scopus to cover journals from the year of selection and going forward. We have begun to contact publishers to begin to see if we can get those archives if they are available, so we can backfill some of these issues.” This has value across the disciplines, but especially in the social sciences, arts and humanities, fields in which information and developments evolve more slowly than in technology-driven scientific areas.
“We are really excited about our Backfile Project, which we started this year and will take about three years to complete,” Meester revealed exclusively to ATG this week. “Currently in Scopus there are no cited references for articles before 1996 and we will re-process everything in the database back to 1970 in order to add all of those cited references to the content in Scopus. By doing this we want to be sure we have accurate citation information for all the content in Scopus and that all author profiles in Scopus have accurate information, that the H-index gives us what we would expect given this enhanced database.”
“To give you an idea of the size of this project,” Meester continues, “we estimate that we will need to re-index 8 million documents in Scopus; that is on top of the 2.5 million documents that we add to Scopus each year.” Some journals will be backfilled even further than 1970; however, just going to 1970 for such a huge core of journals is a major advance.
Having this larger, more robust dataset to work with will allow for more accurate assessments and trend analysis over time. This will also make the author profiles and H-index data more comprehensive. The first set of backfile records is expected to appear by the fourth quarter of this year. This additional content will be provided at no added charge to subscribers.
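To illustrate why added cited references matter for the H-index, here is a minimal sketch in Python. It uses the standard definition of the H-index (the largest h such that an author has h papers with at least h citations each). The author and the citation counts are hypothetical and chosen only for illustration; this is not Scopus' own implementation.

```python
def h_index(citation_counts):
    """Return the largest h such that h papers have at least h citations each."""
    ranked = sorted(citation_counts, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank  # the paper at this rank still has enough citations
        else:
            break  # every later paper has fewer citations than its rank
    return h


# Hypothetical author: citations counted only from references indexed since 1996.
post_1996_only = [12, 9, 5, 3, 2, 1]

# The same six papers after (hypothetical) citations from backfilled
# pre-1996 reference lists are added to the counts.
with_backfile = [15, 11, 8, 6, 5, 1]

print(h_index(post_1996_only))  # 3
print(h_index(with_backfile))   # 5
```

In this made-up case the author's papers do not change at all; only the number of citations the database can see changes, and the H-index moves from 3 to 5.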
Scopus Books Expansion
Scopus has always included some book series, but Elsevier is embarking on an unprecedented expansion of its monographic coverage as well: the Scopus Books Enhancement Program. “Adding book series is really the same as journals, because they are serial publications,” Meester reports. Adding individual book titles presents challenges. “We try to index the information and the abstracts; if they aren’t there, then we use whatever metadata is available, including author information, affiliation, cited references. In order to do this, we need the full-text of the books in order to do the cited references, footnotes, or notes embedded in the text. All of this is added to the record in Scopus. If we aren’t able to determine if a citation is to a particular chapter, then we link it to the book as a whole.”
The book expansion program is also a massive investment at no added cost to subscribers. Books are selected by publisher, and the list of partners in this effort is very impressive. For example:
Delft University Press
Edinburgh University Press
Hong Kong University Press
Johns Hopkins University Press
Penn State University Press
Polska Akademia Nauk
Purdue University Press
Rutgers University Press
University of California Press
Walter de Gruyter
“If the publisher is selected, we will index the full catalog of that publisher,” Meester affirmed. “We have 30 publishers for which we are now doing the full catalog of their titles. Whether from a conference paper, journal, or book, we always create the Scopus records from the full-text of the document. There is no full-text in Scopus; we only use it in creating the records in the database. We have arrangements with the publishers to use the full-text in order to create the metadata and then the full-text is discarded or returned.” The project is expected to add 75,000 new book titles to Scopus over the next three years. The first book records were uploaded in the third quarter of 2013.
Published selection guidelines for the book expansion include the following:
- Reputation and impact of the publishers
- Size and subject area of the books list
- Availability and format of the book content
- Publication policy and editorial mission
- Quality of published book content
The project is focusing on books published since 2005 (or 2003 for A&H titles). This will include monographs, edited volumes, and graduate-level textbooks, but not dissertations, undergraduate-level texts, atlases, yearbooks, biographies, popular science titles, or manuals. In the past, web pages and institutional repositories were included in Scopus. “When Elsevier made the decision to retire Scirus,” Elizabeth Dyas, Senior Product Marketing Manager for Scopus, reports, “the free science search engine, we removed this content from Scopus.”
“The new content enhances the search, discovery and evaluation of book-based disciplines in the social sciences and A&H and increases the breadth and depth of Scopus’ coverage globally,” Scopus reports. “Researchers and librarians will be able to create book-based citation analyses to accurately reflect and analyze output. Books content enhances the power of Scopus searches, fosters interdisciplinary collaboration, and helps administrators take a more holistic approach to the evaluation of disparate disciplines.” Scopus is not currently producing any sort of journal metric or citation index for books.
The Scopus Marketplace Today
“We have about 3,000 customers globally today,” Dyas disclosed to ATG, “and about 80% of these are in academic and government organizations and the other 20% are in corporate areas. Market penetration varies globally due to the market dynamics in each region. For instance, in countries like the U.S. and China, universities tend to buy in a more decentralized manner than in countries like Italy and Spain, which have country-level consortia set up to purchase research solutions such as Scopus or ScienceDirect. In the QS Top Universities ranking, of the Top 25 ranked academic institutions internationally, 88% are Scopus customers, and in the top 10 we serve 90% of these institutions. In the past few years, we’ve been able to sign on many countries, especially in Europe, to use Scopus data for national assessments; the United Kingdom, Australia, and Portugal are all using Scopus data for their assessments. The U.S. doesn’t have this type of structured national research assessment (though there are national funding bodies such as NIH), so the market there is a bit more distributed by institution.”
Elsevier has no intention of developing Scopus or subsets of it to serve specific corporate niche markets. However, that does not rule out adding specialized resources to Scopus or other Elsevier products, as Dyas notes: “Scopus’ mission is to be the most recognized and used broad-based, multi-disciplinary search and evaluation tool for professional researchers and information specialists at academic, government, and corporate institutions. Elsevier actually has a variety of other products like Compendex, Embase, and MD Consult, so we aren’t specifically looking at the health care market. We are certainly looking at expanding the types of content in order to serve specific research domain needs and other markets broadly.”
“With respect to business model,” Dyas continues, “our pricing model takes into consideration geographic location, research output, and the number of researchers at a particular institution. Elsevier is also a founding member and active participant of Research4Life, a public-private partnership between UN agencies, universities, and publishers, which reduces the knowledge gap between developing and industrialized countries with free and low cost access to critical scientific research. We make all of ScienceDirect and Scopus, including over 2,000 journals and 6,000 books, available through Research4Life.”
Scopus uses a Content Selection and Advisory Board, “an international group of scientists, researchers and librarians who represent the major scientific disciplines” to review all suggested titles for Scopus and “works with the Scopus team to understand how Scopus is used, what content is relevant for users and what enhancements should be made. The recommendations of the CSAB directly influence the overall direction of Scopus and the prioritization of new content requests to ensure that Scopus content stays international and relevant.”
Product Integration & Other Issues
Loet Leydesdorff, in a recent letter to the Journal of Informetrics, pointed to “the thin lines between bibliometrics and commercial objectives,” noting that “such thin lines may have become structural for innovative systems but may also set limitations to the possibilities to change a new version of a previously introduced indicator.” He was talking about the revised SNIP indicator; however, his concern about “black-boxing” of data is significant. When the metrics or processes aren’t clearly available, trust can become an issue.
“Transparency is very important to Elsevier,” Meester points out, “and we try to be as open as possible. The methodologies for our journal metric calculations (SNIP and SJR) are developed by third parties and are published in the journal literature. The dataset used for the calculations is the actual Scopus database, which makes it very transparent how the values are derived. It is possible for anyone to do their own metric analysis from the data in our database, to recalculate the indexes that we present. Also, the SNIP and SJR journal metric values are publicly available for free. Everybody can look up a journal metric value (for current and previous years) for all journals included in Scopus. By doing this we make the journal metrics in Scopus as transparent as possible and give our users the freedom to use this information in a way that works best for them.”
Elsevier was one of the founding partners of ORCID and is proud of its participation in efforts to make author identification less ambiguous. “Since the ORCID launch in October 2012, we, as a company, have continued to add ORCID functionality to Scopus and other products at Elsevier. At the launch in October 2012, we launched an ORCID-specific version of our Author Feedback Wizard, a free tool that lets authors give us feedback on their author profiles. This is a free layer of Scopus; if you Google ‘Scopus free author lookup’ you will get a link to where anyone can look up their profile and can give us feedback on their profiles. We did a version of the Feedback Wizard for ORCID so that people can actually update their Scopus profiles at the same time they are creating ORCID profiles, and then import their Scopus records/articles into the ORCID profile. Also, if you go to Scopus author profiles, in the right hand column you will see an icon to ‘Add to ORCID.’ In our next release in May, you will see ORCID IDs on the Scopus profiles. Additionally, authors can enter their ORCID data when submitting articles to Elsevier journals via the Elsevier Editorial System (EES). Getting this into the metadata at the time of article submission is really critical to the success of this industry-wide initiative.”
The recent acquisition of Mendeley offers Elsevier more options for integrating the research workflow. “In the original release of Scopus ten years ago,” Dyas reveals, “our developers and product team followed the principle of user-centered design to make the process of research easier and integrate into the researcher’s workflow. While we have no definitive plans for mass integration, it is a basic tenet of our product development that we want to integrate our products in a way that makes sense for the researcher (user) workflow. Our product team(s) continue to focus on researcher needs and user-centric design and we continuously look at how we can optimize a researcher’s workflow. Many of the tools Elsevier has recently purchased are directly related to this focus on the researcher. We won’t be incorporating all of these great features into Scopus; we will integrate where we think it meets a user need.”
“For example,” she continues, “with Mendeley we have added features in the last six months in ScienceDirect and Scopus. From ScienceDirect and Scopus users can now export directly into Mendeley in one simple click using ‘Save to Mendeley’ functionality. In Scopus we have also added Mendeley as one of the direct export options and we have also added Mendeley readership statistics to Scopus. When you go to the document details page, if that document has been saved to a user’s Mendeley library, a little widget shows up in Scopus showing you how many times the article has been saved and the basic demographics of those who have saved it. These are examples of the types of integration that our clients and users will see with our products.”
“Related to exporting,” Meester continues, “you can mark records in Scopus and directly export to Mendeley, and I believe the limit to that (determined by Mendeley’s API capacity) is 1,000 records at a time. We have now increased the export limits in Scopus to 20,000. We are listening to customers’ needs and have expanded the export limits based on that feedback. There are limits in terms of the pressure points in the database’s performance, but we are now at 20,000 and doubt that many would have problems with this now (20,000 is approximately the annual output of a large university). Previously it was 2,000 for Scopus.”
Competition is a Good Thing
“Thomson Reuters has been a sole source provider of citation data for years,” notes Informed Strategies consultant Judy Luther, “and Elsevier is one of the few publishers large enough to invest in developing a competing product. Building a citation database is an expensive proposition and requires access to the world’s leading literature as well as a deep backfile. The Citation Indexes have come under fire based on researchers’ concerns about misuse of Thomson’s Impact Factor (DORA is one example). Having alternatives to the Citation Indexes provides an option that addresses some of the limitations of the Citation Indexes. For example, Scopus offers two new metrics, SNIP and SJR. SNIP is a metric that is relevant across all disciplines and is not skewed based on the volume of literature (i.e., an Impact Factor (IF) of 30+ in life sciences that is 3 times an IF in the Arts and Humanities). SJR weights citations based on the status of the citing journal as a measure of prestige.”
The expansion of coverage to more global resources is also a key advantage for many users. Tomaž Bartol, on the faculty of the University of Ljubljana in Slovenia, sees great advantage to the Scopus approach: “In my country, however, these databases are now used mostly for the purposes of metric evaluation of researchers (authors) even though this may not have been the original purpose. But now, it is all about publish and perish. On top of that, if our articles don’t receive any citations by other authors then such articles are considered inferior. Scopus was introduced in Slovenia mostly with the aim of giving an additional possibility to scientists in the social sciences and humanities in order to gain ‘metric points’ which are then used in our national system of evaluation (Scopus seems to cover better these research areas). Scopus-based approach in Slovenia is still in an early phase as the database has just been introduced. Many problems remain, for example the evaluation of scientific books and proceedings.” The need to address the larger issues of national assessments is an area in which Scopus has clearly taken a strong lead.
Hitting a Home Run
Today’s announcements and ongoing development represent a major milestone for Elsevier and Scopus. Clearly, the investment required to expand the core database so massively, with an ongoing commitment to use the full text of books, journals, and other materials to find cited references wherever they may be (in footnotes, text, or reference lists), is a major step forward for citation analysis. Adding 10,000 new books each year (in addition to the 75,000 of the current project) at this level of analysis is stunning. The effort to backfill the more than 20,000 journals in the database in just a few years is a monumental task, one that is achievable given Elsevier’s deep pockets and aggressive business style. Even long-time critics (such as myself) need to give the database a re-evaluation. The only remaining question is this: now that the gauntlet has been thrown, how will Thomson Reuters and others respond?
Nancy K. Herther is Librarian for American Studies, Anthropology, Asian American Studies & Sociology, University of Minnesota, Twin Cities Campus. Her email is email@example.com.