by Todd Carpenter (Managing Director, NISO, One North Charles Street, Suite 1905, Baltimore, MD  21201;
Phone: 301-654-2512;  Fax: 410-685-5278)

How we determine whether two things are the same or different depends on how we define “same.”  This question dates back to Aristotle and Plato and the differences between universal forms and instantiated forms of objects, and understandings of how items are grouped together in classification systems.  Without getting too deeply philosophical, how we group and classify things is at the heart of librarianship.  It is also core to the question of identification and description in the context of published information.

In our digital world, producing a copy of an item is as easy as pressing the <F12> key to “save as” on a PC, and distributing that copy worldwide is only a matter of saving it to a Web-accessible server.  We badly need a common understanding of the differences that might exist between the original file and the copy.  However, recording those differences for every minutely changed file, even down to its creation metadata (e.g., the newly saved file’s date of origin differs from the original’s), is an unmanageable and often unnecessary task.  In an era where duplication is easy, managing versions most certainly is not.

One principle for determining the differences between items is to consider functional equivalence, a concept that has its origins in literary translation but was developed into a metadata theory described in detail in Godfrey Rust’s and Mark Bide’s <indecs> Metadata Framework.  In this context, a thing should be identified as different from another thing only when it is useful to do so.  Focusing on when it is valuable to maintain separate “records” of a version change helps to limit the scope of the problem: we needn’t design systems to track every possible change if people derive no value from that tracking.  For example, an author might write multiple drafts, but for the overwhelming majority of users and uses those drafts are not useful, nor are they worth the expense of identifying, describing, and preserving them.
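One narrow, mechanical reading of this principle can be sketched in code: treat two files as the same version when their content is byte-for-byte identical, and ignore incidental metadata such as the date on which the copy was saved.  This is only an illustrative sketch, not something specified by the <indecs> framework; the `StoredFile` record and the `same_for_tracking` function are hypothetical names introduced here, and real functional equivalence is a judgment about usefulness that a checksum cannot fully capture.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime


@dataclass
class StoredFile:
    """Hypothetical record for a saved file: its name, save date, and bytes."""
    name: str
    saved_at: datetime
    content: bytes


def content_fingerprint(stored: StoredFile) -> str:
    # Hash only the content; the name and save date are deliberately left out.
    return hashlib.sha256(stored.content).hexdigest()


def same_for_tracking(a: StoredFile, b: StoredFile) -> bool:
    # Two files count as one version if their content is identical,
    # even when incidental metadata (such as the save date) differs.
    return content_fingerprint(a) == content_fingerprint(b)


original = StoredFile("article.doc", datetime(2008, 1, 15), b"Final text of the article.")
copy = StoredFile("article-copy.doc", datetime(2008, 6, 1), b"Final text of the article.")
revised = StoredFile("article-v2.doc", datetime(2008, 6, 2), b"Final text of the article, revised.")
```

Here `same_for_tracking(original, copy)` is true although the save dates differ, while `same_for_tracking(original, revised)` is false because the text itself changed.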

This issue formed the basis for the joint project between NISO and ALPSP on Journal Article Versions, led by Cliff Morgan at Blackwell-Wiley and Bernie Rous at ACM, which resulted in a NISO Recommended Practice on Journal Article Versions (JAV) (NISO-RP-8-2008).  In addition to the recommended practice, a detailed article about the JAV project was published in this magazine in January 2007.  The recommendations define seven terms that correspond to stages in the publication process:

AO = Author’s Original
SMUR = Submitted Manuscript Under Review
AM = Accepted Manuscript
P = Proof
VoR = Version of Record
CVoR = Corrected Version of Record
EVoR = Enhanced Version of Record

The rationale for choosing these stages was that each reflects a unique contribution to the content by one or more players (creator, editor, or publisher) in the scholarly publication process.  A variety of sub-stages could also have been included, but the JAV working group excluded them because they did not add substantial value over the previous stage and might be too complicated to identify clearly.  For example, once a manuscript is submitted, it might go through a series of revisions and resubmissions prior to being finally accepted.  The differences among the first, second, or various other iterations of a paper might not be significant enough to track, and not all papers will go through multiple revisions.  Some papers could go directly from author’s original (AO) to accepted manuscript (AM) without any changes, or even straight to version of record (VoR).  While metadata structures for each of the stages were discussed, they were considered out of scope for the initial group.
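The seven JAV terms lend themselves to a simple enumeration.  The sketch below is illustrative only: the Recommended Practice defines the terms, not any code, and the `JAVStage` and `follows_publication_order` names are hypothetical.  It also assumes that the order in which the terms are listed above can be used as a sequence, which is a simplification, since a corrected and an enhanced version of record are not necessarily produced in that order.

```python
from enum import Enum


class JAVStage(Enum):
    """The seven JAV terms, in the order they are listed above."""
    AO = "Author's Original"
    SMUR = "Submitted Manuscript Under Review"
    AM = "Accepted Manuscript"
    P = "Proof"
    VOR = "Version of Record"
    CVOR = "Corrected Version of Record"
    EVOR = "Enhanced Version of Record"


# Enum members iterate in definition order, which gives each stage a position.
STAGE_ORDER = list(JAVStage)


def follows_publication_order(history):
    """Return True if the stages never move backwards.

    Skipped stages are allowed, because a paper may go directly from
    AO to AM, or even straight to VoR.
    """
    positions = [STAGE_ORDER.index(stage) for stage in history]
    return all(a <= b for a, b in zip(positions, positions[1:]))
```

For instance, `follows_publication_order([JAVStage.AO, JAVStage.AM, JAVStage.VOR])` holds even though the SMUR and P stages are skipped, whereas a history that lists a Version of Record before an Author’s Original does not.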

Other organizations have also considered the issue of journal article versions and have added their own perspectives.  The Joint Information Systems Committee (JISC) in the UK conducted a Scoping Study on Repository Version Identification (RIVER) followed by the Version Identification Framework (VIF) project.  One important distinction between the NISO/ALPSP work and the JISC work is that the former focused solely on the issues surrounding journal article versions in the scholarly publication chain, whereas the JISC work focused on broader issues of a variety of content objects that primarily resided in repositories, although they often had been submitted for formal publication as well.

The initial study funded by the JISC, VERSIONS (Versions of Eprints – a user Requirements Study and Investigation of the Need for Standards), was conducted by the London School of Economics and Political Science and the Nereus Consortium of European research libraries in economics.  The goal of the study was to “address the issues and uncertainties relating to versions of academic papers in digital repositories.”  The recommendations included a set of definitions for: Draft, Submitted Version, Accepted Version, Published Version, and Updated Version.  These track closely with the NISO/ALPSP recommendations and were issued at approximately the same time.  The final toolkit from the project included suggestions for authors and repository managers to improve the identification, use, and recognition of version terminology.

The RIVER project developed another recommendation based on a set of use cases that included a variety of content forms, such as learning objects, digital images, wikis, documents, software, data files, or search results in a repository.  The project outlined various scenarios where the content could be collocated, might need to be disambiguated, or might need version control.  The project, after reviewing the use cases and existing industry practice, put forward a number of data elements that might be used to identify versions of the content in question.  The project group recommended some follow-up work, which was done in the subsequent Version Identification Framework (VIF) project.

The VIF project, which built on both the VERSIONS and RIVER projects, looked at the broad range of content forms that researchers use in their work and how that content is managed and stored, with an eye toward the role of repositories in that process.  The project team conducted surveys of how researchers, teachers, students, and others are using digital objects and managing their personal digital resources.  The project paid particular attention to the workflow issues of creation, revision, dissemination, and storage, and especially to how versioning terminology can be integrated into version control.  The team then produced a framework for version control, which included guidance both on the metadata that should accompany objects and on the information that should be embedded in them to maintain good version control.  The Framework developers identified “Essential Versioning Information” consisting of: Defined Dates, Identifiers, Version Numbering, Version Labels or Taxonomies, and Text Description.  Other information was recommended to be embedded within an object including: ID Tags and Properties Fields, a Cover Sheet, Filename, and a Watermark.
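The five elements of “Essential Versioning Information” can be pictured as a small record attached to each object.  The following is a minimal sketch under stated assumptions: the `VersionInfo` class, its field names and types, and the `missing_elements` helper are all hypothetical and are not taken from the VIF documentation, which describes the elements but is not quoted here as prescribing any particular data structure.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List


@dataclass
class VersionInfo:
    """Hypothetical record holding the five essential versioning elements."""
    defined_dates: Dict[str, date] = field(default_factory=dict)  # e.g. {"created": ...}
    identifier: str = ""
    version_number: str = ""
    version_label: str = ""      # a label or a term from a taxonomy
    text_description: str = ""


def missing_elements(info: VersionInfo) -> List[str]:
    """List the essential elements that have been left empty."""
    checks = [
        ("Defined Dates", info.defined_dates),
        ("Identifiers", info.identifier),
        ("Version Numbering", info.version_number),
        ("Version Labels or Taxonomies", info.version_label),
        ("Text Description", info.text_description),
    ]
    return [name for name, value in checks if not value]


# Example values below are invented for illustration.
record = VersionInfo(
    defined_dates={"created": date(2008, 3, 1)},
    identifier="example-repository-item-42",
    version_number="2",
    version_label="Accepted Version",
)
```

A repository could run a check like `missing_elements(record)` at deposit time to prompt the depositor for whatever is still blank; in this example only the text description has been left out.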

While these structures and approaches differ in their specifics, the core of the problem remains clear: users have to be able to understand the differences between different instances of what appears to be the same content.  At their core, the structures proposed by the JISC and the NISO/ALPSP recommendations are not so dissimilar as to require much distinction.  Where the JISC has pushed forward is in developing a more robust system, extending beyond journal articles into other content forms.  The VIF project has also proposed a fuller metadata framework, which will be particularly useful.  As with all standards projects, pushing the adoption of these recommendations in the community and making them the lingua franca among the scholars who use these content forms are the biggest challenges.  Hopefully, as more attention is focused on the issue, researchers and systems managers will adopt the existing terminology and require the necessary metadata to ensure clarity.