Bibliometrics case study

An innovative example of the use of bibliometrics in digital libraries is an evaluation study conducted by Bollen and Luce (2002). In an attempt to quantitatively evaluate the impact of a digital library's collection and services, and how well the collections and services addressed users' needs, Bollen and Luce used transaction log data to examine document relationships. They found that, by examining users' retrieval patterns, they could generate a community-specific measure of document impact. Specifically, they were able to determine which documents in the digital library were viewed as similar by users, and which documents were most frequently retrieved.

Before describing the method used by Bollen and Luce (2002) to generate a measure of document impact on a digital library user community, the assumptions underlying their approach must be clarified:

  1. When two documents are retrieved in close temporal proximity, they are said to be co-retrieved.
  2. Two documents would be co-retrieved because there is some level of similarity between them.
  3. The strength of the relationship (similarity) between documents can be determined by the frequency with which the documents are co-retrieved by a community of digital library users.
  4. Each time a given pair of documents is co-retrieved, the weight (strength) of the relationship between them can be increased by a small amount. The weight between pairs of documents is indicative of the degree of similarity between the documents as perceived by the community of users.
  5. Document network maps can be constructed from the generated document weights. These networks can be analyzed to generate measures of document impact, such as the Journal Consultation Frequency (JCF), a measure based on patterns of usage rather than on frequency of citation. A usage-based measure is of special value in a digital library because it can include documents in various languages or media types.

With an understanding of the above ideas and assumptions, generating document relationships from user transaction logs is fairly simple:

  1. Define what qualifies as a co-retrieval event. (Bollen and Luce (2002) defined a co-retrieval event as “a pair of sequential retrieval requests for a pair of documents by the same user within a given period of time.”)
  2. Sort your transaction logs by IP number and time. (Once the logs are sorted this way, each user's requests appear in sequence, and co-retrieval events can be reconstructed from them.)
  3. Generate a table of co-retrieval events. (Once your data are sorted by IP number and time, you can determine which events are co-retrieval events, i.e., pairs of consecutive transactions from the same IP number whose date and time stamps differ by less than a pre-specified quantity. Bollen and Luce (2002) used a value of 3600 seconds.)
  4. Generate weighted document relationships. (You can do this by increasing the relationship weight between two co-retrieved documents by a small amount, r, every time they appear together in a co-retrieval event.)
  5. Calculate document impact. (Document impact can be calculated using the JCF measure. The JCF of a specified document, X, is the number of connections from other documents to X plus the number of connections from X to other documents in the library.)
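
The five steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Bollen and Luce's implementation: the log is assumed to be a list of (ip, timestamp, doc_id) tuples with timestamps in seconds, the function names are hypothetical, the default increment r = 0.1 is an arbitrary choice, and repeated requests for the same document are assumed not to count as a co-retrieval event. The IP number stands in for the user, as in step 2.

```python
from collections import defaultdict


def co_retrieval_events(log, window=3600):
    """Steps 1-3: return (first_doc, second_doc) pairs retrieved in sequence
    by the same IP number less than `window` seconds apart.

    `log` is assumed to hold (ip, timestamp_in_seconds, doc_id) tuples.
    """
    # Sort by IP number, then time, so each user's requests are adjacent.
    ordered = sorted(log, key=lambda rec: (rec[0], rec[1]))
    events = []
    for prev, curr in zip(ordered, ordered[1:]):
        same_user = prev[0] == curr[0]
        close_in_time = curr[1] - prev[1] < window
        # Assumption: a document re-requested immediately is not paired with itself.
        different_docs = prev[2] != curr[2]
        if same_user and close_in_time and different_docs:
            events.append((prev[2], curr[2]))
    return events


def relationship_weights(events, r=0.1):
    """Step 4: add r to the weight of a document pair for each co-retrieval.

    The default r is an arbitrary illustrative value.
    """
    weights = defaultdict(float)
    for first_doc, second_doc in events:
        weights[(first_doc, second_doc)] += r
    return dict(weights)


def jcf(weights, doc):
    """Step 5: connections into `doc` plus connections out of `doc`.

    Each weighted pair counts as one connection, whatever its weight.
    """
    incoming = sum(1 for (_, dst) in weights if dst == doc)
    outgoing = sum(1 for (src, _) in weights if src == doc)
    return incoming + outgoing
```

Calling co_retrieval_events on a log, passing the result to relationship_weights, and then calling jcf for each document yields one impact score per document. Connections are treated as directed here (from the first document retrieved to the second), which follows the "to X" and "from X" wording of step 5; a symmetric weighting would be an equally plausible reading.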

Bollen, Luce, Vemulapalli, and Xu (2003) describe another application of this approach. They make a convincing argument for its utility as the developers of digital libraries face difficult decisions about acquisitions, especially when resources are tight.