Masked by Trust: Bias in Library Discovery (2019), written by Matthew Reidsma, Web Services Librarian at Grand Valley State University, and published by Library Juice Press, an imprint of Litwin Books, takes the burgeoning literature in critical algorithm studies and applies it to library discovery systems.
About the Author
Matthew Reidsma is the Web Services Librarian at Grand Valley State University and the author of two other books: Responsive Web Design for Libraries: A LITA Guide (published by ALA in 2014), and Customizing Vendor Systems for Better User Experiences: An Innovative Librarian’s Guide (published by Libraries Unlimited in 2016). He has given keynotes and conference presentations and published on user experience, including several pieces titled “Ethical UX.” You can find him online at matthew.reidsrow.com.
The book opens by explaining algorithms and search engines, drawing on critical algorithm studies, including work by Safiya Umoja Noble and Cathy O’Neil. Reidsma provides an excellent overview of this literature and an introduction to issues of design and implementation in algorithms and search engines.
He then applies this literature specifically to library discovery tools, taking an extensive look at Summon’s Topic Explorer, which presents the searcher with results from Wikipedia, Credo Reference, or Gale Virtual Reference Library, and at Research Starters. Reidsma found that this information is often incorrect, devoid of context, unrelated, and/or biased. Regarding the Wikipedia content presented, Reidsma found that “for the entire life of Summon 2.0 and the Topic Explorer, the Wikipedia content has been frozen with information written before the product was actually available to libraries” (p. 84). He provides thorough examples and screenshots of Summon searches that pull incorrect information from Wikipedia: showing certain people as still living, showing Barack Obama as still president, and omitting new works by certain authors.
From here, Reidsma returns to a general discussion of algorithms and search engines, arguing that bias is embedded in algorithms because they aren’t divorced from society. With this framework in place, he turns back to library discovery tools (specifically Summon, EBSCO’s EDS, Primo, and OCLC’s WorldCat Discovery), opening that chapter with the claim that “[b]ias in library discovery systems is merely the latest example of bias in LIS practices” (p. 117). This is perhaps one of my favorite sentences in the book. In this chapter, he examines bias in autosuggest or autocomplete features, suggested topics, Topic Explorer, Research Starters, and relevancy rankings. He says that vendors typically respond to such problems in one of two ways: “[t]hey either block the result. . .or they choose to do nothing” (p. 146). He ends the book with suggestions for improving these tools, presented under six headings: recognize the limitations of algorithms (p. 148), stop focusing on tools (p. 150), ethnography (p. 153), design (p. 157), infrastructure and staffing (p. 161), and teaching (p. 166).
Overall, I greatly enjoyed the book. It makes an important contribution to LIS literature, specifically to literature that contradicts the notion of library neutrality or objectivity, while also further complicating the relationship between libraries and vendors. The book is a fairly easy and quick read, and I’d love to see it introduced in LIS courses. For readers with considerable knowledge of critical algorithm studies, limited time, or both, chapters 3 and 5 are the most directly applicable to libraries and librarianship, since they deal directly with library discovery systems; the final chapter also offers useful recommendations for practitioners.
On Bias and Library Discovery
“Essentially, this argument underlies the central premise of the book: bias in library discovery is masked by the trust that libraries engender.”
For a deeper dive into the text, I’m going to focus on the two chapters I found most rewarding: chapters 3 & 5, which deal specifically with issues of library discovery. Reidsma begins the third chapter by arguing that libraries have claimed that their search tools are neutral, objective, and more trustworthy than tools like Google or Bing, and that libraries have taken Google as an aspiration for their discovery systems. Essentially, this argument underlies the central premise of the book: bias in library discovery is masked by the trust that libraries engender. Tangentially, Reidsma also points out that because libraries pay vendors for discovery systems, they avoid the surveillance capitalism of search options like Google, which Reidsma argues in chapter 2 is primarily a marketing platform. However, he doesn’t contend with the flip side of surveillance capitalism: personalization (more on this below). He points out a few other differences between library discovery and “general purpose search engines”: library discovery tends to focus on more specific (“vertical”) searches; results are often the same for different users for the sake of reproducibility (another reason for the lack of personalization); and the use of library systems is restricted by agreements with vendors (which Reidsma identifies as one difficulty in studying these systems, in addition to the usual difficulties of the proprietary black box). Reidsma also shows how library discovery tools privilege their own databases when aggregating content and directing users, whether to promote their own content or simply because it is easier to link out to their own resources.
The meat of the chapter, to me, is Reidsma’s study of problematic results that appear in Summon, specifically because of the Topic Explorer. As mentioned earlier, the information presented from Wikipedia doesn’t update as Wikipedia entries update. For example, Reidsma includes screenshots from Summon stating that Barack Obama was the president (p. 75), that Michelle Obama was first lady (p. 77), and that Donald Trump wasn’t remotely involved in politics (p. 76). While Summon’s world might be one many of us would rather live in, the information provided is entirely inaccurate, giving patrons a primer that doesn’t effectively prime them. Reidsma concludes that the creators assumed changes in popular culture wouldn’t matter to academic users: “[r]ight away I was confronted with what appeared to be an assumption about how library patrons will use a discovery system: not for searching for information about pop culture icons (except, perhaps those that [sic] are deceased), but for searches on ‘academic’ topics” (p. 77). He provides a few additional examples before making his final claim that the Wikipedia data was ingested in 2013 and never updated. Following this, he demonstrates that Credo Reference and Gale Virtual Reference Library also provide inaccurate information, either because they aren’t regularly updated (Credo’s entry for Osama bin Laden claims he’s still alive) or because their entries aren’t written for the way they’re presented (the introduction that’s pulled and displayed doesn’t always make sense).
“And ultimately, this is the point: while I expected bias in these library discovery systems, I don’t think I expected to experience it almost exactly the same way that I do via Google.”
Before touching on bias in library discovery (the topic of the fifth chapter), I hope we can all agree that the Dewey Decimal System and Library of Congress Subject Headings have been garbage (despite what men tell me on dating apps), so Reidsma’s claim that bias in LIS isn’t new shouldn’t be shocking. In this chapter, Reidsma provides copious examples of how bias shapes the results of searches in library discovery systems. He claims that 93 percent of the 8,000 searches he recorded in Summon returned a Topic Explorer result that was somewhat topically accurate. He provides a few pages of examples of searches, each with the result from Summon’s Topic Explorer and his interpretation of it. My favorite (and Reidsma’s) is that “‘branding’ returns ‘BDSM’”[i]. On the more problematic side, a search for “the bible” in Summon returns a Wikipedia entry for “The Bible and Homosexuality” (p. 120), a search for “rape in united states” returns a Wikipedia entry on “Hearsay in United States Law” (p. 124), the same search returns an autosuggested search for “race in united states” (pp. 124–125), a search for “women in film” returns a Wikipedia entry on “Women in Prison Film” (p. 126), and so on. Reidsma provides countless examples of the ways that the Topic Explorer, Research Starters, autosuggested searches, and other features of library discovery systems reproduce the biases in our society. His examples of autosuggest results for searches on “Muslims are,” “women are,” “immigrants are,” and “Asians are” all seem like something you’ve seen before on Google, but not something you’d expect from the library. And ultimately, this is the point: while I expected bias in these library discovery systems, I don’t think I expected to experience it almost exactly the same way that I do via Google. More importantly, these results reflect poorly on us as libraries and librarians, even if they’re produced by the vendors we work with rather than by something internal.
Challenges & Limitations
I had a few minor issues with the book that might appear nitpicky. I do want to reiterate that overall I enjoyed the book and would highly recommend it to others.
In terms of accessibility, I found the book’s text to be quite small. It might be only slightly smaller than what I’m used to, but it was definitely noticeable. I didn’t see an eBook version on the Litwin Books website, but if one becomes available, I’d recommend it for the ability to change the text size. In terms of layout, I appreciated that the notes were provided as footnotes at the bottom of each page rather than in a section at the back of the book. However, the indexing struck me as strange; certain entries were left out, and I’m not sure why. For example, Safiya Umoja Noble, who is cited extensively throughout the book, isn’t included in the index, while Chris Bourg, who is cited considerably less frequently (and whose blog post on library neutrality is omitted), is. As for decisions that just aren’t my personal preference, Reidsma provides lengthy introductions for many of the authors he cites. For example, Jill Lepore is referred to as “the David Woods Kemper ’41 Professor of American History at Harvard University,” which seems unnecessary to me. At best, this is just tedious to read; at worst, it’s elitist. While I appreciate his consistency and his differentiation between academic researchers, industry researchers, and industry executives, it occasionally seems over-the-top. Regarding his introductions of people, he also misgenders Erin White when citing them in the conclusion. To segue from the conclusion to the opposite end of the book, I think it could use a better introduction. At times, it’s kind of confusing where the book is going, though its primary claim is apparent from the cover (“Search tools can be biased or spread inaccuracies. Even library tools called discovery systems exhibit bias”). The methods of the study he refers to are also somewhat unclear.
As for issues that deal more specifically with the claims of the text: Reidsma seems very committed to the idea that librarians think library discovery is like Google, he uses the word “transgenderism” in a way that I think demonstrates one of his own points, and he doesn’t really talk about the work librarians do alongside library discovery.
In a parenthetical aside in the third chapter, Reidsma writes “[t]hat libraries willingly stopped railing against Google, as they had done for a decade or more, and used it as an aspiration for their own systems shows that power of the public’s trust in the search giant” (p. 58). I disagree strongly with most of this claim. He’s partially referring to the way that librarians might call the library discovery system “Google for library resources.” However, he interprets this to mean that we aspire for our systems to be like Google, and he attempts to translate all of Google’s claims about its system into our own claims about library systems. In my experience, this explanation (“Google for the library”) has been used to explain library discovery by referring to something most people are familiar with: Google. More importantly, as a librarian, I am certainly unwilling to stop railing against Google. Throughout the book I had a few similar moments where I disagreed with Reidsma’s interpretation or reading of certain claims or results, but I don’t think they hurt his argument.
Related to Google’s appeal, Reidsma briefly mentions the use of personalization in general search engines like Google, but he addresses it only to say that personalization would impede reproducibility and that the surveillance required for personalization would violate our professional ethics. However, in a footnote, Reidsma claims that “the increasing use of Google Analytics and the tendency to use identifying information about web content in the URL, coupled with universities increasingly moving to Google for email service, means that authorities could probably get even more information about a user’s reading and search habits from asking Google for analytics data than they could from a library’s Integrated Library System (ILS)” (p. 70). Dealing with this might be outside the scope of this book, but I think LIS research and libraries will eventually have to contend with users’ expectations regarding personalization and its impact on search relevancy for the individual user.
One of Reidsma’s examples of bias in library discovery was a search for “transgender” in Summon. In October 2018, the Topic Explorer result was a Credo Reference entry on “Transgender Law.” Reidsma argues that a general Wikipedia entry would be more useful to searchers. In February 2019, the search returned the more problematic result of a Credo Reference entry on “Internalized Transphobia,” which explains internalized transphobia as “discomfort with one’s own transgenderism [sic].” Reidsma then asks “[w]hy show a result like this that automatically bring up issues of shame and discrimination, rather than an entry on transgenderism [sic] itself.” I find this deeply ironic because Reidsma essentially reads a likely outdated Credo Reference entry and mimics its language. His own writing illustrates the harm of relying on static encyclopedia entries that don’t keep up with the pace of language change.[ii] Further, I disagree with his objection to surfacing issues of discrimination when those issues are so integral to the way certain people experience our society. Making more people aware of discrimination against trans people doesn’t seem like a bad thing, though I do think a general entry on trans people (or, as the result was later changed to, the “Transgender Movement”) is probably more useful to a user searching for “transgender.”
Finally, where are the librarians? Librarians barely appear in most of the text except to be chastised, as in the claim about “Google for libraries” above; we mostly appear in the final chapter on “moving forward.” The value and trust of libraries come from the labor and care of librarians. While users are certainly able to search the library’s resources without any assistance from librarians, we provide instruction, reference, technical support, and more to help users improve their results, and I am troubled by the lack of representation of our labor in this book. Similarly, Reidsma’s writing about bias always carries an undercurrent suggesting that bias is something we can get rid of. I find that troubling. I also don’t think this is the way we, as librarians, should think about libraries and library systems, which is why I mentioned Chris Bourg’s omitted blog post on library neutrality. I’d like to see us design systems toward a responsible bias, or a bias with a social justice bent: one that privileges those who are systematically oppressed in our society.
[i] As a note to the reader, Reidsma claims that BDSM stands for Bondage, Discipline, Sadism, and Masochism, which, as the Wikipedia page will inform you, is inaccurate. BDSM is a combination of abbreviations (B/D, D/S, S/M). It’s more accurate to say that BDSM stands for Bondage & Discipline, Dominance & Submission, and Sadomasochism (or Sadism & Masochism).
[ii] My quick search in Google of “transgenderism” returns a variety of right-wing articles, mostly claiming that being trans is a mental disorder, which the -ism suffix suggests. I don’t think Reidsma’s use of the term is meant to align him with those who argue against the validity of trans existence, but again, this is the danger of our biased systems.
Featured image by Mahendra Kumar, via Unsplash
This work is licensed under a Creative Commons Attribution 4.0 International License
The views expressed here are the writer’s own and do not reflect anyone else’s.