The role of the “Listen in: Discussions from/in the field” column is to highlight interesting, exceptional, or provocative research published in LIS literature. This year we hope to emphasize a variety of research methods and the experiences of those often ignored in LIS literature.
Academic Libraries + Patron Privacy + Google
As a stated value of the American Library Association and a grave concern of millions of people world-wide, the topic of privacy has recently been given a lot of attention. For this reason, the work of Patrick O’Brien, Scott W.H. Young, Kenning Arlitsch, and Karl Benedict in their article Protecting Privacy on the Web: A Study of HTTPS and Google Analytics Implementation in Academic Library Websites is especially timely for librarians. The authors assert that libraries in the US and abroad are likely violating their own privacy ethics through the use of software that tracks users in order to assess the efficacy of library websites and platforms. While their study only takes a look at the websites for academic libraries, their findings can easily be applied to public libraries, museums, digital libraries, government websites, and so many others.
In their study of the home pages of 279 academic libraries in the US, Canada, the UK and others, the authors used a webometrics methodology to answer two research questions:
Question 1: Do libraries implement HTTPS with proper redirect practices?
Question 2: Do libraries that use Google Analytics implement the available privacy-protection measures?
Answers to these questions would give the authors an idea of how many academic libraries are using Google Analytics, and possibly sharing patron data with third parties, and which of those libraries were doing something to protect patron privacy.
The interpretation and conclusion of the Library Bill of Rights, with regard to privacy, states that “The American Library Association affirms that rights of privacy are necessary for intellectual freedom and are fundamental to the ethics and practice of librarianship.” Essentially, privacy is necessary for library patrons to read, write, and research freely. Because of that, ALA recommends that:
Libraries should not share personally identifiable user information with third parties or with vendors that provide resources and library services unless the library has obtained the permission of the user or has entered into a legal agreement with the vendor. Such agreements should stipulate that the library retains control of the information, that the information is confidential, and that it may not be used or shared except with the permission of the library.
This policy, written in 2002 and amended in 2014, likely could not have anticipated how ubiquitous Google and its services would become. As libraries attempt to better know their users and create digital platforms that better serve them, librarians have turned to services like Google Analytics, a free service that tracks visits to a website. O’Brien et al explain that on its own, Google Analytics will not leak user information across sites, but when it is combined with other services like Google AdSense, the DoubleClick tracker, or Google Tag Manager, the possibilities for tracking user data increase significantly (O’Brien et al, 735). The authors provide an example of how this tracking works:
A user logs into Gmail and then visits a library website that has implemented Google Analytics or Google Tag Manager. This user then searches for tax relief resources through the library website. Because Google 1) identifies and authenticates users via their Google IDs and passwords and 2) identifies and authenticates the library website through Google Analytics or Tag Manager, Google can link users’ library website activity to individual users’ Google profiles. Depending on the library’s Google implementation, this user activity may also be shared with Google’s advertising network, which targets users with personalized ads, such as credit cards or personal loan services, even after the user has left the library web site. (O’Brien et al, 735).
This case exemplifies the reality that Google tracking, and therefore the sharing of user data, can happen whether the library is using Google Analytics or not, because Google itself is certainly using its own services. When a library is using Google services, the possibility that personally identifiable information will be tracked and shared increases, possibly more than librarians anticipate. This information can be anything from search terms used by patrons to user-agent software, geolocation, and time of day (O’Brien et al, 736).
Because library websites are likely to allow tracking of user data, whether through their own use of Google Analytics or through others’ usage, there are steps that libraries can take to reduce the amount of data collected. This is why the authors of the article were particularly interested in whether library homepages implemented HTTPS and other privacy protection protocols. When properly configured, HTTPS provides an encrypted connection between the user’s browser and the library website. For libraries that use Google Analytics, a patron’s privacy can be further protected by enabling IP anonymization.
The authors conducted their privacy audit of 279 academic libraries through the webometrics methodology, which they defined as “covert observation research” of “information structures publicly hosted on machines” (O’Brien et al, 739). Given that the objects of the study are information infrastructure, they note that there were no ethical concerns from this form of observation research. Their research questions were twofold. First, they wanted to discover the HTTPS redirect practices of academic libraries. Second, for libraries that made use of Google Analytics, they wanted to find out how many made a secure connection between their website and Google’s servers. They chose their sample by limiting it to libraries with memberships in any of three professional organizations that have mission statements focused on research and more than 100 members. The three organizations selected were the Association of Research Libraries, the OCLC Research Library Partnership, and the Digital Library Federation. The final sample included 279 libraries, 448 unique URLs, and 16 countries.
To find out whether a website offered a secure HTTPS connection, they checked for a digital certificate by requesting pages with https:// connections. A website had a secure connection if the resolving URL still contained “https://”. They also checked for automatic secure and non-secure redirects, that is, cases where an http:// connection request resolved to an https:// URL, and vice versa.
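The redirect check described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the authors’ script: the function name `classify_redirect`, the four labels, and the example URLs are assumptions made for this sketch, and the resolved URL would in practice come from an HTTP client that follows redirects.

```python
from urllib.parse import urlsplit


def classify_redirect(requested_url: str, resolved_url: str) -> str:
    """Label how the scheme changed between the requested and the resolved URL."""
    requested = urlsplit(requested_url).scheme.lower()
    resolved = urlsplit(resolved_url).scheme.lower()
    if requested == "https" and resolved == "https":
        return "secure"        # HTTPS was requested and kept
    if requested == "http" and resolved == "https":
        return "upgraded"      # HTTP request was redirected to HTTPS
    if requested == "https" and resolved == "http":
        return "downgraded"    # HTTPS request was redirected back to HTTP
    return "insecure"          # HTTP was requested and kept (or another scheme)


# Hypothetical URLs, not drawn from the study's sample.
print(classify_redirect("http://library.example.edu/", "https://library.example.edu/"))
```

A fuller audit would also record the HTTP status code of each hop, since the study distinguishes permanent redirects from other kinds.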
To study the libraries’ use of privacy measures within Google Analytics, they first determined whether web pages used Google Analytics by looking for the Google Analytics tracking code or the Google Tag Manager tracking code in the page’s scripts. For websites that did, they determined whether the connection between the library website and Google’s servers was secure by looking for specific strings on the page: “forceSSL” or “www.googletagmanager.com”. Either would indicate the presence of an HTTPS connection. Finally, they looked for the string “anonymizeIp” to see if libraries had implemented the anonymization feature of Google Analytics, which allows users’ IP addresses to indicate a geographical region without identifying a specific user.
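A minimal sketch of this kind of page scan, in Python, might look like the following. The three quoted marker strings come from the description above; the script names used to detect Google Analytics (`analytics.js`, `ga.js`), the function name `audit_page`, and the sample page are assumptions for illustration and are not taken from the authors’ code.

```python
import re

# Assumption: a page "uses Google Analytics" if it references one of these
# Google-hosted scripts. The study's own detection rules may differ.
GA_PATTERN = re.compile(r"google-analytics\.com/(analytics|ga)\.js|googletagmanager\.com")


def audit_page(html: str) -> dict:
    """Report which privacy-related markers appear in a page's source."""
    uses_ga = bool(GA_PATTERN.search(html))
    return {
        "uses_google_analytics": uses_ga,
        # Markers named in the text as signs of a secure connection to Google.
        "secure_to_google": uses_ga
        and ("forceSSL" in html or "www.googletagmanager.com" in html),
        # Marker named in the text as the sign of IP anonymization.
        "anonymizes_ip": uses_ga and "anonymizeIp" in html,
    }


# Hypothetical page source with a placeholder tracking ID.
sample = """
<script src="https://www.google-analytics.com/analytics.js"></script>
<script>
ga('create', 'UA-XXXXX-Y', 'auto');
ga('set', 'anonymizeIp', true);
ga('send', 'pageview');
</script>
"""

print(audit_page(sample))
```

Plain substring matching like this is crude: it cannot tell whether a setting is actually enabled, only that the string appears somewhere in the page.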
Overall, the authors found that libraries could do more to implement privacy practices on their websites. Only 62% of libraries had implemented HTTPS at the time of the study, and only 32% had permanent redirects to HTTPS. As many as 15% actually had permanent redirects from HTTPS to HTTP. Many libraries in the study were thus exposing patrons to the risk of man-in-the-middle attacks without their informed consent.
Of the 88% of libraries that had implemented Google Analytics, only 1% had implemented a secure connection between the library website and Google Analytics. While 14% had implemented IP anonymization, no library had implemented both a secure connection and IP anonymization for Google Analytics.
This study clearly highlighted the room for growth in libraries’ protection of patron privacy. The authors note that practices can easily be improved to protect privacy, or at least, that libraries have a duty to inform their patrons of the risk, if an increase in privacy protection cannot be achieved. The authors make five recommendations for libraries moving forward: HTTPS for all library web servers and for connections between the library’s servers and Google’s servers; IP anonymization; user education; informed consent; and risk/benefit analysis of third party services. They note that a risk/benefit analysis is important because using third party tracking services might pose a threat to libraries’ values of privacy and intellectual freedom.
While the sample size of the study was relatively small and represented only a subset of the vast array of academic library websites, it exposes the distance between pledged values and the reality of practice. It is important to note that the sample for this study was collected in 2016. Since then, with the controversies surrounding privacy concerns on Facebook and other social networks, websites of all types have greatly improved their privacy policies and user education. Most websites, when visited for the first time, will require the user to agree to a “cookie” policy and provide links to privacy policies and terms-of-use pages. Even with all of the recent attention given to internet privacy, ALA’s own website does not provide a secure, encrypted HTTPS connection, which is one of the most basic steps they could take on behalf of the user. Now, 20 years after the ALA and librarians all over the US stood up to section 215 of the Patriot Act in an effort to protect patron privacy, it is time we look inward at our own institutions and decide if we are really doing enough.
Keep the Conversation Going
- What infrastructures are most suited to enforce privacy enhancing protocols on library websites? Since the current associations’ statements do not seem to have impacted the practices of their member libraries, what is preventing the switch?
- Are you aware of the privacy practices of the catalog your library uses? What protocols are in place? What could be improved?
O’Brien, P., Young, S., Arlitsch, K., & Benedict, K. (2018). Protecting privacy on the web: A study of HTTPS and Google Analytics implementation in academic library websites. Online Information Review, 42(6), 734-751. https://doi.org/10.1108/OIR-02-2018-0056
- Python scripts used for analysis: https://doi.org/10.5281/zenodo.1323403
- ALA Privacy Toolkit: http://www.ala.org/advocacy/privacy/toolkit
- Library Freedom Project: https://libraryfreedom.org/
- A National Forum on Web Privacy and Web Analytics: https://osf.io/gnfpu/
About the Authors
Charlotte Brun is a Public Services Assistant at the King County Library System in Washington State. She is passionate about information access, critical information literacy in academic and public settings, feminism in the library and social justice. Currently transitioning into the public sector from academia, Charlotte is interested in exploring the dynamics that surround research in these different environments. As a returning editor of “Listen In”, she’s particularly looking forward to highlighting scholarship that is accessible outside of paywalls. Find her on twitter: @cha_cjb
Symphony Bruce is a Resident Librarian at American University in Washington, D.C. After six years as a high school English teacher, she switched to librarianship to be a champion for information literacy and access, with a specific interest in critical pedagogy and a developing interest in privacy and digital safety. She enjoys cooking with vegetables, hanging out with her cats, and drinking her coffee black. Find her on twitter: @curlsinthelib
This work is licensed under a Creative Commons Attribution 4.0 International License
The expressions of the writer do not reflect anyone’s views but their own