Use library resources for TDM and LLMs

If you plan to use resources provided by the University of Arizona Libraries for text and data mining (TDM) in your research or for training large language models (LLMs), follow this guide to learn your rights and responsibilities.

Understanding the basics

What kinds of material does this cover?

The UA Libraries subscribe to thousands of electronic journals, books, datasets, and other databases for use by authorized users (typically restricted to current faculty, students, and staff and, in most cases, on-site visitors). The terms and conditions for using these resources are set out in electronic resource license agreements that the Libraries sign with each publisher.

Where does library content come from?

Most of the digital research materials you access through the University of Arizona Libraries aren’t owned by the University. Instead, they’re owned by commercial publishers, academic societies, and other content providers who create and distribute scholarly resources.

Think of using the Libraries’ electronic resources like watching Netflix: you can watch movies and shows on Netflix, but Netflix doesn’t own most of that content – they pay licensing fees to film studios and content creators for the right to make it available to you. So do the Libraries.

In fact, each year the Libraries sign license agreements and pay millions of dollars in licensing fees to publishers like Elsevier, Springer, Wiley, and hundreds of other content providers so that you can access their journals, books, and databases for your research and coursework.

What is a library license agreement?

A license agreement is a legal contract between UA Libraries and each publisher that spells out exactly how the UA community can use that publisher’s content. These contracts typically cover:

  • Who can access the content (usually current faculty, students, researchers, and staff)
  • How you can use it (reading, downloading, printing individual articles)
  • What you can’t do (automated mass downloading, sharing with unauthorized users, making commercial uses of it)
  • Special restrictions (including rules about AI tools and text and data mining)

Any time you access a database or use your UA credentials to log in to a resource, you must comply with the terms of the license agreement that the Libraries have signed. All the agreements are different.

Why are all the agreements different? Can’t the Libraries just sign the same agreement with everyone?

Unfortunately, no. Each publisher has their own standard contract terms, and they rarely agree to identical language. Here’s why:

  • Different business models: Some publishers focus on journals, others on books or datasets – each has different concerns
  • Varying attitudes toward artificial intelligence: Some publishers embrace AI research, others are more restrictive
  • Disciplinary variations: Publishers licensing content in some fields (e.g. business, data) typically impose different restrictions than publishers in other disciplines
  • Legal complexity: Text and data mining and AI are relatively new, so contract language is still evolving

Can’t you negotiate better terms?

We can’t force publishers to accept uniform language, and we can’t guarantee that every resource allows TDM or AI usage. This is why the Libraries need to check each agreement individually when you want to use content with TDM or AI tools.

How do I know if I’m using Libraries-licensed content?

The following kinds of materials are typically governed by Library license agreements:

  • Materials you access through the UA Libraries Search (the Libraries’ online catalog)
  • Articles from academic journals accessed through the Libraries
  • E-books available through the Libraries’ databases
  • Research datasets licensed by the Libraries
  • Any content accessed through the Libraries’ database subscriptions
  • Materials that require you to log in with your UA NetID

What if I get the content from a website not licensed by the Libraries?

If you’re downloading or mining content from a website that is not licensed by the Libraries, you should read the website’s terms of use, sometimes called “terms of service.” They are usually linked at the bottom of the web page. Reading the terms carefully can help you make informed decisions about how to proceed.

Even if the terms of use for the website or database restrict or prohibit text mining or AI, the provider may offer an application programming interface (API) with its own set of terms that allows scraping and AI use. You can also contact the provider and request permission for the research you want to do.

Text and data mining (TDM) & Artificial Intelligence (AI)

The UA Libraries try to secure TDM and AI usage and training rights across all of our e-resource licenses. However, the scope of TDM and AI rights varies, and a license may not allow its content to be used with, or used to train, public versions of AI tools.

Before using artificial intelligence with any licensed Library resource, please contact us (see below) for more information about governing terms.

What can and can’t I do with these licensed electronic resources?

Licenses vary from publisher to publisher; however, in general the following applies:

Generally permitted

  • Uses consistent with the Fair Use provisions of the United States Copyright Act (Have questions about fair use? Contact the Scholarly Communication librarian.)
  • Viewing, downloading, copying, printing, and saving a copy of search results
  • Viewing, downloading, copying, printing, and saving individual articles
  • Using e-resources for scholarly, educational, or scientific research, teaching, private study, and clinical purposes
  • Sending a copy of an article to another authorized user (i.e. current faculty, students, or staff)
  • Posting the URL to the publisher’s version of the article on a restricted class website (publisher links allow access only to authorized users)
  • Modifying the resource format in compliance with accessibility laws

Generally not permitted

The following activities generally are not permitted. Engaging in any of them may result in immediate suspension of your access to online library resources.

  • Use of robots or intelligent agents to do systematic, bulk, or automatic downloading of content in violation of a license agreement
  • Systematic downloading or printing of entire journal issues or volumes, or large portions of other e-resources
  • Using e-resources for commercial gain (e.g. reselling, redistributing, or republishing licensed content, as well as any other uses that are intended for or directed towards commercial advantage or monetary compensation). This includes using resources for outside work not connected with degree requirements.
  • Transmitting, disseminating, or otherwise making online content available to unauthorized users (e.g. sending it to mailing lists or electronic bulletin boards)
  • Posting the publisher’s version or PDF of an article to a website on the open web (instead, post the URL to the article, which allows access only to authorized users)
  • Removing, obscuring, or modifying any copyright or other notices included in the materials
  • Sharing passwords/log-in information with unauthorized users

Some vendor agreements do not allow retaining data and other downloaded material past the termination of the Library’s license agreement. In the event downloaded material from a terminated agreement is being used in an ongoing project or upcoming publication, the downloaded material should be deleted from personal devices within a reasonable time frame following publication acceptance or completion of the work.

Violating the terms of the University’s license agreements with publishers can result in the entire campus losing access to critical research resources, and can expose you and the University to legal liability. It is the responsibility of individual authorized users to ensure that their use of electronic resources does not breach the terms and conditions specified in the license agreements.

But my research is a “fair use”!

We agree. But there’s a distinction between what copyright law allows and how license agreements (which are contracts) affect your rights under copyright law.

Copyright law gives you certain rights, including fair use for research and education. In contrast, contract law can override those rights when you agree to specific terms. When the UA Libraries sign a license agreement with a publisher so you can use content, both the University and its users (that’s you) must comply with those contract terms.

Therefore, even if your AI training or text mining would normally qualify as fair use, the license agreement you’re bound by might explicitly prohibit it, or place specific qualifications on how AI might be used (e.g. use of AI permissible; training of AI prohibited).

Your responsibilities

What do I have to do?

If you intend to use any Library electronic resources with AI tools or for text and data mining research, contact the UA Libraries first. We’ll check the relevant license agreement and tell you what’s permitted.

Do I have to comply? What’s the big deal?

As mentioned above, breaching license agreements can result in losing access to critical research resources for the entire UA community – and potentially expose you and the University to legal liability and lawsuits.

For the University:

  • Loss of access: Publishers can immediately cut off access to critical research resources for everyone on campus
  • Legal liability: The University could face costly lawsuits. Some publishers might claim millions of dollars in damages
  • Damaged relationships: Violations can harm the Libraries’ ability to negotiate future agreements, or prevent us from getting you access to key scholarly content

This doesn’t just affect the University – it also affects you. Violating the agreements can result in:

  • Immediate suspension of your access to all library electronic resources
  • Legal exposure: You could potentially be held personally liable for damages in a lawsuit
  • Research disruption: Loss of access to essential materials for your work

What if I’m using a campus-licensed AI platform?

Even when using the University of Arizona’s licensed AI platforms (like Gemini or ChatGPT), you still need to check whether you can upload Library-licensed content to that platform. The fact that the University provides the tool doesn’t automatically make all Library-licensed content okay to upload to it.

What if I’m using my own ChatGPT, Anthropic, or other generative AI account?

The fact that you subscribe to the tool doesn’t mean you can upload Library-licensed content to it.

Do I really have to contact you? Can’t I just look up the license terms somewhere?

We wish it were that simple, but the Libraries sign thousands of agreements each year with highly complex terms. Contact us early in your research planning process so we can help you with your project.

  • Be specific: The more details you provide, the faster we can give you guidance
  • Ask questions: We’re here to help, not to block your research

Get help

Do you have questions about a specific database or resource?

This guidance is for informational purposes and should not be construed as legal advice. When in doubt, always contact library staff for assistance with specific situations.

This page (released under a Creative Commons Attribution 4.0 International License) was adapted from “Before you scrape and before you train…” by Timothy Vollmer and “Conditions of use and licensing restrictions for electronic resources” by University of California Berkeley Library, both licensed under a Creative Commons Attribution-Noncommercial 4.0 License.