API Introduction
Last updated on 2025-10-14
Overview
Questions
- What is an API?
- What is a 401 status code?
Objectives
- Understand what an API is and how it works
- Understand what HTTP requests are
Introduction
HTML websites are a widespread means of sharing information on the internet. It is unsurprising then that scraping websites is a common practice (in research) to obtain information from the web in an automated way.
However, scraping websites has many downsides; it would be much easier if a computer program could instead communicate with a data provider directly, requesting exactly the information that is needed for the research purpose. This is what Application Programming Interfaces (APIs) accomplish.
In recent years, many organizations have drastically restricted access to their public APIs. The open-source community remains strong, however, and there are plenty of good examples of open public APIs, such as that of the scholarly database OpenAlex.
The OpenAlex API can be reached at:
https://api.openalex.org
Exploring an API
When you open this link in your browser, you won’t see much at first. That’s because we haven’t actually specified the data we would like to retrieve. Luckily, on that page, you will find a link to the API documentation, a crucial source of information when communicating with an API.
Reading through the documentation, you will find many so-called endpoints: URLs that represent resources such as publications:
https://api.openalex.org/works
You can copy-paste this link into a browser. Alternatively, on the command line you can use curl. In the next chapter, we will also see how we can use Python and R to obtain data from the API.
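As a small preview of the Python route, the sketch below prepares the same request with the standard library only. The helper names `build_request` and `fetch_json` are hypothetical, and the `Accept` header is an assumption rather than something the API is known to require. Nothing is sent over the network until `fetch_json` is called.

```python
import json
import urllib.request

WORKS_URL = "https://api.openalex.org/works"  # endpoint from the text above


def build_request(url: str) -> urllib.request.Request:
    """Prepare a GET request that asks for JSON (nothing is sent yet)."""
    # The Accept header is an assumption; many APIs return JSON without it.
    return urllib.request.Request(url, headers={"Accept": "application/json"})


def fetch_json(url: str, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON body into a Python dict."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Inspect the prepared request without contacting the server.
request = build_request(WORKS_URL)
print(request.get_method(), request.full_url)
```

Calling `fetch_json(WORKS_URL)` would perform the actual request, much like opening the link in a browser or running curl.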
Here is a subset of the data displayed when opening the works URL in a browser. OpenAlex returns data as JSON, a common data format for these types of APIs. We won’t cover the specifics of the format here, but a quick web search can give you some answers.
JSON
{
"id":"https://openalex.org/W1775749144",
"doi":"https://doi.org/10.1016/s0021-9258(19)52451-6",
"title":"PROTEIN MEASUREMENT WITH THE FOLIN PHENOL REAGENT",
"display_name":"PROTEIN MEASUREMENT WITH THE FOLIN PHENOL REAGENT",
"publication_year":1951,
"publication_date":"1951-11-01"
}
When looking at the meta.count field, you’ll notice that the total number of publications available via works is 270,765,445, which is quite large. For the purpose of this workshop, we would like to reduce this number by applying filters.
Filtering is something most APIs are designed to support, and it can be very useful if you only wish to obtain specific subsets of data.
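To make the meta.count lookup concrete, here is a minimal sketch that parses a trimmed, hand-written stand-in for a response using Python's built-in `json` module. The `meta.count` field and the record fields come from the text above. The `results` key that wraps the records is an assumption about the response layout, so check it against the API documentation.

```python
import json

# Hand-written stand-in for a response body. "meta.count" and the record
# fields are taken from the text; the "results" key is an assumption.
sample_response = """
{
  "meta": {"count": 270765445},
  "results": [
    {
      "id": "https://openalex.org/W1775749144",
      "title": "PROTEIN MEASUREMENT WITH THE FOLIN PHENOL REAGENT",
      "publication_year": 1951
    }
  ]
}
"""

data = json.loads(sample_response)  # JSON text -> nested dicts and lists
total = data["meta"]["count"]       # the meta.count field
first = data["results"][0]          # first record in the list

print(f"{total:,} works in total")
print(first["publication_year"], first["title"])
```

Once parsed, JSON objects behave like Python dictionaries and JSON arrays like lists, so nested fields can be reached with ordinary indexing.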
Let’s say we are only interested in publications from 2024 written by at least one author from the University of Amsterdam. To filter by institution, OpenAlex uses the so-called ROR (Research Organization Registry) identifier.
We can modify the URL like this:
https://api.openalex.org/works?filter=institutions.ror:04dkp9463,publication_year:2024
Take a look at the API documentation to explore other filter parameters.
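Typing long filter URLs by hand is error-prone, so it can help to assemble them in code. The sketch below rebuilds the URL shown above from a dictionary of filters, using only the standard library. The helper name `build_works_url` is hypothetical, and the comma-separated `key:value` layout is simply the pattern seen in the example URL.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.openalex.org/works"


def build_works_url(filters: dict) -> str:
    """Join filters as comma-separated key:value pairs in one `filter` parameter."""
    filter_value = ",".join(f"{key}:{value}" for key, value in filters.items())
    # Keep ":" and "," unescaped so the URL matches the form shown above.
    query = urlencode({"filter": filter_value}, safe=":,")
    return f"{BASE_URL}?{query}"


url = build_works_url({"institutions.ror": "04dkp9463", "publication_year": 2024})
print(url)
```

Adding another filter then means adding one more dictionary entry instead of editing the URL string.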
HTTP status codes
We’ve probably all encountered the famous 404 error message after following a link to a page that does not exist. APIs usually provide more informative error messages.
For example, when inserting a typo into one of our parameters, let’s say replacing publication_year with pppublication_year, we will get a message like this:
Invalid query parameters error
This error tells us what to look for.
When making HTTP requests from Python or R, we will see that handling errors is necessary for our code to run without interruptions. Below is a list of the most common status codes and what they mean.
Code | Name | Meaning |
---|---|---|
200 | OK | The request has succeeded. |
204 | No Content | The server has completed the request, but doesn’t need to return any data. |
400 | Bad Request | The request is badly formatted. |
401 | Unauthorized | The request requires authentication. |
404 | Not Found | The requested resource could not be found. |
408 | Request Timeout | The server gave up waiting for the client to send the request. |
500 | Internal Server Error | An error occurred in the server. |
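The table above can be turned into a small helper for reacting to status codes in code. This is a minimal sketch using Python's built-in `http.HTTPStatus`. The function name `describe_status` and the wording of the suggested reactions are illustrative assumptions, not part of any API.

```python
from http import HTTPStatus


def describe_status(code: int) -> str:
    """Return a short, human-readable summary for an HTTP status code."""
    status = HTTPStatus(code)  # raises ValueError for unknown codes
    if 200 <= code < 300:
        outcome = "success"
    elif 400 <= code < 500:
        outcome = "client error, check the request"
    elif 500 <= code < 600:
        outcome = "server error, retrying later may help"
    else:
        outcome = "other"
    return f"{code} {status.phrase}: {outcome}"


for code in (200, 401, 404, 500):
    print(describe_status(code))
```

When a request made with `urllib.request.urlopen` fails with a 4xx or 5xx response, Python raises `urllib.error.HTTPError`, whose `code` attribute holds the number that could be passed to a helper like this one.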