The Guardian API
- Availability: Airbyte Cloud, Airbyte OSS
- Support Level: Community
- Latest Version: 0.1.0
- Definition Id: d42bd69f-6bf0-4d0b-9209-16231af07a92
Overview
The Guardian API source can sync data from The Guardian.
Requirements
To access the API, you will need to sign up for an API key, which should be sent with every request. Visit this link to register for an API key.
The following optional parameters can be provided to the connector:
q (query)
The q (query) parameter filters the results to only those that include that search term. The q parameter supports AND, OR and NOT operators. For example, let's see if the Guardian has any content on political debates: https://content.guardianapis.com/search?q=debates
In this case there are many results, so we might want to narrow the response to something more meaningful, for example political content published in 2014: https://content.guardianapis.com/search?q=debate&tag=politics/politics&from-date=2014-01-01&api-key=test
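As an illustration of how these query parameters combine, the following minimal sketch assembles a search URL with Python's standard library. The helper name `build_search_url` and its argument names are hypothetical and are not part of the connector. Only the parameter names visible in the example URL above (`q`, `tag`, `from-date`, `api-key`) are used.

```python
from urllib.parse import urlencode

BASE_URL = "https://content.guardianapis.com/search"


def build_search_url(api_key, q=None, tag=None, from_date=None):
    """Build a search URL, leaving out any parameter that is None.

    Hypothetical helper for illustration; not part of the connector.
    """
    params = {
        "q": q,
        "tag": tag,
        "from-date": from_date,
        "api-key": api_key,
    }
    present = {name: value for name, value in params.items() if value is not None}
    # safe="/" keeps the slash in tags such as politics/politics unescaped.
    return f"{BASE_URL}?{urlencode(present, safe='/')}"


url = build_search_url(
    "test", q="debate", tag="politics/politics", from_date="2014-01-01"
)
```

With these arguments the helper reproduces the filtered example URL shown above. Spaces in a query that uses the AND, OR or NOT operators are encoded as `+` by `urlencode`.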
tag
A tag is a piece of data that is used to categorise content. All Guardian content is manually categorised using these tags, of which there are more than 50,000. Use this parameter to filter results by showing only the ones matching the entered tag. See here for a list of all tags, and here for the tags endpoint documentation.
section
Use this to filter the results by a particular section. See here for a list of all sections, and here for the sections endpoint documentation.
order-by
Use this to sort the results. The three available sorting options are newest, oldest, and relevance. To enable incremental syncs, set order-by to oldest.
start_date
Use this to set the minimum date (YYYY-MM-DD) of the results. Results older than the start_date will not be shown.
end_date
Use this to set the maximum date (YYYY-MM-DD) of the results. Results newer than the end_date will not be shown. Default is set to the current date (today) for incremental syncs.
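The date handling described for start_date and end_date can be sketched as below. This is an assumption-laden illustration, not the connector's code: the function name `date_window` is hypothetical, and mapping the two settings onto the API's `from-date` and `to-date` query parameters is an assumption about how the connector passes them on.

```python
from datetime import date, datetime


def parse_day(value):
    """Parse a year-month-day string; raises ValueError if it does not parse."""
    return datetime.strptime(value, "%Y-%m-%d").date()


def date_window(start_date, end_date=None, today=None):
    """Return query parameters for the requested date range.

    Hypothetical helper. When end_date is omitted, the current date is used,
    mirroring the default described above. `today` exists only so the
    default can be pinned in tests.
    """
    start = parse_day(start_date)
    end = parse_day(end_date) if end_date else (today or date.today())
    if start > end:
        raise ValueError(f"start_date {start} is after end_date {end}")
    # Assumption: the settings map to the API's from-date / to-date parameters.
    return {"from-date": start.isoformat(), "to-date": end.isoformat()}
```

Checking the range before any request is made avoids spending part of the request quota on a window that cannot return results.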
Output schema
Each content item (news article) has the following structure:
{
  "id": "string",
  "type": "string",
  "sectionId": "string",
  "sectionName": "string",
  "webPublicationDate": "string",
  "webTitle": "string",
  "webUrl": "string",
  "apiUrl": "string",
  "isHosted": "boolean",
  "pillarId": "string",
  "pillarName": "string"
}
The source is capable of syncing the content stream.
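For readers consuming the synced records downstream, here is a minimal sketch of loading one content item into a typed Python object. The class name `ContentItem`, the helper `parse_item`, and the sample values are invented for illustration. Only the field names come from the schema above. Treating absent fields as `None` is a design choice of this sketch, not documented connector behavior.

```python
import json
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class ContentItem:
    """One record of the content stream, using the schema's field names."""
    id: Optional[str] = None
    type: Optional[str] = None
    sectionId: Optional[str] = None
    sectionName: Optional[str] = None
    webPublicationDate: Optional[str] = None
    webTitle: Optional[str] = None
    webUrl: Optional[str] = None
    apiUrl: Optional[str] = None
    isHosted: Optional[bool] = None
    pillarId: Optional[str] = None
    pillarName: Optional[str] = None


def parse_item(raw):
    """Copy known fields from a dict; unknown keys are ignored, missing ones stay None."""
    names = [f.name for f in fields(ContentItem)]
    return ContentItem(**{name: raw[name] for name in names if name in raw})


# Invented sample record, shaped like the schema above.
sample = json.loads(
    '{"id": "politics/example", "type": "article", '
    '"webTitle": "Example title", "isHosted": false}'
)
item = parse_item(sample)
```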
Setup guide
Step 1: Set up The Guardian API connector in Airbyte
For Airbyte Cloud:
- Log into your Airbyte Cloud account.
- In the left navigation bar, click Sources. In the top-right corner, click +new source.
- On the Set up the source page, select The Guardian API from the Source type dropdown.
- Enter your api_key (mandatory) and any other optional parameters as per your requirements.
- Click Set up source.
For Airbyte OSS:
- Navigate to the Airbyte Open Source dashboard.
- Set the name for your source (The Guardian API).
- Enter your api_key (mandatory) and any other optional parameters as per your requirements.
- Click Set up source.
Supported sync modes
The Guardian API source connector supports the following sync modes:
Feature | Supported? |
---|---|
Full Refresh Sync | Yes |
Incremental Sync | No |
Namespaces | No |
Performance considerations
The API key you are assigned is rate-limited. Applications that make large numbers of requests on a polling basis are likely to exceed their daily quota and will then be prevented from making further requests until the next period begins.
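One way to stay within a daily quota is to count requests on the client side before sending them. The sketch below is a hypothetical helper, not part of the connector. The `limit` value must come from your own key's terms, and resetting the counter when the calendar date changes is an assumption about when the next period begins.

```python
from datetime import date


class DailyQuota:
    """Client-side counter that refuses requests once a daily budget is spent."""

    def __init__(self, limit):
        self.limit = limit  # requests allowed per day (depends on your key)
        self.day = None     # date the current count belongs to
        self.used = 0

    def try_acquire(self, today=None):
        """Return True and count one request, or False if the budget is spent."""
        today = today or date.today()
        if today != self.day:
            # Assumption: the quota resets when the calendar date changes.
            self.day, self.used = today, 0
        if self.used >= self.limit:
            return False
        self.used += 1
        return True
```

A polling job would call `try_acquire()` before each request and pause until the next day when it returns `False`.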
Changelog
Version | Date | Pull Request | Subject |
---|---|---|---|
0.1.0 | 2022-10-30 | #18654 | 🎉 New Source: The Guardian API [low-code CDK] |