July 25, 2023

Software Heritage GraphQL Explorer: More power to your APIs

Sharing the collected and preserved source code with stakeholders is an important part of the Software Heritage mission. We are glad to unveil the newest feature for searching and retrieving information from the archive: the Software Heritage GraphQL Explorer.

While Software Heritage’s REST APIs are a convenient and simple way for a client to retrieve data from the archive, the Software Heritage GraphQL Explorer provides an efficient, GraphQL-based alternative for more demanding use cases.

GraphQL lets a client fetch server data using a query language, which makes it possible to write powerful requests.

The data model in GraphQL is represented using a schema with a set of objects and relations between them. A client can enter and navigate the schema using a set of predefined root types. It can request specific fields from one or more objects, with a query language, in a single request.
This mechanism gives GraphQL a number of advantages over the REST APIs.

The GraphQL advantages

  1. Unlike REST, which exposes a set of endpoints, GraphQL offers a single POST endpoint. This avoids the complexity related to API versions in the client.
  2. Data can be requested from different objects in a single request. This could avoid multiple server round trips and improve the client’s overall performance.
  3. A client can request only the fields it is interested in from an object. Points 2 and 3, combined, address the over-fetching and under-fetching problems that are often associated with REST APIs.
  4. The server response is predictable and type-safe, which can make clients less prone to errors.
  5. GraphQL offers dynamic schema documentation that is easy to follow.
  6. GraphQL offers specific, more descriptive errors compared to REST.
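
As a rough illustration of points 1 to 3, the sketch below builds (without sending) a single POST request that carries a query asking for just one field. The endpoint URL, the `origin` root field, its `url` argument and the helper name `build_graphql_request` are assumptions made for illustration; check the schema documentation in the explorer for the actual names.

```python
import json
import urllib.request

# Assumption: illustrative endpoint URL; confirm it in the service documentation.
GRAPHQL_ENDPOINT = "https://archive.softwareheritage.org/graphql/"

# Assumption: `origin` and `url` are illustrative field names.
EXAMPLE_QUERY = """
query ($url: String!) {
  origin(url: $url) {
    url
  }
}
"""


def build_graphql_request(query, variables=None, endpoint=GRAPHQL_ENDPOINT):
    """Build (but do not send) a POST request carrying a GraphQL query."""
    payload = {"query": query, "variables": variables or {}}
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_graphql_request(
    EXAMPLE_QUERY, {"url": "https://github.com/python/cpython"}
)
```

Every query, whatever objects and fields it asks for, goes to the same endpoint in the same JSON envelope; only the `query` and `variables` values change. Passing `request` to `urllib.request.urlopen` would send it.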

👉 You can interact with the SWH GraphQL service using the GraphQL Explorer here.

The UI lets you explore the schema documentation and write and test queries.

Executing a very large query is costly, so we restrict query size for performance reasons.

Larger queries are allowed if you log in with your SWH credentials.

A few examples using the GraphQL Explorer in Software Heritage

Query to get the latest snapshot from an origin

This example requests the unique identifier (SWHID) of the latest snapshot of the origin with the URL “https://github.com/python/cpython”. The response provides the requested SWHID, which can be used to uniquely identify the snapshot and to access it (including all repository branches, releases and content) in the Software Heritage archive.
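
A minimal sketch of what such a query and the handling of its response might look like is given below. The field names `origin`, `latestSnapshot` and `swhid`, and the helper `extract_snapshot_swhid`, are assumptions made for illustration and may not match the actual schema; only the top-level `data` key of the response comes from the GraphQL specification.

```python
# Assumption: `origin`, `latestSnapshot` and `swhid` are illustrative field
# names; check the schema documentation in the explorer for the real ones.
LATEST_SNAPSHOT_QUERY = """
query ($url: String!) {
  origin(url: $url) {
    latestSnapshot {
      swhid
    }
  }
}
"""


def extract_snapshot_swhid(response):
    """Return the snapshot SWHID from a decoded JSON response, or None.

    `response` is the dict obtained by decoding the JSON body returned by
    the server. Missing or null intermediate objects yield None.
    """
    data = response.get("data") or {}
    origin = data.get("origin") or {}
    snapshot = origin.get("latestSnapshot") or {}
    return snapshot.get("swhid")
```

The query would be sent with `{"url": "https://github.com/python/cpython"}` as its variables. Because the response mirrors the shape of the query, the client can walk it with plain dictionary lookups.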

Query to retrieve raw content of a codemeta.json file

To check whether a file named “codemeta.json” is present and to get its content if it is, we can use a single GraphQL query that fetches both the directory information and the content of the file. Starting from the directory SWHID:

swh:1:dir:ec88e5b901c034d5a91aa133e824d65cff3788a3;origin=https://github.com/rdicosmo/parmap;visit=swh:1:snp:25490d451af2414b2a08ece0df643dfdf2800084;anchor=swh:1:rev:db44dc9cf7a6af7b56d8ebda8c75be3375c89282
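
The identifier above is a qualified SWHID: a core identifier (`swh:1:dir:…`) followed by `;`-separated `key=value` qualifiers. Before building a query, a client may want to split it into its parts. The sketch below shows one way to do that; the function name `parse_swhid` and the shape of the returned dictionary are choices made for illustration, and the code does no validation.

```python
def parse_swhid(swhid):
    """Split a (possibly qualified) SWHID into its core fields and qualifiers."""
    core, *qualifier_parts = swhid.split(";")
    scheme, version, object_type, object_id = core.split(":")
    # Split each qualifier on the first "=" only, since values such as
    # origin URLs or nested SWHIDs may contain ":" and "/".
    qualifiers = dict(part.split("=", 1) for part in qualifier_parts)
    return {
        "core": core,
        "scheme": scheme,
        "version": version,
        "object_type": object_type,
        "object_id": object_id,
        "qualifiers": qualifiers,
    }


DIRECTORY_SWHID = (
    "swh:1:dir:ec88e5b901c034d5a91aa133e824d65cff3788a3"
    ";origin=https://github.com/rdicosmo/parmap"
    ";visit=swh:1:snp:25490d451af2414b2a08ece0df643dfdf2800084"
    ";anchor=swh:1:rev:db44dc9cf7a6af7b56d8ebda8c75be3375c89282"
)

parsed = parse_swhid(DIRECTORY_SWHID)
# parsed["object_type"] is "dir"; parsed["core"] is the unqualified directory SWHID.
```

Here `parsed["core"]` holds the bare directory identifier, and the `origin`, `visit` and `anchor` qualifiers record where that directory was found.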

GraphQL, though a bit harder to get started with than REST, can be very useful for complex clients with specific data requirements. Beyond that, the explorer can be used to get quick insights into the data.

We encourage you to experiment with the service either using the explorer or using custom clients.

👉 Want to know more? See the documentation here.

All the examples in this blog post and other sample queries to get you started are available here.


— Jayesh Velayudhan
