The first publicly available SWHID specification is out!
Precisely identifying software artifacts and their versions is of paramount importance for a variety of stakeholders, ranging from industry to academia, from cultural heritage to public administration. This issue has been a main concern for Software Heritage from the very beginning, as it is a key prerequisite for accomplishing its mission to collect, preserve, and share software source code for present and future generations.
To this end, Software Heritage has for years been using an intrinsic, unique identifier called SWHID, which enables software to be unambiguously referenced and retrieved, facilitating the preservation and sharing of software source code. Currently, over 30 billion software artifacts in the Software Heritage archive are identified using SWHIDs.
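To make "intrinsic" concrete: the identifier is derived from the artifact itself, not assigned by a registry. The following is a minimal sketch, assuming the simplest case of a single file's content, whose SWHID uses the same hash Git computes for a blob. The function name `content_swhid` is hypothetical and chosen for illustration only; other object types (directories, revisions, releases, snapshots) are computed differently, and the specification remains the authoritative reference.

```python
import hashlib


def content_swhid(data: bytes) -> str:
    """Sketch: compute the SWHID of a file's content (hypothetical helper).

    Assumption: a content SWHID is "swh:1:cnt:" followed by the Git blob
    hash, i.e. SHA-1 over the header "blob <length>\\0" plus the raw bytes.
    """
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    digest = hashlib.sha1(header + data).hexdigest()
    return f"swh:1:cnt:{digest}"


if __name__ == "__main__":
    # The empty file has a well-known Git blob hash.
    print(content_swhid(b""))
    # → swh:1:cnt:e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

Because the identifier depends only on the bytes, anyone holding a copy of the file can recompute it and check that it matches, without contacting Software Heritage.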
Given the broad range of applications of these identifiers, an open process has been put in place, and a working group assembled, to produce a publicly available specification that describes precisely how these identifiers are defined and computed, easing adoption for all stakeholders.
The SWHID Working Group kicked off on March 27th with a webinar that recalled the key concepts behind SWHIDs and walked participants through the tools and processes used to produce the specification. All the onboarding material is available from the website of the SWHID working group, where one can also find the rules of participation and the current status of the specification.
👉 You can find the SWHID kickoff presentation here: https://hal.science/hal-04121507
After three months of intense work, we are delighted to share the news that the first stable version of the SWHID specification has now been approved, and is available online.
Software Heritage followers will surely notice that the SWHID acronym now stands for “Software Hash IDentifier”, and no longer for “Software Heritage IDentifier”: this change, supported by a clear majority of the working group participants, reflects the relevance of SWHIDs, which goes well beyond source code and Software Heritage.
This first version of the specification is an important milestone, as it describes precisely how SWHIDs are computed, but it is not the end of the road.
We are now calling on all stakeholders to review the specification and help identify additional features that SWHIDs would need in order to address their use cases. One example of such a feature is a qualifier to precisely identify fragments of binary files, which is already open for discussion.