Recommendations for Good Practices When Initiating Linked Open Data (LOD) in Museums and Other Cultural-Heritage Institutions
These recommendations are excerpted from the AAC publication, Overview and Recommendations for Good Practices [PDF]
The following text is meant for museums and other types of cultural-heritage institutions but often uses “museum” to simplify wording. The text is addressed to any staff member in charge of an institution’s data.
At a Glance: Recommendations for Good Practices When Initiating LOD
- Establish Your Digital Image and Data Policies
- Choose Image and Data Licenses That Are Easily Understood
- Plan Your Data Selection
- Recognize That Reconciliation and Standards Are Needed to Make Most Effective Use of LOD
- Choose Ontologies with Collaboration in Mind
- Use a Target Model
- Create an Institutional Identity for URI Root Domains
- Prepare Your Data and Be Sure to Include Unique Identifiers
- Be Aware of Challenges When Exporting Data from Your CIS; Develop an Extraction Script, or API
- If Outsourcing the Mapping and Conversion of Your Data to LOD, Do Not Assume the Contractor Understands How Your Data Functions or What You Intend to Do with It
- Accept That You Cannot Reach 100 Percent Precision, 100 Percent Coverage, 100 Percent Completeness: Start Somewhere, Learn, Correct
- Operationalize the LOD within Your Museum
1. Establish Your Digital Image and Data Policies
Before or as you begin implementing Linked Open Data (LOD) in your museum, you should establish an institution-wide agreement on the proper use of your images and data. Instituting a plan for usage may require many layers of sign-off and therefore take time to complete. The process of working on your data and images in an LOD initiative may help your organization extend its thinking about what is possible and desirable for stakeholders and constituents.
LOD is about being able to interconnect your museum’s digital records with those in other collections. Although you may choose to convert only your data to LOD, you will likely be interested in eventually including images. A few good examples of statements on digital image usage include the Policy on Digital Images of Collection Objects Usage formulated by the Walters Art Museum, Baltimore, Maryland (https://thewalters.org/rights-reproductions.aspx), and those devised for the Yale Center for British Art, New Haven, Connecticut (http://britishart.yale.edu/collections/using-collections/using-images), and the National Gallery of Art, Washington, DC (https://images.nga.gov/en/page/openaccess.html). You are also encouraged to consult RightsStatements.org, described in the next section, on licenses.
2. Choose Image and Data Licenses That Are Easily Understood
A valuable resource for rights on image usage is http://RightsStatements.org. It focuses on a range of common international options for image rights that the museum community will likely increasingly consult and support.
For data, you will need to provide a license that clearly states how others may use your museum’s data. When you engage in LOD, the “open” part means that you are allowing public use. The most widely adopted licenses recognized worldwide are the Creative Commons (CC) licenses, which have been developed specifically for the distribution of data via the web (and thus internationally). A CC license conveys the right to the public to share and build upon a published work (see https://creativecommons.org/licenses). Several types of licenses, each with pros and cons, are available. CC0 allows full use with no restrictions. The other CC licenses offer a set of permissions that you may select individually or in combination:
- CC BY requires that attribution always be cited (e.g., “created BY this person/institution”)
- CC NC allows only non-commercial use (e.g., “you cannot sell this content or derivatives that you make from this content or use it in commercial projects/presentations”)
- CC SA requires that you “share alike,” meaning that anything produced by others based on your content must carry the same license conditions that you have given in your license, no more and no less.
Depending on your institution’s needs, you can variously combine these three permissions to fashion the license you want. CC BY-NC (“provide credit to us for the use of our data and only use this noncommercially”) and CC BY-NC-SA (“provide credit to us for the use of our data, make whatever you produce available to others with these same license conditions, and use this only noncommercially”) are common combinations seen on websites. License names also carry a version number after the license code (e.g., CC BY 4.0). Previous licenses remain valid, and updated licenses do not take away license rights from earlier versions.
Any of the more restrictive versions may make sense for your museum. If you are part of a collaborative multi-institutional project, however, that choice may require your partners to accept the more restrictive terms as well, as the data query or page display may expose content from multiple institutions.
3. Plan Your Data Selection
Museums are the purveyors of vast information resources. Object information is the most obvious to make available as LOD, but equally important data comes from bibliographic, archival, exhibition, curatorial, and conservation sources. In addition, within each of those categories are quantitative data (dates, dimensions) and narrative data (object descriptions, curatorial notes, educational content). Selecting which data to contribute to an LOD project requires that you carefully consider project goals and time frames and pragmatically assess what is achievable. At the same time, you want to balance your short-term objectives with the long-term aim for the LOD to serve multiple purposes and align with other institutions in the future.
While LOD can provide rich results with a full set of data, converting all the data related to a theme or collection at one time is rarely feasible. Limitations on resources and legal or administrative constraints can render some relevant data unavailable for LOD projects. Curatorial records, for example, offer a wealth of information but may be more proprietary in nature than object data from a collections database. Since it simply may not be realistic for your collection management and digitization plans to include all your data initially, you may prefer an incremental approach, beginning with the basic label copy, or “tombstone” data, for an object, and later adding more descriptive and educational data. When making your choices, consider how the data will be used, particularly in combination with partners and other institutions.
Depending on an institution’s size, a varied group of professionals might be needed to define and identify the appropriate content for an LOD project. The team may be drawn from the ranks of information technology, collections, curatorial, education, and design departments, among others.
4. Recognize That Reconciliation and Standards Are Needed to Make Most Effective Use of LOD
While you may wish to maintain local standards for use in your institution, remember that LOD is about data that is open. One of the key benefits of LOD is its capacity to link data across collections! Opening the usage of your data is part of increasing your institution’s visibility. Scholars, the public, other institutions, and developers may wish to link to your data and/or create applications.
If your LOD consists mainly of sketchy and/or unstructured data (nonstandard vocabularies, text strings without unique identifiers, etc.), it will diminish the potential to interconnect the information for global use and be difficult to reconcile with other linked resources, such as the Union List of Artist Names (ULAN), one of the LOD projects of the Getty Vocabulary Program by the Getty Research Institute, Los Angeles. You will want the name of an artist and any additional descriptive information to be precise enough to determine a match with an artist listed in ULAN.
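The point about precision can be illustrated with a minimal sketch. It assumes a small, hypothetical local copy of an authority list (the names are real artists, but the IDs are placeholders rather than real ULAN identifiers) and shows how a normalized name plus one extra descriptive value, here a birth year, can confirm or reject a match. It is an illustration under those assumptions, not a description of how ULAN reconciliation services work.

```python
import unicodedata

# Hypothetical sample of an authority list; the IDs are placeholders,
# not real ULAN identifiers.
AUTHORITY = [
    {"id": "example-0001", "name": "Cassatt, Mary", "birth": 1844},
    {"id": "example-0002", "name": "Homer, Winslow", "birth": 1836},
]


def normalize(name):
    """Lowercase, strip accents and punctuation, and sort the name parts."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    cleaned = "".join(c if c.isalnum() else " " for c in no_accents.lower())
    return " ".join(sorted(cleaned.split()))


def reconcile(name, birth_year=None):
    """Return the authority ID only when exactly one candidate matches."""
    wanted = normalize(name)
    candidates = [a for a in AUTHORITY if normalize(a["name"]) == wanted]
    if birth_year is not None:
        # Extra descriptive information narrows (or rejects) the name match.
        candidates = [a for a in candidates if a["birth"] == birth_year]
    return candidates[0]["id"] if len(candidates) == 1 else None
```

With this sketch, "Mary Cassatt" with birth year 1844 resolves to a single entry, while an abbreviated string such as "M. Cassatt" finds no match and would need manual review.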
Resolving decades-long problems with legacy data, such as the disparate ways information on dates, dimensions, titles of works of art, “unknown” values, and other basic details about objects has been recorded, is challenging but critical. If the cultural-heritage community wants to share information, it needs to identify solutions and seek broad agreement around the problematic issues of legacy data.[1] Consider working with organizations such as the American Alliance of Museums (AAM), Arlington, Virginia; the Museum Computer Network (MCN), New York; and the International Council of Museums (ICOM), Paris, and/or applying for grants to establish community consensus on recommended solutions and tools for these information-sharing obstacles.
5. Choose Ontologies with Collaboration in Mind
From the outset you will need to decide what ontology or ontologies you will deploy to take full advantage of the precision, or “semantic glue,” LOD provides. If you choose to use more than one ontology, which is likely, make sure your mapping tool can handle multiple ontologies.
Not many ontologies are specific to the discipline of cultural heritage. The two that museums most commonly adopt are the Europeana Data Model (EDM) and the Conceptual Reference Model (CRM) produced by ICOM’s International Committee for Documentation (CIDOC). Optionally, a museum can adapt an ontology to create a profile that best suits its needs, such as the target model that the AAC formed to simplify the CIDOC CRM, and/or incorporate additional, commonly used ontologies from the web community (see https://linked.art). It is important to emphasize that the AAC target model is not another ontology. As stated, it is a profile of the CRM. Note that new and evolving ontologies continue to be produced. The archive and library communities have their own ontologies, which many other types of institutions will want to incorporate into their LOD. One of the many benefits of using LOD is that it is feasible to model data in ways that incorporate multiple ontologies for specific purposes, if necessary.
If you choose the CIDOC CRM, recognize that it poses challenges. In some cases, the CRM’s ability to capture details will depend on the availability of curators or scholars to provide information and express appropriate relationships. Nevertheless, just because the CRM was created to work with cultural information at a highly detailed level does not mean it is not helpful if applied more generally. The CRM concepts are structured in hierarchies. One can choose to start on a more general level. Some users of the CRM may argue, however, that a museum should apply the CRM as deeply as possible at the first opportunity, partly because the institution’s focus and funding to prepare additional data may not come around again. Plan accordingly.
6. Use a Target Model
Whether your institution is working alone or on a collaborative LOD initiative with other museums, developing or using an existing target model for mapping your data is a top priority (see https://linked.art for the target model that AAC developed). The model should be a subset of all the mapping possibilities relevant to your data. The model helps eliminate guesswork, keeps the mapping consistent, and significantly reduces the modeling and design effort required in the project. It also provides a reference that developers can use across multiple projects.
7. Create an Institutional Identity for URI Root Domains
Uniform Resource Identifiers (URIs) are unique identifiers that designate objects, people, places, and things in a way that can be read by computers. They are key components of LOD: Resource Description Framework (RDF) triples, the underlying data format for LOD, are composed of URIs, not “plain English.” To establish authority and persistence for the data you are converting to LOD, you should select an institutional root domain (the top-level hierarchy of a URI address). Selecting the root domain requires forethought. Changing root domains results in broken links among the data (akin to broken links in web pages), which create problems for those who will rely on that data in the applications they develop.
When selecting a URI root domain, consult with your IT staff to ensure your chosen name identity is not being considered elsewhere in the museum for other purposes. To avoid an overlap, it is a good practice to identify a single root domain that will be used throughout the institution for all its current and future LOD. A discrete subdomain (e.g., “data.thewalters.org”) or a completely new domain (e.g., “thedigitalwalters.org”) distinct from an institution’s website URL (e.g., “thewalters.org”) is recommended.
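As a minimal sketch of what "composed of URIs" means, the following writes one statement ("this object was created by this person") as a triple of three URIs in N-Triples syntax. All three URIs are hypothetical: "data.example.org" stands in for an institutional root domain, and the predicate is a placeholder rather than a property from any real ontology.

```python
# Hypothetical URIs; "data.example.org" stands in for an institutional root domain.
OBJECT_URI = "https://data.example.org/object/1234"
ARTIST_URI = "https://data.example.org/person/567"
# Placeholder predicate; a real project would use a property from its chosen ontology.
CREATED_BY = "https://data.example.org/ontology/created_by"


def to_ntriples(subject, predicate, obj):
    """Serialize one triple of URIs as a single N-Triples line."""
    return f"<{subject}> <{predicate}> <{obj}> ."


line = to_ntriples(OBJECT_URI, CREATED_BY, ARTIST_URI)
print(line)
```

Because every part of the statement is a URI under the chosen root, changing that root later would invalidate each line that other data sets or applications have linked to.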
A second consideration is ownership of the root domain. You want proprietorship so that you can retain control over it. In the case of creating a completely new domain, ownership can be accomplished by registering the domain with a registration service. The procedure is the same as registering a website domain. Creating a subdomain of your existing website requires adding the subdomain name (e.g., “data”) to your website’s Canonical Name (CNAME) registration.
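One way to keep a single root domain in use across projects is to mint every URI through one small function. The sketch below assumes a hypothetical root ("https://data.example.org") and a simple root/type/identifier path pattern; neither is prescribed by the recommendations above.

```python
from urllib.parse import quote

# Hypothetical institutional root; chosen once and kept stable.
ROOT = "https://data.example.org"


def mint_uri(entity_type, local_id, root=ROOT):
    """Build a URI from the root, an entity type, and a stable local ID."""
    # Percent-encode each path segment so slashes or spaces in local IDs
    # cannot change the structure of the URI.
    type_segment = quote(entity_type, safe="")
    id_segment = quote(str(local_id), safe="")
    return f"{root}/{type_segment}/{id_segment}"
```

For example, mint_uri("object", "37.1234") yields a URI ending in "/object/37.1234", and an ID containing a slash or space is encoded rather than silently creating a new path level.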
8. Prepare Your Data and Be Sure to Include Unique Identifiers
It is important to review and clean up inconsistencies in data structures, formats, and values where possible, as irregularities will cause problems for the mapping and conversion of the data to LOD. Make sure you have completed filling in all your data categories and that you have addressed outstanding issues. Check for spelling errors and content inconsistencies. (Also see recommendation 9 below for additional steps that may be needed to prepare your data.)
You should always make sure your data includes unique identifiers that do not change. For art objects, the identifier may be an accession number or other unique identifiers generated by your collection information system (CIS); use whichever is the most stable and unchanging. Look at examples from existing LOD data sets at other institutions and consult https://linked.art for guidance.
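A quick check for missing or duplicated identifiers can be run before any mapping begins. The sketch below assumes records have already been exported as dictionaries and that the identifier lives in a field named "system_id"; both the record shape and the field name are hypothetical.

```python
from collections import Counter


def find_identifier_problems(records, id_field="system_id"):
    """Report rows with no identifier and identifiers used more than once."""
    ids = [
        "" if r.get(id_field) is None else str(r[id_field]).strip()
        for r in records
    ]
    # Row positions (0-based) whose identifier is absent or blank.
    missing_rows = [i for i, value in enumerate(ids) if not value]
    counts = Counter(value for value in ids if value)
    duplicate_ids = sorted(value for value, n in counts.items() if n > 1)
    return {"missing_rows": missing_rows, "duplicate_ids": duplicate_ids}
```

An empty result for both keys is a reasonable precondition for constructing URIs from the identifiers.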
9. Be Aware of Challenges When Preparing and Exporting Data from Your CIS; Develop an Extraction Script, or API
Regrettably, some CISs are implemented in ways that do not structure data in formats that facilitate mapping to ontologies such as the CIDOC CRM. Some CISs rely heavily on text fields that are narrative, or human-readable. While provisions for adding a structured linked layer to complement the text fields might be available, the layers can be cumbersome to apply and access.
It is possible that your museum’s CIS has embedded thesaurus tools, for example, that may not be implemented in ways that support LOD cataloging. Some CIS platforms contain the Getty vocabularies, but not always the latest versions, such as LOD. The difficulty of obtaining access to current thesauri prevents information managers from capturing and applying terms that reflect recent research and geographical boundaries.
It is important to encourage your CIS vendor to simplify the processes of entering structured data and getting data out of the system in ways that can be used for LOD. CIS vendors have been responsive in the past in respect to upgrading systems to reflect new standards and directions that have broad adoption and “staying power.” As more museums select LOD for sharing their data, CISs also need to be updated. The community’s insistence will help move the tools forward for everyone.
Given the complexities of extracting and exporting data, once your museum identifies and addresses the issues that arise with in-house data preparation and extraction, you should aim to construct a workflow for the process and automate as much of it as possible. A scripted extraction method, or application programming interface (API), will minimize the effort it takes for a museum to incorporate updates into its LOD at routine intervals.
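A scripted extraction can be as small as one query and one export step that is rerun on a schedule. The sketch below uses an in-memory SQLite database as a stand-in for a CIS; the table name "objects" and its columns are hypothetical, and a real CIS would need its own connection method and queries.

```python
import csv
import io
import sqlite3


def export_objects(conn, out):
    """Write selected object fields to CSV and return the number of rows."""
    writer = csv.writer(out)
    writer.writerow(["system_id", "accession_number", "title"])
    rows = conn.execute(
        "SELECT system_id, accession_number, title "
        "FROM objects ORDER BY system_id"
    )
    count = 0
    for row in rows:
        writer.writerow(row)
        count += 1
    return count


# Stand-in database with two sample rows (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE objects ("
    "system_id INTEGER PRIMARY KEY, accession_number TEXT, title TEXT)"
)
conn.executemany(
    "INSERT INTO objects VALUES (?, ?, ?)",
    [(1, "1990.1", "Portrait"), (2, "1990.2", "Landscape")],
)

buffer = io.StringIO()
exported = export_objects(conn, buffer)
```

Keeping the query and column list in one function means the same export runs identically each time, which is what makes routine refreshes cheap.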
10. If Outsourcing the Mapping and Conversion of Your Data to LOD, Do Not Assume the Contractor Understands How Your Data Functions or What You Intend to Do with It
Companies are starting to offer outsourcing of mapping and data conversion. Always check if the vendor has had experience working with museum data. Be prepared to invest time up-front orienting and providing your data rules to the people doing the work so they understand the nuances of the data they are handling. Some museum data can be hard to comprehend for those not educated in its norms, particularly when it may contain ambiguities, such as in date ranges, attributions, and even identifiers. Mapping and conversion mistakes can result, wasting the investment you make in LOD.
11. Accept That You Cannot Reach 100 Percent Precision, 100 Percent Coverage, 100 Percent Completeness: Start Somewhere, Learn, Correct
Museums are traditionally reserved and cautious about releasing their data to the public. Most museums do not want to publish data until it is “complete,” however that is defined. Concerns can arise about correctness, especially when dates for an artist or work are ambiguous, with “authorities” providing different information. To some extent, the ubiquity of the World Wide Web and social media is changing restrained attitudes toward publishing information online, one reason being that data can now be easily updated, which is the case with LOD.
Remember that your LOD initiatives can be incremental. Particularly when using a target model, you can, over time, add data, which can include deeper detail as well as new types of information.
Consider that the data you are managing for LOD does not sit in isolation from other data and content in your museum. Other initiatives in your museum, such as implementation of the International Image Interoperability Framework (IIIF) and educational and scholarly publishing on your website, may create opportunities for integration with your LOD. All museum applications and websites require museum data in some form, so it makes sense to plan for evolution as new requirements emerge.
12. Operationalize the LOD within Your Museum
Once you have converted your museum’s data to LOD, make effective use of it and operationalize it across the museum. LOD can serve as a master resource for many of the digital applications you use to reach your audiences. As a starting point, you could update existing online collections websites and digital interactives so they can draw from your institution’s LOD.
You could also set the stage for instituting new cataloging practices. Consider cataloging LOD identifiers in your CIS and building reconciliation into early cataloging work by capturing IDs from, for example, the Getty vocabularies, namely the Union List of Artist Names (ULAN), the Art & Architecture Thesaurus (AAT), and the Getty Thesaurus of Geographic Names (TGN), alongside the terms and descriptions you use in cataloging. Make sure that narrative and descriptive fields are complemented by structured data (artist, title, date, etc.). Finally, set up an automated system for refreshing your LOD as new records are added, much as many museums automatically update their website data on a nightly basis.
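Capturing a vocabulary ID alongside each cataloged term can be sketched as a simple record structure. The class name, field names, and ID values below are hypothetical placeholders, not a CIS schema or real Getty identifiers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CatalogTerm:
    label: str                           # the term as the cataloger entered it
    vocabulary: str                      # e.g., "ULAN", "AAT", or "TGN"
    vocabulary_id: Optional[str] = None  # ID captured at cataloging time, if known


def needs_reconciliation(terms):
    """Return the labels of terms that have no vocabulary ID yet."""
    return [t.label for t in terms if not t.vocabulary_id]
```

Terms cataloged with an ID from the start never appear in the reconciliation list, which is the practical benefit of capturing IDs early.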
[1] Examples of legacy data issues include:
- Dates: The inconsistency in date form has been an issue when extracting a useful and correct value or assigning an accurate relationship. Data for an artist’s birth and/or death are sometimes exact, but data sometimes includes only the year or words such as “born,” “active,” or “circa” embedded in the data field, which makes extracting the numeric value complex.
- Dimensions: Some museums have dimensions of artworks entirely separated into length, width, and depth, with the measurements and the units of measure. Most museums, however, only have a paragraph containing all the measurement information, which is difficult to parse.
- IDs: Unique identifiers for important data are essential for constructing URIs for things like objects and people. Most museums include IDs for artworks and/or artists, but many forget to include them. Moreover, objects may have two IDs: the accession number (which can change) and a system ID (which can change as systems are replaced).
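The date problem described above can be illustrated with a minimal sketch that separates four-digit years from embedded qualifier words. The list of qualifiers and the function name are assumptions made for illustration; real legacy data will contain many more forms than this handles.

```python
import re

# Four-digit years and a small, assumed set of qualifier words.
YEAR = re.compile(r"\b(\d{4})\b")
QUALIFIER = re.compile(r"\b(born|active|circa|died)\b", re.IGNORECASE)


def parse_date_field(text):
    """Split a free-text date field into numeric years and qualifier words."""
    return {
        "years": [int(y) for y in YEAR.findall(text)],
        "qualifiers": [q.lower() for q in QUALIFIER.findall(text)],
    }
```

Keeping the qualifiers rather than discarding them preserves the uncertainty that the original cataloger recorded.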