These FAQs are excerpted from the AAC publication Overview and Recommendations for Good Practices.
What is Linked Open Data (LOD)?
In computing, Linked Data describes a method of publishing structured data so that it can be interlinked and therefore useful in web implementations. Tim Berners-Lee, director of the World Wide Web Consortium (W3C), coined the term. Linked Open Data (LOD) refers to data that is made available for public use via Linked Data.
What is the Semantic Web?
“Semantic Web” refers to the vision and set of technologies that would enable entities to be linked semantically via the web so that accurate web searches are possible. To achieve the “semantic glue” that provides context and meaning in documents, W3C’s Resource Description Framework (RDF) is used to tag information, much as Hypertext Markup Language (HTML) is used for publishing on the web.
RDF is a W3C standard model (i.e., web based and web friendly) for interchanging data on the web. RDF breaks down knowledge into discrete pieces, according to rules about the semantics, or meaning, of those pieces, which it represents as a list of statements with three terms: subject, predicate, and object, known as triples.
Each subject, predicate, and object is a Uniform Resource Identifier (URI) or, for the object, a literal value such as a number or American Standard Code for Information Interchange (ASCII) string. An organization that produces LOD must select one or more ontologies to play the key role of defining the meaning of the terms used in the subject/predicate/object statements. In essence, RDF and the ontology give context and meaning to a statement, and the URIs provide the identifiers for all the entities described and published as LOD, which allows them to be discovered and connected. Approximately 149 billion triples are currently in the LOD cloud.
What are RDF and URI?
As stated above, Resource Description Framework (RDF) is a W3C standard model (i.e., web based and web friendly) for interchanging data on the web. RDF is a way for computers to work with facts and express statements about resources. RDF breaks down knowledge into discrete pieces, according to rules about the semantics, or meaning, of those pieces, which it represents as a list of statements in the form of subject, predicate, and object, known as triples. The format mimics English sentence structure, which accommodates two nouns (the subject and object) and a type of relationship expressed as a verb (the predicate).
Each subject, predicate, and object is a Uniform Resource Identifier (URI) or, for the object, a literal value such as a number or ASCII string. A URI is typically expressed as a Uniform Resource Locator (URL), which provides the location of the identified resource on the web.
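The following minimal Python sketch makes the subject/predicate/object pattern concrete. It stores triples as tuples and writes them in N-Triples, a line-based RDF syntax. All URIs under `example.org` are hypothetical placeholders. The serializer escapes only backslashes and double quotes, so it is not a complete implementation of the N-Triples escaping rules.

```python
from typing import NamedTuple, Union


class URI(str):
    """A resource identifier, written as <...> in N-Triples."""


class Triple(NamedTuple):
    subject: URI
    predicate: URI
    obj: Union[URI, str]  # the object may be a URI or a literal value


def term(value: Union[URI, str]) -> str:
    # URIs go in angle brackets; literals are quoted, escaping \ and ".
    if isinstance(value, URI):
        return f"<{value}>"
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_ntriples(triples: list[Triple]) -> str:
    # One statement per line, each terminated by " ."
    return "\n".join(
        f"{term(t.subject)} {term(t.predicate)} {term(t.obj)} ." for t in triples
    )


# Hypothetical museum data: "Painting 42 was created by Artist 7."
painting = URI("http://example.org/object/42")
artist = URI("http://example.org/person/7")
triples = [
    Triple(painting, URI("http://example.org/vocab/createdBy"), artist),
    Triple(painting, URI("http://example.org/vocab/title"), "Harbor at Dusk"),
]
print(to_ntriples(triples))
```

The first statement links two URIs, which is what makes it linkable. The second ends in a literal title, which is data but not a link.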
What is an ontology?
Simply stated, an ontology is a schema or conceptual framework that gives data meaning. As described by Wikipedia, an ontology in the field of information science is the “representation of entities, ideas, and events, along with their properties and relations, according to a system of categories.”[1] An ontology provides a set of definitions for the meaning of URIs used in Linked Data. Ontologies, which are defined using specifications standardized by the W3C, allow the definition of classes, relationships, and properties to be used in a data model.
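An ontology's definitions can be used to check whether data makes sense. The sketch below models a tiny, hypothetical ontology in which each property declares the expected class of its subject (domain) and of its object (range). It then flags statements that break those rules. Here domain and range are treated as validation constraints. That is closer to how mapping-review tools use them than to how RDFS formally interprets them. Real ontologies such as the CIDOC CRM are written in W3C languages like RDFS or OWL and are far richer. The class and property names here are invented for illustration.

```python
# Hypothetical mini-ontology: property -> (domain class, range class)
PROPERTIES = {
    "createdBy": ("Artwork", "Person"),
    "bornIn": ("Person", "Place"),
}

# Class membership for each entity in our (hypothetical) data
TYPES = {
    "object/42": "Artwork",
    "person/7": "Person",
    "place/boston": "Place",
}


def violations(triples):
    """Return the triples whose subject or object has the wrong class."""
    bad = []
    for s, p, o in triples:
        domain, range_ = PROPERTIES[p]
        if TYPES.get(s) != domain or TYPES.get(o) != range_:
            bad.append((s, p, o))
    return bad


data = [
    ("object/42", "createdBy", "person/7"),     # fits the definitions
    ("person/7", "bornIn", "place/boston"),     # fits the definitions
    ("place/boston", "createdBy", "person/7"),  # a place cannot be "created by"
]
print(violations(data))  # [('place/boston', 'createdBy', 'person/7')]
```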
What is the CIDOC CRM?
CIDOC CRM, the Conceptual Reference Model (CRM) created by the International Committee for Documentation (CIDOC) of the International Council of Museums, is an extensive cultural-heritage ontology containing eighty-two classes and 263 properties, including classes to represent a wide variety of events, concepts, and physical properties. It is recognized as International Organization for Standardization (ISO) 21127:2006. For more information about the CIDOC CRM, visit the website http://www.cidoc-crm.org.
Is all LOD linkable, regardless of which ontology is used?
Yes, all LOD is linkable because the links are between the subject and object entities regardless of the ontology and syntax that are used to model the data. You can therefore link artists in your specific ontology format to the Getty’s Union List of Artist Names (ULAN) or DBpedia, for example, and link places to GeoNames or the Getty Thesaurus of Geographic Names (TGN), and the like. You can also link your object LOD to information about an object from another museum that is in LOD, even if that institution uses a different ontology.
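A small sketch shows that linking does not depend on the ontology. Two museums describe their artworks with different, hypothetical predicates. Both point to the same authority URI for the artist. A program can therefore connect the records simply by matching object URIs. The ULAN-style URIs below use made-up identifiers.

```python
from collections import defaultdict

ARTIST = "http://vocab.getty.edu/ulan/500000000"  # hypothetical ULAN ID

museum_a = [  # uses one ontology's predicate
    ("http://museum-a.example/obj/1", "http://ontology-a.example/maker", ARTIST),
]
museum_b = [  # uses a different ontology's predicate for the same idea
    ("http://museum-b.example/obj/99", "http://ontology-b.example/producedBy", ARTIST),
    ("http://museum-b.example/obj/100", "http://ontology-b.example/producedBy",
     "http://vocab.getty.edu/ulan/500000001"),  # hypothetical, different artist
]


def works_by_shared_object(*datasets):
    """Group subjects by the object URI they point to, ignoring predicates."""
    index = defaultdict(set)
    for triples in datasets:
        for subject, _predicate, obj in triples:
            index[obj].add(subject)
    return index


links = works_by_shared_object(museum_a, museum_b)
print(sorted(links[ARTIST]))
# ['http://museum-a.example/obj/1', 'http://museum-b.example/obj/99']
```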
What are a triplestore and SPARQL endpoint?
A triplestore is a purpose-built database for the storage and retrieval of triples (statements in the form of subject, predicate, and object) through semantic queries. Triplestores support the SPARQL Protocol and RDF Query Language (SPARQL).
SPARQL is a query language used to search RDF data, and the endpoint is a networked interface created to facilitate these queries. An endpoint is used to query a triplestore and deliver results, primarily for researchers with specific queries.
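A triplestore answers SPARQL queries by matching triple patterns that contain variables. The sketch below imitates that core step, a basic graph pattern, over an in-memory list. It is a teaching toy under simplifying assumptions: no indexes, no FILTER, no OPTIONAL. It does not reflect how production triplestores such as Apache Jena are built. The roughly equivalent SPARQL query appears in the opening comment.

```python
# Roughly equivalent SPARQL:
#   SELECT ?work ?title WHERE {
#     ?work ex:createdBy ex:artist7 .
#     ?work ex:title ?title .
#   }
TRIPLES = [
    ("ex:work1", "ex:createdBy", "ex:artist7"),
    ("ex:work1", "ex:title", "Harbor at Dusk"),
    ("ex:work2", "ex:createdBy", "ex:artist8"),
    ("ex:work2", "ex:title", "Still Life"),
]


def is_var(term):
    return term.startswith("?")


def match(pattern, triple, binding):
    """Extend `binding` so `pattern` equals `triple`, or return None."""
    result = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if p in result and result[p] != t:
                return None
            result[p] = t
        elif p != t:
            return None
    return result


def query(patterns, triples):
    """Join the patterns left to right, like a basic graph pattern."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [
            b2
            for b in bindings
            for t in triples
            if (b2 := match(pattern, t, b)) is not None
        ]
    return bindings


rows = query([("?work", "ex:createdBy", "ex:artist7"),
              ("?work", "ex:title", "?title")], TRIPLES)
print(rows)  # [{'?work': 'ex:work1', '?title': 'Harbor at Dusk'}]
```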
What is GitHub, and is it necessary to use it?
GitHub is a web-based data and code-hosting service. AAC’s GitHub was the central working space for producing LOD. Each participating museum had a separate directory in GitHub in which to store its exported data. As the students working at the Information Sciences Institute (ISI) at University of Southern California (USC), Los Angeles, mapped the data, the AAC museums, consultants, and advisers could track their progress and post comments within each museum’s repository in GitHub. While GitHub was a key component for AAC, it is not necessary to use GitHub to produce LOD.
What is a target model?
A target model is a subset of all your data mapping possibilities. It acts as a road map during mapping, minimizing guesswork and helping to keep your data consistent. You will need to select your ontology or ontologies before creating a target model or adopting an existing one.
If you choose the CIDOC CRM, remember that there are different schools of thought about how it should be applied. Your target model should reflect how you wish to apply the CRM.
What is AAC’s target model?
AAC’s target model is a profile of the CIDOC CRM Linked Data model designed to work across many museums and enable functional applications to be built using the model. More specifically, because the AAC target model is standardized, many institutions can use it to publish and share their data with little reworking of data. The AAC target model supports varying levels of completeness, or detail, in the data, as it uses the CIDOC CRM alongside other RDF ontologies available outside the museum community, where needed. It is closely aligned with common controlled vocabularies (currently, the Getty’s Art & Architecture Thesaurus). The AAC target model minimizes the learning curve about LOD modeling for institutions’ staff. It also supports the important concept that the data should be able to travel “round trip,” meaning that the data converted from source systems to Linked Data can be converted back into the format of the original system with no loss of data or change in the level of detail.
The AAC target model currently covers 90 percent of the possibilities the CRM offers, with only 10 percent of the complexity of the full CRM ontology.
What decisions shaped AAC’s target model?
Several challenges influenced the shape of AAC’s target model. Among them were issues of legacy data, which partners could not resolve within the scope of the project; the complexity of the CIDOC CRM, which depends on details that partner data did not include; places where the CRM did not work well with the aim of the web delivery of LOD, so other ontologies had to be incorporated alongside the CRM; and a conscious effort to keep data mapping simple for the museum community. A full description of AAC’s target model can be found at https://linked.art.
The AAC GitHub repository captures the dialogue that influenced AAC’s target model. These discussions can be found within individual partner directories at https://github.com/american-art.
What is Linked Art?
Linked Art describes itself on its website (https://linked.art): “Linked Art is a Community working together to create a shared Model based on Linked Open Data to describe Art. We then implement that model in Software and use it to provide valuable content. It is under active development and we welcome additional partners and collaborators.” Linked Art provides patterns and models that enable cultural-heritage institutions to easily publish their data for event-based digital research projects and non-cultural-heritage developers. It includes the AAC target model and applies it to additional projects, such as those developed by the J. Paul Getty Museum, the Getty Research Institute (Getty Provenance Index), and Pharos, an international consortium of fourteen European and North American art historical photo archives.
What is IIIF?
The International Image Interoperability Framework (IIIF) is a set of agreements for standardized image storage and retrieval. A broad community of cultural-heritage organizations and vendors created IIIF in a collaborative effort to produce an interoperable ecosystem for images. The goals of the project are to:
- Give scholars an unprecedented level of uniform and rich access to image-based resources hosted around the world
- Define a set of common application programming interfaces (APIs) that support interoperability between image repositories and supporting image viewers
- Develop, cultivate, and document shared technologies, such as image servers and web clients, that provide a world-class user experience in viewing, comparing, manipulating, and annotating images
For more information about the IIIF, visit the website http://iiif.io.
What open-source tools are available to map, produce, review, and reconcile LOD?
The Open Community Registry of LOD for GLAM (Galleries, Libraries, Archives, Museums) Tools is a good resource. For information, see the spreadsheet https://docs.google.com/spreadsheets/d/1HVFz7p-8Rm3kmDK0apMsrwV_Q0BaDSsw59hLViJSnvs/edit#gid=0. Some key mapping tools for museums are:
- 3M: The 3M online open-source data mapping system has been jointly developed by the Foundation for Research and Technology–Hellas (FORTH) information systems laboratory in Greece and Delving BV in The Netherlands. It is touted as a tool that allows a community of people to view and share mapping files to increase overall understanding and promote collaboration between the different disciplines necessary to produce quality results. It was expressly designed for making good use of the CIDOC CRM. For more information about 3M, visit the website http://www.ics.forth.gr/isl/index_main.php?l=e&c=721.
- A data generation tool within Karma applies mappings to the data sets to create the RDF data and load it directly into a triplestore.
- A mapping validation tool, produced by Design for Context, a consultant for AAC, provides a specification of the precise ontology mapping and a corresponding query that returns the data only if it has been correctly mapped. For more information about the mapping validation tool, see https://review.americanartcollaborative.org.
- A link curation tool, designed and produced by ISI, allows users to review links to other LOD resources. See https://github.com/american-art/linking.
- An IIIF translator tool, produced by ISI, automatically creates IIIF manifests (the format required to run IIIF viewers and applications) from the museum data that the AAC project has mapped to the CIDOC CRM. In addition to creating the required metadata, the IIIF translator follows the links for every image provided by the museums, determines the size of each image, and creates a thumbnail of each image as part of the process of creating the IIIF manifests. See https://github.com/american-art/iiif.
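For readers new to IIIF, the sketch below builds a bare-bones manifest. It follows the structure of the IIIF Presentation API 3.0: a Manifest containing a Canvas, an AnnotationPage, and a "painting" Annotation whose body is the image. It is not the output of the ISI translator, whose format the text does not describe. The URLs are hypothetical. The image dimensions are also passed in by hand, whereas the translator measures them from each file.

```python
import json


def build_manifest(base, label, image_url, width, height, thumb_url):
    """Return a minimal IIIF Presentation 3.0 manifest as a dict."""
    canvas_id = f"{base}/canvas/1"
    return {
        "@context": "http://iiif.io/api/presentation/3/context.json",
        "id": f"{base}/manifest.json",
        "type": "Manifest",
        "label": {"en": [label]},
        "thumbnail": [{"id": thumb_url, "type": "Image", "format": "image/jpeg"}],
        "items": [{
            "id": canvas_id,
            "type": "Canvas",
            "width": width,
            "height": height,
            "items": [{
                "id": f"{base}/page/1",
                "type": "AnnotationPage",
                "items": [{
                    "id": f"{base}/annotation/1",
                    "type": "Annotation",
                    "motivation": "painting",  # the image "paints" the canvas
                    "body": {
                        "id": image_url,
                        "type": "Image",
                        "format": "image/jpeg",
                        "width": width,
                        "height": height,
                    },
                    "target": canvas_id,
                }],
            }],
        }],
    }


manifest = build_manifest(
    "https://iiif.museum.example/object/42",  # hypothetical base URL
    "Harbor at Dusk",
    "https://images.museum.example/42.jpg",   # hypothetical image URL
    width=4000, height=3000,
    thumb_url="https://images.museum.example/42-thumb.jpg",
)
print(json.dumps(manifest, indent=2))
```

A IIIF viewer fetches the manifest's `id` URL and uses the canvas dimensions to lay out the image before the pixels arrive. This is one reason the translator records each image's size.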
Which tools did AAC use?
AAC used ISI’s Karma to map all the data and produce RDF. ISI’s link-curation tool was used to link AAC names to ULAN. Design for Context’s validator tool was used to proof the AAC data for consistency.
What is the difference between an ontology and the Karma mapping tool?
As stated earlier, an ontology is a schema or conceptual framework that gives data meaning. Karma is a tool for mapping data to an ontology and using that mapping to produce RDF.
Does our institution need to have a usage policy in place for our data and images before engaging in LOD?
Developing an institutional policy requires time for all the appropriate signatures and authorizations. It is recommended, therefore, that your institution develop a digital usage policy before or as soon as you engage in LOD. In theory, you could produce LOD without including images, but most museums will want to provide public access to their images, so having a usage policy for images as well as for data is important.
What are my options for selecting a license?
When you engage in LOD, you should choose a Creative Commons (CC) license, which enables the right to share, use, and build upon a work. For more information about CC licenses, visit the website https://creativecommons.org/licenses, which outlines various types of licenses that are available. For a better understanding of the choices, see recommendation 2, “Choose Image and Data Licenses That Are Easily Understood,” in part 2 of this guide, “Recommendations for Good Practices” (also posted as a separate document at http://americanartcollaborative.org). You may wish to select different licenses for images and data. In some cases, you may even decide to apply different licenses to different images.
Which person in a museum should undertake the mapping of data to LOD?
The staff members involved in mapping your museum’s data to LOD will depend on how your museum is organized and the technical knowledge of the staff. A team effort is ideal. The actual mapping process and use of a tool such as 3M or Karma (http://karma.isi.edu) will require some technical knowledge, which could come from information technology staff. To benefit from the depth of coverage of the CIDOC CRM, curators as well as collection managers and registrars with in-depth knowledge of your museum’s data should be involved.
What technical skills are helpful for staff who are publishing and maintaining LOD?
How can my institution begin long-range planning for LOD?
Developing an institutional policy for publishing data and images and choosing a license would help set the stage for your museum’s LOD initiative. A good example can be seen on the website https://thewalters.org/rights-reproductions.aspx for the Walters Art Museum, Baltimore, Maryland. The museum’s Policy on Digital Images of Collection Objects states: “The Walters Art Museum believes that digital images of its collection extend the reach of the museum. To facilitate access and usability, and to bring art and people together for enjoyment, discovery, and learning, we choose to make digital images of works believed to be in the public domain available for use without limitation, rights- and royalty-free.”
Planning what data you would like to convert to LOD and reviewing your institution’s data for completeness would be the next logical steps. See recommendation 3, “Plan Your Data Selection,” in “Recommendations for Good Practices” (part 2 of this guide and a separate document at http://americanartcollaborative.org), which covers planning, exporting your data, and legacy data issues.
It would also be useful to educate yourself about LOD by reviewing the educational briefings, papers, and presentations posted on the AAC website.
What minimum data is needed to begin LOD, and can I add more data in phases?
A museum could choose to begin with the basic descriptions of its objects, sometimes called “tombstone data,” and add more data over time. Other museums may wish to take a project-based approach by using LOD to develop a theme in depth, such as focusing on a certain artist or collection.
If I need to update my data, do I have to remap everything?
If you are using Karma (see http://karma.isi.edu) to map your data, many updates to the data can be made without requiring additional work. The three possible scenarios are:
- The format of your updated data is the same as the data that was published. In other words, you have updates to some of the data values but no additional fields or other changes. In this case, the Karma model that you built will apply directly to the updated data without any additional work.
- The format of the data has changed for a data set that was previously mapped. In this case, you will need to load the revised data set into Karma. You can apply the earlier Karma model to this data, but the model will need to be updated to reflect the changes in the data set. Once this is done, save the updated model; you can then apply it directly to future versions of this data set in which only the data values change.
- You have a new data set that you have not previously modeled in Karma. In this case, you will need to load the new data set into Karma, then construct and save a model of the data.
Note that you can save time updating if you prepare a script to automate your workflow for extracting your data before mapping. Scripts would also be helpful for taking the converted data and moving it into the triplestore and other repositories. Scripted methods will minimize the effort a museum must make to incorporate updates into its LOD at routine intervals.
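The three Karma scenarios above depend on whether an export's format has changed. A workflow script could detect this before mapping. For example, it could compare the column headers of a new CSV export with the headers the saved model was built against. The sketch below assumes CSV exports and uses a hypothetical registry of known data sets. It only decides which scenario applies and does not call Karma itself.

```python
import csv
import io

# Hypothetical registry: data set name -> column headers the saved model expects
KNOWN_MODELS = {
    "objects": ["object_id", "title", "artist", "date"],
}


def update_scenario(dataset_name, csv_text):
    """Classify an export into one of the three update scenarios."""
    header = next(csv.reader(io.StringIO(csv_text)))
    expected = KNOWN_MODELS.get(dataset_name)
    if expected is None:
        return "new data set: build and save a new model"
    if header == expected:
        return "same format: reapply the saved model"
    return "format changed: update the saved model, then apply it"


print(update_scenario("objects", "object_id,title,artist,date\n1,Harbor,Smith,1885\n"))
print(update_scenario("objects", "object_id,title,artist,date,medium\n"))
print(update_scenario("exhibitions", "exhibition_id,name\n"))
```

In a scheduled job, only the "same format" case could proceed unattended. The other two would stop and notify staff.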
Do I need to link to the Getty vocabularies before I convert our data to LOD?
It is not necessary to link to the Getty vocabularies before converting data to LOD. Should you wish to link to the Getty vocabularies before mapping, please note that the Getty Vocabulary Program’s website contains a quick reference guide. See http://www.getty.edu/research/tools/vocabularies/vocab_contributions_flier.pdf.
How do I link to the Getty vocabularies once I am using LOD?
The Getty Vocabulary Program is working with AAC to produce an API that will streamline submission of LOD vocabularies and return the URIs or IDs to the contributing institutions for incorporation into their collection information systems (CISs).
What are some of the LOD resources to consider for linking?
The most common LOD resources today are DBpedia, the Getty vocabularies, GeoNames, and Virtual International Authority File (VIAF).
Once I convert my museum’s data to LOD, can I interconnect it with LOD from other museums?
Yes! The “Linked” part of Linked Open Data is a critical component, but it does not occur automatically. Just as web pages link to one another through links added to their HTML, LOD can link to other LOD resources. For example, you can choose to link your data to related resources as mentioned in the FAQ above.
Will my institution lose its identity or authority over usage of our object information with LOD?
The idea of LOD is to make data widely available so that other institutions, scholars, and the public can connect with it or use it. It is thus important to state clearly your institution’s license conditions, if any. The data published as Linked Data from a museum should use URIs that designate the museum’s identity. Anyone using the data will thereby recognize the publishing institution as its authority.
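A common way to make identifiers carry the museum's identity is to mint them under a domain the museum controls. The sketch below shows one possible pattern. The domain `collection.museum.example` and the `object` and `person` path segments are hypothetical choices, not an AAC requirement.

```python
from urllib.parse import quote

BASE = "https://collection.museum.example"  # hypothetical museum-controlled domain


def mint_uri(kind: str, local_id: str) -> str:
    """Build a stable URI from an entity kind and a local record number."""
    # Percent-encode the local ID so spaces or slashes cannot break the path.
    return f"{BASE}/{kind}/{quote(local_id, safe='')}"


print(mint_uri("object", "1985.12"))  # https://collection.museum.example/object/1985.12
print(mint_uri("person", "Smith/J"))  # https://collection.museum.example/person/Smith%2FJ
```

The URIs stay meaningful only if the museum keeps the domain and paths stable over time. Deriving them from permanent record numbers rather than titles helps.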
Approximately how much time does it take to convert museum data to LOD?
The time involved in converting museum data to LOD will largely depend on the ontology or ontologies you choose for mapping, the tool you select for converting the data, who performs the mapping, their familiarity with your data, and their level of expertise in using mapping tools. The Yale Center for British Art estimates that it took about two years to map some fifty thousand objects (paintings, sculpture, prints, drawings, watercolors, and frames). The time included the intellectual mapping, writing the code for the script that does the transformation, and putting in place the triplestore and various other pieces of the digital infrastructure needed to share the center’s CRM-based RDF dataset.
In the case of the AAC, the mapping was outsourced and corrected at intervals as needed. The AAC grant ran for eighteen months, during which time the participants attended many educational briefings, workshops, and in-person meetings and dedicated time as needed for preparing and extracting their data, proofing data, and performing similar tasks. The average time spent by the AAC participants was 169 hours over the duration of the grant.
What resources will I need to host LOD?
As an entry-level approach, a museum could choose to produce JSON-LD and put it on a web server to be served statically or via another software application that works with that format (the AAC browse demo application uses Elasticsearch). In the case of AAC, the initial recommendation was for each museum either to have its own server that could support a triplestore/SPARQL endpoint or to form a hub and share a SPARQL endpoint (see FAQ, “What are a triplestore and SPARQL endpoint?”).
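The entry-level approach in the paragraph above is to serve JSON-LD files statically. This can be as simple as writing one file per entity into a directory that any web server can publish. The sketch assumes the records are already available as Python dictionaries. The `@context` URL, the `Artwork` type, and the record contents are hypothetical. A temporary directory stands in for the web server's document root.

```python
import json
import tempfile
from pathlib import Path

CONTEXT = "https://context.museum.example/context.jsonld"  # hypothetical

records = [
    {"@id": "https://collection.museum.example/object/42",
     "@type": "Artwork", "name": "Harbor at Dusk"},
    {"@id": "https://collection.museum.example/object/43",
     "@type": "Artwork", "name": "Still Life"},
]


def publish(records, out_dir):
    """Write each record to <out_dir>/object/<id>.jsonld and return the paths."""
    paths = []
    for record in records:
        local_id = record["@id"].rstrip("/").rsplit("/", 1)[-1]
        path = Path(out_dir) / "object" / f"{local_id}.jsonld"
        path.parent.mkdir(parents=True, exist_ok=True)
        doc = {"@context": CONTEXT, **record}
        path.write_text(json.dumps(doc, indent=2), encoding="utf-8")
        paths.append(path)
    return paths


with tempfile.TemporaryDirectory() as site:
    for p in publish(records, site):
        print(p.relative_to(site))
```

Mirroring each URI's path in the file layout lets a plain web server answer requests for an entity's URI with its document, provided it is configured to serve `.jsonld` files as `application/ld+json`.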
During the production phase of AAC, ISI hosted the data using the SPARQL server Apache Jena Fuseki 2+, which is free and open source; for details, see https://jena.apache.org/documentation/fuseki2/index.html. Note that an institution could also use a cloud-computing solution, renting server capacity rather than acquiring its own hardware.
How do options for hosting LOD influence how it is accessed and used?
Once the data is generated, people will want to access it in several ways. One is via a SPARQL endpoint, through which a developer can run Linked Data queries directly against the LOD graph. This option is useful for semantic developers who want to ask specific research questions or build live applications against the data, but a public endpoint can be unreliable and expensive to maintain.
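Querying a SPARQL endpoint is an ordinary HTTP request. Under the W3C SPARQL 1.1 Protocol, a query can be sent as a `query` parameter on a GET request. The `Accept` header then asks for results in the SPARQL JSON results format. The sketch below only builds the request. The endpoint URL is a hypothetical Fuseki-style address, and sending the request would require a running server.

```python
from urllib.parse import parse_qs, urlencode, urlparse
from urllib.request import Request

ENDPOINT = "http://localhost:3030/aac/sparql"  # hypothetical dataset endpoint

QUERY = """
SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10
"""


def sparql_get_request(endpoint: str, query: str) -> Request:
    """Build (but do not send) a SPARQL Protocol GET request."""
    url = f"{endpoint}?{urlencode({'query': query})}"
    return Request(url, headers={"Accept": "application/sparql-results+json"})


req = sparql_get_request(ENDPOINT, QUERY)
# The query survives URL encoding intact:
print(parse_qs(urlparse(req.full_url).query)["query"][0].strip())
# SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10

# To run it against a live endpoint:
#   import json, urllib.request
#   with urllib.request.urlopen(req) as resp:
#       rows = json.load(resp)["results"]["bindings"]
```

Every such request runs a fresh query on the server. This is why open endpoints can become slow or costly under heavy or complex use.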
Another possibility for hosting and accessing the data is via a dump, providing an easy, bulk download for all the generated triples. This would be useful for researchers or developers who want to build an application or run a research project using large amounts of the data. The bulk download is inexpensive and easy to host, but it limits access to developers who can not only write SPARQL but also set up their own triplestores.
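A dump is usually a large, compressed file of serialized triples. The sketch below writes N-Triples lines into a gzip file and then streams them back to count the statements. It assumes the lines are already valid N-Triples and does not parse them, because loading a real dump is normally left to a triplestore's bulk loader. The file name is hypothetical.

```python
import gzip
import tempfile
from pathlib import Path

lines = [
    '<http://example.org/object/42> <http://example.org/vocab/title> "Harbor at Dusk" .',
    '<http://example.org/object/42> <http://example.org/vocab/createdBy> '
    '<http://example.org/person/7> .',
]


def write_dump(path, ntriples_lines):
    # "wt" opens the gzip stream in text mode with the given encoding.
    with gzip.open(path, "wt", encoding="utf-8") as f:
        for line in ntriples_lines:
            f.write(line + "\n")


def count_statements(path):
    # Stream line by line so even very large dumps need little memory;
    # blank lines and "#" comment lines are not statements.
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return sum(1 for line in f if line.strip() and not line.startswith("#"))


with tempfile.TemporaryDirectory() as d:
    dump = Path(d) / "aac-dump.nt.gz"  # hypothetical file name
    write_dump(dump, lines)
    print(count_statements(dump))  # 2
```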
Finally, the data can be presented via an API, so that developers can use a common set of functions that send requests and receive responses via Hypertext Transfer Protocol (HTTP) methods such as GET and POST. The API provides Linked Data documents (typically, one document per entity), which are specific, curated subsets of the data. Creating an API is the easiest way for non-semantic developers to work with the data. But each document is only a subset of data, so if the research question or application you want to pursue is different from the one that the data curator assumed, the result can be disappointing. The data presented for an entity can also leave out some data that might be connected to the entity in the graph; there is no guarantee that all the triples are available in that specific view.
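The caveat that API documents are curated subsets can be illustrated with a small sketch. The hypothetical API view below keeps only the properties its designer chose to expose. A statement that exists in the graph, here a provenance link, therefore never reaches a developer who relies on the API alone.

```python
GRAPH = [
    ("obj/42", "title", "Harbor at Dusk"),
    ("obj/42", "createdBy", "person/7"),
    ("obj/42", "formerlyOwnedBy", "person/19"),  # present in the graph
]

# Hypothetical choice made by the data curator when designing the API view
API_FIELDS = {"title", "createdBy"}


def entity_document(entity, graph, fields):
    """Return the per-entity document that a hypothetical API would serve."""
    doc = {"id": entity}
    for s, p, o in graph:
        if s == entity and p in fields:
            doc.setdefault(p, []).append(o)
    return doc


doc = entity_document("obj/42", GRAPH, API_FIELDS)
print(doc)  # {'id': 'obj/42', 'title': ['Harbor at Dusk'], 'createdBy': ['person/7']}
print("formerlyOwnedBy" in doc)  # False: the provenance triple is not exposed
```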
What did AAC’s browse demo set out to achieve?
The browse application aimed to provide a way for users (including museum staff, scholars, and eventually public art enthusiasts) to engage with the data from multiple institutions and see benefits that can arise only from the way the data is linked.
The AAC partners wanted an application that allows users to find connections between works that surface through the links in the data. The goal is not to search for a specific item of interest. Rather, it is to move from item to item based on the compelling, and sometimes unexpected, relationships between them.
Serendipity, discovery, and rich relationships were desired to illustrate the value of using a Linked Data approach.
The AAC produced a browse demo in part to help the partner institutions see what was possible given the unique links that could be established across the works of the partner institutions. Owing to limitations in the amounts of data that could be provided, the initial demonstration version does not offer all the richness we envision with LOD, but it points the way to future capabilities.
What challenges does LOD present in designing a browse function?
The main challenges that arise when creating an application to browse LOD are the same ones that arise when producing other types of applications. Is the data you want to use available within the source systems, and is it of sufficient quality to be used by computers? Sometimes, computer applications require specific formats that are different from the original, human-readable formats of the data; they need machine-readable dates, for example, or dimensions in which the numbers and units of measure are in separate fields. Another challenge is making sure, when there are multiple values for a data item (such as the title of an artwork), that the computer knows which title is the preferred title, to be used by default.
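The paragraph above gives two examples of cleanup a conversion script can do: splitting dimensions into numbers and units, and choosing a preferred title among several. The sketch below assumes dimensions recorded in a simple "H x W unit" pattern and titles tagged with a hypothetical `preferred` flag. Real museum data will need more patterns and review by subject specialists.

```python
import re

DIM_PATTERN = re.compile(
    r"^\s*(\d+(?:\.\d+)?)\s*[x×]\s*(\d+(?:\.\d+)?)\s*(in|cm)\.?\s*$"
)


def parse_dimensions(text):
    """Split '24 x 36 in.' into machine-readable numbers and a unit."""
    m = DIM_PATTERN.match(text)
    if m is None:
        return None  # leave for a human to review
    height, width, unit = m.groups()
    return {"height": float(height), "width": float(width), "unit": unit}


def preferred_title(titles):
    """Pick the title flagged as preferred, else fall back to the first."""
    for t in titles:
        if t.get("preferred"):
            return t["text"]
    return titles[0]["text"] if titles else None


print(parse_dimensions("24 x 36 in."))     # {'height': 24.0, 'width': 36.0, 'unit': 'in'}
print(parse_dimensions("about two feet"))  # None
print(preferred_title([{"text": "Untitled"},
                       {"text": "Harbor at Dusk", "preferred": True}]))  # Harbor at Dusk
```

Returning `None` for values the pattern does not recognize lets staff review the exceptions instead of guessing.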
Data in source systems carry a lot of museum-specialized assumptions, rules, and norms that need to be checked by subject specialists in the museum. LOD itself, as a data format, requires an understanding of the technical syntax on the part of the people who work with it. The browse application helps museum staff see their data in a familiar, human-readable way.
The technical environment of the triplestore may not be sufficiently robust for use in real time. AAC’s approach was to use the triplestore for complex, specialized analysis and at the same time export the RDF triples through a conversion process that produced JSON-LD. This allows for simpler processing that supports high-performance, scalable day-to-day use.
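The export step described above turns triplestore content into JSON-LD documents that a search index, such as the Elasticsearch instance used by the AAC browse demo, can serve quickly. It can be sketched as grouping triples by subject. This simplified version assumes the triples have already been retrieved, for example through a SPARQL CONSTRUCT query. The predicate URIs and the short keys in its `@context` are hypothetical. It is not AAC's actual conversion process.

```python
import json
from collections import defaultdict

# Hypothetical context mapping short JSON keys to full predicate URIs
CONTEXT = {
    "title": "http://example.org/vocab/title",
    "createdBy": {"@id": "http://example.org/vocab/createdBy", "@type": "@id"},
}
SHORT = {"http://example.org/vocab/title": "title",
         "http://example.org/vocab/createdBy": "createdBy"}

triples = [
    ("http://example.org/object/42", "http://example.org/vocab/title", "Harbor at Dusk"),
    ("http://example.org/object/42", "http://example.org/vocab/createdBy",
     "http://example.org/person/7"),
    ("http://example.org/object/43", "http://example.org/vocab/title", "Still Life"),
]


def to_documents(triples):
    """Group triples by subject into one flat JSON-LD document per entity."""
    grouped = defaultdict(lambda: defaultdict(list))
    for s, p, o in triples:
        grouped[s][SHORT[p]].append(o)
    docs = []
    for subject, props in grouped.items():
        doc = {"@context": CONTEXT, "@id": subject}
        # Single values are unwrapped; repeated values stay as lists.
        doc.update({k: v[0] if len(v) == 1 else v for k, v in props.items()})
        docs.append(doc)
    return docs


for doc in to_documents(triples):
    print(json.dumps(doc))
```

Each document is self-contained, so an index can return it without joining across the graph at query time. That is the source of the speed, and also why such documents are subsets of the full data.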
As new LOD becomes available, how do I learn about and connect to it?
This “discovery” challenge is a recognized issue within the cultural-heritage community, and standards-based solutions are emerging. In the meantime, the LOD for Libraries, Archives, and Museums (LODLAM) community is an excellent source of news about new data sets as they become available.
What do I tell my administration about LOD to convince them to support it?
- The internet is undergoing another revolution, with Linked Data formats increasingly being used as the underlying lingua franca for web
- The internet is changing from an internet of documents to the internet of knowledge!
- Until now, when people have searched the internet, they have been presented with an array of hyperlinks to potentially relevant pages. The researcher must review all the links to determine which are relevant to the search at hand.
- A new way to publish information is called Linked Open Data (LOD), which precisely links and interconnects information so that searches are direct, accurate, and immediate. The links contain expressions about why two things are linked, and the meaning of the relationships between them, stated in ways that computers as well as humans can understand.
- LOD uses a W3C standard called Resource Description Framework (RDF) that, when combined with an ontology, interconnects concepts (people, places, events, and things). The result is that a search connects to the exact concept being sought and avoids the “noise” that sometimes confuses online searching.
- LOD is making headway in the commercial, communications, and publishing worlds. Google, Facebook, the New York Times, US government agencies, the Defense Advanced Research Projects Agency (DARPA), and many other institutions are implementing it. The European Union is building bridges across its libraries, archives, and museums using the digital platform Europeana, the Linked Open Data Initiative of the Europeana Foundation, The Netherlands.
- LOD is here to stay. Its flexible approach to creating meaningful links is the way of the future for data.
- The benefits of LOD for museums are huge. LOD could connect data about one artist, for example, across all museums, so that millions of people researching that artist would discover who has what art by that artist and where. LOD will increase museum visibility. LOD will reveal relationships among works of art because it will make connections among hundreds of related works. By linking concepts such as events, dates, people, and places across all domains, LOD will expose new information about a work of art, and it will thus boost research that will lead to new discoveries. By its nature, LOD is a collaborative platform that museums can use to deepen audience engagement. Like Wikipedia, LOD provides an opportunity for the public to participate and help supply information (note that unlike Wikipedia, LOD provides no editing capability, but it is possible to build an application on top of Linked Data that would allow people to contribute suggested changes).
- In summary, LOD leverages the power of digitization.
What are some of the unique features that can be achieved with LOD over traditional research?
We are all looking for concrete demonstrations of LOD that illustrate its value. As more data becomes available as LOD, it will be possible to demonstrate the benefits of being able to search across several collections, have cross-domain access, create new opportunities that focus on its collaborative structure, and the like. One feature of LOD that has already intrigued art historians is the graphical display of LOD that can point to interconnections of time and place, for example, all the artists associated with a certain café in Paris during a specific period. The capability of LOD to produce graphical networks or connections can lead to new observations and conclusions.[2]
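The café example suggests how LOD can surface networks. The sketch below uses invented data. It treats statements of the form "artist was associated with place during these years" as edges. It then finds pairs of artists whose periods at the same place overlap, the kind of connection a graphical display would draw.

```python
from itertools import combinations

# Hypothetical statements: (artist, place, start year, end year)
ASSOCIATIONS = [
    ("artist/A", "place/cafe-x", 1885, 1890),
    ("artist/B", "place/cafe-x", 1888, 1895),
    ("artist/C", "place/cafe-x", 1900, 1905),
    ("artist/D", "place/studio-y", 1886, 1889),
]


def co_present(associations):
    """Return artist pairs linked by the same place in overlapping years."""
    edges = set()
    for (a1, p1, s1, e1), (a2, p2, s2, e2) in combinations(associations, 2):
        # Two year ranges overlap when each one starts before the other ends.
        if p1 == p2 and a1 != a2 and s1 <= e2 and s2 <= e1:
            edges.add((min(a1, a2), max(a1, a2), p1))
    return edges


print(sorted(co_present(ASSOCIATIONS)))
# [('artist/A', 'artist/B', 'place/cafe-x')]
```

With real data drawn from several museums and linked to shared place and person URIs, the same query could connect artists whose records have never appeared in the same collection.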
What LOD museum projects are under way in the cultural-heritage domain?
In addition to the LOD Initiative of the American Art Collaborative (AAC), some current projects engaged in LOD are the Arachne Project, Cologne; Arches, a collaboration between the Getty Conservation Institute, Los Angeles, and the World Monuments Fund, New York; Art Tracks, Carnegie Museum of Art, Pittsburgh, Pennsylvania; Canadian Heritage Information Network (CHIN) of the Department of Canadian Heritage; CLAROS (CLassical Art Research Online Services), a federation led by the University of Oxford; Finnish National Gallery, Helsinki; Germanisches Nationalmuseum, Nuremberg; the Getty Provenance Index (GPI) and Getty Vocabulary Program, Getty Research Institute, Los Angeles; Pharos, the International Consortium of Photo Archives; ResearchSpace located in the British Museum, London; and Yale Center for British Art, Yale University, New Haven, Connecticut.
How do I join AAC?
The AAC has an ongoing interest in helping the broad museum community engage in LOD. Owing to fixed grant funding, however, we have had to limit the number of institutions involved thus far. The good news is that we are providing guidelines for the approaches, practices, and many of the tools that museums can utilize to produce LOD. We plan to seek additional funding to expand application of LOD beyond the subject of American art and produce tools that will particularly help small museums to explore LOD. As additional museums produce LOD, we hope that those who are implementing it will contact the AAC (via Eleanorfink@earthlink.net) to discuss opportunities to link or interconnect data and further demonstrate the value of LOD.
1. Wikipedia, s.v. “Ontology (information science),” last modified October 9, 2017, https://en.wikipedia.org/wiki/Ontology_(information_science).
2. Paul B. Jaskot, “Digital Art History: Old Problems, New Debates, and Critical Potentials” (keynote address for symposium Art History in Digital Dimensions, University of Maryland, College Park, and Washington, DC, October 19–21, 2016).