The origins of this misconception are fairly easy to understand. A major focus of Semantic Web technologies is the attempt to make it possible to integrate heterogeneous data across many sources. Furthermore, information in the Semantic Web is identified by means of a URI. In addition, SPARQL—the query language of the Semantic Web—lets developers pick and choose what sources of information should be searched for the answers to a query. Therefore, it is somewhat natural to conclude that a fundamental characteristic of Semantic Web applications is that they access data via federated (or distributed) queries.
In reality, the choice of data technology (i.e., Semantic Web vs. relational vs. something else) and the choice of integration paradigm (i.e., federation/EII vs. warehouse/ETL vs. something in between) are independent. People can (and do) perform federated data access using relational technology. Likewise, people can (and do) build ETL pipelines that populate Semantic Web warehouses.
Generally speaking, a warehouse/ETL approach provides better interactive query performance, eliminates runtime complexity, and guarantees consistency between information from different data sources. A federated query approach, on the other hand, avoids copying any data prematurely and can preserve source data security contexts. In both cases, choosing a Semantic Web data model gives additional flexibility that simplifies the process of extending and refining the integrated data model.
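To make the independence of data model and integration paradigm concrete, here is a minimal sketch that answers the same question both ways over the same triple-shaped data. Everything in it is an assumption made for illustration: the two sources, the `ex:` identifiers, and the helper names `match`, `build_warehouse`, and `federated_match` are hypothetical, and in-memory Python sets stand in for real data stores.

```python
from typing import Iterable, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]

# Two hypothetical sources holding a few triples each (illustrative data only).
CRM_SOURCE: Set[Triple] = {
    ("ex:alice", "ex:worksFor", "ex:acme"),
    ("ex:bob", "ex:worksFor", "ex:globex"),
}
HR_SOURCE: Set[Triple] = {
    ("ex:alice", "ex:email", "alice@example.org"),
}


def match(
    triples: Iterable[Triple],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> List[Triple]:
    """Return the sorted triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


def build_warehouse(sources: Iterable[Set[Triple]]) -> Set[Triple]:
    """Warehouse/ETL style: copy every source into one store ahead of time."""
    warehouse: Set[Triple] = set()
    for source in sources:
        warehouse |= source
    return warehouse


def federated_match(
    sources: Iterable[Set[Triple]],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> List[Triple]:
    """Federation style: ask each source at query time and merge the answers."""
    results: Set[Triple] = set()
    for source in sources:
        results.update(match(source, s, p, o))
    return sorted(results)


sources = [CRM_SOURCE, HR_SOURCE]
warehouse = build_warehouse(sources)                        # data copied once, up front
warehouse_answer = match(warehouse, s="ex:alice")           # query the single copy
federated_answer = federated_match(sources, s="ex:alice")   # query the sources live
```

Both paths return the same triples about `ex:alice`. What differs is when the data is moved, which is the trade-off described above, not which data model is used.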
“Semantic Web solutions replace existing systems.”
The Semantic Web technology stack is designed to be non-disruptive. These technologies provide the flexibility and expressiveness required to integrate a variety of data from a number of different sources; they are not designed to replace existing transactional databases, CRM systems, or XML Web Services. Instead, Semantic Web solutions take an overlay approach that virtualizes information from existing (non-semantic) source systems, imports that information into the Semantic Web data model, and then links together information between various connected systems.
To this end, the Semantic Web technology stack includes standards explicitly developed to help map data in legacy systems to RDF:
- R2RML is a mapping language that allows you to specify how to map data from a relational database schema to RDF.
- GRDDL is a standard for associating XML documents with transformations that can be automatically run to convert XML into RDF.
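The idea behind such mappings can be sketched in a few lines of Python. This is not R2RML itself (a real R2RML mapping is written as an RDF document and run by an R2RML processor); it is a hand-rolled illustration of the kind of row-to-triple translation such a mapping describes. The `employee` table, the `http://example.org/` namespace, and the function name `rows_to_ntriples` are all assumptions made for the example.

```python
import sqlite3
from typing import List

BASE = "http://example.org/"  # hypothetical namespace, chosen for illustration
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def rows_to_ntriples(conn: sqlite3.Connection) -> List[str]:
    """Turn each row of the (hypothetical) employee table into N-Triples lines."""
    lines: List[str] = []
    for emp_id, name in conn.execute("SELECT id, name FROM employee ORDER BY id"):
        # The primary key becomes part of the subject URI.
        subject = f"<{BASE}employee/{emp_id}>"
        # The table name becomes a class.
        lines.append(f"{subject} <{RDF_TYPE}> <{BASE}Employee> .")
        # Minimal literal escaping (backslash and double quote only);
        # a complete serializer would also escape newlines and similar characters.
        escaped = name.replace("\\", "\\\\").replace('"', '\\"')
        # The column name becomes a property, the cell value a literal.
        lines.append(f'{subject} <{BASE}name> "{escaped}" .')
    return lines


# A stand-in for an existing relational system: an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO employee (id, name) VALUES (?, ?)",
    [(1, "Alice"), (2, "Bob")],
)

ntriples = rows_to_ntriples(conn)
for line in ntriples:
    print(line)
```

The source database is only read, never modified, which is the overlay approach in miniature: the relational system keeps running as before while its data also becomes available as RDF.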
“Semantic Web technologies are all about understanding natural language.”
Just as some people mistakenly equate Semantic Web technologies with artificial intelligence, others expect that Semantic Web technologies are all about using text analytics to understand natural language. There can be good reasons to choose Semantic Web technologies as a vehicle for implementing NLP solutions, but the Semantic Web itself does not deal with unstructured content; instead, it is about representing not only structured data and links but also the meaning of the underlying concepts and relationships. More about the relationship between the Semantic Web and natural language can be found in these two Semantic University lessons:
- Semantic Web vs. Semantic Technologies
- Semantic Web and NLP
Note: These two articles will be published soon. Stay tuned!
We hope this clears up many of the common misconceptions surrounding Semantic Web technologies.