By Michael Cataldo, CEO, Cambridge Semantics

A revolution is under way on the Internet. It has been labeled Web 3.0 and is also known as “the Semantic Web.” As with the original Internet revolution, this technology has significant enterprise implications, and innovators in industries from banking to bio-pharma are starting to find ways to use it to their advantage. Semantic technology is an entirely new method for accessing, combining, using and sharing data across disparate information sources. The method is embodied in a set of standards developed by the W3C, which have recently matured to the point where they can be put into practice. Sir Tim Berners-Lee, the inventor of the Web, has identified semantics as the key technology for the next generation of the Internet.

How It Works
The semantic method is based on connecting data stored in source systems (with varied structures and terminologies) to concepts and contextual information that describe that data, regardless of where it lives. Each piece of data is simply an instance of a concept. A set of concepts describing a certain category of data is known as an ontology. For example, a sub-prime mortgage is a loan that can be described with properties such as the loan amount, interest rate and borrower's income. “Sub-prime mortgage” then becomes a concept within an ontology that describes loans. Users or machines can find loans with these properties simply by asking for “sub-prime mortgages,” even though none of the loans is known by that name in any of the source systems.
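To make the idea of a concept concrete, here is a toy sketch in Python, chosen only for illustration. It assumes loan records described by the three properties mentioned above and models the concept as a test over those properties. The record IDs and the thresholds are hypothetical and are not a regulatory definition of sub-prime lending; in a real semantic system, concept membership would be expressed in an ontology language rather than in application code.

```python
# Toy illustration of concepts vs. instances. Record IDs, property names and
# the thresholds below are hypothetical, not a regulatory definition.

# Instance data as it might arrive from a source system: no record is
# labeled "sub-prime mortgage" anywhere.
loans = {
    "LN-1001": {"loan_amount": 450_000, "interest_rate": 0.095, "borrower_income": 60_000},
    "LN-1002": {"loan_amount": 200_000, "interest_rate": 0.045, "borrower_income": 120_000},
    "LN-1003": {"loan_amount": 300_000, "interest_rate": 0.110, "borrower_income": 45_000},
}

# The "ontology": each concept is a named description over loan properties.
ontology = {
    "sub-prime mortgage": lambda p: p["interest_rate"] > 0.08
    and p["loan_amount"] / p["borrower_income"] > 4,
}

def find(concept: str) -> list[str]:
    """Return the IDs of every loan whose properties satisfy the named concept."""
    matches = ontology[concept]
    return sorted(loan_id for loan_id, props in loans.items() if matches(props))

print(find("sub-prime mortgage"))  # ['LN-1001', 'LN-1003']
```

The caller asks for the concept by name and never needs to know how, or where, the individual loans are stored.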

The real power of semantics lies in the fact that once this concept layer is created, data can be discovered and combined regardless of the data structure of the source system. This enables real-time integration at the moment the user or machine demands it. Imagine the implications of a technology that lets users decide what data they need to combine and simply makes it happen, without their even having to consult IT.
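The following sketch, again purely illustrative, shows two hypothetical source systems that store loans with different field names and layouts. Each is mapped onto the same ontology properties (`loanAmount` and `interestRate`, names assumed for this example), and the combined view is assembled only when it is requested.

```python
# Two hypothetical source systems holding the same kind of data in different shapes.
system_a = [{"acct": "A-17", "amt": 250_000, "rate_pct": 6.5}]  # dicts; rate as a percentage
system_b = [("B-42", 0.072, 180_000)]                           # tuples: (id, rate, principal)

# One small mapping per source onto shared ontology properties.
def from_a(row):
    yield (row["acct"], "loanAmount", row["amt"])
    yield (row["acct"], "interestRate", row["rate_pct"] / 100)  # normalize to a fraction

def from_b(row):
    loan_id, rate, principal = row
    yield (loan_id, "loanAmount", principal)
    yield (loan_id, "interestRate", rate)

def combined_view():
    """Assemble the merged triples on demand instead of in a prebuilt warehouse."""
    triples = set()
    for row in system_a:
        triples.update(from_a(row))
    for row in system_b:
        triples.update(from_b(row))
    return triples

# The query names only ontology properties, never a source's own field names.
rates = {s: o for s, p, o in combined_view() if p == "interestRate"}
print(sorted(rates.items()))
```

Neither source had to be restructured; each needed only a small mapping onto the shared concepts.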

A Flexible, Self-Describing Data Model
Adding new data is as easy as combining existing data. The semantic method for representing data, the Resource Description Framework (RDF), is schema-less: it can accommodate new data as soon as that data is described to the system. To add a new type of data, one simply adds a concept to the ontology and links the data to it; the data is immediately part of the model and discoverable by anyone who needs it. Not only can users combine data at will, they can also add data as it becomes available, all without waiting for IT to develop a data warehouse or an application specific to that problem.
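RDF represents data as subject-predicate-object statements, or triples. The sketch below uses plain Python tuples rather than an RDF library to show why no schema change is needed: a new concept and a new property are simply more triples. The `rdf:type` and `rdfs:subClassOf` names come from the RDF and RDFS vocabularies; everything under the `ex:` and `loan:` prefixes is hypothetical.

```python
from typing import Optional

# Each statement is a (subject, predicate, object) triple.
Triple = tuple[str, str, object]

# Existing data: one mortgage with an interest rate.
store: set[Triple] = {
    ("ex:Mortgage", "rdf:type", "rdfs:Class"),
    ("loan:1", "rdf:type", "ex:Mortgage"),
    ("loan:1", "ex:interestRate", 0.05),
}

def match(s: Optional[str] = None, p: Optional[str] = None, o: object = None) -> list[Triple]:
    """Return stored triples matching the pattern; None acts as a wildcard."""
    hits = [
        t for t in store
        if (s is None or t[0] == s) and (p is None or t[1] == p) and (o is None or t[2] == o)
    ]
    return sorted(hits, key=repr)  # repr keys avoid comparing mixed value types

# A new kind of data arrives: describe the concept and link the data.
# No table is altered and no existing triple changes.
store.add(("ex:GreenMortgage", "rdfs:subClassOf", "ex:Mortgage"))
store.add(("loan:2", "rdf:type", "ex:GreenMortgage"))
store.add(("loan:2", "ex:energyRating", "A"))

print(match(p="ex:energyRating"))  # [('loan:2', 'ex:energyRating', 'A')]
```

The new property is queryable the moment it is added, and the original `loan:1` statements are untouched.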

Available Technology & Products
The semantic standards provide a framework for a new paradigm, but that is where they stop. The underlying technology required to make it work is extremely complex and, as with any new technology paradigm, tools for developing semantic solutions have been limited to proof-of-concept tools or hand-coded solutions that took months or years and millions of dollars to develop. In many ways, this defeated the purpose of having these advanced technologies in the first place. More advanced toolkits are now becoming available that enable anyone with the technical skills of an Excel power user to rapidly develop and deploy enterprise-scale solutions. Innovators and early adopters are benefiting from these tools today. At the rate they are evolving, you can expect to see semantic technologies deployed more broadly in the very near future.

Tapping Technology's Potential
Two years ago, researchers at a major U.S. drug maker saw the potential for semantic technologies to streamline the collection of assay results from independent research partners across the globe. Each partner ran assays and emailed the results as Excel spreadsheets. The problem was that every spreadsheet used a different format, and all of the results needed to be uploaded into a legacy research database that had its own data model. To accomplish this, the IT organization created data entry templates to receive the assay results and upload them to the research database. As the results arrived by email, highly compensated scientists would cut and paste them into the corresponding templates. Scientists had to do this work to make sure the incoming data was understood and entered properly. The process was time-consuming, error-prone and expensive. Given the variation in data models and terminology across thousands of results spreadsheets and the target research database, this was a perfect application for semantics.

The solution involved creating an ontology that embodied the concepts contained in the research database. From there, the Anzo for Excel tool from Cambridge Semantics was used to create templates that could overlay the assay result spreadsheets. These semantically linked templates automatically associated ranges in the spreadsheets with the results ontology. The result (no pun intended) was that assay data flowed into the linked spreadsheets and from there automatically into the legacy research database, without human intervention. Semantic technologies provided a concept layer that sat between all the data sources and destinations, bridging variations in data structure and, more importantly, giving the data meaning that stayed with it as it traveled from source to destination. This removed the need for scientists to interpret the data and allowed them to do what they were actually paid to do: research.
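The idea behind the linked templates can be approximated as a mapping from each partner's column headers to properties in the results ontology. The sketch below is a simplification: it maps columns by header name rather than by spreadsheet range, and it is not how Anzo for Excel is implemented. The partner names, headers and target column order are all hypothetical.

```python
# Hypothetical per-partner "templates": each maps a partner's own column
# headers onto properties in a shared results ontology.
TEMPLATES = {
    "partner_x": {"Compound": "compoundId", "IC50 (nM)": "ic50_nM", "Assay": "assayType"},
    "partner_y": {"cmpd_id": "compoundId", "ic50_nanomolar": "ic50_nM", "assay_name": "assayType"},
}

# Hypothetical column order of the legacy database's results table,
# expressed in terms of the same ontology properties.
TARGET_COLUMNS = ["compoundId", "assayType", "ic50_nM"]

def to_target_rows(partner: str, header: list[str], rows: list[list]) -> list[tuple]:
    """Translate one partner's spreadsheet rows into rows for the target table."""
    mapping = TEMPLATES[partner]
    # Where each ontology property sits in this partner's sheet.
    index = {mapping[col]: i for i, col in enumerate(header) if col in mapping}
    missing = [p for p in TARGET_COLUMNS if p not in index]
    if missing:
        raise ValueError(f"{partner} sheet lacks columns for {missing}")
    return [tuple(row[index[p]] for p in TARGET_COLUMNS) for row in rows]

x = to_target_rows("partner_x", ["Assay", "Compound", "IC50 (nM)"], [["kinase", "CMP-1", 12.5]])
y = to_target_rows("partner_y", ["cmpd_id", "ic50_nanomolar", "assay_name"], [["CMP-2", 40.0, "kinase"]])
print(x + y)  # [('CMP-1', 'kinase', 12.5), ('CMP-2', 'kinase', 40.0)]
```

Both partners' sheets, with different headers and column orders, produce rows of the same shape, and a sheet missing a mapped column is rejected rather than silently loaded.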

The same company has now applied the same technology to assist scouts in the field as they look to license technologies and develop partnerships. The program aims to link data sources ranging from Excel spreadsheets to relational databases to provide real-time, role-specific views for scouts as they do their jobs. Scouts often need to make decisions in real time that require combinations of data that cannot always be anticipated in advance. Moreover, they collect new data on the fly that must be easily absorbed into the corporate knowledge base.

Before semantic technologies, this knowledge was transferred through disjointed processes carried out by phone and email. Valuable knowledge was lost, either because it was never captured or because it left the company along with departing scouts. With semantic technologies, a fabric exists that lets scouts easily create views combining the data they need, when they need it. Because semantic data can be self-describing, the system can capture how scouts use the data and make that knowledge available to others down the line. Finally, the system allows new data to be added and stored as it becomes available, so nothing is lost to disconnected emails, phone conversations or attrition.

Semantic technologies offer the potential to rise above the traditional barriers to collaboration around data and to answer critical questions quickly. As semantic technologies gain acceptance, standard ontologies will evolve, making it even easier to link data not only within an enterprise but across enterprises.

Tip of the Iceberg
A semantic implementation effectively creates a fabric to which all enterprise data can be connected. One benefit is the ability to discover provenance and history. As data is added or changed, it is possible to see who changed it and how, and even to recall the original version, potentially a boon for everything from research to cost containment. Imagine being able to see any relevant research, where and when it was done, by whom, and the associated results. From a research perspective, provenance and history are particularly valuable for validating data and understanding it better.
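One simple way to keep provenance and history is never to overwrite a value, and instead to append each change with its author, time and reason. The sketch below is a plain-Python stand-in for that idea, not any particular semantic platform's mechanism; the class, the authors and the sample values are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Optional

@dataclass(frozen=True)
class Revision:
    """One recorded value of a fact, plus who set it, when, and why."""
    value: Any
    author: str
    timestamp: datetime
    note: str

class VersionedFact:
    """Append-only history: values are never overwritten, only superseded."""

    def __init__(self) -> None:
        self._revisions: list[Revision] = []

    def set(self, value: Any, author: str, note: str = "", when: Optional[datetime] = None) -> None:
        stamp = when if when is not None else datetime.now(timezone.utc)
        self._revisions.append(Revision(value, author, stamp, note))

    @property
    def current(self) -> Any:
        return self._revisions[-1].value

    @property
    def original(self) -> Any:
        return self._revisions[0].value

    def history(self) -> list[Revision]:
        return list(self._revisions)

# Hypothetical example: an assay result that was later corrected.
ic50 = VersionedFact()
ic50.set(12.5, "alice", "initial upload", datetime(2024, 1, 5, tzinfo=timezone.utc))
ic50.set(11.9, "bob", "re-run after calibration fix", datetime(2024, 2, 1, tzinfo=timezone.utc))

print(ic50.current, ic50.original)  # 11.9 12.5
for rev in ic50.history():
    print(rev.timestamp.date(), rev.author, rev.value, rev.note)
```

Because nothing is deleted, "who changed it, how, and what was there before" is just a walk over the history.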

Because of the way semantic data is organized, it also enables rule engines that infer additional data from existing facts, which can help advance research and head off problems before they occur.
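As an illustration of rule-based inference, the sketch below repeatedly applies two rules modeled on the RDFS entailment rules (subclass transitivity and type inheritance) until no new facts appear. It is a naive forward-chaining loop, not a production reasoner, and the class and compound names are hypothetical.

```python
def infer(triples):
    """Forward-chain two RDFS-style rules until no new triples appear.

    Rule 1 (subclass transitivity): A subClassOf B, B subClassOf C => A subClassOf C
    Rule 2 (type inheritance):      x type A, A subClassOf B       => x type B
    """
    facts = set(triples)
    while True:
        sub = {(s, o) for s, p, o in facts if p == "rdfs:subClassOf"}
        new = {(a, "rdfs:subClassOf", c) for a, b in sub for b2, c in sub if b == b2}
        new |= {(x, "rdf:type", b) for x, p, a in facts if p == "rdf:type"
                for a2, b in sub if a == a2}
        if new <= facts:  # fixpoint reached: nothing left to derive
            return facts
        facts |= new

facts = infer({
    ("ex:KinaseInhibitor", "rdfs:subClassOf", "ex:Inhibitor"),
    ("ex:Inhibitor", "rdfs:subClassOf", "ex:Compound"),
    ("cmp:42", "rdf:type", "ex:KinaseInhibitor"),
})
types = sorted(o for s, p, o in facts if s == "cmp:42" and p == "rdf:type")
print(types)  # ['ex:Compound', 'ex:Inhibitor', 'ex:KinaseInhibitor']
```

No one asserted that `cmp:42` is a compound; the engine derived it from the ontology, which is the kind of additional data a rule engine can supply.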

For the complex world of bio-pharma research and discovery, semantic technologies offer the promise of solutions to numerous problems, and provide a level of visibility and access to data that can expose and mitigate risk in ways that just aren't practical with traditional database technologies.

The future for this technology is bright, and the possibilities are virtually endless. It is impossible to say where the power of semantic technology will eventually be leveraged. But there is no question that it will be, and the impact is sure to be great.

Mr. Cataldo is CEO of Cambridge Semantics, a Boston-based provider of semantic technology applications for the enterprise.