Tuesday, 30 November 2010

DITA Session 08 - The Semantic Web - Why Semantics?


The Semantic Web is simple as a concept: make the resources and data on the web accessible and understandable to computers within the context in which they reside. It is inherently complex in practice, because we are asking computers to make sense of the entire body of knowledge held on the web. In the real world this is something our human brains do on a daily basis. We take simple pieces of information and draw conclusions from them. Suppose Tom overhears Suey and me talking in the pub about music: I say how much I am loving the Sex Pistols, and Suey says he wouldn't like to meet Sid Vicious down a dark alley. Tom may infer:
Chris likes Punk Rock Music
Punk Rock Music scares Suey
Tom can do this through our ability to gain knowledge and learn what words mean: he knows that Chris and Suey are people, that ‘likes’ and ‘scares’ are verbs, and that ‘Punk Rock Music’ is a particular type of music (one that is not to everyone’s taste!). A moderately educated person like Tom can understand what these terms mean and how the sentence is constructed. Thus, through logical reasoning, Tom can answer questions like: ‘Who likes Punk Rock Music?’
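To make Tom's reasoning concrete, here is a minimal Python sketch that stores the two inferred statements as (subject, verb, object) tuples and answers the question by simple matching. The names `facts` and `who` are my own, chosen for illustration; this is not how any particular Semantic Web tool does it.

```python
# Each fact is a (subject, verb, object) triple, mirroring the sentence structure.
facts = [
    ("Chris", "likes", "Punk Rock Music"),
    ("Punk Rock Music", "scares", "Suey"),
]

def who(verb, obj, triples):
    """Return every subject that stands in relation `verb` to `obj`."""
    return [s for (s, v, o) in triples if v == verb and o == obj]

# 'Who likes Punk Rock Music?'
print(who("likes", "Punk Rock Music", facts))
```

The point is that once a statement is broken into subject, verb and object, answering a question becomes a lookup rather than a guess.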
If we really don’t understand the meaning of a word we have tools to help us. We can look the word up in a dictionary and understand its definition because the new word is described using other terms we already understand. Having supplemented our knowledge, we can go on to interpret further meanings in the future. Meaning is derived through understanding a sequence of symbols; the example above, for instance, follows the English grammatical structure “subject-verb-object.”
We look for meaning in the structure and placement of words in sentences, which in turn give us context. Words often have several meanings, and thus several definitions, dependent on their context. If we follow the hyperlinks in an online dictionary from the word ‘semantic’, e.g. to ‘relating’, we will be led to another definition. We could go on and on, and so we can say a dictionary is ‘an ontology’ of language; in other words it is ‘self-referencing’. Source: http://dictionary.reference.com/ (definition 1 below)
‘Semantic’ adj 1. of or relating to meaning or arising from distinctions between the meanings of different words or symbols
‘relating’–verb (used with object)
1. to tell; give an account of (an event, circumstance, etc.).
2. to bring into or establish association, connection, or relation: to relate events to probable causes.
–verb (used without object)
3. to have reference (often fol. by to ).
4. to have some relation (often fol. by to ).
5. to establish a social or sympathetic relationship with a person or thing: two sisters unable to relate to each other.
If we apply the same principle to data and documents stored on the web, it stands to reason that we would want to establish connections between data to give it meaning. The creator and the consumer of that data could agree on the meaning by reference to the XML schema in place, but would we want to do that for every piece of data on the web? In conversation, it would be like having to set the rules each time we met someone new. Moreover, computers cannot gain knowledge about real-world objects in the same way the human brain does... or can they?
Establishing objects and their relationships to subjects is, we could say, a one-to-one relationship. The W3C (World Wide Web Consortium) has been working on the technologies and standards needed to semantically describe ‘objects’, ‘subjects’ and the relationships between them, for data stored on web pages or in databases.
The Semantic Web provides a common framework that allows data to be shared and reused across application, enterprise, and community boundaries. It is a collaborative effort led by W3C with participation from a large number of researchers and industrial partners. It is based on the Resource Description Framework (RDF). See also the separate FAQ for further information.
The Semantic Web is about two things. It is about common formats for integration and combination of data drawn from diverse sources, where on the original Web mainly concentrated on the interchange of documents. It is also about language for recording how the data relates to real world objects. That allows a person, or a machine, to start off in one database, and then move through an unending set of databases which are connected not by wires but by being about the same thing.
Quoted from: http://www.w3.org/2001/sw/ Accessed 28-11-2010
We may type the word ‘apple’ into a search engine: do we want a picture of an apple, or the computer manufacturing giant Apple (Granny Smith or MacBook Pro)? Type ‘Orange’ into Google and we may get a picture of a juicy orange-coloured fruit or the home page of the phone company Orange (Seville, Mandarin or a 12-month Care Protection Plan for an iPhone 4). Matching text strings has been the traditional mechanism for retrieving related material; taken out of context, however, these words have different meanings depending on how they are used.
For many years now we have been hearing that the semantic web is just around the corner. In 2008 Tim Berners-Lee declared the semantic web "open for business" (Paul Miller, 2008). The reality for most libraries, however, is that we are still grappling with 2.0 technologies. Few among us have yet embraced web 3.0, also known as the web of linked data, or the semantic web. The promise of the semantic web is a dazzling one. By marking up information in standardized, highly structured formats like Resource Description Framework (RDF), we can allow computers to better "understand" the meaning of content, rather than simply matching on strings of text. This would allow web search engines to function more like relational databases, providing much more accurate search results - the ability to distinguish between a book that is written about a person, as opposed to a book that is written by a person, for example. For most librarians this concept is fairly easy to understand. We have been creating highly structured machine-readable metadata for many years, after all, and we already understand the benefits.
The second part of the linked data vision is where things really begin to get heady. By linking our data to shared ontologies that describe the properties and relationships of objects, we begin to allow computers not just to "understand" content, but also to derive new knowledge by "reasoning" about that content. As a simple example: Shakespeare wrote Macbeth. "Wrote" is the inverse of "WrittenBy" therefore Macbeth was written by Shakespeare. The real power of the semantic web lies in this ability for "intelligent" search engines to disambiguate terms (Apple the computer vs. apple the fruit, for example), to understand the relationships between different entities, and to bring that information together in new ways to answer queries. E.g., Show me all of the articles that have been written by people who have ever worked at any of the same institutions as Lisa Goddard.
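The Shakespeare example can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the property names `wrote` and `writtenBy`, the `inverse_of` table and the `infer_inverses` function are all made up for the sketch, standing in for what a shared ontology and a reasoner would provide.

```python
# Known fact, stored as a (subject, property, object) triple.
triples = {("Shakespeare", "wrote", "Macbeth")}

# A tiny stand-in for an ontology: it records which property is the inverse of which.
inverse_of = {"wrote": "writtenBy"}

def infer_inverses(triples, inverse_of):
    """Return the original triples plus every triple implied by an inverse property."""
    inferred = set(triples)
    for s, p, o in triples:
        if p in inverse_of:
            # If (s, p, o) holds and q is the inverse of p, then (o, q, s) holds too.
            inferred.add((o, inverse_of[p], s))
    return inferred

all_facts = infer_inverses(triples, inverse_of)
print(("Macbeth", "writtenBy", "Shakespeare") in all_facts)
```

Nobody ever stated that Macbeth was written by Shakespeare; the program derived it from one fact and one rule about the relationship.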
Introducing Appropriate Technologies to Enable The Semantic Web
XML has developed as a markup language for defining elements of data and sharing data between applications by giving each element a user-defined tag. In a music database, for example, XML tags might label elements such as:
<name>Michael Jackson</name>
<TrackName>Bad</TrackName>
<Year>1987</Year>
However, in the global sense of the web, machines do not necessarily know that <name> refers to the name of a music artist; in another database it could be used as <name>Colorado</name> to define a place name.
<TrackName>, defined using an XML tag, could also be ambiguous: is it the name of a racing track in motorsport? How would a computer know?
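A short Python sketch shows the problem from the machine's point of view. It uses the standard library's xml.etree.ElementTree parser; the wrapping <record> element is hypothetical, added only so that each snippet is a complete document.

```python
import xml.etree.ElementTree as ET

# Two snippets from two imagined databases; <record> is a made-up wrapper element.
music = ET.fromstring("<record><name>Michael Jackson</name></record>")
place = ET.fromstring("<record><name>Colorado</name></record>")

for doc in (music, place):
    element = doc.find("name")
    # The parser reports the same tag both times; nothing says "artist" or "place".
    print(element.tag, "->", element.text)
```

Both values come back under the identical tag `name`, so a program reading only the markup has no way to tell a singer from a US state.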
Thus the tags themselves need metadata attached to define what each piece of data specifically means in that particular context. Machines need not only to read the correct pieces of data, but to understand them. Removing any ambiguity about the meaning of a piece of information, i.e. not having to guess, is something we have already done in spoken and written language.
XML is now an established tool for passing machine-readable data between web applications. XML schemas define the structure of an XML document, while XML parsers read the data so that it can be displayed in other websites (RSS feeds, for instance the BBC weather RSS feed, carry information written in XML).
The subject now gets rather involved, and in the next of 3 posts I will attempt to summarise the main technologies of the Semantic Web. These underlying technologies, being developed under the direction of the W3C, are tools that sit on top of XML in a 'stack' and allow data on the web to be semantically described:
  • RDF - Resource Description Framework
  • URIref - Uniform Resource Identifier reference
  • RDFS - RDF Schema
  • OWL - Web Ontology Language
