Tuesday, 30 November 2010

The Semantic Web Technologies - Resource Description Framework (RDF)


While XML allows data and resources stored on the web to be tagged in a machine-readable format, the Resource Description Framework (RDF) sits on top as the technology that gives those tags semantic meaning. W3C has proposed RDF as the tool for giving web-based resources, and the data contained within them, the meaning they currently lack. In other words, RDF has been developed to standardise the syntax of the metadata applied to web resources and data (Chowdhury 2007 p.201).
W3C drives the standards for RDF in order to form a common approach to identifying web resources, and it is developing a set of common tools to allow any web content to use the technology. The group also ensures compatibility with current technology (HTML, XHTML, web browsers etc.) and aims to be a resource that the web community can use freely to ensure conformance with the RDF specifications.
The mission of the RDFa Working Group, part of the Semantic Web Activity, is to support the developing use of RDFa for embedding structured data in Web documents in general. The Working Group will publish W3C Recommendations to extend and enhance the currently published RDFa 1.0 documents, including an API. The Working Group will also support the HTML Working Group in its work on incorporating RDFa in HTML5 and XHTML5.
RDF is based on the idea of identifying things using Web identifiers (called Uniform Resource Identifiers, or URIs), and describing resources in terms of simple properties and property values. This enables RDF to represent simple statements about resources as a graph of nodes and arcs representing the resources, and their properties and values.
G. G. Chowdhury describes RDF in simple terms as “the data model for writing simple statements about web resources” (Chowdhury 2007 p.201).
Uniform Resource Identifiers – URIs
Similar to a URL, the Uniform Resource Locator used to uniquely identify web pages, a URI is a unique label for the thing being described. Using URIs ensures there will be no duplication of descriptions. However, unlike a URL, which only refers to a location, a URI can identify a resource or piece of information on the web, or in fact anything in the real world, regardless of location. The specific name for the URIs used in RDF is URIref, and they take a form such as http://en.wikipedia.org/wiki/URI#Examples_of_URI_references, where:
"http" specifies the 'scheme' name,
"en.wikipedia.org" is the 'authority',
"/wiki/URI" is the 'path' pointing to the Wikipedia article on URIs, and
"#Examples_of_URI_references" is a 'fragment' pointing to a section within that article.
RDF Triples
An RDF statement is built up from URIrefs and takes the form of a ‘triple’ consisting of a subject, a predicate and an object.
The Subject: the resource being described, e.g. a resource on the internet about music:
             http://christopherdbrook.com/blog/  (my blog)
The Object: the value of the property, e.g. Chris Brook, as I am the author of the blog. An object can also be expressed as a URL, such as the JPEG image of me that is displayed on my blog to show visitors my ugly mug:
      http://christopherdbrook.com/blog/images/CB.jpeg
The Predicate: the property that links the subject to the object, such as creator. In other words we could say:
http://christopherdbrook.com/blog/  (my blog) has a property called creator which has a value = Chris Brook
http://christopherdbrook.com/blog/  (my blog) has a property called language which has a value = en (the standard abbreviation for English)
http://christopherdbrook.com/blog/  (my blog) has a property called creation date which has a value = 20-10-2010
As the concept of the Semantic Web is a Web where machines can read and understand the meaning of all resources contained therein, we have to use the syntax that the W3C has chosen for RDF, in which statements are arranged into ‘triples’:
<http://christopherdbrook.com/blog/>
<http://purl.org/dc/elements/1.1/creator>
<http://christopherdbrook.com/blog/images/CB.jpeg>
 
<http://christopherdbrook.com/blog/>
<http://christopherdbrook.com/blog/creation-date> "20-10-2010"
 
<http://christopherdbrook.com/blog/>
<http://purl.org/dc/elements/1.1/language> "en"
 
The http://purl.org/dc/elements/1.1/ URI is a ‘namespace’ which comes from the ‘Dublin Core Metadata Initiative’ (explained below). Appending a term name such as creator or language to the namespace gives a unique identifier for that term.
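The namespace idea is really just string concatenation, which a few lines of Python can show. This is only a sketch: the "dc" prefix and the expand helper are names I have made up for illustration, although "dc" is a common shorthand for this namespace.

```python
# Dublin Core element set namespace, as used in the triples above.
DC = "http://purl.org/dc/elements/1.1/"

# Hypothetical prefix table for this sketch.
PREFIXES = {"dc": DC}

def expand(qname: str) -> str:
    """Turn a prefixed name such as 'dc:creator' into a full URIref."""
    prefix, _, local = qname.partition(":")
    return PREFIXES[prefix] + local

print(expand("dc:creator"))
print(expand("dc:language"))
```

Because everyone who uses the same namespace gets the same full URIref for creator, two unrelated websites can be sure they mean the same property.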
An RDF graph is a graphical representation of these RDF statements. The example below depicts three RDF statements about my music blog website:
 
[Figure: RDF graph of the three statements about the blog]
 
This illustrates that objects in RDF statements may be either URIrefs or constant values (called literals) represented by character strings, in order to represent certain kinds of property values. (In the case of the predicate http://purl.org/dc/elements/1.1/language the literal is an international standard two-letter code for English.)
 
Literals may not be used as subjects or predicates in RDF statements. In drawing RDF graphs, nodes that are URIrefs are shown as ellipses, while nodes that are literals are shown as boxes. (The simple character string literals used in these examples are called plain literals, to distinguish them from the typed literals introduced in Section 2.4 of the RDF Primer. The various kinds of literals that can be used in RDF statements are defined in the W3C RDF Concepts document. Both plain and typed literals can contain Unicode characters, allowing information from many languages to be directly represented.)
 
Paraphrased from http://www.w3.org/TR/rdf-primer/ Accessed 30-11-2010
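The rule about where literals may appear can be sketched in Python. The URIRef and Literal classes and the triple helper below are my own hypothetical names, not part of any RDF library, and the sketch ignores blank nodes and typed literals altogether.

```python
from dataclasses import dataclass
from typing import Tuple, Union

@dataclass(frozen=True)
class URIRef:
    value: str
    def nt(self) -> str:
        return f"<{self.value}>"

@dataclass(frozen=True)
class Literal:
    value: str
    def nt(self) -> str:
        # No escaping of special characters; enough for these simple strings.
        return f'"{self.value}"'

Node = Union[URIRef, Literal]

def triple(s: Node, p: Node, o: Node) -> Tuple[Node, Node, Node]:
    """Build a statement, rejecting literals as subject or predicate."""
    if not isinstance(s, URIRef) or not isinstance(p, URIRef):
        raise TypeError("subject and predicate must be URIrefs, not literals")
    return (s, p, o)

BLOG = URIRef("http://christopherdbrook.com/blog/")
DC = "http://purl.org/dc/elements/1.1/"

graph = [
    triple(BLOG, URIRef(DC + "creator"),
           URIRef("http://christopherdbrook.com/blog/images/CB.jpeg")),
    triple(BLOG, URIRef(DC + "language"), Literal("en")),
]

for s, p, o in graph:
    # Mirror the drawing convention: URIrefs are ellipses, literals are boxes.
    shape = "ellipse" if isinstance(o, URIRef) else "box"
    print(s.nt(), p.nt(), o.nt(), ".", f"# object drawn as {shape}")
```

Trying triple(Literal("en"), URIRef(DC + "language"), BLOG) raises a TypeError, which is the code equivalent of "literals may not be used as subjects".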
Dublin Core Metadata Initiative
An organisation called the Dublin Core Metadata Initiative was set up in 1995, so called following the first meeting of the group in Dublin, Ohio. The organisation aimed to standardise the way we describe resources, in the same manner that library catalogues follow standard formats for describing library resources, such as MARC 21. MARC 21 provides the protocol by which computers exchange, use, and interpret bibliographic information. Its data elements make up the foundation of most library catalogs used today. (http://en.wikipedia.org/wiki/MARC_standards Accessed 28-11-2010)
Dublin Core is an initiative to create a digital "library card catalog" for the Web. Dublin Core is made up of 15 metadata elements that offer expanded cataloging information and improved document indexing for search engine programs. (http://searchsoa.techtarget.com/definition/Dublin-Core Accessed 28-11-2010)
The Dublin Core prescribes 15 data elements that can be used as containers for metadata. They are designed to be an agreed standard for describing resources, and are described by ISO Standard 15836 and NISO Standard Z39.85-2007.
Each term in DCMI is itself described using the following mandatory attributes:
Name: A token appended to the URI of a DCMI namespace to create the URI of the term.  
Label: The human-readable label assigned to the term.
URI: The Uniform Resource Identifier used to uniquely identify a term.
Definition: A statement that represents the concept and essential nature of the term.
Type of Term: The type of term as described in the DCMI Abstract Model [DCAM].
The full 15 fields for describing a resource as set out in DCMI:
  1. Title
  2. Creator
  3. Subject
  4. Description
  5. Publisher
  6. Contributor
  7. Date
  8. Type
  9. Format
  10. Identifier
  11. Source
  12. Language
  13. Relation
  14. Coverage
  15. Rights  
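As a rough sketch of how the 15 elements act as "containers", the Python below turns a handful of field values into statements and refuses any field that is not on the list. The describe helper is a made-up name for illustration; the element URIs are the lower-case element names appended to the Dublin Core namespace used earlier.

```python
from typing import List, Tuple

DC = "http://purl.org/dc/elements/1.1/"

ELEMENTS = [
    "Title", "Creator", "Subject", "Description", "Publisher",
    "Contributor", "Date", "Type", "Format", "Identifier",
    "Source", "Language", "Relation", "Coverage", "Rights",
]

def describe(resource: str, **fields: str) -> List[Tuple[str, str, str]]:
    """Return (subject, predicate, object) tuples for the given fields."""
    allowed = {name.lower() for name in ELEMENTS}
    statements = []
    for name, value in fields.items():
        if name not in allowed:
            raise ValueError(f"{name!r} is not one of the 15 Dublin Core elements")
        statements.append((resource, DC + name, value))
    return statements

for statement in describe("http://christopherdbrook.com/blog/",
                          creator="Chris Brook", language="en"):
    print(statement)
```

Passing something like author="Chris Brook" raises a ValueError, because author is not one of the 15 elements; creator is the agreed term.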
The DCMI Type Vocabulary provides a general, cross-domain list of approved terms that may be used as values for the Resource Type element to identify the genre of a resource. So if we need to apply metadata to an image stored on a website we would look at the following rules and regulations set out under Dublin Core:
Term Name: Image
Label: Image
Definition: A visual representation other than text.
Comment: Examples include images and photographs of physical objects, paintings, prints, drawings, other images and graphics, animations and moving pictures, film, diagrams, maps, musical notation. Note that Image may include both electronic and physical representations. 
Type of Term: Class 
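The attributes used to describe a DCMI term map neatly onto a small record. The sketch below fills one in for Image; the DcmiTerm class is a hypothetical name of my own, and it assumes the DCMI Type Vocabulary namespace is http://purl.org/dc/dcmitype/.

```python
from dataclasses import dataclass

# DCMI Type Vocabulary namespace (assumed for this sketch).
DCMITYPE = "http://purl.org/dc/dcmitype/"

@dataclass(frozen=True)
class DcmiTerm:
    name: str          # token appended to the namespace URI
    label: str         # human-readable label
    definition: str
    type_of_term: str
    namespace: str

    @property
    def uri(self) -> str:
        # The term's URI is simply the namespace plus the name.
        return self.namespace + self.name

image = DcmiTerm(
    name="Image",
    label="Image",
    definition="A visual representation other than text.",
    type_of_term="Class",
    namespace=DCMITYPE,
)

print(image.uri)
```

Deriving the URI from the namespace and the name, rather than storing it separately, follows the definition of Name given earlier: a token appended to the namespace URI to create the URI of the term.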
In order to represent subjects, objects and predicates as XML documents, we can use an XML implementation of RDF, which uses the DCMI URIrefs as above. To represent the image of me on my blog as an XML RDF statement we could write:

  1. <rdf:RDF
  2. xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  3. xmlns:dcterms="http://purl.org/dc/dcmitype/">
  4. <rdf:Description rdf:about="http://christopherdbrook.com/blog/cb.jpg">
  5. <dcterms:image>ChrisBrook</dcterms:image>
  6. </rdf:Description>
  7. </rdf:RDF>

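In plain English:
Line 1 opens the root RDF element, telling the parser that what follows is RDF,
Line 2 declares the RDF namespace, marking this as an XML RDF description,
Line 3 declares the Dublin Core namespace that supplies the predicates,
Line 4 describes the subject (the image saved under my blog URL),
Line 5 expresses the predicate, which in this case is image (strictly, DCMI lists Image as a class rather than a property, as the Type of Term above shows), described by DCMI as: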

“Examples include images and photographs of physical objects, paintings, prints, drawings, other images and graphics, animations and moving pictures, film, diagrams, maps, musical notation.  Note that Image may include both electronic and physical representations.” http://dublincore.org/2010/10/11/dctype.rdf - Image

Lines 6 and 7 close the Description and root RDF tags respectively.
Paraphrased DITA Session 08 Lecture Notes Butterworth 2010
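To check that a machine really can read the statement back out, here is a small sketch using Python's standard xml.etree.ElementTree. It works on a well-formed version of the example (attribute values quoted, no space inside the rdf:RDF tag), and the namespace URI bound to dcterms is my assumption. It only copes with this one simple shape; it is not a general RDF/XML parser, for which a dedicated RDF library would be the usual choice.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Well-formed version of the example; the dcterms namespace URI is assumed.
DOC = """<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:dcterms="http://purl.org/dc/dcmitype/">
  <rdf:Description rdf:about="http://christopherdbrook.com/blog/cb.jpg">
    <dcterms:image>ChrisBrook</dcterms:image>
  </rdf:Description>
</rdf:RDF>"""

def triples(xml_text):
    """Extract (subject, predicate, object) tuples from simple RDF/XML."""
    root = ET.fromstring(xml_text)
    found = []
    for description in root.findall(f"{{{RDF_NS}}}Description"):
        subject = description.get(f"{{{RDF_NS}}}about")
        for prop in description:
            # ElementTree reports tags as '{namespace}local'; joining the
            # two halves gives the predicate URIref.
            namespace, local = prop.tag[1:].split("}")
            found.append((subject, namespace + local, prop.text))
    return found

for statement in triples(DOC):
    print(statement)
```

The parser expands the dcterms prefix for us, so the predicate comes back as a full URIref rather than the shorthand written in the XML.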
The DCMI has essentially provided the rules for defining Objects, Subjects and Predicates through URIrefs. This allows only a discrete number of metadata fields to be defined, and as they are URIs they are all completely unique.
......phew a little bit heavy, I need a coffee and a fag and I'll come back and explain the next big thing, creating a taxonomy of RDF statements for a certain domain:
The RDF Schema !!
 
 
 
 
