Sunday, 12 December 2010

Web 2.0 and Web 3.0 - The Semantic Web


Introduction
Before I began the DITA course and heard the term Web 2.0, the name implied to me that a new version of the web had been released on a specific date. Since reading Tim O’Reilly’s second attempt at a definition of Web 2.0 (http://radar.oreilly.com/2006/12/web-20-compact-definition-tryi.html Accessed 26-11-2010) and writing my article http://chrisbrookditablog.blogspot.com/2010/11/dita-session-05-introducing-web-20-and.html (Accessed 13-11-2010), I’ve discovered that Web 2.0 is a genre of websites and technologies which open up the web for creating, publishing, sharing, recycling, re-using and re-arranging information. Instigated by increases in geographical broadband coverage and bandwidth, and by reduced cost, the web is now being exploited by ‘a set of social, architectural, and design patterns’ (Governor et al 2009), resulting in a mass migration of business to the internet as a platform.
A definition of Web 3.0 is even more ambiguous, so I prefer the term ‘The Semantic Web’. A common thread running through the digital architectures and technologies being developed under the labels of Web 2.0 and the Semantic Web, not too dissimilar to my previous blog entry, is the ‘unstructured vs structured’ representation of information.


Publishing using Wikis
Governor et al (2009) describe ‘concepts’ which are features of many Web 2.0 applications and services found today. They are defined as patterns such as ‘Participation – Collaboration’, ‘Collaborative Tagging’, ‘Software as a Service’, ‘Mashup’ and ‘Structured Information’.
Wikipedia (http://www.wikipedia.org Accessed 13-11-2010) exemplifies a Web 2.0 website, and has become a household name as an on-line encyclopedia. Although new articles were not being published as often in 2010 (http://www.time.com/time/magazine/article/0,9171,1924492,00.html#ixzz17AB03UyU Accessed 02-12-2010), editors and readers, who also have the ‘power to edit’, are collaborating on and discussing articles, tagging them collaboratively, adding further sub-categories, and helping editors delete pages no longer of relevance. Here I see the network effect: the more users collaborate, the more the quality and accuracy of articles improves. “People all over the world who are interested in a certain topic can collaborate asynchronously to create a living, breathing work.” (Governor et al 2009 p. 51) However, in my opinion Wikipedia articles are still pieces of unstructured information, i.e. not arranged and governed by the strict rules of a relational database.

Wikipedia utilizes XHTML to publish content; an example of the Wikipedia Main Page XHTML source code is here: http://en.wikipedia.org/wiki/File:Wikipedia_main_page_XHTML_source.png
XHTML is recommended by the W3C as a mark-up language based on HTML and compatible with XML (http://chrisbrookditablog.blogspot.com/2010/11/extensible-markup-language-xml.html Accessed 12-12-2010):
 “XHTML consists of all the elements in HTML 4.01, combined with the strict syntax of XML. Today's market consists of different browser technologies, some browsers run on computers, and some browsers run on mobile phones or other small devices. The last-mentioned do not have the resources or power to interpret a "bad" markup language.”
This technology allowed Wikipedia to become even more accessible in 2008 with the launch of the mobile version of the site. Optimised for mobile browsers, non-essential elements of the page, including sidebars and headers, are stripped out, and sub-sections of articles are collapsible. Cross-device compatibility is one of the key components of Web 2.0 applications.
Bespoke ‘wikis’ offer a way to publish, disseminate and collaborate within a bounded set of users. In my professional experience I helped publish a user manual for an Electronic Document and Records Management System (EDRMS) using wiki technology, while working as an information specialist as part of the design team on a new metro project. The purpose was to educate the design team in how to use the EDRMS for the project, and to disseminate new procedures as and when needed. It would have been difficult and time-consuming to produce a traditional manual covering all the differing levels of expertise and needs of the users. For example, a Project Manager needing to search for a single PDF document does not need to know the features a CAD operator needs, e.g. creating metadata and referencing sets of model files to create quality-assurance-compliant CAD drawings.
A ‘bare bones’ manual was created which relied on the participation of the design team members to ‘flesh out’ content. All had the common interest of making the project a success, and initially users’ contributions added a high degree of value through harnessing their collective knowledge. However, after the initial setting-up, contributions to the wiki dried up. “The shift from a top-down editorial approach to a bottom-up approach is a painful reversal for people who expect only expert advice when they look up something.” (Governor et al 2009 p. 51)
 
Identifying technologies for use in the Information Sciences

Wikipedia is a published set of linked resources, organized visually through a template, and through the collaboration of the web community it has built up a vast network of articles and subjects. Articles are arranged by subject and interlinked through hyperlinks; categories are organized into a hierarchy (interestingly, Web 3.0 is a category under the subject Web 2.0). If data relating to subjects and categories were marked up with XML, with metadata applied to give context, rather than just XHTML, and were referenced using unique identifiers that explicitly described meaning, Wikipedia could become a much more powerful resource for information scientists. The W3C has recommended the technologies and standards needed to facilitate the idea of semantically describing information as ‘subjects’, ‘objects’ and relationships or ‘predicates’ through the Resource Description Framework (RDF), in statements referred to as RDF triples.
“URIs can identify anything as a resource, the subject of an RDF statement can be a resource, and predicates in RDF statements are always resources. Because URIs uniquely identify resources (things in the real world) they are considered ‘strong identifiers’. There is no ambiguity about what they represent, and they always represent the same thing, regardless of the context we find them in.” (Segaran et al 2009 p. 66) Thus, by organizing the important facts that have been added, in a rather ad-hoc fashion, to subjects in Wikipedia using RDF statements containing URIrefs built from standard terms (an example of which is provided by the Dublin Core Metadata Initiative, http://dublincore.org/ Accessed 02-12-2010), the information becomes highly re-usable.
RDF allows properties to be invented independently regardless of the subject domain. It can be converted to XML and using the simple triple makes it simple to identify a resource (Chowdhury 2007 p.203). Graphically representing RDF statements is a powerful way to visualize the interlinking of objects, so simple triples can be aggregated into complex webs of relationships using simple nodes and connectors. I have drawn a simple RDF graph of my music blog, describing the triples for three objects that ‘belong’ to my music blog: http://chrisbrookditablog.blogspot.com/2010/11/semantic-web-technologies-resource.html (Accessed 26-11-2010)
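To make the triple idea concrete, here is a minimal Python sketch representing three such statements about the blog as plain tuples. This is only an illustration: the creation-date predicate is an invented URI, and a real application would use a dedicated library such as rdflib rather than tuples.

```python
# Minimal sketch: three RDF statements about the music blog, held as
# (subject, predicate, object) tuples. The Dublin Core URIs supply two of
# the predicates; the creation-date predicate is invented for illustration.

triples = [
    ("http://christopherdbrook.com/blog",
     "http://purl.org/dc/elements/1.1/creator",
     "Chris Brook"),
    ("http://christopherdbrook.com/blog",
     "http://purl.org/dc/elements/1.1/language",
     "en"),
    ("http://christopherdbrook.com/blog",
     "http://christopherdbrook.com/blog/creation-date",
     "20-10-2010"),
]

def properties_of(subject, graph):
    """Collect every (predicate, object) pair attached to one subject node."""
    return {p: o for s, p, o in graph if s == subject}

props = properties_of("http://christopherdbrook.com/blog", triples)
print(props["http://purl.org/dc/elements/1.1/language"])  # en
```

Because the subject URI is a strong identifier, every fact attached to it can be gathered with a simple filter, which is exactly what makes ad-hoc aggregation of RDF data possible.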
Wikipedia exemplifies the Web 2.0 principle of collaborative tagging, which could be described as a user-driven taxonomy of the world’s knowledge. “Web 2.0 is an informal flexible way of integrating disparate web services. It requires less dependence on shared vocabularies and provides workable rather than totally perfect solutions.” (Burke 2009) For information scientists, this gives rise to ambiguity in describing web resources. The Semantic Web standards and technologies being championed by the W3C and its partners attempt to remove this ambiguity.
Representing and organizing data using standard metadata built upon an overarching ontology of semantic meaning, real-world ontology is applied to web resources in the same manner that librarians apply MARC 21 to traditional printed resources. Using XML and RDF to represent resources with standard metadata such as the Dublin Core and URIrefs creates a library catalogue potentially extending to the whole World Wide Web.
Ontologies are developed within subject domains to model real-life objects. They apply a set of rules as to how these objects can relate to one another. “An ontology provides a precise vocabulary with which knowledge can be represented, how they can be grouped, and what relationships connect them together.” (Segaran et al 2009 p. 127) This would further help to structure the data held in Wikipedia, and as it is written in XHTML it is directly compatible with XML, making it machine-readable and machine-understandable. If applied correctly, an ontology could be created based on a taxonomy provided by resources on Wikipedia. The W3C standard OWL (Web Ontology Language, http://chrisbrookditablog.blogspot.com/2010/12/semantic-web-technologies-owl.html Accessed 12-12-2010), a mark-up technology for describing ontologies, allows machines to understand the relationships and hierarchy of subjects and objects.
Information specialists currently have a great vested interest in utilising these stacked technologies to interpret data in new ways. Simple RDF graphs can be joined through new relationships which, since they are built on simple ontological rules, allow complex reasoning to be performed using many more variables than we may normally consider. Thus URIs act like primary keys in a relational database.
Projects exist to create semantic wikis, from either pre-defined or user-created ‘folksonomies’, which store “some portion of [their] data in a way that can be queried elsewhere. Typical uses of such data include querying it within the wiki (sometimes using standard query languages like SPARQL); aggregating it in displays like tables, maps and calendars; exporting it via formats like RDF, OWL or CSV; and reasoning with it, to calculate new facts from the given facts.” (http://en.wikipedia.org/wiki/Semantic_Wikipedia Accessed 12-12-2010)
Freebase (http://www.freebase.com/home Accessed 02-12-2010) is an example of a semantic wiki, where articles can be built automatically from multiple resources.
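A flavour of the querying described above can be sketched with a toy triple-pattern matcher, in which None plays roughly the role of a SPARQL variable. The data below is entirely invented for illustration; real SPARQL engines are of course far more capable.

```python
# A toy triple-pattern matcher: None in a pattern behaves like a SPARQL
# variable and matches anything. The triples are invented example data.

triples = [
    ("ex:London", "ex:isCapitalOf", "ex:England"),
    ("ex:London", "ex:population", "7800000"),
    ("ex:Cardiff", "ex:isCapitalOf", "ex:Wales"),
]

def match(pattern, graph):
    """Return every triple whose fixed terms agree with the pattern."""
    return [t for t in graph
            if all(p is None or p == v for p, v in zip(pattern, t))]

# Roughly: SELECT ?city WHERE { ?city ex:isCapitalOf ?country }
capitals = [s for s, _, _ in match((None, "ex:isCapitalOf", None), triples)]
print(capitals)  # ['ex:London', 'ex:Cardiff']
```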
Utilising Web 2.0 and Semantic Web Technologies
Exploring the possibilities of this now semantically structured data, such as that held in Freebase, or the Government’s proposal to publish its datasets in RDF format, offers the chance to query and mash together information and data in new ways. “Companies and businesses often need to gather data from a range of sources. XML can serve as a uniform data exchange format, and thus can facilitate such gathering, processing, re-use and distribution of data across various applications.” (Chowdhury 2007 p. 164)
The UK Department for Communities and Local Government, in conjunction with Local Authority planning departments, developed and rolled out a standard on-line planning application form called 1APP (http://www.planningportal.gov.uk/PpApplications/genpub/en/Ecabinet. Accessed 04-12-2010). From 6 April 2008, the standard on-line form allowed Local Authorities in England and Wales to receive planning applications digitally. Through the application of an XML schema, details of planning applications could be captured digitally, such as applicant details, type of development, number of housing units, floorspace of commercial development and a range of other information that can be directly uploaded into bespoke back-office planning systems.
This greatly improved efficiency over the old paper-based system, eliminating data entry and scanning. Local Authorities tend to use large and complex databases for dealing with planning applications, and to develop and roll out a web-based system for document and event handling would be far too costly. However, the use of the online XML schema has the power to make the data collected re-usable in other applications. Data could be read and fed into other departments’ systems and used to calculate statistics on housing and commercial development, and to identify trends to appraise the success of land-use planning policy.
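To make the re-use point concrete, here is a minimal sketch of how a back-office system might read such an application with Python’s standard library. The element names are invented for illustration and do not follow the real 1APP schema.

```python
# Sketch of reading structured planning-application data for re-use in
# another system. The element names are hypothetical, not the 1APP schema.
import xml.etree.ElementTree as ET

application_xml = """<planningApplication>
  <applicant>J. Smith</applicant>
  <developmentType>Residential</developmentType>
  <housingUnits>24</housingUnits>
  <floorspaceSqm>0</floorspaceSqm>
</planningApplication>"""

root = ET.fromstring(application_xml)
units = int(root.findtext("housingUnits"))  # typed data, not a scanned form
print(root.findtext("developmentType"), units)  # Residential 24
```

Because the data arrives as typed XML elements rather than scanned paper, a statistics system can consume the same file with no re-keying.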
The flexible nature of semantically described data leaves information professionals able to look for new overlaps of information, taking high-level ontological rules and attempting to realise the same relationships in multiple datasets; for example, the thousands of datasets held by government departments written in RDF. Ontological rules can be applied to census data, crime data, housing tenure, population projections, ethnic breakdown and socio-economic classifications, all of which describe real-world objects. Ordnance Survey has also created an ontology to describe geographical entities using OWL-DL, making geographical locations and defined areas explicit to machines through RDF (http://www.ordnancesurvey.co.uk/oswebsite/ontology/ Accessed 01-12-2010).
This has implications for information professionals working in policy research, for instance being able to write spatial queries to explore questions not normally possible: what is the prevalent socio-economic class of 25-34 year olds of Somali origin who live in the top 10% of areas for crime against the person and live in Council-rented dwellings? Here, multiple data sets are queried, including the ability to geocode pieces of data. Instead of copying data sets into GIS systems to query, the work is all done over the web, so the information specialist is hopefully assured that she is using the most current data.
Linked data and the government’s drive to create the ‘Open Data Movement’ are essentially making data available as a commodity that can be taken and manipulated by private enterprise to create more opportunity for information systems development (http://data.gov.uk/ Accessed 10-12-2010). Data from planning applications, linked using RDF graphs to all manner of data published by the ONS, can provide local government with information which would be invaluable to service planning.
For example, in the future land-use planning must consider environmental and societal changes such as rising sea levels and over-population, and guide the development of the infrastructure to support this.
Planning applications give data on the number of houses to be built. The data is also geocoded, so it can easily be mapped using OS geography data published in RDF format. The Environment Agency publishes flood-risk zones, again geocoded, and population projections give official population statistics. Complex statistical queries could be built up from the data represented by RDF statements to determine where development must be directed in the future to mitigate flooding, and where services will need to be located to cope with the environmental changes we will see. Policy making thus becomes far more efficient: using RDF graphs and SPARQL to quickly observe relationships in the real world, without the need to manually bring together disparate datasets in a GIS system, the web becomes a platform for government to formulate policy based on evidence provided by linked data sets.
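The join described above can be sketched in a few lines: because each dataset identifies an area by the same URI, that URI behaves like a primary key. All figures and area identifiers below are invented, standing in for real ONS and Environment Agency data.

```python
# Sketch: two "datasets" keyed by a shared area URI, joined without a GIS.
# The identifiers and figures are invented for illustration.

housing = {  # area URI -> planned housing units (from planning applications)
    "stats:area/E001": 120,
    "stats:area/E002": 450,
}
flood_risk = {  # area URI -> lies in a flood-risk zone?
    "stats:area/E001": False,
    "stats:area/E002": True,
}

# Which planned development falls inside a flood-risk zone?
at_risk = {area: units for area, units in housing.items()
           if flood_risk.get(area)}
print(at_risk)  # {'stats:area/E002': 450}
```

A SPARQL endpoint would do the same join declaratively over live data, rather than over local copies.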
Conclusion
Web 2.0 applications make publishing accessible with minimal effort and rely on user defined tags as metadata. The Semantic Web uses marked-up sections of machine readable data found in databases or documents on the web, and describes them through real-world semantic models or ontologies.
“The Semantic Web is about two things. It is about common formats for integration and combination of data drawn from diverse sources, where on the original Web mainly concentrated on the interchange of documents. It is also about language for recording how the data relates to real world objects. That allows a person, or a machine, to start off in one database, and then move through an unending set of databases which are connected not by wires but by being about the same thing." (http://www.w3.org/2001/sw/ Accessed 28-11-2010)
The Semantic Web movement will hopefully lead to more merging of web-based data. Discoveries of new information from existing information will become possible by looking for overlaps between new and existing data and could lead to new advances in science, medicine and improve our general understanding of the real world. 
References
http://chrisbrookditablog.blogspot.com/2010/12/web-20-and-web-30-semantic-web.html (URL for this blog post)
Burke, M., The Semantic Web and the Digital Library, Aslib Proceedings 61 (3), 2009
Chowdhury, G. G., Chowdhury, S., Organizing Information: From the Shelf to the Web, 2007, Facet
Governor, J., Hinchcliffe, D., Nickull, D., Web 2.0 Architectures, 2009, O’Reilly
Segaran, T., Evans, C., Taylor, J., Programming the Semantic Web, 2009, O’Reilly
http://www.w3.org/2001/sw/ Accessed 28-11-2010
Wikipedia.org Accessed 04-12-2010
http://www.freebase.com/home Accessed 02-12-2010
http://data.gov.uk/ Accessed 10-12-2010
http://dublincore.org/ (Dublin Core Metadata Initiative) Accessed 02-12-2010


Sunday, 5 December 2010

The Semantic Web Technologies - OWL

OWL is a language for processing web information.

What is OWL?

  • OWL stands for Web Ontology Language
  • OWL is built on top of RDF
  • OWL is for processing information on the web
  • OWL was designed to be interpreted by computers
  • OWL was not designed for being read by people
  • OWL is written in XML
  • OWL has three sublanguages
  • OWL is a W3C standard

What is Ontology?

Ontology is about the exact description of things and their relationships.
For the web, ontology is about the exact description of web information and relationships between web information.

Why OWL?

OWL is a part of the "Semantic Web Vision" - a future where:
  • Web information has exact meaning
  • Web information can be processed by computers
  • Computers can integrate information from the web

OWL was Designed for Processing Information

OWL was designed to provide a common way to process the content of web information (instead of displaying it).
OWL was designed to be read by computer applications (instead of humans).

OWL is Different from RDF

OWL and RDF are much of the same thing, but OWL is a stronger language with greater machine interpretability than RDF.
OWL comes with a larger vocabulary and stronger syntax than RDF.

OWL Sublanguages

OWL has three sublanguages:
  • OWL Lite
  • OWL DL (includes OWL Lite)
  • OWL Full (includes OWL DL)

OWL is Written in XML

By using XML, OWL information can easily be exchanged between different types of computers using different types of operating system and application languages.

OWL is a Web Standard

OWL became a W3C (World Wide Web Consortium) Recommendation in February 2004.
A W3C Recommendation is understood by the industry and the web community as a web standard. A W3C Recommendation is a stable specification developed by a W3C Working Group and reviewed by the W3C Membership.

http://www.w3schools.com/rdf/rdf_owl.asp Accessed 05-12-2010
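One small piece of the reasoning OWL enables, the transitivity of subclass relationships, can be sketched in a few lines of Python. The class hierarchy here is an invented example; a real OWL reasoner handles far richer constructs (properties, restrictions, equivalence).

```python
# Toy sketch of one inference an ontology reasoner performs: subclass
# membership propagates up the hierarchy. The hierarchy is invented.

subclass_of = {  # child class -> parent class
    "Horse": "Animal",
    "Animal": "LivingThing",
}

def is_a(cls, ancestor):
    """Walk the hierarchy upwards to test class membership."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = subclass_of.get(cls)
    return False

print(is_a("Horse", "LivingThing"))  # True
```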

In computer science and information science, an ontology is a formal representation of knowledge as a set of concepts within a domain, and the relationships between those concepts. It is used to reason about the entities within that domain, and may be used to describe the domain.
In theory, an ontology is a "formal, explicit specification of a shared conceptualisation".[1] An ontology provides a shared vocabulary, which can be used to model a domain — that is, the type of objects and/or concepts that exist, and their properties and relations.[2]
Ontologies are used in artificial intelligence, the Semantic Web, systems engineering, software engineering, biomedical informatics, library science, enterprise bookmarking, and information architecture as a form of knowledge representation about the world or some part of it. The creation of domain ontologies is also fundamental to the definition and use of an enterprise architecture framework.

http://en.wikipedia.org/wiki/Ontology_%28information_science%29  Accessed 05-12-2010

The Semantic Web Technologies - RDFS

RDF describes resources with classes, properties, and values.

In addition, RDF also needs a way to define application-specific classes and properties, and these must be defined using extensions to RDF.
One such extension is RDF Schema.

RDF Schema does not provide actual application-specific classes and properties. (This is dealt with by OWL in the next post.)

Instead RDF Schema provides the framework to describe application-specific classes and properties.
Classes in RDF Schema are much like classes in object-oriented programming languages. This allows resources to be defined as instances of classes, and as subclasses of classes.

E.g.
<?xml version="1.0"?>

<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
xml:base="http://www.animals.fake/animals#">

<rdf:Description rdf:ID="animal">
  <rdf:type rdf:resource="http://www.w3.org/2000/01/rdf-schema#Class"/>
</rdf:Description>

<rdf:Description rdf:ID="horse">
  <rdf:type rdf:resource="http://www.w3.org/2000/01/rdf-schema#Class"/>
  <rdfs:subClassOf rdf:resource="#animal"/>
</rdf:Description>

</rdf:RDF>

In the example above, the resource "horse" is a subclass of the class "animal".
 http://www.w3schools.com/rdf/rdf_schema.asp
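As a quick illustration that the statement above is machine-readable data rather than prose, the following sketch (Python standard library only) pulls the subclass relation out of a trimmed version of that RDF/XML:

```python
# Parsing a trimmed copy of the RDF Schema example above, to show that the
# subclass statement can be extracted mechanically.
import xml.etree.ElementTree as ET

RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"
RDFS = "{http://www.w3.org/2000/01/rdf-schema#}"

doc = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  <rdf:Description rdf:ID="horse">
    <rdfs:subClassOf rdf:resource="#animal"/>
  </rdf:Description>
</rdf:RDF>"""

root = ET.fromstring(doc)
for desc in root.findall(RDF + "Description"):
    child = desc.get(RDF + "ID")
    parent = desc.find(RDFS + "subClassOf").get(RDF + "resource")
    print(child, "is a subclass of", parent)  # horse is a subclass of #animal
```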

The purpose of RDFS is then to relate the categories and their hierarchy (taxonomy) into a syntax that allows the elements to be grouped into classes using a standard vocabulary.
In terms of distinguishing a taxonomy from an ontology, we can use a definition from Chowdhury (2007):



Machines not only need to read the correct pieces of data; through RDF statements they can identify a framework for using common metadata standards. However, each subject domain has its own structure, with categories and sub-categories. We call this a taxonomy (much like the Wikipedia example, where subjects have various categories in a parent-child relationship). “A taxonomy could take the form of a web directory such as Yahoo, or a subject heading list (e.g. the Library of Congress subject heading list).” (Chowdhury 2007) RDFS defines a syntax for the taxonomy, defining the parent-child relationships that exist in that hierarchy (Pidock 2003, quoted in Chowdhury).


The RDFS vocabulary builds on the limited vocabulary of RDF.

Classes

  • rdfs:Resource is the class of everything. All things described by RDF are resources.
  • rdfs:Class declares a resource as a class for other resources.
A typical example of an rdfs:Class is foaf:Person in the Friend of a Friend (FOAF) vocabulary. An instance of foaf:Person is a resource that is linked to the class foaf:Person using the rdf:type property, such as in the following formal expression of the natural language sentence : 'John is a Person'.
ex:John rdf:type foaf:Person
The definition of rdfs:Class is recursive: rdfs:Class is the rdfs:Class of any rdfs:Class.
The other classes described by the RDF and RDFS specifications are:
  • rdfs:Literal – literal values such as strings and integers. Property values such as textual strings are examples of RDF literals. Literals may be plain or typed.
  • rdfs:Datatype – the class of datatypes. rdfs:Datatype is both an instance of and a subclass of rdfs:Class. Each instance of rdfs:Datatype is a subclass of rdfs:Literal.
  • rdf:XMLLiteral – the class of XML literal values. rdf:XMLLiteral is an instance of rdfs:Datatype (and thus a subclass of rdfs:Literal).
  • rdf:Property – the class of properties.

Properties

Properties are instances of the class rdf:Property and describe a relation between subject resources and object resources. When used as such, a property is a predicate (see also RDF: reification).
  • rdfs:domain of an rdf:predicate declares the class of the subject in a triple whose second component is the predicate.
  • rdfs:range of an rdf:predicate declares the class or datatype of the object in a triple whose second component is the predicate.
For example, the following declarations are used to express that the property ex:employer relates a subject, which is of type foaf:Person, to an object, which is of type foaf:Organization:
ex:employer rdfs:domain foaf:Person
ex:employer rdfs:range foaf:Organization
Given the previous two declarations, the following triple requires that ex:John is necessarily a foaf:Person, and ex:CompanyX is necessarily a foaf:Organization:
ex:John ex:employer ex:CompanyX
  • rdf:type is a property used to state that a resource is an instance of a class.
  • rdfs:subClassOf allows hierarchies of classes to be declared.
For example, the following declares that 'Every Person is an Agent':
foaf:Person rdfs:subClassOf foaf:Agent
Hierarchies of classes support inheritance of a property domain and range (see definitions in next section) from a class to its subclasses.
  • rdfs:subPropertyOf is an instance of rdf:Property that is used to state that all resources related by one property are also related by another.
  • rdfs:label is an instance of rdf:Property that may be used to provide a human-readable version of a resource's name.
  • rdfs:comment is an instance of rdf:Property that may be used to provide a human-readable description of a resource.

Utility Properties

  • rdfs:seeAlso is an instance of rdf:Property that is used to indicate a resource that might provide additional information about the subject resource.
  • rdfs:isDefinedBy is an instance of rdf:Property that is used to indicate a resource defining the subject resource. This property may be used to indicate an RDF vocabulary in which a resource is described.

See also

  • SPARQL Query Language for RDF
http://en.wikipedia.org/wiki/RDF_Schema

Tuesday, 30 November 2010

Web 2.0 - Definition of Open Standards


Open Standards generally have the following characteristics:
·      They are not controlled by one private entity, and they can’t be changed at the will of any one entity without an input process that facilitates consideration of the points of view and input of others.
·      They’re developed by organisations that are operating with an open and transparent process, allowing stakeholders to have a say in their development.
·      They’re not encumbered by any patents or other intellectual property claims that result in unfair distribution of the ability to implement them, whether for commercial or non commercial purposes.
·      They’re designed to benefit the whole community of users rather than one specific subset of users for financial or other gains.
Governor, J., Hinchcliffe, D., Nickull, D. Web 2.0 Architectures, 2009. O'Reilly, Sebastopol CA

The Semantic Web Technologies - Resource Description Framework (RDF)


While XML allows tagging of data and resources stored on the web in a machine-readable format, the Resource Description Framework (RDF) sits on top as a technology employed in creating the semantic meaning of those tags. The W3C has proposed RDF as the tool for giving web-based resources, and the data contained within them, the meaning they currently lack. In other words, RDF has been developed to standardise the syntax of the metadata applied to web resources and data. (Chowdhury 2007 p. 201)
Forming a common approach to identifying web resources, the W3C drives the standards for RDF and is developing a set of common tools to allow any web content to use the technology. The working group also ensures compatibility with current technology (HTML, XHTML, web browsers etc.), so that RDF can become a resource the web community can use freely while conforming with the RDF specifications.
The mission of the RDFa Working Group, part of the Semantic Web Activity is to support the developing use of RDFa for embedding structured data in Web documents in general. The Working Group will publish W3C Recommendations to extend and enhance the currently published RDFa 1.0 documents, including an API. The Working Group will also support the HTML Working Group in its work on incorporating RDFa in HTML5 and XHTML5.
RDF is based on the idea of identifying things using Web identifiers (called Uniform Resource Identifiers, or URIs), and describing resources in terms of simple properties and property values. This enables RDF to represent simple statements about resources as a graph of nodes and arcs representing the resources, and their properties and values.
G. G. Chowdhury describes RDF in simple terms as “the data model for writing simple statements about web resources” (Chowdhury 2007 p. 201).
Uniform Resource Identifiers – URIs
Similar to a URL, the Uniform Resource Locator used to uniquely identify web pages, the URI is a unique label for the description of a web resource. Using URIs ensures there will be no duplication of descriptions; however, unlike a URL, which only refers to a location, a URI describes a resource or piece of information on the web, or in fact anything in the real world, regardless of location. The specific name for URIs used in RDF is URIref. Taking http://en.wikipedia.org/wiki/URI#Examples_of_URI_references as an example:
"http" specifies the 'scheme' name,
"en.wikipedia.org" is the 'authority',
"/wiki/URI" is the 'path' pointing to the article, and
"#Examples_of_URI_references" is a 'fragment' pointing to a section within it.
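These components can be pulled apart with Python’s standard library; the example URI here is the Wikipedia page on URIs, matching the parts described above.

```python
# Splitting a URI into scheme, authority, path and fragment with the
# standard library.
from urllib.parse import urlparse

uri = "http://en.wikipedia.org/wiki/URI#Examples_of_URI_references"
parts = urlparse(uri)
print(parts.scheme)    # http
print(parts.netloc)    # en.wikipedia.org   (the 'authority')
print(parts.path)      # /wiki/URI
print(parts.fragment)  # Examples_of_URI_references
```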
RDF Triples
An RDF statement is built up from URIrefs and relies on ‘triples’ of URIrefs, consisting of a subject, an object and a predicate.
The subject: a resource on the internet, e.g. a resource about music:
             http://christopherdbrook.com/blog/  (my blog)
The object: Chris Brook, as I am the author of the blog. (A JPEG image of me, displayed on my blog to show visitors my ugly mug, could also be an object, expressed as a URL:
      http://christopherdbrook.com/blog/images/CB.jpeg )
The predicate: the statement assigning the link between the subject and the object, such as ‘creator’. In other words we could say:
http://christopherdbrook.com/blog/  (my blog) has a property called creator which has a value = Chris Brook
http://christopherdbrook.com/blog/  (my blog) has a property called language which has a value = en (the standard abbreviation for English)
http://christopherdbrook.com/blog/  (my blog) has a property called creation date which has a value = 20-10-2010
As the concept of the Semantic Web is a web where machines can read and understand the meaning of all the resources contained therein, we have to use the syntax that the W3C has chosen for RDF, in which statements are arranged into ‘triples’:
<http://christopherdbrook.com/blog>
<http://purl.org/dc/elements/1.1/creator>
<http://christopherdbrook.com/blog/images/CB.jpeg>

<http://christopherdbrook.com/blog>
<http://christopherdbrook.com/blog/creation-date> "20-10-2010"

<http://christopherdbrook.com/blog>
<http://purl.org/dc/elements/1.1/language> "en"
 
The http://purl.org/dc/elements URI is a unique identifier utilising a ‘namespace’ (a term appended to the URI), which comes from the Dublin Core Metadata Initiative (explained below).
An RDF graph is a graphical representation of these RDF statements; the example below depicts three RDF statements about my music blog website:
[RDF graph image: three RDF statements about the music blog]
This illustrates that objects in RDF statements may be either URIrefs or constant values (called literals) represented by character strings, in order to represent certain kinds of property values. (In the case of the predicate http://purl.org/dc/elements/1.1/language, the literal is an international standard two-letter code for English.)

Literals may not be used as subjects or predicates in RDF statements. In drawing RDF graphs, nodes that are URIrefs are shown as ellipses, while nodes that are literals are shown as boxes. (The simple character-string literals used in these examples are called plain literals, to distinguish them from typed literals. The various kinds of literals that can be used in RDF statements are defined in [RDF-CONCEPTS]. Both plain and typed literals can contain Unicode [UNICODE] characters, allowing information from many languages to be directly represented.)
 
Paraphrased from http://www.w3.org/TR/rdf-primer/ Accessed 30-11-2010
Dublin Core Metadata Initiative
An organisation called the Dublin Core Metadata Initiative was set up in 1995, so named after the location of the group’s first meeting in Dublin, Ohio. The organisation aimed to standardise the way we describe resources, in the same manner that library catalogues follow a standard format for describing library resources,
such as MARC 21. MARC 21 provides the protocol by which computers exchange, use, and interpret bibliographic information. Its data elements make up the foundation of most library catalogs used today. (http://en.wikipedia.org/wiki/MARC_standards Accessed 28-11-2010)
Dublin Core is an initiative to create a digital "library card catalog" for the Web. Dublin Core is made up of 15 metadata elements that offer expanded cataloging information and improved document indexing for search engine programs. http://searchsoa.techtarget.com/definition/Dublin-Core Accessed 28-11-2010
The Dublin Core prescribes 15 data elements that can be used as containers for metadata. They are designed to be an agreed standard for describing resources, and are set out in ISO Standard 15836 and NISO Standard Z39.85-2007.
The mandatory attributes used in DCMI to document each term:
Name: A token appended to the URI of a DCMI namespace to create the URI of the term.  
Label: The human-readable label assigned to the term.
URI: The Uniform Resource Identifier used to uniquely identify a term.
Definition: A statement that represents the concept and essential nature of the term.
Type of Term: The type of term as described in the DCMI Abstract Model [DCAM].
The full 15 fields for describing a resource as set out in DCMI:
  1. Title
  2. Creator
  3. Subject
  4. Description
  5. Publisher
  6. Contributor
  7. Date
  8. Type
  9. Format
  10. Identifier
  11. Source
  12. Language
  13. Relation
  14. Coverage
  15. Rights  
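As a sketch, a record describing this blog with a subset of those 15 elements might look like the following (the values are illustrative only):

```python
# A web resource tagged with a subset of the 15 Dublin Core elements.
# The values describe this blog and are purely illustrative.
dublin_core_record = {
    "Title":      "Chris Brook's DITA Blog",
    "Creator":    "Chris Brook",
    "Language":   "en",
    "Date":       "2010-10-20",
    "Identifier": "http://christopherdbrook.com/blog/",
    "Type":       "Text",
}

# The agreed standard: every key must be one of the 15 DCMI elements.
DCMI_ELEMENTS = {
    "Title", "Creator", "Subject", "Description", "Publisher",
    "Contributor", "Date", "Type", "Format", "Identifier",
    "Source", "Language", "Relation", "Coverage", "Rights",
}
assert set(dublin_core_record) <= DCMI_ELEMENTS
```

The point of the fixed element set is exactly this kind of check: any consumer of the record knows in advance which fields can appear.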
The DCMI Type Vocabulary provides a general, cross-domain list of approved terms that may be used as values for the Resource Type element to identify the genre of a resource. So if we need to apply metadata to an image stored on a website we would look at the following rules and regulations set out under Dublin Core:
Term Name:  Image 
Label: Image
Definition: A visual representation other than text. 
Comment: Examples include images and photographs of physical objects, paintings, prints, drawings, other images and graphics, animations and moving pictures, film, diagrams, maps, musical notation. Note that Image may include both electronic and physical representations. 
Type of Term: Class 
In order to represent subjects, objects and predicates as XML documents, we can use the XML implementation of RDF, which uses the DCMI URIrefs as above. To represent the image of me on my blog as an RDF/XML statement we could write:

  1. <rdf:RDF
  2. xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  3. xmlns:dcterms="http://purl.org/dc/terms/">
  4. <rdf:Description rdf:about="http://christopherdbrook.com/blog/cb.jpg">
  5. <dcterms:image>ChrisBrook</dcterms:image>
  6. </rdf:Description>
  7. </rdf:RDF>

In plain English:
Line 1 tells the browser it is RDF,
Line 2 tells the browser it’s an XML RDF description,
Line 3 says it uses the Dublin Core set of predicates,
Line 4 describes the subject (the image saved under my blog URL),
Line 5 expresses the predicate, which in this case is image, described by DCMI as:

“Examples include images and photographs of physical objects, paintings, prints, drawings, other images and graphics, animations and moving pictures, film, diagrams, maps, musical notation.  Note that Image may include both electronic and physical representations.” http://dublincore.org/2010/10/11/dctype.rdf - Image

Lines 6 and 7 close the description and the RDF root tags respectively.
Paraphrased DITA Session 08 Lecture Notes Butterworth 2010
The DCMI has essentially provided the rules for defining objects, subjects and predicates through URIrefs. This allows only a discrete number of metadata fields to be defined, and as they are URIs they are all completely unique. 
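As a sketch of machine-readability, an RDF/XML statement like the one above can be parsed with Python's standard library to recover the subject, predicate and object (the namespace URIs and structure follow the example; the exact parsing approach is an assumption for illustration):

```python
import xml.etree.ElementTree as ET

# Parse an RDF/XML statement and pull out the subject (rdf:about), the
# predicate (the namespaced child element) and the object (its text).
rdf_xml = """<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:dcterms="http://purl.org/dc/terms/">
  <rdf:Description rdf:about="http://christopherdbrook.com/blog/cb.jpg">
    <dcterms:image>ChrisBrook</dcterms:image>
  </rdf:Description>
</rdf:RDF>"""

RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"
root = ET.fromstring(rdf_xml)
description = root.find(RDF + "Description")
subject = description.get(RDF + "about")   # the resource being described
predicate = description[0].tag             # the fully-qualified property
obj = description[0].text                  # the value

print(subject)    # http://christopherdbrook.com/blog/cb.jpg
print(predicate)  # {http://purl.org/dc/terms/}image
print(obj)        # ChrisBrook
```

Note how ElementTree expands the `dcterms:` prefix into the full namespace URI, which is exactly what makes the predicate globally unique.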
......phew a little bit heavy, I need a coffee and a fag and I'll come back and explain the next big thing, creating a taxonomy of RDF statements for a certain domain:
The RDF Schema !!
 
 
 
 

DITA Session 08 - The Semantic Web - Why Semantics?


The Semantic Web as a concept is simple by definition: to make resources and data accessible and understandable to computers within the context they reside. It is inherently complex, however, as we are attempting to ask computers to define the entire knowledge held on the web. In the real world this is something our human brains do on a daily basis. We take simple pieces of information, such as Tom hearing Suey and me talking in the pub about music and how I am loving the Sex Pistols, while Suey says he wouldn't like to meet Sid Vicious down a dark alley. Tom may infer:
Chris likes Punk Rock Music
Punk Rock Music scares Suey
Through our ability to gain knowledge and learn what words mean, Tom knows that Chris and Suey are people, ‘likes’ and ‘scares’ are verbs, and ‘Punk Rock Music’ is a particular type of music (that is not to everyone’s taste!). A moderately educated person like Tom can understand what these terms mean, and understand the construct of the sentence. Thus, through logical reasoning, Tom can answer questions like: ‘Who likes Punk Rock Music?’
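Tom's reasoning can be sketched as subject–verb–object facts plus a query over them; a minimal Python illustration (the names come from the pub example above):

```python
# Tom's knowledge, written down as subject-verb-object facts.
facts = [
    ("Chris", "likes", "Punk Rock Music"),
    ("Punk Rock Music", "scares", "Suey"),
]

def who(verb, obj):
    """Answer questions of the form 'Who <verb> <object>?'"""
    return [s for s, v, o in facts if v == verb and o == obj]

print(who("likes", "Punk Rock Music"))  # → ['Chris']
```

This is the whole trick of the Semantic Web in miniature: once knowledge is held as explicit statements, answering the question becomes a mechanical lookup rather than human interpretation.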
If we really don’t understand the meaning of words we have tools to help us. We can look up the word in a dictionary, understand the definition through the fact that these new words are described by other terms we do understand. We have supplemented our knowledge and can then go on to interpret further meanings in the future. Meaning is derived through understanding a sequence of symbols, e.g. the example above is an English language grammatical structure in the form: “subject-verb-object.”
We look for meanings through the structure and placement of words in sentences, which in turn give us context. Words often have several meanings, and thus several definitions, dependent on their context. Following hyperlinks in an online dictionary from the word ‘semantic’ (e.g. to ‘relating’) leads us to another definition, and we could go on and on. Thus we can say a dictionary is ‘an ontology’ of language; in other words it is ‘self-referencing.’ http://dictionary.reference.com/ (definitions below)
‘Semantic’ adj 1. of or relating to meaning or arising from distinctions between the meanings of different words or symbols
‘relating’–verb (used with object)
1. to tell; give an account of (an event, circumstance, etc.).
2. to bring into or establish association, connection, or relation: to relate events to probable causes.
–verb (used without object)
3. to have reference (often fol. by to ).
4. to have some relation (often fol. by to ).
5. to establish a social or sympathetic relationship with a person or thing: two sisters unable to relate to each other.
If we apply the same principle to data and documents stored on the web, it stands to reason that we would want to establish connections between data to give it meaning. The creator and the consumer of that data could agree the meaning through reference to the XML schema in place, but would we want to do that with every piece of data on the web? In conversation we would have to set the rules each time we met someone new. Moreover, computers cannot gain knowledge about real-world objects in the same way the human brain does... or can they?
Establishing the relationship between an object and a subject, we could say, is a one-to-one relationship. The W3C has been working on the technologies and standards needed to facilitate the idea of semantically describing ‘objects’, ‘subjects’ and the relationships between them for data stored on web pages or in databases.
The Semantic Web provides a common framework that allows data to be shared and reused across application, enterprise, and community boundaries. It is a collaborative effort led by W3C with participation from a large number of researchers and industrial partners. It is based on the Resource Description Framework (RDF). See also the separate FAQ for further information.
The Semantic Web is about two things. It is about common formats for integration and combination of data drawn from diverse sources, where on the original Web mainly concentrated on the interchange of documents. It is also about language for recording how the data relates to real world objects. That allows a person, or a machine, to start off in one database, and then move through an unending set of databases which are connected not by wires but by being about the same thing.
Quoted from: http://www.w3.org/2001/sw/ Accessed 28-11-2010
We may type the word ‘apple’ into a search engine: do we want a picture of an apple, or the computer manufacturing giant Apple (Granny Smith or MacBook Pro)? Type ‘Orange’ into Google and we may get a picture of a juicy orange-coloured fruit or the home page of the phone company Orange (Seville, Mandarin, or a 12-month Care Protection Plan for an iPhone 4). Matching text strings has been the traditional mechanism for recovering related material; however, taken out of context, these words have different meanings dependent on the context within which they are used.
For many years now we have been hearing that the semantic web is just around the corner. In 2008 Tim Berners-Lee declared the semantic web "open for business" (Paul Miller, 2008). The reality for most libraries, however, is that we are still grappling with 2.0 technologies. Few among us have yet embraced web 3.0, also known as the web of linked data, or the semantic web. The promise of the semantic web is a dazzling one. By marking up information in standardized, highly structured formats like Resource Description Framework (RDF), we can allow computers to better "understand" the meaning of content, rather than simply matching on strings of text. This would allow web search engines to function more like relational databases, providing much more accurate search results - the ability to distinguish between a book that is written about a person, as opposed to a book that is written by a person, for example. For most librarians this concept is fairly easy to understand. We have been creating highly structured machine-readable metadata for many years, after all, and we already understand the benefits.
The second part of the linked data vision is where things really begin to get heady. By linking our data to shared ontologies that describe the properties and relationships of objects, we begin to allow computers not just to "understand" content, but also to derive new knowledge by "reasoning" about that content. As a simple example: Shakespeare wrote Macbeth. "Wrote" is the inverse of "WrittenBy" therefore Macbeth was written by Shakespeare. The real power of the semantic web lies in this ability for "intelligent" search engines to disambiguate terms (Apple the computer vs. apple the fruit, for example), to understand the relationships between different entities, and to bring that information together in new ways to answer queries. E.g., Show me all of the articles that have been written by people who have ever worked at any of the same institutions as Lisa Goddard.
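The inverse-property step can be sketched as a rule applied to a set of facts; a minimal Python illustration (the relation names `wrote`/`writtenBy` follow the Shakespeare example above):

```python
# Known facts, plus one ontology rule: "wrote" is the inverse of "writtenBy".
facts = {("Shakespeare", "wrote", "Macbeth")}
INVERSES = {"wrote": "writtenBy"}

# Derive the new facts implied by the inverse rule and add them.
derived = {(o, INVERSES[p], s) for s, p, o in facts if p in INVERSES}
facts |= derived

# The machine now "knows" something it was never explicitly told.
assert ("Macbeth", "writtenBy", "Shakespeare") in facts
```

A real reasoner applies many such rules (inverses, subclasses, transitivity) repeatedly until no new facts emerge, but the principle is this one line of set arithmetic.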
Introducing Appropriate Technologies to Enable The Semantic Web
XML has developed as a markup language to define elements of data and to allow sharing of data between applications by giving data user-defined tags, e.g. in a music database using XML tags to label elements such as:
<name>Michael Jackson</name>
<TrackName>Bad</TrackName>
<Year>1987</Year>
However, in the global sense of the web, machines do not necessarily know that <name> relates to the name of a music artist; it could perhaps be used in another database, as in <name>Colorado</name>, to define a place name.
A <TrackName> defined using an XML tag could be equally ambiguous: the name of a racing track in motorsport? How would a computer know?
Thus the tags themselves need metadata attached so as to define what each piece of data specifically means in that particular context. Machines not only need to read the correct pieces of data, but to understand them, removing any ambiguity as to a piece of information's meaning; this is something we have long done in spoken and written language, i.e. we do not have to guess.
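One way this ambiguity is removed in practice is with XML namespaces; a minimal sketch, assuming two hypothetical vocabulary URIs (the example.org addresses are placeholders):

```python
import xml.etree.ElementTree as ET

# Two <name> elements mean different things; the namespace URI tells the
# machine which vocabulary each belongs to.
doc = """<catalogue xmlns:music="http://example.org/music"
                    xmlns:geo="http://example.org/places">
  <music:name>Michael Jackson</music:name>
  <geo:name>Colorado</geo:name>
</catalogue>"""

root = ET.fromstring(doc)
artist = root.find("{http://example.org/music}name").text
place = root.find("{http://example.org/places}name").text
print(artist, place)  # Michael Jackson Colorado
```

The prefix (`music:`, `geo:`) is only local shorthand; what the machine actually matches on is the full namespace URI, which is globally unique.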
XML is now an established tool allowing machine-readable data to be passed between web applications. XML schemas define the structure of the XML document, while XML parsers read the data so it can be displayed in other websites (RSS feeds carry information written in XML, for instance the BBC weather RSS feed). 
The subject now gets rather involved, and over the next 3 posts I will attempt to summarise the main technologies of the Semantic Web. These underlying technologies, being developed under the direction of the W3C, are tools that sit on top of XML in a 'stack' and allow data on the web to be semantically described:
  • RDF - Resource Description Framework
  • URIref - Uniform Resource Identifier reference
  • RDFS - RDF Schema
  • OWL - Web Ontology Language

Wednesday, 24 November 2010

Extensible Markup Language - XML Introduction

Web Services and APIs allow machines to read data over the internet, and one of the most prominent ways to achieve this is by using XML (eXtensible Markup Language). Not strictly a language, XML, similarly to HTML, allows information stored on the web to be described in such a way that computers know what that information refers to. It is referred to as self-describing, as data is marked up in a way that makes it possible for humans to define and computers to understand, read and pass around.
HTML allows text elements to be marked up for formatting, or hyperlinks to be defined, and CSS controls the look and style of web pages, but neither can help to provide structure to the data so it can be used effectively by computers. Computers cannot guess what we mean when we add a title to a photograph, or names and addresses in an address book.

Background
Originally markup was performed on information on the web using SGML - Standard Generalised Mark-up Language.
It was complex, difficult to master, and had a limited (and often expensive) toolset. To give the web the power of SGML without the complication, a W3C working group set out to simplify SGML, and in 1998 it produced Extensible Markup Language. While XML was originally meant to be a replacement for (or at least a supplement to) HTML as hypertext on the web, it settled instead into the role of a format for exchanging data between programs and became a key component of web services.
Governor et al (2009 p. 23)

Data stored on the web is said to be unstructured. HTML web pages mark up elements of text to define hyperlinks, images, and text; later came CSS, which sits as a separate file called by the browser to describe the look and layout of the web page. There would be no way to know exactly what the different sections or pieces of information on that web page meant unless we were expressly told.

For example, say we create a web page with photos of our holiday, labelled with where they were taken using an HTML tag such as <img src="Big Ben.gif" alt="Big Ben London" />, and then use an API such as Google Maps to place a marker on Big Ben in London. There would be little for a computer reading the tag to know that Big Ben was a landmark in London; a guess would most likely fail.

Structuring a document as an XML file allows the parts of a web page (e.g. a caption of a photograph, place where it was taken, and the date) to be described semantically, using a syntax, which can be written by the author and understood by a computer wishing to access that data remotely and pass it around on the web.

XML Structure and Syntax

XML allows elements to be defined using the <> brackets, within which we use tags of our own choosing, and/or attributes, to describe an object. So, using the example of the photograph above, we could create an XML document as follows (note: tags MUST be closed in XML; there is no forgiveness as with HTML):

<?xml version="1.0" encoding="UTF-8"?>
<Photos>
 <name>Big Ben</name>
 <location>London</location>
 <date>01-04-2010</date>
</Photos>
The first line is an optional declaration, put there to state that this is an XML document and the version of XML; additionally, a character encoding can be specified (e.g. UTF-8). The elements described must sit inside a root element, e.g. <Photos>. An attribute could be used instead: rather than the child element <date>01-04-2010</date>, the date could be written as an attribute of an element, e.g. <Photos date="01-04-2010">, with the element still closed at the end with </Photos>. To describe what the tags in an XML document mean, in order to remove any ambiguity, an XML schema or Document Type Definition (DTD) is used (more later on XML Schemas).
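A minimal sketch of a program reading the photo document above with Python's standard library, showing both the child-element form and an attribute form of the date:

```python
import xml.etree.ElementTree as ET

# The photo document from above. It is passed as bytes because the XML
# declaration carries an encoding, which ElementTree rejects on str input.
xml_doc = b"""<?xml version="1.0" encoding="UTF-8"?>
<Photos>
  <name>Big Ben</name>
  <location>London</location>
  <date>01-04-2010</date>
</Photos>"""

root = ET.fromstring(xml_doc)
print(root.find("name").text)      # Big Ben
print(root.find("location").text)  # London

# The same date expressed as an attribute on the element instead:
alt = ET.fromstring('<Photos date="01-04-2010"><name>Big Ben</name></Photos>')
print(alt.get("date"))             # 01-04-2010
```

Whether to use a child element or an attribute is a design choice; attributes suit small single values, elements suit anything that may need structure of its own later.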
Additionally, references allow extra markup within XML documents, permitting the inclusion of reserved characters or additional text. References always begin with the character “&” (which is specially reserved) and end with the character “;”. E.g. &quot; allows the " character to be used without causing a conflict in the syntax.



Paraphrased from http://www.xmlnews.org/docs/xml-basics.html Accessed 24-11-2010
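Python's standard library also provides helpers for producing these escaped references; a minimal sketch:

```python
from xml.sax.saxutils import escape, quoteattr

# Escape reserved characters so that &, <, > and quotes do not break
# the XML syntax when text is embedded in a document.
caption = 'The clock tower known as "Big Ben" & its bell'

print(escape(caption))
# The clock tower known as "Big Ben" &amp; its bell

print(escape(caption, {'"': "&quot;"}))
# The clock tower known as &quot;Big Ben&quot; &amp; its bell

print(quoteattr(caption))  # quoted form, safe to drop into an attribute
```

By default `escape` handles `&`, `<` and `>`; quote characters are escaped only when you ask for them, as in the second call.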



The benefits of using XML

  • XML can keep data separated from your HTML
  • XML can be used to store data inside HTML documents
  • XML can be used as a format to exchange information
  • XML can be used to store data in files or in databases
Additionally, XML works on any platform, removing compatibility issues. It is free (and works with all browsers, albeit with slight differences in debugging with IE) and is supported extensively through forums, tutorials and by the W3C, so help can be sought in the wider web community (a nice Web 2.0 concept). It is part of a family of standards built on XML, e.g. dialects such as XHTML, which allow further control and functionality over how XML documents can be used.

For example XSL is the advanced language for expressing style sheets. (http://www.w3.org/XML/1999/XML-in-10-points Accessed 24-11-2010)



W3C's Resource Description Framework (RDF) is an XML text format that supports resource description and metadata applications, such as music playlists, photo collections, and bibliographies. For example, RDF might let you identify people in a Web photo album using information from a personal contact list; then your mail client could automatically start a message to those people stating that their photos are on the Web. Just as HTML integrated documents, images, menu systems, and forms applications to launch the original Web, RDF provides tools to integrate even more, to make the Web a little bit more into a Semantic Web.
http://www.w3.org/XML/1999/XML-in-10-points Accessed 24-11-2010