Tuesday, 30 November 2010

Web 2.0 - Definition of Open Standards


Open Standards generally have the following characteristics:
·      They are not controlled by one private entity, and they can’t be changed at the will of any one entity without an input process that facilitates consideration of the points of view and input of others.
·      They’re developed by organisations that operate with an open and transparent process, allowing stakeholders to have a say in their development.
·      They’re not encumbered by any patents or other intellectual property claims that result in unfair distribution of the ability to implement them, whether for commercial or non-commercial purposes.
·      They’re designed to benefit the whole community of users rather than one specific subset of users for financial or other gains.
Governor, J., Hinchcliffe, D., & Nickull, D. (2009). Web 2.0 Architectures. Sebastopol, CA: O'Reilly.

The Semantic Web Technologies - Resource Description Framework (RDF)


While XML allows tagging of data and resources stored on the web in machine-readable format, the Resource Description Framework (RDF) sits on top as a technology employed to create the semantic meaning of those tags. The W3C has proposed RDF as the tool for giving web-based resources, and the data contained within them, the meaning they currently lack. In other words, RDF has been developed to standardise the syntax of the metadata applied to web resources and data. (Chowdhury 2007, p. 201)
The W3C drives the standards for RDF, forming a common approach to identifying web resources, and is developing a set of common tools to allow any web content to use the technology. The working group also ensures compatibility with current technologies (HTML, XHTML, web browsers etc.) and acts as a free resource for the web community, ensuring conformance with the RDF specifications.
The mission of the RDFa Working Group, part of the Semantic Web Activity is to support the developing use of RDFa for embedding structured data in Web documents in general. The Working Group will publish W3C Recommendations to extend and enhance the currently published RDFa 1.0 documents, including an API. The Working Group will also support the HTML Working Group in its work on incorporating RDFa in HTML5 and XHTML5.
RDF is based on the idea of identifying things using Web identifiers (called Uniform Resource Identifiers, or URIs), and describing resources in terms of simple properties and property values. This enables RDF to represent simple statements about resources as a graph of nodes and arcs representing the resources, and their properties and values.
G. G. Chowdhury describes RDF in simple terms as "the data model for writing simple statements about web resources" (Chowdhury 2007, p. 201).
Uniform Resource Identifiers – URIs
Similar to a URL (the Uniform Resource Locator used to uniquely identify web pages), a URI is a unique label describing a web resource. Using URIs ensures there will be no duplication of descriptions; however, unlike a URL, which only refers to a location, a URI can describe a resource or piece of information on the web, or in fact anything in the real world, regardless of location. The specific form of URI used in RDF is the URIref, which takes a form such as http://en.wikipedia.org/wiki/URI#Examples_of_URI_references, where:
"http" specifies the 'scheme' name,
"en.wikipedia.org" is the 'authority',
"/wiki/URI" is the 'path' pointing to the article, and "#Examples_of_URI_references" is a 'fragment' pointing to a section within it.
RDF Triples
An RDF statement is built up from URIrefs and relies on 'triples' consisting of a subject, a predicate and an object.
The subject: a resource on the internet, e.g. a resource about music:
             http://christopherdbrook.com/blog/  (my blog)
The object: the value being assigned, e.g. Chris Brook, as I am the author of the blog. An object can also be expressed as a URIref, such as the JPEG image of my ugly mug displayed on the blog for visitors to see:
      http://christopherdbrook.com/blog/images/CB.jpeg
The predicate: the property linking the subject to the object, such as creator. In other words we could say:
http://christopherdbrook.com/blog/  (my blog) has a property called creator which has a value = Chris Brook
http://christopherdbrook.com/blog/  (my blog) has a property called language which has a value = en (the standard abbreviation for English)
http://christopherdbrook.com/blog/  (my blog) has a property called creation date which has a value = 20-10-2010
As the concept of the Semantic Web is a web where machines can read and understand the meaning of all the resources contained therein, we have to use the syntax the W3C has chosen for RDF, in which statements are arranged as 'triples':
<http://christopherdbrook.com/blog>
<http://purl.org/dc/elements/1.1/creator>
<http://christopherdbrook.com/blog/images/CB.jpeg>

<http://christopherdbrook.com/blog>
<http://christopherdbrook.com/blog/creation-date>
"20-10-2010"

<http://christopherdbrook.com/blog>
<http://purl.org/dc/elements/1.1/language>
"en"
 
The http://purl.org/dc/elements/1.1/ URI is a 'namespace': a base URI to which a term (such as creator) is appended to create a unique identifier. This namespace comes from the 'Dublin Core Metadata Initiative' (explained below).
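To make the triples above concrete, here is a minimal sketch using the third-party Python library rdflib (an assumption on my part: any RDF toolkit would do), which ships with a built-in binding for the Dublin Core elements namespace:

from rdflib import Graph, Literal, URIRef
from rdflib.namespace import DC  # DC = http://purl.org/dc/elements/1.1/

g = Graph()
blog = URIRef("http://christopherdbrook.com/blog")

g.add((blog, DC.creator, URIRef("http://christopherdbrook.com/blog/images/CB.jpeg")))
g.add((blog, URIRef("http://christopherdbrook.com/blog/creation-date"), Literal("20-10-2010")))
g.add((blog, DC.language, Literal("en")))

# Serialise as N-Triples: one subject-predicate-object statement per line
print(g.serialize(format="nt"))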
An RDF graph is a graphical representation of these RDF statements; the example below depicts the three RDF statements above about my music blog website:

[RDF graph image]
This illustrates that objects in RDF statements may be either URIrefs, or constant values (called literals) represented by character strings, in order to represent certain kinds of property values. (In the case of the predicate http://purl.org/dc/elements/1.1/language the literal is an international standard two-letter code for English.)

Literals may not be used as subjects or predicates in RDF statements. In drawing RDF graphs, nodes that are URIrefs are shown as ellipses, while nodes that are literals are shown as boxes. (The simple character string literals used in these examples are called plain literals, to distinguish them from typed literals, which are introduced later in the primer. The various kinds of literals that can be used in RDF statements are defined in [RDF-CONCEPTS]. Both plain and typed literals can contain Unicode [UNICODE] characters, allowing information from many languages to be directly represented.)
 
Paraphrased from http://www.w3.org/TR/rdf-primer/ Accessed 30-11-2010
Dublin Core Metadata Initiative
An organisation called the Dublin Core Metadata Initiative was set up in 1995, named after the location of the group's first meeting in Dublin, Ohio. The organisation aimed to standardise the way we describe resources, in the same manner that library catalogues follow standard formats for describing library resources,
such as MARC 21. MARC 21 provides the protocol by which computers exchange, use, and interpret bibliographic information; its data elements make up the foundation of most library catalogues used today. (http://en.wikipedia.org/wiki/MARC_standards Accessed 28-11-2010)
Dublin Core is an initiative to create a digital "library card catalog" for the Web. Dublin Core is made up of 15 metadata elements that offer expanded cataloging information and improved document indexing for search engine programs. http://searchsoa.techtarget.com/definition/Dublin-Core Accessed 28-11-2010
The Dublin Core prescribes 15 data elements that can be used as containers for metadata. They are designed to be an agreed standard for describing resources, and are described by ISO Standard 15836 and NISO Standard Z39.85-2007.
Each term used in DCMI to describe a resource is itself specified with a minimal set of attributes:
Name: A token appended to the URI of a DCMI namespace to create the URI of the term.  
Label: The human-readable label assigned to the term.
URI: The Uniform Resource Identifier used to uniquely identify a term.
Definition: A statement that represents the concept and essential nature of the term.
Type of Term: The type of term as described in the DCMI Abstract Model [DCAM].
The full 15 fields for describing a resource as set out in DCMI are:
  1. Title
  2. Creator
  3. Subject
  4. Description
  5. Publisher
  6. Contributor
  7. Date
  8. Type
  9. Format
  10. Identifier
  11. Source
  12. Language
  13. Relation
  14. Coverage
  15. Rights  
The DCMI Type Vocabulary provides a general, cross-domain list of approved terms that may be used as values for the Resource Type element to identify the genre of a resource. So if we need to apply metadata to an image stored on a website we would look at the following rules and regulations set out under Dublin Core:
Term Name:  Image 
Label: Image
Definition: A visual representation other than text.
Comment: Examples include images and photographs of physical objects, paintings, prints, drawings, other images and graphics, animations and moving pictures, film, diagrams, maps, musical notation. Note that Image may include both electronic and physical representations. 
Type of Term: Class 
In order to represent subjects, objects and predicates as XML documents, we can use an XML implementation of RDF (RDF/XML), which uses the DCMI URIrefs as above. To represent the image of me on my blog as an RDF/XML statement we could write:

  1. <rdf:RDF
  2. xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  3. xmlns:dcterms="http://purl.org/dc/terms/">
  4. <rdf:Description rdf:about="http://christopherdbrook.com/blog/cb.jpg">
  5. <dcterms:type rdf:resource="http://purl.org/dc/dcmitype/Image"/>
  6. </rdf:Description>
  7. </rdf:RDF>

In plain English:
Line 1 opens the rdf:RDF root element,
Line 2 declares the RDF namespace, telling the parser this is an RDF/XML description,
Line 3 declares the Dublin Core terms namespace, the set of predicates in use,
Line 4 identifies the subject (the image saved under my blog URL),
Line 5 expresses the predicate (dcterms:type), whose value is the DCMI type Image, described by DCMI as:

“Examples include images and photographs of physical objects, paintings, prints, drawings, other images and graphics, animations and moving pictures, film, diagrams, maps, musical notation.  Note that Image may include both electronic and physical representations.” http://dublincore.org/2010/10/11/dctype.rdf - Image

Lines 6 and 7 close the Description and the root rdf:RDF element respectively.
Paraphrased from DITA Session 08 Lecture Notes, Butterworth 2010
The DCMI has essentially provided the rules for defining objects, subjects and predicates through URIrefs. This allows only a discrete number of metadata fields to be defined, and as they are URIs they are all completely unique.
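To tie the two views together, the RDF/XML above can be parsed back into subject-predicate-object triples. A minimal sketch, again assuming the Python rdflib library:

from rdflib import Graph

rdf_xml = """<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:dcterms="http://purl.org/dc/terms/">
  <rdf:Description rdf:about="http://christopherdbrook.com/blog/cb.jpg">
    <dcterms:type rdf:resource="http://purl.org/dc/dcmitype/Image"/>
  </rdf:Description>
</rdf:RDF>"""

g = Graph()
g.parse(data=rdf_xml, format="xml")  # "xml" is rdflib's name for RDF/XML

# Each parsed statement comes back as one (subject, predicate, object) triple
for subject, predicate, obj in g:
    print(subject, predicate, obj)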
......phew a little bit heavy, I need a coffee and a fag and I'll come back and explain the next big thing, creating a taxonomy of RDF statements for a certain domain:
The RDF Schema !!
 
 
 
 

DITA Session 08 - The Semantic Web - Why Semantics?


The Semantic Web as a concept is simple by definition: to make resources and data accessible and understandable to computers within the context in which they reside. It is inherently complex, however, as we are attempting to ask computers to define the entire body of knowledge held on the web. In the real world this is something our human brains do on a daily basis. We take simple pieces of information: say Tom hears Suey and me talking in the pub about music; I say I am loving the Sex Pistols, and Suey says he wouldn't like to meet Sid Vicious down a dark alley. Tom may infer:
Chris likes Punk Rock Music
Punk Rock Music scares Suey
Through our ability to gain knowledge and learn what words mean, Tom knows that Chris and Suey are people, 'likes' and 'scares' are verbs, and 'Punk Rock Music' is a particular type of music (that is not to everyone's taste!). A moderately educated person like Tom can understand what these terms mean and understand the construction of the sentence. Thus, through logical reasoning, Tom can answer questions like: 'Who likes Punk Rock Music?'
If we really don't understand the meaning of words, we have tools to help us. We can look up a word in a dictionary and understand the definition because the new word is described by other terms we do understand. We have supplemented our knowledge and can then go on to interpret further meanings in the future. Meaning is derived through understanding a sequence of symbols; e.g. the example above is an English-language grammatical structure in the form "subject-verb-object."
We look for meanings through the structure and placement of words in sentences, which in turn gives us context. Words often have several meanings, and thus several definitions, dependent on their context. Following hyperlinks in an online dictionary from the word 'semantic' (e.g. to 'relating') leads us to another definition, and we could go on and on; thus we can say a dictionary is an 'ontology' of language; in other words, it is self-referencing. http://dictionary.reference.com/ (definition 1 below)
‘Semantic’ adj 1. of or relating to meaning or arising from distinctions between the meanings of different words or symbols
‘relating’–verb (used with object)
1. to tell; give an account of (an event, circumstance, etc.).
2. to bring into or establish association, connection, or relation: to relate events to probable causes.
–verb (used without object)
3. to have reference (often fol. by to ).
4. to have some relation (often fol. by to ).
5. to establish a social or sympathetic relationship with a person or thing: two sisters unable to relate to each other.
If we then apply the same principle to data and documents stored on the web, it stands to reason that we would want to establish connections between data to give it meaning. The creator and the consumer of that data could agree the meaning through reference to the XML schema in place, but would we want to do that with every piece of data on the web? In conversation we would have to set the rules each time we met someone new. Moreover, computers cannot gain knowledge about real-world objects in the same way as the human brain... or can they?
Establishing objects and their relationships to subjects is, we could say, a one-to-one relationship. The W3C has been working on the technologies and standards needed to facilitate the idea of semantically describing 'objects', 'subjects' and the relationships between them, for data stored in web pages or in databases.
The Semantic Web provides a common framework that allows data to be shared and reused across application, enterprise, and community boundaries. It is a collaborative effort led by W3C with participation from a large number of researchers and industrial partners. It is based on the Resource Description Framework (RDF). See also the separate FAQ for further information.
The Semantic Web is about two things. It is about common formats for integration and combination of data drawn from diverse sources, where on the original Web mainly concentrated on the interchange of documents. It is also about language for recording how the data relates to real world objects. That allows a person, or a machine, to start off in one database, and then move through an unending set of databases which are connected not by wires but by being about the same thing.
Quoted from: http://www.w3.org/2001/sw/ Accessed 28-11-2010
We may type the word 'apple' into a search engine: do we want a picture of an apple, or the computer-manufacturing giant Apple (Granny Smith or MacBook Pro)? Type 'Orange' into Google and we may get a picture of a juicy orange-coloured fruit or the home page of the phone company Orange (Seville, Mandarin, or a 12-month care protection plan for an iPhone 4). Matching text strings has been the traditional mechanism for recovering related material; however, these words have different meanings depending on the context within which they are used.
For many years now we have been hearing that the semantic web is just around the corner. In 2008 Tim Berners-Lee declared the semantic web "open for business" (Paul Miller, 2008). The reality for most libraries, however, is that we are still grappling with 2.0 technologies. Few among us have yet embraced web 3.0, also known as the web of linked data, or the semantic web. The promise of the semantic web is a dazzling one. By marking up information in standardized, highly structured formats like Resource Description Framework (RDF), we can allow computers to better "understand" the meaning of content, rather than simply matching on strings of text. This would allow web search engines to function more like relational databases, providing much more accurate search results - the ability to distinguish between a book that is written about a person, as opposed to a book that is written by a person, for example. For most librarians this concept is fairly easy to understand. We have been creating highly structured machine-readable metadata for many years, after all, and we already understand the benefits.
The second part of the linked data vision is where things really begin to get heady. By linking our data to shared ontologies that describe the properties and relationships of objects, we begin to allow computers not just to "understand" content, but also to derive new knowledge by "reasoning" about that content. As a simple example: Shakespeare wrote Macbeth. "Wrote" is the inverse of "WrittenBy" therefore Macbeth was written by Shakespeare. The real power of the semantic web lies in this ability for "intelligent" search engines to disambiguate terms (Apple the computer vs. apple the fruit, for example), to understand the relationships between different entities, and to bring that information together in new ways to answer queries. E.g., Show me all of the articles that have been written by people who have ever worked at any of the same institutions as Lisa Goddard.
Introducing Appropriate Technologies to Enable The Semantic Web
XML has developed as a markup language to define elements of data and allow sharing of data between applications by giving it a user-defined tag, e.g. in a music database, using XML tags to label elements such as:
<name>Michael Jackson</name>
<TrackName>Bad</TrackName>
<Year>1987</Year>
However, in the global sense of the web, machines do not necessarily know that <name> relates to the name of a music artist; it could perhaps be used in another database as <name>Colorado</name> to define a place name.
<TrackName> defined using an XML tag could be equally ambiguous: the name of a racing track in motorsport? How would a computer know?
Thus the tags themselves need metadata attached to define what each piece of data specifically means in that particular context. Machines need not only to read the correct pieces of data, but to understand them. Removing any ambiguity about a piece of information's meaning is something we have long done in spoken and written language, i.e. we do not have to guess.
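XML namespaces are the standard mechanism for this kind of disambiguation. A minimal sketch in Python (the two namespace URIs are hypothetical) showing how a parser keeps the two <name> tags apart:

import xml.etree.ElementTree as ET

doc = """<catalogue xmlns:music="http://example.org/music#"
                    xmlns:place="http://example.org/places#">
  <music:name>Michael Jackson</music:name>
  <place:name>Colorado</place:name>
</catalogue>"""

root = ET.fromstring(doc)
for element in root:
    # ElementTree expands each prefix to {namespace-URI}localname,
    # so the two <name> elements are no longer ambiguous
    print(element.tag, "->", element.text)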
XML is now an established tool allowing machine-readable data to be passed between web applications: XML schemas define the structure of an XML document, while XML parsers read the data so it can be displayed on other websites (RSS feeds are written in XML; the BBC weather RSS feed, for instance).
The subject now gets rather involved, and in the next three posts I will attempt to summarise the main technologies of the Semantic Web. These underlying technologies, being developed under the direction of the W3C, are tools that sit on top of XML in a 'stack' and allow data on the web to be semantically described:
  • RDF - Resource Description Framework
  • URIref - Uniform Resource Identifier reference
  • RDFS - RDF Schema
  • OWL - Web Ontology Language

Wednesday, 24 November 2010

Extensible Markup Language - XML Introduction

Web services and APIs allow machines to read data over the internet, and one of the most prominent ways to achieve this is XML (eXtensible Markup Language). Not strictly a language, XML, like HTML, allows information stored on the web to be described in such a way that computers know what that information refers to. It is referred to as self-describing, as data is marked up in a way humans can define and computers can understand, read and pass around.
HTML allows text elements to be marked up for formatting, or hyperlinks to be defined, and CSS controls the look and style of web pages, but neither can help to provide structure to the data so it can be used effectively by computers. Computers cannot guess what we mean when we add a title to a photograph, or names and addresses in an address book.

Background
Originally markup was performed on information on the web using SGML - Standard Generalised Mark-up Language.
It was complex, difficult to master, and had a limited (and often expensive) toolset. To give the web the power of SGML without the complication, a W3C working group set out to simplify SGML. In 1998 it produced Extensible Markup Language. While XML was originally meant to be a replacement for (or at least a supplement to) HTML as hypertext on the web, it settled instead into a role as a format for exchanging data between programs and became a key component of web services.
Governor et al. (2009, p. 23)

Data stored on the web is said to be unstructured; that is to say, HTML web pages mark up elements of text to define hyperlinks, images and text, and later came CSS, which sits in a separate file called by the browser to describe the look and layout of the web page. There would be no way to know exactly what the different sections or pieces of information on that web page meant unless we were expressly told.

For example, suppose we create a web page with photos of our holiday, labelled with where they were taken using an HTML tag such as <img src="Big Ben.gif" alt="Big Ben London" />, and then use an API such as Google Maps to place a marker on Big Ben in London. There would be little way for a computer reading the tag to know that Big Ben is a landmark in London; a guess would most likely fail.

Structuring a document as an XML file allows the parts of a web page (e.g. the caption of a photograph, the place where it was taken, and the date) to be described semantically, using a syntax that can be written by the author and understood by a computer wishing to access that data remotely and pass it around on the web.

XML Structure and Syntax

XML allows elements to be defined using angle brackets <>, within which we use tags of our own choosing, and/or attributes, to describe an object. So, using the example of the photograph above, we could create an XML document as follows (note: tags MUST be closed in XML; there is no forgiveness as with HTML):

<?xml version="1.0" encoding="UTF-8"?>
<Photos>
 <name>Big Ben</name>
 <location>London</location>
 <date>01-04-2010</date>
</Photos>
 The first line is an optional declaration stating that this is an XML document and the version of XML; additionally, the character encoding can be specified (e.g. UTF-8). The elements described must sit inside the root element, e.g. <Photos>. The date could instead be expressed as an attribute: attributes always belong to an element, so rather than a separate <date>01-04-2010</date> element, we could write, say, <photo date="01-04-2010">, closing that element at the end with </photo>. To describe what the tags in an XML document mean, and remove any ambiguity, an XML Schema or Document Type Definition (DTD) is used (more later on XML Schemas).
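As a quick illustration of the machine-readable side, here is a minimal sketch using Python's standard xml.etree.ElementTree module to read the Photos document above (bytes input is used because the document carries an encoding declaration):

import xml.etree.ElementTree as ET

doc = b"""<?xml version="1.0" encoding="UTF-8"?>
<Photos>
 <name>Big Ben</name>
 <location>London</location>
 <date>01-04-2010</date>
</Photos>"""

photos = ET.fromstring(doc)  # parse the document; 'Photos' is the root element
print(photos.find("name").text)      # Big Ben
print(photos.find("location").text)  # London
print(photos.find("date").text)      # 01-04-2010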
Additionally, entity references can add extra text or markup to XML documents. References always begin with the character "&" (which is specially reserved) and end with the character ";", e.g. &quot;, which allows the " (double quote) character to be used without causing a conflict in the syntax.



Paraphrased from http://www.xmlnews.org/docs/xml-basics.html Accessed 24-11-2010



The benefits of using XML

  • XML can keep data separated from your HTML
  • XML can be used to store data inside HTML documents
  • XML can be used as a format to exchange information
  • XML can be used to store data in files or in databases
Additionally, XML works on any platform, removing compatibility issues. It is free (and works with all browsers, albeit with slight differences in debugging with IE) and supported extensively through forums, tutorials and by the W3C, so help can be sought in the wider web community (a nice Web 2.0 concept). It is part of a family of standards built on XML, e.g. dialects such as XHTML, which allow further control and functionality over how XML documents can be used.

For example XSL is the advanced language for expressing style sheets. (http://www.w3.org/XML/1999/XML-in-10-points Accessed 24-11-2010)



W3C's Resource Description Framework (RDF) is an XML text format that supports resource description and metadata applications, such as music playlists, photo collections, and bibliographies. For example, RDF might let you identify people in a Web photo album using information from a personal contact list; then your mail client could automatically start a message to those people stating that their photos are on the Web. Just as HTML integrated documents, images, menu systems, and forms applications to launch the original Web, RDF provides tools to integrate even more, to make the Web a little bit more into a Semantic Web.
http://www.w3.org/XML/1999/XML-in-10-points Accessed 24-11-2010

Saturday, 20 November 2010

DITA Session 06 - Introduction to Web Services and Application Program Interfaces (APIs)

Software has traditionally been a product: marketed, packaged and sold off the shelf, in the same fashion as most durables, with an array of features to serve the needs of everyone from the casual home user and small business through to the multinational company (and anything in-between). Competing software houses would aim to pack in as many features as possible to provide the most comprehensive product to their target audience, in an attempt to maximise profit. Software developers have traditionally designed software to be installed and run locally from a computer or server based in-house, with the view that they must cater for all needs, which for the majority of users means paying for many features that will never be used. A far more efficient and cost-effective model would be to pay for and consume only the services we require from software, without the extras we will never use. Moreover, we can envisage a user wanting to consume multiple services, thus creating their own unique bespoke service, i.e. the sum of many parts tailored to a certain need.

Integration between software products is a feature of many off-the-shelf products. For example, Bentley's MicroStation, a CAD package, is fully integrated with Bentley ProjectWise, an EDRMS specifically for managing the .dgn drawing files produced by MicroStation. Bentley ProjectWise allows metadata about engineering drawings in the database to populate drawing tags, such as title, version, scale, designer etc., and appear in the drawing title block. The ability to exchange data between two programs in this manner is nothing new or particularly complicated. However, this principle has been exploited to much greater extent with data stored on the web being shared using software built on Web 2.0 architectures. Network availability and increased bandwidth now allow Web 2.0 architecture to redefine what is possible in terms of sharing data using 'web services'.

Web services are created and offered over the internet to anyone who wishes to consume the service in the form of machine-readable data. As the software service resides on a web server, someone wishing to use that service needs to connect to the internet and send a request. The idea of web services being consumed in this way has been coined 'cloud computing', or as Governor et al. (2009) define it: "Cloud Computing refers to treating computing resources as virtualised, metered services, similar from a consumer's perspective to how we consume other utilities (such as water, gas, electricity, and pay per view cable)" (Governor et al. 2009, p. 127)

A definition from WhatIS.com:
Cloud computing is a general term for anything that involves delivering hosted services over the Internet. These services are broadly divided into three categories: Infrastructure-as-a-Service (IaaS), Platform-as-a-Service (PaaS) and Software-as-a-Service (SaaS). The name cloud computing was inspired by the cloud symbol that's often used to represent the Internet in flowcharts and diagrams.
A cloud service has three distinct characteristics that differentiate it from traditional hosting. It is sold on demand, typically by the minute or the hour; it is elastic -- a user can have as much or as little of a service as they want at any given time; and the service is fully managed by the provider (the consumer needs nothing but a personal computer and Internet access). Significant innovations in virtualization and distributed computing, as well as improved access to high-speed Internet and a weak economy, have accelerated interest in cloud computing.
http://searchcloudcomputing.techtarget.com/sDefinition/0,,sid201_gci1287881,00.html

One type of cloud computing phenomenon that has emerged over the last five years, Software as a Service (SaaS), delivers computational functionality to users without them having to persist the entire application or system on their computers. (Governor et al. 2009)

Web Services are a type of API. An application program interface (API - and sometimes spelled application programming interface) is the specific method prescribed by a computer operating system or by an application program by which a programmer writing an application program can make requests of the operating system or another application.

http://searchexchange.techtarget.com/definition/application-program-interfac Accessed 20-11-2010

Examples of APIs (or web services)

http://www.tutorialspoint.com/webservices/what_are_web_services.htm gives a good overview of web services and a good technical explanation of the components and workings of web services.


Web Services have Two Types of Uses:

1. Reusable application-components.
  • There are things applications need very often. So why make these over and over again?
  • Web services can offer application-components like: currency conversion, weather reports, or even language translation as services.

2. Connect existing software. (APIs for example)

  • Web services can help to solve the interoperability problem by giving different applications a way to link their data.
  • With Web services you can exchange data between different applications and different platforms.
http://www.w3schools.com/webservices/ws_why.asp Accessed 20-11-2010

 Google offers many APIs for developers to use here: http://code.google.com/more/
 For example, an API written by Google allows a static map to be added to a web page, with markers placed to display locations in the real world. The API is available at the following URL, which, when copied into the HTML of a web page, allows the map to be displayed. Parameters such as 'center=Brooklyn+Bridge' can be amended to any location, and markers can be placed by adding the correct latitude and longitude, etc.

http://maps.google.com/maps/api/staticmap?center=Brooklyn+Bridge,
New+York,NY&zoom=14&size=512x512&maptype=roadmap
&markers=color:blue|label:S|40.702147,-74.015794
&markers=color:green|label:G|40.711614,-74.012318
&markers=color:red|color:red|label:C|40.718217,-73.998284&sensor=false

http://code.google.com/apis/maps/documentation/staticmaps/ Accessed 20-11-2010

The advantage of using APIs is that I can add a map to a web page, or several web pages, to show locations of objects in the real world without needing to know how the actual map is coded, which is in reality probably very complex. The API is the interface provided by Google, so all I need to do is call the URL as above, ensuring the correct parameters are specified for displaying the map and markers.
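As a rough illustration, the same request can be made from a script. A minimal sketch using the Python requests library; the parameter names are those in the Static Maps documentation quoted above, but note that current versions of this API also require an API key, which is omitted here:

import requests

params = {
    "center": "Brooklyn Bridge,New York,NY",
    "zoom": "14",
    "size": "512x512",
    "maptype": "roadmap",
    "markers": "color:blue|label:S|40.702147,-74.015794",
    "sensor": "false",
}
response = requests.get("http://maps.google.com/maps/api/staticmap", params=params)

# The response body is the PNG image itself
with open("map.png", "wb") as f:
    f.write(response.content)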

An example of using the API is shown on my City University web page here.

Mashups

The example web page linked to above is an example of a very simple mashup, including a Google static map, a Facebook 'like' button and a Twitter feed, all made available through the respective APIs. In effect we have mashed together three APIs to create our web page, with ease. In reality this is only the tip of the iceberg, with experienced developers using JavaScript to call on web services, manipulate the information provided and display it for their own purposes.

For example, a website may contain a directory listing businesses throughout a city, stored as structured data in a database, and allow a user to search by keyword for a business, e.g. furniture retailers within 20 miles of a specified location, and display the results in a Google map. The JavaScript code would call the Google Maps API, making the necessary changes to add the markers relevant to the search results. The owner of the website may also wish to include adverts in a sidebar of the website, and could then use the Google AdSense API to target adverts depending on what the user searches for, for example advertising furniture retailers.

JavaScript is used in millions of web pages to add functionality, validate forms, detect browsers, and much more (http://www.w3schools.com/js/default.asp Accessed 20-11-2010). Perl, Python and ASP.NET are languages that run on the web server and can similarly be used in web development to execute and interact with web services; one advantage of APIs and web services is that they can be consumed regardless of the language or platform used by the consumer. The whole exchange of machine-readable data in this way is made possible by XML (eXtensible Mark-up Language, and its various incarnations). Defining information as elements using XML, much in the same way HTML tags describe the formatting of a web page, allows information to be passed around without any ambiguity. XML will be described in greater detail in my next post.

Sunday, 14 November 2010

Web 2.0 Technologies - A Brief Evaluation of Web 2.0 Services - Part 3 Delicious

All popular web browsers allow bookmarks of web pages to be saved and re-visited with one-click access (perhaps two clicks....!). However, what happens when we spend a long weekend working from home, discover a multitude of sites useful for us at work, turn up at work, and then find we have to trawl through a search engine to find them again? Delicious.com offers a way to bookmark, categorise, organise and share all the websites we could find useful, wherever we are.

The very fact that Delicious allows those with a registered account to publish a list of websites which others in turn can share or comment on, thus creating a social space, propels this service into Web 2.0 territory. Moreover, the Delicious add-on for web browsers such as Firefox means bookmarks saved in our web browser can be exported instantly to our Delicious account.

The use of tags in Delicious is a user-generated attempt to provide keywords describing the content of a site (no different to tagging one of your friends in a photograph on Facebook to describe who is in the photo), usually with no more than one or two words. Although highly subjective, tagging adds the metadata needed to loosely classify all the websites visited by members of that social space. Delicious offers users a list of the most popular tags used for a particular website, provided it has been visited before, and we can add our own depending on what we feel best describes the content of the web page. This aggregation of tags within Delicious builds up a 'folksonomy'.

A folksonomy is a system of classification derived from the practice and method of collaboratively creating and managing tags to annotate and categorize content; this practice is also known as collaborative tagging, social classification, social indexing, and social tagging. Folksonomy, a term coined by Thomas Vander Wal, is a portmanteau of folk and taxonomy.
Folksonomies became popular on the Web around 2004 as part of social software applications such as social bookmarking and photograph annotation. Tagging, which is one of the defining characteristics of Web 2.0 services, allows users to collectively classify and find information. Some websites include tag clouds as a way to visualize tags in a folksonomy.
http://en.wikipedia.org/wiki/Folksonomy Accessed 14-11-2010

Once we build up our bookmarks, add tags, and interact with others in the social space, commenting and offering more insightful words to tag websites with, interesting things happen (as with most social spaces, the network effect starts to become apparent). We can 'explore tags' and are presented with a 'tag cloud' displaying the most popular or latest tags being used.


Tag cloud image courtesy of http://en.wikipedia.org/wiki/Tag_cloud Accessed 14-11-2010

Typing in a word will display the latest websites bookmarked with that tag, and thus trends emerge. This 'trending' of tags applied to websites can guide us to the topics currently being viewed and bookmarked most frequently, which typically appear larger and more central in the cloud. Delicious goes further and highlights in green the popular tags you have used on your bookmarked sites. Are you using trendy tags for trendy sites??? You are cool!! (or you need to get out more...).

If you are a web designer, adding web pages, menus, navigation systems etc. and trying to consider how best to tag and categorise your sites content, a visit to Delicious.com could give you some pointers to the most popular keywords being used and help guide you on your labelling system.

Information architects must try their best to design labels that speak the same language as a site's users while reflecting its content. And, just as in dialogue, when there is a question or confusion over a label, there should be clarification and explanation. Labels should educate users about new concepts and help them quickly identify familiar ones. (Morville and Rosenfeld 2006, p. 83)

Beware, then, labelling your site's content with tags gleaned from Delicious! Allowing the social network community to tag at will has its drawbacks. Within the social space of Delicious.com, the network effect of 'power in numbers', with more users tagging the same website, popularises certain tags. When adding a new bookmark, a list of popular tags is presented which you can quickly use to tag the site, so use of shared tags becomes fairly quick and users can think, "OK, the suggested most popular tags all look pretty good to me, let's just use those to tag this site. I have no time for this metadata lark."

An analogy of this approach in the traditional physical world of a reference library: letting the students of a university (by no means unintelligent members of society?!?) loose on their university library catalogue, allowing any of them to tag a book with the set of terms they feel best encapsulates its content, instead of a qualified librarian with 10 years' experience and a system like FRBR (Functional Requirements for Bibliographic Records), "created under the auspices of the IFLA [1]. It is a framework for relating data elements in bibliographic records to the needs of the users of those records." Chowdhury 2007.

Would we be able to run a keyword search on the catalogue, using tags created by students, and find the book we wanted? Maybe we would; maybe we would discover books more relevant than if we had used the traditional approach proposed in FRBR. The question is open to the floor.

The approach to categorising content through the efforts of a social network, sharing ideas of what a website could be tagged with, has the positive notion of allowing democracy to prevail: the most popular tag wins. Maybe we could see a future where members of government are elected to power through tagging on a "Facebook for your next MP" or something similar? An interesting concept... and one that is coming, in my opinion, but we are jumping into the realms of Web 4.0 and beyond, and I shall post my critique in due course.

[1] IFLA: The International Federation of Library Associations and Institutions (IFLA) is the leading international body representing the interests of library and information services and their users. It is the global voice of the library and information profession.
http://www.ifla.org/ Accessed 14-10-2010

References (print only listed)

Chowdhury, G. G., & Chowdhury, S. (2007). Organising Information: From the Shelf to the Web. London: Facet.

Morville, P., & Rosenfeld, L. (2006). Information Architecture for the World Wide Web (3rd ed.). Beijing; Farnham: O'Reilly.

Web 2.0 Technologies - A Brief Evaluation of Web 2.0 Services - Part 2 Blogs

Communicating our ideas, thoughts, pastimes and technological or business interests to the world has never been so easy... self-indulgence is key here! In a nutshell, a blog is an online diary, usually kept by an individual or small interest group, displaying 'posts' or articles authored and displayed down the page, newest on top.

A blog (short for weblog) is a personal online journal that is frequently updated and intended for general public consumption. Blogs are defined by their format: a series of entries posted to a single page in reverse-chronological order. Blogs generally represent the personality of the author or reflect the purpose of the Web site that hosts the blog. Topics sometimes include brief philosophical musings, commentary on Internet and other social issues, and links to other sites the author favors, especially those that support a point being made on a post. (WhatIs.com, accessed 13-11-2010)
The blog is accessible over the web for others to view and comment on, and blogs are now used extensively to organise ideas and promote ourselves; posting to the internet takes a matter of seconds using blogging services (Blogger.com, WordPress.org, Joomla.org). Content is managed by a back-end MySQL database, keeping records of blog posts and associated comment threads, metadata and URLs for each post, allowing an archive to build up over time without any time-consuming administration.

A successful blog with enough traffic can become a profitable exercise through posting targeted adverts via Google AdSense, especially as blogs by reputable authors can become a valuable source of up-to-date information. http://www.newscientist.com/blog/space/2008/07/stephen-hawking-pokes-fun-at-america.html (a funny article about Stephen Hawking! Accessed 14-11-2010). Blogs are even used to allow conferences or events to be followed remotely in real time.

With the multitude of blogs that exist and the endless number of topics written about, how do we keep abreast of it all? Starter for ten... Google can be set to narrow results to only blog articles in a conventional search. Posts are assigned labels relevant to the content (see the bottom of this post) which are indexable by Google and other search engines. Once we find a blog we may be interested in in the future, rather than having to visit regularly to check for updates, RSS feeds are a useful way to transmit information between blogs, providing real-time updates when new content is added to syndicated sites. A popular open source blogging service, WordPress, explains the process of syndication:
A feed is a machine readable (usually XML explained in a future post) content publication that is updated regularly. Many weblogs publish a feed (usually RSS, but also possibly Atom and RDF and so on, as described above). There are tools out there that call themselves "feedreaders". What they do is they keep checking specified blogs to see if they have been updated, and when the blogs are updated, they display the new post, and a link to it, with an excerpt (or the whole contents) of the post. Each feed contains items that are published over time. When checking a feed, the feedreader is actually looking for new items. New items are automatically discovered and downloaded for you to read. Just so you don't have to visit all the blogs you are interested in. All you have to do with these feedreaders is to add the link to the RSS feed of all the blogs you are interested in. The feedreader will then inform you when any of the blogs have new posts in them. Most blogs have these "Syndication" feeds available for the readers to use.  (http://codex.wordpress.org/Introduction_to_Blogging Accessed 14-11-2010)
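A feedreader of our own takes only a few lines. A minimal sketch using the third-party Python library feedparser (the feed URL is hypothetical):

import feedparser

feed = feedparser.parse("http://christopherdbrook.com/blog/feed")

print(feed.feed.title)
for entry in feed.entries:
    # new items appear here as they are published on the syndicated site
    print(entry.title, "->", entry.link)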

Blogs are not particularly remarkable in isolation, but when linking blogs things become more interesting... we are now entering the 'Blogosphere'. Oooooooh... now that's more like it, Web 2.0! Trackbacks, which are sent as a simple HTTP POST, allow bloggers to link to articles, sending information to one another as a comment. A better explanation is this:
  • Person A writes something on their blog.
  • Person B wants to comment on Person A's blog, but wants her own readers to see what she had to say, and be able to comment on her own blog
  • Person B posts on her own blog and sends a trackback to Person A's blog
  • Person A's blog receives the trackback, and displays it as a comment to the original post. This comment contains a link to Person B's post

http://codex.wordpress.org/Introduction_to_Blogging Accessed 14-11-2010
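Under the hood, that trackback ping is just a small form-encoded HTTP POST to the target post's advertised trackback URL. A minimal sketch in Python; the endpoint URL and post details are hypothetical, while the field names come from the TrackBack specification:

import requests

response = requests.post(
    "http://example-blog.com/trackback/1234",  # Person A's trackback endpoint
    data={
        "url": "http://christopherdbrook.com/blog/my-response-post",  # Person B's post
        "title": "My response post",
        "excerpt": "A short summary displayed as the comment on Person A's blog...",
        "blog_name": "Chris's Music Blog",
    },
)
# The endpoint replies with a small XML success/error document
print(response.status_code, response.text)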

I use trackbacks on my Music Information Blog to keep abreast of what other music bloggers are posting, sharing knowledge of artists, new bands, old rare finds and anything of interest. Sharing posts in this way allows bloggers to cite each other, posting trackbacks as comments on related topics, effectively presenting an abstract of the content of the post; anyone else viewing my post will also be able to follow the trackback if it appears to be of interest.

Critical evaluation of blogging also means we must mention the same old problem of authenticity of sources, as in my previous article about Wikipedia. Content in a blog really is just personal opinion. Personal opinions can again be damaging if used to the wrong ends; misquoted sources and plagiarism from other blogs are a common threat in the 'who said it first' arguments that perpetuate in the era of information, news, gossip and libellous litigation.

Pingbacks are a technology relying on XML-RPC. Unlike trackbacks, which can be edited or faked, pingbacks provide a verified link to the original blog article, notifying a blogger that their site has been quoted somewhere else. The pingback is explained better by our friends at WordPress:

The best way to think about pingbacks is as remote comments:
  • Person A posts something on his blog.
  • Person B posts on her own blog, linking to Person A's post. This automatically sends a pingback to Person A when both have pingback enabled blogs.
  • Person A's blog receives the pingback, then automatically goes to Person B's post to confirm that the pingback did, in fact, originate there.

http://codex.wordpress.org/Introduction_to_Blogging Accessed 14-11-2010
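Because pingbacks ride on XML-RPC, sending one manually is a single remote call to the other blog's XML-RPC endpoint (WordPress exposes its endpoint at /xmlrpc.php). A minimal sketch in Python; both blog URLs are hypothetical:

import xmlrpc.client

server = xmlrpc.client.ServerProxy("http://example-blog.com/xmlrpc.php")

# pingback.ping(sourceURI, targetURI): "my post links to your post"
result = server.pingback.ping(
    "http://christopherdbrook.com/blog/my-post",
    "http://example-blog.com/their-post",
)
print(result)  # a short confirmation string if the pingback is accepted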

The most successful 'micro-blogging' site today, Twitter, allows little flexibility in that it restricts posts to 140 characters; however, the ability to follow other micro-bloggers means news can be shared with the world... micro-news on tap, tailor-made for your own consumption.

An article on the Twitter blog shows how Ping, Apple's own social networking service for those listening to music, which lets users share excerpts from songs, artist details and links to iTunes downloads, will allow this activity to be 'tweeted'.
 
Every day, millions of people use Twitter to follow and share what they care about. Twitter users now send over 95 million Tweets a day, many of which are about the music they're listening to.

Starting today Ping, iTunes' new social network for music, and Twitter are making it even easier for people to share music discoveries with their friends by putting Ping activity, song previews and links to purchase and download music from the iTunes Store right in their Tweets on Twitter.com.
http://blog.twitter.com/2010/11/twitter-ping-discovering-more-music.html

Why not check out some blogs instead of picking up the News of The World on Sunday morning.....far more stimulating in my opinion.....but that's just my opinion of course, it's my blog !
Thoughts etc below please.......

Saturday, 13 November 2010

Web 2.0 Technologies - A Brief Evaluation of Web 2.0 Services - Part 1 Wikipedia

In the next three posts I will explore three current Web 2.0 services that typify the concept of the read/write web: openness, freedom, connectivity within a social space, and the 'network effects' that a level of commitment to these systems brings.

Part 1 - www.wikipedia.org
Part 2 - blogs
Part 3 - www.delicious.com

Part 1.  Wikipedia: an online encyclopedia which allows contributions from anyone with access to the web, with the aim of organising the world's knowledge. The mission statement of the Wikimedia Foundation typifies the ideology of projects like wikipedia.org:

The mission of the Wikimedia Foundation is to empower and engage people around the world to collect and develop educational content under a free license or in the public domain, and to disseminate it effectively and globally.  (http://wikimediafoundation.org/wiki/Mission_statement accessed 13-11-2010)
Articles are created, categorised and edited by anyone, with subcategories allowing topics to be presented fully and in an organised way using a standard template.

Categories allow articles to be placed in one or more groups, and allow those groups to be further categorized. When an article belongs to a category, it will contain a special link to a page that describes the category. Similarly, when a sub-category belongs to a parent category, it will contain a special link to the parent category's page. Each category page contains an introduction that can be edited like an article, and an automatically generated list of links to sub-categories and articles that belong to the category. Categories do not form a strict hierarchy or tree of categories, since each article can appear in more than one category, and each category can appear in more than one parent category. This allows multiple categorization schemes to co-exist simultaneously. It is possible to construct loops in the category space, but this is discouraged. (Wikipedia.com accessed 13-11-2010)
The ideas of linking between categories and navigating and searching by category are key features of Web 1.0 hypertext markup, but the ability to create articles, contribute and allow debate sits in the Web 2.0 domain. Of key importance is anonymity, which "encourages a freer exchange of information, but also downgrades the responsibility that contributors have to actually get things right" (Butterworth 2010 lecture notes)

The site is not as free and open as its founders would perhaps have liked. Content added has to conform to editorial rules, including correct referencing of material to help substantiate claims.

An interesting article I found on the Slashdot Technology website:

"The NY Times reports on an epochal move by Wikipedia (reported in August 2009) — within weeks, the formerly freewheeling encyclopedia will begin requiring editor approval for all edits to articles about living people."
Articles referring to people could give rise to libellous claims, which Wikipedia would of course have to discourage. This move potentially has the effect of dividing... "Wikipedia's contributors into two classes — experienced, trusted editors, and everyone else — altering Wikipedia's implicit notion that everyone has an equal right to edit entries", or 'internet elitism'. http://tech.slashdot.org/ (accessed 13-11-2010)

One then has to question whether Wikipedia is now any different from the Encyclopaedia Britannica, aside from the fact that it is a free resource. Encyclopaedia Britannica provides authoritative articles written by experts in their field, and content is subject to a strict editorial policy. Decisions about what should be included, made through 'traditional modes of thinking', could imply that this elitist, 'serious' and 'high-minded' approach leads to somewhat stale and 'out of touch with reality' content.

Wikipedia includes many articles on popular culture and up-to-the-minute issues in the public domain that are perhaps more in touch with reality. Inviting content from anyone opens up topics for debate which perhaps would not see the light of day otherwise and, more importantly, allows aggregation of existing information, expanding knowledge of a topic by opening the floor to the web community.

The downside to this is that we give rise to the 'hive mind' (dictionary.com accessed 13-11-2010), where potentially those with the most time on their hands to add content or debate issues will, by a simple process of 'shouting loudest', allow their opinions to win through rather than proven fact.

An article on http://www.silicon.com (accessed 13-11-2010) gives an account of a story "in which Wikipedia has taken hits for its inclusion, for four months, of an anonymously written article linking former journalist John Seigenthaler to the assassinations of Robert Kennedy and John F Kennedy."

Wikipedia vs Encyclopedia Britannica: An Equal Match?

Traditional encyclopedias, with conventional business models financed through paid subscription, are struggling to compete with Wikipedia and other free online encyclopedias. The debate on which approach is the most informative information source for organising and providing a point of reference for our knowledge base will perhaps be settled when the Encyclopaedia Britannicas of this world go out of business, no longer able to make it commercially viable to produce and publish.

Wednesday, 10 November 2010

DITA Session 05 - Introducing Web 2.0 and Technologies 25-10-2010

Web 2.0 is best described as a label, or concept, for the new ways the internet has developed, allowing the non-technical amongst us to write to the internet rather than just read or interact with content made by others. The concept symbolizes openness and the sharing of documents (text, applications, multimedia files etc.) and knowledge.
Over the last 5 years the collection of websites and online services, such as Wikipedia, Facebook, blogs, Twitter, YouTube etc. has allowed anyone with access to the internet to use a web-browser to access these tools which allow easy collaboration and effectively use the web as the platform rather than the hard drive on your computer.


Tim O'Reilly, founder of O'Reilly Media and supporter of the free software and open source movements, gave this definition:

Web 2.0 is the business revolution in the computer industry caused by the move to the internet as platform, and an attempt to understand the rules for success on that new platform. Chief among those rules is this: Build applications that harness network effects to get better the more people use them. (This is what I've elsewhere called "harnessing collective intelligence.")


Comparing Web 1.0 with Web 2.0

My previous post on Web 1.0 technologies focused on the problem of retrieving existing information (the information being retrieved was often either structured and held in a database, or unstructured in the form of web pages); this post focuses on the 'network effects' of the new technologies allowing us to easily create our own content and share it over the internet.
“Web 2.0 applications allow users to create and use ‘social spaces’, where social interactions can occur over the net.” (Butterworth 2010 lecture notes).
There is, however, nothing especially new in the technology allowing these social spaces to work. Rather, it is other advances, such as increased network coverage, increased bandwidth, cheaper server space, and advances in digital technology (cameras, video, music applications and other management software), that allow content to be created and uploaded to a web server freely and easily by anyone.

Prior to these advances, photographs would perhaps have to be scanned in, and you would need to pay for server space and know HTML and other code to make the digital information displayable and available to others. Hence Web 1.0 was a read-only environment for most individuals.

Web 2.0 characteristics
HTML allows text and images to be displayed in web browsers and styled (see post on CSS), and further made interactive using programming languages such as JavaScript. However, HTML 4.01 does not have any inherent support for video, animation or sound; instead, plug-ins and scripting technologies allow media to be added, e.g. Flash, Ajax, jQuery (Butterworth 2010 lecture notes). This gives the user a 'rich user experience' through Rich Internet Applications (RIAs): increased response speed and greater functionality and usability.

A new version of HTML (HTML 5) is in the process of being developed, allowing sound and video to be incorporated into web pages without these add-in scripts. Taken from the W3C website, the background on HTML:

“The World Wide Web's markup language has always been HTML. HTML was primarily designed as a language for semantically describing scientific documents, although its general design and adaptations over the years have enabled it to be used to describe a number of other types of documents.
The main area that has not been adequately addressed by HTML is a vague subject referred to as Web Applications. This specification attempts to rectify this, while at the same time updating the HTML specifications to address issues raised in the past few years.” http://dev.w3.org/html5/spec/Overview.html#background Accessed 07-11-2010

User participation: provided people buy into the concept of Web 2.0 and offer the level of commitment on which the success of a social space on the internet depends, social spaces can be of great value for discourse communities, i.e. groups with a common field of interest.

WhatIs.com provides a definition of what social networking means:
“Based on the six degrees of separation concept (the idea that any two people on the planet could make contact through a chain of no more than five intermediaries), social networking establishes interconnected Internet communities (sometimes known as personal networks) that help people make contacts that would be good for them to know, but that they would be unlikely to have met otherwise.”

It could be true in the case of some social spaces that the more restricted the access, and the greater the limitations imposed to allow only those with an interest to interact in that social space, the more successful it will be. Open participation is the key concept of Web 2.0; however, it also creates the opportunity for less constructive debate or interaction from those without an interest in that social space being successful. This can be seen in the often completely irrelevant comments left on YouTube videos, where unregulated access to comment often leads to hateful and derogatory remarks.

Dynamic content: Web 2.0 technologies allow content to be created constantly; for example, Twitter updates will flow like a stream into the browser (the 'flow' internet). Tools to manage this include RSS (Rich Site Summary), where content the user is interested in is fed to the browser (Butterworth 2010 lecture notes). RSS is explained here:
http://www.whatisrss.com/
The vast amount of content created through sites such as Twitter and Facebook can be fed to any other site through RSS feeds (utilizing XML, explained in another post), allowing the user to be alerted to new content of particular interest.

Metadata: As Web 2.0 is about low barriers to publishing, a fully fledged metadata scheme (e.g. a library catalogue) is inappropriate. Users of Web 2.0 technology would most likely not have the inclination, time or expertise to formulate an ontology or controlled vocabulary from which to populate structured metadata for all their content. That said, anything uploaded to the web needs some form of metadata for it to be discoverable and retrievable, so rather simple techniques of tagging using single words, very short phrases or acronyms can help to keep information organised. It may also be possible for other users to add these tags (e.g. Facebook allows anyone within a social network to tag a photograph with a person), which is referred to as a 'folksonomy'.

Further, the use of tag clouds is a Web 2.0 concept where tags are grouped into a cloud, with more commonly used words in a document appearing larger and more centralised (Butterworth 2010 lecture notes); a toy sketch of the sizing logic follows.
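A minimal Python sketch, with made-up tags, scaling each tag's font size between 10pt and 30pt by relative frequency:

from collections import Counter

tags = ["music", "punk", "music", "blog", "web2.0", "music", "punk"]
counts = Counter(tags)

lowest, highest = min(counts.values()), max(counts.values())
for tag, n in counts.most_common():
    if highest == lowest:
        size = 20  # all tags equally common; use a middling size
    else:
        size = 10 + 20 * (n - lowest) / (highest - lowest)
    print(f"{tag}: used {n} times -> {size:.0f}pt")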

Example of a tag cloud:

[Tag cloud image]

Openness and freedom: The philosophy of Web 2.0 is the concept of connecting with others using web services that allow anyone with an interest to view and publish content and interact (through commenting and responding) with one another, without (some of?) the restrictions imposed by traditional means of publishing. Many sites now exist, including Facebook, YouTube, Flickr, Delicious, Digg, blogs, forums, wikis, etc., in many domains, helping those with common interests in a topic, or relationships inherent in their field of influence, to perform an array of knowledge- and information-sharing behaviours.

Informationweek.com published an article on their blog (an example of a Web 2.0 service) called "Is all this Web 2.0 openness a good thing?", regarding the threat to the security of others:

I went to a news conference this morning and a philosophical debate broke out. The scene was the unveiling of the Nokia N810, a new Internet tablet from the world's No. 1 handset maker, at the Web 2.0 Summit in San Francisco. The Nokia executives were extolling the virtues of openness in the Web 2.0 world, when a German journalist piped up and asked, "But aren't you just making things open for the malcreants also?"
(www.informationweek.com/blog accessed 11-11-2010)

The fact that Web 2.0 openness and freedom allows these connections between people over the internet, who perhaps would not ordinarily interact, also allows so-called "malcreants" (he probably meant miscreants)... to invade that social space, steal identities, add hateful comments, incite criminal activity, or engage in other behaviour that is morally or legally wrong. The very open nature of Web 2.0 leads many to be suspicious of giving away too much information, and thus undermines the very concept. As I critically evaluate several Web 2.0 services I shall draw upon real-life examples of the negative side, or inherent weaknesses, of Web 2.0 architectures and present the good and bad sides of our new-found web freedom.

Miscreants: "a vicious or depraved person; villain." dictionary.com
 (accessed 11-11-2010)