Wednesday, 24 November 2010

Extensible Markup Language - XML Introduction

Web services and APIs allow machines to read data over the internet, and one of the most prominent ways to achieve this is XML (eXtensible Markup Language). XML is not strictly a language in itself but, like HTML, it allows information stored on the web to be described in such a way that computers know what that information refers to. It is referred to as self-describing because the markup is defined by humans using names that describe the data, so that computers can read it, understand it and pass it around.
HTML allows text elements to be marked up for formatting and hyperlinks to be defined, and CSS controls the look and style of web pages, but neither gives the data a structure that computers can use effectively. Computers cannot guess what we mean when we add a title to a photograph, or list names and addresses in an address book.

Background
Originally, markup was defined using SGML (Standard Generalized Markup Language).
It was complex, difficult to master, and had a limited (and often expensive) toolset. To give the web the power of SGML without the complication, a W3C working group set out to simplify SGML. In 1998 it produced Extensible Markup Language. While XML was originally meant to be a replacement for (or at least a supplement to) HTML as hypertext on the web, it settled instead into a role of a format for exchanging data between programs and became a key component of web services.
Governor et al. (2009, p. 23)

Data stored on the web is said to be unstructured. HTML pages mark up elements to define hyperlinks, images and text; CSS, which came later, sits in a separate file that the browser loads to describe the look and layout of the page. Neither tells us exactly what the different sections or pieces of information on the page mean, unless we are expressly told.

For example, suppose we create a web page with photos of our holiday, labelled with where they were taken, using an HTML tag such as: <img src="Big Ben.gif" alt="Big Ben London" />
If we then wanted to use an API such as Google Maps to place a marker for Big Ben in London, there would be little in the tag to tell a computer that Big Ben is a landmark in London. A guess would most likely fail.
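To see how little a program gets from that tag, here is a minimal sketch using Python's standard html.parser module. The class name ImgCollector is my own invention for illustration. All the program ends up with is two plain strings; nothing marks "Big Ben" as a landmark or "London" as a place.

```python
from html.parser import HTMLParser


class ImgCollector(HTMLParser):
    """Hypothetical helper: collects the attributes of every <img> tag."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs
        if tag == "img":
            self.images.append(dict(attrs))


html = '<img src="Big Ben.gif" alt="Big Ben London" />'

collector = ImgCollector()
collector.feed(html)
collector.close()

# Only a file name and a free-text label are available to the program
print(collector.images)
```

The alt text is one undivided string, so any attempt to split it into a name and a location would be exactly the kind of guess described above.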

Structuring a document as an XML file allows the parts of a web page (e.g. the caption of a photograph, the place where it was taken, and the date) to be described semantically, using a syntax that can be written by the author and understood by a computer that accesses the data remotely and passes it around on the web.

XML Structure and Syntax

XML allows elements to be defined using angle brackets (< >), within which we use tags of our own choosing and/or attributes to describe an object. Using the example of the photograph above, we could create an XML document as follows (note that tags MUST be closed in XML; there is no forgiveness as with HTML):

<?xml version="1.0" encoding="UTF-8"?>
<Photos>
 <name>Big Ben</name>
 <location>London</location>
 <date>01-04-2010</date>
</Photos>
The first line is an optional declaration stating that this is an XML document and giving the version of XML. The character encoding can also be specified there (e.g. UTF-8). The elements described must sit inside a single root element, here <Photos>. An attribute could be used instead of element content, so instead of <date>01-04-2010</date> we could write <date value="01-04-2010"> (an attribute always needs a name and a quoted value; the name "value" is just an illustrative choice), but we would still have to close the tag with </date>, or use the self-closing form <date value="01-04-2010" />. To describe which tags are allowed in an XML document and how they fit together, and so remove any ambiguity, an XML Schema or Document Type Definition (DTD) is used (more later on XML Schemas).
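As a rough illustration of what "understood by a computer" means here, the sketch below reads the example document with xml.etree.ElementTree from Python's standard library. It also feeds the parser a deliberately broken variant (my own, not from the sources above) with an unclosed <name> tag, to show that XML really is unforgiving about closing tags.

```python
import xml.etree.ElementTree as ET

document = """<?xml version="1.0" encoding="UTF-8"?>
<Photos>
 <name>Big Ben</name>
 <location>London</location>
 <date>01-04-2010</date>
</Photos>"""

# Parse from bytes so the encoding named in the declaration is honoured
root = ET.fromstring(document.encode("utf-8"))

# Each child element becomes a labelled piece of data
photo = {child.tag: child.text for child in root}
print(root.tag)
print(photo)

# A variant where <name> is never closed (made up for this example)
broken = "<Photos><name>Big Ben</Photos>"
try:
    ET.fromstring(broken)
    well_formed = True
except ET.ParseError as err:
    well_formed = False
    print("Rejected:", err)
```

Unlike the HTML case, the program can now ask for the location by name, without having to guess which part of a caption it is.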
Additionally, a reference allows the inclusion of additional text or markup in an XML document. References always begin with the character “&” (which is specially reserved) and end with the character “;”, e.g.

&quot; which allows the " character to be used without causing a conflict in the syntax.
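A small sketch of references in practice, again with Python's standard library. The caption text is made up for the example. The parser turns &quot; and &amp; back into the characters they stand for (the double quotation mark and the ampersand), and xml.sax.saxutils.escape goes the other way when we need to put arbitrary text into a document.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Reading: the parser resolves the references for us
caption = ET.fromstring(
    "<caption>The &quot;Big Ben&quot; tower &amp; clock</caption>"
)
print(caption.text)

# Writing: escape reserved characters before placing text in markup.
# escape() handles &, < and > by default; the double quote is added
# through the optional entities mapping.
raw = 'Fish & Chips <near "Big Ben">'
escaped = escape(raw, {'"': "&quot;"})
print(escaped)

# Round trip: parsing the escaped text gives back the original string
restored = ET.fromstring("<caption>" + escaped + "</caption>").text
print(restored == raw)
```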



Paraphrased from http://www.xmlnews.org/docs/xml-basics.html Accessed 24-11-2010



The benefits of using XML

  • XML can keep data separated from your HTML
  • XML can be used to store data inside HTML documents
  • XML can be used as a format to exchange information
  • XML can be used to store data in files or in databases
Additionally, XML works on any platform, removing compatibility issues. It is free to use, works with all browsers (albeit with slight differences in debugging with IE), and is supported extensively through forums, tutorials and the W3C, so help can be sought in the wider web community (a nice Web 2.0 concept). It is also the basis of a family of standards built on XML, e.g. dialects such as XHTML, and companion technologies that give further control over how XML documents are used.
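The exchange use case from the list above can be sketched as two small functions, one for the program producing the data and one for the program receiving it. The function names export_photo and import_photo are hypothetical; the only thing the two sides share is the XML itself.

```python
import xml.etree.ElementTree as ET


def export_photo(name, location, date):
    """Producer side (hypothetical): turn plain values into XML bytes."""
    root = ET.Element("Photos")
    ET.SubElement(root, "name").text = name
    ET.SubElement(root, "location").text = location
    ET.SubElement(root, "date").text = date
    # With an explicit encoding, tostring() returns bytes
    return ET.tostring(root, encoding="utf-8")


def import_photo(payload):
    """Consumer side (hypothetical): rebuild the values from the bytes."""
    root = ET.fromstring(payload)
    return {child.tag: child.text for child in root}


payload = export_photo("Big Ben", "London", "01-04-2010")
received = import_photo(payload)
print(received)
```

The payload is just bytes, so it could equally be saved to a file, stored in a database or sent over HTTP to a program written in a different language.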

For example, XSL is the advanced language for expressing style sheets (http://www.w3.org/XML/1999/XML-in-10-points Accessed 24-11-2010).



W3C's Resource Description Framework (RDF) is an XML text format that supports resource description and metadata applications, such as music playlists, photo collections, and bibliographies. For example, RDF might let you identify people in a Web photo album using information from a personal contact list; then your mail client could automatically start a message to those people stating that their photos are on the Web. Just as HTML integrated documents, images, menu systems, and forms applications to launch the original Web, RDF provides tools to integrate even more, to make the Web a little bit more into a Semantic Web.
http://www.w3.org/XML/1999/XML-in-10-points Accessed 24-11-2010
