Introduction
Before I began the DITA course, the term Web 2.0 implied to me that a new version of the web had been released on a specific date. Since reading Tim O’Reilly’s second go at a definition of Web 2.0 (http://radar.oreilly.com/2006/12/web-20-compact-definition-tryi.html Accessed 26-11-2010) and writing my article: http://chrisbrookditablog.blogspot.com/2010/11/dita-session-05-introducing-web-20-and.html (Accessed 13-11-2010), I’ve discovered that Web 2.0 is a genre of websites and technologies which open up the web for creating, publishing, sharing, recycling, re-using, and re-arranging information. Driven by increases in geographical broadband coverage and bandwidth, and by falling costs, the web is now being exploited by ‘a set of social, architectural, and design patterns’ (websites) which, say Governor et al. (2009), is resulting in a mass migration of business to the internet as a platform.
A definition of Web 3.0 is even more ambiguous, and I’d rather refer to ‘The Semantic Web’. A common thread running through the digital architectures and technologies being developed under the labels of Web 2.0 and the Semantic Web, not dissimilar to my previous blog entry, is the ‘unstructured vs structured’ representation of information.
Publishing using Wikis
Governor et al. (2009) describe ‘concepts’ which are features of many [Web 2.0] applications and services found today. They are defined as patterns such as ‘Participation – Collaboration’, ‘Collaborative Tagging’, ‘Software as a Service’, ‘Mashup’, and ‘Structured Information’.
Wikipedia (http://www.wikipedia.org Accessed 04-12-2010) exemplifies a Web 2.0 website and has become a household name as an on-line encyclopedia. Although new articles were being published less often by 2010 (http://www.time.com/time/magazine/article/0,9171,1924492,00.html#ixzz17AB03UyU Accessed 02-12-2010), editors and readers, who also have the ‘power to edit’, continue to collaborate: discussing articles, tagging them with further sub-categories, and helping editors delete pages no longer of relevance. Here I see the network effect: the more users collaborate, the more the quality and accuracy of articles improve. “People all over the world who are interested in a certain topic can collaborate asynchronously to create a living, breathing work.” (Governor et al. 2009, p. 51) However, in my opinion Wikipedia articles are still pieces of unstructured information, i.e. not arranged and governed by the strict rules of a relational database.
Wikipedia utilizes XHTML to publish content; an example of the Wikipedia Main Page XHTML source code is here: http://en.wikipedia.org/wiki/File:Wikipedia_main_page_XHTML_source.png
XHTML is recommended by the W3C as a mark-up language based on HTML and compatible with XML (http://chrisbrookditablog.blogspot.com/2010/11/extensible-markup-language-xml.html Accessed 12-12-2010).
“XHTML consists of all the elements in HTML 4.01, combined with the strict syntax of XML. Today's market consists of different browser technologies, some browsers run on computers, and some browsers run on mobile phones or other small devices. The last-mentioned do not have the resources or power to interpret a "bad" markup language.”
http://www.w3schools.com/xhtml/xhtml_why.asp Accessed 04-12-2010
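To see why strict syntax matters to a resource-constrained parser, here is a minimal sketch in Python using the standard library’s ElementTree; the XHTML snippet is a hypothetical example rather than Wikipedia’s actual mark-up.

```python
# A minimal sketch: because XHTML is well-formed XML, a standard XML
# parser can process it, which lightweight mobile browsers rely on.
import xml.etree.ElementTree as ET

# Strict XHTML: every tag is closed and properly nested (hypothetical snippet).
good = '<html xmlns="http://www.w3.org/1999/xhtml"><body><p>Hello</p></body></html>'

# "Bad" HTML-style mark-up: the <p> tag is never closed.
bad = '<html><body><p>Hello</body></html>'

ET.fromstring(good)   # parses without error
try:
    ET.fromstring(bad)
except ET.ParseError as e:
    print("Rejected by the XML parser:", e)
```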
This technology allowed Wikipedia to become even more accessible in 2008 with the launch of the mobile version of the site. Optimised for mobile browsers, the mobile site strips out non-essential elements of the page, including sidebars and headers, and makes sub-sections of articles collapsible. Cross-device compatibility is one of the key components of Web 2.0 applications.
Bespoke ‘wikis’ offer a way to publish, disseminate and collaborate within a bounded set of users. In my professional experience I helped publish a user manual for an Electronic Document and Records Management System (EDRMS) using wiki technology while working as an information specialist in the design team on a new metro project. The purpose was to educate the design team in using the EDRMS for the project and to disseminate new procedures as and when needed. It would have been difficult and time-consuming to produce a traditional manual covering all the differing levels of expertise and needs of the users. For example, a project manager who needs to search for a single PDF document does not need the features a CAD operator relies on, e.g. creating metadata and referencing sets of model files to create quality-assurance-compliant CAD drawings.
A ‘bare bones’ manual was created which relied on the participation of the design team members to ‘flesh out’ content. All had the common interest of making the project a success, and initially users’ contributions added a high degree of value by harnessing their collective knowledge. However, after the initial set-up, contributions to the wiki dried up. “The shift from a top-down editorial approach to a bottom-up approach is a painful reversal for people who expect only expert advice when they look up something.” (Governor et al. 2009, p. 51)
Identifying technologies for use in the Information Sciences
Wikipedia is a published set of linked resources, organized visually through a template, and through the collaboration of the web community it has built up a vast network of articles and subjects. Articles are arranged by subject and interlinked through hyperlinks; categories are organized into a hierarchy (interestingly, Web 3.0 is a category under the subject Web 2.0). If data relating to subjects and categories were marked up with XML and given contextual metadata, rather than just XHTML, and were referenced using unique identifiers that explicitly describe meaning, Wikipedia could become a much more powerful resource for information scientists. The W3C has recommended the technologies and standards needed to facilitate the idea of semantically describing information as ‘subjects’, ‘objects’ and relationships or ‘predicates’ through the Resource Description Framework (RDF); these statements are referred to as RDF triples.
“URI’s can identify anything as a resource, the subject of an RDF statement can be a resource, and predicates in RDF statements are always resources. Because URI’s uniquely identify resources (things in the real world) they are considered ‘strong identifiers’. There is no ambiguity about what they represent, and they always represent the same thing, regardless of the context we find them in.” (Segaran et al. 2009, p. 66) Thus, by organizing the facts that have been added in a rather ad-hoc fashion to subjects in Wikipedia into RDF statements containing URIrefs built from standard terms (an example of which is provided by the Dublin Core Metadata Initiative, http://dublincore.org/ Accessed 02-12-2010), the information becomes highly re-usable.
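As a hedged sketch of how such statements might be constructed in practice, the following Python snippet uses the rdflib library together with the Dublin Core element set; the article URI and literal values are illustrative assumptions, not real Wikipedia metadata.

```python
# A sketch of RDF triples built from Dublin Core terms, using the rdflib
# library (pip install rdflib). The article URI and values are hypothetical.
from rdflib import Graph, Literal, URIRef
from rdflib.namespace import DC

g = Graph()
article = URIRef("http://en.wikipedia.org/wiki/Web_2.0")  # the subject: a strong identifier

# Each add() call asserts one triple: (subject, predicate, object).
g.add((article, DC.title, Literal("Web 2.0")))
g.add((article, DC.subject, Literal("World Wide Web")))
g.add((article, DC.language, Literal("en")))

for s, p, o in g:
    print(s, p, o)
```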
RDF allows properties to be invented independently, regardless of the subject domain. It can be serialized as XML, and the simple triple structure makes it easy to identify a resource (Chowdhury 2007, p. 203). Graphically representing RDF statements is a powerful way to visualize the interlinking of objects, so simple triples can be aggregated into complex webs of relationships using simple nodes and connectors. I have drawn a simple RDF graph of my music blog, describing the triples for three objects that ‘belong’ to my music blog: http://chrisbrookditablog.blogspot.com/2010/11/semantic-web-technologies-resource.html (Accessed 26-11-2010)
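Continuing the sketch above under the same assumptions (the blog URI and the ‘hasPost’ predicate are invented for illustration), rdflib can serialize such a graph straight to RDF/XML:

```python
# Continuing the sketch: three objects that 'belong' to a blog, expressed as
# triples with a hypothetical 'hasPost' predicate, then serialized to RDF/XML.
from rdflib import Graph, Namespace, URIRef

EX = Namespace("http://example.org/terms/")  # hypothetical vocabulary
blog = URIRef("http://example.org/musicblog")

g = Graph()
for post in ("post1", "post2", "post3"):
    g.add((blog, EX.hasPost, URIRef("http://example.org/musicblog/" + post)))

# RDF's XML serialization makes the graph exchangeable between applications
# (rdflib 6+ returns a str here; older versions return bytes).
print(g.serialize(format="xml"))
```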
Wikipedia exemplifies the Web 2.0 principle of collaborative tagging, which could be described as a user-driven taxonomy of the world’s knowledge. “Web 2.0 is an informal flexible way of integrating disparate web services. It requires less dependence on shared vocabularies and provides workable rather than totally perfect solutions.” (Burke 2009) For information scientists, this gives rise to ambiguity in describing web resources. The Semantic Web standards and technologies being championed by the W3C and its partners attempt to remove this ambiguity.
By representing and organizing data using standard metadata built upon an overarching ontology of semantic meaning, real-world ontology is applied to web resources in much the same way that librarians describe traditional printed resources using MARC 21. Using XML and RDF to represent resources with standard metadata such as the Dublin Core and URIrefs creates a library catalogue potentially extending to the whole World Wide Web.
Ontologies are developed within subject domains to model real-life objects. They apply a set of rules governing how these objects can relate to one another. “An ontology provides a precise vocabulary with which knowledge can be represented, how they can be grouped, and what relationships connect the together.” (Segaran et al. 2009, p. 127). This would further help to structure the data held in Wikipedia, and as Wikipedia is written using XHTML it is directly compatible with XML, making it machine-readable and machine-understandable. If applied correctly, an ontology could be created based on the taxonomy provided by resources on Wikipedia. The W3C standard OWL (Web Ontology Language, http://chrisbrookditablog.blogspot.com/2010/12/semantic-web-technologies-owl.html Accessed 12-12-2010) is a mark-up technology for describing ontologies that allows machines to understand the relationships and hierarchy of subjects and objects.
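As a rough, non-authoritative illustration of what an ontology looks like at the triple level (the class names and namespace below are hypothetical), this sketch declares two OWL classes and a subclass relationship:

```python
# A minimal OWL ontology sketch with rdflib: two hypothetical classes and a
# subclass relationship, the kind of hierarchy OWL makes machine-readable.
from rdflib import Graph, URIRef
from rdflib.namespace import OWL, RDF, RDFS

g = Graph()
web_tech = URIRef("http://example.org/onto/WebTechnology")          # hypothetical class
semantic = URIRef("http://example.org/onto/SemanticWebTechnology")  # hypothetical class

g.add((web_tech, RDF.type, OWL.Class))
g.add((semantic, RDF.type, OWL.Class))
# Every SemanticWebTechnology is also a WebTechnology: a machine can now
# infer the hierarchy rather than guessing it from page layout.
g.add((semantic, RDFS.subClassOf, web_tech))

print(g.serialize(format="turtle"))
```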
Information specialists currently have a great vested interest in utilising this stack of technologies to interpret data in new ways. Simple RDF graphs can be joined through new relationships which, because they are built on simple ontological rules, allow complex reasoning to be performed using many more variables than we might normally consider. URIs thus act like primary keys in a relational database.
Projects to create semantic wikis, built from either pre-defined or user-created ‘folksonomies’, aim to model “some portion of [their] data in a way that can be queried elsewhere. Typical uses of such data include querying it within the wiki (sometimes using standard query languages like SPARQL), aggregating it in displays like tables, maps and calendars; exporting it via formats like RDF, OWL or CSV; and reasoning with it, to calculate new facts from the given facts.” (http://en.wikipedia.org/wiki/Semantic_Wikipedia Accessed 12-12-2010)
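Here is a sketch of the kind of in-wiki query the quotation describes, using rdflib’s built-in SPARQL support; the facts and the ‘category’ predicate are assumptions made for the example:

```python
# Sketch: querying a small in-memory graph with SPARQL via rdflib.
# The facts and the 'category' predicate are hypothetical.
from rdflib import Graph, Literal, Namespace

EX = Namespace("http://example.org/wiki/")
g = Graph()
g.add((EX.Web_2_0, EX.category, Literal("World Wide Web")))
g.add((EX.Semantic_Web, EX.category, Literal("World Wide Web")))
g.add((EX.SPARQL, EX.category, Literal("Query languages")))

# Find every page filed under the 'World Wide Web' category.
results = g.query("""
    PREFIX ex: <http://example.org/wiki/>
    SELECT ?page WHERE { ?page ex:category "World Wide Web" . }
""")
for row in results:
    print(row.page)
```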
Freebase (http://www.freebase.com/home Accessed 02-12-2010) is an example of a semantic wiki, where articles can be built automatically from multiple sources.
Utilising Web 2.0 and Semantic Web Technologies
Semantically structured data, such as that held in Freebase or in the datasets the Government proposes to publish in RDF format, offers the possibility of querying and mashing together information and data in new ways. “Companies and businesses often need to gather data from a range of sources, XML can serve as a uniform data exchange format, and thus can facilitate such gathering, processing, re-use and distribution of data across various applications” (Chowdhury 2007, p. 164).
The UK Department for Communities and Local Government, in conjunction with Local Authorities’ planning departments, developed and rolled out a standard on-line planning application form called 1APP (http://www.planningportal.gov.uk/PpApplications/genpub/en/Ecabinet Accessed 04-12-2010). From 6 April 2008, the standard on-line form allowed Local Authorities in England and Wales to receive planning applications digitally. Through the application of an XML schema, details of planning applications can be captured digitally, such as applicant details, type of development, number of housing units, floorspace of commercial development and a range of other information that can be uploaded directly into bespoke back-office planning systems.
This greatly improved efficiency over the old paper-based system, eliminating data entry and scanning. Local Authorities tend to use large and complex databases for dealing with planning applications, and to develop and roll out a web-based system for document and event handling would be far too costly. However, the use of the online XML schema has the power to make the data collected re-usable in other applications. Data could be read and fed into other departments’ systems and used to calculate statistics on housing and commercial development, and to identify trends to appraise the success of land-use planning policy.
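To make the re-use idea concrete, here is a small sketch in Python; the element names are invented for illustration and are not the real 1APP schema:

```python
# Sketch: extracting re-usable data from a planning-application XML document.
# The element names below are hypothetical and NOT the real 1APP schema.
import xml.etree.ElementTree as ET

application = """
<PlanningApplication>
  <Applicant>J. Smith</Applicant>
  <DevelopmentType>Residential</DevelopmentType>
  <HousingUnits>24</HousingUnits>
  <FloorspaceSqm>1800</FloorspaceSqm>
</PlanningApplication>
"""

root = ET.fromstring(application)
units = int(root.findtext("HousingUnits"))
print("Proposed housing units:", units)  # ready to feed into housing statistics
```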
The flexible nature of semantically described data leaves information professionals able to look for new overlaps of information, taking high-level ontological rules and attempting to realise the same relationships in multiple datasets: for example, across the thousands of datasets held by government departments and written in RDF. Ontological rules can be applied to census data, crime data, housing tenure, population projections, ethnic breakdowns and socio-economic classifications, all of which describe real-world objects. Ordnance Survey has also created an ontology to describe geographical entities using OWL-DL, making geographical locations and defined areas explicit to machines through RDF (http://www.ordnancesurvey.co.uk/oswebsite/ontology/ Accessed 01-12-2010).
This has implications for information professionals working in policy research, for instance being able to write spatial queries to explore questions not normally possible: what is the prevalent socio-economic class of 25-34 year olds of Somali origin who live in the top 10% of areas for crime against the person and live in council-rented dwellings? Here, multiple datasets are queried, including the ability to geocode pieces of data. Instead of copying datasets into GIS systems to query, the work is all done over the web, so the information specialist can be reasonably assured that she is using the most current data.
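Purely as an illustration, the sketch below writes such a query in SPARQL (held in a Python string); every prefix, property and decile encoding here is a hypothetical vocabulary, not a real government one:

```python
# An illustrative SPARQL query joining hypothetical census, crime and housing
# vocabularies; the prefixes and properties are invented for the sketch.
query = """
PREFIX census:  <http://example.org/census/>
PREFIX crime:   <http://example.org/crime/>
PREFIX housing: <http://example.org/housing/>

SELECT ?area ?socioEconomicClass WHERE {
  ?area census:prevalentClass ?socioEconomicClass ;
        census:ageBand        "25-34" ;
        census:ethnicOrigin   "Somali" ;
        crime:personalCrimeDecile 1 ;          # top 10% for crime against the person
        housing:tenure        "Council rented" .
}
"""
print(query)  # would be sent to a SPARQL endpoint over the web, not run locally
```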
Linked data and the government’s drive to create the ‘Open Data Movement’ are essentially making data available as a commodity that can be taken and manipulated by private enterprise to create more opportunities for information systems development (http://data.gov.uk/ Accessed 10-12-2010). Data from planning applications, linked through RDF graphs to all manner of data published by the ONS, can provide local government with information that would be invaluable for service planning.
For example, in the future land-use planning must consider environmental and societal changes, such as rising sea levels and over-population, and guide the development of the infrastructure needed to support them.
Planning applications give data on the number of houses to be built. The data is also geocoded, so it can easily be mapped using OS geography data published in RDF format. The Environment Agency publishes flood-risk zones, again geocoded, and population projections give official population statistics. Complex statistical queries could be built from the data represented by these RDF statements to determine where development must be directed in the future to mitigate flooding, and where services will need to be located to cope with the environmental changes to come. Policy making thus becomes far more efficient: by using RDF graphs and SPARQL to observe relationships in the real world quickly, without the need to manually bring disparate datasets together in a GIS system, the web becomes a platform for government to formulate policy based on the evidence provided by linked datasets.
Conclusion
Web 2.0 applications make publishing accessible with minimal effort and rely on user-defined tags as metadata. The Semantic Web uses marked-up sections of machine-readable data found in databases or documents on the web, and describes them through real-world semantic models or ontologies.
“The Semantic Web is about two things. It is about common formats for integration and combination of data drawn from diverse sources, where on the original Web mainly concentrated on the interchange of documents. It is also about language for recording how the data relates to real world objects. That allows a person, or a machine, to start off in one database, and then move through an unending set of databases which are connected not by wires but by being about the same thing." (http://www.w3.org/2001/sw/ Accessed 28-11-2010)
The Semantic Web movement will hopefully lead to more merging of web-based data. Discoveries of new information from existing information will become possible by looking for overlaps between new and existing data, and could lead to new advances in science and medicine and improve our general understanding of the real world.
References
http://chrisbrookditablog.blogspot.com/2010/12/web-20-and-web-30-semantic-web.html
(URL for this blog post)
Burke, M., ‘The Semantic Web and the Digital Library’, Aslib Proceedings 61 (3), 2009
Chowdhury, G. G. and Chowdhury, S., Organizing Information: From the Shelf to the Web, Facet, 2007
Governor, J., Hinchcliffe, D. and Nickull, D., Web 2.0 Architectures, O’Reilly, 2009
Segaran, T., Evans, C. and Taylor, J., Programming the Semantic Web, O’Reilly, 2009
http://www.w3.org/2001/sw/ Accessed 28-11-2010
http://en.wikipedia.org/wiki/MARC_standards Accessed 28-11-2010
http://radar.oreilly.com/2006/12/web-20-compact-definition-tryi.html Accessed 26-11-2010
http://www.wikipedia.org Accessed 04-12-2010
http://en.wikipedia.org/wiki/File:Wikipedia_main_page_XHTML_source.png Accessed 04-12-2010
http://en.wikipedia.org/wiki/Semantic_Wikipedia Accessed 12-12-2010
http://www.freebase.com/home Accessed 02-12-2010
http://chrisbrookditablog.blogspot.com/2010/11/dita-session-05-introducing-web-20-and.html Accessed 13-11-2010
http://chrisbrookditablog.blogspot.com/2010/11/extensible-markup-language-xml.html Accessed 12-12-2010
http://chrisbrookditablog.blogspot.com/2010/11/semantic-web-technologies-resource.html Accessed 26-11-2010
http://chrisbrookditablog.blogspot.com/2010/12/semantic-web-technologies-owl.html Accessed 12-12-2010
http://data.gov.uk/ Accessed 10-12-2010
http://dublincore.org/ (Dublin Core Metadata Initiative) Accessed 02-12-2010
http://www.ordnancesurvey.co.uk/oswebsite/ontology/ Accessed 01-12-2010
http://www.planningportal.gov.uk/PpApplications/genpub/en/Ecabinet Accessed 04-12-2010