The GRDDL Approach and RDF

Thanks to XTech 2005, a large number of very good papers have become available online, covering the latest trends and technologies clearly and thoroughly…

One of the technologies I wanted to look into more closely is Gleaning Resource Descriptions from Dialects of Languages (GRDDL)…

Premise and a quick overview

This post is going to be technical, there is no way around it, but for those who don't want to read the whole thing: what on earth is GRDDL, in practice?

In short, it is a methodology that, in a few steps, makes it possible to extract information from today's Web and from XHTML pages without the author necessarily having to know RDF or anything else…
It allows metadata to be extracted dynamically from pages, provided they carry the appropriate hooks: it is an important piece of a transition to the Semantic Web that is as painless as possible…

Starting from Bridging XHTML, XML and RDF with GRDDL

Let's begin with a simple definition, the role of GRDDL:

GRDDL, a technology in development in W3C, allows to incorporate semantics from XML vocabularies and XHTML conventions into the Semantic Web by re-using existing extensibility hooks of the Web.

So in practice it is a way to bridge a gap: information that lives in non-semantic formats, such as XHTML (with markup conventions like the well-known Microformats) and various XML dialects, gets "rendered" into formats compatible with the Semantic Web, such as RDF.

While RDF and OWL, W3C Recommendations since February 2004, provide a direct solution for re-using and combining vocabularies, many existing applications and markup systems cannot realistically be moved to this data model, and even today, many new applications are likely to be built on existing XML and HTML toolkits with their well-deployed workflow tools, rather on the less ubiquitous RDF ones.

In reality there are two reasons for this situation: one is that many people see the added complexity of an RDF layer over their data as pointless, and what that extra complexity buys is not yet understood…
the other, related to the first, is the use, and in some cases the abuse, of XML for simpler tasks that would actually fall within the roles covered by RDF…

In short, RDF gets overlooked, and many people are simply unaware of its potential…

In addition to these XML use-cases, the need to incorporate fine-grained metadata in HTML documents beyond what the HTML specification defines, has arisen again and again in the Web history, either to benefit from the deployment of well-known RDF vocabularies, or simply to use HTML as a lever to deploy what has been casually called the “lowercase semantic web”, namely the possibility to encode lightweight semantics through HTML markup conventions

And here we have the main vocabularies and conventions, RDF-based and otherwise, created for different needs:

Dublin Core: the general-purpose vocabulary for describing resources
FOAF: describes relationships between people at a generic level (foaf:knows), with no further qualification
XFN: started instead as an XHTML convention (values of the rel attribute on links) for indicating the authors of web pages and their relationships, at a finer level of detail than FOAF
GeoURL: covers geographic information associated with a web page
blog trackback and pingback: a sign that people want to extend the semantics of web pages themselves

In all of the cases just mentioned, being able to use RDF is desirable, and GRDDL fills exactly this role:

GRDDL, standing for “Gleaning Resource Descriptions from Dialects of Languages” [GRDDL], proposes a mechanism to make these associations possible for any XML or XHTML document.
GRDDL grounds these associations in URI space, making it simple to extend the collection of transformations as new XML vocabularies are deployed.

The technical part: how GRDDL works

Since there is a lot on the table here, I will summarize and show a few interesting parts, but I strongly recommend reading the original for a detailed understanding…

-> GRDDL mechanisms, which has two subsections:

Specifying a Transformation For a Family of Documents
Specifying a Transformation For an Individual Document

From this alone it is clear that we first need to understand what these "families" are, since the appropriate transformations are then associated with them…
It should be clear, I think, that the heart of this approach is a transformation carried out by a dedicated engine, in any case based on XSLT, which uses the document's family as the basis for deciding what to extract from the source document.

Coming back to our family: it is nothing more than a set of XML or XHTML documents that share the same semantic structure and that use a well-defined URI as reference and identifier (applying the rules of the Architecture of the World Wide Web, Volume One)…

The idea behind GRDDL is that a family of documents will advertise its family via a well-known URI and that dereferencing this URI should lead a GRDDL processor to an algorithm to map from the structure to the semantics.
The XHTML1 namespace URI is an example of one such family identifier.

The interesting thing at this point is how the sub-vocabularies of XHTML (which include the Microformats) are handled from the XHTML side…
Let's take XFN (XHTML Friends Network) as an example…

This is about the correct way to add semantic meaning to HTML using the official specifications, something that is not widely known:

The proper way to anchor these additional semantics in the Web is to use the profile attribute on the head element, as warranted by the HTML 4.01 specification [HTML4]:
**The profile attribute of the HEAD specifies the location of a meta data profile. The value of the profile attribute is a URI**.
User agents may use this URI in two ways:
* As a globally unique name. User agents may be able to recognize the name (without actually retrieving the profile) and perform some activity based on known conventions for that profile. For instance, search engines could provide an interface for searching through catalogs of HTML documents, where these documents all use the same profile for representing catalog entries.
* As a link. User agents may dereference the URI and perform some activity based on the actual definitions within the profile (e.g., authorize the usage of the profile within the current HTML document). [The HTML4] specification does not define formats for profiles.
Indeed, an XHTML document using the set of relationships defined in XFN [XFN] must reference the XFN profile in its head element.
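
To make the role of the profile attribute concrete, here is a minimal Python sketch (standard library only) that reads the profile of the head element and, only if the XFN profile is declared, collects the rel values on links. The sample page and the helper names are made up for illustration, and the XFN 1.1 profile URI http://gmpg.org/xfn/11 is not taken from the passage quoted above; this is a sketch of the idea, not how any particular tool does it.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
XFN_PROFILE = "http://gmpg.org/xfn/11"  # XFN 1.1 profile URI (not from the quoted text)

# Hypothetical sample page, written only for this sketch.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head profile="http://gmpg.org/xfn/11">
    <title>Blogroll</title>
  </head>
  <body>
    <p><a href="http://example.org/alice" rel="friend met">Alice</a>
       <a href="http://example.org/about">About</a></p>
  </body>
</html>"""


def head_profiles(root):
    """Return the profile URIs declared on <head> (whitespace-separated)."""
    head = root.find(f"{{{XHTML_NS}}}head")
    if head is None:
        return []
    return head.get("profile", "").split()


def xfn_links(root):
    """Return (href, [rel values]) pairs, but only if the page declares the XFN profile."""
    if XFN_PROFILE not in head_profiles(root):
        return []
    links = []
    for a in root.iter(f"{{{XHTML_NS}}}a"):
        rel = a.get("rel")
        if rel and a.get("href"):
            links.append((a.get("href"), rel.split()))
    return links


root = ET.fromstring(SAMPLE)
profiles = head_profiles(root)
links = xfn_links(root)
```

Here the profile is used in the first of the two ways described in the quote, as a globally unique name: the code recognizes the URI without ever retrieving it.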

A note on transformations for individual documents

When it makes no sense to refer to a family of documents, however, there is still a way to use GRDDL:

* an attribute (data-view:transformation) on the root element for the generic XML case
  GRDDL attribute in an XML Schema document
  <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
             xmlns:data-view="http://www.w3.org/2003/g/data-view#"
             data-view:transformation="http://www.w3.org/2003/g/embeddedRDF.xsl">
* **for XHTML**, given the syntactic constraints imposed by the required DTD validity, adding an attribute in the html root element is not an option. Thus, GRDDL proposes to use a specific rel attribute value (transformation), anchored in URI space through a defined profile attribute value:
  GRDDL link in an XHTML document
  <html xmlns="http://www.w3.org/1999/xhtml">
    <head profile="http://www.w3.org/2003/g/data-view">
      <title>Some Document</title>
      <link rel="transformation" href="http://www.w3.org/2000/06/dc-extract/dc-extract.xsl" />
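
The two cases above can be sketched in a few lines of Python (standard library only). The function name transformation_uris is hypothetical, the two sample documents are the excerpts above closed so that they parse, and treating the attribute values as whitespace-separated lists is an assumption of this sketch.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
DATA_VIEW_NS = "http://www.w3.org/2003/g/data-view#"
GRDDL_PROFILE = "http://www.w3.org/2003/g/data-view"


def transformation_uris(xml_text):
    """Return the transformation URIs that a single document declares for itself."""
    root = ET.fromstring(xml_text)

    # Generic XML case: data-view:transformation attribute on the root element.
    attr = root.get(f"{{{DATA_VIEW_NS}}}transformation")
    if attr:
        return attr.split()

    # XHTML case: <link rel="transformation">, honoured only when the head
    # declares the GRDDL profile that gives that rel value its meaning.
    head = root.find(f"{{{XHTML_NS}}}head")
    if head is None or GRDDL_PROFILE not in head.get("profile", "").split():
        return []
    return [
        link.get("href")
        for link in head.iter(f"{{{XHTML_NS}}}link")
        if "transformation" in link.get("rel", "").split() and link.get("href")
    ]


# The two excerpts above, closed here so that they are well-formed.
SCHEMA_DOC = (
    '<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" '
    'xmlns:data-view="http://www.w3.org/2003/g/data-view#" '
    'data-view:transformation="http://www.w3.org/2003/g/embeddedRDF.xsl"/>'
)
XHTML_DOC = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head profile="http://www.w3.org/2003/g/data-view">
    <title>Some Document</title>
    <link rel="transformation" href="http://www.w3.org/2000/06/dc-extract/dc-extract.xsl" />
  </head>
  <body/>
</html>"""

schema_uris = transformation_uris(SCHEMA_DOC)
xhtml_uris = transformation_uris(XHTML_DOC)
```

Note that in the XHTML case the link is ignored unless the profile is present: without it, rel="transformation" is just an arbitrary string.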

Incidentally, I will go deeper into this in a future post that will cover the Structured Blogging plugin in more detail…

So, in the end, how does GRDDL actually work?

These are the key steps:

A GRDDL processor encounters an XML document whose root element has a dereferenceable namespace URI. If this document is an XHTML document, the head is examined for a dereferenceable profile URI.
Upon dereferencing the said URI, the GRDDL processor finds that it references one or more URIs identifying a set of algorithms (this step is described in more detail below).
If the identified algorithms are well-known to the GRDDL processor, it applies them to the initial XML (or XHTML) document; otherwise, it dereferences the URIs and finds, e.g., an XSL transformation as one of the representations.
Applying the XSL transformations to the initial document, the GRDDL processor extracts a set of RDF/XML statements that are asserted as being part of the intended meaning of the document.

So in practice the GRDDL processor, by dereferencing the profile URIs it encounters, looks for the required transformations, which produce as their result a set of RDF triples carrying the metadata of the pages being examined…
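
The steps above can be sketched as a small Python program (standard library only). Everything network-related is faked: a dictionary stands in for dereferencing the profile URI, and a Python function stands in for an XSL transformation the processor already knows. All the example.org URIs and the names glean, title_triples, PROFILE_TO_TRANSFORMS and KNOWN_ALGORITHMS are invented for this sketch; a real processor would fetch the URIs and run real XSLT.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
DC_TITLE = "http://purl.org/dc/elements/1.1/title"

# Hypothetical URIs, invented for this sketch.
PROFILE = "http://example.org/profiles/blog"
TITLE_XSL = "http://example.org/transforms/title.xsl"
OTHER_XSL = "http://example.org/transforms/other.xsl"

# Stand-in for step 2: what dereferencing each profile URI would reveal.
PROFILE_TO_TRANSFORMS = {PROFILE: [TITLE_XSL, OTHER_XSL]}


def title_triples(root, doc_uri):
    """Python stand-in for a stylesheet: one dc:title triple taken from <title>."""
    title = root.find(f"{{{XHTML_NS}}}head/{{{XHTML_NS}}}title")
    if title is None or not title.text:
        return []
    return [(doc_uri, DC_TITLE, title.text.strip())]


# Step 3: algorithms the processor already knows, keyed by their URI.
KNOWN_ALGORITHMS = {TITLE_XSL: title_triples}


def glean(xml_text, doc_uri):
    """Return (triples, to_fetch): extracted statements and stylesheets still to retrieve."""
    root = ET.fromstring(xml_text)

    # Step 1: look at the head for profile URIs.
    head = root.find(f"{{{XHTML_NS}}}head")
    profiles = head.get("profile", "").split() if head is not None else []

    triples, to_fetch = [], []
    for profile in profiles:
        # Step 2: the profile points to one or more transformation URIs.
        for uri in PROFILE_TO_TRANSFORMS.get(profile, []):
            algorithm = KNOWN_ALGORITHMS.get(uri)
            if algorithm is None:
                # Step 3, other branch: a real processor would dereference
                # the URI and obtain, e.g., an XSL transformation.
                to_fetch.append(uri)
            else:
                # Step 4: apply the transformation and collect the statements.
                triples.extend(algorithm(root, doc_uri))
    return triples, to_fetch


PAGE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head profile="http://example.org/profiles/blog"><title>Some Document</title></head>
  <body/>
</html>"""

triples, to_fetch = glean(PAGE, "http://example.org/post/1")
```

Triples are plain (subject, predicate, object) tuples here rather than RDF/XML, purely to keep the sketch short.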

That is the idea: the problem is that there are a couple of snags in the URI dereferencing process and in how the transformations are identified…

The application scenario

At the end of the paper three different scenarios are presented, but the most interesting one in my view is the last, the one about the blogging community…

Let's see what it is about…

In practice, the heart of the matter is this:

… the blogging community could create a profile (or a set of profiles) that would associate the existing conventions to a URI; if this profile was set up with a set of GRDDL transformations, any content referencing the said profile could automatically be processed by Semantic Web agents supporting GRDDL.
Moreover, any addition of a new convention to the given profile (as a result of a consensus in the community) could be supported in processing tools by simply adding a link to a new XSL style sheet to the said profile.

The vocabularies currently in use grow out of community needs, expressed as shared ways of interpreting content in certain contexts: the problem is that these conventions can change over time, so something that stays maintainable over time is needed…

The suggestion is this:
use one or more profiles, each identified by a URI, and associate GRDDL transformations with that profile, so that the automated agents of the Semantic Web can process the content consistently…
And this clearly makes it very simple to change the XSLT transformation if the profile changes…
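
A tiny Python sketch of why this is easy to maintain: the agent-side function never changes, only the data published with the profile does. The dictionary stands in for the profile document the community would publish, and every URI and name in it is hypothetical.

```python
# Hypothetical URIs, invented for this sketch.
BLOG_PROFILE = "http://example.org/profiles/blogging"

# Stand-in for the profile published by the community:
# profile URI -> stylesheet URIs the profile links to.
published_profiles = {
    BLOG_PROFILE: ["http://example.org/xsl/trackback.xsl"],
}


def transformations_for(document_profiles, published):
    """Collect, in order and without duplicates, the stylesheets linked
    from the profiles that a document references."""
    found = []
    for profile in document_profiles:
        for xsl in published.get(profile, []):
            if xsl not in found:
                found.append(xsl)
    return found


before = transformations_for([BLOG_PROFILE], published_profiles)

# The community agrees on a new convention: only the profile gains a link,
# the agent code above stays exactly the same.
published_profiles[BLOG_PROFILE].append("http://example.org/xsl/pingback.xsl")

after = transformations_for([BLOG_PROFILE], published_profiles)
```

Every page that already references the profile picks up the new stylesheet the next time an agent processes it, with no change to the page itself.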

Conclusions

GRDDL proposes a set of mechanisms strongly anchored in the Web Architecture through its use of URIs, and has the potential to address a number of issues that have arisen through the co-deployment of XHTML, XML-based vocabularies and RDF-based technologies.

It seems clear that the birth and development of GRDDL is the right answer for bringing the world that finds RDF too complex to handle directly, and that believes in the flexibility of XML, together with the Semantic Web… it also acts as glue for pre-existing material, making its semantic cataloguing simpler and more painless…

References:

-> Bridging XHTML, XML and RDF with GRDDL
-> HTML, Metadata, and RDF
-> GRDDL
-> GRDDL specification updated works with Microformats
-> GRDDL on the desktop
-> Gleaning Resource Descriptions from Dialects of Languages (GRDDL) ( W3C Standard )

Matteo Brunati