Semantic Service Discovery Techniques for the composable web
semantic service discovery techniques for web, This content contributes to the problem of resource and service discovery in the context of the composable web.
In the current web, mashup technologies allow developers reusing services and contents to build new web applications.
However, developers face a problem of information flood when searching for appropriate services or resources for their combination. To contribute to overcoming this problem, a framework is defined for the discovery of services and resources. In this framework, three levels are defined for performing discovery at content, discovery and agente levels. The content level involves the information available in web resources.
The web follows the Representational Stateless Transfer (REST) architectural style, in which resources are returned as representations from servers to clients.
These representations usually employ the HyperText Markup Language (HTML), which, along with Content Style Sheets (CSS), describes the markup employed to render representations in a web browser.
Although the use of SemanticWeb standards such as Resource Description Framework (RDF) make this architecture suitable for automatic processes to use the information present in web resources, these standards are too often not employed, so automation must rely on processing HTML.
This process, often referred as Screen Scraping in the literature, is the content discovery according to the proposed framework. At this level, discovery rules indicate how the different pieces of data in resources’ representations are mapped onto semantic entities. By processing discovery rules on web resources, semantically described contents can be obtained out of them.
The service level involves the operations that can be performed on the web. The current web allows users to perform different tasks such as search, blogging, e-commerce, or social networking.
To describe the possible services in RESTful architectures, a high-level feature-oriented service methodology is proposed at this level. This lightweight description framework allows defining service discovery rules to identify operations in interactions with REST resources.
The discovery is thus performed by applying discovery rules to contents discovered in REST interactions, in a novel process called service probing. Also, service discovery can be performed by modelling services as contents, i.e., by retrieving Application Programming Interface (API) documentation and API listings in service registries such as ProgrammableWeb.
For this, a unified model for composable components in Mashup-Driven Development (MDD) has been defined after the analysis of service repositories from the web. The agent level involves the orchestration of the discovery of services and contents. At this level, agent rules allow to specify behaviours for crawling and executing services, which results in the fulfilment of a high-level goal.
Agent rules are plans that allow introspecting the discovered data and services from the web and the knowledge present in service and content discovery rules to anticipate the contents and services to be found on specific resources from the web.
By the definition of plans, an agent can be configured to target specific resources. The discovery framework has been evaluated on different scenarios, each one covering different levels of the framework.
Contenidos a la Carta project deals with the mashing-up of news from electronic newspapers, and the framework was used for the discovery and extraction of pieces of news from the web. Similarly, in Resulta and VulneraNET projects the discovery of ideas and security knowledge in the web is covered, respectively.
The service level is covered in the OMELETTE project, where mashup components such as services and widgets are discovered from component repositories from the web.
The agent level is applied to the crawling of services and news in these scenarios, highlighting how the semantic description of rules and extracted data can provide complex behaviours and orchestrations of tasks in the web.
The main contributions of the thesis are the unified framework for discovery, which allows configuring agents to perform automated tasks. Also, a scraping ontology has been defined for the construction of mappings for scraping web resources.
A novel first-order logic rule induction algorithm is defined for the automated construction and maintenance of these mappings out of the visual information in web resources. Additionally, a common unified model for the discovery of services is defined, which allows sharing service descriptions.