Module 1 - Global Retrieval
Our primary goal in this module is to investigate the challenges and potential of a semantically integrated locator service for public sector information (PSI). Our schema integration approach involves developing a uniform interface, a mediated schema, that captures the core concepts of a PSI catalogue. Our data/instance integration approach applies Information Retrieval and other metrics to correlate and integrate the contents of the catalogues.
Part 1: We harvested 5 catalogues from PSI holders around the world: the US Raw and Tool catalogues of data.gov, the CKAN dump of data.gov.uk, the OPSI Information Asset Registers, and Australia's online portal to 69 public sector datasets. Together they contain over 9,000 records of PSI.
Browse the catalogues individually using the embedded Linked Data browser: Records Finder
Part 2: A simple cross-catalogue search engine and records correlator. This is a first step towards establishing links between records of the same or different catalogues. For this instalment we explore correlations at the tag/keyword level only. Our methodology is a full implementation of the Vector Space Model, as originally conceived for the SMART retrieval system. We further extend the interpretation of tags via their synonyms and stems using the WordNet electronic lexicon. Queries are adjusted with the same minimal expansion.
Explore correlations and search for records using the experimental HTML interface: Catalogues Correlator
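To make the tag-level correlation of Part 2 more concrete, here is a minimal sketch of the Vector Space Model applied to record tags. It is an illustration only and not the project's code: the record identifiers, the sample tags, the function names and the plain tf-idf weighting are all assumptions, and the WordNet synonym and stem expansion is left out.

```python
import math
from collections import Counter


def tfidf_vectors(records):
    """Build a sparse tf-idf vector (dict of tag -> weight) for each record.

    `records` maps a hypothetical record id to its list of tags.
    """
    n = len(records)
    # Document frequency: in how many records does each tag occur?
    df = Counter()
    for tags in records.values():
        df.update(set(tags))
    vectors = {}
    for rid, tags in records.items():
        tf = Counter(tags)
        # Weight = term frequency * log(N / df); a tag found in every
        # record therefore gets weight 0.
        vectors[rid] = {t: tf[t] * math.log(n / df[t]) for t in tf}
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Invented sample data, for illustration only.
records = {
    "us:1": ["health", "hospital", "statistics"],
    "uk:7": ["health", "nhs", "statistics"],
    "au:3": ["transport", "roads"],
}
vectors = tfidf_vectors(records)
similar = cosine(vectors["us:1"], vectors["uk:7"])
unrelated = cosine(vectors["us:1"], vectors["au:3"])
```

In this sketch the two health-related records share weighted tags and score above zero, while records with no tags in common score exactly zero. A query can be treated the same way, as one more tag vector compared against every record.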
Part 3: A more advanced retrieval engine. In this instalment we are working on a variant of the Binary Independence Model and Okapi BM25. The engine will enable automatic and iterative query expansion using a hybrid global and local blind feedback technique. Work in progress...
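As a rough idea of what Part 3 builds on, the sketch below scores documents with a common formulation of Okapi BM25 and then expands the query using local blind (pseudo-relevance) feedback. It is not the variant under development here: the parameter values `K1` and `B`, the non-negative idf form, the sample documents and the function names are all assumptions, and the global part of the hybrid technique is not shown.

```python
import math
from collections import Counter

K1, B = 1.2, 0.75  # commonly used defaults, assumed for this sketch


def bm25_scores(query, docs):
    """Score every document against the query terms with BM25.

    `docs` maps a hypothetical document id to its list of terms.
    """
    n = len(docs)
    avgdl = sum(len(terms) for terms in docs.values()) / n
    df = Counter()
    for terms in docs.values():
        df.update(set(terms))
    scores = {}
    for did, terms in docs.items():
        tf = Counter(terms)
        score = 0.0
        for q in set(query):
            if tf[q] == 0:
                continue
            # Non-negative idf variant (an assumption of this sketch).
            idf = math.log(1 + (n - df[q] + 0.5) / (df[q] + 0.5))
            # Term frequency saturation with document length normalisation.
            norm = tf[q] + K1 * (1 - B + B * len(terms) / avgdl)
            score += idf * tf[q] * (K1 + 1) / norm
        scores[did] = score
    return scores


def expand_query(query, docs, top_k=2, add_terms=2):
    """Blind feedback: treat the top_k results as relevant and append
    their most frequent terms that are not already in the query."""
    scores = bm25_scores(query, docs)
    top = sorted(scores, key=scores.get, reverse=True)[:top_k]
    pool = Counter()
    for did in top:
        pool.update(t for t in docs[did] if t not in query)
    return list(query) + [t for t, _ in pool.most_common(add_terms)]


# Invented sample data, for illustration only.
docs = {
    "d1": ["health", "hospital", "waiting", "times"],
    "d2": ["health", "nhs", "hospital", "spending"],
    "d3": ["road", "traffic", "counts"],
}
scores = bm25_scores(["hospital"], docs)
expanded = expand_query(["hospital"], docs, add_terms=1)
```

Running `expand_query` repeatedly on its own output gives a simple form of iterative expansion. In practice each round can drift away from the original intent, so the number of added terms and rounds would need tuning.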