This research aims to enable effective access to structured information sources on the Internet. Over the past few years, the Web has deepened dramatically. A significant and increasing amount of information is hidden away on the "deep" Web, behind the query interfaces of searchable databases. There are numerous such autonomous and heterogeneous sources, each with a different schema and its own native query constraints. Because current crawlers cannot effectively query databases, such data is invisible to traditional search engines and thus remains largely hidden from users.
MetaQuerier for the Deep Web
We propose to build a metaquery system that helps users find and query online databases effectively and uniformly. Our efforts aim at opening up the deep Web to users by building the MetaQuerier; see the architecture below. On this wild frontier of the deep Web, the MetaQuerier will address the challenges of both exploration and integration. Our goal is thus twofold. First, to make the deep Web systematically accessible, the MetaExplorer will discover sources on the deep Web and build a searchable repository that helps users find sources useful for their information needs. Second, to make the deep Web uniformly usable, the MetaIntegrator will help users interact with online databases to ask queries.
1) Deep Web Explorer
The MetaExplorer project focuses on the discovery, modeling, and structuring of databases on the Web, to build a searchable source repository. This project develops a "search engine" of Web databases, including crawlers for efficiently discovering databases on the Internet, models designed to represent these databases, wrappers for automatically extracting their model parameters (e.g., schema details on their query interfaces) and structure, and an index of searchable Web sources.
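To make the idea of a searchable source repository concrete, here is a minimal Python sketch. It is not the MetaExplorer implementation: the names `QueryInterface`, `SourceRepository`, and the example URLs are hypothetical, and it assumes each discovered source has already been reduced to the list of attribute names on its query form.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryInterface:
    """Hypothetical model of one Web database's query form."""
    url: str
    domain: str
    attributes: tuple  # attribute names exposed on the form


class SourceRepository:
    """Toy repository: an inverted index from attribute name to source URLs."""

    def __init__(self):
        self._sources = {}
        self._index = defaultdict(set)

    def add(self, source):
        """Store the source and index it under each of its attributes."""
        self._sources[source.url] = source
        for attr in source.attributes:
            self._index[attr.lower()].add(source.url)

    def find(self, *attributes):
        """Return sources whose interface exposes every requested attribute."""
        if not attributes:
            return []
        url_sets = [self._index.get(a.lower(), set()) for a in attributes]
        urls = set.intersection(*url_sets)
        return sorted((self._sources[u] for u in urls), key=lambda s: s.url)


# Example data is invented for illustration only.
repo = SourceRepository()
repo.add(QueryInterface("http://books.example/search", "books", ("title", "author", "isbn")))
repo.add(QueryInterface("http://flights.example/find", "airfares", ("from", "to", "date")))
repo.add(QueryInterface("http://library.example/q", "books", ("title", "author")))

matches = repo.find("author", "title")
print([s.url for s in matches])
```

In this sketch the "model" of a source is only its attribute list; a fuller model would presumably also record the native query constraints mentioned above, such as allowed operators or required fields.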
2) Deep Web Integrator
The MetaIntegrator project focuses on the integration of online sources, i.e., bringing sources together coherently for query answering. We investigate source selection, query mediation, and schema integration for building the MetaIntegrator. In studying large-scale integration, these thrusts benefit from the source repository of the companion MetaExplorer. We will investigate the key enabling technology of dynamic, ad hoc information integration. In contrast to traditional static systems, the MetaIntegrator is dynamic: new sources may be added at any time, as they are discovered.
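The following Python sketch illustrates source selection, query mediation, and dynamic source addition in their simplest form. It is an assumption-laden toy, not the MetaIntegrator design: the `Mediator` class, the source names, and the hand-written attribute mappings are all hypothetical, and the schema integration step that would derive such mappings automatically is not shown.

```python
class Mediator:
    """Toy mediator: unified attribute names mapped to each source's native names."""

    def __init__(self):
        # source name -> {unified attribute: native attribute}
        self._mappings = {}

    def add_source(self, name, mapping):
        """Register a source at any time; existing sources are left untouched."""
        self._mappings[name] = dict(mapping)

    def select_sources(self, query):
        """Source selection: keep sources that can express every query attribute."""
        return sorted(n for n, m in self._mappings.items() if set(query) <= set(m))

    def translate(self, query):
        """Query mediation: rewrite the unified query into each source's vocabulary."""
        return {
            name: {self._mappings[name][attr]: value for attr, value in query.items()}
            for name in self.select_sources(query)
        }


# Invented sources and mappings, for illustration only.
mediator = Mediator()
mediator.add_source("bookstore_a", {"author": "au", "title": "ti"})
query = {"author": "Knuth", "title": "TAOCP"}
first = mediator.translate(query)

# Sources discovered later are added without rebuilding anything.
mediator.add_source("bookstore_b", {"author": "writer", "title": "name", "isbn": "isbn13"})
mediator.add_source("music_shop", {"artist": "artist"})
second = mediator.translate(query)
print(second)
```

The same unified query reaches `bookstore_b` as soon as it is registered, while `music_shop` is skipped because it cannot express the query's attributes.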