Contextual Text Mining with Probabilistic Topic Models

Links: ppt

Participants: ChengXiang Zhai

The explosive growth of information demands powerful text mining tools to help us digest information and discover hidden knowledge in text. Text data is often associated with various kinds of context, such as time, location, and source. Given text data with context information, we often want to extract the subtopics or themes in the text and analyze how they vary across contexts, e.g., to reveal spatiotemporal variations of a subtopic like "government response" in blog articles about Hurricane Katrina. In this project, we are developing general probabilistic models and new algorithms for discovering and analyzing various contextual patterns in text, a task we refer to as contextual text mining. The proposed models have broad applications in multiple domains: they can help understand topic evolution, the spatiotemporal impact of events, and public opinion, and detect topic-related social communities in arbitrary text collections. The extracted topical patterns can reveal hidden associations and latent knowledge in text and provide evidence for decision-makers to use in making policy decisions.

Contextual Text Mining

  • Discover hidden topics/themes from text - What did people say about Hurricane Katrina in blogs, in news, ...?
  • Reveal topic variations over contexts - How do opinions about the government response to Hurricane Katrina vary across states?
  • Reveal correlations of topical patterns and context variables - Have positive opinions about the Iraq War in blog articles been affected by special TV programs covering the war?
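
As a rough illustration of the second question above, the sketch below averages per-document topic proportions within each context value (here, a state). It is only a post-hoc summary, not the project's contextual model: it assumes some topic model has already produced a topic distribution for every document. The function name `topic_coverage_by_context` and the toy data are hypothetical.

```python
from collections import defaultdict


def topic_coverage_by_context(doc_topics, doc_contexts):
    """Average per-document topic proportions within each context value.

    doc_topics:   list of topic distributions, one list of floats per document
    doc_contexts: list of context labels (e.g. state names), aligned with doc_topics
    """
    sums = {}
    counts = defaultdict(int)
    for theta, ctx in zip(doc_topics, doc_contexts):
        if ctx not in sums:
            sums[ctx] = [0.0] * len(theta)
        for k, p in enumerate(theta):
            sums[ctx][k] += p
        counts[ctx] += 1
    # Divide the accumulated proportions by the number of documents per context.
    return {ctx: [s / counts[ctx] for s in sums[ctx]] for ctx in sums}


# Hypothetical toy data: two topics, three documents, two states.
doc_topics = [[0.8, 0.2], [0.6, 0.4], [0.1, 0.9]]
doc_contexts = ["Louisiana", "Louisiana", "Texas"]
coverage = topic_coverage_by_context(doc_topics, doc_contexts)
```

Comparing the rows of `coverage` shows which topics dominate in which context. A model that estimates context-specific coverage jointly with the topics would replace this two-step approximation.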

Key Technologies

  • General contextual text mining models - Contextual probabilistic latent semantic analysis model
  • General probabilistic topic labeling - Maximizing label-topic mutual information
  • Information retrieval with language models - Probabilistic text representation and matching
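
To make "probabilistic text representation and matching" concrete, here is a minimal sketch of one common language-modeling approach to retrieval: query-likelihood scoring with Dirichlet prior smoothing. The page does not say which retrieval model the project uses, so the choice of smoothing, the function name `dirichlet_query_log_likelihood`, the value of `mu`, and the toy documents are all assumptions made for illustration.

```python
import math
from collections import Counter


def dirichlet_query_log_likelihood(query, doc, collection_counts, collection_len, mu=2000.0):
    """Log-probability of the query under a Dirichlet-smoothed document language model."""
    doc_counts = Counter(doc)
    doc_len = len(doc)
    score = 0.0
    for w in query:
        p_coll = collection_counts.get(w, 0) / collection_len
        if p_coll == 0.0:
            continue  # skip query words that never occur in the collection
        # Smoothed estimate: document counts backed off to the collection model.
        p = (doc_counts[w] + mu * p_coll) / (doc_len + mu)
        score += math.log(p)
    return score


# Hypothetical toy collection (tokenized by whitespace).
docs = {
    "d1": "government response to the hurricane was slow".split(),
    "d2": "the hurricane damaged many homes".split(),
}
collection_counts = Counter(w for d in docs.values() for w in d)
collection_len = sum(collection_counts.values())

query = "government response".split()
ranked = sorted(
    docs,
    key=lambda d: dirichlet_query_log_likelihood(
        query, docs[d], collection_counts, collection_len, mu=10.0
    ),
    reverse=True,
)
```

Each document is represented as a word distribution, and documents are ranked by how likely they are to generate the query. The small `mu` here suits the tiny toy collection. Larger collections typically use a much larger value.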

Applications

  • Opinion analysis
  • Policy impact analysis
  • Event impact analysis
  • Media effect
  • Causal analysis
  • Topic trend analysis
  • ...