
2008 Summer Tutorials



All tutorials will be held in Room 3405 of the Siebel Center.

As part of the MIAS Data Sciences Summer Institute, renowned faculty from the University of Illinois Department of Computer Science will offer a collection of short courses that introduce students, scientists, and researchers to a variety of practices and software tools developed at UIUC to support knowledge discovery. Because the tutorials are taught by some of the most innovative scientists in the world, the content will be up-to-the-moment and highly relevant.

Enrollment in a tutorial includes the 15-hour course (over 5 days) and the opportunity to explore collaboration with university faculty on projects of joint interest.

5/20-5/23 Accessing Multi-Modal Structured and Unstructured Information: Building Data-Aware Search Systems

Tue: 10am-12, 1-4pm
Wed, Thur, Fri: 1-4pm

Kevin Chang
Information abounds on the Internet, in multiple modalities and in both structured and unstructured forms. How can we access such information effectively? This tutorial will give a systems perspective on building "data-aware" search services. First, to study a novel search system concretely, we will present the anatomy of "entity search", a new type of search that reaches data entities (emails, phone numbers, locations, etc.) directly. Second, to develop new search services, we will introduce pertinent techniques such as data crawling, information extraction, image recognition, inverted indexing and searching, and database querying. Each topic will highlight existing tools and resources and will emphasize hands-on exercises. Overall, this tutorial will lay the foundation for students to get started on their research projects.

5/27-5/30 Data Mining: Principles, Methods, and Applications

Tue, Wed, Thur, Fri: 10:30am-12, 1-3pm

Jiawei Han
We will offer a tutorial course on data mining that introduces the concepts, algorithms, techniques, and systems of data mining, including (1) data preprocessing, (2) frequent pattern and correlation analysis, (3) cluster and outlier analysis, (4) mining sequential and complex structured data, (5) information network analysis, (6) mining data streams, (7) mining RFID, moving-object, and spatiotemporal data, and (8) data mining applications. The course should interest students who need to implement and/or use data mining methods and systems to analyze large amounts of data.

6/2-6/6 Machine Learning for Natural Language Processing and Information Extraction

Mon-Fri: 1-4pm

Dan Roth
Statistical and machine learning techniques have brought significant advances in language processing and information extraction, and have allowed researchers to deal robustly and broadly with problems of realistic size. This short course will introduce some of the central learning frameworks and techniques that have emerged in this field and have found applications in several areas of text processing. We will present the main theoretical paradigms used in natural language processing (learning-theoretic, probabilistic, and information-theoretic), the relations between them, and the main algorithmic techniques developed within these paradigms. Building on a brief theoretical introduction, we will cover key algorithmic techniques for classification (e.g., naive Bayes and variants of the Perceptron and SVMs) and for structured prediction in the context of NLP and information extraction tasks. We will also discuss issues such as feature extraction and training paradigms (supervised, semi-supervised, EM), and address some of the issues involved in using these techniques in real-world NLP applications.

Slides


6/9-6/13 Information Retrieval and Web Search

Mon-Fri: 1-4pm

ChengXiang Zhai
This tutorial will introduce the foundations and technologies of Web search. We will first systematically review the basic concepts, models, and techniques of information retrieval, which is the foundation of all search engine technologies. We will then discuss the special challenges of Web search and review recently developed technologies for addressing them.

Slides


  • Day 1 Slides (ppt)
  • Day 2/3 Slides (ppt)
  • Day 4 Slides (ppt)
  • Day 5 Slides (ppt)

6/23-6/27 Computer Vision

Mon-Fri: 1-4pm

David Forsyth
This tutorial will be an intensive course on methods for interpreting human activities in pictures and video. The topic is of wide current importance in security applications. Relevant material is currently scattered across the animation, computer vision, and tracking literature. The syllabus will include methods to find people in static images and in video, methods to recover the 3D configuration of the body from 2D image information, and methods to infer what a person is doing from this information.

2007 Summer Tutorials


5/14/07 - 7/11/07

5/21-5/25

Databases and Information Integration

Kevin Chang

This tutorial instructs students in the fundamentals of database management systems (DBMSs) and then takes them on a state-of-the-art tour of issues and techniques in data integration. Throughout, we emphasize modeling, query processing, semantic integration, and managing uncertainty and inconsistency. Modern techniques from the database, information retrieval, and artificial intelligence communities are applied to data integration problems arising from the Web, the Deep Web, and other inconsistent sources.

5/28-6/1

Information Retrieval and Web Information Access

ChengXiang Zhai

We introduce the foundations of information retrieval (IR) as well as its applications and recent developments in Web information access. We will start with basic retrieval models for text. Then we will study the new challenges posed by the Web and new techniques in Web search, integration, and mining for finding information, integrating dynamic "deep" sources, and discovering knowledge.

6/4-6/8

Machine Learning

Mark Sammons and Nick Rizzolo

This tutorial will introduce methodologies and tools both for preparing data for use with machine learning tools (including feature extraction) and for applying machine learning techniques and tools to practical problems.

Examples will be given in the textual domain, starting with free-form text and building machine-learning-based natural-language processing tools. Participants will have the opportunity to develop tools to solve three text-processing problems during the three sessions.

Slides


Tutorial 1

6/18-6/22

Computer Vision

David Forsyth

The tutorial will be based on our textbook, "Computer Vision: A Modern Approach," which is now used in all major departments teaching the topic. The syllabus will emphasize the aspects of computer vision most relevant to information discovery and retrieval. In particular, we will examine different technologies for image feature extraction, object recognition, camera calibration, and linking information in images with text, metadata, and information in other formats.

Slides

6/25-6/29

Machine Learning and Data Mining

Jiawei Han

We will offer a tutorial course on data mining and machine learning that introduces the concepts, algorithms, techniques, and systems of data mining and machine learning, including (1) data preprocessing, (2) frequent pattern and correlation analysis, (3) supervised learning (classification), (4) unsupervised learning (cluster analysis), (5) mining sequential and complex structured data, (6) mining data streams, text data, Web data, spatiotemporal data, biomedical data, and other forms of complex data, and (7) data mining and machine learning applications. The course should interest students from computer science and other disciplines who need to implement and/or use data mining and machine learning methods and systems to analyze large amounts of data.

Slides


Slides for this tutorial can be found here.
Password: mias07