MEng Thesis Research Opportunities

Research assistantships for students working on a Master of Engineering (MEng) thesis may be available with these CSAIL research projects. If you are interested, contact the researchers involved in these projects directly through their websites.

Note: MEng thesis research opportunities are open only to current MIT MEng students.

A Chance for Early Detection of Cognitive Deficits 

For several decades neurologists have used a deceptively simple test to help diagnose cognitive capabilities: patients are asked to draw a clock face showing a particular time. New technology -- a ballpoint pen that digitizes as you write -- has made it possible to collect data from this test that is hundreds of times more precise than anything that can be discerned from ink on paper, as well as enabling virtually instant analysis of the data. Our current software makes several hundred measurements on each test, enabling new avenues of diagnosis for a range of diseases. 

A multi-site clinical study currently underway is collecting data from 7 clinics around the US; to date, data has been gathered from more than 1500 tests. Preliminary analysis of that data suggests that it enables diagnosing conditions such as Alzheimer's far earlier than they can otherwise be detected.

This is an extraordinary opportunity for someone with a good background in statistical analysis and/or machine learning, or the willingness and the time to get up to speed on tools like SVM-Light. This is innovative research with a good chance for publication. 

UROP and MEng students will work closely with Professor Randall Davis and members of his group. Please send a brief description of relevant research and coursework background, short-term and longer-term availability and interest, and a copy of your resume to davis@csail.mit.edu.

Arabic Language Understanding for Spoken Dialogue Systems

We’re looking for a UROP or MEng student who will start around IAP. The student will work on an Arabic dialogue system that could answer users’ inquiries about landmarks such as restaurants around a city. Arabic language understanding will be explored, as well as crowd-sourcing-based data collection for Arabic. Dialogue control and language generation in Arabic will also be investigated.

Requirements: Java expertise, read/write Arabic.
This project is expected to turn into a Super UROP or MEng thesis. If interested, please send a CV to Jingjing Liu (jingl@csail.mit.edu).

Data Tamer: A Next-generation Data Curation System

Data curation is the act of discovering one or more data sources of interest, cleaning and transforming the new data, semantically integrating it with other local data sources, and deduplicating the resulting composite. There has been much research on the various components of curation (especially data integration and deduplication). However, there has been little work on collecting all of the curation components into an integrated system. Data Tamer is such an integrated system.

In addition, most of the previous work will not scale to the sizes of problems that we are finding in the field. For example, one web aggregator (Goby.com) requires the curation of 80,000 URLs, and a biotech company (Novartis) has the problem of curating 8000 spreadsheets. At this scale, data curation cannot be a manual (human) effort, but must entail machine learning approaches with a human assist when necessary. Moreover, the problems we are encountering in real enterprises do not necessarily follow the typical machine learning paradigm. Among other issues, all curation must be incremental, as new data sources are uncovered and must be curated over time.

We are working on building Data Tamer into a complete end-to-end system.   We are looking for MEng students in the following areas:

* machine learning algorithms to perform attribute identification, grouping of attributes into tables, transformation of incoming data and deduplication.  

* data visualization so a human can examine a data source at will and specify manual transformations, as necessary
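
To make the idea of incremental, machine-assisted curation with a human assist more concrete, here is a minimal Python sketch of the deduplication step only. It is not Data Tamer's algorithm: the class name, the string-similarity measure, and both thresholds are assumptions chosen for illustration. Records arrive one at a time; confident matches are merged automatically, uncertain ones are queued for a person, and everything else is kept as a new entity.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1] (illustrative measure only)."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


class IncrementalDeduper:
    """Hypothetical incremental deduplicator with a human-review queue."""

    def __init__(self, auto_merge: float = 0.9, needs_review: float = 0.6):
        self.auto_merge = auto_merge      # at or above this: treat as duplicate
        self.needs_review = needs_review  # between thresholds: ask a human
        self.canonical = []               # records accepted as distinct entities
        self.review_queue = []            # (new_record, closest_existing) pairs

    def add(self, record: str) -> str:
        """Process one incoming record and report what was done with it."""
        best, best_score = None, 0.0
        for existing in self.canonical:
            score = similarity(record, existing)
            if score > best_score:
                best, best_score = existing, score
        if best_score >= self.auto_merge:
            return "duplicate"
        if best_score >= self.needs_review:
            self.review_queue.append((record, best))
            return "review"
        self.canonical.append(record)
        return "new"


if __name__ == "__main__":
    deduper = IncrementalDeduper()
    for name in ["Joe's Pizza", "joe's pizza ", "Joe's Pizzeria", "Blue Note Jazz Club"]:
        print(name, "->", deduper.add(name))
```

The linear scan over all accepted records would not scale to the problem sizes mentioned above; a real system would need blocking or indexing, and a learned similarity model in place of the fixed thresholds.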

Distributed Access Control for Social Media

The goal of this project is to construct a fully distributed access control protocol, and implement a system, that supports the kind of friend-oriented sharing offered by Facebook and other social media tools. The protocol should allow users to define a set of "friends" and indicate that certain resources should only be accessible to those friends. It must also provide a mechanism that allows individuals to authenticate that they are members of a given "friends" group to gain access permissions. The authentication mechanism must be distributed (so anyone who wants to can implement an "authenticator" system). All of this must be done while simultaneously preserving privacy (of who is in what friends lists).

Contact e-mail: karger@mit.edu


Fresh Breeze: A Programmable Multicore System Architecture

Computation Structures Group, CSAIL
Jack B. Dennis, Professor Emeritus

Computer architects have found that putting several processor cores on the same chip is a better way of using silicon area than attempting to achieve greater performance by adding further complexity to a single processor. Moreover, the increased performance is obtained for significantly less energy consumption. However, the result so far is chips that are very difficult to program to realize their performance potential.

The Fresh Breeze Project is attacking this programmability problem using concepts from functional programming, principles of modular software construction, and global shared address spaces. Our approach is to incorporate several ideas that are significant departures from conventional thinking about multicore system architecture:
By implementing a shared global address space, the cost of implementing fine-grain cooperation of multiple processors can be greatly reduced. Also, the conventional distinction between "memory" and the file system can be abolished, simplifying programming, as has been demonstrated in Multics. A more radical departure from convention is the implementation of a cycle-free heap of information "chunks" that are created, used, and released, but never modified once created. Memory management is performed by efficient hardware mechanisms.
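
The write-once chunk idea can be illustrated with a small software model. The Python sketch below is an assumption-laden illustration, not the Fresh Breeze design: the names ChunkHeap and Handle are hypothetical, and software reference counting stands in for the hardware memory management mentioned above. Because a chunk can only refer to chunks that already exist and is never modified afterward, the heap cannot contain cycles, so counting references is enough to reclaim storage.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Handle:
    """Opaque reference to a chunk (hypothetical name)."""
    uid: int


class ChunkHeap:
    """Toy model of a heap of write-once chunks with reference counting."""

    def __init__(self):
        self._chunks = {}    # uid -> immutable tuple of ints and Handles
        self._refcount = {}  # uid -> number of outstanding references
        self._next_uid = 0

    def create(self, items) -> Handle:
        """Create a chunk; it may only refer to chunks that already exist."""
        items = tuple(items)
        for item in items:
            if isinstance(item, Handle):
                if item.uid not in self._chunks:
                    raise KeyError("reference to a released or unknown chunk")
                self._refcount[item.uid] += 1
        handle = Handle(self._next_uid)
        self._next_uid += 1
        self._chunks[handle.uid] = items
        self._refcount[handle.uid] = 1  # the creator holds one reference
        return handle

    def read(self, handle: Handle) -> tuple:
        """Return the chunk's contents; there is deliberately no write method."""
        return self._chunks[handle.uid]

    def release(self, handle: Handle) -> None:
        """Drop one reference; reclaim the chunk and its children at zero."""
        self._refcount[handle.uid] -= 1
        if self._refcount[handle.uid] == 0:
            items = self._chunks.pop(handle.uid)
            del self._refcount[handle.uid]
            for item in items:
                if isinstance(item, Handle):
                    self.release(item)

    def live_count(self) -> int:
        return len(self._chunks)


if __name__ == "__main__":
    heap = ChunkHeap()
    leaf = heap.create([1, 2, 3])
    root = heap.create([leaf, 4])
    heap.release(leaf)        # root still refers to leaf, so it stays live
    print(heap.live_count())
    heap.release(root)        # releasing root also reclaims leaf
    print(heap.live_count())
```

"Updating" data in this model means building a new chunk that points at the unchanged parts of the old one, which is the functional-programming style the project draws on.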

More information may be found at http://www.csg.csail.mit.edu/Users/dennis.

Topics suitable for Master of Engineering Projects include: performance analysis of specific applications on a Fresh Breeze system; memory management studies; design of an external shared memory system for a system built of Fresh Breeze chips; failure recovery in a Fresh Breeze system; implementation of load balancing; transaction processing applications.

For more information, contact Prof. Jack Dennis (dennis@csail.mit.edu).

Haystack Project

The Haystack project aims to develop new tools to help people organize, manipulate, and retrieve all the information they encounter on a day-to-day basis. A major focus is personalization, adapting over time to the specific information needs and preferences of individual users.

The research combines ideas from databases, human-computer interaction, and machine learning. We explore database tools to build a data representation rich and flexible enough to record all the information any individual considers important, and in particular make heavy use of Semantic Web technology for manipulating richly structured information on the World Wide Web. We apply ideas from human-computer interaction to develop interfaces that can present that rich data model and let the user work with it. And we apply ideas from machine learning to help the system observe and adapt to the preferences and perceptions of its user. We've built a number of tools including the Haystack "universal desktop client" for managing any kind of information you care about, an Ajax tool called Exhibit for creating fancy interactive web pages without any programming, and the "Jourknow" system for managing those scraps of information that don't seem to fit anywhere. You can find out more about these and other projects at the Haystack project page at http://haystack.csail.mit.edu .

We have openings for self-motivated MEngs who are interested in extending the capabilities of the system at all layers---the database back-end, the machine-learning core, or the user interface. Initiative is key as MEngs are given substantial flexibility to choose their own project. You will join a team of 6 graduate students and 5 undergraduates already working on the project.

Applicants should email Prof. David Karger (karger@csail.mit.edu)

Information Scraps, Quick Notetaking, and Personal Information Organization

Our lives are filled with small, random scraps of information that seem to have no natural home. Where do we put them, and how do we find them later? We've created List.it (http://listit.csail.mit.edu/), a fast, lightweight browser extension for capturing and organizing such scraps. List.it has over 25,000 active users who have recorded more than 100,000 scraps. Analyzing them, we've discovered important subpopulations such as packrats, minimalists, and spring cleaners. To advance our study of personal information organization, we want to study the activity of our current users (both the content they create and their interaction with it), add useful functionality to the tool (such as sharing, reminding, and context-sensitive retrieval), and study the way users react to the new functionality.

Contact e-mail: karger@mit.edu

Interactive Data Visualization for Journalists using Wordpress

There's a new movement in journalism to incorporate rich data visualization in news stories, but many journalists lack the skills to create their own "news apps" for this purpose. We've prototyped a data visualization framework, Datapress (http://projects.csail.mit.edu/datapress), to support authoring (not programming) such visualizations in Wordpress, a popular platform for journalism. Your job is to flesh out this prototype. This will involve a cyclic process of contacting and working with journalists to support their use of Datapress, discovering what improvements need to be made to enhance the value of the tool, and implementing those improvements to the Datapress platform.


MIT Center for Collective Intelligence


The MIT Center for Collective Intelligence (http://cci.mit.edu) is focused on answering the question:  How can people and computers be connected so that--collectively--they act more intelligently than any person, group, or computer has ever done before? The Center includes faculty from around MIT, including CSAIL, Media Lab, and Sloan.

We expect to have opportunities for summer jobs, MEng, UROP, or other student work on a variety of projects, including: (a) creating “radically open” computer simulation environments in which thousands of people can collectively help figure out what to do about global climate change (see http://cci.mit.edu/research/climate.html), (b) using tools like Amazon Mechanical Turk to create a hybrid human/computer system to find and classify interesting examples of collective intelligence, (c) developing on-line games and other environments to measure and improve the collective intelligence of groups of people (see http://cci.mit.edu/research/measuring.html), and (d) combining human and machine intelligence in flexible new ways to make accurate predictions about future events such as product sales, political events, and outcomes of medical treatments (see http://cci.mit.edu/research/prediction.html).

Contact: Prof. Thomas Malone (malone@mit.edu) or Prof. Peter Szolovits (psz@mit.edu).

SciDB: An Open Source DBMS for Science and Engineering Applications

Contact:  Michael Stonebraker (stonebraker@csail.mit.edu)

The SciDB team is designing and building an open-source DBMS optimized for science and engineering applications. It implements a nested array data model and an array query language that borrows heavily from SQL, runs on a grid of Linux nodes connected by TCP/IP networking, and contains novel features such as uncertain data and provenance. Further information can be obtained from our web site www.scidb.org.

Two positions are available for MEng students. The first one entails writing a collection of user-defined functions (UDFs) that implement parallel linear algebra operators and then benchmarking the SciDB versions against those of others.
The second position involves exploring how to provide provenance in SciDB, starting with how to provide client features efficiently and including a user notation for inquiring about provenance.
Both positions include one semester of support as an RA.

Secure Group Communication in Dynamic Mission-Critical Environments

We are looking for a research assistant to contribute to an ongoing research project in secure group communication in dynamic mission-critical environments. The project involves studying and experimenting with available solutions, as well as designing and prototyping new solutions (Java, C, and C++), which are driven by specific applications and environment characteristics.

A qualified student would have experience in network programming, practical network security, distributed systems, and cryptography.

A significant portion of research will be done on-site at MIT Lincoln Laboratory (www.ll.mit.edu), located in Lexington, MA. U.S. citizenship is required for this position.

Note: this position is now open.

Contact: Dr. Roger Khazan (rkh@mit.edu).

SLS: Cross-domain language understanding and dialogue failure recovery


We have been investigating the deployment of smartphone-based spoken dialogue applications (on Android and iPhone/iPad devices) in several different domains such as movies, restaurants, flights, and weather, with each domain being a stand-alone application. Now a major challenge is the development of cross-domain speech interfaces, which could analyze the context of the conversation automatically and redirect the user to the corresponding dialogue path. A potentially useful application is for the vehicle environment, where the app navigation heavily relies on the speech input from users rather than gestures like typing or clicking.


The student will help us explore incorporating the conversational context into language understanding. We will investigate a topic detection approach that acts as a form of moderator to figure out which app is the appropriate one to field an incoming spoken query. We will also investigate dialogue recovery mechanisms and dialogue state tracking strategies in the presence of speech technology failure.
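
As a rough picture of what a topic-detection "moderator" might do, the Python sketch below scores an incoming query against each domain and falls back on the conversational context when the query alone is uninformative. Everything in it is a placeholder assumption: the function name, the hand-picked keyword lists, and the keyword-overlap scoring stand in for the learned models the project would actually investigate. Only the four domain names come from the description above.

```python
import re

# Hypothetical keyword lists; a real system would learn these from data.
DOMAIN_KEYWORDS = {
    "movies": {"movie", "film", "showing", "theater", "actor"},
    "restaurants": {"restaurant", "eat", "dinner", "cuisine", "food"},
    "flights": {"flight", "fly", "airline", "airport", "depart"},
    "weather": {"weather", "rain", "forecast", "temperature", "snow"},
}


def route(query, previous_domain=None):
    """Pick the domain app that should field the query.

    Scores each domain by keyword overlap. If no keyword matches, the
    query is assumed to continue the previous domain (the context).
    Ties go to the domain listed first in DOMAIN_KEYWORDS.
    """
    words = set(re.findall(r"[a-z']+", query.lower()))
    scores = {name: len(words & keywords)
              for name, keywords in DOMAIN_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    if scores[best] == 0:
        return previous_domain
    return best


if __name__ == "__main__":
    domain = route("What is the weather forecast in Boston?")
    print(domain)
    # A follow-up with no domain keywords stays in the current domain.
    print(route("How about tomorrow?", previous_domain=domain))
```

The context fallback is the part that matters for cross-domain dialogue: a follow-up such as "How about tomorrow?" carries no topic words of its own and can only be routed by remembering where the conversation was.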


Requirements: machine learning background and Java expertise. This project could also start as a Super UROP project and turn into an MEng thesis later (see the UROP openings page). If interested, please send a CV to Jingjing Liu (jingl@csail.mit.edu).


SLS: Stochastic dialogue management via multi-layer classification


Many current dialogue systems employ rule-based or template-based dialogue management strategies to tightly control the system response and the dialogue flow. There have also been some studies on stochastic dialogue generation, for example using POMDPs. Both approaches have pros and cons, and it is hard to achieve perfect dialogue performance by relying on a single solution.

In this project, we will explore automatic dialogue and response generation via the combination of stochastic approaches and heuristic methods. The student will explore a stochastic model to learn the best dialogue hypothesis based on crowd-generated data. We will start with a simple prototype framework such as an XML schema for defining tasks/actions, and then shift it towards automatic generation when we collect sufficient conversational data with the initial system setting. A hierarchical infrastructure will be investigated, where the higher layers are designed for categorizing the context of the conversation and the lower layers are designed for domain-specific word-based semantic understanding.

Requirements: machine learning background and Java expertise. This project could also start as a Super UROP project and turn into an MEng thesis later (see the UROP openings page). If interested, please send a CV to Jingjing Liu (jingl@csail.mit.edu).

Speech-based Interface for Medical Data Access

We’re looking for a UROP or MEng student who will start around IAP. The student will work on a medical domain spoken dialogue system, which harvests patient-provided medical data (such as drug reviews or vaccine reports) from the web and presents the summarized content to users via spoken conversations. Statistical models such as Conditional Random Fields will be explored for spoken language understanding, and data mining approaches such as topic modeling will be investigated for medical data analysis.

Requirements: Java expertise. This project is expected to turn into a Super UROP or MEng thesis. If interested, please send a CV to Jingjing Liu (jingl@csail.mit.edu).

Spoken Dialogue System for English Language Learning

We’re looking for a UROP or MEng student who will start around IAP. The student will work on a spoken dialogue system that will help Chinese speakers learn English. The research focus will be machine translation, English language learning, and the development of the speech-based interface.

Requirements: Java expertise, read/write Mandarin Chinese.

This project is expected to turn into a Super UROP or MEng thesis. If interested, please send a CV to Jingjing Liu.

The Future Textbook

Now that we can put textbooks on the web, how can we change them to make them better? How can we make them more dynamic, more adaptable to individual students, more sociable, or more informative? We've tackled some of these questions with Nb (http://nb.mit.edu/), a tool that lets students hold forum-type discussions in the margins of their online reading material. Nb is currently in use in roughly 25 classes at 6 universities. We have a long list of improvements to implement and assess in Nb, including social moderation, key-question highlighting, organization via tagging, chat and wiki functionality, support for sketching diagrams and other non-text annotations, hot-spot mapping for faculty, html and video annotation, and many others that students like you think of. Experience with Javascript and Django is a plus, as is good performance in 6.813/831.

Contact e-mail: karger@mit.edu

Transparent Web Browsing

Nowadays, all sorts of shady companies are collecting information about your browsing activities and using it for their own mysterious purposes. How could that information be used to your benefit? We propose to build Eyebrowse, a web browser extension that gathers information about your web browsing activities and shares that information (under your control) with others to mutual benefit. Potential applications include discovering interesting new web sites based on the browsing activity of your friends, improving web navigation by blazing trails to the important parts of web sites, supporting chance encounters when you and your friends are visiting the same web site, collaboratively browsing the web, identifying links between pages that ought to exist but don't, reporting on global web activity trends, tagging sites and pages according to the interests of people who visit them, and other exciting applications that you will come up with. Experience with Javascript, Django and general web application design is a plus, as is good performance in 6.813/831.

Contact e-mail: karger@mit.edu

Vision-based Automated Behavior Recognition


Automated quantitative analysis of animal behavior is an important tool for assessing the effects of diseases, drugs, gene mutations, and perturbations in neural circuits. Automating behavioral analysis addresses the inherent limitations of human assessment (time, cost, and reproducibility) and allows researchers to study behaviors over longer time scales than typically studied.
Building on a computational model of motion processing in the primate visual cortex, we built a computer vision system to recognize a typical set of 8 behaviors of single mice in the home cage (see http://cbcl.mit.edu/software-datasets/hueihan/). In ongoing work, our goal is to significantly extend this system in two major directions:

1. Recognition of more behaviors in multiple mice (such as social behaviors)

2. Classification of human behaviors starting with autistic motor movements.

Throughout the process, we maintain a commitment to release these behavioral tools publicly to the academic research community.

Starting in IAP 2011, the Center for Biological and Computational Learning (CBCL) will have a position available for an MEng student to be in charge of extending this system. We expect that the research, as in our past work, will combine computer vision and machine learning.

Prerequisites: The ideal student will have experience and research interests in machine learning and computer vision complemented by very strong programming skills (Matlab, C++). We work closely with other research collaborators to develop these systems, so a passion for design and usability is a plus.

Contact: Please send questions or application with resume/CV to Prof. Tomaso Poggio (tp@csail.mit.edu).

URL: http://cbcl.mit.edu/