Information Retrieval and Data-Mining
Javed Aslam, Kenneth Baclawski, Albert-László Barabási, Mirek Riedewald
The quantity of digital data grows each year, with no end in sight. In medicine, for instance, a typical clinical study once involved a few hundred patients; today, automated human genome studies can include data from millions of patients. The same acceleration in data collection is visible throughout medicine, the physical sciences, and the social sciences. Because no single person or research group can manually process millions of data points, there is an urgent need for intelligent systems that find patterns and extract information automatically.
Northeastern’s information retrieval and data-mining group addresses problems in processing, storing, and organizing vast quantities of information, drawing on expertise in machine learning, spatial indexing, the Semantic Web, and database management.
In machine learning, one goal is to build a diagnostic tool that automatically examines patient records, learns rules from them, and makes predictions about diagnoses. In data-mining, Northeastern researchers have developed some of the most widely used search techniques and are now studying refined queries for assessing the quality of search results.
Team Achievements
- Developed the "optimal location query," a spatial indexing method that lets users find "optimal" locations within a pre-specified area based on multiple criteria. For example, the technique can be used to site businesses within a target area based on factors such as population and demographics;
- Created tools to characterize and extract knowledge from biomedical literature, including the graphs and images that are critical parts of most biomedical research papers;
- Introduced applications of ontology-based computing, including the Semantic Web (a layer above the World Wide Web that represents the meaning of information and can make valid inferences about it), to the health sciences;
- Held leadership positions on the committees of top conferences, including the Association for Computing Machinery's International Conference on Research and Development in Information Retrieval (SIGIR), the International Conference on Management of Data (SIGMOD), the International Conference on Very Large Data Bases (VLDB), and the International Conference on Data Engineering (ICDE);
- Developed novel approaches for tracking massive quantities of observational environmental change data in collaboration with Cornell University's Laboratory of Ornithology.
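The optimal location query mentioned above can be illustrated with a small brute-force sketch. The page does not describe the group's algorithm, so everything here is an assumption: it uses one common formalization in which a set of existing sites and a set of weighted objects (e.g. customers, with weights standing in for criteria such as population) are given, and the "influence" of a candidate location is the total weight of objects that would be strictly closer to it than to any existing site, measured with L1 (Manhattan) distance. The names `WeightedPoint`, `influence`, and `optimal_location` are hypothetical. The grid scan only approximates the true optimum and uses no spatial index; an indexed method would avoid testing every candidate.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float]


@dataclass(frozen=True)
class WeightedPoint:
    """An object such as a customer; the weight may encode criteria like population (assumed)."""
    x: float
    y: float
    weight: float = 1.0


def l1(a: Point, b: Point) -> float:
    """Manhattan (L1) distance, the metric assumed for this sketch."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def influence(cx: float, cy: float, sites: Sequence[Point],
              objects: Sequence[WeightedPoint]) -> float:
    """Total weight of objects strictly closer to (cx, cy) than to every existing site."""
    candidate = (cx, cy)
    total = 0.0
    for o in objects:
        p = (o.x, o.y)
        # With no existing sites, every object is attracted to the candidate.
        nearest_existing = min((l1(p, s) for s in sites), default=float("inf"))
        if l1(p, candidate) < nearest_existing:
            total += o.weight
    return total


def optimal_location(region: Tuple[float, float, float, float],
                     sites: Sequence[Point],
                     objects: Sequence[WeightedPoint],
                     step: float = 1.0) -> Tuple[Optional[Point], float]:
    """Brute-force grid search over region = (xmin, ymin, xmax, ymax).

    Returns (best_location, best_influence). Only grid points are tested,
    so the result approximates the true optimum.
    """
    xmin, ymin, xmax, ymax = region
    # Integer step counts avoid floating-point drift from repeated addition.
    nx = int(round((xmax - xmin) / step))
    ny = int(round((ymax - ymin) / step))
    best_loc: Optional[Point] = None
    best_score = float("-inf")
    for i in range(nx + 1):
        x = xmin + i * step
        for j in range(ny + 1):
            y = ymin + j * step
            score = influence(x, y, sites, objects)
            if score > best_score:  # strict: ties keep the first grid point found
                best_loc, best_score = (x, y), score
    return best_loc, best_score


if __name__ == "__main__":
    sites = [(0.0, 0.0), (10.0, 10.0)]
    objects = [WeightedPoint(8, 2, 5.0), WeightedPoint(2, 8, 1.0), WeightedPoint(9, 9, 1.0)]
    print(optimal_location((5, 0, 10, 5), sites, objects))  # → ((5.0, 2.0), 6.0)
```

In the usage example, the object at (9, 9) is never captured because every point in the target area is farther from it than the existing site at (10, 10); restricting candidates to a pre-specified area is what distinguishes this query from an unconstrained search.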