CS Distinguished Speaker Series

Yi Chang, Huawei Research America

Friday, September 15, 2017

3:30 PM in Rice Hall Auditorium

Host: Hongning Wang

Title: Improving Relevance Ranking for Search Engines

Relevance ranking is a billion-dollar challenge for a search engine, especially for one competing from behind in the web search market. Learning-to-rank algorithms can effectively improve relevance ranking, yet continuously improving the relevance of a search engine remains a systematic, long-term effort. In this talk, I will introduce the background and the most recent advances on this topic, focusing on three key techniques: ranking functions, semantic matching features, and query rewriting. The major part of this talk is based on our KDD’2016 Best Paper Award (Applied Data Science Track).
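For readers unfamiliar with learning to rank, a minimal pairwise sketch may help; this is a generic RankNet-style update on a toy linear model, not the specific techniques covered in the talk:

```python
import math

def score(w, x):
    """Linear ranking function: dot product of weights and features."""
    return sum(wi * xi for wi, xi in zip(w, x))

def pairwise_rank_step(w, x_pos, x_neg, lr=0.1):
    """One pairwise update: push score(x_pos) above score(x_neg)."""
    # probability that the relevant document outranks the irrelevant one
    p = 1.0 / (1.0 + math.exp(score(w, x_neg) - score(w, x_pos)))
    g = 1.0 - p  # gradient of the pairwise logistic loss
    return [wi + lr * g * (xp - xn) for wi, xp, xn in zip(w, x_pos, x_neg)]

# toy pair: feature vectors of a relevant and a non-relevant document
relevant, irrelevant = [1.0, 0.2], [0.1, 0.9]
w = [0.0, 0.0]
for _ in range(100):
    w = pairwise_rank_step(w, relevant, irrelevant)
```

After training, the learned weights score the relevant document above the irrelevant one, which is the essence of pairwise learning to rank.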

About the speaker:  
Dr. Yi Chang is a Technical Vice President at Huawei Research America, where he leads Huawei’s newly built Search Technology Lab. He was at Yahoo Research from 2006 to 2016, where he was a research director in charge of relevance for Yahoo’s web search engine and vertical search engines. He has broad research interests in information retrieval, data mining, applied machine learning, and natural language processing. He has published more than 100 research papers in premier conferences and journals, and received Best Paper Awards at both WSDM’2016 and KDD’2016. He is actively involved in academic service and will serve as the general chair of WSDM’2018.

Yuchi Tian: PhD Qualifying Exam Presentation

Monday, September 18, 2017 at 10:00 am in Rice 204



Committee Members: Baishakhi Ray (Advisor), Mary Lou Soffa (Chair), David Evans, and Samira Khan.

Title: Automatically Diagnosing and Repairing Error Handling Bugs in C

Correct error handling is essential for building reliable and secure systems. Unfortunately, low-level languages like C often do not provide error handling primitives, leaving developers to create their own mechanisms for error propagation and handling. In practice, developers often make mistakes while writing this repetitive and tedious error handling code and inadvertently introduce bugs. Such error handling bugs often have severe consequences, undermining the security and reliability of the affected systems. Fixing these bugs is also tiring: the fixes are repetitive and cumbersome to implement. Therefore, it is crucial to develop tool support for automatically detecting and fixing error handling bugs.
To understand the nature of error handling bugs that occur in widely used C programs, we conduct a comprehensive study of real-world error handling bugs and their fixes. Leveraging this knowledge, we then design, implement, and evaluate ErrDoc, a tool that not only detects and characterizes different types of error handling bugs but also automatically fixes them. Our evaluation on five open-source projects shows that ErrDoc can detect error handling bugs with 84% to 100% precision and around 95% recall, and categorize them with 83% to 96% precision and above 90% recall. Thus, ErrDoc improves precision by up to 5 percentage points and recall by up to 44 percentage points with respect to the state of the art. We also demonstrate that ErrDoc can fix the bugs with high accuracy.
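To make the bug class concrete, here is a toy illustration of one kind of error handling bug ErrDoc targets: a call whose error return is never checked. This is a simple heuristic scanner sketched in Python, not ErrDoc itself, and it only looks for unchecked malloc() results:

```python
import re

def unchecked_mallocs(c_source):
    """Flag lines that assign malloc() to a variable never NULL-checked.

    Toy heuristic: a variable assigned from malloc counts as checked if
    it later appears in an `if (var ...)` test or a comparison to NULL.
    """
    assigns = {}  # variable name -> line number of the malloc assignment
    for lineno, line in enumerate(c_source.splitlines(), 1):
        m = re.search(r'(\w+)\s*=\s*malloc\(', line)
        if m:
            assigns[m.group(1)] = lineno
    checked = set()
    for var in assigns:
        if re.search(r'if\s*\(\s*!?\s*%s\b' % var, c_source) or \
           re.search(r'%s\s*[!=]=\s*NULL' % var, c_source):
            checked.add(var)
    return [(v, n) for v, n in assigns.items() if v not in checked]

code = """
char *buf = malloc(64);   /* never checked: the bug */
char *ok  = malloc(64);
if (ok == NULL) return -1;
"""
report = unchecked_mallocs(code)
```

A real tool like ErrDoc works on program semantics rather than regular expressions, but the report here ("buf" is unchecked, "ok" is fine) shows the shape of the problem.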

Russel Pears: “Mining Data Streams: Issues, Challenges and Some Solutions”


Speaker: Russel Pears

Date: Tuesday, August 29

Time: 3:30-4:30 p.m.

Location: Rice Hall, room 242

Host: Tom Horton (tbh3f)

Title: Mining Data Streams: Issues, Challenges and Some Solutions

Abstract: Data streams present unique challenges for mining and knowledge extraction. Because many streams are open-ended and arrive at high rates, standard machine learning approaches cannot be applied within the available memory and speed constraints. In addition, data streams are very often dynamic in nature, with the underlying data distribution changing over time.

In the first part of this talk I will touch on some of the methods that have been proposed to deal with these issues. The second part will concentrate on my recent approach, which senses stream volatility and tailors the mining strategy to the level of volatility in the stream. Preliminary results from an experimental study show that significant speed-ups over state-of-the-art approaches can be achieved while maintaining prediction accuracy.
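A common building block in this area is detecting when the underlying distribution has changed. The following is a minimal, generic sketch of that idea (comparing a short window of prediction errors against the long-run error rate), not the speaker's volatility-sensing algorithm:

```python
from collections import deque

def drift_detector(errors, window=30, threshold=0.3):
    """Return stream positions where the recent error rate jumps.

    `errors` is a stream of 0/1 prediction errors. Drift is signalled
    when the error rate over the last `window` items exceeds the
    overall rate seen so far by more than `threshold`.
    """
    recent = deque(maxlen=window)
    total_err = seen = 0
    drifts = []
    for i, e in enumerate(errors):
        recent.append(e)
        total_err += e
        seen += 1
        long_run = total_err / seen
        short_run = sum(recent) / len(recent)
        if len(recent) == window and short_run - long_run > threshold:
            drifts.append(i)
    return drifts

# a stable stream (5% errors) followed by a distribution change (60% errors)
stream = [0] * 95 + [1] * 5 + ([1] * 6 + [0] * 4) * 10
drifts = drift_detector(stream)
```

On this toy stream the detector stays quiet through the stable prefix and fires shortly after the change at position 100, which is exactly the signal an adaptive mining strategy would react to.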

About the speaker: Russel Pears is currently with the Department of Computer Science at the Auckland University of Technology (AUT) in New Zealand. Russel’s career spans more than three decades in tertiary education, the last 16 years of which have been at AUT. During this period he has taught across a wide spectrum of courses in the Computer Science curriculum. He has also held senior leadership positions, such as Programme Leader for the MSc and PhD programmes run by the School of Engineering, Computing and Mathematical Sciences at AUT.

Russel’s research interests are in data mining and machine learning, where he has published widely in peer-reviewed international conferences and journals. He currently supervises four doctoral students and two MSc students in their thesis research.

Liliya McLean Besaleva: PhD Dissertation Defense

Friday, September 22nd, 2017 at 10:00 am in Rice 504


Committee Members: Alf Weaver (Advisor), Jack Stankovic (Chair), Worthy Martin, Hongning Wang and Larry Richards.

Title: Smart E-Commerce Personalization Using Customized Algorithms


Applications for machine learning algorithms can be observed in numerous places in our modern lives. From medical diagnosis predictions to smarter ways of shopping online, big, fast data is streaming in and being utilized constantly. Unfortunately, unusual instances of data, called imbalanced data, are still largely ignored because of the inadequacies of analytical methods that are designed to handle homogenized data sets and to “smooth out” outliers. Consequently, rare use cases of significant importance remain neglected and lead to high-cost losses or even tragedies. In the past decade, a myriad of approaches to this problem, ranging from data modifications to alterations of existing algorithms, have appeared with varying success. Yet the majority of them have major drawbacks when applied to different application domains because of the non-uniform nature of the applicable data.

Within the vast domain of e-commerce, we have developed an innovative approach for handling imbalanced data: a hybrid meta-classification method that consists of a mixed solution of multimodal data formats and algorithmic adaptations for an optimal balance between prediction accuracy, sensitivity, and specificity on multiclass imbalanced datasets. Our solution is divided into two main phases serving different purposes. In phase one, we classify the outliers with less accuracy for faster, more urgent situations that require immediate predictions and can tolerate possible classification errors. In phase two, we perform a deeper analysis of the results and aim at precisely identifying high-cost multiclass imbalanced data with larger impact. The goal of this work is to provide a solution that improves the data usability, classification accuracy, and resulting costs of analyzing massive data sets (e.g., millions of shopping records) in e-commerce.
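As background, one of the simplest remedies for imbalanced data is random oversampling of the minority classes. The sketch below illustrates that generic baseline, not the hybrid meta-classification method of the dissertation; the "fraud" label and record shapes are made up for the example:

```python
import random

def oversample(rows, label_of, seed=0):
    """Randomly duplicate minority-class rows until every class
    matches the size of the majority class."""
    rng = random.Random(seed)
    by_class = {}
    for r in rows:
        by_class.setdefault(label_of(r), []).append(r)
    target = max(len(members) for members in by_class.values())
    balanced = []
    for members in by_class.values():
        balanced.extend(members)
        # sample with replacement to fill the gap to the majority count
        balanced.extend(rng.choices(members, k=target - len(members)))
    return balanced

# 97 ordinary purchase records vs. 3 rare (but high-cost) fraud cases
data = [("buy", i) for i in range(97)] + [("fraud", i) for i in range(3)]
out = oversample(data, label_of=lambda r: r[0])
```

Duplicating rare rows lets a standard classifier see them often enough to learn from, at the risk of overfitting to the few minority examples, which is one reason more sophisticated hybrids like the one proposed here exist.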

Nathan Brunelle: PhD Dissertation Defense

Monday, July 31, 2017 at 9:00 am in Rice 242
The committee members are: Gabriel Robins (Advisor), James Cohoon (Committee Chair), Kevin Skadron, Ke Wang and Mircea Stan (Minor Representative).

Title: Superscalable Algorithms

We propose two new highly-scalable approaches to effectively process massive data sets in the post-Moore’s-Law era, namely (1) designing algorithms to operate directly on highly compressed data, and (2) leveraging massively parallel finite automata-based architectures for specific problem domains. The former method extends scalability by exploiting regularity in highly-compressible data, while also avoiding expensive decompression and re-compression. The latter approach compactly encapsulates complex behaviors in hardware via simulation of non-deterministic finite-state automata. We evaluate the efficiency, extensibility, and generality of these non-traditional approaches in big data environments. By presenting both promising experimental results and theoretical impossibility arguments, we provide more comprehensive frameworks for future research in these areas.
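The first idea can be illustrated with a small, generic example (not taken from the dissertation): some queries over a run-length-encoded sequence can be answered directly from the compressed form, so the cost is proportional to the compressed size rather than the original length:

```python
def rle_sum(runs):
    """Sum of the original sequence, computed from its run-length
    encoding: a run (value, length) contributes value * length."""
    return sum(value * length for value, length in runs)

def rle_max(runs):
    """Maximum of the original sequence, again without decompressing."""
    return max(value for value, _ in runs)

# three runs encode a 1,000,006-element sequence: [7]*1000000 + [9]*5 + [2]
runs = [(7, 1_000_000), (9, 5), (2, 1)]
total = rle_sum(runs)   # touches 3 runs, not a million elements
peak = rle_max(runs)
```

Both queries do three units of work instead of a million, which is the scalability win of operating on compressed data; the challenge the dissertation addresses is which computations admit such compressed-domain algorithms and which provably do not.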

Jonathan Dorn: PhD Dissertation Defense


Thursday, July 20, 2017 at 12:30 pm in Rice 536

Committee members: Westley Weimer (Advisor), Baishakhi Ray (Committee Chair), Jason Lawrence, Stephanie Forrest (UNM) and Chad Dodson (Minor Representative).

Title: Optimizing Tradeoffs of Non-Functional Properties in Software


Software systems have become integral to the daily life of millions of people. These systems provide much of our entertainment (e.g., video games, feature-length movies, and YouTube) and our transportation (e.g., planes, trains and automobiles). They ensure that the electricity to power homes and businesses is delivered and are significant consumers of that electricity themselves. With so many people consuming software, the best balance between runtime, energy or battery use, and accuracy differs from one user to another. With so many applications playing so many different roles and so many developers producing and modifying them, the tradeoff between maintainability and other properties must be managed as well.

Existing methodologies for managing these “non-functional” properties require significant additional effort. Some techniques impose restrictions on how software may be designed or require time-consuming manual reviews. These techniques are frequently specific to a single application domain, programming language, or architecture, and are primarily applicable during initial software design and development. Further, modifying one property, such as runtime, often changes another property as well, such as maintainability.

In this dissertation, we present a framework, exemplified by three case studies, for automatically manipulating interconnected program properties to find the optimal tradeoffs. We exploit evolutionary search to explore the complex interactions of diverse properties and present the results to users. We demonstrate the applicability and effectiveness of this approach in three application domains, involving different combinations of dynamic properties (how the program behaves as it runs) and static properties (what the source code itself is like). In doing so, we describe the ways in which those domains impact the choices of how to represent programs, how to measure their properties effectively, and how to search for the best among many candidate program implementations. We show that effective choices enable the framework to take unmodified human-written programs and automatically produce new implementations with better properties—and better tradeoffs between properties—than before.
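The tradeoff step of such a search can be sketched generically; the following is a minimal Pareto-front filter over candidate implementations (an illustration of the concept, not the dissertation's framework, and the measurements are invented):

```python
def pareto_front(candidates):
    """Keep only the candidates not dominated on any measured property.

    Candidate a dominates b if a is no worse on every property and
    strictly better on at least one. Both properties here are costs,
    so lower is better.
    """
    def dominates(a, b):
        return (all(x <= y for x, y in zip(a, b)) and
                any(x < y for x, y in zip(a, b)))
    return [c for c in candidates
            if not any(dominates(o, c) for o in candidates if o != c)]

# (runtime in seconds, output error) for five candidate implementations
variants = [(1.0, 0.30), (2.0, 0.10), (1.5, 0.40), (3.0, 0.05), (2.5, 0.10)]
front = pareto_front(variants)
```

The surviving candidates each offer a tradeoff no other candidate strictly beats, which is the set an evolutionary search over program variants would present to the user.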