Yuchi Tian: PhD Qualifying Exam Presentation

Monday, September 18, 2017 at 10:00 am in Rice 204



Committee Members: Baishakhi Ray (Advisor), Mary Lou Soffa (Chair), David Evans, and Samira Khan.

Title: Automatically Diagnosing and Repairing Error Handling Bugs in C

Correct error handling is essential for building reliable and secure systems. Unfortunately, low-level languages like C often do not support any error handling primitives and leave it up to developers to create their own mechanisms for error propagation and handling. In practice, however, developers often make mistakes while writing this repetitive and tedious error handling code and inadvertently introduce bugs. Such error handling bugs often have severe consequences, undermining the security and reliability of the affected systems. Fixing these bugs is also tiring, as the fixes are repetitive and cumbersome to implement. Therefore, it is crucial to develop tool support for automatically detecting and fixing error handling bugs.
To understand the nature of error handling bugs that occur in widely used C programs, we conduct a comprehensive study of real-world error handling bugs and their fixes. Leveraging this knowledge, we then design, implement, and evaluate ErrDoc, a tool that not only detects and characterizes different types of error handling bugs but also automatically fixes them. Our evaluation on five open-source projects shows that ErrDoc can detect error handling bugs with 84% to 100% precision and around 95% recall, and categorize them with 83% to 96% precision and above 90% recall. Thus, ErrDoc improves precision by up to 5 percentage points and recall by up to 44 percentage points relative to the state of the art. We also demonstrate that ErrDoc can fix the bugs with high accuracy.
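For illustration only (an invented sketch, not code from the study): in C, error propagation is typically hand-written with integer return codes and goto-based cleanup, and the bug classes described above often amount to a missing check, a missing cleanup jump, or a leaked resource on an error path:

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical example: read a file into a heap buffer, propagating
 * failure to the caller via a negative return code. */
int read_config(const char *path, char **out)
{
    int rc = 0;
    char *buf = NULL;
    size_t n;

    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;          /* error propagated to the caller */

    buf = malloc(4096);
    if (buf == NULL) {
        rc = -1;
        goto cleanup;       /* common bug: returning here directly
                               instead would leak 'f' */
    }

    n = fread(buf, 1, 4095, f);
    buf[n] = '\0';
    *out = buf;
    buf = NULL;             /* ownership transferred to caller */

cleanup:
    free(buf);              /* free(NULL) is a safe no-op */
    fclose(f);
    return rc;
}
```

Forgetting the `goto cleanup`, the `fclose`, or the return-code check at a call site is exactly the kind of repetitive, tedious mistake the abstract refers to.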

Liliya McLean Besaleva: PhD Dissertation Defense

Friday, September 22nd, 2017 at 10:00 am in Rice 504


Committee Members: Alf Weaver (Advisor), Jack Stankovic (Chair), Worthy Martin, Hongning Wang and Larry Richards.

Title: Smart E-Commerce Personalization Using Customized Algorithms


Applications for machine learning algorithms can be observed in numerous places in our modern lives. From medical diagnosis predictions to smarter ways of shopping online, big fast data is streaming in and being utilized constantly. Unfortunately, unusual instances of data, called imbalanced data, are still being ignored at large because of the inadequacies of analytical methods that are designed to handle homogenized data sets and to “smooth out” outliers. Consequently, rare use cases of significant importance remain neglected and lead to high-cost losses or even tragedies. In the past decade, a myriad of approaches handling this problem that range from data modifications to alterations of existing algorithms have appeared with varying success. Yet, the majority of them have major drawbacks when applied to different application domains because of the non-uniform nature of the applicable data.

Within the vast domain of e-commerce, we have developed an innovative approach for handling imbalanced data: a hybrid meta-classification method that combines multimodal data formats and algorithmic adaptations to strike an optimal balance between prediction accuracy, sensitivity, and specificity for multiclass imbalanced datasets. Our solution is divided into two main phases serving different purposes. In phase one, we classify the outliers with less accuracy for faster, more urgent situations that require immediate predictions and can tolerate possible errors in the classification. In phase two, we perform a deeper analysis of the results, aiming to precisely identify high-cost multiclass imbalanced data with larger impact. The goal of this work is to provide a solution that improves data usability and classification accuracy while reducing the cost of analyzing massive data sets (e.g., millions of shopping records) in e-commerce.

Nathan Brunelle: PhD Dissertation Defense

Monday, July 31, 2017 at 9:00 am in Rice 242
Committee Members: Gabriel Robins (Advisor), James Cohoon (Committee Chair), Kevin Skadron, Ke Wang, and Mircea Stan (Minor Representative).

Title: Superscalable Algorithms

We propose two new highly scalable approaches to effectively process massive data sets in the post-Moore’s-Law era, namely (1) designing algorithms that operate directly on highly compressed data, and (2) leveraging massively parallel finite-automata-based architectures for specific problem domains. The former method extends scalability by exploiting regularity in highly compressible data, while also avoiding expensive decompression and re-compression. The latter approach uses hardware that compactly encapsulates complex behaviors by simulating non-deterministic finite-state automata. We evaluate the efficiency, extensibility, and generality of these non-traditional approaches in big data environments. By presenting both promising experimental results and theoretical impossibility arguments, we provide more comprehensive frameworks for future research in these areas.

Jonathan Dorn: PhD Dissertation Defense


Thursday, July 20, 2017 at 12:30 pm in Rice 536

Committee members: Westley Weimer (Advisor), Baishakhi Ray (Committee Chair), Jason Lawrence, Stephanie Forrest (UNM) and Chad Dodson (Minor Representative).

Title: Optimizing Tradeoffs of Non-Functional Properties in Software


Software systems have become integral to the daily life of millions of people. These systems provide much of our entertainment (e.g., video games, feature-length movies, and YouTube) and our transportation (e.g., planes, trains and automobiles). They ensure that the electricity to power homes and businesses is delivered and are significant consumers of that electricity themselves. With so many people consuming software, the best balance between runtime, energy or battery use, and accuracy is different for some users than for others. With so many applications playing so many different roles and so many developers producing and modifying them, the tradeoff between maintainability and other properties must be managed as well.

Existing methodologies for managing these “non-functional” properties require significant additional effort. Some techniques impose restrictions on how software may be designed or require time-consuming manual reviews. These techniques are frequently specific to a single application domain, programming language, or architecture, and are primarily applicable during initial software design and development. Further, modifying one property, such as runtime, often changes another property as well, such as maintainability.

In this dissertation, we present a framework, exemplified by three case studies, for automatically manipulating interconnected program properties to find the optimal tradeoffs. We exploit evolutionary search to explore the complex interactions of diverse properties and present the results to users. We demonstrate the applicability and effectiveness of this approach in three application domains, involving different combinations of dynamic properties (how the program behaves as it runs) and static properties (what the source code itself is like). In doing so, we describe the ways in which those domains impact the choices of how to represent programs, how to measure their properties effectively, and how to search for the best among many candidate program implementations. We show that effective choices enable the framework to take unmodified human-written programs and automatically produce new implementations with better properties—and better tradeoffs between properties—than before.

Ameer Mohammed: PhD Proposal

Thursday, May 18, 2017 at 3:00 pm in Rice 536

Committee: Mohammad Mahmoody (Advisor), David Evans (Chair), Jack Davidson, Denis Nekipelov (UVA Economics), and Sanjam Garg (UC Berkeley).

The Computational Complexity of Program Obfuscation

Recent developments in cryptography have given rise to a variety of extremely powerful tools, some of which were previously considered impossible. In this work, we focus on one such task, program obfuscation, which, roughly speaking, is a procedure that transforms a program P into an “equivalent” version Q that hides any information within the implementation of P without affecting its input/output functionality. In particular, we study the complexity of indistinguishability obfuscation (IO), which has been shown to be the “best possible” obfuscation and is versatile enough to enable a wide variety of new applications in cryptography.

Current theoretical constructions of IO rely on extremely strong and poorly understood cryptographic assumptions. Basing IO on more robust and well-understood assumptions is therefore a much-desired goal. While the search for such weaker assumptions is still in progress, basing IO on standard assumptions seems out of reach.

Our main objective in this project is to determine whether there are inherent barriers in this regard. Namely, we aim to prove barriers between known classical, well-understood assumptions and IO. In order to prove such impossibility results for IO, we need to show that certain assumptions and/or objects are insufficient to imply IO. Traditionally, such impossibility results are proven within the “black-box framework”. However, due to the “non-black-box” nature of IO and its constructions, the existing black-box framework does not seem to be the right model for proving impossibility results.

We aim to first develop an extension to the black-box framework in such a way that it closely captures the type of techniques that are currently used for constructing IO. We will then use this model to prove lower bounds on the complexity of obfuscation by showing that certain assumptions and/or primitives, under this new framework, are insufficient for the construction of secure IO schemes. Furthermore, while this new extended framework facilitates a more comprehensive study of lower bounds for primitives that are currently realized from more specialized cryptographic objects, it is of independent interest and opens up the opportunity to revisit and improve upon existing well-known impossibility results.

Ritambhara Singh: PhD Proposal Presentation

Monday, May 1, 2017 at 11:00 am in Rice 504


Committee: Yanjun Qi (Advisor), Mary Lou Soffa (Chair), Gabriel Robins, Christina Leslie (Memorial Sloan Kettering Cancer Center), and Mazhar Adli (UVA Dept. of Biochemistry and Molecular Genetics, Minor Representative).

Title: Fast and Interpretable Classification of Sequential Data in Biology

Machine learning models have shown great success in helping biologists analyze sequential data (such as DNA sequences or measurements of activity levels along the genome). However, state-of-the-art machine learning methods face two hard challenges posed by sequential data: (1) interpretability of the predictions for better insights, and (2) slow computation due to the expanding search space of sequential patterns. The proposed research aims to solve these two challenges by improving the existing state-of-the-art machine learning methods. Specifically, we focus on two popular models: Neural Networks (NNs) and String Kernels with Support Vector Machines (SK-SVM).

Challenge (1): NNs can handle large sequential datasets accurately and efficiently. However, NNs have widely been viewed as “black boxes” due to their complexity, making them hard to understand. Solution (1): We propose a unified architecture, Attentive-DeepChrome, that handles prediction and understanding in an end-to-end manner.

Challenge (2): SK-SVM methods achieve high accuracy and have theoretical guarantees for smaller datasets (<5,000 samples). However, current implementations run extremely slowly when we increase the dictionary size or allow more mismatches. Solution (2): We propose a novel algorithm for calculating the Gapped k-mer Kernel using Counting (GaKCo). This algorithm is fast, scalable, and naturally parallelizable.

In summary, this research work expands the frontiers of the existing machine learning methods for improved analysis of sequential data in biology.

Ning Yu: PhD Qualifying Exam Presentation

Tuesday, May 2, 2017 at 10:00 am in Rice 404


Committee: Connelly Barnes (Advisor), Vicente Ordóñez (Chair), Yanjun Qi, and Kai-Wei Chang

Title: Learning to Detect Multiple Photographic Defects

In this work, we introduce the problem of simultaneously detecting multiple photographic defects. We aim at detecting the existence, severity, and potential locations of common photographic defects related to color, noise, blur and composition. The automatic detection of such defects could be used to provide users with suggestions for how to improve photos without the need to laboriously try various correction methods. Defect detection could also help users select photos of higher quality while filtering out those with severe defects in photo curation and summarization. To investigate this problem, we collected a large-scale dataset of user annotations on seven common photographic defects, which allows us to evaluate algorithms by measuring their consistency with human judgments. Our new dataset enables us to formulate the problem as a multi-task learning problem and train a multi-column deep convolutional neural network (CNN) to simultaneously predict the severity of all the defects. Unlike some existing single-defect estimation methods that rely on low-level statistics and may fail in many cases on natural photographs, our model is able to understand image contents and quality at a higher level. As a result, in our experiments, we show that our model has predictions with much higher consistency with human judgments than low-level methods as well as several baseline CNN models. Our model also performs better than an average human from our user study.

Ifat Emi: PhD Proposal Presentation

Tuesday, April 11, 2017 at 10:00 am in Rice 404


Committee: John A. Stankovic (Advisor), Hongning Wang (Chair), and Kai-Wei Chang; Minor Representatives: John Lach (ECE) and Laura Barnes (SYS).

Title: Towards Micro-Activity Recognition for Monitoring Activity Quality to Generate Notification


In order to notify users about potentially unsafe situations and to track mistakes or efficiency in performing activities, it is important to monitor the quality of performing an activity and identify missing or incorrect steps. However, state-of-the-art activity recognition frameworks ignore such details and impose constraints on sensor values, the types of detected activities (no parallel/interleaved/joint activities), or the number of users, which reduces the robustness of the system in real-world settings. Therefore, we propose QuActive, a novel grammar-based, general-purpose framework for modeling activities and micro-activities that retains the details of activity steps, quantifies activity quality, and identifies missing or incorrect activity steps. We also propose to build a system based on this framework for detecting micro-activities, recognizing activities, monitoring activity quality, and generating notifications to users about missing steps and unsafe situations. This proposal provides an overview of the QuActive framework, the details of the system design, the research challenges and proposed approaches, proposed experiments, related work, and some preliminary results.