Liliya McLean Besaleva: PhD Dissertation Defense

Friday, September 22nd, 2017 at 10:00 am in Rice 504


Committee Members: Alf Weaver (Advisor), Jack Stankovic (Chair), Worthy Martin, Hongning Wang and Larry Richards.

Title: Smart E-Commerce Personalization Using Customized Algorithms


Applications for machine learning algorithms can be observed in numerous places in our modern lives. From medical diagnosis predictions to smarter ways of shopping online, big, fast data is streaming in and being utilized constantly. Unfortunately, unusual instances of data, called imbalanced data, are still largely ignored because of the inadequacies of analytical methods that are designed to handle homogenized data sets and to “smooth out” outliers. Consequently, rare use cases of significant importance remain neglected, leading to high-cost losses or even tragedies. In the past decade, a myriad of approaches to this problem, ranging from data modifications to alterations of existing algorithms, have appeared with varying success. Yet the majority of them have major drawbacks when applied to different application domains because of the non-uniform nature of the applicable data.
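The data-level modifications mentioned above can be illustrated with a minimal random-oversampling sketch, in which minority-class samples are duplicated until the classes are balanced. This is a generic technique, not the dissertation's method; the function name and data are illustrative only.

```python
import random

def oversample_minority(samples, labels, seed=0):
    """Naively rebalance a dataset by duplicating samples from
    under-represented classes until every class matches the size
    of the largest class."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(xs) for xs in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        picks = list(xs)
        while len(picks) < target:
            picks.append(rng.choice(xs))  # duplicate a random minority sample
        out_x.extend(picks)
        out_y.extend([y] * len(picks))
    return out_x, out_y
```

Approaches like this trade a larger training set for better class balance, which is exactly the kind of blunt instrument the abstract argues breaks down across non-uniform application domains.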

Within the vast domain of e-commerce, we have developed an innovative approach for handling imbalanced data: a hybrid meta-classification method consisting of a mixed solution of multimodal data formats and algorithmic adaptations for an optimal balance between prediction accuracy, sensitivity, and specificity for multiclass imbalanced datasets. Our solution is divided into two main phases serving different purposes. In phase one, we classify the outliers with less accuracy for faster, more urgent situations that require immediate predictions and can withstand possible errors in the classification. In phase two, we perform a deeper analysis of the results, aiming to precisely identify high-cost multiclass imbalanced data with larger impact. The goal of this work is to provide a solution that improves the data usability, classification accuracy, and resulting costs of analyzing massive data sets (e.g., millions of shopping records) in e-commerce.
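The two-phase idea in the abstract can be sketched as a simple control-flow skeleton: a fast model answers immediately when it is confident, and low-confidence cases fall through to a costlier second-phase analysis. The model interfaces, names, and threshold here are hypothetical, not the dissertation's actual design.

```python
def two_phase_classify(record, fast_model, deep_model, confidence=0.9):
    """Phase one: a fast, error-tolerant prediction for urgent situations.
    Phase two: re-examine low-confidence outliers with a deeper model."""
    label, score = fast_model(record)
    if score >= confidence:
        return label, "phase-1"
    # Deeper, slower analysis for the high-cost imbalanced cases.
    return deep_model(record), "phase-2"
```

The design choice is the usual latency/accuracy split: most records exit in phase one, so the expensive analysis is reserved for the rare cases where misclassification is costly.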

Nathan Brunelle: PhD Dissertation Defense

Monday, July 31, 2017 at 9:00 am in Rice 242
The committee members are: Gabriel Robins (Advisor), James Cohoon (Committee Chair), Kevin Skadron, Ke Wang and Mircea Stan (Minor Representative).

Title: Superscalable Algorithms

We propose two new highly-scalable approaches to effectively process massive data sets in the post-Moore's Law era, namely (1) designing algorithms to operate directly on highly compressed data, and (2) leveraging massively parallel finite automata-based architectures for specific problem domains. The former method extends scalability by exploiting regularity in highly-compressible data, while also avoiding expensive decompression and re-compression. The latter hardware compactly encapsulates complex behaviors via simulation of non-deterministic finite-state automata. We evaluate the efficiency, extensibility, and generality of these non-traditional approaches in big data environments. By presenting both promising experimental results and theoretical impossibility arguments, we provide more comprehensive frameworks for future research in these areas.
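The first approach, computing directly on compressed data, can be illustrated with run-length encoding: an aggregate query touches one entry per run rather than one entry per element, so work scales with the compressed size. This is a generic textbook illustration, not the dissertation's algorithms.

```python
def rle_compress(seq):
    """Run-length encode a sequence as [value, count] pairs."""
    runs = []
    for x in seq:
        if runs and runs[-1][0] == x:
            runs[-1][1] += 1
        else:
            runs.append([x, 1])
    return runs

def rle_count(runs, value):
    """Count occurrences of `value` without decompressing:
    work is proportional to the number of runs, not the data size."""
    return sum(n for v, n in runs if v == value)
```

On highly regular data, the run list is far shorter than the raw sequence, which is the regularity-exploiting effect the abstract describes.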

Jonathan Dorn: PhD Dissertation Defense


Thursday, July 20, 2017 at 12:30 pm in Rice 536

Committee members: Westley Weimer (Advisor), Baishakhi Ray (Committee Chair), Jason Lawrence, Stephanie Forrest (UNM) and Chad Dodson (Minor Representative).

Title: Optimizing Tradeoffs of Non-Functional Properties in Software


Software systems have become integral to the daily life of millions of people. These systems provide much of our entertainment (e.g., video games, feature-length movies, and YouTube) and our transportation (e.g., planes, trains and automobiles). They ensure that the electricity to power homes and businesses is delivered and are significant consumers of that electricity themselves. With so many people consuming software, the best balance between runtime, energy or battery use, and accuracy is different for some users than for others. With so many applications playing so many different roles and so many developers producing and modifying them, the tradeoff between maintainability and other properties must be managed as well.

Existing methodologies for managing these “non-functional” properties require significant additional effort. Some techniques impose restrictions on how software may be designed or require time-consuming manual reviews. These techniques are frequently specific to a single application domain, programming language, or architecture, and are primarily applicable during initial software design and development. Further, modifying one property, such as runtime, often changes another property as well, such as maintainability.

In this dissertation, we present a framework, exemplified by three case studies, for automatically manipulating interconnected program properties to find the optimal tradeoffs. We exploit evolutionary search to explore the complex interactions of diverse properties and present the results to users. We demonstrate the applicability and effectiveness of this approach in three application domains, involving different combinations of dynamic properties (how the program behaves as it runs) and static properties (what the source code itself is like). In doing so, we describe the ways in which those domains impact the choices of how to represent programs, how to measure their properties effectively, and how to search for the best among many candidate program implementations. We show that effective choices enable the framework to take unmodified human-written programs and automatically produce new implementations with better properties—and better tradeoffs between properties—than before.

Congratulations to our CS Award Winners!

We ended another successful year with an awards ceremony on May 3rd. We would like to congratulate everyone on all of their hard work this year and recognize the award winners.

The Louis T. Rader awards are given to undergraduates who demonstrate outstanding leadership and promise in education, research and service.

2017 Louis T. Rader Education Awards:
Sarah Bland, Kamille Foster, Sam Havron, Edward Kao, John Melloy, Bret Quigley, Lily Zamanali

2017 Louis T. Rader Research Awards:

Salah Assana, Sanil Rao, Jean Salac, Chandan Singh

2017 Louis T. Rader Service Awards:
Collin Berman, Cyrus Malekpour, Andrew Norton

Grad Student Awards

Outstanding Research:
Jack Wadden
Rita Singh
Honorable Mention: Qingyun Wu, Ankur Sarker, Zhuozhao Li, Lin Yan

Outstanding Service:
Kevin Angstadt
Elahe Soltanaghaei

Outstanding Teaching:
Kevin Angstadt
Jack Lanchantin
Honorable Mention: Christopher (Jordan) Mincey and Anindya Prodhan

Congratulations to CS Ugrad: Chandan Singh!

Chandan Singh’s senior thesis was named the winner in the Design track of the SEAS Undergraduate Research and Design Symposium. What a fantastic recognition of his research.

Congratulations as well to Yanjun Qi, his advisor. They are now working on submitting this work for publication.

Chandan is a 4th-year CS major and has been admitted into the Berkeley CS PhD program. This summer he will intern at Facebook.

Ritambhara Singh: PhD Proposal Presentation

Monday, May 1, 2017 at 11:00 am in Rice 504


Committee: Yanjun Qi (Advisor), Mary Lou Soffa (Chair), Gabriel Robins, Christina Leslie (Memorial Sloan Kettering Cancer Center), and Mazhar Adli (UVA Dept. of Biochemistry and Molecular Genetics, Minor Representative).

Title: Fast and Interpretable Classification of Sequential Data in Biology

Machine learning models have shown great success in helping biologists analyze sequential data (like DNA sequences or measurements of activity levels along the genome). However, state-of-the-art machine learning methods face two hard challenges posed by sequential data: (1) interpretability of the predictions for better insights, and (2) slow computation due to the expanding search space of sequential patterns. The proposed research aims to solve these two challenges by improving the existing state-of-the-art machine learning methods. Specifically, we focus on two popular models: Neural Networks (NNs) and String Kernels with Support Vector Machines (SK-SVM).

Challenge (1): NNs can handle large sequential datasets accurately and efficiently. However, NNs have widely been viewed as ‘black boxes’ due to their complexity, making them hard to understand. Solution (1): We propose a unified architecture, Attentive-DeepChrome, that handles prediction and understanding in an end-to-end manner.

Challenge (2): SK-SVM methods achieve high accuracy and have theoretical guarantees for smaller datasets (<5,000 samples). However, current implementations run extremely slowly when we increase the dictionary size or allow more mismatches. Solution (2): We propose a novel algorithm for calculating the Gapped k-mer Kernel using Counting (GaKCo). This algorithm is fast, scalable, and naturally parallelizable.
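To make the gapped k-mer feature space concrete, here is a brute-force illustration: from every length-g window, keep each choice of k informative positions and treat the rest as gaps; a kernel value is then the number of features two sequences share. This is the naive enumeration whose cost motivates a faster counting-based algorithm, not GaKCo itself.

```python
from itertools import combinations

def gapped_kmers(seq, g, k):
    """Enumerate gapped k-mer features: from each length-g window,
    keep every choice of k informative positions; the remaining
    g-k positions act as gaps."""
    feats = {}
    for i in range(len(seq) - g + 1):
        window = seq[i:i + g]
        for pos in combinations(range(g), k):
            key = (pos, "".join(window[p] for p in pos))
            feats[key] = feats.get(key, 0) + 1
    return feats

def gapped_kmer_kernel(a, b, g, k):
    """Kernel value: dot product of the two gapped k-mer count vectors."""
    fa, fb = gapped_kmers(a, g, k), gapped_kmers(b, g, k)
    return sum(n * fb.get(key, 0) for key, n in fa.items())
```

The number of features per window grows combinatorially in g and k, which is why the naive approach becomes impractical as the dictionary size and allowed mismatches grow.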

In summary, this research work expands the frontiers of the existing machine learning methods for improved analysis of sequential data in biology.

Ning Yu: PhD Qualifying Exam Presentation

Tuesday, May 2, 2017 at 10:00 am in Rice 404


Committee: Connelly Barnes (Advisor), Vicente Ordóñez (Chair), Yanjun Qi, and Kai-Wei Chang

Title: Learning to Detect Multiple Photographic Defects

In this work, we introduce the problem of simultaneously detecting multiple photographic defects. We aim at detecting the existence, severity, and potential locations of common photographic defects related to color, noise, blur and composition. The automatic detection of such defects could be used to provide users with suggestions for how to improve photos without the need to laboriously try various correction methods. Defect detection could also help users select photos of higher quality while filtering out those with severe defects in photo curation and summarization. To investigate this problem, we collected a large-scale dataset of user annotations on seven common photographic defects, which allows us to evaluate algorithms by measuring their consistency with human judgments. Our new dataset enables us to formulate the problem as a multi-task learning problem and train a multi-column deep convolutional neural network (CNN) to simultaneously predict the severity of all the defects. Unlike some existing single-defect estimation methods that rely on low-level statistics and may fail in many cases on natural photographs, our model is able to understand image contents and quality at a higher level. As a result, in our experiments, we show that our model has predictions with much higher consistency with human judgments than low-level methods as well as several baseline CNN models. Our model also performs better than an average human from our user study.