Computer Vision and Pattern Recognition Research Group

Making the computer see as a human being would is the goal of machine vision. This 40-year-old field of research has seen many success stories, such as face detection in cameras, optical character recognition for checks, fingerprint matching, human-level object detection, autonomous desert driving, and breathtaking visual effects such as fly-arounds in the movie industry. However, the general computer vision problem is far from solved. Many areas need a substantial amount of work before they become useful and change the way we work. For example, autonomous urban driving using visual navigation, human behavior identification for surveillance and for helping the elderly, combining visual recognition with other forms of information such as text, and registering a tumor for image-guided surgery are either unsolved or need improved solutions. There is, then, much work to be done to make machines see as we do. The Machine Vision Group attempts to solve several such problems.

See our list of publications.

Ongoing Projects

Domain Generalization

In computer vision, domain generalization focuses on training models that generalize well to unseen domains without access to data from those domains during training. For example, a classifier trained on real photos may fail in the hand-sketch domain. We tackle this issue by identifying invariant features, that is, features that truly represent classes, and by using vision-language models to guide the classifiers.

See, e.g., "Test-Time Optimization for Domain Adaptive Open Vocabulary Segmentation" and "A Feature Generator for Few-Shot Learning".
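The "unseen domain" setting described above is commonly evaluated with a leave-one-domain-out protocol: train on all domains but one, then test on the held-out one. The following is a minimal sketch of such a split, not the group's own pipeline; the function name `leave_one_domain_out` and the toy domain names and sample identifiers are assumptions made for illustration.

```python
def leave_one_domain_out(samples_by_domain):
    """Yield (held_out_domain, train_samples, test_samples) for each domain.

    samples_by_domain: dict mapping a domain name to a list of samples.
    The held-out domain contributes no data to the training list, which
    mirrors the "no access to target-domain data" constraint.
    """
    for held_out in samples_by_domain:
        train = [
            sample
            for domain, items in samples_by_domain.items()
            if domain != held_out
            for sample in items
        ]
        test = list(samples_by_domain[held_out])
        yield held_out, train, test


# Hypothetical toy data: three visual domains with placeholder sample IDs.
data = {
    "photo": ["p1", "p2"],
    "sketch": ["s1"],
    "cartoon": ["c1", "c2"],
}

splits = list(leave_one_domain_out(data))
for held_out, train, test in splits:
    print(f"held out: {held_out:8s} train={train} test={test}")
```

Averaging test accuracy over the held-out domains gives a single number that reflects how well a model transfers to domains it never saw.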

Context-Aware Occlusion Removal

In this work, we treat objects that do not relate to the image context as occlusions and remove them, coherently reconstructing the space they occupied. We detect occlusions by considering the relation between foreground and background object classes, represented by vector embeddings, and remove them through inpainting. We use deep networks for semantic segmentation and word embeddings generated by a word-to-vector (word2vec) model.
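The detection step can be illustrated with a small sketch: score each foreground class by its average cosine similarity to the background classes and flag the ones that fall below a threshold. This is an assumed simplification, not the published method; the function `flag_occlusions`, the threshold value, and the hand-made 3-D vectors (standing in for real word embeddings) are all hypothetical.

```python
import numpy as np


def cosine(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def flag_occlusions(foreground, background, embeddings, threshold=0.5):
    """Return foreground classes whose mean similarity to the background
    classes is below `threshold` (an assumed, untuned value)."""
    flagged = []
    for fg in foreground:
        scores = [cosine(embeddings[fg], embeddings[bg]) for bg in background]
        if np.mean(scores) < threshold:
            flagged.append(fg)
    return flagged


# Hand-made toy vectors, NOT real word embeddings.
toy = {
    "sea": np.array([1.0, 0.1, 0.0]),
    "sand": np.array([0.9, 0.2, 0.0]),
    "boat": np.array([0.8, 0.3, 0.1]),
    "car": np.array([0.0, 0.2, 1.0]),
}

result = flag_occlusions(["boat", "car"], ["sea", "sand"], toy)
print(result)
```

In a full system the class labels would come from the semantic segmentation output, and the flagged regions would be passed to the inpainting stage.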

Extensions to Capsule Networks

We extended the recent capsule network model, a deep neural network that better models hierarchical relationships, in several directions. In the TextCaps work, we perturb the instantiation parameters with controlled random noise to generate new training samples from existing ones, producing realistic augmentations that reflect the variations actually present in human handwriting. Our results with a mere 200 training samples per class surpass existing character recognition results on MNIST and several other datasets. In DeepCaps, we developed a deep capsule network architecture that uses a novel 3D-convolution-based dynamic routing algorithm. Further, we proposed a class-independent decoder network, which strengthens the use of reconstruction loss as a regularization term. This leads to an interesting property of the decoder: it allows us to identify and control the physical attributes of the images represented by the instantiation parameters.
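The augmentation idea in TextCaps can be sketched as follows: take the instantiation vector of one sample, add bounded random noise to it several times, and decode each perturbed vector into a new image. The sketch below covers only the perturbation step and is an illustration under assumptions; the function name `perturb_instantiation`, the uniform noise model, the noise bound, and the 16-dimensional vector are not taken from the source, and the decoder is omitted.

```python
import numpy as np


def perturb_instantiation(params, max_noise=0.1, n_samples=5, rng=None):
    """Return `n_samples` noisy copies of an instantiation vector.

    Noise is drawn uniformly from [-max_noise, max_noise) per dimension,
    so every variant stays within a controlled distance of the original.
    """
    rng = np.random.default_rng() if rng is None else rng
    params = np.asarray(params, dtype=float)
    noise = rng.uniform(-max_noise, max_noise, size=(n_samples, params.size))
    return params + noise  # broadcasting: (n_samples, d) + (d,)


# Placeholder instantiation vector; a real one would come from a trained
# capsule network. The dimension 16 is an assumption for this sketch.
base = np.zeros(16)
variants = perturb_instantiation(base, max_noise=0.1, n_samples=5,
                                 rng=np.random.default_rng(0))
print(variants.shape)
```

Each row of `variants` would then be fed to the decoder network to reconstruct a new training image for the same class.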

Gait Analysis

Several systems use one or more Kinect sensors for human gait analysis, particularly for the diagnosis of patients. However, because of the limited depth-sensing range of the Kinect, a sensor manufactured for video gaming, the depth measurement accuracy decreases with distance from the sensor. In addition, self-occlusion of the subject limits the accuracy and utility of such systems. We overcome these limitations first by using a two-Kinect gait analysis system and second by mechanically moving the Kinects in synchronization with the test subject and with each other. These methods increase the practical measurement range of the Kinect-based system while maintaining measurement accuracy.

Vision Processor Design

The widespread use of high-definition cameras for surveillance and related tasks has made transmitting and processing video streams in real time challenging, which has motivated the move toward edge computing. However, edge computing at low power and low cost is difficult with the general-purpose processor hardware inside cameras. Finding a solution that meets these requirements and is flexible enough to handle diverse conditions is challenging. We design processors geared for computer vision tasks to overcome this challenge.

Some of the above research projects were funded by the National Research Council, the National Science Foundation, and the Senate Research Committee of the University of Moratuwa.