Teaching a robot to perceive and navigate in an unstructured natural
world is a difficult task. Without learning, navigation systems are
short-range and extremely limited. With learning, the robot can be
taught to classify terrain at longer distances, but these classifiers
can be fragile as well, leading to extremely conservative planning. A
robust, high-level learning-based perception system for a mobile robot
needs to continually learn and adapt as it explores new
environments. To do this, a strong feature representation is necessary
that can encode meaningful, discriminative patterns as well as
invariance to irrelevant transformations. A simple realtime classifier
can then be trained on those features to predict the traversability of
the current terrain.
One such method for learning a feature representation is discussed in
detail in this work. Dimensionality reduction by learning an
invariant mapping (DrLIM) is a weakly supervised method for learning a
similarity measure over a domain. Given a set of training samples
and their pairwise relationships, which can be arbitrarily defined, DrLIM
can be used to learn a function that is invariant to complex
transformations of the inputs such as shape distortion and
rotation.
The main contribution of this work is a self-supervised learning
process for long-range vision that is able to accurately classify
complex terrain, permitting improved strategic planning. As a mobile
robot moves through offroad environments, it learns traversability
from a stereo obstacle detector. The learning architecture is composed
of a static feature extractor, trained offline for a general yet
discriminative feature representation, and an adaptive online
classifier. This architecture reduces the effect of concept drift by
allowing the online classifier to quickly adapt to very few training
samples without overtraining. After experiments with several different
learned feature extractors, we conclude that unsupervised or weakly
supervised learning methods are necessary for training general feature
representations for natural scenes.
The process was developed and tested on the LAGR mobile
robot as part of a fully autonomous vision-based navigation system.