Otmar Hilliges
Information at your fingertips – towards an interactive future
Virtual and augmented reality (VR/AR) are attracting a lot of interest in both industry and academia. With VR/AR displays being actively developed by leading tech companies, we can expect headsets of high quality, both in rendering and in form factor, to become available soon. However, once we have an always-on, always-available display right in front of our eyes, how we interact with the world, in both its digital and its physical incarnation, becomes an increasingly pressing question.
The requirements imposed on an AR/VR interaction paradigm are manifold. Such a system should be non-intrusive, work under occlusion and in varying lighting conditions, be amenable to mobile and outdoor scenarios, and obviously be highly accurate. Furthermore, we will see a transition from explicit to implicit interaction. That is, instead of issuing fine-grained, step-by-step commands, as is the case with mouse-and-keyboard interfaces, we seek an interaction paradigm in which interacting agents, powered by artificial intelligence, perceive and analyze our actions and reason about our intent. Such systems will then be able to proactively display information at opportune moments and locations, or even act in the physical world itself.
From a technical perspective this brings new challenges, since most work in computer vision assumes an external camera pointed at a human, and many traditional approaches do not translate to an egocentric perspective. While the recent wave of deep-learning-powered vision systems has brought automated action understanding closer to reality, a core problem in analyzing human activity is that collecting appropriate training data is tedious, expensive and sometimes impossible, especially once we look beyond simple discriminative approaches and strive for an understanding of more complex activities. I will talk about deep-learning approaches that leverage knowledge of the spatio-temporal structure underlying human activity. In particular, we are interested in incorporating such knowledge into end-to-end trainable architectures, so that the explicit representation of domain knowledge can be exploited for meta-learning and self-supervised forms of learning, in turn reducing the reliance on labelled training data. I will demonstrate how incorporating such models can improve many tasks, including human (hand) pose and eye-gaze estimation, and how such approaches can be leveraged to build always-available means of input in AR/VR. In the second half, the talk will explore challenges in interacting with virtual objects that live in the real world, including the lack of haptic feedback and the danger of information overload.
Otmar Hilliges is currently an Associate Professor of Computer Science at ETH Zurich, where he leads the AIT lab. His research sits at the intersection of machine learning, computer vision and human-computer interaction (HCI). The lab's main mission is to develop new ways for humans to interact with complex interactive systems (computers, wearables, robots), powered by advanced algorithms and technologies in machine perception, planning and data-driven user modelling. Prior to joining ETH, he spent two years as a postdoctoral researcher (2010-2012) and subsequently as a Researcher (until 2013) at Microsoft Research Cambridge. He received his Diplom (equiv. MSc) in Computer Science from Technische Universität München, Germany (2004) and his PhD in Computer Science from LMU München, Germany (2009). He has published more than 70 peer-reviewed papers in the major venues on computer vision, HCI and computer graphics, and received an ERC Starting Grant in 2017 for computational approaches to sensing-based human-computer interfaces. Finally, more than 20 patents have been filed in his name on a variety of subjects, from surface reconstruction to AR/VR.