Computer Vision: From 3D Reconstruction to Visual Recognition
When a 3-dimensional world is projected onto a 2-dimensional image, such as the human retina or a photograph, depth information is lost, and reconstructing the layout and contents of the real world becomes an ill-posed problem that is extremely difficult to solve. Humans possess the remarkable ability to navigate and understand the visual world by solving this inverse problem of going from 2D back to 3D. Computer vision, a modern discipline of artificial intelligence, seeks to imitate these human abilities: recognizing objects, navigating scenes, reconstructing layouts, and understanding the geometric space and semantic meaning of the visual world. These abilities are critical in many applications, including personal robotics, autonomous driving and exploration, photo organization, image and video retrieval, and human-computer interaction.
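To make the ill-posedness concrete, here is a minimal sketch that assumes an idealized pinhole camera at the origin looking along the Z axis. The function name `project` and the `focal_length` parameter are illustrative choices, not part of the course material. It shows two different 3D points that land on exactly the same image coordinates, so a single image alone cannot tell them apart.

```python
def project(point, focal_length=1.0):
    """Pinhole projection of a 3D point (X, Y, Z) onto the 2D image plane.

    Assumes an idealized camera at the origin looking down the +Z axis.
    """
    x, y, z = point
    if z <= 0:
        raise ValueError("point must lie in front of the camera (Z > 0)")
    return (focal_length * x / z, focal_length * y / z)


# Two distinct 3D points on the same ray through the camera center.
near = (1.0, 2.0, 4.0)
far = tuple(3.0 * c for c in near)  # (3.0, 6.0, 12.0)

print(project(near))  # (0.25, 0.5)
print(project(far))   # (0.25, 0.5)
```

Every point along that ray projects to the same pixel, which is why recovering 3D structure needs extra information such as multiple views, motion, or prior knowledge about the scene.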
This course delivers a systematic overview of computer vision, comparable to an advanced graduate-level class. We emphasize two key issues in modeling vision: space and meaning. We begin by laying out the main problems vision needs to solve: mapping out the 3D structure of objects and scenes, recognizing objects, segmenting objects, recognizing the meaning of scenes, understanding the movements of humans, etc. Motivated by these important problems centered on the understanding of space and meaning, we will study the fundamental theories and important algorithms of computer vision together, starting from the analysis of 2D images and culminating in the holistic understanding of a 3D scene.
Part 0: Introduction - What is computer vision?
Part 1: Visual understanding in 2D space
groups of pixels
object and scene recognition
Part 2: Perceiving and modeling the 3D space
capturing a picture in 3D
popping out a scene in 3D
popping out an object in 3D
mapping out a space
Part 3: Coherent understanding of the scene and the 3D space
object recognition in 3D space
visual recognition in context
Part 4: Functions and activities in the 3D scene
event recognition in images
action recognition in videos
vision and language