The main objective of this research is to explore and evaluate a new workflow that
mixes user interaction with automated computation for interactive virtual
cinematography, in order to better support user creativity. In particular, following
preliminary results presented at ACM Multimedia 2011, we intend to
propose a novel workflow in which artificial intelligence techniques are employed to
generate a large range of viewpoint suggestions, to be explored by the users as a starting
point for creating shots and performing cuts. Typically, users would then reframe the
selected viewpoints to their needs, shoot the sequence, and request further suggestions for
the next shots. All subsequent suggestions rely on the existing shots to generate
relevant viewpoints that respect classical continuity rules between shots. A further,
novel way of interacting with such a system is through motion-tracked cameras:
devices whose position and orientation are tracked in a real environment and whose
coordinates are mapped onto a virtual camera in a virtual environment (see Fig. 1). Enabling a
proper mix between hints provided by an automated system and interactive possibilities
offered by a motion-tracked camera represents an important scientific challenge and
potentially leads to a strong industrial impact.
Figure 1: A motion-tracked camera for previsualisation (NVizage, UK). In their current application,
motion-tracked cameras only perform a simple mapping between real and virtual camera
coordinates. Our project aims to propose smart techniques that assist filmmakers in placing
the camera and in selecting and shooting a sequence.
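In their current form, such devices apply a direct coordinate mapping between the tracking volume and the virtual world. The minimal Python sketch below illustrates that mapping together with the scaling issue raised later in this proposal; the function name, the uniform-scale-plus-offset model, and all parameters are illustrative assumptions, not any vendor's API.

```python
def map_tracked_pose(real_pos, scale=1.0, world_offset=(0.0, 0.0, 0.0)):
    """Map a tracked camera position from the real tracking volume into the
    virtual environment.

    real_pos     -- (x, y, z) position reported by the tracker, in metres
    scale        -- metres of virtual travel per metre of physical travel
    world_offset -- where the tracking volume's origin sits in the virtual world

    The uniform-scale-plus-offset model is an assumption for illustration.
    """
    return tuple(o + scale * p for o, p in zip(world_offset, real_pos))

# At scale=50, a 1 m step in the studio becomes a 50 m move in a virtual city,
# which is how a small tracking space can cover city-scale navigation paths.
virtual = map_tracked_pose((1.0, 0.0, 0.2), scale=50.0, world_offset=(100.0, 0.0, 0.0))
# → (150.0, 0.0, 10.0)
```

Adjusting `scale` on the fly is one possible answer to moving between whole-city and close-up framings without leaving the tracking volume.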
Scientific and Technical Challenges
The underlying scientific and technical challenges are:
- the ability to generate relevant viewpoint suggestions following classical cinematic
conventions. This requires the automated analysis of the 3D environment, the
characters, and the characters' actions, together with the formalization of major
screen-composition rules in computationally efficient models. This formalization is
needed to generate and evaluate the best camera candidates to be presented to the
user among millions of possibilities. Classical techniques from the literature have
only considered a few elements of composition (viewpoint angle, character size) and
generally rely on very rough abstractions of objects. Building on recent results
[Abdullah et al. 2011], our objective is to propose computationally efficient and
precise techniques through more sophisticated, GPU-based image evaluation.
- the ability to formalize and represent a number of characteristic elements of
cinematographic style. While presenting a wide range of suggestions benefits the
user (who does not need to explore the whole tracking space to explore viewpoints),
there still remain hundreds of possible camera placements. However, through years
of practice, cinematography has developed collections of characteristic shots and
viewpoints for specific contexts and film genres. For example, western movies
feature well-known types of shots and transitions to portray duels, and these
characteristic elements are frequently re-used in movies of other genres to convey
similar elements of tension relative to a (symbolic) duel. The issue here is to encode
these characteristic elements of style and genre and to let users select which they
prefer for a given scene. In this project we propose to characterize elements of
style and genre using reinforcement learning techniques applied to hand-annotated
real movies.
- the integration of motion-tracked cameras in the workflow. While the workflow
we propose can be employed in classical modeling tools with traditional controllers
(mouse/keyboard), there is a clear benefit in using motion-tracked controllers
to improve interactivity. Given that tracking spaces are of limited size, there is
a need to provide novel interaction metaphors to ease the process of content
creation with tracked cameras. In particular, there are key questions on how to
automatically handle the scaling issue (creating navigation paths around whole
cities and then around small targets in the scene), how to mimic classical camera
setups through appropriate filtering of the tracked data (e.g. Steadicam motion),
and how to provide support and hints to build traditional motions (travellings,
dollies, cranes, and specific effects such as the vertigo effect).
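The generate-and-rank loop behind the first challenge can be sketched as follows. This is a deliberately crude stand-in for the GPU-based image evaluation envisioned in the project: the inverse-distance size model, the two equally weighted composition rules, and all names are assumptions made for illustration only.

```python
import math
import random

def composition_score(cam_dist, cam_azimuth,
                      target_size=0.3, preferred_azimuth=math.radians(30)):
    """Score one candidate viewpoint against two classical composition rules:
    the character's on-screen size and the camera's horizontal angle.
    The on-screen size model (size ~ 1/distance) is a crude pinhole assumption.
    Returns a value in [0, 1]; higher is better.
    """
    size = 1.0 / cam_dist  # rough projected character size
    size_term = 1.0 - min(1.0, abs(size - target_size) / target_size)
    angle_term = 1.0 - min(1.0, abs(cam_azimuth - preferred_azimuth) / math.pi)
    return 0.5 * size_term + 0.5 * angle_term  # equal weighting, an assumption

def best_suggestions(n_candidates=1000, n_keep=5, seed=0):
    """Sample candidate viewpoints around the subject (distance, azimuth)
    and keep the few best-scoring ones to present to the user."""
    rng = random.Random(seed)
    candidates = [(rng.uniform(1.0, 10.0), rng.uniform(-math.pi, math.pi))
                  for _ in range(n_candidates)]
    return sorted(candidates, key=lambda c: -composition_score(*c))[:n_keep]

suggestions = best_suggestions()
```

In the actual system, the scoring function would be replaced by image-space evaluation over rendered candidates, and continuity rules with respect to previously shot footage would constrain the sampling.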
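Likewise, the filtering of tracked data mentioned in the third challenge can be illustrated with a simple exponential moving average over raw tracker positions; a real Steadicam-style filter would be considerably more elaborate, and the function and parameter names here are hypothetical.

```python
def smooth_track(samples, alpha=0.1):
    """Smooth a sequence of raw tracker positions with an exponential moving
    average, a minimal stand-in for the damping a Steadicam rig performs
    mechanically.

    samples -- iterable of position tuples from the tracking system
    alpha   -- smoothing factor in (0, 1]; smaller gives smoother, laggier motion
    """
    smoothed, state = [], None
    for p in samples:
        # Blend each new sample into the running state; the first sample
        # initializes the state directly.
        state = p if state is None else tuple(
            (1 - alpha) * s + alpha * x for s, x in zip(state, p))
        smoothed.append(state)
    return smoothed
```

Varying `alpha` (or substituting a more sophisticated filter) is one way to mimic different physical camera setups, from handheld jitter to heavily damped crane moves, from the same tracked input.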