Research objectives

The main objective of this research is to explore and evaluate a new workflow
that combines user interaction with automated computation for interactive virtual
cinematography, in order to better support user creativity. In particular, following
preliminary results presented at ACM Multimedia 2011, we intend to
propose a novel workflow in which artificial intelligence techniques are employed to
generate a large range of viewpoint suggestions, to be explored by the users as a starting
point for creating shots and performing cuts. Typically, users would then reframe the
selected viewpoints to fit their needs, shoot the sequence, and request further suggestions
for the next shots. Each subsequent suggestion will rely on the existing shots to generate
relevant viewpoints that follow classical continuity rules between shots (a small sketch of
such rules is given after Figure 1). A further, novel way of interacting with such a system
is through motion-tracked cameras: devices whose position and orientation are tracked in a
real environment and mapped onto a virtual camera in a virtual environment (see Figure 1).
Enabling a proper mix between the hints provided by an automated system and the interactive
possibilities offered by a motion-tracked camera represents an important scientific
challenge and could lead to a strong industrial impact.

Figure 1: A motion-tracked camera for previsualisation (NVizage, UK). In current applications,
motion-tracked cameras only perform a simple mapping between real and virtual camera
coordinates. Our project aims at proposing smart techniques to assist filmmakers in the
process of placing the camera, and in selecting and shooting a sequence.
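
As a concrete illustration of the continuity rules mentioned above, the sketch below checks
two classical conventions between consecutive shots: the 180-degree rule (both cameras stay
on the same side of the line of action) and the 30-degree rule (the viewing angle towards
the subject must change enough to avoid a jump cut). The 2D simplification, names and
thresholds are illustrative assumptions, not the project's actual implementation.

```python
import math

def angle_between(v1, v2):
    """Angle in degrees between two 2D vectors."""
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    norm = math.hypot(v1[0], v1[1]) * math.hypot(v2[0], v2[1])
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))

def continuity_ok(prev_cam, new_cam, subject_a, subject_b, min_angle=30.0):
    """Check a candidate camera position against the previous shot's camera.
    All points are (x, y) positions on the ground plane; subject_a and
    subject_b are the two characters defining the line of action."""
    line = (subject_b[0] - subject_a[0], subject_b[1] - subject_a[1])

    # 180-degree rule: both cameras must lie on the same side of the
    # line of action (same sign of the 2D cross product).
    def side(cam):
        rel = (cam[0] - subject_a[0], cam[1] - subject_a[1])
        return line[0] * rel[1] - line[1] * rel[0]
    if side(prev_cam) * side(new_cam) < 0:
        return False

    # 30-degree rule: the viewing direction towards the subject must
    # change by at least min_angle degrees to avoid a jump cut.
    dir_prev = (subject_a[0] - prev_cam[0], subject_a[1] - prev_cam[1])
    dir_new = (subject_a[0] - new_cam[0], subject_a[1] - new_cam[1])
    return angle_between(dir_prev, dir_new) >= min_angle
```

In the proposed workflow, such checks would filter the pool of generated viewpoint
suggestions so that only candidates consistent with the shots already taken are shown.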

Scientific and Technical Challenges

The underlying scientific and technical challenges are:

  • the ability to generate relevant viewpoint suggestions following classical cinematic
    conventions. This requires the automated analysis of the 3D environment, the
    characters, and the characters' actions, together with the formalization of major
    screen composition rules into computationally efficient models. This
    formalization is needed to generate and evaluate the best camera candidates to
    be presented to the user among millions of possibilities. Classical techniques
    from the literature have only considered a few elements of composition
    (viewpoint angle, character size) and generally rely on very rough abstractions
    of objects. Building on recent results [Abdullah et al. 2011], our objective is
    to propose precise and computationally efficient evaluation through more
    sophisticated, GPU-based image analysis (a small composition-scoring sketch
    follows this list).
  • the ability to formalize and represent a number of characteristic elements of
    cinematographic style. While presenting a wide range of suggestions benefits
    the user (who does not need to explore the whole tracking space to discover
    viewpoints), hundreds of possible camera placements still remain. However,
    through years of practice, cinematography has built up collections of
    characteristic shots and viewpoints for specific contexts and film genres. As
    an example, western movies feature well-known types of shots and transitions
    to portray duels, and these characteristic elements are very often reused in
    films unrelated to westerns to convey similar elements of tension relative to
    a (symbolic) duel. The issue here is to encode these characteristic elements
    of style and genre and let the users select which genre they prefer for a
    given scene. In this project we propose to characterize elements of style and
    genre using reinforcement learning techniques applied to hand-annotated real
    movies (a simple illustration of learning shot patterns from annotations
    follows this list).
  • the integration of motion-tracked cameras in the workflow. While the workflow
    we propose can be employed in classical modeling tools with traditional
    controllers (mouse/keyboard), there is a clear benefit in using motion-tracked
    controllers to improve interactivity. Given that tracking spaces are of
    limited size, novel interaction metaphors are needed to ease the process of
    content creation with tracked cameras. In particular, there are key questions
    on how to automatically handle the scaling issue (creating navigation paths
    around whole cities and then around small targets in the scene), how to mimic
    classical camera setups through appropriate filtering of the tracked data
    (e.g. mimicking Steadicam motion), and how to provide supports and hints to
    build traditional motions (travellings, dollies, cranes, and specific effects
    such as the vertigo effect, i.e. the dolly zoom). A sketch of scaling and
    filtering follows this list.
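
To make the first challenge concrete, the sketch below scores a candidate viewpoint
against two composition rules: placement of the main character near rule-of-thirds
lines, and a target on-screen character size. The bounding-box input, weights and
targets are assumptions for illustration; in the project, such evaluations would
instead run over rendered views on the GPU.

```python
def composition_score(bbox, target_size=0.4, thirds=(1 / 3, 2 / 3)):
    """bbox = (x_min, y_min, x_max, y_max) of the character's projection,
    in normalized [0, 1] screen coordinates. Higher score is better."""
    cx = (bbox[0] + bbox[2]) / 2.0
    cy = (bbox[1] + bbox[3]) / 2.0
    height = bbox[3] - bbox[1]

    # Distance of the character's center to the nearest rule-of-thirds line.
    thirds_penalty = (min(abs(cx - t) for t in thirds)
                      + min(abs(cy - t) for t in thirds))
    # Deviation from the desired on-screen size (e.g. a medium shot).
    size_penalty = abs(height - target_size)
    return 1.0 - (thirds_penalty + size_penalty)

# Candidate viewpoints could then be ranked by their projected framing,
# e.g. (project() is a hypothetical renderer/projection step):
# best = max(candidates, key=lambda c: composition_score(project(c, hero)))
```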
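For the second challenge, the project proposes reinforcement learning over hand-annotated
films. As a much simpler illustration of what learning characteristic shot patterns from
annotations can mean, the sketch below estimates first-order shot-transition probabilities
by counting; the labels and the two "duel" sequences are invented for the example.

```python
from collections import Counter, defaultdict

def transition_model(annotated_sequences):
    """Estimate P(next shot type | previous shot type) from annotations."""
    counts = defaultdict(Counter)
    for sequence in annotated_sequences:
        for prev, nxt in zip(sequence, sequence[1:]):
            counts[prev][nxt] += 1
    return {prev: {nxt: c / sum(ctr.values()) for nxt, c in ctr.items()}
            for prev, ctr in counts.items()}

# Hand-annotated duel scenes from westerns (invented example data):
duels = [
    ["establishing", "close-up-A", "close-up-B", "extreme-close-up-A", "action"],
    ["establishing", "close-up-A", "close-up-B", "close-up-A", "action"],
]
model = transition_model(duels)
# model["close-up-A"] == {"close-up-B": 0.67, "action": 0.33} (rounded),
# i.e. a duel's characteristic shot/reverse-shot alternation.
```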
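For the third challenge, the sketch below illustrates two of the mappings mentioned above:
a scale factor so that a small tracking space can cover a large virtual set, and a simple
low-pass (exponential smoothing) filter that turns raw tracker jitter into a steadier,
Steadicam-like motion. The interface and parameter values are illustrative assumptions.

```python
def map_to_virtual(tracked_pos, origin, scale):
    """Map a real tracking-space position to virtual-world coordinates;
    a larger scale lets a few meters of tracking cover a whole city."""
    return tuple(o + scale * p for o, p in zip(origin, tracked_pos))

class LowPassFilter:
    """Exponential smoothing of tracked positions. alpha in (0, 1];
    a smaller alpha means heavier smoothing, i.e. steadier motion."""
    def __init__(self, alpha=0.15):
        self.alpha = alpha
        self.state = None

    def update(self, sample):
        if self.state is None:
            self.state = sample
        else:
            self.state = tuple(self.alpha * s + (1 - self.alpha) * st
                               for s, st in zip(sample, self.state))
        return self.state

# Per tracker frame: smooth the raw position, then place the virtual camera.
# virtual_pos = map_to_virtual(filter.update(raw_pos), origin=(0, 0, 0), scale=20.0)
```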
