DNO: Optimizing Diffusion Noise Can Serve As Universal Motion Priors

¹ETH Zurich ²UC Berkeley
³VISTEC, Thailand ⁴Google

Abstract

We propose Diffusion Noise Optimization (DNO), a new method that effectively leverages existing motion diffusion models as motion priors for a wide range of motion-related tasks.

Instead of training a task-specific diffusion model for each new task, DNO operates by optimizing the diffusion latent noise of an existing pre-trained text-to-motion model. Given the corresponding latent noise of a human motion, it propagates the gradient from the target criteria defined on the motion space through the whole denoising process to update the diffusion latent noise. As a result, DNO supports any use case where the criterion can be defined as a function of the motion.

In particular, we show that, for motion editing and control, DNO outperforms existing methods in both achieving the objective and preserving the motion content. DNO accommodates a diverse range of editing modes, including changing trajectory, pose, joint locations, or avoiding newly added obstacles.

In addition, DNO is effective in motion denoising and completion, producing smooth and realistic motion from noisy and partial inputs. DNO achieves these results at inference time without the need for model retraining, offering great versatility for any defined reward or loss function on the motion representation.

Summary

Key Ideas

We show that the diffusion latent noise (x_T) can be optimized using a criterion function defined on the motion space (x₀) to serve as universal motion priors for a wide range of motion-related tasks.

Casual Summary

Many papers show that, with a VAE or GAN, we can optimize the latent noise according to a loss in the output space to obtain a prediction that satisfies certain properties. What, then, about the (supposedly) richer generative space of a diffusion model? Can we optimize it the same way we do with a VAE or GAN?

In this paper, we show we can also change the output motion by optimizing the latent diffusion noise. This latent noise is very versatile as a motion prior.

How? We unroll the ODE chain to produce the output motion, compute the loss, and backpropagate the gradient through the full denoising chain to the latent noise x_T. We demonstrate that this process is feasible. In addition, optimization in the motion domain does not need many ODE steps, so the unrolled chain still fits in GPU memory.

This enables us to use a pre-trained motion diffusion model to handle many arbitrary motion tasks, including motion editing, in-betweening, denoising, and avoiding obstacles, by changing only the objective function during test-time optimization. The model is never trained to solve any of these tasks.
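The unroll-and-backpropagate loop described above can be sketched in a few lines. This is only a toy illustration under loud assumptions: the "denoiser" is a hand-written smoothing step on a 1D trajectory rather than a trained motion diffusion model, the gradient is derived by hand instead of with an autograd library, plain gradient descent stands in for whatever optimizer is actually used, and every name here (`denoise_step`, `unroll`, `criterion`, `backprop_to_noise`, `optimize_noise`) is hypothetical.

```python
import math

def neighbor_ids(i, n):
    # Frame i and its two neighbors, clamped at the sequence ends.
    return [min(max(k, 0), n - 1) for k in (i - 1, i, i + 1)]

def denoise_step(x, h):
    # One Euler step of a toy ODE (an assumption, not a trained model):
    # each frame moves toward tanh of its local average.
    n = len(x)
    t = [math.tanh(sum(x[j] for j in neighbor_ids(i, n)) / 3.0) for i in range(n)]
    return [x[i] + h * (t[i] - x[i]) for i in range(n)], t

def unroll(z, steps, h):
    # Run the whole chain from latent noise z (x_T) to the motion x_0,
    # caching the tanh values needed by the backward pass.
    x, cache = list(z), []
    for _ in range(steps):
        x, t = denoise_step(x, h)
        cache.append(t)
    return x, cache

def criterion(x0, keyframes):
    # Loss on the motion space: squared error at the given keyframes.
    # Returns the loss and its gradient with respect to x_0.
    grad = [0.0] * len(x0)
    loss = 0.0
    for i, target in keyframes.items():
        loss += (x0[i] - target) ** 2
        grad[i] = 2.0 * (x0[i] - target)
    return loss, grad

def backprop_to_noise(grad_x0, cache, h):
    # Chain rule through every step, last step first, back to x_T.
    g, n = list(grad_x0), len(grad_x0)
    for t in reversed(cache):
        g_prev = [(1.0 - h) * gi for gi in g]
        for i in range(n):
            w = h * g[i] * (1.0 - t[i] ** 2) / 3.0
            for j in neighbor_ids(i, n):
                g_prev[j] += w
        g = g_prev
    return g

def optimize_noise(z, keyframes, steps=5, h=0.2, lr=0.5, iters=300):
    # Test-time optimization: only the latent noise is updated.
    z = list(z)
    for _ in range(iters):
        x0, cache = unroll(z, steps, h)
        _, grad_x0 = criterion(x0, keyframes)
        grad_z = backprop_to_noise(grad_x0, cache, h)
        z = [zi - lr * gi for zi, gi in zip(z, grad_z)]
    return z

keyframes = {3: 0.5, 12: -0.3}           # hypothetical editing targets
z_opt = optimize_noise([0.0] * 16, keyframes)
edited_motion, _ = unroll(z_opt, steps=5, h=0.2)
```

The denoiser's parameters never appear in the update: the only thing that changes between iterations is the latent noise, and the only thing that changes between tasks is `criterion`.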

Highlights: DNO is a simple, versatile, and effective method that can be used for many tasks.

@inproceedings{karunratanakul2023dno,
  title={Optimizing Diffusion Noise Can Serve As Universal Motion Priors},
  author={Karunratanakul, Korrawe and Preechakul, Konpat and Aksan, Emre and Beeler, Thabo and Suwajanakorn, Supasorn and Tang, Siyu},
  booktitle={arxiv:2312.11994},
  year={2023}
}

Results

DNO does not require any training (zero-shot) and works with any motion task by changing the criterion function.
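To make "changing the criterion function" concrete, here is a hedged sketch of what task-specific criteria could look like when they share one signature. These are illustrative assumptions, not the losses used in the paper: the motion is reduced to a list of 2D root positions per frame, and all function names are hypothetical.

```python
import math
from functools import partial

def trajectory_loss(motion, targets):
    # Editing: pull selected frames toward target 2D locations.
    # targets maps a frame index to an (x, y) position.
    return sum((motion[f][0] - tx) ** 2 + (motion[f][1] - ty) ** 2
               for f, (tx, ty) in targets.items())

def inbetween_loss(motion, start, end):
    # In-betweening: constrain only the first and last frames.
    return trajectory_loss(motion, {0: start, len(motion) - 1: end})

def denoise_loss(motion, noisy):
    # Denoising: stay close to the noisy observation; in this sketch the
    # smoothing is assumed to come from the motion prior, not the loss.
    return sum((px - qx) ** 2 + (py - qy) ** 2
               for (px, py), (qx, qy) in zip(motion, noisy)) / len(motion)

def obstacle_loss(motion, center, radius):
    # Obstacle avoidance: hinge penalty for frames inside a circle.
    total = 0.0
    for px, py in motion:
        d = math.hypot(px - center[0], py - center[1])
        total += max(0.0, radius - d) ** 2
    return total

# Each task becomes a function of the motion alone, so the same
# optimization loop can be reused unchanged.
motion = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
tasks = {
    "edit": partial(trajectory_loss, targets={1: (1.0, 1.0)}),
    "inbetween": partial(inbetween_loss, start=(0.0, 0.0), end=(2.0, 0.0)),
    "avoid": partial(obstacle_loss, center=(1.0, 0.0), radius=0.5),
}
losses = {name: fn(motion) for name, fn in tasks.items()}
```

Criteria can also be summed, for example an editing term plus an obstacle term, as long as the result is still a single function of the motion.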

Motion Editing

With DNO, we can edit the original motions (blue) to produce output motions (orange) that satisfy various editing objectives.

Motion In-betweening

DNO can infill the motion given a starting pose (red) and an ending pose (green).

Motion Denoising

Motion Blending

DNO can blend two motions (blue and green) to produce a motion with a smooth transition between them.
