torch.compile

compile() was introduced in PyTorch 2.0

Our default and supported backend is inductor; benchmarks show 30% to 2x speedups and 10% memory compression on real-world models, for both training and inference, with a single line of code.

Note

The compile() API is experimental and subject to change.

The simplest interesting program is shown below. We go over it in much more detail in the getting started guide, which shows how to use compile() to speed up inference on a variety of real-world models from both TIMM and HuggingFace, which we co-announced here.

import torch
def fn(x):
    x = torch.cos(x).cuda()
    x = torch.sin(x).cuda()
    return x
compiled_fn = torch.compile(fn)            # compile the function itself, not its output
out = compiled_fn(torch.randn(10).cuda())  # the first call triggers compilation

If you happen to be running your model on an Ampere GPU, it's crucial to enable tensor cores; PyTorch will warn you if you haven't set torch.set_float32_matmul_precision('high').
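As a minimal sketch of that setting (set_float32_matmul_precision is a real PyTorch API; the matmul shapes here are arbitrary):

```python
import torch

# Opt in to TF32 tensor-core math for float32 matmuls (matters on Ampere+ GPUs).
torch.set_float32_matmul_precision("high")
print(torch.get_float32_matmul_precision())  # -> "high"

# Subsequent float32 matmuls may use tensor cores on supported hardware.
a = torch.randn(64, 64)
b = torch.randn(64, 64)
c = a @ b
```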

compile() works on a Module as well as on plain functions, so you can pass in your entire training loop.
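For instance, a minimal sketch of compiling a Module (TinyModel is a made-up toy module; backend="eager" is used here only so the sketch runs without GPU codegen):

```python
import torch
import torch.nn as nn

class TinyModel(nn.Module):  # hypothetical toy module for illustration
    def __init__(self):
        super().__init__()
        self.lin = nn.Linear(10, 10)

    def forward(self, x):
        return torch.relu(self.lin(x))

model = TinyModel()
# torch.compile accepts a Module and returns a compiled wrapper with the same call API.
compiled_model = torch.compile(model, backend="eager")
out = compiled_model(torch.randn(4, 10))
print(out.shape)  # torch.Size([4, 10])
```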

The above example was for inference but you can follow this tutorial for an example on training

Optimizations

Optimizations can be passed to compile() either as a backend mode parameter or as individual passes. To see the available options, run torch._inductor.list_options() and torch._inductor.list_mode_options().

The default backend is inductor, which will likely be the most reliable and performant option for most users and library maintainers; other backends are for power users who don't mind more experimental community support.

There is some nuance involved in benchmarking torch.compile, so we've provided a utility, bench_all(), to make this simpler.

You can get the full list of community backends by running list_backends()
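A sketch of that call (list_backends lives under torch._dynamo; the exact names returned depend on your install):

```python
import torch
import torch._dynamo

# Names of the registered, non-debug compile backends.
backends = torch._dynamo.list_backends()
print(backends)  # e.g. ['cudagraphs', 'inductor', ...]
```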

compile

Optimizes the given model or function using TorchDynamo and the specified backend.

Troubleshooting and Gotchas

If you experience issues with models failing to compile, running out of memory, recompiling too often, or not giving accurate results, odds are you'll find the right tool to solve your problem in our guides.

Warning

A few features are still very much in development and are not likely to work for most users. Please do not use them in production code, and if you're a library maintainer, please do not expose these options to your users: dynamic shapes (dynamic=True) and max-autotune mode (mode="max-autotune"), both of which can be passed to compile(). Distributed training has some quirks, which you can follow in the troubleshooting guide below. Model export is not ready yet.

Learn more

If you can’t wait to get started and want to learn more about the internals of the PyTorch 2.0 stack then please check out the references below.
