Sora is a diffusion model (21, 22, 23, 24, 25); given input noisy patches (and conditioning information like text prompts), it’s trained to predict the original “clean” patches. Importantly, Sora is a diffusion transformer (26). Transformers have demonstrated remarkable scaling properties across a variety of domains, including language modeling (13, 14), computer vision (15, 16, 17, 18), and image generation (27, 28, 29).
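The training objective described above (take clean patches, corrupt them with noise, and ask the model to recover the clean patches given the conditioning) can be sketched in a few lines. This is only an illustrative toy under stated assumptions, not Sora's implementation: the noise-mixing rule, the mean-squared-error loss, the array shapes, and the names `add_noise`, `denoising_loss`, and `identity_model` are all hypothetical, and the placeholder model stands in for what would be a diffusion transformer.

```python
import numpy as np


def add_noise(clean, noise_level, rng):
    """Mix clean patches with Gaussian noise.

    Assumption: a simple variance-preserving blend with noise_level in [0, 1].
    The source does not specify the actual noise schedule.
    """
    noise = rng.standard_normal(clean.shape)
    return np.sqrt(1.0 - noise_level) * clean + np.sqrt(noise_level) * noise


def denoising_loss(predict_clean, clean, cond, noise_level, rng):
    """Mean squared error between predicted and original clean patches."""
    noisy = add_noise(clean, noise_level, rng)
    pred = predict_clean(noisy, cond, noise_level)
    return float(np.mean((pred - clean) ** 2))


def identity_model(noisy, cond, noise_level):
    """Placeholder model (hypothetical): returns its noisy input unchanged."""
    return noisy


rng = np.random.default_rng(0)
clean_patches = rng.standard_normal((8, 16))  # 8 patches, 16 values each (toy sizes)
text_embedding = rng.standard_normal(4)       # stand-in for text conditioning

loss = denoising_loss(identity_model, clean_patches, text_embedding, 0.5, rng)
print(f"loss of the untrained placeholder: {loss:.3f}")
```

Training would adjust the model's parameters to reduce this loss across many noise levels, so that at sampling time the model can turn pure noise into clean patches step by step.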