Artificial intelligence research lab OpenAI has announced details of its latest technology, which promises significant improvements to 3D rendering.
OpenAI, the company behind the text-to-image generator DALL-E, has now turned its attention to translating text prompts into 3D point clouds with a system it calls POINT-E.
According to a paper published by OpenAI, POINT-E “produces 3D models in only 1-2 minutes on a single GPU”, compared with other current solutions, which can take hours and require multiple GPUs.
An extract from the paper details POINT-E’s current place in the world of 3D model building:
“While our method still falls short of the state-of-the-art in terms of sample quality, it is one to two orders of magnitude faster to sample from, offering a practical trade-off for some use cases.”
It works by first generating a single synthetic view of the object with a text-to-image diffusion model. A second model then converts that view into a 3D point cloud. Point clouds are easier to synthesize than full meshes, hence the reduced load on GPUs, but they don’t capture fine surface detail, hence the trade-off mentioned in the paper.
A secondary model has been trained to alleviate some of this loss, but the paper notes that it can still “sometimes miss thin/sparse parts of objects”, such as the stalks of a plant, giving the illusion of floating flowers.
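The pipeline described above can be sketched in miniature. The function names below are hypothetical placeholders, not the actual point-e API, and the model stages are stubbed with random data purely to show how the stages hand off to one another: text produces one image view, the view produces a coarse point cloud, and an upsampler densifies it.

```python
# Illustrative sketch of POINT-E's two-stage (plus upsampler) pipeline.
# All function names and internals are hypothetical stubs, not OpenAI's code.
import random

def text_to_image(prompt):
    # Stage 1: a text-to-image diffusion model renders a single synthetic
    # view of the described object. Stubbed as a 64x64 grayscale "image".
    random.seed(hash(prompt) % (2**32))
    return [[random.random() for _ in range(64)] for _ in range(64)]

def image_to_point_cloud(image, num_points=1024):
    # Stage 2: an image-to-3D diffusion model converts that view into a
    # point cloud. Point clouds are cheaper to synthesize than meshes,
    # which is why a single GPU suffices, at the cost of fine detail.
    random.seed(sum(map(sum, image)))
    return [(random.uniform(-1, 1), random.uniform(-1, 1), random.uniform(-1, 1))
            for _ in range(num_points)]

def upsample_point_cloud(cloud, target=4096):
    # Secondary model: densifies the coarse cloud. Stubbed by jittering
    # existing points; per the paper, this stage can still miss
    # thin/sparse structures such as plant stalks.
    out = list(cloud)
    while len(out) < target:
        x, y, z = random.choice(cloud)
        out.append((x + random.gauss(0, 0.01),
                    y + random.gauss(0, 0.01),
                    z + random.gauss(0, 0.01)))
    return out

view = text_to_image("a red traffic cone")
cloud = image_to_point_cloud(view)
dense = upsample_point_cloud(cloud)
print(len(cloud), len(dense))  # -> 1024 4096
```

The key design point the sketch illustrates is that each stage is a separate model, so the expensive 3D step operates on one cheap 2D view rather than on the text directly.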
OpenAI says it trained the artificial intelligence on several million 3D models and their associated metadata, though its use cases for now remain fairly limited.
One example is rendering real-world objects for 3D printing, though as the technology develops and becomes more refined, we’re likely to see it used in more advanced cases such as gaming and even television.
The project’s open-source code is available on GitHub.