
Today Stability AI, together with its multimodal AI research lab DeepFloyd, announced the research release of DeepFloyd IF, a powerful text-to-image cascaded pixel diffusion model.

DeepFloyd IF is a state-of-the-art text-to-image model released under a non-commercial, research-permissible license that gives research labs an opportunity to examine and experiment with advanced text-to-image generation approaches. In line with other Stability AI models, Stability AI intends to release a fully open-source DeepFloyd IF model at a future date.

Application of text description into images: The generation pipeline uses the large language model T5-XXL-1.1 as a text encoder. A large number of text-image cross-attention layers also provides better alignment between prompt and image. Drawing on the intelligence of the T5 model, DeepFloyd IF generates coherent and clear text alongside objects with different properties appearing in various spatial relations. Until now, these use cases have been challenging for most text-to-image models.

The model's image quality is reflected in its impressive zero-shot FID score of 6.66 on the COCO dataset (FID is a main metric used to evaluate the performance of text-to-image models; the lower the score, the better).
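For readers unfamiliar with the FID metric mentioned above, here is a minimal sketch of the underlying Fréchet distance between two sets of image features. It is an illustration only, not the evaluation code used for DeepFloyd IF. In practice the features are typically activations from a pretrained Inception network; that extraction step is omitted here, and the function name `frechet_distance` and the random arrays in the usage lines are assumptions made for this example.

```python
import numpy as np


def frechet_distance(feats_real, feats_gen):
    """Frechet distance between Gaussians fitted to two feature sets.

    Each input is a 2-D array with one row per image and one column per
    feature dimension. Lower values mean the two distributions are closer.
    """
    mu_r = feats_real.mean(axis=0)
    mu_g = feats_gen.mean(axis=0)
    cov_r = np.atleast_2d(np.cov(feats_real, rowvar=False))
    cov_g = np.atleast_2d(np.cov(feats_gen, rowvar=False))

    # Tr((cov_r cov_g)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of cov_r @ cov_g, which are real and non-negative for
    # covariance matrices (tiny negative values are clipped as round-off).
    eigvals = np.linalg.eigvals(cov_r @ cov_g)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()

    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(cov_r) + np.trace(cov_g) - 2.0 * trace_sqrt)


# Usage with stand-in features (hypothetical data, not real image features).
rng = np.random.default_rng(0)
real = rng.normal(size=(500, 8))
generated = rng.normal(loc=0.1, size=(500, 8))
print(frechet_distance(real, generated))
```

The score is zero when both feature sets have the same mean and covariance, and it grows as the generated distribution drifts away from the real one, which is why a lower FID is read as better.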
