MARCH 13TH, 2023 · TALES OF TOMORROW · ISSUE #3
Hey everybody! We hope you're having a great start to this very special week. Thanks to the Microsoft Germany CTO spilling the beans, the rumor mill is on fire! In the meantime, AI is entering the creative industry in full force (stay tuned for Berlinale coverage in #4!), and with many transformative moments still ahead, let's enjoy every step of the journey! Without further ado, here's your update.

Updates

Language model news: GPT-4, VisualChatGPT & GPT-NeoX-20B
First there were rumors. Then came the refutations. And suddenly a regional Microsoft CTO casually announced that GPT-4 will be released this week. Speculation continues as to whether the successor to GPT-3 will be multimodal right away, but at least it won't be long until we find out! Meanwhile, VisualChatGPT was quietly released on GitHub! If you have some 50 GB of disk space ready, you can clone it here (attention, Mac users: you'll need this workaround), or try this Hugging Face demo instead. Speaking of which, EleutherAI's GPT-NeoX-20B is now also available on Hugging Face. You can test it and compare it to other open-source models here.

Putting things into place: GLIGEN
After ControlNet's spatial consistency and Composer's controllability paradigm, a joint research effort supported by Microsoft has added GLIGEN to the growing list of AI image-generation breakthroughs: the new approach allows controlling object placement in various ways. Here's more info about GLIGEN and a demo to play around with this exciting new tool.

Midjourney V5: release likely within 14 days
Paying members now have access to Midjourney's famous "V5 Rating Party," where users judge images created with a "next-gen image generation system" (Midjourney). Last time, it took two weeks from the first "V4 Rating Party" to the final release of the V4 model. So we're close! Here you can check out some V5 images and what they tell us about the new model's capabilities.
Project Spotlight

Closed Beta Test: Wonder Studio
Wonder Dynamics invites creatives to participate in the closed beta testing phase of Wonder Studio, a web-based AI animation tool that composites CG characters seamlessly into live-action scenes without the need for any 3D software or production hardware. All you need is a camera. Watch the demo and request access here.

"Do The Evolution" by Ricardo Villavicencio
If you missed Ricardo Villavicencio's highly acclaimed AI-assisted short film, you can catch up below. Here's a look at how he integrated Runway tools into his animation workflow.

Uberduck.ai
The open-source voice AI community's collection of expressive AI voiceovers has now surpassed the 5,000-voice mark! Paying members ($10/month) now get full API access to build their own audio apps or to synthesize themselves with a custom voice clone ($50).

"Jukeboxed" by Merzmensch
Jukebox is a music-generating neural network created by OpenAI and trained on a dataset of 1.2 million songs from different artists and bands. Artist and data journalist Merzmensch created an impressive series of AI-generated compositions and explained his workflow here. Listen to the otherworldly sounds of his Jukeboxed series here.

Paper Corner

3D Cinemagraphy from a Single Image
Using a technique called novel view synthesis, a joint research effort between Adobe and Huazhong University of Science & Technology points in an interesting direction for AI cinema. Here are links to the project page and demo video.

Is Stable Diffusion reading our minds?
A paper submitted to this year's Computer Vision and Pattern Recognition Conference (CVPR) caught the attention of the AI creatives community, which started to wonder whether Stable Diffusion can generate images from people's brain activity.
While this wasn't quite the case, it's nonetheless super exciting how Stable Diffusion actually helped visualize images from people's brains!

Multimodality with Kosmos-1
With Kosmos-1, Microsoft unveiled a new multimodal large language model that combines vision and language processing to perform a wide range of tasks. Its ability to perceive and learn from multimodal input sets it apart from traditional large language models, making it a powerful tool for advanced applications. You can check out the original paper here.

What else?
Runway's Gen-1 rollout is in full swing, as this Twitter feed shows. Screenwriters may benefit from honing their prompt-engineering skills, and StyleGAN-T beats Stable Diffusion in speed (100x!) and image resolution.

Quote of the Week

We hope you enjoyed reading Tales of Tomorrow and found some inspiration! If so, please share "Tales of Tomorrow" on Twitter. Also, if you want your project featured, or if there is a topic you would like us to cover, please don't hesitate to reply to this email. Don't be shy; we would love to hear your feedback and will do our best to include your wishes in issue #4! Until then, keep creating! ❤️

Peace.
Tristan