Crafting Personalized Stories with AI: A Look into the Multimodal Storybook Maker
In this article, we’ll explore how Storybook Maker, a new tool from aiTransformer, combines a large language model (LLM), image generation, and text-to-speech (TTS) technology to automate the creation of video storybooks.
The Concept of Storybook Maker
The art of storytelling has evolved throughout history, from oral traditions to written narratives, and now into the digital age. Storybook Maker is presented as a next step in this progression. It was developed to automate the creation of personalized stories, combining several modalities (text, images, and speech) to enhance the storytelling experience.
The concept originated from the recognition that traditional story creation is time-consuming and demands substantial expertise. Storybook Maker seeks to democratize storytelling by making it accessible to people of all backgrounds and skill levels, including those with no writing or drawing experience. It goes a step further by using text-to-speech and lip-syncing technology to present each story with a personalized narrator, producing a video storybook. Users can listen to or watch their stories being told aloud, which improves accessibility and caters to different learning styles and preferences.
Story Generation with LLMs
Story generation is built around a language model that turns basic inputs such as prompts, URLs, or documents into narratives. This module uses prompt engineering to define a set of predefined story genres, giving users variety while keeping the tool easy to use.
The language model is trained on vast amounts of data, which allows it to understand context and generate coherent narratives across a range of themes and storylines. Unless instructed to use a custom story (the “Generate voice with supplied text” method), the model doesn’t simply regurgitate existing stories; instead, it creates new ones based on user inputs. For example, the following is part of a story generated from the URL of the Mario Day wiki page with the story genre “adventure”:
Once upon a time, in the bustling world of Mushroom Kingdom, March 10 was known as Mario Day. This special day, named after the iconic plumber himself, was celebrated by fans worldwide with great enthusiasm. Mario, the beloved hero, had saved Princess Peach countless times from the clutches of Bowser and his minions.
As another example, the following is part of a story generated from the simple prompt “E.T.” with the story genre “science fiction”:
In the quiet of the cosmos, a strange signal pierced through the silence. An ET message, it read: “We come in peace, bearing advanced technology. Join us or perish.” As our scientists deciphered the code, a spaceship materialized. The fate of humanity hung in the balance.
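To make the idea of predefined genres concrete, here is a minimal sketch of how a genre could be turned into an LLM prompt. The genre instructions, the function name `build_story_prompt`, and the prompt layout are all assumptions made for illustration; the prompts Storybook Maker actually uses are not described in this article.

```python
# Hypothetical genre instructions; the tool's real prompts are not published here.
GENRE_INSTRUCTIONS = {
    "adventure": "Write a short adventure story full of action and discovery.",
    "science fiction": "Write a short science fiction story with futuristic or cosmic elements.",
}


def build_story_prompt(user_input: str, genre: str) -> str:
    """Combine a predefined genre instruction with the user's input."""
    if genre not in GENRE_INSTRUCTIONS:
        raise ValueError(f"Unknown genre: {genre!r}")
    # The user's input (a prompt, or text extracted from a URL or document)
    # is appended after the genre instruction.
    return f"{GENRE_INSTRUCTIONS[genre]}\nTopic: {user_input.strip()}"
```

With a table like this, the user only picks a genre name and supplies an input, while the wording that steers the model stays hidden behind the predefined entry.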
Image Generation with Stable Diffusion
Storybook Maker’s image generation is powered by Stable Diffusion, an open-source text-to-image model that uses deep learning to generate high-quality images from text prompts. It can create many types of images, including characters, landscapes, and objects. This flexibility allows users to create unique storybooks that reflect their creativity and imagination.
To make it easy to create pictures in a particular style, such as impressionist art or an anime or game theme, prompt engineering is used to define over 100 predefined picture styles. The following pictures show samples generated from the prompt “Santa is coming to town” in different styles.
Final Assembly — Video Creation
After pictures are generated to match the storyline, Storybook Maker’s speech synthesizer brings the story to life through a personalized storyteller. This narrator reads each story aloud while the associated pictures are shown, creating a cohesive multimedia experience.
The speech synthesizer uses TTS technology to convert text into natural-sounding speech, and a lip-syncing algorithm animates the lips in a picture of the storyteller so that they match the spoken text. Users can choose their preferred voice and picture for the storyteller, or even clone their own voice and use their own picture to tell the story. This level of customization adds a personal touch to each storybook.
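One way to picture the assembly step is as a timeline in which each picture stays on screen for as long as its sentence is being narrated. The sketch below is an illustration only: the `Scene` type, the file names, and the assumption that the TTS step reports a duration per sentence are hypothetical and are not taken from Storybook Maker itself.

```python
from dataclasses import dataclass


@dataclass
class Scene:
    sentence: str         # text read aloud by the storyteller
    picture: str          # hypothetical path to the generated picture
    audio_seconds: float  # assumed to be reported by the TTS step


def build_timeline(scenes):
    """Return (picture, start, end) tuples so each picture spans its narration."""
    timeline = []
    start = 0.0
    for scene in scenes:
        end = start + scene.audio_seconds
        timeline.append((scene.picture, start, end))
        start = end  # the next picture begins when this narration ends
    return timeline


scenes = [
    Scene("Einstein pondered relativity.", "scene_1.png", 2.5),
    Scene("A librarian sneezed, startling him.", "scene_2.png", 3.0),
]
timeline = build_timeline(scenes)
```

A video editor or encoder could then show each picture for its time span while the narration audio plays underneath.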
User-Friendly Features and Customization
Storybook Maker offers simplicity with predefined story genres and picture styles, making it accessible to beginners who can easily produce a wide variety of storybooks. The tool also offers a range of templates that cater to different storytelling genres, making it easy for users to get started without having to worry about the technical aspects of creating their own stories from scratch.
Customizing Story and Accompanying Pictures
Advanced users can use custom prompts for greater control over a story’s narrative and pictures. With the story generation method “Generate voice with supplied text and pictures with custom prompts”, users enter their own story text, which is used to generate the voice, along with prompts enclosed in [], which are used to generate the pictures.
One picture is generated per sentence. By default, the sentence itself is used as the prompt, but if a custom prompt (enclosed in []) is provided, it is used to generate the picture instead. Text within [] is used only for picture generation and is not read aloud, which gives more control over the accompanying picture without affecting the story itself.
When a story is generated with a non-custom method from the prompt “Albert Einstein”, the text may rely heavily on pronouns. For example:
Amidst the clutter of Princeton’s library, Einstein pondered relativity. His brow furrowed beneath wild hair as he scribbled equations on a worn notebook. A librarian sneezed, startling him. He looked up, smiled, and returned to his thoughts, oblivious to the world around him.
Since a picture is generated per sentence, the image generator does not know who the subject is in a sentence that uses only a pronoun, so it may depict any person. If you include a custom prompt that mentions Einstein, the correct person is more likely to be generated. For example, with the custom method and the following input, you’re more likely to get coherent pictures for the story:
Amidst the clutter of Princeton’s library, Einstein pondered relativity. His brow furrowed beneath wild hair as he scribbled equations on a worn notebook[Einstein’s brow furrowed beneath wild hair as he scribbled equations on a worn notebook]. A librarian sneezed, startling him. He looked up, smiled, and returned to his thoughts, oblivious to the world around him[Einstein looked up and smiled].
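The bracket rule can be sketched in a few lines of Python. This is a rough illustration of the behavior described above, not the tool’s actual parser: the function name `parse_custom_story` is hypothetical, the sentence splitting is deliberately naive (abbreviations such as “E.T.” would be split), and it assumes the bracketed prompt sits just before the sentence’s closing punctuation, as in the example.

```python
import re

# A sentence is a run of bracketed prompts and ordinary characters,
# ending in ., ! or ? (or at the end of the text).
SENTENCE = re.compile(r"(?:\[[^\]]*\]|[^.!?\[])+(?:[.!?]+|$)")
BRACKET = re.compile(r"\[([^\]]*)\]")


def parse_custom_story(text):
    """Split story text into (narration, picture_prompt) pairs, one per sentence."""
    scenes = []
    for match in SENTENCE.finditer(text):
        raw = match.group().strip()
        if not raw:
            continue
        prompts = BRACKET.findall(raw)
        # Bracketed text is removed from what the storyteller reads aloud.
        narration = BRACKET.sub("", raw).strip()
        # Without a custom prompt, the sentence itself becomes the picture prompt.
        picture_prompt = prompts[-1].strip() if prompts else narration
        scenes.append((narration, picture_prompt))
    return scenes


story = (
    "Einstein pondered relativity. "
    "He looked up and smiled[Einstein looked up and smiled]."
)
for narration, picture_prompt in parse_custom_story(story):
    print(f"voice: {narration!r}  picture: {picture_prompt!r}")
```

In this sketch, the second sentence is narrated as “He looked up and smiled.” while the picture is generated from the bracketed text that names Einstein explicitly.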
Another good use of the custom method is to generate storybooks in different picture styles for the same story. With a non-custom method, the LLM can generate a different story every time, even from the same input. So if you’d like to see different picture styles for the same story, you can download a generated story (or use your own), enter it using the custom method, and then select different picture styles to generate.
Customizing Picture Style
The picture style can also be customized. The tool provides over 100 predefined picture styles with sample images, but there are far too many styles to include them all in the list. For instance, the styles of specific artists are not included because the image generator recognizes so many artists. To use a custom picture style, select the custom style option and enter a style description following a simple template: mark where the prompt should go with the placeholder [prompt], e.g. magnificent [prompt] by Greg Rutkowski. If no placeholder is present, the style text is appended to the prompt.
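The template rule is simple enough to express directly. The following is a minimal sketch of the behavior as described; the function name `apply_style` is hypothetical, and the comma used as a separator when appending is an assumption, since the article does not say how the style text is joined to the prompt.

```python
PLACEHOLDER = "[prompt]"


def apply_style(prompt: str, style: str) -> str:
    """Merge a picture prompt with a style description."""
    if PLACEHOLDER in style:
        # The style is a template: substitute the prompt for the placeholder.
        return style.replace(PLACEHOLDER, prompt)
    # No placeholder: append the style text (the ", " separator is an assumption).
    return f"{prompt}, {style}"


print(apply_style("Santa is coming to town", "magnificent [prompt] by Greg Rutkowski"))
print(apply_style("Santa is coming to town", "impressionist painting"))
```

The first call yields “magnificent Santa is coming to town by Greg Rutkowski”, and the second yields “Santa is coming to town, impressionist painting”.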
Potential Applications
While Storybook Maker is still a new tool, it has a wide range of potential applications. In education, students can learn complex concepts by creating their own storybooks based on the material. Content creators can use it to produce multimedia stories that stand out from traditional text-based content. It can also make personal storytelling more accessible and engaging. As users continue to explore the tool’s capabilities, new applications are likely to emerge. You can find sample videos made with this tool on aiTransformer’s YouTube channel, including a country series based on country wiki pages and a fairy tale series based on popular fairy tales.
Conclusion
The Storybook Maker is an innovative tool that combines advanced language models, image generation, and text-to-speech technology for automated storybook creation. Its user-friendly interface, customization options, and powerful features make it accessible to both beginners and advanced users.
As we’ve explored in this article, Storybook Maker offers a unique approach to storytelling that goes beyond traditional methods by adding visuals and narration to create immersive multimedia experiences. We encourage you to explore the tool for yourself and unleash your creativity in storytelling. The future of multimodal learning holds endless possibilities, and Storybook Maker is at the forefront of this exciting new frontier!