AI TikTok Video Generator

The AI TikTok Video Generator is an application that automates the creation of TikTok videos. It combines AI-generated scripts, text-to-speech audio, curated video clips from Pexels, video merging, and subtitle overlays to produce a finished video from a short user prompt.
Key Features
- Script Generation: Generates engaging scripts for TikTok videos based on user input.
- Text-to-Speech (TTS): Converts the generated script into speech using AI-powered TTS engines.
- Video Fetching: Searches and retrieves videos from Pexels based on the provided keyword.
- Video Merging: Merges multiple videos into a single video with a specified resolution.
- Audio-Video Synchronization: Merges the generated audio with the fetched video content.
- Subtitle Overlay: Adds subtitles to the video based on the generated script.
- Polling Mechanism: Processes video generation jobs asynchronously and lets the frontend poll the backend for job status.
- Cloud Storage: Stores the final video output in Google Cloud Storage.
Tech Stack
Frontend:
- Next.js: A React framework for building the user interface.
- Tailwind CSS: For styling the application.
- React Hooks: To manage state and side effects in the application.
Backend:
- Node.js: JavaScript runtime for server-side scripting.
- Express.js: Web framework for handling API requests.
- FFmpeg: A multimedia framework to handle video processing.
- Google Cloud Storage: For storing and serving the generated videos.
- Pexels API: To fetch video clips based on the provided keywords.
- Text-to-Speech API: For converting text scripts to audio.
How It Works
User Input:
The user provides a text input, which is used for the script, and a keyword, which is used to search for video clips.
Script Generation:
The backend generates a script based on the user input. This could be done using a pre-defined template or an AI-based text generator.
Text-to-Speech Conversion:
The generated script is sent to a Text-to-Speech (TTS) API, which returns an audio file of the script being read aloud.
Video Fetching:
The backend uses the Pexels API to search for video clips based on the provided keyword. These videos are then downloaded for further processing.
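As a rough illustration of the fetching step, the sketch below builds a request for the Pexels video search endpoint and pulls download links out of a response. The helper names, the `per_page` default, and the choice of portrait orientation are assumptions made for this example and are not taken from the project code.

```javascript
// Pexels video search endpoint. The API key is sent in the Authorization header.
const PEXELS_VIDEO_SEARCH = "https://api.pexels.com/videos/search";

// Hypothetical helper: returns the URL and headers for a keyword search.
function buildPexelsSearchRequest(keyword, apiKey, perPage = 5) {
  const url = new URL(PEXELS_VIDEO_SEARCH);
  url.searchParams.set("query", keyword);
  url.searchParams.set("per_page", String(perPage));
  // Assumption: portrait clips suit a vertical TikTok frame.
  url.searchParams.set("orientation", "portrait");
  return { url: url.toString(), headers: { Authorization: apiKey } };
}

// Hypothetical helper: picks the first MP4 file of each video in a search response.
function pickVideoLinks(searchResponse) {
  return (searchResponse.videos ?? [])
    .map((video) => (video.video_files ?? []).find((f) => f.file_type === "video/mp4"))
    .filter(Boolean)
    .map((file) => file.link);
}
```

The returned `url` and `headers` can be passed to `fetch`, and each link can then be downloaded to a temporary file for the merging step.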
Video Merging:
Multiple video clips are resized to a common resolution and merged into a single video using FFmpeg.
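One way to express this step is a single FFmpeg `filter_complex` graph that scales and pads every clip to the same frame and then concatenates them. The sketch below only builds the argument list. The function name, the 1080x1920 target, the 30 fps rate, and the decision to drop the clips' own audio are assumptions for illustration, not details confirmed by the project.

```javascript
// Hypothetical helper: builds FFmpeg arguments that normalize and concatenate clips.
function buildMergeArgs(inputPaths, outputPath, width = 1080, height = 1920) {
  if (inputPaths.length === 0) {
    throw new Error("At least one input clip is required");
  }
  const inputArgs = inputPaths.flatMap((p) => ["-i", p]);

  // Scale each clip to fit the target frame, pad the remainder, and
  // normalize frame rate, pixel format, and sample aspect ratio.
  const normalized = inputPaths.map(
    (_, i) =>
      `[${i}:v]scale=${width}:${height}:force_original_aspect_ratio=decrease,` +
      `pad=${width}:${height}:(ow-iw)/2:(oh-ih)/2,fps=30,format=yuv420p,setsar=1[v${i}]`
  );

  // Concatenate the normalized video streams (video only, no audio).
  const labels = inputPaths.map((_, i) => `[v${i}]`).join("");
  const concat = `${labels}concat=n=${inputPaths.length}:v=1:a=0[outv]`;

  const filter = [...normalized, concat].join(";");
  return [...inputArgs, "-filter_complex", filter, "-map", "[outv]", "-an", outputPath];
}
```

The resulting array can be handed to `spawn("ffmpeg", args)` from `node:child_process`. Passing arguments as an array avoids shell quoting problems with the parentheses in the pad expression.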
Audio-Video Synchronization:
The generated audio is synchronized with the merged video. FFmpeg is used to combine the video and audio into a single output file.
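A minimal sketch of this step, assuming the merged video carries no audio of its own and that the output should end when the shorter of the two streams ends. The function name is hypothetical, and the project may handle length mismatches differently (for example by looping or trimming clips).

```javascript
// Hypothetical helper: builds FFmpeg arguments that mux narration onto the merged video.
function buildMuxArgs(videoPath, audioPath, outputPath) {
  return [
    "-i", videoPath,   // input 0: merged video
    "-i", audioPath,   // input 1: generated narration
    "-map", "0:v:0",   // take the video stream from input 0
    "-map", "1:a:0",   // take the audio stream from input 1
    "-c:v", "copy",    // keep the video stream as is, without re-encoding
    "-c:a", "aac",     // encode the narration as AAC for MP4 output
    "-shortest",       // stop at the end of the shorter stream
    outputPath,
  ];
}
```

Copying the video stream keeps this step fast, since only the audio track is encoded.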
Subtitle Overlay:
Subtitles are generated from the script and overlaid onto the video using FFmpeg. The subtitles are styled and positioned appropriately for visibility.
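The sketch below shows one possible way to turn a script into an SRT subtitle file: split the script into sentences and give each sentence a share of the audio duration proportional to its word count. This timing heuristic and both function names are assumptions for illustration. The project may derive timings differently, for instance from timestamps returned by the TTS engine.

```javascript
// Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm).
function formatSrtTime(seconds) {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const totalSec = Math.floor(totalMs / 1000);
  const s = totalSec % 60;
  const m = Math.floor(totalSec / 60) % 60;
  const h = Math.floor(totalSec / 3600);
  const pad = (n, width = 2) => String(n).padStart(width, "0");
  return `${pad(h)}:${pad(m)}:${pad(s)},${pad(ms, 3)}`;
}

// Hypothetical helper: one cue per sentence, timed in proportion to word count.
function scriptToSrt(script, audioDurationSec) {
  const sentences = script
    .split(/(?<=[.!?])\s+/)
    .map((sentence) => sentence.trim())
    .filter(Boolean);
  const wordCounts = sentences.map((sentence) => sentence.split(/\s+/).length);
  const totalWords = wordCounts.reduce((sum, count) => sum + count, 0);

  let start = 0;
  return sentences
    .map((sentence, i) => {
      const end = start + (wordCounts[i] / totalWords) * audioDurationSec;
      const cue = `${i + 1}\n${formatSrtTime(start)} --> ${formatSrtTime(end)}\n${sentence}\n`;
      start = end;
      return cue;
    })
    .join("\n");
}
```

The resulting file can be burned into the video with FFmpeg's `subtitles` filter, which requires an FFmpeg build with libass. Its `force_style` option controls font size and position.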
Processing and Storage:
Once processing is finished, the final video file is uploaded to Google Cloud Storage.
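A minimal sketch of the upload, assuming the `@google-cloud/storage` client library is used. The object naming scheme and function names are hypothetical, and the returned URL is only reachable if the object or bucket is publicly readable. The bucket is passed in as a parameter so the function can be exercised without real credentials.

```javascript
// Hypothetical naming scheme for the stored object.
function finalVideoObjectName(jobId) {
  return `videos/${jobId}/final.mp4`;
}

// Uploads the finished file and returns its storage URL.
// `bucket` is expected to be a Bucket from @google-cloud/storage, for example:
//   const { Storage } = require("@google-cloud/storage");
//   const bucket = new Storage().bucket("my-bucket"); // bucket name is a placeholder
async function uploadFinalVideo(bucket, localPath, jobId) {
  const destination = finalVideoObjectName(jobId);
  await bucket.upload(localPath, {
    destination,
    metadata: { contentType: "video/mp4" },
  });
  // Assumption: the object is publicly readable at this URL.
  return `https://storage.googleapis.com/${bucket.name}/${destination}`;
}
```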
Polling and Status Updates:
The frontend polls the backend to check the status of the video processing job. Once the processing is complete, the user is provided with a link to the final video.
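The polling loop on the frontend could look roughly like the sketch below. The status values (`"completed"`, `"failed"`), the interval, the attempt limit, and the function name are assumptions for illustration. The status request itself is injected as a function so that the loop does not depend on a particular route.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: calls fetchStatus until the job completes, fails, or times out.
// fetchStatus must return a promise of an object such as { status, videoUrl }.
async function pollJobStatus(fetchStatus, { intervalMs = 2000, maxAttempts = 150 } = {}) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const job = await fetchStatus();
    if (job.status === "completed") return job;
    if (job.status === "failed") {
      throw new Error(job.error ?? "Video processing failed");
    }
    await sleep(intervalMs); // job still in progress, wait before asking again
  }
  throw new Error(`Job still processing after ${maxAttempts} attempts`);
}
```

In the browser, `fetchStatus` might wrap a call like `fetch(statusUrl).then((res) => res.json())`, where `statusUrl` is whatever status route the backend exposes. Capping the number of attempts keeps a stuck job from polling forever.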
Future Enhancements
- Enhanced Script Generation: Use AI models to generate more engaging and contextually relevant scripts.
- User Customization: Allow users to customize video templates, subtitle styles, and audio settings.
- Real-Time Updates: Use WebSockets to push real-time updates on the video processing status to the frontend.
- Scalability: Deploy the backend on a scalable cloud platform to handle a larger number of concurrent users.
- Analytics: Provide analytics and insights on the generated videos' performance.