API Documentation
Complete reference for the TTSFM Text-to-Speech API. Free, simple, and powerful.
Overview
The TTSFM API provides a modern, OpenAI-compatible interface for text-to-speech generation. It supports multiple voices and audio formats, and includes features such as text length validation and batch processing.
Base URL: http://ttsapi.site/api/
Key Features
- 11 different voice options
- Multiple audio formats (MP3, WAV, OPUS, etc.)
- Text length validation (4096 character limit)
- Automatic text splitting for long content
- Batch processing capabilities
- Combined audio generation from long text
- Real-time status monitoring
Authentication
Currently, the API supports optional API key authentication. If configured, include your API key in the request headers.
Authorization: Bearer YOUR_API_KEY
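A minimal sketch of how a client might attach this header only when a key is available. The helper name `build_headers` and the `TTSFM_API_KEY` environment variable are assumptions made for illustration; neither is defined by the API.

```python
import os
from typing import Optional


def build_headers(api_key: Optional[str] = None) -> dict:
    """Return request headers, adding Authorization only when a key is set."""
    headers = {"Content-Type": "application/json"}
    # Hypothetical convention: fall back to an environment variable.
    key = api_key or os.environ.get("TTSFM_API_KEY")
    if key:
        headers["Authorization"] = f"Bearer {key}"
    return headers
```

Because authentication is optional, omitting the header entirely when no key is configured keeps the same client code usable against both kinds of deployment.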
Text Length Validation
TTSFM includes built-in text length validation to ensure compatibility with TTS models. The default maximum length is 4096 characters, but this can be customized.
Validation Options
- max_length: Maximum allowed characters (default: 4096)
- validate_length: Enable/disable validation (default: true)
- preserve_words: Avoid splitting words when chunking (default: true)
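To make the effect of `preserve_words` concrete, here is an illustrative splitter. It is not TTSFM's implementation, which may differ (for example by preferring sentence boundaries); the function name `split_text` is hypothetical.

```python
def split_text(text: str, max_length: int = 4096, preserve_words: bool = True) -> list:
    """Split text into chunks of at most max_length characters."""
    if max_length <= 0:
        raise ValueError("max_length must be positive")
    chunks = []
    remaining = text.strip()
    while len(remaining) > max_length:
        cut = max_length
        if preserve_words:
            # Back up to the last space inside the window, if there is one.
            space = remaining.rfind(" ", 0, max_length + 1)
            if space > 0:
                cut = space
        chunks.append(remaining[:cut].rstrip())
        remaining = remaining[cut:].lstrip()
    if remaining:
        chunks.append(remaining)
    return chunks


print(split_text("aaa bbb ccc", max_length=5))
# prints ['aaa', 'bbb', 'ccc']
print(split_text("aaa bbb ccc", max_length=5, preserve_words=False))
# prints ['aaa b', 'bb cc', 'c']
```

With word preservation off, chunks are cut at exactly `max_length` characters and may break a word in half. In this sketch, a single word longer than `max_length` is still hard-split.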
API Endpoints
GET /api/voices
Get list of available voices.
Response Example:
{
  "voices": [
    {
      "id": "alloy",
      "name": "Alloy",
      "description": "Alloy voice"
    },
    {
      "id": "echo",
      "name": "Echo",
      "description": "Echo voice"
    }
  ],
  "count": 6
}
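As a small illustration of consuming this response, the sketch below pulls the voice IDs out of a decoded body. The helper name `voice_ids` is hypothetical, and the sample payload is a shortened copy of the example above.

```python
import json


def voice_ids(payload: dict) -> list:
    """Return the voice IDs listed in a /api/voices response body."""
    return [voice["id"] for voice in payload.get("voices", [])]


sample = json.loads("""
{"voices": [{"id": "alloy", "name": "Alloy", "description": "Alloy voice"},
            {"id": "echo", "name": "Echo", "description": "Echo voice"}],
 "count": 2}
""")
print(voice_ids(sample))  # prints ['alloy', 'echo']
```

The resulting IDs are the values accepted by the `voice` field of the generation endpoints.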
GET /api/formats
Get list of supported audio formats.
Response Example:
{
  "formats": [
    {
      "id": "mp3",
      "name": "MP3",
      "mime_type": "audio/mp3",
      "description": "MP3 audio format"
    }
  ],
  "count": 6
}
POST /api/validate-text
Validate text length and get splitting suggestions.
Request Body:
{
  "text": "Your text to validate",
  "max_length": 4096
}
Response Example:
{
  "text_length": 5000,
  "max_length": 4096,
  "is_valid": false,
  "needs_splitting": true,
  "suggested_chunks": 2,
  "chunk_preview": [
    "First chunk preview...",
    "Second chunk preview..."
  ]
}
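One way a client might act on this response is to route long text to a splitting endpoint. The routing rule and the helper name `pick_endpoint` are assumptions for illustration; the API does not prescribe this workflow.

```python
def pick_endpoint(validation: dict) -> str:
    """Choose an endpoint path from a /api/validate-text response body."""
    if validation.get("needs_splitting"):
        # Long text: let the server split it and merge the audio.
        return "/api/generate-combined"
    return "/api/generate"


validation = {
    "text_length": 5000,
    "max_length": 4096,
    "is_valid": False,
    "needs_splitting": True,
    "suggested_chunks": 2,
}
print(pick_endpoint(validation))  # prints /api/generate-combined
```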
POST /api/generate
Generate speech from text.
Request Body:
{
  "text": "Hello, world!",
  "voice": "alloy",
  "format": "mp3",
  "instructions": "Speak cheerfully",
  "max_length": 4096,
  "validate_length": true
}
Parameters:
- text (required): Text to convert to speech
- voice (optional): Voice ID (default: "alloy")
- format (optional): Audio format (default: "mp3")
- instructions (optional): Voice modulation instructions
- max_length (optional): Maximum text length (default: 4096)
- validate_length (optional): Enable validation (default: true)
Response:
Returns audio file with appropriate Content-Type header.
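A minimal sketch of calling this endpoint with only the Python standard library, assuming the base URL shown in the Overview and no API key. The function names `build_generate_request` and `synthesize` are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://ttsapi.site/api"  # base URL from the Overview section


def build_generate_request(text: str, voice: str = "alloy",
                           audio_format: str = "mp3") -> urllib.request.Request:
    """Build a POST request for /api/generate with a JSON body."""
    body = json.dumps({"text": text, "voice": voice, "format": audio_format})
    return urllib.request.Request(
        f"{BASE_URL}/generate",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text: str, path: str) -> None:
    """Send the request and write the returned audio bytes to path."""
    with urllib.request.urlopen(build_generate_request(text), timeout=60) as resp:
        audio = resp.read()
    with open(path, "wb") as f:
        f.write(audio)
```

Calling `synthesize("Hello, world!", "hello.mp3")` would perform the network request; the file extension should match the requested format.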
POST /api/generate-batch
Generate speech from long text by automatically splitting it into chunks. When preserve_words is true, the splitter breaks text at word boundaries so that no chunk cuts a word in half, which keeps the speech natural-sounding.
Request Body:
{
  "text": "Very long text that exceeds the limit...",
  "voice": "alloy",
  "format": "mp3",
  "max_length": 4096,
  "preserve_words": true
}
Response Example:
{
  "total_chunks": 3,
  "successful_chunks": 3,
  "results": [
    {
      "chunk_index": 1,
      "chunk_text": "First chunk text...",
      "audio_data": "base64_encoded_audio",
      "content_type": "audio/mp3",
      "size": 12345,
      "format": "mp3"
    }
  ]
}
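Since each result carries its audio as a base64 string, a client has to decode it before saving. The sketch below assumes standard base64 encoding and that every entry in `results` contains `audio_data`; how failed chunks are represented is not documented here, so real code may need to skip them. The helper name `save_batch_results` is hypothetical.

```python
import base64
from pathlib import Path


def save_batch_results(payload: dict, out_dir: str) -> list:
    """Decode each chunk's base64 audio and write it to out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for result in payload.get("results", []):
        audio = base64.b64decode(result["audio_data"])
        # File names follow the chunk order, e.g. part_001.mp3.
        path = out / f"part_{result['chunk_index']:03d}.{result['format']}"
        path.write_bytes(audio)
        written.append(path)
    return written
```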
Python Package
Long Text Support
The TTSFM Python package includes built-in long text splitting functionality:
from ttsfm import TTSClient, Voice, AudioFormat

# Create client
client = TTSClient()

# Generate speech from long text (automatically splits)
responses = client.generate_speech_long_text(
    text="Very long text that exceeds 4096 characters...",
    voice=Voice.ALLOY,
    response_format=AudioFormat.MP3,
    max_length=2000,
    preserve_words=True
)

# Save each chunk
for i, response in enumerate(responses, 1):
    response.save_to_file(f"part_{i:03d}.mp3")
Key Features:
- Automatic Splitting: No manual text chunking required
- Word Preservation: Maintains word boundaries for natural speech
- Batch Processing: Efficient handling of multiple chunks
- CLI Support: Use the --split-long-text flag
POST /api/generate-combined
Generate a single combined audio file from long text. Automatically splits text into chunks, generates speech for each chunk, and combines them into one seamless audio file.
Request Body:
{
  "text": "Very long text that exceeds the limit...",
  "voice": "alloy",
  "format": "mp3",
  "instructions": "Optional voice instructions",
  "max_length": 4096,
  "preserve_words": true
}
Response:
Returns a single audio file containing all chunks combined seamlessly.
Response Headers:
- X-Chunks-Combined: Number of chunks that were combined
- X-Original-Text-Length: Original text length in characters
- X-Audio-Size: Final audio file size in bytes
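A short sketch of reading these headers on the client side. It assumes each value is a decimal integer string and treats a missing header as `None`; the helper name `combined_stats` is hypothetical.

```python
def combined_stats(headers) -> dict:
    """Extract the combined-audio metadata from a mapping of response headers."""
    names = {
        "chunks": "X-Chunks-Combined",
        "text_length": "X-Original-Text-Length",
        "audio_size": "X-Audio-Size",
    }
    stats = {}
    for key, header in names.items():
        value = headers.get(header)
        # Assumption: values are integer strings; absent headers become None.
        stats[key] = int(value) if value is not None else None
    return stats


print(combined_stats({"X-Chunks-Combined": "3", "X-Audio-Size": "1000"}))
# prints {'chunks': 3, 'text_length': None, 'audio_size': 1000}
```

Any mapping with a `get` method works here, including the headers object of an HTTP response.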
POST /v1/audio/speech-combined
OpenAI-compatible endpoint for combined audio generation. It provides the same functionality as POST /api/generate-combined but follows the OpenAI API request format.
Request Body:
{
  "model": "gpt-4o-mini-tts",
  "input": "Very long text that exceeds the limit...",
  "voice": "alloy",
  "response_format": "mp3",
  "instructions": "Optional voice instructions",
  "speed": 1.0,
  "max_length": 4096
}
Response:
Returns a single combined audio file with the same headers as the native endpoint.
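Comparing the two request bodies, the OpenAI-style format uses `input` where the native one uses `text`, `response_format` where it uses `format`, and adds a `model` field. The sketch below converts one to the other under that reading. The helper name `to_openai_payload` is hypothetical, and fields that appear only in the native example (such as `preserve_words`) are passed through unchanged, although whether this endpoint accepts them is not stated.

```python
def to_openai_payload(native: dict, model: str = "gpt-4o-mini-tts") -> dict:
    """Translate a native /api/generate-combined body to the OpenAI-style body."""
    renamed = {"text": "input", "format": "response_format"}
    payload = {"model": model}
    for key, value in native.items():
        # Rename the two differing fields; copy everything else as-is.
        payload[renamed.get(key, key)] = value
    return payload


print(to_openai_payload({"text": "Hi", "voice": "alloy", "format": "mp3"}))
# prints {'model': 'gpt-4o-mini-tts', 'input': 'Hi', 'voice': 'alloy', 'response_format': 'mp3'}
```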
Use Cases:
- Long Articles: Convert blog posts or articles to single audio files
- Audiobooks: Generate chapters as single audio files
- Podcasts: Create podcast episodes from scripts
- Educational Content: Convert learning materials to audio
Example Usage:
# Python example
import requests

response = requests.post(
    "http://ttsapi.site/api/generate-combined",
    json={
        "text": "Your very long text content here...",
        "voice": "nova",
        "format": "mp3",
        "max_length": 2000
    }
)

if response.status_code == 200:
    with open("combined_audio.mp3", "wb") as f:
        f.write(response.content)
    chunks = response.headers.get('X-Chunks-Combined')
    print(f"Combined {chunks} chunks into single file")