API Documentation
Complete reference for the TTSFM Text-to-Speech API. Free, simple, and powerful.
Overview
The TTSFM API provides a modern, OpenAI-compatible interface for text-to-speech generation. It supports multiple voices, audio formats, and includes advanced features like text length validation and intelligent auto-combine functionality.
http://ttsapi.site/api/
Key Features
- 11 different voice options - Choose from alloy, echo, nova, and more
- Multiple audio formats - MP3, WAV, OPUS, AAC, FLAC, PCM support
- OpenAI compatibility - Drop-in replacement for OpenAI's TTS API
- Auto-combine feature - Automatically handles long text (>1000 chars) by splitting and combining audio
- Text length validation - Smart validation with configurable limits
- Real-time monitoring - Status endpoints and health checks
Operational Notes
- Requests above 1000 characters are automatically split when auto_combine is enabled; disable validation to manage chunking yourself.
- MP3 requests return MP3. OPUS, AAC, FLAC, WAV, and PCM map to WAV for reliable playback.
- Audio comes from the third-party openai.fm service; availability may change without notice, so add graceful fallbacks.
- The Docker image bundles ffmpeg so combined MP3 responses work immediately without extra setup.
Authentication
Currently, the API supports optional API key authentication. If configured, include your API key in the request headers.
Authorization: Bearer YOUR_API_KEY
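When a key is required, attach it to any endpoint call. A minimal Python sketch using requests (the key value below is a placeholder; omit the header entirely if the server has no key configured):

import requests

API_KEY = "YOUR_API_KEY"  # placeholder; only needed when the server enforces authentication

response = requests.get(
    "http://ttsapi.site/api/voices",
    headers={"Authorization": f"Bearer {API_KEY}"},
)
response.raise_for_status()
print(response.json())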
Text Length Validation
TTSFM includes built-in text length validation to ensure compatibility with TTS models. The default maximum length is 1000 characters, but this can be customized.
Validation Options
- max_length: Maximum allowed characters (default: 1000)
- validate_length: Enable/disable validation (default: true)
- preserve_words: Avoid splitting words when chunking (default: true); see the sketch below
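As a rough mental model, word-preserving chunking packs whole words greedily into chunks of at most max_length characters. This is an illustrative sketch only, not the package's actual splitter:

def split_text(text: str, max_length: int = 1000) -> list[str]:
    # Greedily pack whole words into chunks no longer than max_length.
    # A single word longer than max_length would still need special handling.
    chunks, current = [], ""
    for word in text.split():
        candidate = f"{current} {word}".strip()
        if len(candidate) <= max_length:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks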
API Endpoints
GET /api/voices
Get list of available voices.
Response Example (truncated):
{
"voices": [
{
"id": "alloy",
"name": "Alloy",
"description": "Alloy voice"
},
{
"id": "echo",
"name": "Echo",
"description": "Echo voice"
}
],
"count": 6
}
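To inspect the voice list programmatically, a small requests sketch (field names follow the example response above):

import requests

resp = requests.get("http://ttsapi.site/api/voices")
resp.raise_for_status()
for voice in resp.json()["voices"]:
    print(voice["id"], "-", voice["description"])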
GET /api/formats
Get available audio formats for speech generation.
Available Formats
The API accepts requests for multiple formats, but internally:
- mp3 - Returns actual MP3 format
- All other formats (opus, aac, flac, wav, pcm) - Mapped to WAV format
Response Example:
{
"formats": [
{
"id": "mp3",
"name": "MP3",
"mime_type": "audio/mp3",
"description": "MP3 audio format"
},
{
"id": "opus",
"name": "Opus",
"mime_type": "audio/wav",
"description": "Returns WAV format"
},
{
"id": "aac",
"name": "AAC",
"mime_type": "audio/wav",
"description": "Returns WAV format"
},
{
"id": "flac",
"name": "FLAC",
"mime_type": "audio/wav",
"description": "Returns WAV format"
},
{
"id": "wav",
"name": "WAV",
"mime_type": "audio/wav",
"description": "WAV audio format"
},
{
"id": "pcm",
"name": "PCM",
"mime_type": "audio/wav",
"description": "Returns WAV format"
}
],
"count": 6
}
POST /api/validate-text
Validate text length and get splitting suggestions.
Request Body:
{
"text": "Your text to validate",
"max_length": 1000
}
Response Example:
{
"text_length": 5000,
"max_length": 1000,
"is_valid": false,
"needs_splitting": true,
"suggested_chunks": 2,
"chunk_preview": [
"First chunk preview...",
"Second chunk preview..."
]
}
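A short sketch that validates text before generating audio (field names taken from the example response above; the text value is a placeholder):

import requests

text = "Your text to validate..."
check = requests.post(
    "http://ttsapi.site/api/validate-text",
    json={"text": text, "max_length": 1000},
).json()

if check["needs_splitting"]:
    print(f"{check['text_length']} chars; expect about {check['suggested_chunks']} chunks")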
POST /api/generate
Generate speech from text.
Request Body:
{
"text": "Hello, world!",
"voice": "alloy",
"format": "mp3",
"instructions": "Speak cheerfully",
"max_length": 1000,
"validate_length": true
}
Parameters:
- text (required): Text to convert to speech
- voice (optional): Voice ID (default: "alloy")
- format (optional): Audio format (default: "mp3")
- instructions (optional): Voice modulation instructions
- max_length (optional): Maximum text length (default: 1000)
- validate_length (optional): Enable validation (default: true)
Response:
Returns audio file with appropriate Content-Type header.
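A minimal request sketch mirroring the body above and saving the returned audio (the output filename is arbitrary):

import requests

resp = requests.post(
    "http://ttsapi.site/api/generate",
    json={"text": "Hello, world!", "voice": "alloy", "format": "mp3"},
)
resp.raise_for_status()

with open("hello.mp3", "wb") as f:
    f.write(resp.content)
print("Content-Type:", resp.headers.get("Content-Type"))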
Python Package
Long Text Support
The TTSFM Python package includes built-in long text splitting functionality for developers who need fine-grained control:
from ttsfm import TTSClient, Voice, AudioFormat

# Create client
client = TTSClient()

# Generate speech from long text (automatically splits into separate files)
responses = client.generate_speech_long_text(
    text="Very long text that exceeds 1000 characters...",
    voice=Voice.ALLOY,
    response_format=AudioFormat.MP3,
    max_length=2000,
    preserve_words=True
)

# Save each chunk as a separate file
for i, response in enumerate(responses, 1):
    response.save_to_file(f"part_{i:03d}.mp3")
Developer Features:
- Manual Splitting: Full control over text chunking for advanced use cases
- Word Preservation: Maintains word boundaries for natural speech
- Separate Files: Each chunk saved as individual audio file
- CLI Support: Use `--split-long-text` flag for command-line usage
POST /api/generate-combined
Generate a single combined audio file from long text. Automatically splits text into chunks, generates speech for each chunk, and combines them into one seamless audio file.
Request Body:
{
"text": "Very long text that exceeds the limit...",
"voice": "alloy",
"format": "mp3",
"instructions": "Optional voice instructions",
"max_length": 1000,
"preserve_words": true
}
Response:
Returns a single audio file containing all chunks combined seamlessly.
Response Headers:
- X-Chunks-Combined: Number of chunks that were combined
- X-Original-Text-Length: Original text length in characters
- X-Audio-Size: Final audio file size in bytes
POST /v1/audio/speech
Enhanced OpenAI-compatible endpoint with auto-combine feature. Automatically handles long text by splitting and combining audio chunks when needed.
Request Body:
{
"model": "gpt-4o-mini-tts",
"input": "Text of any length...",
"voice": "alloy",
"response_format": "mp3",
"instructions": "Optional voice instructions",
"speed": 1.0,
"auto_combine": true,
"max_length": 1000
}
Enhanced Parameters:
- auto_combine (boolean, default: true):
  - true: Automatically split long text and combine audio chunks into a single file
  - false: Return error if text exceeds max_length (standard OpenAI behavior)
- max_length (integer, default: 1000): Maximum characters per chunk when splitting
Response Headers:
- X-Auto-Combine: Whether auto-combine was enabled (true/false)
- X-Chunks-Combined: Number of audio chunks combined (1 for short text)
- X-Original-Text-Length: Original text length (for long text processing)
- X-Audio-Format: Audio format of the response
- X-Audio-Size: Audio file size in bytes
Examples
# Short text (works normally)
curl -X POST http://ttsapi.site/v1/audio/speech \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4o-mini-tts",
"input": "Hello world!",
"voice": "alloy"
}'
# Long text with auto-combine (default)
curl -X POST http://ttsapi.site/v1/audio/speech \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4o-mini-tts",
"input": "Very long text...",
"voice": "alloy",
"auto_combine": true
}'
# Long text without auto-combine (will error)
curl -X POST http://ttsapi.site/v1/audio/speech \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4o-mini-tts",
"input": "Very long text...",
"voice": "alloy",
"auto_combine": false
}'
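Because the endpoint mirrors OpenAI's TTS API, the official openai Python package can also be pointed at it. A hedged sketch, assuming openai>=1.0 is installed; the TTSFM-specific parameters are passed through extra_body, and the api_key value is a placeholder (supply a real key only if the server requires one):

from openai import OpenAI

client = OpenAI(base_url="http://ttsapi.site/v1", api_key="not-needed")

speech = client.audio.speech.create(
    model="gpt-4o-mini-tts",
    voice="alloy",
    input="Hello from the OpenAI SDK!",
    response_format="mp3",
    extra_body={"auto_combine": True, "max_length": 1000},
)

with open("speech.mp3", "wb") as f:
    f.write(speech.content)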
Use Cases:
- Long Articles: Convert blog posts or articles to single audio files
- Audiobooks: Generate chapters as single audio files
- Podcasts: Create podcast episodes from scripts
- Educational Content: Convert learning materials to audio
Example Usage (POST /api/generate-combined):
# Python example
import requests

response = requests.post(
    "http://ttsapi.site/api/generate-combined",
    json={
        "text": "Your very long text content here...",
        "voice": "nova",
        "format": "mp3",
        "max_length": 2000
    }
)

if response.status_code == 200:
    with open("combined_audio.mp3", "wb") as f:
        f.write(response.content)
    chunks = response.headers.get('X-Chunks-Combined')
    print(f"Combined {chunks} chunks into single file")
WebSocket Streaming
Real-time audio streaming for enhanced user experience. Get audio chunks as they're generated instead of waiting for the complete file.
Connection
// JavaScript WebSocket client
const client = new WebSocketTTSClient({
  socketUrl: 'http://ttsapi.site',
  debug: true
});

// Connection events
client.onConnect = () => console.log('Connected');
client.onDisconnect = () => console.log('Disconnected');
Streaming TTS Generation
// Generate speech with real-time streaming
const result = await client.generateSpeech('Hello, WebSocket world!', {
  voice: 'alloy',
  format: 'mp3',
  chunkSize: 1024, // Characters per chunk

  // Progress callback
  onProgress: (progress) => {
    console.log(`Progress: ${progress.progress}%`);
    console.log(`Chunks: ${progress.chunksCompleted}/${progress.totalChunks}`);
  },

  // Receive audio chunks in real-time
  onChunk: (chunk) => {
    console.log(`Received chunk ${chunk.chunkIndex + 1}`);
    // Process or play audio chunk immediately
    processAudioChunk(chunk.audioData);
  },

  // Completion callback
  onComplete: (result) => {
    console.log('Streaming complete!');
    // result.audioData contains the complete audio
  }
});
WebSocket Events
Client → Server Events

| Event | Description | Payload |
|---|---|---|
| generate_stream | Start TTS generation | {text, voice, format, chunk_size} |
| cancel_stream | Cancel active stream | {request_id} |
Server → Client Events

| Event | Description | Payload |
|---|---|---|
| stream_started | Stream initiated | {request_id, timestamp} |
| audio_chunk | Audio chunk ready | {request_id, chunk_index, audio_data, duration} |
| stream_progress | Progress update | {progress, chunks_completed, total_chunks} |
| stream_complete | Generation complete | {request_id, total_chunks, status} |
| stream_error | Error occurred | {request_id, error, timestamp} |
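Outside the browser, the same event flow can be driven from any Socket.IO-capable client. A rough Python sketch, assuming the server speaks Socket.IO and the python-socketio package is installed (event and payload names follow the tables above):

import socketio

sio = socketio.Client()
audio_chunks = []

@sio.on("audio_chunk")
def on_chunk(data):
    # Collect raw chunk payloads as they arrive.
    audio_chunks.append(data["audio_data"])

@sio.on("stream_complete")
def on_complete(data):
    print(f"Done: {data['total_chunks']} chunks received")
    sio.disconnect()

sio.connect("http://ttsapi.site")
sio.emit("generate_stream", {
    "text": "Hello, WebSocket world!",
    "voice": "alloy",
    "format": "mp3",
    "chunk_size": 1024,
})
sio.wait()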
Benefits
- Real-time feedback: Users see progress as audio generates
- Lower latency: First audio chunk arrives quickly
- Cancellable: Stop generation mid-stream if needed
- Efficient: Process chunks as they arrive
Example: Streaming Audio Player
// Create a streaming audio player
const audioChunks = [];
let isPlaying = false;

const streamingPlayer = await client.generateSpeech(longText, {
  voice: 'nova',
  format: 'mp3',

  onChunk: (chunk) => {
    // Store chunk
    audioChunks.push(chunk.audioData);

    // Start playback once a few chunks have buffered
    if (!isPlaying && audioChunks.length >= 3) {
      startStreamingPlayback(audioChunks);
      isPlaying = true;
    }
  },

  onComplete: (result) => {
    // Ensure all chunks are played
    finishPlayback(result.audioData);
  }
});
Try It Out!
Experience WebSocket streaming in action at the WebSocket Demo or enable streaming mode in the Playground.