API Documentation

Complete reference for the TTSFM Text-to-Speech API. Free, simple, and powerful.

Overview

The TTSFM API provides a modern, OpenAI-compatible interface for text-to-speech generation. It supports multiple voices, audio formats, and includes advanced features like text length validation and intelligent auto-combine functionality.

Base URL: http://ttsapi.site (endpoint paths below are absolute, e.g. /api/voices, /v1/audio/speech)

Key Features

  • 🎀 11 different voice options - Choose from alloy, echo, nova, and more
  • 🎡 Multiple audio formats - MP3, WAV, OPUS, AAC, FLAC, PCM support
  • πŸ€– OpenAI compatibility - Drop-in replacement for OpenAI's TTS API
  • ✨ Auto-combine feature - Automatically handles long text (>1000 chars) by splitting and combining audio
  • πŸ“Š Text length validation - Smart validation with configurable limits
  • πŸ“ˆ Real-time monitoring - Status endpoints and health checks
New in v3.3.4: Runtime images now ship with ffmpeg so MP3 auto-combine succeeds immediately, and the default long-text limit is trimmed to 1000 characters for predictable playback.

Operational Notes

  • Requests above 1000 characters are automatically split when auto_combine is enabled; disable validation to manage chunking yourself.
  • MP3 requests return MP3; OPUS, AAC, FLAC, and PCM requests are served as WAV for reliable playback.
  • Audio comes from the third-party openai.fm service; availability may change without noticeβ€”add graceful fallbacks.
  • The Docker image bundles ffmpeg so combined MP3 responses work immediately without extra setup.
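Since upstream availability can change without notice, a simple retry wrapper is one way to add the graceful fallback the notes recommend. This is an illustrative sketch using only the standard library (the helper name and backoff policy are ours, not part of TTSFM):

```python
import time
import urllib.error
import urllib.request

def fetch_with_retry(req, attempts=3, backoff=1.0):
    """Fetch a URL or Request, retrying with exponential backoff on failure."""
    for attempt in range(attempts):
        try:
            with urllib.request.urlopen(req, timeout=30) as resp:
                return resp.read()
        except (urllib.error.URLError, TimeoutError):
            if attempt == attempts - 1:
                raise  # out of retries: surface the error to the caller
            time.sleep(backoff * (2 ** attempt))
```

On the final failed attempt the original exception propagates, so callers can fall back to cached audio or a different provider.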

Authentication

Currently, the API supports optional API key authentication. If configured, include your API key in the request headers.

Authorization: Bearer YOUR_API_KEY
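In Python, the header can be attached like this (sketch; `YOUR_API_KEY` is a placeholder and the helper name is ours):

```python
import urllib.request

# Placeholder key; only required if the server has API key auth configured.
API_KEY = "YOUR_API_KEY"

def authed_request(url):
    """Build a request carrying the optional Bearer token."""
    return urllib.request.Request(
        url, headers={"Authorization": f"Bearer {API_KEY}"}
    )

req = authed_request("http://ttsapi.site/api/voices")
# urllib.request.urlopen(req)  # uncomment to perform the call
```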

Text Length Validation

TTSFM includes built-in text length validation to ensure compatibility with TTS models. The default maximum length is 1000 characters, but this can be customized.

Important: Text exceeding the maximum length will be rejected unless validation is disabled or the text is split into chunks.

Validation Options

  • max_length: Maximum allowed characters (default: 1000)
  • validate_length: Enable/disable validation (default: true)
  • preserve_words: Avoid splitting words when chunking (default: true)
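If you disable validation and manage chunking yourself, the `preserve_words` behaviour can be approximated client-side. The sketch below is illustrative only, not the library's internal splitter:

```python
def split_text(text, max_length=1000, preserve_words=True):
    """Split text into chunks of at most max_length characters.

    With preserve_words=True, each cut happens at the last space before
    the limit, so no word is broken across two chunks.
    """
    chunks = []
    while len(text) > max_length:
        cut = max_length
        if preserve_words:
            space = text.rfind(" ", 0, max_length)
            if space > 0:
                cut = space
        chunks.append(text[:cut].rstrip())
        text = text[cut:].lstrip()
    if text:
        chunks.append(text)
    return chunks
```

Each resulting chunk can then be submitted to the generation endpoints individually.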

API Endpoints

GET /api/voices

Get list of available voices.

Response Example (truncated):
{
  "voices": [
    {
      "id": "alloy",
      "name": "Alloy",
      "description": "Alloy voice"
    },
    {
      "id": "echo",
      "name": "Echo", 
      "description": "Echo voice"
    }
  ],
  "count": 6
}

GET /api/formats

Get available audio formats for speech generation.

Available Formats

The API accepts requests for all six formats, but only two encodings are actually produced:

  • mp3 - Returns actual MP3 format
  • opus, aac, flac, pcm - Mapped to WAV format
  • wav - Returns WAV format
Note: When you request opus, aac, flac, or pcm, you'll receive WAV audio data.
Response Example:
{
  "formats": [
    {
      "id": "mp3",
      "name": "MP3",
      "mime_type": "audio/mp3",
      "description": "MP3 audio format"
    },
    {
      "id": "opus", 
      "name": "Opus",
      "mime_type": "audio/wav",
      "description": "Returns WAV format"
    },
    {
      "id": "aac",
      "name": "AAC", 
      "mime_type": "audio/wav",
      "description": "Returns WAV format"
    },
    {
      "id": "flac",
      "name": "FLAC",
      "mime_type": "audio/wav", 
      "description": "Returns WAV format"
    },
    {
      "id": "wav",
      "name": "WAV",
      "mime_type": "audio/wav",
      "description": "WAV audio format"
    },
    {
      "id": "pcm",
      "name": "PCM",
      "mime_type": "audio/wav",
      "description": "Returns WAV format"
    }
  ],
  "count": 6
}
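Given the mapping above, a client can predict the encoding and MIME type it will actually receive. The helper names here are ours, not part of the API:

```python
def effective_format(requested):
    """Map a requested format id to the encoding actually returned."""
    return "mp3" if requested.lower() == "mp3" else "wav"

def mime_for(requested):
    """MIME type matching the formats table above."""
    return "audio/mp3" if requested.lower() == "mp3" else "audio/wav"
```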

POST /api/validate-text

Validate text length and get splitting suggestions.

Request Body:
{
  "text": "Your text to validate",
  "max_length": 1000
}
Response Example:
{
  "text_length": 5000,
  "max_length": 1000,
  "is_valid": false,
  "needs_splitting": true,
  "suggested_chunks": 2,
  "chunk_preview": [
    "First chunk preview...",
    "Second chunk preview..."
  ]
}
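A minimal Python call to this endpoint might look like the following sketch (standard library only; the builder function is ours, and the live call is left commented out):

```python
import json
import urllib.request

def build_validate_request(text, max_length=1000):
    """Build the POST /api/validate-text request described above."""
    body = json.dumps({"text": text, "max_length": max_length}).encode("utf-8")
    return urllib.request.Request(
        "http://ttsapi.site/api/validate-text",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_validate_request("Your text to validate")
# result = json.load(urllib.request.urlopen(req))
# if result["needs_splitting"]:
#     print(f"Suggested chunks: {result['suggested_chunks']}")
```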

POST /api/generate

Generate speech from text.

Request Body:
{
  "text": "Hello, world!",
  "voice": "alloy",
  "format": "mp3",
  "instructions": "Speak cheerfully",
  "max_length": 1000,
  "validate_length": true
}
Parameters:
  • text (required): Text to convert to speech
  • voice (optional): Voice ID (default: "alloy")
  • format (optional): Audio format (default: "mp3")
  • instructions (optional): Voice modulation instructions
  • max_length (optional): Maximum text length (default: 1000)
  • validate_length (optional): Enable validation (default: true)
Response:

Returns audio file with appropriate Content-Type header.
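Putting the parameters together, a request can be sketched like this with the standard library (helper names are ours; uncomment the last line to hit the live API and write the audio bytes to disk):

```python
import json
import urllib.request

def build_generate_payload(text, voice="alloy", fmt="mp3", instructions=None,
                           max_length=1000, validate_length=True):
    """Assemble the /api/generate request body from the parameters above."""
    payload = {
        "text": text,
        "voice": voice,
        "format": fmt,
        "max_length": max_length,
        "validate_length": validate_length,
    }
    if instructions is not None:
        payload["instructions"] = instructions
    return payload

def generate_speech(payload, out_path):
    """POST to /api/generate and save the returned audio to out_path."""
    req = urllib.request.Request(
        "http://ttsapi.site/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        audio = resp.read()
    with open(out_path, "wb") as f:
        f.write(audio)
    return out_path

payload = build_generate_payload("Hello, world!", instructions="Speak cheerfully")
# generate_speech(payload, "hello.mp3")
```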

Python Package

Long Text Support

The TTSFM Python package includes built-in long text splitting functionality for developers who need fine-grained control:

from ttsfm import TTSClient, Voice, AudioFormat

# Create client
client = TTSClient()

# Generate speech from long text (automatically splits into separate files)
responses = client.generate_speech_long_text(
    text="Very long text that exceeds 1000 characters...",
    voice=Voice.ALLOY,
    response_format=AudioFormat.MP3,
    max_length=2000,
    preserve_words=True
)

# Save each chunk as separate files
for i, response in enumerate(responses, 1):
    response.save_to_file(f"part_{i:03d}.mp3")
Developer Features:
  • Manual Splitting: Full control over text chunking for advanced use cases
  • Word Preservation: Maintains word boundaries for natural speech
  • Separate Files: Each chunk saved as individual audio file
  • CLI Support: Use `--split-long-text` flag for command-line usage
Note: For web users, the auto-combine feature in `/v1/audio/speech` is recommended as it automatically handles long text and returns a single seamless audio file.

POST /api/generate-combined

Generate a single combined audio file from long text. Automatically splits text into chunks, generates speech for each chunk, and combines them into one seamless audio file.

Request Body:
{
  "text": "Very long text that exceeds the limit...",
  "voice": "alloy",
  "format": "mp3",
  "instructions": "Optional voice instructions",
  "max_length": 1000,
  "preserve_words": true
}
Response:

Returns a single audio file containing all chunks combined seamlessly.

Response Headers:
  • X-Chunks-Combined: Number of chunks that were combined
  • X-Original-Text-Length: Original text length in characters
  • X-Audio-Size: Final audio file size in bytes

POST /v1/audio/speech

Enhanced OpenAI-compatible endpoint with auto-combine feature. Automatically handles long text by splitting and combining audio chunks when needed.

Request Body:
{
  "model": "gpt-4o-mini-tts",
  "input": "Text of any length...",
  "voice": "alloy",
  "response_format": "mp3",
  "instructions": "Optional voice instructions",
  "speed": 1.0,
  "auto_combine": true,
  "max_length": 1000
}
Enhanced Parameters:
  • auto_combine (boolean, default: true):
    • true: Automatically split long text and combine audio chunks into a single file
    • false: Return error if text exceeds max_length (standard OpenAI behavior)
  • max_length (integer, default: 1000): Maximum characters per chunk when splitting
Response Headers:
  • X-Auto-Combine: Whether auto-combine was enabled (true/false)
  • X-Chunks-Combined: Number of audio chunks combined (1 for short text)
  • X-Original-Text-Length: Original text length (for long text processing)
  • X-Audio-Format: Audio format of the response
  • X-Audio-Size: Audio file size in bytes
Examples:
# Short text (works normally)
curl -X POST http://ttsapi.site/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini-tts",
    "input": "Hello world!",
    "voice": "alloy"
  }'

# Long text with auto-combine (default)
curl -X POST http://ttsapi.site/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini-tts",
    "input": "Very long text...",
    "voice": "alloy",
    "auto_combine": true
  }'

# Long text without auto-combine (will error)
curl -X POST http://ttsapi.site/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini-tts",
    "input": "Very long text...",
    "voice": "alloy",
    "auto_combine": false
  }'
Audio Combination: Uses advanced audio processing (PyDub) when available, with intelligent fallbacks for different environments. Supports all audio formats.
Use Cases:
  • Long Articles: Convert blog posts or articles to single audio files
  • Audiobooks: Generate chapters as single audio files
  • Podcasts: Create podcast episodes from scripts
  • Educational Content: Convert learning materials to audio
Example Usage (Python, via /api/generate-combined):
# Python example
import requests

response = requests.post(
    "http://ttsapi.site/api/generate-combined",
    json={
        "text": "Your very long text content here...",
        "voice": "nova",
        "format": "mp3",
        "max_length": 2000
    }
)

if response.status_code == 200:
    with open("combined_audio.mp3", "wb") as f:
        f.write(response.content)

    chunks = response.headers.get('X-Chunks-Combined')
    print(f"Combined {chunks} chunks into single file")

WebSocket Streaming

Real-time audio streaming for enhanced user experience. Get audio chunks as they're generated instead of waiting for the complete file.

WebSocket streaming provides lower perceived latency and real-time progress tracking for TTS generation.

Connection

// JavaScript WebSocket client
const client = new WebSocketTTSClient({
    socketUrl: 'http://ttsapi.site',
    debug: true
});

// Connection events
client.onConnect = () => console.log('Connected');
client.onDisconnect = () => console.log('Disconnected');

Streaming TTS Generation

// Generate speech with real-time streaming
const result = await client.generateSpeech('Hello, WebSocket world!', {
    voice: 'alloy',
    format: 'mp3',
    chunkSize: 1024,  // Characters per chunk
    
    // Progress callback
    onProgress: (progress) => {
        console.log(`Progress: ${progress.progress}%`);
        console.log(`Chunks: ${progress.chunksCompleted}/${progress.totalChunks}`);
    },
    
    // Receive audio chunks in real-time
    onChunk: (chunk) => {
        console.log(`Received chunk ${chunk.chunkIndex + 1}`);
        // Process or play audio chunk immediately
        processAudioChunk(chunk.audioData);
    },
    
    // Completion callback
    onComplete: (result) => {
        console.log('Streaming complete!');
        // result.audioData contains the complete audio
    }
});

WebSocket Events

Client β†’ Server Events
  • generate_stream - Start TTS generation. Payload: {text, voice, format, chunk_size}
  • cancel_stream - Cancel active stream. Payload: {request_id}

Server β†’ Client Events
  • stream_started - Stream initiated. Payload: {request_id, timestamp}
  • audio_chunk - Audio chunk ready. Payload: {request_id, chunk_index, audio_data, duration}
  • stream_progress - Progress update. Payload: {progress, chunks_completed, total_chunks}
  • stream_complete - Generation complete. Payload: {request_id, total_chunks, status}
  • stream_error - Error occurred. Payload: {request_id, error, timestamp}

Benefits

  • Real-time feedback: Users see progress as audio generates
  • Lower latency: First audio chunk arrives quickly
  • Cancellable: Stop generation mid-stream if needed
  • Efficient: Process chunks as they arrive

Example: Streaming Audio Player

// Create a streaming audio player
const audioChunks = [];
let isPlaying = false;

const streamingPlayer = await client.generateSpeech(longText, {
    voice: 'nova',
    format: 'mp3',
    
    onChunk: (chunk) => {
        // Store chunk
        audioChunks.push(chunk.audioData);
        
        // Start playing after first chunk
        if (!isPlaying && audioChunks.length >= 3) {
            startStreamingPlayback(audioChunks);
            isPlaying = true;
        }
    },
    
    onComplete: (result) => {
        // Ensure all chunks are played
        finishPlayback(result.audioData);
    }
});

Try It Out!

Experience WebSocket streaming in action at the WebSocket Demo or enable streaming mode in the Playground.