API Documentation

Complete reference for the TTSFM Text-to-Speech API. Free, simple, and powerful.

Overview

The TTSFM API provides an OpenAI-compatible interface for text-to-speech generation. It supports multiple voices and audio formats, and includes features such as text length validation, automatic text splitting, and batch processing.

Base URL: http://ttsapi.site/api/

Key Features

  • 11 different voice options
  • Multiple audio formats (MP3, WAV, OPUS, etc.)
  • Text length validation (4096 character limit)
  • Automatic text splitting for long content
  • Batch processing capabilities
  • Combined audio generation from long text
  • Real-time status monitoring

Authentication

API key authentication is optional. If the server is configured to require a key, include it in the request headers as a bearer token:

Authorization: Bearer YOUR_API_KEY
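
A minimal sketch of attaching the key from Python, assuming the key is stored in an environment variable named TTSFM_API_KEY (a hypothetical name, not one defined by TTSFM). When no key is set, no Authorization header is sent, which matches the optional authentication described above.

```python
import os


def build_headers(api_key=None):
    """Return request headers, adding a bearer token only when a key is given."""
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    return headers


# TTSFM_API_KEY is a hypothetical variable name chosen for this example.
headers = build_headers(os.environ.get("TTSFM_API_KEY"))
```

The resulting dictionary can be passed as the headers argument of whichever HTTP client is used.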

Text Length Validation

TTSFM includes built-in text length validation to ensure compatibility with TTS models. The default maximum length is 4096 characters, but this can be customized.

Important: Text exceeding the maximum length will be rejected unless validation is disabled or the text is split into chunks.

Validation Options

  • max_length: Maximum allowed characters (default: 4096)
  • validate_length: Enable/disable validation (default: true)
  • preserve_words: Avoid splitting words when chunking (default: true)
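
To illustrate what preserve_words means in practice, the sketch below splits text into chunks of at most max_length characters and breaks only at whitespace. It is an illustration of the idea, not TTSFM's actual splitting algorithm, and the function name split_text is hypothetical.

```python
def split_text(text, max_length=4096, preserve_words=True):
    """Split text into chunks no longer than max_length characters."""
    if max_length < 1:
        raise ValueError("max_length must be positive")
    if not preserve_words:
        # Hard cut every max_length characters, even in the middle of a word.
        return [text[i:i + max_length] for i in range(0, len(text), max_length)]

    chunks, current = [], ""
    for word in text.split():
        # A single word longer than the limit has to be cut regardless.
        while len(word) > max_length:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(word[:max_length])
            word = word[max_length:]
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= max_length:
            current = candidate
        else:
            # Adding the word would exceed the limit: close the current chunk.
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks
```

For example, split_text("the quick brown fox", max_length=10) returns ["the quick", "brown fox"], while the same call with preserve_words=False cuts after every tenth character regardless of word boundaries.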

API Endpoints

GET /api/voices

Get list of available voices.

Response Example (voice list abbreviated):
{
  "voices": [
    {
      "id": "alloy",
      "name": "Alloy",
      "description": "Alloy voice"
    },
    {
      "id": "echo",
      "name": "Echo", 
      "description": "Echo voice"
    }
  ],
  "count": 6
}
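
As a sketch of consuming this response, the snippet below extracts the voice IDs from a decoded payload and shows how the endpoint could be fetched with Python's standard library. The helper names are hypothetical, and fetch_voices is defined but not called here.

```python
import json
import urllib.request

BASE_URL = "http://ttsapi.site/api"


def voice_ids(payload):
    """Return the list of voice IDs from a decoded /api/voices response."""
    return [voice["id"] for voice in payload.get("voices", [])]


def fetch_voices(base_url=BASE_URL, timeout=30):
    """GET /api/voices and return the decoded JSON body."""
    with urllib.request.urlopen(f"{base_url}/voices", timeout=timeout) as resp:
        return json.load(resp)


sample = {"voices": [{"id": "alloy", "name": "Alloy"}, {"id": "echo", "name": "Echo"}]}
print(voice_ids(sample))  # prints ['alloy', 'echo']
```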

GET /api/formats

Get list of supported audio formats.

Response Example (format list abbreviated):
{
  "formats": [
    {
      "id": "mp3",
      "name": "MP3",
      "mime_type": "audio/mp3",
      "description": "MP3 audio format"
    }
  ],
  "count": 6
}
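
One possible use of this response is naming output files from the Content-Type of a generated audio response. The sketch below builds a lookup from MIME type to format ID; the helper name mime_to_format is hypothetical.

```python
def mime_to_format(payload):
    """Map each MIME type in a decoded /api/formats response to its format ID."""
    return {fmt["mime_type"]: fmt["id"] for fmt in payload.get("formats", [])}


sample = {
    "formats": [
        {"id": "mp3", "name": "MP3", "mime_type": "audio/mp3",
         "description": "MP3 audio format"}
    ]
}
lookup = mime_to_format(sample)
print(lookup["audio/mp3"])  # prints mp3
```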

POST /api/validate-text

Validate text length and get splitting suggestions.

Request Body:
{
  "text": "Your text to validate",
  "max_length": 4096
}

Response Example:
{
  "text_length": 5000,
  "max_length": 4096,
  "is_valid": false,
  "needs_splitting": true,
  "suggested_chunks": 2,
  "chunk_preview": [
    "First chunk preview...",
    "Second chunk preview..."
  ]
}
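
The validation result can drive the choice of generation endpoint. The sketch below builds the request with the standard library and picks an endpoint from the needs_splitting field; both helper names are hypothetical, and routing long text to /api/generate-combined is one reasonable policy rather than a requirement of the API.

```python
import json
import urllib.request

BASE_URL = "http://ttsapi.site/api"


def build_validate_request(text, max_length=4096, base_url=BASE_URL):
    """Build (but do not send) a POST request for /api/validate-text."""
    body = json.dumps({"text": text, "max_length": max_length}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/validate-text",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def choose_endpoint(validation):
    """Pick an endpoint path from a decoded validate-text response."""
    if validation.get("needs_splitting"):
        return "/api/generate-combined"
    return "/api/generate"
```

The request object can be sent with urllib.request.urlopen(request, timeout=30), and the JSON body of the reply passed to choose_endpoint.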

POST /api/generate

Generate speech from text.

Request Body:
{
  "text": "Hello, world!",
  "voice": "alloy",
  "format": "mp3",
  "instructions": "Speak cheerfully",
  "max_length": 4096,
  "validate_length": true
}

Parameters:
  • text (required): Text to convert to speech
  • voice (optional): Voice ID (default: "alloy")
  • format (optional): Audio format (default: "mp3")
  • instructions (optional): Voice modulation instructions
  • max_length (optional): Maximum text length (default: 4096)
  • validate_length (optional): Enable validation (default: true)

Response:

Returns the generated audio file with the appropriate Content-Type header.
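
A minimal sketch of calling this endpoint from Python using only the standard library. The helper names are hypothetical; the defaults mirror the parameter list above, and the optional instructions field is omitted when it is not set.

```python
import json
import urllib.request

BASE_URL = "http://ttsapi.site/api"


def build_generate_payload(text, voice="alloy", audio_format="mp3",
                           instructions=None, max_length=4096,
                           validate_length=True):
    """Assemble the JSON body for POST /api/generate, omitting unset options."""
    if not text:
        raise ValueError("text is required")
    payload = {
        "text": text,
        "voice": voice,
        "format": audio_format,
        "max_length": max_length,
        "validate_length": validate_length,
    }
    if instructions:
        payload["instructions"] = instructions
    return payload


def generate_to_file(path, payload, base_url=BASE_URL, timeout=60):
    """POST the payload, write the returned audio bytes to path,
    and return the response's Content-Type header."""
    request = urllib.request.Request(
        f"{base_url}/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        content_type = resp.headers.get("Content-Type")
        with open(path, "wb") as f:
            f.write(resp.read())
    return content_type
```

A typical call would be generate_to_file("hello.mp3", build_generate_payload("Hello, world!")). Note that urlopen raises urllib.error.HTTPError for non-2xx responses, so rejected text surfaces as an exception.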

POST /api/generate-batch

Generate speech from long text by automatically splitting into chunks. Uses intelligent text splitting that preserves word boundaries for natural-sounding speech.

Request Body:
{
  "text": "Very long text that exceeds the limit...",
  "voice": "alloy",
  "format": "mp3",
  "max_length": 4096,
  "preserve_words": true
}

Response Example:
{
  "total_chunks": 3,
  "successful_chunks": 3,
  "results": [
    {
      "chunk_index": 1,
      "chunk_text": "First chunk text...",
      "audio_data": "base64_encoded_audio",
      "content_type": "audio/mp3",
      "size": 12345,
      "format": "mp3"
    }
  ]
}
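
Because each chunk arrives as base64 text inside the JSON body, the client has to decode it before saving. The sketch below assumes audio_data uses standard base64 encoding, as the example response suggests; the helper names are hypothetical.

```python
import base64


def decode_batch_results(payload):
    """Yield (chunk_index, format, audio_bytes) for each result in a batch response."""
    for result in payload.get("results", []):
        audio = base64.b64decode(result["audio_data"])
        yield result["chunk_index"], result["format"], audio


def all_chunks_succeeded(payload):
    """Return True when every chunk reported by the server was generated."""
    return payload.get("successful_chunks") == payload.get("total_chunks")


def save_batch(payload, prefix="part"):
    """Write each decoded chunk to <prefix>_NNN.<format> and return the file names."""
    names = []
    for index, fmt, audio in decode_batch_results(payload):
        name = f"{prefix}_{index:03d}.{fmt}"
        with open(name, "wb") as f:
            f.write(audio)
        names.append(name)
    return names
```

Checking all_chunks_succeeded before saving helps avoid silently producing an incomplete set of files.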

Python Package

Long Text Support

The TTSFM Python package includes built-in long text splitting functionality:

from ttsfm import TTSClient, Voice, AudioFormat

# Create client
client = TTSClient()

# Generate speech from long text (automatically splits)
responses = client.generate_speech_long_text(
    text="Very long text that exceeds 4096 characters...",
    voice=Voice.ALLOY,
    response_format=AudioFormat.MP3,
    max_length=2000,
    preserve_words=True
)

# Save each chunk
for i, response in enumerate(responses, 1):
    response.save_to_file(f"part_{i:03d}.mp3")

Key Features:
  • Automatic Splitting: No manual text chunking required
  • Word Preservation: Maintains word boundaries for natural speech
  • Batch Processing: Efficient handling of multiple chunks
  • CLI Support: Use the --split-long-text flag

POST /api/generate-combined

Generate a single combined audio file from long text. Automatically splits text into chunks, generates speech for each chunk, and combines them into one seamless audio file.

Request Body:
{
  "text": "Very long text that exceeds the limit...",
  "voice": "alloy",
  "format": "mp3",
  "instructions": "Optional voice instructions",
  "max_length": 4096,
  "preserve_words": true
}

Response:

Returns a single audio file containing all chunks combined seamlessly.

Response Headers:
  • X-Chunks-Combined: Number of chunks that were combined
  • X-Original-Text-Length: Original text length in characters
  • X-Audio-Size: Final audio file size in bytes
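
HTTP header values arrive as strings, so a client will usually convert these to integers. A small sketch follows; the helper name is hypothetical and the sample values are invented for illustration.

```python
def combined_audio_info(headers):
    """Convert the combined-audio response headers to integers where present."""
    names = {
        "chunks": "X-Chunks-Combined",
        "text_length": "X-Original-Text-Length",
        "audio_size": "X-Audio-Size",
    }
    info = {}
    for key, header in names.items():
        value = headers.get(header)
        # A missing header is reported as None instead of raising.
        info[key] = int(value) if value is not None else None
    return info


# Illustrative values only, not real server output.
sample_headers = {
    "X-Chunks-Combined": "3",
    "X-Original-Text-Length": "9500",
    "X-Audio-Size": "123456",
}
print(combined_audio_info(sample_headers))
```

The function accepts any mapping with a get method. A plain dict is case-sensitive, whereas the headers object of the requests library matches header names case-insensitively.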

POST /v1/audio/speech-combined

OpenAI-compatible endpoint for combined audio generation. It provides the same functionality as POST /api/generate-combined but follows the OpenAI API request format.

Request Body:
{
  "model": "gpt-4o-mini-tts",
  "input": "Very long text that exceeds the limit...",
  "voice": "alloy",
  "response_format": "mp3",
  "instructions": "Optional voice instructions",
  "speed": 1.0,
  "max_length": 4096
}

Response:

Returns a single combined audio file with the same headers as the native endpoint.
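
The two request bodies differ mainly in field names: text becomes input and format becomes response_format. The sketch below translates a native body into the OpenAI-style one. The helper name is hypothetical, and dropping preserve_words is an assumption based on that field not appearing in the OpenAI-style request body shown above.

```python
def to_openai_payload(native, model="gpt-4o-mini-tts", speed=1.0):
    """Translate a /api/generate-combined body into the OpenAI-style body."""
    renamed = {"text": "input", "format": "response_format"}
    # Assumption: preserve_words is omitted because the OpenAI-style
    # request example does not list it.
    skipped = {"preserve_words"}
    payload = {"model": model, "speed": speed}
    for key, value in native.items():
        if key in skipped:
            continue
        payload[renamed.get(key, key)] = value
    return payload


native = {
    "text": "Very long text that exceeds the limit...",
    "voice": "alloy",
    "format": "mp3",
    "max_length": 4096,
    "preserve_words": True,
}
openai_body = to_openai_payload(native)
```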

Audio Combination: Chunks are combined with PyDub when it is available; otherwise a fallback method is used, depending on the environment. All supported audio formats can be combined.

Use Cases:
  • Long Articles: Convert blog posts or articles to single audio files
  • Audiobooks: Generate chapters as single audio files
  • Podcasts: Create podcast episodes from scripts
  • Educational Content: Convert learning materials to audio

Example Usage:
# Python example
import requests

response = requests.post(
    "http://ttsapi.site/api/generate-combined",
    json={
        "text": "Your very long text content here...",
        "voice": "nova",
        "format": "mp3",
        "max_length": 2000
    },
    timeout=300,  # avoid hanging forever; long texts can take a while, adjust as needed
)

if response.status_code == 200:
    with open("combined_audio.mp3", "wb") as f:
        f.write(response.content)

    chunks = response.headers.get('X-Chunks-Combined')
    print(f"Combined {chunks} chunks into single file")