Audio

Text-to-speech and speech-to-text endpoints.

Text to Speech

POST /v1/audio/speech

Convert text to spoken audio.

Request

```json
{
  "model": "tts-1",
  "input": "Hello, welcome to JarvisClaw!",
  "voice": "alloy"
}
```

Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | TTS model ID (tts-1, tts-1-hd) |
| input | string | Yes | Text to convert to speech (max 4096 characters) |
| voice | string | Yes | Voice preset (e.g., alloy, echo, fable, onyx, nova, shimmer) |
| response_format | string | No | Audio format: mp3 (default), opus, aac, or flac |
| speed | float | No | Speed multiplier from 0.25 to 4.0. Default: 1.0 |
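The input field is capped at 4096 characters, so a longer script has to be sent as several requests. A minimal sketch of splitting text on the client before calling the endpoint, assuming sentence boundaries are acceptable break points; chunk_text is a hypothetical helper, not part of the API.

```python
import re

# Limit copied from the `input` row of the parameters table.
MAX_INPUT_CHARS = 4096


def chunk_text(text: str, limit: int = MAX_INPUT_CHARS) -> list[str]:
    """Split text into pieces of at most `limit` characters (hypothetical helper).

    Prefers sentence boundaries, falls back to word boundaries, and finally
    hard-splits any single word that is longer than `limit` on its own.
    """
    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # An over-long sentence is broken into words first.
        pieces = [sentence] if len(sentence) <= limit else sentence.split()
        for piece in pieces:
            # Hard-split any token that still exceeds the limit.
            while len(piece) > limit:
                if current:
                    chunks.append(current)
                    current = ""
                chunks.append(piece[:limit])
                piece = piece[limit:]
            candidate = f"{current} {piece}" if current else piece
            if len(candidate) <= limit:
                current = candidate
            else:
                # The current chunk is full; start a new one with this piece.
                chunks.append(current)
                current = piece
    if current:
        chunks.append(current)
    return chunks
```

Each chunk would then go out as its own /v1/audio/speech request. The splitter rejoins pieces with single spaces, so whitespace at break points is normalized, and stitching the resulting audio files together is format-dependent.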

Response

Returns the raw audio bytes (not JSON), with a Content-Type header matching the requested response_format.
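Because the body is the audio itself, a client without the openai package can write it straight to disk. A minimal standard-library sketch, assuming the request shape shown above; build_speech_request and synthesize are hypothetical names, and the limits they check are copied from the parameters table.

```python
import json
import urllib.request

# Endpoint from the examples on this page.
SPEECH_URL = "https://api.jarvisclaw.ai/v1/audio/speech"


def build_speech_request(api_key: str, text: str, voice: str = "alloy",
                         model: str = "tts-1", response_format: str = "mp3",
                         speed: float = 1.0) -> urllib.request.Request:
    """Build (but do not send) a POST request for the speech endpoint."""
    # Client-side checks mirroring the documented limits.
    if len(text) > 4096:
        raise ValueError("input exceeds 4096 characters")
    if not 0.25 <= speed <= 4.0:
        raise ValueError("speed must be between 0.25 and 4.0")
    payload = {
        "model": model,
        "input": text,
        "voice": voice,
        "response_format": response_format,
        "speed": speed,
    }
    return urllib.request.Request(
        SPEECH_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize(request: urllib.request.Request, path: str) -> str:
    """Send the request, write the raw audio bytes to `path`, return Content-Type."""
    with urllib.request.urlopen(request) as response:
        content_type = response.headers.get("Content-Type", "")
        with open(path, "wb") as out:
            out.write(response.read())
    return content_type
```

Separating request construction from sending keeps the validation testable offline; synthesize is the only part that touches the network.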

Example

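```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.jarvisclaw.ai/v1",
    api_key="sk-your-api-key",
)

# Use the streaming response so the audio is written to disk as it arrives;
# calling stream_to_file() on the plain create() result is deprecated in the SDK.
with client.audio.speech.with_streaming_response.create(
    model="tts-1-hd",
    voice="nova",
    input="Hello, welcome to JarvisClaw!",
) as response:
    response.stream_to_file("output.mp3")
```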
```bash
curl https://api.jarvisclaw.ai/v1/audio/speech \
  -H "Authorization: Bearer sk-your-api-key" \
  -H "Content-Type: application/json" \
  -d '{"model": "tts-1", "input": "Hello!", "voice": "alloy"}' \
  --output output.mp3
```

Speech to Text (Transcription)

POST /v1/audio/transcriptions

Transcribe audio files to text.

Request (multipart/form-data)

| Parameter | Type | Required | Description |
|---|---|---|---|
| file | file | Yes | Audio file (mp3, mp4, mpeg, mpga, m4a, wav, webm) |
| model | string | Yes | STT model ID (whisper-1) |
| language | string | No | ISO-639-1 code of the spoken language (improves accuracy) |
| response_format | string | No | json (default), text, srt, verbose_json, or vtt |
| temperature | float | No | Sampling temperature, from 0 to 1 |

Response

With the default response_format (json), the transcript is returned in a text field:

```json
{
  "text": "Hello, this is a transcription of the audio file."
}
```
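The body shape depends on response_format. A sketch for reading it, assuming the endpoint mirrors OpenAI's behavior: json and verbose_json carry the transcript in a text field, while text, srt, and vtt come back as plain text. transcript_text and parse_srt are hypothetical helpers, and parse_srt handles only simple SubRip cues.

```python
import json
import re

# SubRip timestamp such as 00:01:02,500 (a dot separator is also accepted).
_TS = re.compile(r"(\d+):(\d{2}):(\d{2})[,.](\d{3})")


def transcript_text(body: bytes, response_format: str = "json") -> str:
    """Return the transcript from a /v1/audio/transcriptions response body."""
    decoded = body.decode("utf-8")
    if response_format in ("json", "verbose_json"):
        # Assumption: both JSON formats expose the transcript as "text".
        return json.loads(decoded)["text"]
    # text, srt and vtt are assumed to be plain-text bodies.
    return decoded


def _to_seconds(stamp: str) -> float:
    match = _TS.fullmatch(stamp.strip())
    if match is None:
        raise ValueError(f"bad SRT timestamp: {stamp!r}")
    h, m, s, ms = map(int, match.groups())
    return h * 3600 + m * 60 + s + ms / 1000


def parse_srt(srt: str) -> list[tuple[float, float, str]]:
    """Parse SRT text into (start_seconds, end_seconds, text) cues."""
    cues = []
    # Cues are separated by blank lines.
    for block in re.split(r"\n\s*\n", srt.strip()):
        lines = block.strip().splitlines()
        if len(lines) < 2 or "-->" not in lines[1]:
            continue  # skip blocks without an index line and a timing line
        start, end = lines[1].split("-->")
        cues.append((_to_seconds(start), _to_seconds(end), " ".join(lines[2:])))
    return cues
```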

Example

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.jarvisclaw.ai/v1",
    api_key="sk-your-api-key",
)

with open("recording.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
    )

print(transcript.text)
```
```bash
curl https://api.jarvisclaw.ai/v1/audio/transcriptions \
  -H "Authorization: Bearer sk-your-api-key" \
  -F file="@recording.mp3" \
  -F model="whisper-1"
```

Notes

  • Maximum audio file size: 25 MB
  • Supported audio formats: mp3, mp4, mpeg, mpga, m4a, wav, webm
  • For long audio, split into segments under 25 MB each
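Cutting an audio file into raw byte ranges usually produces segments without a valid header. For uncompressed PCM WAV input, Python's wave module can re-emit each segment as a standalone file. A sketch assuming PCM WAV input and reading "25 MB" conservatively as 25,000,000 bytes; split_wav is a hypothetical helper. Compressed formats such as mp3 or m4a need a dedicated audio tool instead.

```python
import io
import wave

# Assumption: conservative decimal reading of the 25 MB upload limit.
MAX_UPLOAD_BYTES = 25_000_000
WAV_HEADER_BYTES = 44  # canonical PCM header size written by the wave module


def split_wav(data: bytes, max_bytes: int = MAX_UPLOAD_BYTES) -> list[bytes]:
    """Split a PCM WAV file into standalone WAV files of at most max_bytes each."""
    with wave.open(io.BytesIO(data), "rb") as src:
        params = src.getparams()
        frame_size = params.sampwidth * params.nchannels
        # Leave room for each segment's own header.
        frames_per_chunk = (max_bytes - WAV_HEADER_BYTES) // frame_size
        if frames_per_chunk <= 0:
            raise ValueError("max_bytes is too small for a single frame")
        chunks = []
        while True:
            frames = src.readframes(frames_per_chunk)
            if not frames:
                break
            buf = io.BytesIO()
            # Same channels, sample width and rate; the header's frame count
            # is corrected by the wave module when the segment is closed.
            with wave.open(buf, "wb") as dst:
                dst.setparams(params)
                dst.writeframes(frames)
            chunks.append(buf.getvalue())
    return chunks
```

Segments are cut at fixed frame counts, so a word can straddle a boundary; splitting on silence gives cleaner transcripts but needs more analysis.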

Pay per call. No subscription. No rate limits.