POST https://api.mor.org/api/v1/audio/transcriptions
Create Audio Transcription
# Upload a local audio file (multipart/form-data):
curl --request POST \
  --url https://api.mor.org/api/v1/audio/transcriptions \
  --header 'Authorization: Bearer sk-xxxxxx' \
  --form 'file=@audio.mp3' \
  --form 'model=<string>' \
  --form 'language=<string>' \
  --form 'response_format=json'

# Or pass a pre-signed S3 URL instead of a file (use one or the other, not both):
#   --form 's3_presigned_url=<string>'
{
  "text": "<string>",
  "segments": [
    {
      "id": 123,
      "start": 123,
      "end": 123,
      "text": "<string>"
    }
  ],
  "words": [
    {
      "word": "<string>",
      "start": 123,
      "end": 123
    }
  ]
}
Transcribe audio file to text. This endpoint transcribes audio files using the Morpheus Network providers. It automatically manages sessions and routes requests to the appropriate transcription model. Supports both file upload and S3 pre-signed URLs. Returns JSON or plain text responses based on response_format parameter.
Playground limitation: The interactive API playground does not correctly handle file uploads. Use the cURL examples below or an SDK instead.

Headers

Authorization
string
required
API key in format: Bearer sk-xxxxxx

Body (multipart/form-data)

file
file
Audio file to transcribe. Supported formats include: mp3, mp4, mpeg, mpga, m4a, wav, webm.
Either file or s3_presigned_url must be provided, but not both.
s3_presigned_url
string
Pre-signed S3 URL as alternative to file upload. Useful for large files or when files are already stored in S3.
Use S3 pre-signed URLs for files larger than 25MB or when you want to avoid uploading files directly.
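The either/or rule above (exactly one of file or s3_presigned_url) is easy to enforce client-side before making a request. A minimal sketch of such a validation helper, assuming the field names from the body table; this is an illustrative function, not part of any official SDK:

```python
def build_transcription_fields(file_path=None, s3_presigned_url=None,
                               model=None, response_format="json"):
    """Assemble form fields for /api/v1/audio/transcriptions.

    Enforces the endpoint's rule that exactly one of file_path or
    s3_presigned_url is provided. Illustrative helper only.
    """
    if (file_path is None) == (s3_presigned_url is None):
        raise ValueError("Provide exactly one of file_path or s3_presigned_url")
    fields = {"response_format": response_format}
    if model:
        fields["model"] = model
    if file_path:
        fields["file"] = file_path  # sent as a multipart file part
    if s3_presigned_url:
        fields["s3_presigned_url"] = s3_presigned_url
    return fields
```

Failing fast here gives a clearer error than a round trip to the API when both (or neither) source is supplied.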
model
string
Model ID to use for transcription (blockchain hex address or name).
Use the List Models endpoint to see available transcription models.
language
string
Language code (e.g., en, es, fr) to help improve transcription accuracy. If not specified, the model will attempt to detect the language automatically.
prompt
string
Optional text to guide the model’s transcription. Useful for proper nouns, technical terms, or specific vocabulary that may appear in the audio.
response_format
string
default:"json"
Format for the transcription response. Options:
  • json - JSON object with text and metadata
  • text - Plain text only
  • srt - SubRip subtitle format
  • verbose_json - Detailed JSON with word-level timestamps
  • vtt - WebVTT subtitle format
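To illustrate the shape of the srt option, here is a sketch that renders verbose_json-style segments as SubRip text locally. The API returns SRT directly when response_format is srt; this helper only shows what that output looks like:

```python
def segments_to_srt(segments):
    """Render a list of {"start", "end", "text"} segments as SRT.

    Illustrative only: request response_format="srt" to get this
    directly from the API.
    """
    def ts(seconds):
        # SRT timestamps use HH:MM:SS,mmm
        ms = int(round(seconds * 1000))
        h, rem = divmod(ms, 3_600_000)
        m, rem = divmod(rem, 60_000)
        s, ms = divmod(rem, 1000)
        return f"{h:02}:{m:02}:{s:02},{ms:03}"

    blocks = []
    for i, seg in enumerate(segments, start=1):
        blocks.append(f"{i}\n{ts(seg['start'])} --> {ts(seg['end'])}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)
```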
temperature
number
Sampling temperature between 0.0 and 1.0. Higher values make the output more random, lower values make it more deterministic.
timestamp_granularities
string
Comma-separated list of timestamp granularities. Options: word, segment. Only applicable when response_format is verbose_json.
enable_diarization
boolean
default:"false"
Enable speaker diarization to identify different speakers in the audio. Requires models that support this feature.
output_content
string
Output content type specification for advanced use cases.
session_id
string
Optional session ID to use for this request. If not provided, the system will automatically create or use the session associated with the API key.

Response

The response format depends on the response_format parameter:
text
string
Transcribed text (when response_format is json or text)
segments
array
Array of transcription segments with timestamps (when response_format is verbose_json)
words
array
Array of word-level timestamps (when timestamp_granularities includes word)
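Working with a verbose_json response usually means mapping times to text. A sketch of two small helpers, assuming only the response shape shown above (top-level segments and words lists with numeric start/end in seconds):

```python
def segment_at(response, t):
    """Return the segment text active at time t (seconds), or None.

    Illustrative helper over the verbose_json shape documented above.
    """
    for seg in response.get("segments", []):
        if seg["start"] <= t < seg["end"]:
            return seg["text"]
    return None


def word_timeline(response):
    """Flatten word-level timestamps into (word, start, end) tuples."""
    return [(w["word"], w["start"], w["end"]) for w in response.get("words", [])]
```

Helpers like these are the building blocks for interactive transcripts: seek the player, call segment_at with the current time, and highlight the matching text.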

Example Request

import openai

client = openai.OpenAI(
    api_key="sk-xxxxxx",
    base_url="https://api.mor.org/api/v1"
)

# Upload audio file
with open("audio.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-model",
        file=audio_file,
        response_format="verbose_json",
        timestamp_granularities=["word", "segment"]
    )

print(transcript.text)
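The SDK example above covers direct file upload. For the s3_presigned_url variant you can build the request yourself; a sketch using the standard library, constructing the request without sending it so the payload can be inspected. The field names come from the body table above; whether the server also accepts URL-encoded form bodies (in addition to multipart, as in the cURL example) is an assumption:

```python
from urllib.parse import urlencode
from urllib.request import Request

# Sketch of the S3 pre-signed URL variant. Substitute a real
# pre-signed URL and API key, then pass req to urllib.request.urlopen.
fields = {
    "s3_presigned_url": "https://example-bucket.s3.amazonaws.com/audio.mp3",
    "model": "whisper-model",
    "response_format": "json",
}
req = Request(
    "https://api.mor.org/api/v1/audio/transcriptions",
    data=urlencode(fields).encode(),
    headers={"Authorization": "Bearer sk-xxxxxx"},
    method="POST",
)
```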

Use Cases

Meeting Notes

Automatically transcribe meetings and generate searchable text records

Content Accessibility

Create captions and subtitles for video content

Voice Commands

Convert voice commands to text for voice-controlled applications

Podcast Transcription

Generate searchable transcripts for podcast episodes
Use verbose_json with timestamp_granularities to get word-level timestamps, which are useful for creating interactive transcripts or synchronizing with video.
Large audio files may take longer to process. For files over 25MB, consider using S3 pre-signed URLs instead of direct file uploads.
Speaker diarization can help identify different speakers in multi-person conversations, but requires models that support this feature.