Multimodal Data

Our dataset bundles video, audio, and transcripts, enriched with detailed metadata for AI training. Here’s a sample of the data we capture:

Data Package

  • Audio & Video: 1080p video, clear audio, natural settings
  • Transcripts: Dialect, Modern Standard Arabic (MSA), English, and phonetic versions
  • Metadata: Location, dialect, demographics, environment
  • Contextual Tags: Topic, sentiment, emotion, code-switching, non-verbal cues
  • Annotations: Gestures, facial expressions, lighting
  • Consent: Verbal consent plus a signed model release
  • Diarization: Speaker turns with timestamps

Metadata Example

{
  "recording_id": "LEB_AKKAR_001",
  "file_paths": { ... },
  "dialect": { "country": "Lebanon", "region": "Akkar" },
  "speakers": [ { "age": 45, "gender": "Male" }, { "age": 38, "gender": "Female" } ],
  "recording_environment": { "location_type": "Outdoor farm" },
  "conversation_context": { "topic": "Agriculture", "code_switching": true },
  "annotations": { "face_visible": true, "emotion_tags": ["happy", "curious"] },
  "consent": { "form_signed": true }
}
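
As a rough illustration of how a record like the one above might be consumed in a training pipeline, here is a minimal Python sketch. It assumes the top-level keys in the sample are all required, which is not a documented schema, and the helper names `missing_keys` and `is_usable` are hypothetical. The contents of `file_paths` are elided in the sample, so the sketch leaves that object empty rather than guessing at it.

```python
import json

# Top-level keys taken from the sample record; treating them all as
# required is an assumption, not a published schema.
EXPECTED_KEYS = {
    "recording_id", "file_paths", "dialect", "speakers",
    "recording_environment", "conversation_context",
    "annotations", "consent",
}


def missing_keys(record: dict) -> set:
    """Return the expected top-level keys that the record lacks."""
    return EXPECTED_KEYS - set(record)


def is_usable(record: dict) -> bool:
    """Hypothetical filter: a complete record with a signed consent form."""
    if missing_keys(record):
        return False
    return record["consent"].get("form_signed") is True


# The sample record, with "file_paths" left empty because its contents
# are elided in the original example.
sample = json.loads("""
{
  "recording_id": "LEB_AKKAR_001",
  "file_paths": {},
  "dialect": { "country": "Lebanon", "region": "Akkar" },
  "speakers": [ { "age": 45, "gender": "Male" }, { "age": 38, "gender": "Female" } ],
  "recording_environment": { "location_type": "Outdoor farm" },
  "conversation_context": { "topic": "Agriculture", "code_switching": true },
  "annotations": { "face_visible": true, "emotion_tags": ["happy", "curious"] },
  "consent": { "form_signed": true }
}
""")

print(sample["recording_id"], "usable:", is_usable(sample))
print("speakers:", len(sample["speakers"]))
```

Checking consent before anything else keeps recordings without a signed release out of downstream processing; the actual fields and rules would depend on the delivered schema.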

Want to learn more or request a sample? Contact us.