Multimodal Data
Our dataset offers video + audio + transcript bundles, enriched with detailed metadata for AI training. Here’s what each bundle includes:
Data Package
- Audio & Video: 1080p video, clear audio, natural settings
- Transcripts: Dialect + MSA + English + Phonetic versions
- Metadata: Location, dialect, demographics, environment
- Contextual Tags: Topic, sentiment, emotion, code-switching, non-verbal cues
- Annotations: Gestures, facial expressions, lighting
- Consent: Verbal consent + signed model release
- Diarization: Speaker turns with timestamps
Use Cases for AI and Research
- ASR training and evaluation on Levantine, Iraqi, and Yemeni Arabic.
- Diarization, speaker change detection, and conversation flow analysis.
- Emotion recognition and sentiment analysis grounded in cultural context.
- Multimodal fusion experiments combining lip movements, gestures, and speech.
- Code-switching detection between dialectal Arabic, MSA, and English.
- Conversational agents tuned for real-world background noise and interruptions.
Metadata Example
{
  "recording_id": "LEB_AKKAR_001",
  "file_paths": { ... },
  "dialect": { "country": "Lebanon", "region": "Akkar" },
  "speakers": [ { "age": 45, "gender": "Male" }, { "age": 38, "gender": "Female" } ],
  "recording_environment": { "location_type": "Outdoor farm" },
  "conversation_context": { "topic": "Agriculture", "code_switching": true },
  "annotations": {
    "face_visible": true,
    "emotion_tags": ["happy", "curious"],
    "sampling_rate_hz": 16000,
    "audio_format": "wav",
    "transcript_formats": ["dialect_ar", "msa", "en"],
    "split": "train"
  },
  "consent": { "form_signed": true }
}
This JSON snippet mirrors the schema we share with clients, making it easy to filter by dialect, environment, split, or transcript format and to plug the data directly into ML pipelines. The "split" field reflects the planned split structure for releases.
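As a rough illustration of that kind of filtering, here is a minimal Python sketch. It assumes the metadata has already been loaded into a list of dictionaries that use the field names from the example above. The helper name filter_records and the second record (HYPOTHETICAL_002) are made up for illustration and are not part of the delivered dataset or any official tooling.

```python
from typing import Any, Optional

# Illustrative records only. The first reuses values from the metadata example;
# the second is a hypothetical placeholder added so the filter has something to exclude.
SAMPLE_RECORDS: list[dict[str, Any]] = [
    {
        "recording_id": "LEB_AKKAR_001",
        "dialect": {"country": "Lebanon", "region": "Akkar"},
        "recording_environment": {"location_type": "Outdoor farm"},
        "annotations": {
            "transcript_formats": ["dialect_ar", "msa", "en"],
            "split": "train",
        },
    },
    {
        "recording_id": "HYPOTHETICAL_002",  # not a real recording ID
        "dialect": {"country": "Iraq", "region": "(placeholder)"},
        "recording_environment": {"location_type": "Indoor home"},
        "annotations": {
            "transcript_formats": ["dialect_ar", "en"],
            "split": "test",
        },
    },
]


def filter_records(
    records: list[dict[str, Any]],
    country: Optional[str] = None,
    location_type: Optional[str] = None,
    split: Optional[str] = None,
    transcript_format: Optional[str] = None,
) -> list[dict[str, Any]]:
    """Return the records that match every criterion that is not None."""
    selected = []
    for rec in records:
        annotations = rec.get("annotations", {})
        if country is not None and rec.get("dialect", {}).get("country") != country:
            continue
        env = rec.get("recording_environment", {})
        if location_type is not None and env.get("location_type") != location_type:
            continue
        if split is not None and annotations.get("split") != split:
            continue
        formats = annotations.get("transcript_formats", [])
        if transcript_format is not None and transcript_format not in formats:
            continue
        selected.append(rec)
    return selected


if __name__ == "__main__":
    # Lebanese training recordings that include an MSA transcript.
    matches = filter_records(
        SAMPLE_RECORDS, country="Lebanon", split="train", transcript_format="msa"
    )
    print([rec["recording_id"] for rec in matches])
```

Missing keys are treated as non-matches rather than errors, so a record that lacks, say, a split value is simply skipped whenever a split filter is requested.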
Want to learn more or request a sample? Contact us or see more technical details on our AI Dataset page.