Pioneering Research for
African Languages

At Msingi AI, we're revolutionizing natural language processing for African languages through innovative research approaches that combine technical excellence with cultural understanding.

Active Research Projects

Explore our ongoing research initiatives aimed at democratizing AI access across Africa.

Active

Sauti Ya Kenya

Building authentic Kenyan Swahili Text-to-Speech that truly captures the natural flow and character of our language, setting a new standard for African language voice technology.

TTS Swahili Voice AI
Read More →
March 2025
Active

Bias & Fairness in African AI Models

Identifying and mitigating biases in NLP models trained on African languages to ensure AI tools work equitably for all African users.

  • Auditing popular NLP models for bias
  • Developing fairness metrics for multilingual contexts
  • Creating bias-aware dataset curation strategies
NLP Bias Ethics Fairness
Read More →
March 2025
Active

MsingiAI Tokenizers

Developing specialized tokenizers for African languages that handle complex morphology, tonal systems, and linguistic variations with high efficiency.

Key Features

  • Language-specific tokenization optimization
  • Code-switching & multilingual support
  • Efficient morphological processing
NLP Tokenization African Languages
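To make the tokenization idea concrete, here is a minimal, illustrative byte-pair-encoding (BPE) merge loop in pure Python. It is a toy sketch, not the actual MsingiAI tokenizer: it shows how frequent subword pairs get merged, so recurring Bantu noun-class prefixes (wa-, ki-, vi-) tend to surface as subword units.

```python
from collections import Counter

def get_pair_counts(words):
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs

def merge_pair(words, pair):
    """Merge every occurrence of `pair` into a single symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i < len(symbols) - 1 and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged[tuple(out)] = freq
    return merged

# Toy Swahili corpus: noun-class prefixes recur across words,
# so BPE merges tend to recover them as shared subword units.
corpus = ["watu", "watoto", "kitabu", "vitabu", "kiti", "viti"]
words = Counter(tuple(w) for w in corpus)

merges = []
for _ in range(6):
    pairs = get_pair_counts(words)
    if not pairs:
        break
    best = max(pairs, key=pairs.get)
    merges.append(best)
    words = merge_pair(words, best)

print(merges)
```

A production tokenizer would add vocabulary management, byte-level fallback, and the code-switching handling described above; this sketch only shows the core merge step.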

Msingi1: Advanced Swahili Language Model

Our flagship open-source Swahili language model, built as a decoder-only transformer. Designed to run efficiently in resource-constrained environments while delivering high-quality Swahili text generation and understanding.

Current Architecture

  • 12 layers, hidden size 768, 12 attention heads
  • ~110M parameters (with 32K vocabulary)
  • 2048 token context length
  • Rotary Position Embeddings (RoPE)
  • Pre-norm transformer with GELU activation
NLP Transformer PyTorch Swahili
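The stated figures can be sanity-checked with a back-of-envelope parameter count. This sketch assumes tied input/output embeddings, a 4x MLP expansion, a 32,000-token vocabulary, and no bias terms (all assumptions, not confirmed specifics of Msingi1; RoPE adds no learned parameters):

```python
# Back-of-envelope parameter count for the stated Msingi1 config.
# Assumptions (not confirmed by the project): tied input/output
# embeddings, 4x MLP expansion, 32,000-token vocab, no bias terms.
n_layers, d_model, vocab = 12, 768, 32_000

embed = vocab * d_model                      # token embedding (tied with output head)
attn_per_layer = 4 * d_model * d_model       # Q, K, V, and output projections
mlp_per_layer = 2 * d_model * (4 * d_model)  # up- and down-projections
per_layer = attn_per_layer + mlp_per_layer

total = embed + n_layers * per_layer
print(f"{total / 1e6:.1f}M parameters")  # → 109.5M parameters
```

That lands at roughly 110M parameters, consistent with the architecture summary above.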
Experimental

Kolmogorov-Arnold Networks for African NLP

Exploring novel neural architectures with learnable activation functions to improve efficiency and performance in low-resource African language tasks.

Key Innovations

  • Smooth, learnable activation functions
  • Enhanced model expressivity
  • Improved interpretability
Neural Networks Low-Resource NLP Research
Read Paper →
March 2025
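The core KAN idea — replacing fixed activations with learnable univariate functions on each edge — can be sketched in a few lines. This toy uses Gaussian radial-basis bumps rather than the B-splines of the original KAN formulation, and all names and values are illustrative, not from this project:

```python
import math

def rbf_basis(x, centers, width=1.0):
    """Gaussian bumps on a fixed grid: building blocks of one edge function."""
    return [math.exp(-((x - c) / width) ** 2) for c in centers]

def edge_fn(x, coeffs, centers):
    """phi(x) = sum_k c_k * basis_k(x): one learnable univariate function."""
    return sum(c * b for c, b in zip(coeffs, rbf_basis(x, centers)))

def kan_layer(inputs, edge_coeffs, centers):
    """Each output sums a *different* learned function of each input,
    instead of applying one fixed nonlinearity to a weighted sum."""
    return [
        sum(edge_fn(x, edge_coeffs[j][i], centers) for i, x in enumerate(inputs))
        for j in range(len(edge_coeffs))
    ]

centers = [-2, -1, 0, 1, 2]
# 2 outputs x 3 inputs, each edge with 5 trainable coefficients
# (initialised by hand here; training would fit them by gradient descent).
coeffs = [[[0.1 * (i + j + k) for k in range(5)] for i in range(3)]
          for j in range(2)]
out = kan_layer([0.5, -1.0, 2.0], coeffs, centers)
print(out)
```

Because each edge carries its own smooth function, the learned shapes can be plotted and inspected directly, which is the interpretability benefit noted above.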
New

Small-Scale Pretraining of Language Models

Developing efficient techniques for pretraining language models with limited computational resources, making AI more accessible to researchers in low-resource settings.

  • Optimizing training for limited hardware
  • Scaling laws for small models
  • Data efficiency techniques
LLMs Pretraining Efficiency
Learn More →
April 2025
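For a sense of what "small-scale" pretraining budgets look like, two widely used heuristics are the C ≈ 6·N·D approximation for training FLOPs and the Chinchilla-style rule of thumb of roughly 20 tokens per parameter. The sketch below applies them to a 110M-parameter model; the GPU throughput figure is an assumption for illustration, not this project's actual training plan:

```python
# Rough pretraining budget for a small model, using two common
# heuristics: C ≈ 6 * N * D training FLOPs, and a Chinchilla-style
# ~20 tokens per parameter. Figures are illustrative only.
n_params = 110e6                 # model size (e.g. a Msingi1-scale model)
tokens = 20 * n_params           # compute-optimal-ish token budget
flops = 6 * n_params * tokens    # total training FLOPs

gpu_flops = 30e12                # assumed sustained throughput of one modest GPU
seconds = flops / gpu_flops
print(f"tokens: {tokens/1e9:.1f}B, FLOPs: {flops:.2e}, "
      f"~{seconds/3600:.0f} GPU-hours at 30 TFLOP/s")
```

At this scale the estimate comes out to tens of GPU-hours rather than GPU-years, which is what makes pretraining feasible for researchers with limited hardware.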

More Projects Coming Soon

Stay tuned for additional research initiatives

Featured Project: Msingi1

Our flagship open-source Swahili language model built to democratize AI access for Swahili speakers across East Africa.

Flagship Project

Advanced Swahili Language Model

Msingi1 is our state-of-the-art decoder-only transformer language model designed specifically for Swahili. With 12 layers, a hidden size of 768, 12 attention heads, and approximately 110M parameters, it delivers strong performance while remaining efficient enough to run on modest hardware.

The model features a 2048 token context length, Rotary Position Embeddings (RoPE), pre-norm transformer architecture with GELU activation, and optimizations like Flash Attention and gradient checkpointing for efficient training and inference.

Join Our Research Community