We are excited to announce that a new research paper co-authored by Sardar Omar Salih (Duhok Polytechnic University) and Dr. Karwan Jacksi (University of Zakho, Director of the Semantic Web Lab) has been published in the journal Data in Brief (Elsevier, Scopus Q1).
Title: KSTRV1: A Scene Text Recognition Dataset for Central Kurdish in (Arabic-Based) Script
This publication introduces KSTRV1, the first large-scale Scene Text Recognition (STR) dataset tailored for the Central Kurdish language, addressing a critical gap in AI resources for Arabic-based scripts.
What's in KSTRV1?
- 1,420 real-world scene images
- 19,872 manually cropped word samples in Kurdish (Sorani and Badini dialects), Arabic, and English
- 20,000 synthetic samples with diverse visual styles (fonts, distortions, lighting, backgrounds)
- Detailed annotations to support STR model training and benchmarking
Why It Matters:
KSTRV1 captures the multilingual landscape of the Kurdistan Region and supports computer vision research on underrepresented languages and scripts. This is a significant step toward making Kurdish more accessible in AI-powered visual applications.
This is another notable achievement of the Semantic Web Lab (SWLab) at the University of Zakho, showcasing its ongoing efforts in developing datasets, tools, and technologies for the Kurdish language.
Congratulations to the authors on their impactful contribution to Kurdish language AI research!