ML for Mass Literacy – PaadaasML Project Update

Mass literacy presents a fertile ground for numerous Machine Learning (ML) applications aimed at enhancing education and accessibility. Some potential ML use cases for mass literacy include:

Personalized Learning Platforms: ML algorithms can analyze individual learning patterns, preferences, and strengths to tailor educational content and exercises to the specific needs of learners. These platforms can adapt to each user’s pace, style, and comprehension level, thereby enhancing learning outcomes.

Language Learning Apps: ML-powered language learning applications can provide personalized feedback on pronunciation, grammar, and vocabulary usage. These apps can leverage speech recognition and Natural Language Processing (NLP) to create immersive language learning experiences.

Content Recommendation Systems: ML algorithms can recommend relevant reading materials, educational videos, and interactive exercises based on learners’ interests, proficiency levels, and learning goals. Such systems can help learners discover new topics and engage with diverse educational content.

Automated Assessment and Feedback: ML models can automate the process of grading assignments, quizzes, and exams, providing immediate feedback to learners. These systems can also identify common errors and misconceptions, offering targeted interventions to address learning gaps.

Text-to-Speech (TTS) Systems: TTS systems powered by ML can convert written text into spoken language, making educational materials accessible to learners with visual impairments or reading difficulties. These systems can enhance the inclusivity of educational resources and promote literacy among diverse populations.

Language Translation Tools: ML-based translation tools can facilitate communication and learning across different languages. These tools can translate educational content, instructions, and discussions, enabling learners to access information in their native language or learn new languages more effectively.

Sentiment Analysis in Educational Content: ML algorithms can analyze the sentiment of educational content, identifying emotionally charged language or topics that may impact learners’ engagement and comprehension. Educators can use this insight to adapt their teaching strategies and create more engaging learning experiences.

Interactive Educational Games: ML techniques such as reinforcement learning can power interactive educational games that adapt to learners’ performance and preferences. These games can reinforce literacy skills, critical thinking, and problem-solving abilities in an engaging and entertaining manner.

Predictive Analytics for Dropout Prevention: ML models can analyze various factors contributing to student attrition and dropout rates, such as attendance patterns, academic performance, and socio-economic background. By identifying at-risk students early, educators and policymakers can intervene with targeted support initiatives to prevent dropout and promote literacy.

Virtual Tutoring Systems: ML-powered virtual tutoring systems can provide one-on-one support to learners, answering questions, explaining concepts, and guiding them through challenging topics. These systems can supplement traditional classroom instruction and offer personalized assistance outside formal learning environments.

The PaadaasML Project at GS Lab | GAVS

The team of data science and AI/ML experts at GS Lab | GAVS has been working on several such ML use cases for mass literacy. Two previous blog posts, https://www.gslab.com/ai-ml-datascience/machine-learning-for-basic-literacy-and-numeracy/ and https://www.gslab.com/ai-ml-datascience/ml-use-cases-for-mass-literacy/, discussed how the team narrowed these down to one use case: practicing times tables. The project is named PaadaasML because Paadaas is the Marathi word for times tables. Those posts also touched on the team's multi-pronged approach to the solution, which included choosing the ML model, the application and ML frameworks, and other aspects of the ML pipeline such as training and inference. This follow-up post provides further updates on the project.

ML Model

After extensive exploration of suitable models for the project, the following options were considered:

Google Speech Commands model, based on a TensorFlow CNN: this is the model that was finalized.
Google Speech Recognition: the quality is excellent, but it requires good, continuous network connectivity, which cannot be guaranteed in this case.
Whisper Large Marathi model: at 6GB, it is too large to fit on the modest phones that rural children are expected to have.
wav2vec2-xlsr-marathi model: it is only about 2GB, but it did not meet the quality requirements.
Shazam fingerprinting, which is typically used for identifying music: it did not work because the audio samples in this case are very short (only 3 or 4 seconds long) and fall in a narrow frequency range.
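As a rough illustration of the kind of input a CNN-based audio classifier typically consumes, the sketch below turns a waveform into a magnitude spectrogram using NumPy. It is not the project's actual preprocessing code: the function name, the 16 kHz sample rate, and the frame parameters are assumptions chosen only for this example.

```python
import numpy as np


def spectrogram(waveform, frame_length=255, frame_step=128, fft_length=256):
    """Return a (frames, frequency_bins) magnitude spectrogram.

    All parameter defaults are illustrative assumptions, not values
    taken from the PaadaasML project.
    """
    waveform = np.asarray(waveform, dtype=np.float64)
    if waveform.size < frame_length:
        raise ValueError("waveform is shorter than one frame")

    # Slice the signal into overlapping frames.
    num_frames = 1 + (waveform.size - frame_length) // frame_step
    starts = frame_step * np.arange(num_frames)[:, None]
    offsets = np.arange(frame_length)[None, :]
    frames = waveform[starts + offsets]

    # Taper each frame, then take the magnitude of its real FFT.
    windowed = frames * np.hanning(frame_length)
    return np.abs(np.fft.rfft(windowed, n=fft_length, axis=1))
```

The resulting 2-D array can be treated like a single-channel image, which is what makes a CNN a natural fit for short spoken-number clips.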

More information and demos are available at:

Model Tuning

This model can be tuned further by:

Preprocessing the data as outlined in https://www.kaggle.com/code/christianlillelund/classify-mnist-audio-using-spectrograms-keras-cnn/notebook#Modelling
Leveraging hierarchical models, such as separate models for units, tens, etc.
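One possible reading of the hierarchical idea is sketched below: two small classifiers, one for the tens digit and one for the units digit, each produce a probability per digit, and their outputs are combined into a single number. This is only an assumption about how such a hierarchy could be wired up. The function name and the independence assumption behind the combined confidence are hypothetical, and spoken Marathi numbers may not decompose this cleanly.

```python
def combine_tens_units(tens_probs, units_probs):
    """Combine two per-digit probability lists into (number, confidence).

    tens_probs and units_probs are hypothetical classifier outputs,
    each a list of 10 probabilities indexed by digit 0-9.
    """
    if len(tens_probs) != 10 or len(units_probs) != 10:
        raise ValueError("expected 10 probabilities per digit position")

    # Pick the most likely digit for each position.
    tens = max(range(10), key=lambda d: tens_probs[d])
    units = max(range(10), key=lambda d: units_probs[d])

    # Treat the two predictions as independent (a simplifying assumption).
    confidence = tens_probs[tens] * units_probs[units]
    return 10 * tens + units, confidence
```

A low combined confidence could be used to ask the learner to repeat the answer rather than marking it wrong.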

Model Publishing

These models have been published on the sites listed below, where they have been well received:

Kaggle – https://www.kaggle.com/models/sameersmahajan/marathi-numbers
TensorFlow tfhub.dev – https://github.com/tensorflow/tfhub.dev/tree/master/assets/docs/sameermahajan

Improving Model Accuracy

The team has found that the spectrograms are sensitive to the speakers as well as to the sessions in which the recordings are made. Hence, the team has decided to diversify samples across these two criteria. This will help improve the accuracy of the models and make them speaker-agnostic. In cases where they cannot be made completely speaker-agnostic, the team might consider cohort-based models, for example by age or gender. Field trials will help improve model accuracy further.
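One common way to check whether a model is really speaker-agnostic is to evaluate it on speakers it never saw during training. The sketch below shows such a split in plain Python. The record layout (dictionaries with "speaker" and "session" keys) and the function name are assumptions made for illustration, not the project's data format.

```python
import random


def split_by_speaker(recordings, test_fraction=0.2, seed=0):
    """Split recordings so that no speaker appears in both sets.

    Each recording is assumed to be a dict with at least a "speaker" key.
    """
    speakers = sorted({r["speaker"] for r in recordings})
    rng = random.Random(seed)
    rng.shuffle(speakers)

    # Hold out at least one speaker for testing.
    n_test = max(1, round(len(speakers) * test_fraction))
    test_speakers = set(speakers[:n_test])

    train = [r for r in recordings if r["speaker"] not in test_speakers]
    test = [r for r in recordings if r["speaker"] in test_speakers]
    return train, test
```

The same approach could be applied to the session field to measure how sensitive a model is to recording conditions.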

Additional Work Done

A Kivy app integrated with Google speech recognition that can be tried out.
A Flask web app integrated with Google speech recognition that can also be tried out. The team is currently looking to host it so that it can be tried out directly. The link will be updated here when that is done.
A Kivy Android app that can work in conjunction with Marathi voice keyboards – Padaas 0.1 (youtube.com).
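When the app relies on a voice keyboard, the learner's answer arrives as text rather than audio. A minimal sketch of how such text could be checked against the expected product is shown below, assuming the keyboard produces digit strings. The function name is hypothetical. Python's built-in int() accepts Unicode decimal digits, so Devanagari numerals such as "४२" parse without extra work.

```python
def check_answer(multiplicand, multiplier, recognized_text):
    """Return True if the recognized text equals multiplicand * multiplier.

    recognized_text is assumed to be a digit string, in either
    Western ("42") or Devanagari ("४२") numerals.
    """
    expected = multiplicand * multiplier
    try:
        answer = int(recognized_text.strip())
    except ValueError:
        # Anything that is not a plain number counts as a wrong answer here.
        return False
    return answer == expected
```

Answers spoken as number words would need a separate mapping from words to values, which this sketch does not attempt.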

Work to be Done

Add some UI – https://lnkd.in/dU-E6PFt. You can interact with the hosted Flask app version at https://lnkd.in/dwtJd_Ht to get a sense of its look and feel (please bear with its speech recognition, which is currently way off).
Build a lightweight, small-footprint offline version of the PaadaasML app (https://lnkd.in/dcxtwYQ2) to reach those whose devices have very low configurations or poor network connectivity.

The Team

Jairav Desai, Krushnakant Mardikardeshmukh, Pragati Pailwan, Mayuri Nehate, Vineet Raina, Selina Arokiaswamy, Aniruddha Madurwar, Jayant Seth, and Shivaji Mutkule

Author

Sameer Mahajan, Principal Architect
GS Lab | GAVS

Sameer Mahajan has 27 years of experience in the software industry. He has worked for companies like Microsoft and Symantec across areas like machine learning, storage, cloud, big data, networking and analytics in the United States & India.

Sameer holds 9 US patents and is an alumnus of IIT Bombay and Georgia Tech. He conducts hands-on workshops and seminars and also participates in panel discussions on emerging technologies such as machine learning and big data. Sameer is one of the mentors for the Machine Learning Foundations course at Coursera.