Exploring Mechanisms for Detecting Violent Content in Sinhala Image Posts: Rationale with Unsupervised vs Supervised Techniques


  • U Dikwatta, Department of Computer Science, Faculty of Applied Sciences, University of Sri Jayewardenepura, Sri Lanka
  • TGI Fernando, Department of Computer Science, Faculty of Applied Sciences, University of Sri Jayewardenepura, Sri Lanka
  • MKA Ariyaratne, Faculty of Information Technology and Communication Sciences, Tampere University, Finland


This research explores different machine learning approaches to classifying Sinhala image posts. Image posts on social media, which combine visuals and text, are a powerful means of conveying information directly to people. Research on English-language content is common in this area, but only a handful of studies address other languages. The target language here is Sinhala, a low-resource language. Unsupervised algorithms were used to classify the image posts, and supervised algorithms were used to classify text manually extracted from them. In both cases, the classification decides whether a post is violent or nonviolent. The trained supervised models were then examined with interpretability models to identify the words that drive the violent or nonviolent decision. The findings reveal that supervised algorithms perform better than unsupervised algorithms in classifying image posts. However, results could be improved further by increasing the size and variety of the dataset.


