Implementation of The Indonesian Language Stemming Algorithm in Twitter Data Preprocessing. Case Study: Twitter Wargabanua and Instakalsel

Afian Syafaadi Rizki
Nina Mia Aristi
M. Najamudin Ridha
Aidil Fajar Zulfahri
Dwi Agung Wibowo

Abstract

Stemming is a widely used technique in Natural Language Processing (NLP). It normalizes words that share a meaning but differ in form by reducing them to a common base, or root, form. Stemming is typically applied during data preprocessing to improve the performance of NLP systems. For Indonesian, the Nazief stemming algorithm is the most commonly used, and it has been developed further and adapted for several regional languages of Indonesia. This study assesses the performance of the Nazief stemming algorithm on Twitter data from the accounts @wargabanua and @instakalsel. The goal is to evaluate how the algorithm handles text that mixes two languages, Indonesian and Banjar. The tests yield an accuracy of 90.34%, which indicates that the Nazief stemming algorithm can process this social media text effectively even though it was not originally designed for the Banjar language.
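To make the workflow concrete, the following is a minimal sketch of dictionary-based affix stripping applied to tweet text, followed by an accuracy check. It is not the full Nazief algorithm and not the authors' implementation: the root dictionary, the affix lists, the cleaning rules, and the function names (`tokenize`, `stem`, `accuracy`) are all illustrative assumptions, and accuracy is assumed here to mean the share of words whose stem matches a hand-labelled root.

```python
import re

# Hypothetical mini-dictionary of root words (assumption for illustration);
# a real dictionary-based stemmer relies on a large root-word lexicon.
ROOTS = {"makan", "baca", "buku", "tulis", "ajar", "jalan"}

# Simplified affix lists (assumption): inflectional particles, possessive
# pronouns, derivational suffixes, and a few derivational prefixes.
PARTICLES = ("lah", "kah", "tah", "pun")
POSSESSIVES = ("nya", "ku", "mu")
DERIV_SUFFIXES = ("kan", "an", "i")
PREFIXES = ("meng", "mem", "men", "me", "ber", "be", "di", "ke", "se",
            "ter", "te", "peng", "pem", "pen", "pe")


def tokenize(tweet):
    """Lowercase a tweet, drop URLs, mentions and hashtags, return word tokens."""
    text = re.sub(r"https?://\S+|[@#]\w+", " ", tweet.lower())
    return re.findall(r"[a-z]+", text)


def _strip_suffix(word, suffixes):
    """Remove the first matching suffix, keeping at least two characters."""
    for suffix in suffixes:
        if word.endswith(suffix) and len(word) - len(suffix) >= 2:
            return word[:-len(suffix)]
    return word


def stem(word):
    """Return the dictionary root of a word, or the word unchanged if none is found."""
    word = word.lower()
    if word in ROOTS:
        return word
    # Inflectional endings first: particle, then possessive pronoun.
    base = _strip_suffix(_strip_suffix(word, PARTICLES), POSSESSIVES)
    if base in ROOTS:
        return base
    # Try prefixes on the base, then again after removing a derivational suffix.
    for candidate in (base, _strip_suffix(base, DERIV_SUFFIXES)):
        if candidate in ROOTS:
            return candidate
        for prefix in PREFIXES:
            if candidate.startswith(prefix) and candidate[len(prefix):] in ROOTS:
                return candidate[len(prefix):]
    # Words outside the dictionary (for example Banjar words) are left untouched.
    return word


def accuracy(labelled_pairs):
    """Fraction of (word, expected_root) pairs that the stemmer gets right."""
    correct = sum(1 for word, expected in labelled_pairs if stem(word) == expected)
    return correct / len(labelled_pairs)


if __name__ == "__main__":
    tweet = "Ulun membaca bukunya @wargabanua https://t.co/x"
    print([stem(token) for token in tokenize(tweet)])
    gold = [("membaca", "baca"), ("dimakannya", "makan"),
            ("tulisan", "tulis"), ("pelajaran", "ajar")]
    print(accuracy(gold))
```

In this sketch the Banjar word "ulun" passes through unchanged because it is not in the dictionary, while "pelajaran" is counted as an error because the toy prefix list has no rule for "pel-". Both behaviours hint at why mixed-language text and incomplete rules lower measured accuracy.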

Article Details

How to Cite
A. S. Rizki, N. M. Aristi, M. N. Ridha, A. F. Zulfahri, and D. A. Wibowo, “Implementation of The Indonesian Language Stemming Algorithm in Twitter Data Preprocessing. Case Study: Twitter Wargabanua and Instakalsel”, Fidelity, vol. 5, no. 3, pp. 175-183, Sep. 2023.
Section
Articles
Received 2023-07-12
Published 2023-09-30
