Bibliša: Aligned Collection Search Tool


Creating a Synthetic Evaluation Dataset for the Serbian SentiWordNet / Saša Petalnikar (Serbian title: Izrada sintetičkog evaluativnog skupa podataka za Srpski SentiWordNet)



Excerpt: first 9 of 183 aligned sentences (English/Serbian parallel text)
ABSTRACT: This study presents the creation of a synthetic evaluation dataset for the Serbian SentiWordNet using Large Language Models (LLMs), focusing specifically on the Mistral model.
Given the scarcity of sentiment analysis resources for Serbian, the research aims to bridge this gap by generating a dataset for evaluating and improving Serbian sentiment analysis tools.
Sentiment polarity values from the English SentiWordNet were automatically mapped to the Serbian WordNet via the Inter-Lingual Index (ILI).
To refine these values so that they better fit the Serbian language, a dedicated evaluation dataset was created.
First, 500 synsets were selected from the Serbian WordNet based on their alignment with the senti-pol-sr lexicon and with the polarity values mapped from SentiWordNet.
These synsets were then classified by sentiment polarity using the Mistral model.
A balanced subset of 75 synsets was randomly extracted from them, refined with a finer-grained sentiment gradation, and manually reviewed.
The results show high model reliability: approximately 93% of the responses met the established acceptability criteria.
KEYWORDS: SentiWordNet, synthetic dataset, Large Language Models, Serbian, sentiment analysis
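The selection steps summarized in the abstract (ILI mapping, filtering by agreement with a lexicon, LLM labelling, balanced random sampling, and an acceptability rate) can be illustrated with a minimal Python sketch. It is not the author's pipeline. The toy data, the names `SWN_BY_ILI`, `SR_WORDNET`, `SENTI_POL_SR` and `CLASSES`, the three-class scheme, the rule that a synset with no lexicon entry counts as neutral, and the stub that replaces the Mistral call by reusing the mapped label are all hypothetical assumptions made for illustration.

```python
import random
from collections import defaultdict

# Hypothetical toy stand-ins; the real SentiWordNet, Serbian WordNet and
# senti-pol-sr resources have their own formats and identifiers.
SWN_BY_ILI = {"i1": (0.75, 0.0), "i2": (0.0, 0.625), "i3": (0.0, 0.0)}  # ILI id -> (pos, neg)
SR_WORDNET = {  # Serbian synset id -> (ILI id, literals)
    "srp-1": ("i1", ["dobar"]),
    "srp-2": ("i2", ["loš"]),
    "srp-3": ("i3", ["sto"]),
}
SENTI_POL_SR = {"dobar": "positive", "loš": "negative"}  # lemma -> polarity
CLASSES = ("positive", "negative", "neutral")  # assumed three-way scheme


def polarity(pos, neg):
    """Collapse SentiWordNet-style (pos, neg) scores into a coarse label."""
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"


def map_via_ili(sr_wordnet, swn_by_ili):
    """Copy English scores onto every Serbian synset that shares an ILI id."""
    return {
        sid: swn_by_ili[ili]
        for sid, (ili, _) in sr_wordnet.items()
        if ili in swn_by_ili
    }


def lexicon_label(literals, lexicon):
    """Lexicon label of a synset: no entry -> 'neutral' (assumption), conflict -> None."""
    labels = {lexicon[lit] for lit in literals if lit in lexicon}
    if not labels:
        return "neutral"
    return labels.pop() if len(labels) == 1 else None


def select_candidates(sr_wordnet, mapped, lexicon):
    """Keep synsets whose ILI-mapped label agrees with the lexicon label."""
    return [
        sid
        for sid, (_, literals) in sr_wordnet.items()
        if sid in mapped and polarity(*mapped[sid]) == lexicon_label(literals, lexicon)
    ]


def balanced_sample(labels, per_class, seed=0):
    """Draw the same number of synsets from each class, reproducibly."""
    by_class = defaultdict(list)
    for sid, lab in sorted(labels.items()):
        by_class[lab].append(sid)
    rng = random.Random(seed)
    sample = []
    for cls in CLASSES:
        pool = by_class[cls]
        if len(pool) < per_class:
            raise ValueError(f"only {len(pool)} synsets labelled {cls!r}")
        sample.extend(rng.sample(pool, per_class))
    return sample


def acceptability_rate(judgements):
    """Share of manually reviewed responses judged acceptable (True/False list)."""
    return sum(judgements) / len(judgements)


mapped = map_via_ili(SR_WORDNET, SWN_BY_ILI)
candidates = select_candidates(SR_WORDNET, mapped, SENTI_POL_SR)
# Hypothetical stub in place of the Mistral classification: reuse the mapped label.
llm_labels = {sid: polarity(*mapped[sid]) for sid in candidates}
subset = balanced_sample(llm_labels, per_class=1)
print(subset)  # ['srp-1', 'srp-2', 'srp-3']
```

Under the same three-class assumption, `per_class=25` would yield a balanced subset of 75 synsets; the abstract does not say how the classes were defined or how the 93% figure was counted, so `acceptability_rate` only shows the generic accepted-over-total ratio.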