Utvić, 2011, XII:2, ID: 1.2011.2.3[About]

En/De/Fr/It- (first 9 out of 210 sentences) [pdf]		Srpski - (prvih 9 od 210 rečenica) [pdf]
n1	Annotating the Corpus of Contemporary Serbian	n1	ANOTACIJA KORPUSA SAVREMENOG SRPSKOG JEZIKA
n2	Miloš Utvić, Faculty of Philology, University of Belgrade, Department of Library and Information Science	n2	Miloš Utvić, Filološki fakultet, Univerzitet u Beogradu, Katedra za bibliotekarstvo i informatiku
n3	This paper presents research results achieved during 2011 with the support of the Serbian Ministry of Education and Science under grant 178006 (Serbian Language and its Resources) and of the CESAR project, part of the wider META-NET network of excellence funded by the European Union.	n3	Ovaj rad prikazuje rezultate postignute tokom 2011. godine u okviru projekta Srpski jezik i njegovi resursi (178006) koji finansira Ministarstvo prosvete Republike Srbije i projekta CESAR kao dela šire mreže projekata META-NET koju finansira Evropska unija.
n4	Abstract:	n4	Apstrakt
n5	This article describes the stages (preparation and implementation) in the annotation of the 113-million-word Corpus of Contemporary Serbian.	n5	Ovaj tekst opisuje pripremu i realizaciju anotacije Korpusa savremenog srpskog jezika veličine 113 miliona reči.
n6	Annotation has been carried out at several levels.	n6	Anotacija je sprovedena na nekoliko nivoa.
n7	Corresponding bibliographical information is attached to each corpus text.	n7	Svakom tekstu korpusa je pridružena odgovarajuća bibliografska informacija.
n8	A part-of-speech (PoS) tagset has been prepared, based on the electronic morphological dictionary of Serbian, together with a dictionary of possible annotations adapted for TreeTagger, the PoS tagging system.	n8	Na osnovu elektronskog morfološkog rečnika srpskog jezika pripremljen je skup etiketa za vrste reči, kao i rečnik za anotaciju prilagođen programu za etiketiranje TreeTagger.
n9	The Corpus of Contemporary Serbian has been automatically morphosyntactically annotated with the TreeTagger software, i.e. information about part of speech and lemma has been attached to each corpus word form. TreeTagger was trained on INTERA, a manually tagged corpus of about one million words.	n9	Korišćenjem programa TreeTagger i ručno anotiranog korpusa INTERA veličine oko milion reči, izvršena je automatska morfosintaksička anotacija Korpusa savremenog srpskog jezika, tj. korpusnim rečima je pridružena informacija o vrsti reči i lemi.
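
As a rough illustration of what "part of speech and lemma attached to each word form" can look like in practice, the sketch below reads tagger output in the three-column, tab-separated layout (word form, tag, lemma) that TreeTagger writes when asked to print tokens and lemmas. The sample sentence, the tag labels (N, V, A, PUNCT), and the names Token and parse_tagged are assumptions made for this example; they are not taken from the paper's tagset or from the corpus itself.

```python
from typing import List, NamedTuple


class Token(NamedTuple):
    """One annotated corpus position: word form, PoS tag, lemma."""
    form: str
    pos: str
    lemma: str


def parse_tagged(text: str) -> List[Token]:
    """Parse tab-separated 'form<TAB>tag<TAB>lemma' lines into Token records.

    Lines that do not have exactly three columns (for example markup lines
    passed through by the tagger, or blank lines) are skipped.
    """
    tokens = []
    for line in text.splitlines():
        parts = line.split("\t")
        if len(parts) != 3:
            continue
        tokens.append(Token(*parts))
    return tokens


# Hypothetical sample; the tags are illustrative placeholders only.
SAMPLE = (
    "Korpus\tN\tkorpus\n"
    "je\tV\tjesam\n"
    "anotiran\tA\tanotiran\n"
    ".\tPUNCT\t.\n"
)

if __name__ == "__main__":
    for tok in parse_tagged(SAMPLE):
        print(f"{tok.form:10} {tok.pos:6} {tok.lemma}")
```

Keeping each position as a (form, tag, lemma) record makes it straightforward to query the corpus by lemma or by part of speech rather than by surface form alone.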

Bibliša: Aligned Collection Search Tool
