Popović, 2010, vol. XI:2, ID: 1.2010.2.2

En/De/Fr/It - (first 9 out of 311 sentences)		Srpski - (prvih 9 od 311 rečenica)
n1	TAGGERS APPLIED ON TEXTS IN SERBIAN*	n1	PROGRAMI ZA ETIKETIRANJE TEKSTA NA SRPSKOM JEZIKU*
n2	Zoran Popovic** Hemofarm, STADA	n2	Zoran Popović** Hemofarm, STADA
n3	Abstract: This paper provides a comparative overview of existing language tools based on taggers and machine learning methods, with practical tests and results for different taggers applied to texts in Serbian.	n3	Apstrakt: ovaj tekst daje uporedni pregled postojećih jezičkih alata, odnosno programa za etiketiranje, zasnovanih pre svega na metodama mašinskog učenja, uz konkretne testove i rezultate različitih programa nad tekstom na srpskom jeziku.
n4	For that purpose, previously prepared annotated corpora were used, with 10-fold cross-validation as the testing framework, in a specially developed automated testing environment based on Unix scripting (bash, perl, awk); TnT showed the best performance, while the Tree Tagger and SVMTool taggers performed somewhat better in special cases.	n4	U tu svrhu su korišćeni već pripremljeni etiketirani korpusi i desetostruka unakrsna provera (10-fold cross-validation), i posebno razvijen postupak automatizovanog testiranja realizovanog unix skriptovima (bash, perl, awk) - TnT je pokazao najbolje performanse, dok su se Tree Tagger i SVMTool pokazali uspešnijim u nekim specijalnim slučajevima.
n5	The possibility of combining different tagging methods and tools (programs), together with integration into other NLP environments, opens a wide area for further investigation of and experiments with these solutions.	n5	Mogućnost uparivanja različitih metoda i programa za etiketiranje, kao i integracija sa drugim okruženjima za OPJ otvaraju mogućnost daljih ispitivanja ovakvih rešenja.
n6	Keywords: tagging, tagger, PoS, machine learning, NLP, Computational Linguistics, CL	n6	Ključne reči: etiketiranje, tagger, PoS, mašinsko učenje, NLP, računska lingvistika
n7	1. Introduction - two paradigms of Computational Linguistics	n7	1. Uvod - dve paradigme računske lingvistike
n8	Natural Language Processing (NLP), as an area of Computational Linguistics, usually involves processes that are very complex in terms of computability and processing time.	n8	Obrada prirodnog jezika (OPJ, tj. NLP, Natural Language Processing) kao jedne od oblasti računske lingvistike (Computational Linguistics) podrazumeva najčešće veoma složene postupke u smislu vremena izvršavanja.
n9	It consists of phases such as lexical analysis (segmentation and tokenization of input speech or text, which starts with finding beginnings and ends of sentences or words, and detecting general lexical categories - future lexemes or tokens: numbers, punctuation characters, words, HTML tags, and similar), morpho-syntactic analysis (structure of a word, sentence or text), and finally, semantic analysis or even pragmatic analysis.	n9	Obrada prirodnog jezika se deli u faze, koje ne moraju u svakoj aplikaciji biti primenjene, kao što su leksička analiza (segmentacija i tokenizacija ulaznog govora ili teksta, gde se najpre određuje početak i kraj rečenica i reči, kao i osnovne leksičke klase - buduće lekseme, ili tokeni: brojevi, interpunkcija, reči, HTML etikete, i slično), morfo-sintaksna analiza (struktura reči, rečenica i teksta), i na kraju semantička analiza ili čak analiza pragmatike.
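
The lexical-analysis phase described in segment n9 can be illustrated with a minimal sketch. It is not the tokenizer used in the paper: the token class names, the regular expressions and the `tokenize` helper are all assumptions made for illustration, covering only the general categories the text lists (HTML tags, numbers, words, punctuation).

```python
import re

# Hypothetical token classes mirroring the categories named in the text.
# Order matters: an HTML tag must be tried before single punctuation marks.
TOKEN_PATTERN = re.compile(
    r"""
    (?P<html_tag>    </?[A-Za-z][^>]*> )   # e.g. <p>, </p>
  | (?P<number>      \d+(?:[.,]\d+)* )     # e.g. 311, 3.14
  | (?P<word>        [^\W\d_]+ )           # runs of letters, incl. č, ć, š, ž, đ
  | (?P<punctuation> [^\w\s] )             # any other single non-space symbol
    """,
    re.VERBOSE,
)


def tokenize(text):
    """Return (token_class, token_text) pairs in order of appearance."""
    return [(m.lastgroup, m.group()) for m in TOKEN_PATTERN.finditer(text)]


if __name__ == "__main__":
    for token_class, token in tokenize("<p>Tekst ima 311 rečenica.</p>"):
        print(token_class, token)
```

A tagger would then assign a part-of-speech label to each `word` token; sentence segmentation, which the text also mentions, is left out of this sketch.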

Bibliša: Aligned Collection Search Tool
