
ViterbiSegment Segmentation in HanLP

Posted: 2019-08-05 10:31
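This article focuses on HanLP's ViterbiSegment segmenter class. It does not cover the perceptron or CRF segmenters, and it does not cover the character-based segmenters, because those are not the ones we commonly use in practice. ViterbiSegment, by contrast, is the segmenter the HanLP author wraps directly in the HanLP class. Some segmenters used for other natural language processing tasks also use ViterbiSegment indirectly.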

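The article also explains where each segmentation dictionary file is used. It addresses a question users often raise on GitHub: why a custom dictionary does not seem to take effect. The article is fairly long, so you may want to bookmark it and read it later.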
1. Segmentation configuration variables
The segmentation-related variables are defined in the Config class. They are listed below:

[Figure 1: segmentation-related variables in the Config class]


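When are these variables set? Naturally, before segmentation starts. The figure below shows the inheritance hierarchy of HanLP's ViterbiSegment class.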

[Figure 2: inheritance hierarchy of ViterbiSegment]


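The inheritance diagram shows that creating a ViterbiSegment object invokes the Segment() constructor, and that constructor creates the config object. The variables in config are public, so they can be modified directly from outside the ViterbiSegment object. When are the variables used? During segmentation, in ViterbiSegment's List<Term> segSentence(char[] sentence) method.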
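The following is a hypothetical, stripped-down model of this pattern. A base-class constructor creates a config object with public fields, and segSentence reads those fields each time it runs. MiniConfig, MiniSegment and MiniViterbiSegment are made-up names. Only the flag names follow the variables discussed in this article, and the default values are illustrative. The step labels are plain strings, not HanLP calls.

```java
import java.util.*;

// Hypothetical stand-in for Config: the flags are plain public fields.
// Default values are illustrative only.
class MiniConfig {
    public int indexMode = 0;
    public boolean useCustomDictionary = true;
    public boolean numberQuantifierRecognize = false;
    public boolean ner = true;
}

// Hypothetical stand-in for the Segment base class.
abstract class MiniSegment {
    // Each segmenter instance owns its own config, created by the base constructor.
    public final MiniConfig config;

    protected MiniSegment() {
        config = new MiniConfig();
    }

    protected abstract List<String> segSentence(char[] sentence);
}

// Hypothetical stand-in for ViterbiSegment. Instead of segmenting, it reports
// which flag-gated steps the current config enables, read at call time.
class MiniViterbiSegment extends MiniSegment {
    @Override
    protected List<String> segSentence(char[] sentence) {
        List<String> steps = new ArrayList<>();
        if (config.useCustomDictionary) {
            steps.add(config.indexMode > 0 ? "combineByCustomDictionary(indexMode)" : "combineByCustomDictionary");
        }
        if (config.numberQuantifierRecognize) steps.add("mergeNumberQuantifier");
        if (config.ner) steps.add("ner");
        return steps;
    }
}
```

The flags are read inside segSentence rather than copied at construction time. As a result, setting `seg.config.indexMode = 1` on an existing object changes what the next call does.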
Note that ViterbiSegment's whole segmentation process takes place in this method.
2. When and in what order the dictionaries are used (that is, the segmentation pipeline)
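Now that we know where the segmentation variables are used, we can work out the conditions under which each dictionary is used and the order in which the dictionaries are applied.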
1. Coarse segmentation
(1) Generating the word network
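void generateWordNet(final WordNet wordNetStorage)
In this method the system uses CoreNatureDictionary.txt to cut out every possible segmentation path. Even when useCustomDictionary is true, the words in CustomDictionary.txt do not enter the word network at this step. Here, therefore, CustomDictionary.txt has lower priority than the core dictionary. CoreNatureDictionary.txt also does double duty. Besides serving as the segmentation dictionary, it acts as the emission table for HMM part-of-speech tagging. This is why an entry can list several parts of speech, each with its own frequency.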
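Generating the word network can be sketched as follows. For every start offset, each dictionary word found there becomes a candidate node. Wherever nothing else starts, a single-character atom is added so that a complete path always exists. This is a minimal sketch with these assumptions:
- ASCII letters stand in for Chinese characters.
- A HashSet stands in for the core dictionary; HanLP searches a double-array trie built from CoreNatureDictionary.txt.
- Only that one dictionary is consulted.
- WordNetSketch and buildWordNet are hypothetical names.

```java
import java.util.*;

// Hypothetical sketch of word-network generation; not HanLP's trie-based code.
class WordNetSketch {
    // Returns wordNet, where wordNet.get(i) lists every candidate word starting at offset i.
    static List<List<String>> buildWordNet(String sentence, Set<String> dictionary) {
        int maxLen = 1;
        for (String w : dictionary) maxLen = Math.max(maxLen, w.length());
        int n = sentence.length();
        List<List<String>> wordNet = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            List<String> candidates = new ArrayList<>();
            for (int end = i + 1; end <= Math.min(n, i + maxLen); end++) {
                String w = sentence.substring(i, end);
                if (dictionary.contains(w)) candidates.add(w);
            }
            // Fallback single-character atom: every offset gets at least one
            // outgoing node, so some path from offset 0 to the end always exists.
            if (candidates.isEmpty()) candidates.add(sentence.substring(i, i + 1));
            wordNet.add(candidates);
        }
        return wordNet;
    }
}
```

Take "newyorktimes" with the dictionary {new, york, newyork, times, yorktimes}:
- Offset 0 holds [new, newyork].
- Offset 3 holds [york, yorktimes].
- An offset with no dictionary match, such as 1, holds only its single character.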
(2) Merging with the custom dictionary
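When useCustomDictionary is true, CustomDictionary.txt is used to merge words. This step has two variants, depending on whether full segmentation is enabled. When indexMode > 0, the system is in full-segmentation (index) mode: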

List<Vertex> combineByCustomDictionary(List<Vertex> vertexList, DoubleArrayTrie<CoreDictionary.Attribute> dat, final WordNet wordNetAll)

When indexMode = 0, the system is in normal segmentation mode:

List<Vertex> combineByCustomDictionary(List<Vertex> vertexList, DoubleArrayTrie<CoreDictionary.Attribute> dat)

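The two methods above behave differently:
- In full-segmentation mode, the system uses CustomDictionary.txt to add segmentation paths.
- In normal mode, it uses CustomDictionary.txt only to merge the existing path.

This is why words added to CustomDictionary.txt often seem to have no effect. If the path built from CoreNatureDictionary.txt already splits the text differently, no new path for the custom word is inserted into the segmentation path. In that case you can remove the conflicting word from CoreNatureDictionary.txt, or modify the relevant entries there.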
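Below is a minimal sketch of the normal-mode merge. It assumes the coarse path is a list of strings and the custom dictionary is a HashSet. HanLP itself works on Vertex lists and a double-array trie, and the class and method names here are hypothetical. The sketch shows one way a custom word can fail to appear: merging only ever joins whole tokens. A custom word whose boundary falls inside a token of the coarse path is therefore never formed.

```java
import java.util.*;

// Hypothetical sketch of normal-mode custom-dictionary merging over string tokens.
class CustomMergeSketch {
    // Greedy longest match over WHOLE tokens: tokens i..j are merged only if their
    // concatenation is a custom word. A token is never split.
    static List<String> combine(List<String> coarse, Set<String> customWords) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < coarse.size()) {
            int bestEnd = i + 1; // default: keep token i as it is
            StringBuilder concat = new StringBuilder(coarse.get(i));
            for (int j = i + 1; j < coarse.size(); j++) {
                concat.append(coarse.get(j));
                if (customWords.contains(concat.toString())) bestEnd = j + 1;
            }
            out.add(String.join("", coarse.subList(i, bestEnd)));
            i = bestEnd;
        }
        return out;
    }
}
```

Two examples show the difference:
- The coarse path [new, york, times] with the custom word newyork becomes [newyork, times].
- The coarse path [newyork, times] with the custom word yorktimes stays unchanged, because yorktimes would have to start inside newyork.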

(3) Selecting the optimal path with Viterbi
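List<Vertex> viterbi(WordNet wordNet)
At this point we have the coarse segmentation result. HanLP's "Viterbi" segmentation only uses the Viterbi algorithm to find the best path through the word network. No hidden Markov model is involved at this step.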
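Choosing the best path through the word network is a dynamic-programming search over start and end offsets. The sketch below is deliberately simplified and hypothetical. Every word gets a fixed cost, such as a negative log frequency, and the path cost is the sum of the word costs. HanLP's ViterbiSegment additionally scores each pair of adjacent words with bigram statistics. The costs, the atom cost for unknown single characters and the class name are assumptions.

```java
import java.util.*;

// Simplified sketch: lowest-total-cost path through the word network using
// per-word costs only (HanLP additionally scores adjacent word pairs with bigram statistics).
class LatticeViterbiSketch {
    static List<String> bestPath(String sentence, Map<String, Double> wordCost, double atomCost) {
        int n = sentence.length();
        int maxLen = 1;
        for (String w : wordCost.keySet()) maxLen = Math.max(maxLen, w.length());
        // best[e]: lowest cost of segmenting sentence[0, e); back[e]: start of the last word on that path.
        double[] best = new double[n + 1];
        int[] back = new int[n + 1];
        Arrays.fill(best, Double.POSITIVE_INFINITY);
        best[0] = 0.0;
        for (int end = 1; end <= n; end++) {
            for (int start = Math.max(0, end - maxLen); start < end; start++) {
                if (best[start] == Double.POSITIVE_INFINITY) continue;
                Double c = wordCost.get(sentence.substring(start, end));
                // Unknown single characters are allowed as atoms so that a path always exists.
                if (c == null && end - start == 1) c = atomCost;
                if (c == null) continue;
                if (best[start] + c < best[end]) {
                    best[end] = best[start] + c;
                    back[end] = start;
                }
            }
        }
        // Walk the back-pointers from the end of the sentence to recover the path.
        LinkedList<String> path = new LinkedList<>();
        for (int end = n; end > 0; end = back[end]) path.addFirst(sentence.substring(back[end], end));
        return path;
    }
}
```

With costs new=2, york=2, newyork=3, times=2 and yorktimes=5, the best path for "newyorktimes" is [newyork, times] (total 5). Raising newyork to 5 makes [new, york, times] (total 6) the cheapest.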
3. Number and quantifier recognition
When numberQuantifierRecognize is true, numerals and quantifiers are merged on top of the coarse segmentation result. The method called is
void mergeNumberQuantifier(List<Vertex> termList, WordNet wordNetAll, Config config)
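The idea of merging numerals with a following quantifier can be sketched on plain tokens. This is a hypothetical illustration. A token counts as a number if it consists only of digits, and quantifiers come from a caller-supplied set. The real mergeNumberQuantifier works on Vertex objects and the word network instead.

```java
import java.util.*;

// Hypothetical sketch of number/quantifier merging on plain string tokens.
class NumberQuantifierSketch {
    static boolean isNumber(String token) {
        return !token.isEmpty() && token.chars().allMatch(Character::isDigit);
    }

    // Merge a run of numeric tokens, plus one directly following quantifier, into one token.
    static List<String> merge(List<String> tokens, Set<String> quantifiers) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < tokens.size()) {
            if (!isNumber(tokens.get(i))) {
                out.add(tokens.get(i));
                i++;
                continue;
            }
            StringBuilder merged = new StringBuilder();
            while (i < tokens.size() && isNumber(tokens.get(i))) merged.append(tokens.get(i++));
            if (i < tokens.size() && quantifiers.contains(tokens.get(i))) merged.append(tokens.get(i++));
            out.add(merged.toString());
        }
        return out;
    }
}
```

With the quantifier set {kg}, the tokens [1, 2, 5, kg, rice] become [125kg, rice]. A lone number with no quantifier after it is left as it is.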
4. Named entity recognition
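When ner is true, the named-entity recognizers below are run. Note how ner interacts with the individual entity-recognition variables:
- Setting any entity-recognition variable to true also sets ner to true.
- When ner is false, all of the entity-recognition steps below are skipped, and processing goes straight on to part-of-speech tagging.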
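The relationship between ner and the individual switches can be modeled as a derived flag that is recomputed whenever a switch changes. HanLP's Segment exposes setters such as enablePlaceRecognize for this purpose. The class below is a hypothetical model, not HanLP's code. Its names (NerFlagsSketch, updateNer) are made up and its defaults are illustrative.

```java
import java.util.*;

// Hypothetical model of a master "ner" switch kept in sync with the
// individual recognizer flags. Default values are illustrative only.
class NerFlagsSketch {
    public boolean nameRecognize = true;
    public boolean translatedNameRecognize = true;
    public boolean japaneseNameRecognize = false;
    public boolean placeRecognize = false;
    public boolean organizationRecognize = false;
    public boolean ner = true;

    // ner is on whenever at least one recognizer is on.
    void updateNer() {
        ner = nameRecognize || translatedNameRecognize || japaneseNameRecognize
                || placeRecognize || organizationRecognize;
    }

    // Setter-style toggle that keeps ner consistent.
    NerFlagsSketch enablePlaceRecognize(boolean enable) {
        placeRecognize = enable;
        updateNer();
        return this;
    }

    // Recognizers that would run for the current flags, in the order of this section.
    List<String> recognizersToRun() {
        List<String> run = new ArrayList<>();
        if (!ner) return run; // ner == false skips every recognizer
        if (nameRecognize) run.add("PersonRecognition");
        if (translatedNameRecognize) run.add("TranslatedPersonRecognition");
        if (japaneseNameRecognize) run.add("JapanesePersonRecognition");
        if (placeRecognize) run.add("PlaceRecognition");
        if (organizationRecognize) run.add("OrganizationRecognition");
        return run;
    }
}
```

In this sketch, writing `placeRecognize = true` directly skips updateNer(). The setter is therefore the safer way to turn a recognizer on.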
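(1) Chinese person-name recognition
This step requires nameRecognize to be true. The method called is
PersonRecognition.recognition(vertexList, wordNetOptimum, wordNetAll)
It uses the HMM transition matrix nr.tr.txt and the emission table nr.txt. HanLP ships both, trained on its own corpus, and we can also train them ourselves on a role-tagged corpus. In practice we usually modify only nr.txt and then delete nr.txt.bin so that the change takes effect.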
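Role tagging with an HMM comes down to Viterbi decoding: each token is assigned a role. The transition matrix (the part nr.tr.txt plays) scores moves between roles. The emission table (the part nr.txt plays) scores token/role pairs. The sketch below is a generic decoder with three hypothetical roles: 0 = other, 1 = surname, 2 = given name. HanLP's actual role set, file formats and smoothing are not reproduced.

```java
// Hypothetical sketch of HMM role tagging decoded with the Viterbi algorithm.
// Roles, observations and probabilities are all made up for illustration.
class RoleViterbiSketch {
    // obs[i]: token id; start[r]: P(first role = r);
    // trans[r][q]: P(next role = q | role = r); emit[r][o]: P(token o | role r).
    // Returns the most probable role sequence, computed in log space to avoid underflow.
    static int[] decode(int[] obs, double[] start, double[][] trans, double[][] emit) {
        int n = obs.length, k = start.length;
        if (n == 0) return new int[0];
        double[][] score = new double[n][k];
        int[][] back = new int[n][k];
        for (int r = 0; r < k; r++) score[0][r] = Math.log(start[r]) + Math.log(emit[r][obs[0]]);
        for (int i = 1; i < n; i++) {
            for (int q = 0; q < k; q++) {
                double bestScore = Double.NEGATIVE_INFINITY;
                int bestPrev = 0;
                for (int r = 0; r < k; r++) {
                    double cand = score[i - 1][r] + Math.log(trans[r][q]);
                    if (cand > bestScore) {
                        bestScore = cand;
                        bestPrev = r;
                    }
                }
                score[i][q] = bestScore + Math.log(emit[q][obs[i]]);
                back[i][q] = bestPrev;
            }
        }
        // Pick the best final role, then follow the back-pointers.
        int[] path = new int[n];
        for (int r = 1; r < k; r++) if (score[n - 1][r] > score[n - 1][path[n - 1]]) path[n - 1] = r;
        for (int i = n - 1; i > 0; i--) path[i - 1] = back[i][path[i]];
        return path;
    }

    public static void main(String[] args) {
        double[] start = {0.6, 0.3, 0.1};
        double[][] trans = {{0.7, 0.2, 0.1}, {0.1, 0.1, 0.8}, {0.6, 0.1, 0.3}};
        double[][] emit = {{0.1, 0.2, 0.7}, {0.8, 0.1, 0.1}, {0.1, 0.8, 0.1}};
        // Token 0 looks like a surname, token 1 like a given name, token 2 like ordinary text.
        System.out.println(java.util.Arrays.toString(decode(new int[]{0, 1, 2}, start, trans, emit)));
        // prints [1, 2, 0]
    }
}
```

The decoded roles (surname, given name, other) mark the first two tokens as a person-name candidate. A later merging step would join such a span into one word.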
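(2) Transliterated person-name recognition
This step requires translatedNameRecognize to be true. The method called is
TranslatedPersonRecognition.recognition(vertexList, wordNetOptimum, wordNetAll)
Transliterated-name recognition does not use an HMM. Instead, it matches against a dictionary, nrf.txt. If you modify this dictionary, delete nrf.txt.trie.dat for the change to take effect.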
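Dictionary matching can be sketched as merging runs of consecutive tokens that are all listed as transliteration fragments. This is a hypothetical simplification:
- The fragment set stands in for nrf.txt.
- Pinyin-like ASCII strings stand in for Chinese characters.
- The rule "two or more fragments in a row form a name" is an assumption chosen for illustration.

```java
import java.util.*;

// Hypothetical sketch of dictionary matching for transliterated names:
// no HMM, just runs of tokens that are all listed as transliteration fragments.
class TranslatedNameSketch {
    static List<String> recognize(List<String> tokens, Set<String> fragments) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < tokens.size()) {
            int j = i;
            while (j < tokens.size() && fragments.contains(tokens.get(j))) j++;
            if (j - i >= 2) {
                // Two or more consecutive fragments: treat the run as one name.
                out.add(String.join("", tokens.subList(i, j)));
                i = j;
            } else {
                out.add(tokens.get(i));
                i++;
            }
        }
        return out;
    }
}
```

With fragments {ma, ke, si}, the tokens [ma, ke, si, said] become [makesi, said]. A single fragment such as [ma, said] is left unchanged.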
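(3) Japanese person-name recognition
This step requires japaneseNameRecognize to be true. The method called is
JapanesePersonRecognition.recognition(vertexList, wordNetOptimum, wordNetAll)
Like transliterated-name recognition, Japanese-name recognition does not use an HMM. It matches against a dictionary, nrj.txt. If you modify this dictionary, delete nrj.txt.trie.dat and nrj.txt.value.dat for the change to take effect.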
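(4) Place-name recognition
This step requires placeRecognize to be true. The method called is
PlaceRecognition.recognition(vertexList, wordNetOptimum, wordNetAll)
It uses the HMM transition matrix ns.tr.txt and the emission table ns.txt. HanLP ships both, and we can also train them ourselves on a role-tagged corpus. In practice we usually modify only ns.txt and then delete ns.txt.bin so that the change takes effect.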
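The .bin file must be deleted after the .txt dictionary is edited because the dictionary is loaded from its cache first. The sketch below illustrates that pattern with hypothetical names and formats: a "word count" text file with a Java-serialized .bin cache beside it. HanLP's real cache formats (.bin, .trie.dat, .value.dat) differ and are not reproduced here.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.nio.file.*;
import java.util.*;

// Hypothetical text-dictionary loader with a binary cache beside the source file.
class DictCacheSketch {
    @SuppressWarnings("unchecked")
    static Map<String, Integer> load(Path txt) throws IOException {
        Path bin = txt.resolveSibling(txt.getFileName() + ".bin");
        if (Files.exists(bin)) {
            // The cache wins, even if the .txt file was edited after the cache was written.
            try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(bin))) {
                return (Map<String, Integer>) in.readObject();
            } catch (ClassNotFoundException e) {
                throw new IOException(e);
            }
        }
        // No cache: parse "word count" lines, then write the cache for next time.
        Map<String, Integer> dict = new HashMap<>();
        for (String line : Files.readAllLines(txt, StandardCharsets.UTF_8)) {
            String[] parts = line.trim().split("\\s+");
            if (parts.length == 2) dict.put(parts[0], Integer.parseInt(parts[1]));
        }
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(bin))) {
            out.writeObject(dict);
        }
        return dict;
    }
}
```

If the .txt file is edited while the .bin file exists, load keeps returning the old contents. Deleting the .bin file forces a re-parse.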
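(5) Organization-name recognition
This step requires organizationRecognize to be true. The method called is
OrganizationRecognition.recognition(vertexList, wordNetOptimum, wordNetAll)
Before organization recognition is called, one more round of recognition is performed. This is a cascaded HMM, in which the result of the lower-level recognition becomes the input of the next level. The files involved are the transition matrix nt.tr.txt and the emission table nt.txt. HanLP ships both, and we can also train them ourselves on a role-tagged corpus. In practice we usually modify only nt.txt and then delete nt.txt.bin so that the change takes effect. Because organization names are recognized on top of the lower-level results, organization recognition is only as accurate as the recognition steps that feed it.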
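The cascade idea, in which the output of one recognition layer is the input of the next, can be sketched with two rule-based layers. This is a hypothetical stand-in, not an HMM:
- Layer 1 assembles place names from adjacent tokens.
- Layer 2 recognizes an organization as a place followed by an organization suffix.

All names and rules are assumptions for illustration.

```java
import java.util.*;

// Hypothetical rule-based stand-in for a two-layer cascade; HanLP uses HMM role tagging at each layer.
class CascadeSketch {
    // Layer 1: merge two adjacent tokens whose concatenation is a known place name.
    static List<String> placeLayer(List<String> tokens, Set<String> places) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < tokens.size()) {
            if (i + 1 < tokens.size() && places.contains(tokens.get(i) + tokens.get(i + 1))) {
                out.add(tokens.get(i) + tokens.get(i + 1));
                i += 2;
            } else {
                out.add(tokens.get(i));
                i += 1;
            }
        }
        return out;
    }

    // Layer 2: a place token followed by an organization suffix becomes one organization.
    // It only sees places that layer 1 has already assembled into single tokens.
    static List<String> orgLayer(List<String> tokens, Set<String> places, Set<String> orgSuffixes) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < tokens.size()) {
            if (i + 1 < tokens.size() && places.contains(tokens.get(i)) && orgSuffixes.contains(tokens.get(i + 1))) {
                out.add(tokens.get(i) + tokens.get(i + 1));
                i += 2;
            } else {
                out.add(tokens.get(i));
                i += 1;
            }
        }
        return out;
    }

    // The cascade: the output of layer 1 is the input of layer 2.
    static List<String> recognize(List<String> tokens, Set<String> places, Set<String> orgSuffixes) {
        return orgLayer(placeLayer(tokens, places), places, orgSuffixes);
    }
}
```

Running layer 2 directly on the raw tokens [bei, jing, university, is, old] finds nothing. Running it on layer 1's output yields [beijinguniversity, is, old]. A miss in the lower layer therefore propagates upward.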
Dictionary overview
A few more points to note.

The following dictionaries also ship with the system:
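机构名词典.txt (organization names)
全国地名大全.txt (national place names)
人名词典.txt (person names)
上海地名.txt (Shanghai place names)
现代汉语补充词库.txt (supplementary modern Chinese lexicon)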
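These dictionaries are part of the system's dictionary data.
To change how words are segmented, edit CoreNatureDictionary.txt directly.
To remove incorrectly recognized entities, edit nr.txt, ns.txt and nt.txt directly.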
3. Multithreaded segmentation
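HanLP's ViterbiSegment segmenter supports multithreaded segmentation through the variable threadNumber, which defaults to 1. The HanLP author describes ViterbiSegment as the most efficient segmenter, and its multithreading support is presumably one reason. ViterbiSegment also bundles the entity recognizers described above, so they benefit from multithreaded segmentation as well. Multithreading is implemented in Segment's List<Term> seg(String text) method. In HanLP, multithreaded segmentation means splitting one long input text and processing the pieces in parallel. It does not mean processing several input texts at once.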
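Splitting one long text into sentences and segmenting them on a thread pool can be sketched as follows. The sketch is hypothetical in several respects:
- Sentences are split on ASCII '.', '!' and '?'.
- The per-sentence segmenter is passed in as a Function and must be thread-safe.
- There is no minimum-length check, whereas HanLP's own seg skips multithreading for short texts.

```java
import java.util.*;
import java.util.concurrent.*;
import java.util.function.Function;

// Hypothetical sketch: parallelism inside ONE long text, not across several texts.
class ParallelSegSketch {
    static List<String> seg(String text, int threadNumber, Function<String, List<String>> segSentence)
            throws InterruptedException, ExecutionException {
        // Split into sentences on a few ASCII delimiters (an assumption for this sketch).
        List<String> sentences = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '.' || c == '!' || c == '?') {
                sentences.add(text.substring(start, i + 1));
                start = i + 1;
            }
        }
        if (start < text.length()) sentences.add(text.substring(start));

        List<String> result = new ArrayList<>();
        if (threadNumber <= 1) {
            // Single-threaded path: segment sentences one after another.
            for (String s : sentences) result.addAll(segSentence.apply(s));
            return result;
        }
        ExecutorService pool = Executors.newFixedThreadPool(threadNumber);
        try {
            List<Future<List<String>>> futures = new ArrayList<>();
            for (String s : sentences) futures.add(pool.submit(() -> segSentence.apply(s)));
            // Collect in submission order so the output keeps the original sentence order.
            for (Future<List<String>> f : futures) result.addAll(f.get());
            return result;
        } finally {
            pool.shutdown();
        }
    }
}
```

Results are collected from the futures in submission order. The output therefore matches the single-threaded result token for token, whichever thread finishes first.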
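This article was written by baiziyu, and its content has been partially edited. Thanks to the author; everyone is welcome to learn together.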
