Monday, June 23, 2008

rbtagger suggested 0.2.5

I just released a new version of rbtagger that includes a small feature called suggest. After loading the rule tagger, you can pass it a chunk of text and it will suggest keywords to use for tagging. Here's how it works:


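require 'rbtagger'
# The three arguments are placeholders for the paths to your lexicon,
# lexical rule file, and contextual rule file.
tagger = Brill::Tagger.new('LEXICON', 'LEXICALRULEFILE', 'CONTEXTUALRULEFILE')
# Pass in any chunk of text to get suggested tags back.
tagger.suggest(File.read('sample.txt'))
# => [['doctor', 'NN', 3], ['treatment', 'NN', 5]]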


Each entry in the returned array contains the original word of interest, its part of speech, and how often the term occurs in the text. The approach is simple: rbtagger picks all the nouns in the text and reduces that list until it is within a threshold, which defaults to a maximum of 10 tags. Still, this could be pretty useful for gathering terms of interest from a body of text. If you have any questions about rbtagger, feel free to ask in the group I created for it.
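
To make the idea concrete, here is a rough sketch of that kind of reduction in plain Ruby. It is not rbtagger's actual code: the method name suggest_from_tagged, the [word, tag] input format, and the rule of keeping the most frequent terms are all my own assumptions for illustration.

```ruby
# Hypothetical sketch, NOT rbtagger's implementation.
# Input: an array of [word, tag] pairs, as a part-of-speech tagger might emit.
def suggest_from_tagged(pairs, max = 10)
  # Keep only nouns (Penn Treebank noun tags all start with "NN").
  nouns = pairs.select { |_word, tag| tag.start_with?('NN') }

  # Count how often each lowercased word/tag combination appears.
  counts = Hash.new(0)
  nouns.each { |word, tag| counts[[word.downcase, tag]] += 1 }

  # Assumed reduction rule: keep the most frequent terms, up to max,
  # breaking ties alphabetically so the result is deterministic.
  counts.map { |(word, tag), freq| [word, tag, freq] }
        .sort_by { |word, _tag, freq| [-freq, word] }
        .first(max)
end

tagged = [['The', 'DT'], ['doctor', 'NN'], ['gave', 'VBD'],
          ['treatment', 'NN'], ['doctor', 'NN']]
p suggest_from_tagged(tagged)
```

The result has the same shape as what suggest returns (word, part of speech, frequency), so anything that consumes one could consume the other.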
