Stanford POS Tagger Tutorial | Reading Text from File

Introduction

This demo shows how to tag sentences read from a text file. The DocumentPreprocessor class is used directly to split the file into sentences, each represented as a List<HasWord>, and every sentence is then passed to the tagger.

Source Code

package com.interviewBubble.pos;

import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.util.List;

import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.ling.HasWord;
import edu.stanford.nlp.ling.SentenceUtils;
import edu.stanford.nlp.ling.TaggedWord;
import edu.stanford.nlp.process.CoreLabelTokenFactory;
import edu.stanford.nlp.process.DocumentPreprocessor;
import edu.stanford.nlp.process.PTBTokenizer;
import edu.stanford.nlp.process.TokenizerFactory;
import edu.stanford.nlp.tagger.maxent.MaxentTagger;

public class TaggerDemo2 {

    private TaggerDemo2() {}

    public static void main(String[] args) throws Exception {
        // Load the trained English model. Adjust this path to your local copy.
        MaxentTagger tagger = new MaxentTagger("/Users/admin/LearningSourceControl/CoreNLP/taggers/models/english-left3words-distsim.tagger");

        // PTB tokenizer that keeps untokenizable characters and logs no warnings about them.
        TokenizerFactory<CoreLabel> ptbTokenizerFactory =
                PTBTokenizer.factory(new CoreLabelTokenFactory(), "untokenizable=noneKeep");

        // Read the input file as UTF-8 and write the tagged text to standard output as UTF-8.
        try (BufferedReader r = new BufferedReader(new InputStreamReader(
                     new FileInputStream("/Users/admin/LearningSourceControl/CoreNLP/taggers/sample-input.txt"), "utf-8"));
             PrintWriter pw = new PrintWriter(new OutputStreamWriter(System.out, "utf-8"))) {

            // DocumentPreprocessor splits the text into sentences using the tokenizer above.
            DocumentPreprocessor documentPreprocessor = new DocumentPreprocessor(r);
            documentPreprocessor.setTokenizerFactory(ptbTokenizerFactory);

            for (List<HasWord> sentence : documentPreprocessor) {
                List<TaggedWord> tSentence = tagger.tagSentence(sentence);
                // Passing false prints each token as word/TAG rather than the bare word.
                pw.println(SentenceUtils.listToString(tSentence, false));
            }
        }
    }
}

OUTPUT

1    [main] INFO  edu.stanford.nlp.tagger.maxent.MaxentTagger  - Loading POS tagger from /Users/admin/LearningSourceControl/CoreNLP/taggers/models/english-left3words-distsim.tagger ... done [2.3 sec].

A/DT passenger/NN plane/NN has/VBZ crashed/VBN shortly/RB after/IN take-off/NN from/IN Kyrgyzstan/NNP 's/POS capital/NN ,/, Bishkek/NNP ,/, killing/VBG a/DT large/JJ number/NN of/IN those/DT on/IN board/NN ./.

The/DT head/NN of/IN Kyrgyzstan/NNP 's/POS civil/JJ aviation/NN authority/NN said/VBD that/IN out/IN of/IN about/IN 90/CD passengers/NNS and/CC crew/NN ,/, only/RB about/IN 20/CD people/NNS have/VBP survived/VBN ./.

The/DT Itek/NNP Air/NNP Boeing/NNP 737/CD took/VBD off/RP bound/VBN for/IN Mashhad/NNP ,/, in/IN north-eastern/JJ Iran/NNP ,/, but/CC turned/VBD round/NN some/DT 10/CD minutes/NNS later/RB ./.
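
Each line of the output above is a sequence of word/TAG items separated by spaces. If you want to process this text further, one possible approach is to split every item on its last slash. The sketch below uses only the Java standard library. The class TaggedOutputParser and its nested Token class are hypothetical helpers written for this illustration and are not part of Stanford CoreNLP.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of Stanford CoreNLP.
public class TaggedOutputParser {

    /** One item of tagger output: a word and its part-of-speech tag. */
    public static final class Token {
        public final String word;
        public final String tag;

        Token(String word, String tag) {
            this.word = word;
            this.tag = tag;
        }
    }

    /** Splits a line such as "A/DT plane/NN ./." into word/tag pairs. */
    public static List<Token> parse(String line) {
        List<Token> tokens = new ArrayList<>();
        for (String item : line.trim().split("\\s+")) {
            // Split on the LAST slash so that a word containing '/' stays intact.
            int slash = item.lastIndexOf('/');
            if (slash <= 0) {
                continue; // not in word/TAG form (for example, an empty line)
            }
            tokens.add(new Token(item.substring(0, slash), item.substring(slash + 1)));
        }
        return tokens;
    }

    public static void main(String[] args) {
        String line = "The/DT Itek/NNP Air/NNP Boeing/NNP 737/CD took/VBD off/RP";
        for (Token t : parse(line)) {
            System.out.println(t.word + "\t" + t.tag);
        }
    }
}
```

Splitting on the last slash rather than the first means punctuation items such as ,/, and ./. come out as a one-character word with a one-character tag, and a word that itself contains a slash is not cut in the wrong place.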