Stanford POS tagger Tutorial | Stanford’s Part of Speech Label Demo

Introduction

The Stanford CoreNLP library lets you tag the words in a string: for each word, the tagger determines its part of speech (noun, verb, etc.) and attaches the resulting tag to the word. For example:

“Karma of humans is AI”

will be output as

Karma/NN of/IN humans/NNS is/VBZ AI/NNP
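The tags in this output come from the Penn Treebank tag set, which the English Stanford tagger models use. As a reading aid, here is a minimal sketch of a lookup table covering only the five tags in the example. The class name TagGlossary and its describe method are hypothetical helpers for illustration and are not part of CoreNLP.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: a tiny glossary of the Penn Treebank tags used in the example.
final class TagGlossary {
    private static final Map<String, String> DESCRIPTIONS = new LinkedHashMap<>();

    static {
        DESCRIPTIONS.put("NN", "noun, singular or mass");
        DESCRIPTIONS.put("IN", "preposition or subordinating conjunction");
        DESCRIPTIONS.put("NNS", "noun, plural");
        DESCRIPTIONS.put("VBZ", "verb, 3rd person singular present");
        DESCRIPTIONS.put("NNP", "proper noun, singular");
    }

    // Returns the description for a tag, or "unknown tag" if it is not in this small table.
    static String describe(String tag) {
        return DESCRIPTIONS.getOrDefault(tag, "unknown tag");
    }

    public static void main(String[] args) {
        for (String tag : DESCRIPTIONS.keySet()) {
            System.out.println(tag + " = " + describe(tag));
        }
    }
}
```

The full tag set is much larger; this table is deliberately limited to the tags that appear in the example sentence.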

Prerequisite

  • Java
  • Maven

Steps to Follow

Here are the steps for using the Stanford POS Tagger in your Java project.

  1. Create a new project.
  2. Create a new folder called “taggers” in the project.
  3. Download the basic English Stanford Tagger.
  4. Extract the zip file and open the extracted folder.
  5. Copy the entire contents of the extracted folder into the taggers folder.
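Before loading the tagger, it can help to confirm that the model file ended up where the code expects it. The following is a minimal sketch, assuming the model sits at taggers/models/english-left3words-distsim.tagger under the project root (the layout implied by the path used in the code later in this tutorial). The class name ModelCheck and its methods are hypothetical and use only the Java standard library.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: checks that the tagger model was copied into the taggers folder.
final class ModelCheck {

    // Builds the expected model path under the given project root (assumed layout).
    static Path modelPath(String projectRoot) {
        return Paths.get(projectRoot, "taggers", "models", "english-left3words-distsim.tagger");
    }

    // True only if the model path exists and is a regular file.
    static boolean modelExists(String projectRoot) {
        return Files.isRegularFile(modelPath(projectRoot));
    }

    public static void main(String[] args) {
        // Use the first argument as the project root, or the current directory by default.
        String root = args.length > 0 ? args[0] : ".";
        Path model = modelPath(root);
        System.out.println(modelExists(root) ? "Found model: " + model : "Missing model: " + model);
    }
}
```

If the check reports a missing model, the usual cause is that the extracted folder itself was copied into taggers instead of its contents.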

Generating Tokens

Here is the code to tag the sentence “Karma of humans is AI”.

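// Load the English model that was copied into the taggers folder.
MaxentTagger tagger = new MaxentTagger("/Users/admin/LearningSourceControl/CoreNLP/taggers/models/english-left3words-distsim.tagger");

// Split the input text into sentences, each a list of tokens.
List<List<HasWord>> sentences = MaxentTagger.tokenizeText(new StringReader("Karma of humans is AI"));

for (List<HasWord> sentence : sentences) {
  // Tag every token in the sentence, then print the result as word/TAG pairs.
  List<TaggedWord> tSentence = tagger.tagSentence(sentence);
  System.out.println(SentenceUtils.listToString(tSentence, false));
}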


Complete Code

package com.interviewBubble.pos;

import java.io.StringReader;
import java.util.List;

import edu.stanford.nlp.ling.HasWord;
import edu.stanford.nlp.ling.SentenceUtils;
import edu.stanford.nlp.ling.TaggedWord;
import edu.stanford.nlp.tagger.maxent.MaxentTagger;

public class TaggerDemo {

  private TaggerDemo() {}

  public static void main(String[] args) throws Exception {
    // Load the English model that was copied into the taggers folder.
    MaxentTagger tagger = new MaxentTagger("/Users/admin/LearningSourceControl/CoreNLP/taggers/models/english-left3words-distsim.tagger");

    // Split the input text into sentences, each a list of tokens.
    List<List<HasWord>> sentences = MaxentTagger.tokenizeText(new StringReader("Karma of humans is AI"));

    for (List<HasWord> sentence : sentences) {
      // Tag every token in the sentence, then print the result as word/TAG pairs.
      List<TaggedWord> tSentence = tagger.tagSentence(sentence);
      System.out.println(SentenceUtils.listToString(tSentence, false));
    }
  }
}

OUTPUT

0    [main] INFO  edu.stanford.nlp.tagger.maxent.MaxentTagger  - Loading POS tagger from /Users/admin/LearningSourceControl/CoreNLP/taggers/models/english-left3words-distsim.tagger ... done [0.7 sec].

Karma/NN of/IN humans/NNS is/VBZ AI/NNP
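If you want to work with the words and tags separately, the printed line can be split back into pairs. The following is a minimal sketch that parses the word/TAG output format shown above using only the standard library. The class name TaggedLineParser and its parse method are hypothetical and not part of CoreNLP. In a real program you would more likely read the word and tag directly from each TaggedWord instead of re-parsing text.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits a "word/TAG word/TAG ..." line into [word, tag] pairs.
final class TaggedLineParser {

    static List<String[]> parse(String line) {
        List<String[]> pairs = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            // Use the last slash so that words containing '/' keep their full text.
            int slash = token.lastIndexOf('/');
            if (slash <= 0) {
                continue; // skip tokens with no separator or with an empty word
            }
            pairs.add(new String[] { token.substring(0, slash), token.substring(slash + 1) });
        }
        return pairs;
    }

    public static void main(String[] args) {
        for (String[] pair : parse("Karma/NN of/IN humans/NNS is/VBZ AI/NNP")) {
            System.out.println(pair[0] + " -> " + pair[1]);
        }
    }
}
```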

POM.xml

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>

  <groupId>com.interviewBubble</groupId>
  <artifactId>CoreNLP</artifactId>
  <version>1.0-SNAPSHOT</version>
  <packaging>jar</packaging>

  <name>CoreNLP</name>
  <url>http://maven.apache.org</url>

  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <stanford.corenlp.version>3.9.1</stanford.corenlp.version>
  </properties>

  <dependencies>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>3.8.1</version>
      <scope>test</scope>
    </dependency>

    <!-- Stanford dependencies -->
    <dependency>
      <groupId>edu.stanford.nlp</groupId>
      <artifactId>stanford-corenlp</artifactId>
      <version>${stanford.corenlp.version}</version>
      <scope>compile</scope>
    </dependency>
    <dependency>
      <groupId>edu.stanford.nlp</groupId>
      <artifactId>stanford-corenlp</artifactId>
      <classifier>models</classifier>
      <version>${stanford.corenlp.version}</version>
      <scope>compile</scope>
    </dependency>

    <dependency>
      <groupId>org.slf4j</groupId>
      <artifactId>slf4j-api</artifactId>
      <version>1.7.25</version>
    </dependency>
    <dependency>
      <groupId>org.slf4j</groupId>
      <artifactId>slf4j-log4j12</artifactId>
      <version>1.7.25</version>
    </dependency>
  </dependencies>
</project>

Log4j.properties

# Set root logger level to DEBUG and its only appender to A1.
log4j.rootLogger=DEBUG, A1

# A1 is set to be a ConsoleAppender.
log4j.appender.A1=org.apache.log4j.ConsoleAppender

# A1 uses PatternLayout.
log4j.appender.A1.layout=org.apache.log4j.PatternLayout
log4j.appender.A1.layout.ConversionPattern=%-4r [%t] %-5p %c %x - %m%n