Stanford POS tagger Tutorial | Extracting Nouns from text

INTRODUCTION

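This tutorial shows how to find the words in a text that carry a particular part-of-speech (POS) tag, such as nouns, using Stanford CoreNLP. The sample program below runs the CoreNLP pipeline on an input sentence and then uses a TokensRegex pattern to pick out the tokens tagged as nouns.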

Extracting Nouns from text

package com.interviewBubble.pos;

import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.ling.tokensregex.TokenSequenceMatcher;
import edu.stanford.nlp.ling.tokensregex.TokenSequencePattern;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.util.CoreMap;

public class StanfordPartOfSpeech {

    public static void main(String[] args) {
        // Noun extraction only needs tokenize, ssplit, and pos;
        // the remaining annotators load extra models at startup.
        Properties properties = new Properties();
        properties.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, parse");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(properties);

        String input = "Karma of humans is AI";
        Annotation annotation = pipeline.process(input);
        List<CoreMap> sentences = annotation.get(CoreAnnotations.SentencesAnnotation.class);

        // Match any single token tagged NN, NNS, NNP, or NNPS (the Penn Treebank noun tags).
        String regex = "([{pos:/NN|NNS|NNP|NNPS/}])";
        TokenSequencePattern tspattern = TokenSequencePattern.compile(regex);

        List<String> output = new ArrayList<>();
        for (CoreMap sentence : sentences) {
            List<CoreLabel> tokens = sentence.get(CoreAnnotations.TokensAnnotation.class);
            TokenSequenceMatcher tsmatcher = tspattern.getMatcher(tokens);
            // Each successful find() is one noun token; group() returns its text.
            while (tsmatcher.find()) {
                output.add(tsmatcher.group());
            }
        }

        System.out.println("Input: " + input);
        System.out.println("Output: " + output);
    }
}
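The selection step in the program above comes down to keeping the tokens whose tag is one of the Penn Treebank noun tags (NN, NNS, NNP, NNPS). As a rough sketch of that filtering logic on its own, without CoreNLP on the classpath, the example below uses a hypothetical `TaggedWord` helper class and hand-assigned tags. Both are assumptions made for illustration: `TaggedWord` is not a CoreNLP class, and the tags are not real tagger output.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class NounFilter {

    // The four Penn Treebank noun tags.
    static final Set<String> NOUN_TAGS =
            new HashSet<>(Arrays.asList("NN", "NNS", "NNP", "NNPS"));

    /** A word paired with its POS tag (hypothetical helper, not a CoreNLP class). */
    static final class TaggedWord {
        final String word;
        final String tag;

        TaggedWord(String word, String tag) {
            this.word = word;
            this.tag = tag;
        }
    }

    /** Returns the words whose tag is a noun tag, in input order. */
    static List<String> extractNouns(List<TaggedWord> tokens) {
        List<String> nouns = new ArrayList<>();
        for (TaggedWord token : tokens) {
            if (NOUN_TAGS.contains(token.tag)) {
                nouns.add(token.word);
            }
        }
        return nouns;
    }

    public static void main(String[] args) {
        // Tags assigned by hand for illustration; a real tagger may choose differently.
        List<TaggedWord> tokens = Arrays.asList(
                new TaggedWord("Karma", "NNP"),
                new TaggedWord("of", "IN"),
                new TaggedWord("humans", "NNS"),
                new TaggedWord("is", "VBZ"),
                new TaggedWord("AI", "NNP"));

        System.out.println(extractNouns(tokens)); // prints [Karma, humans, AI]
    }
}
```

A set lookup like this is enough when only single tokens are needed. A TokensRegex pattern is the better fit when the match should span several tokens, for example a run of consecutive nouns.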

OUTPUT

0    [main] INFO  edu.stanford.nlp.pipeline.StanfordCoreNLP  - Adding annotator tokenize
7    [main] INFO  edu.stanford.nlp.pipeline.TokenizerAnnotator  - No tokenizer type provided. Defaulting to PTBTokenizer.
12   [main] INFO  edu.stanford.nlp.pipeline.StanfordCoreNLP  - Adding annotator ssplit
16   [main] INFO  edu.stanford.nlp.pipeline.StanfordCoreNLP  - Adding annotator pos
732  [main] INFO  edu.stanford.nlp.tagger.maxent.MaxentTagger  - Loading POS tagger from edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger ... done [0.7 sec].
732  [main] INFO  edu.stanford.nlp.pipeline.StanfordCoreNLP  - Adding annotator lemma
733  [main] INFO  edu.stanford.nlp.pipeline.StanfordCoreNLP  - Adding annotator ner
2596 [main] INFO  edu.stanford.nlp.ie.AbstractSequenceClassifier  - Loading classifier from edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz ... done [1.8 sec].
3709 [main] INFO  edu.stanford.nlp.ie.AbstractSequenceClassifier  - Loading classifier from edu/stanford/nlp/models/ner/english.muc.7class.distsim.crf.ser.gz ... done [1.1 sec].
4352 [main] INFO  edu.stanford.nlp.ie.AbstractSequenceClassifier  - Loading classifier from edu/stanford/nlp/models/ner/english.conll.4class.distsim.crf.ser.gz ... done [0.6 sec].
4354 [main] INFO  edu.stanford.nlp.time.JollyDayHolidays  - Initializing JollyDayHoliday for SUTime from classpath edu/stanford/nlp/models/sutime/jollyday/Holidays_sutime.xml as sutime.binder.1.
4587 [main] INFO  edu.stanford.nlp.time.TimeExpressionExtractorImpl  - Using following SUTime rules: edu/stanford/nlp/models/sutime/defs.sutime.txt,edu/stanford/nlp/models/sutime/english.sutime.txt,edu/stanford/nlp/models/sutime/english.holidays.sutime.txt
5026 [main] DEBUG edu.stanford.nlp.ling.tokensregex.CoreMapExpressionExtractor  - Ignoring inactive rule: null
5027 [main] DEBUG edu.stanford.nlp.ling.tokensregex.CoreMapExpressionExtractor  - Ignoring inactive rule: temporal-composite-8:ranges
8750 [main] INFO  edu.stanford.nlp.pipeline.TokensRegexNERAnnotator  - TokensRegexNERAnnotator ner.fine.regexner: Read 580641 unique entries out of 581790 from edu/stanford/nlp/models/kbp/regexner_caseless.tab, 0 TokensRegex patterns.
8767 [main] INFO  edu.stanford.nlp.pipeline.TokensRegexNERAnnotator  - TokensRegexNERAnnotator ner.fine.regexner: Read 4857 unique entries out of 4868 from edu/stanford/nlp/models/kbp/regexner_cased.tab, 0 TokensRegex patterns.
8767 [main] INFO  edu.stanford.nlp.pipeline.TokensRegexNERAnnotator  - TokensRegexNERAnnotator ner.fine.regexner: Read 585498 unique entries from 2 files
17078 [main] INFO  edu.stanford.nlp.pipeline.StanfordCoreNLP  - Adding annotator parse
17381 [main] INFO  edu.stanford.nlp.parser.common.ParserGrammar  - Loading parser from serialized file edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz ... done [0.3 sec].
Input: Karma of humans is AI
Output: [Karma, humans, AI]