5 Open Source NLP Tools for Taming Text

We’ve included some useful resources at the end to help you brush up on your knowledge and to explain some of the key concepts in natural language processing (NLP). To begin your journey, check out these projects:

  1. Stanford’s Core NLP Suite: A GPL-licensed framework of tools for processing English, Chinese, and Spanish. It includes tools for tokenization (splitting text into words), part-of-speech tagging, grammar parsing (identifying things like noun and verb phrases), named entity recognition, and more. Once you’ve got the basics, be sure to check out the other projects from the same group at Stanford.
  2. Natural Language Toolkit: If your language of choice is Python, then look no further than NLTK for many of your NLP needs. Similar to the Stanford library, it includes capabilities for tokenizing, parsing, and identifying named entities, as well as many more features.
  3. Apache Lucene and Solr: While not technically targeted at solving NLP problems, Lucene and Solr contain a large number of powerful tools for working with text, ranging from advanced string manipulation utilities to flexible tokenization libraries to blazing-fast libraries for working with finite-state automata. On top of it all, you get a search engine for free!
  4. Apache OpenNLP: Using a different underlying approach than Stanford’s library, the OpenNLP project is an Apache-licensed suite of tools for tasks like tokenization, part-of-speech tagging, parsing, and named entity recognition. While its approach is not necessarily state of the art anymore, it remains a solid choice that is easy to get up and running.
  5. GATE and Apache UIMA: As your processing capabilities evolve, you may find yourself building complex NLP workflows that need to integrate several different processing steps. In these cases, you may want to work with a framework like GATE or UIMA that standardizes and abstracts much of the repetitive work that goes into building a complex NLP application.
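
To make the ideas above a little more concrete, here is a minimal sketch in plain Python of what "tokenization" and "a workflow of several processing steps" can look like. It does not use the API of any of the tools listed; the names `tokenize`, `find_capitalized_spans`, and `run_pipeline` are hypothetical, and the capitalization heuristic is only a crude stand-in for real named entity recognition.

```python
import re
from typing import Any, Callable, Dict, List

# Illustrative only: a "document" is a dict that each step reads and extends.
Document = Dict[str, Any]
Step = Callable[[Document], Document]

# Words (optionally with an internal apostrophe) or single punctuation marks.
TOKEN_PATTERN = re.compile(r"\w+(?:'\w+)?|[^\w\s]")


def tokenize(doc: Document) -> Document:
    """Split the raw text into word and punctuation tokens."""
    doc["tokens"] = TOKEN_PATTERN.findall(doc["text"])
    return doc


def find_capitalized_spans(doc: Document) -> Document:
    """Group runs of capitalized tokens (a naive stand-in for NER)."""
    spans: List[str] = []
    current: List[str] = []
    for token in doc["tokens"]:
        if token[0].isupper():
            current.append(token)
        elif current:
            spans.append(" ".join(current))
            current = []
    if current:
        spans.append(" ".join(current))
    doc["candidates"] = spans
    return doc


def run_pipeline(text: str, steps: List[Step]) -> Document:
    """Run each step in order, passing the shared document along."""
    doc: Document = {"text": text}
    for step in steps:
        doc = step(doc)
    return doc


result = run_pipeline(
    "Apache Lucene and Solr index text quickly.",
    [tokenize, find_capitalized_spans],
)
print(result["tokens"])
# ['Apache', 'Lucene', 'and', 'Solr', 'index', 'text', 'quickly', '.']
print(result["candidates"])
# ['Apache Lucene', 'Solr']
```

Each step only depends on what earlier steps added to the shared document, which is roughly the kind of plumbing that frameworks like GATE and UIMA standardize for you, and the real libraries replace these toy heuristics with trained models.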

Reference: Open Source Tools for Taming Text