Text Summarization using NLP | Machine Learning

The objective of this project is to understand the concepts of natural language processing (NLP) and to build a tool for text summarization. Interest in automatic summarization is growing rapidly because it removes the need for manual summarizing. The project concentrates on creating a tool that automatically summarizes a document.

Platform         : Python
Delivery          :  One Day
Support          : Online Live Session
Deliverables  : Project Files, Report and Presentation

This project, automatic text summarization, summarizes a given passage using natural language processing and machine learning. The amount of text data available from a variety of sources has exploded. This volume of text is an invaluable source of information and knowledge, but it must be summarized effectively to be useful. This review describes the main approaches to automatic text summarization, covers the different summarization processes, and discusses the effectiveness and shortcomings of the different methods.

The system works by assigning a score to each sentence in the document and using the highest-scoring sentences in the summary. Scores are based on features extracted from each sentence, and the feature scores are combined linearly. Almost all of the mappings from feature to score, and the coefficients of the linear combination, are derived from a training corpus. Some anaphora resolution is also performed. The system was submitted to the Document Understanding Conference for evaluation.

In addition to basic summarization, some attempt is made to target the text at the user. The intended user is assumed to have little background knowledge or reading ability. The system helps by simplifying the individual words used in the summary and by drawing prerequisite background information from the web.
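The idea of scoring sentences with a linear combination of feature scores can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three features (position, length, title overlap), the WEIGHTS values, and every function name here are hypothetical, and the coefficients are hand-picked rather than derived from a training corpus as in the system described above.

```python
import re

# Hypothetical, hand-picked coefficients. The system described in the text
# derives its coefficients from a training corpus instead.
WEIGHTS = {"position": 0.4, "length": 0.2, "title_overlap": 0.4}


def split_sentences(text):
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    return re.findall(r"[a-z0-9']+", sentence.lower())


def features(sentence, index, total, title_words):
    words = tokenize(sentence)
    overlap = len(set(words) & title_words) / len(title_words) if title_words else 0.0
    return {
        "position": 1.0 - index / total,          # earlier sentences score higher
        "length": min(len(words) / 20.0, 1.0),    # longer sentences score higher, capped at 1
        "title_overlap": overlap,                 # share of title words found in the sentence
    }


def score(feats):
    # Linear combination of the feature scores.
    return sum(WEIGHTS[name] * value for name, value in feats.items())


def summarize(text, title, n=2):
    sentences = split_sentences(text)
    title_words = set(tokenize(title))
    scored = [
        (score(features(s, i, len(sentences), title_words)), i, s)
        for i, s in enumerate(sentences)
    ]
    # Keep the n highest-scoring sentences, then restore document order.
    top = sorted(scored, key=lambda t: (-t[0], t[1]))[:n]
    return [s for _, _, s in sorted(top, key=lambda t: t[1])]
```

Calling summarize(text, title, n=2) returns the two highest-scoring sentences in their original order. Changing the values in WEIGHTS changes which features dominate the ranking.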


In the modern Internet age, the volume of textual data is ever increasing. We need some way to condense this data while preserving its information and meaning, and summarization is how we do that. Text summarization is the process of automatically generating a natural language summary of an input document while retaining its important points. It makes retrieving information easier and faster. There are two prominent types of summarization algorithms.

  • Extractive summarization systems form summaries by selecting parts of the source text according to some measure of importance and then combining those parts or sentences into a summary. The importance of a sentence is based on linguistic and statistical features.
  • Abstractive summarization systems generate new phrases, possibly rephrasing the source or using words that were not in the original text. Abstractive approaches are naturally much harder than extractive ones. To produce a perfect abstractive summary, the model first has to truly understand the document and then express that understanding briefly, possibly using new words and phrases. This requires complex capabilities such as generalization, paraphrasing, and incorporating real-world knowledge. Most work has traditionally focused on extractive approaches because it is easier to define hard-coded rules for selecting important sentences than to generate new ones. Extractive approaches also promise grammatically correct and coherent summaries. However, they often do not summarize long and complex texts well because they are very restrictive.
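As a rough illustration of the extractive approach, the sketch below ranks sentences by the normalized frequency of their words, one simple statistical measure of importance. It is a minimal example, not the project's implementation: the STOPWORDS set, the function names, and the max_sentences parameter are all assumptions made for this example.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption; real systems use larger lists).
STOPWORDS = {"the", "a", "an", "of", "to", "and", "in", "is", "it",
             "that", "for", "on", "with", "as", "are", "was"}


def split_sentences(text):
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def word_frequencies(text):
    # Count non-stopwords and normalize by the most frequent word.
    words = [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]
    counts = Counter(words)
    top = max(counts.values(), default=1)
    return {w: c / top for w, c in counts.items()}


def extractive_summary(text, max_sentences=2):
    sentences = split_sentences(text)
    freqs = word_frequencies(text)

    def sentence_score(sentence):
        # Average normalized frequency of the words in the sentence.
        words = re.findall(r"[a-z']+", sentence.lower())
        return sum(freqs.get(w, 0.0) for w in words) / max(len(words), 1)

    ranked = sorted(range(len(sentences)),
                    key=lambda i: (-sentence_score(sentences[i]), i))
    keep = sorted(ranked[:max_sentences])  # restore document order
    return " ".join(sentences[i] for i in keep)
```

Every sentence in the output is copied verbatim from the input, which is what makes the method extractive. The max_sentences argument is one simple way to let a caller control the summary length.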

Potential applications 

Possible current uses of summarization:

  1. People need to learn a great deal from texts, but they want to spend less time doing so.
  2. The project aims to solve this problem by supplying summaries of the texts from which they want to gain information.
  3. A goal of the project is for these summaries to capture the intention of the texts as closely as possible.
  4. The user will be able to select the summary length.
  5. The user is given a smooth and clear interface.
  6. The server system is configured to respond quickly.



The project is wide in scope. The limitations that follow may seem to contradict that, but they are the only restrictions applied. The project looks at single-document summarization only; multi-document summarization is not covered. The summaries produced are largely extracts of the document being summarized rather than newly generated abstracts. The parameters used are tuned for news articles, although they can be changed easily.


There are two major techniques for automatic text summarization: abstraction-based text summarization and extraction-based text summarization.

Extraction-Based Summarization

Extractive summaries highlight the relevant words and sentences from the input document. The summary is formed by concatenating the selected sentences in the order in which they appear. For every sentence, a decision is made on whether it will be included in the summary. For example, search engines typically use extractive methods to generate summaries of web pages. Many types of logical and mathematical formulations have been used to create summaries. Regions of the text are scored, and those with the highest scores are taken into consideration. In extraction, only important sentences are selected, which makes this approach easier to implement.

There are three main obstacles to the extractive approach. The first is ranking: the units of text must be ranked. The second is selection: a subset of the ranked units must be chosen. The third is coherence: the selected units must form an understandable summary. Many algorithms exist for the ranking problem. Selection and coherence are then addressed to improve diversity, minimize redundancy, and pick up the lines that are important.

Each sentence is scored, and the sentences are arranged in decreasing order of score. Selecting a subset of sentences that forms a coherent summary is not a trivial problem, and careful selection also helps reduce redundancy. Once the list is ordered, the first sentence is the most important one and is used to start the summary. In the next step, the sentence with the highest similarity is picked from the top half of the list. The process is repeated until the length limit is reached and a relevant summary has been generated.
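One common way to handle selection and redundancy together is a greedy, MMR-style loop: repeatedly pick the candidate whose score is high and whose overlap with the already selected sentences is low. The sketch below is an assumption about how such a step could look, not necessarily the exact procedure described above. Jaccard word overlap as the similarity measure, the trade_off parameter, and all function names are hypothetical, and the sentence scores are assumed to come from an earlier ranking step.

```python
import re


def tokens(sentence):
    # Set of lowercase word tokens in the sentence.
    return set(re.findall(r"[a-z0-9']+", sentence.lower()))


def jaccard(a, b):
    # Word-overlap similarity between two token sets, in [0, 1].
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def select_sentences(sentences, scores, limit=2, trade_off=0.7):
    """Greedily pick up to `limit` sentences, penalizing redundancy.

    `scores` holds one importance score per sentence (from a ranking step).
    `trade_off` near 1.0 favors score; near 0.0 favors diversity.
    """
    token_sets = [tokens(s) for s in sentences]
    selected = []
    candidates = list(range(len(sentences)))

    while candidates and len(selected) < limit:
        def adjusted(i):
            # Highest similarity to anything already selected (0 if none yet).
            redundancy = max(
                (jaccard(token_sets[i], token_sets[j]) for j in selected),
                default=0.0,
            )
            return trade_off * scores[i] - (1 - trade_off) * redundancy

        best = max(candidates, key=adjusted)
        selected.append(best)
        candidates.remove(best)

    # Return the chosen sentences in document order for readability.
    return [sentences[i] for i in sorted(selected)]
```

With two near-duplicate high-scoring sentences and one distinct lower-scoring sentence, the loop keeps the top sentence and then prefers the distinct one, which is the redundancy reduction the text describes. The limit argument plays the role of the summary length limit.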

Software and Hardware Requirements:


  • OS: Windows 7, 8, or 10 (32-bit or 64-bit)
  • RAM: 4 GB


  • Python
  • Anaconda
