### Description

**Introduction:**

Data analysis plays an important part in examining datasets and predicting how situations may develop in the coming years. Such analysis gives departments and organizations options for taking steps to deal with these problems. In this project, the main problem considered is predicting the onset of diabetes in the coming years.

**Abstract:**

The idea is to visualize data by applying machine learning and pandas in Python. The dataset comes from a medical background covering different people (the Pima Indians Diabetes dataset from the UCI repository). It contains each patient's age and diagnostic measurements related to diabetes, such as the number of pregnancies, plasma glucose concentration, blood pressure, body mass index and insulin level. The data is split into a training set and a testing set, and a model predicts each patient's chance of developing diabetes within the next five years. The data is classified and shown in the form of different graphs.

This project makes data analysis easy: it shows the medical information and the chances of getting diabetes on univariate and multivariate plots.
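The abstract describes splitting the data into training and testing sets and predicting diabetes, but it does not name the algorithm. The sketch below assumes scikit-learn and logistic regression as a stand-in classifier; the column names, the 33% test split and the random seed are illustrative assumptions, not values taken from the project.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Assumed column names for the headerless Pima Indians Diabetes CSV
# (hypothetical labels; adjust them to match your copy of the file).
COLUMNS = ["Pregnancies", "Glucose", "BloodPressure", "SkinThickness",
           "Insulin", "BMI", "DiabetesPedigreeFunction", "Age", "Outcome"]


def train_and_score(df: pd.DataFrame, target: str = "Outcome",
                    test_size: float = 0.33, seed: int = 7):
    """Split into train/test sets, fit a classifier, return (model, test accuracy)."""
    X = df.drop(columns=[target])  # input measurements
    y = df[target]                 # 1 = diabetes onset within five years, 0 = not
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed, stratify=y)
    # Logistic regression is an assumed choice of model, not the project's confirmed one.
    model = LogisticRegression(max_iter=1000)
    model.fit(X_train, y_train)
    return model, model.score(X_test, y_test)
```

Usage, assuming a local copy of the data saved as `pima-indians-diabetes.csv` (hypothetical filename): `df = pd.read_csv("pima-indians-diabetes.csv", names=COLUMNS)`, then `model, accuracy = train_and_score(df)`. `model.predict_proba(...)` then gives each patient's estimated probability of developing diabetes.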

**Existing system:**

Existing studies offered no prediction. They relied on manual analysis of existing data, and analyzing large datasets was not considered.

**Proposed system:**

Data analysis and machine learning libraries and algorithms are used to predict diabetes, and the information is shown in detail in different types of graphs (histograms, density plots, box and whisker plots, and correlation matrix plots).

**Histograms**

A fast way to get an idea of the distribution of each attribute is to look at histograms.

Histograms group data into bins and provide you with a count of the number of observations in each bin. From the shape of the bins, you can quickly get a feeling for whether an attribute is Gaussian, skewed, or even has an exponential distribution. It can also help you see possible outliers.
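To make the binning concrete, the sketch below counts observations per equal-width bin with NumPy and draws one histogram per column with pandas. The function names and the default of 10 bins are illustrative assumptions.

```python
import numpy as np
import pandas as pd


def bin_counts(values, bins=10):
    """Return (counts, edges): the number of observations in each equal-width bin."""
    counts, edges = np.histogram(np.asarray(values, dtype=float), bins=bins)
    return counts, edges


def plot_histograms(df: pd.DataFrame, bins=10):
    """Draw one histogram per numeric column; pandas delegates drawing to matplotlib."""
    return df.hist(bins=bins, figsize=(12, 10))
```

For example, `bin_counts([1, 2, 2, 3, 3, 3], bins=3)` places one, two and three observations in the three bins, which is exactly the shape a histogram bar chart would show.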

**Density Plots**

Density plots are another way of getting a quick idea of the distribution of each attribute. The plots look like an abstracted histogram with a smooth curve drawn through the top of each bin, much like your eye tried to do with the histograms.
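The smooth curve of a density plot is usually a kernel density estimate. The sketch below assumes a Gaussian kernel with Scott's bandwidth rule, which is what pandas' `DataFrame.plot(kind="density")` uses by default through SciPy's `gaussian_kde`; the function name `gaussian_kde_1d` is hypothetical and written with NumPy only so the computation is visible.

```python
import numpy as np


def gaussian_kde_1d(values, grid, bandwidth=None):
    """Estimate a density curve on `grid` from 1-D `values` using Gaussian kernels."""
    values = np.asarray(values, dtype=float)
    grid = np.asarray(grid, dtype=float)
    n = values.size
    if bandwidth is None:
        # Scott's rule of thumb in one dimension: h = sample std * n^(-1/5)
        bandwidth = values.std(ddof=1) * n ** (-1 / 5)
    # Standardized distance from every grid point to every observation
    u = (grid[:, None] - values[None, :]) / bandwidth
    kernels = np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)
    # Average the kernels and rescale so the curve integrates to 1
    return kernels.sum(axis=1) / (n * bandwidth)
```

In practice the same plots come from `df.plot(kind="density", subplots=True, layout=(3, 3), sharex=False)`, which requires SciPy to be installed.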

**Box and Whisker Plots**

Another useful way to review the distribution of each attribute is to use Box and Whisker Plots or boxplots for short.

Boxplots summarize the distribution of each attribute, drawing a line for the median (middle value) and a box around the 25th and 75th percentiles (the middle 50% of the data). The whiskers give an idea of the spread of the data, and dots outside the whiskers show candidate outlier values: values that lie more than 1.5 times the interquartile range (the spread of the middle 50% of the data) below the 25th percentile or above the 75th percentile.
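The 1.5 × IQR rule can be sketched directly with pandas. The helper name `iqr_outliers` is hypothetical; `k = 1.5` matches matplotlib's default whisker length (`whis=1.5`), and `Series.quantile` uses linear interpolation by default.

```python
import pandas as pd


def iqr_outliers(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Return the values a box plot would draw as dots beyond its whiskers."""
    q1 = series.quantile(0.25)          # 25th percentile: bottom of the box
    q3 = series.quantile(0.75)          # 75th percentile: top of the box
    iqr = q3 - q1                       # spread of the middle 50% of the data
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return series[(series < lower) | (series > upper)]
```

For the values 1 to 9 plus 100, the quartiles are 3.25 and 7.75, so the upper fence is 14.5 and only 100 is flagged. The box plots themselves can be drawn with `df.plot(kind="box", subplots=True, layout=(3, 3), sharex=False, sharey=False)`.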

**Correlation Matrix Plot**

Correlation gives an indication of how related the changes are between two variables. If two variables change in the same direction they are positively correlated. If they change in opposite directions together (one goes up, one goes down), then they are negatively correlated.
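The text does not name a correlation measure; assuming the Pearson coefficient (the default of pandas' `DataFrame.corr()`), the correlation between attributes $x$ and $y$ over $n$ observations is:

```latex
r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
              {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
```

Here $\bar{x}$ and $\bar{y}$ are the attribute means. $r_{xy}$ ranges from $-1$ (perfect negative correlation) through $0$ (no linear relation) to $+1$ (perfect positive correlation).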

You can calculate the correlation between each pair of attributes. This is called a correlation matrix. You can then plot the correlation matrix and get an idea of which variables have a high correlation with each other.
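A minimal sketch of computing and plotting the correlation matrix with pandas and matplotlib follows. The function name, colour map and figure size are illustrative choices, not taken from the project.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def plot_correlation_matrix(df: pd.DataFrame):
    """Compute pairwise correlations and draw them as a colour-coded matrix."""
    corr = df.corr()  # Pearson correlation between every pair of columns
    fig, ax = plt.subplots(figsize=(8, 8))
    image = ax.matshow(corr.to_numpy(), vmin=-1, vmax=1, cmap="coolwarm")
    fig.colorbar(image, ax=ax)
    # Label rows and columns with the attribute names
    ticks = np.arange(len(corr.columns))
    ax.set_xticks(ticks)
    ax.set_yticks(ticks)
    ax.set_xticklabels(corr.columns, rotation=90)
    ax.set_yticklabels(corr.columns)
    return fig, corr
```

The matrix is symmetric with ones on the diagonal, so strongly coloured off-diagonal cells are the pairs worth inspecting.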

This is useful to know because some machine learning algorithms like linear and logistic regression can have poor performance if there are highly correlated input variables in your data.
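One way to find the highly correlated inputs before fitting a linear or logistic regression model is to list the pairs whose absolute correlation exceeds a cutoff. The helper name and the 0.8 threshold below are arbitrary assumptions for illustration.

```python
import pandas as pd


def highly_correlated_pairs(df: pd.DataFrame, threshold: float = 0.8):
    """Return (column_a, column_b, r) for every pair with |r| >= threshold."""
    corr = df.corr()
    cols = corr.columns
    pairs = []
    # Walk the upper triangle only, so each pair is reported once
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):
            r = corr.iloc[i, j]
            if abs(r) >= threshold:
                pairs.append((cols[i], cols[j], float(r)))
    return pairs
```

A common follow-up is to drop or combine one attribute from each flagged pair before training the model.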

**System Architecture:**

**Software & Hardware Requirements:**

**OS:** Windows 7 or above

**Processor:** Intel Core i3 or above

**Programming language:** Python 3.6

**Distribution tool:** Anaconda

**RAM:** 4 GB

**Hard disk:** 160 GB
