The aim is to analyze an existing data set of users and predict the chances of each person developing diabetes within the next five years. The results are shown in the form of different graphs.
Data analysis plays an important part in examining a dataset and predicting how a situation is likely to develop in the coming years. Such analysis gives departments and organizations the option to take steps to deal with these problems in advance. In this project, the prediction of diabetes in the coming years is the main problem considered.
The idea is to visualize the data by applying machine learning and pandas in Python. The dataset is taken from the medical records of different people (the Pima Indians diabetes dataset from the UCI repository). It contains information about each person, such as age and measurements related to diabetes. A training set and a testing set are built from it and used to predict the chances of a patient having diabetes within the next five years. The data is classified and shown in the form of different graphs.
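The training and testing step described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the short column names, the commented-out file path, and the choice of logistic regression are not taken from the project, and the data frame is filled with random placeholder numbers so that the script runs without the real file.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Assumed short column names for the diabetes data; "class" is the 0/1 label.
COLUMNS = ["preg", "plas", "pres", "skin", "test", "mass", "pedi", "age", "class"]


def make_placeholder_data(n_rows=200, seed=7):
    """Build random stand-in data with the assumed columns (not real records)."""
    rng = np.random.default_rng(seed)
    frame = pd.DataFrame(rng.normal(size=(n_rows, 8)), columns=COLUMNS[:-1])
    # Tie the placeholder label loosely to one column so a model has a signal.
    noisy_score = frame["plas"] + 0.5 * rng.normal(size=n_rows)
    frame["class"] = (noisy_score > 0).astype(int)
    return frame


# With the real file this would be something like (hypothetical path):
# data = pd.read_csv("pima-indians-diabetes.csv", names=COLUMNS)
data = make_placeholder_data()

X = data.drop(columns="class")  # input attributes
y = data["class"]               # 1 = diabetes, 0 = no diabetes

# Hold out about one third of the rows for testing, keeping class proportions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=7, stratify=y
)

# Logistic regression is one possible classifier, chosen here for brevity.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)              # fraction classified correctly
probabilities = model.predict_proba(X_test)[:, 1]   # estimated chance of class 1
print(f"Held-out accuracy: {accuracy:.2f}")
```

The accuracy printed here only reflects the placeholder data; it says nothing about how such a model performs on real patient records.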
To make the data analysis easy to follow, the project presents the results, derived from the medical information, on the chances of getting diabetes using widely used plot types.
Existing studies offered no means of prediction: they relied on manual analysis of existing data and did not consider analyzing large datasets.
Data analysis and machine learning libraries and algorithms are used to predict diabetes, and the information is shown in detail in the form of different types of graphs (histograms, density plots, box and whisker plots, and correlation matrix plots).
Histograms
A fast way to get an idea of the distribution of each attribute is to look at histograms.
Histograms group data into bins and give you a count of the number of observations in each bin. From the shape of the bins you can quickly get a feeling for whether an attribute is Gaussian, skewed, or even has an exponential distribution. They can also help you see possible outliers.
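As a small illustration of binning and of reading shape from a histogram, the sketch below draws one histogram per column with pandas and also computes the bin counts directly. The two columns are random placeholder values, not real patient data, and their names are only assumptions for the example.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; remove this line to open plot windows
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Placeholder attributes (random numbers standing in for real columns).
frame = pd.DataFrame({
    "age": rng.exponential(scale=10.0, size=500) + 21,  # right-skewed shape
    "mass": rng.normal(loc=32.0, scale=7.0, size=500),  # roughly Gaussian shape
})

# One histogram per column; each bar is the number of rows that fall in a bin.
axes = frame.hist(bins=20, figsize=(8, 4))
plt.tight_layout()
# Call plt.show() or plt.savefig("histograms.png") here to view the result.

# The same information without a plot: counts per bin and the bin edges.
counts, edges = np.histogram(frame["age"], bins=20)

# Skewness near 0 suggests a symmetric shape; a large positive value, a long right tail.
skewness = frame.skew()
print(skewness)
```

Twenty bins is an arbitrary choice; fewer bins hide detail and more bins make the shape noisier.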
Density Plots
Density plots are another way of getting a quick idea of the distribution of each attribute. The plots look like an abstracted histogram with a smooth curve drawn through the top of each bin, much like your eye tried to do with the histograms.
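To show where the smooth curve comes from, here is a minimal sketch of a Gaussian kernel density estimate written with NumPy only. The sample values are random placeholders, and the bandwidth formula is a common normal-reference rule of thumb chosen for this example, not something specified by the project. In practice, pandas can draw the same kind of curve with `DataFrame.plot(kind="density", subplots=True)`, which relies on SciPy.

```python
import numpy as np


def gaussian_kde_curve(values, grid, bandwidth):
    """Average one Gaussian bump per observation, evaluated at each grid point."""
    values = np.asarray(values, dtype=float)
    z = (grid[:, None] - values[None, :]) / bandwidth
    bumps = np.exp(-0.5 * z ** 2) / (bandwidth * np.sqrt(2.0 * np.pi))
    return bumps.mean(axis=1)


rng = np.random.default_rng(1)
sample = rng.normal(loc=120.0, scale=30.0, size=400)  # placeholder attribute values

# Rule-of-thumb bandwidth (an assumption): wider for spread-out or small samples.
bandwidth = 1.06 * sample.std(ddof=1) * len(sample) ** (-1 / 5)

# Evaluate the curve a little beyond the observed range so the tails are included.
grid = np.linspace(sample.min() - 3 * bandwidth, sample.max() + 3 * bandwidth, 512)
density = gaussian_kde_curve(sample, grid, bandwidth)

# A density curve encloses a total area of about 1 (checked with a Riemann sum).
area = density.sum() * (grid[1] - grid[0])
peak_location = grid[density.argmax()]
print(f"area under curve: {area:.3f}, peak near: {peak_location:.1f}")
```

A smaller bandwidth follows the data more closely and a larger one smooths more, which plays the same role as the bin width in a histogram.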
Box and Whisker Plots
Another useful way to review the distribution of each attribute is to use box and whisker plots, or boxplots for short.
Boxplots summarize the distribution of each attribute, drawing a line for the median (middle value) and a box around the 25th and 75th percentiles (the middle 50% of the data). The whiskers give an idea of the spread of the data, and dots outside of the whiskers show candidate outlier values (values that lie beyond the edge of the box by more than 1.5 times the spread of the middle 50% of the data).
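The numbers behind a boxplot can be worked out directly. The sketch below computes the median, the quartiles, the 1.5 times interquartile range fences, and the candidate outliers for a short list of made-up values; the function name and the sample are illustrative only.

```python
import numpy as np


def boxplot_summary(values):
    """Return the quantities a boxplot draws for one attribute."""
    values = np.asarray(values, dtype=float)
    q1, median, q3 = np.percentile(values, [25, 50, 75])
    iqr = q3 - q1                  # spread of the middle 50% of the data
    lower_fence = q1 - 1.5 * iqr   # points below this are candidate outliers
    upper_fence = q3 + 1.5 * iqr   # points above this are candidate outliers
    outliers = values[(values < lower_fence) | (values > upper_fence)]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "lower_fence": lower_fence,
        "upper_fence": upper_fence,
        "outliers": outliers.tolist(),
    }


# Made-up ages; the last value sits far from the rest.
sample = [21, 22, 24, 25, 27, 29, 31, 33, 36, 81]
summary = boxplot_summary(sample)
print(summary)  # for this sample, only 81 falls outside the fences
```

The factor 1.5 matches the default whisker setting of matplotlib's boxplot, which is also what pandas uses when drawing with `DataFrame.plot(kind="box")`.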
Correlation Matrix Plot
Correlation gives an indication of how related the changes are between two variables. If two variables change in the same direction, they are positively correlated. If they change in opposite directions (one goes up, one goes down), they are negatively correlated.
You can calculate the correlation between each pair of attributes. This is called a correlation matrix. You can then plot the correlation matrix and get an idea of which variables have a high correlation with each other.
This is useful to know, because some machine learning algorithms like linear and logistic regression can have poor performance if there are highly correlated input variables in your data.
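A correlation matrix and a simple check for highly correlated pairs might look like the sketch below. The four columns are random placeholders built so that one pair is strongly related, and the 0.9 threshold is an arbitrary value picked for the example.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
base = rng.normal(size=300)

# Placeholder attributes with known relationships to "a".
frame = pd.DataFrame({
    "a": base,
    "b": base + 0.1 * rng.normal(size=300),  # almost a copy of "a"
    "c": -base + rng.normal(size=300),       # tends to move opposite to "a"
    "d": rng.normal(size=300),               # unrelated to the others
})

# Pearson correlation between each pair of columns, values between -1 and 1.
correlations = frame.corr()
print(correlations.round(2))

# List each pair once if its absolute correlation exceeds the chosen threshold.
threshold = 0.9
columns = list(correlations.columns)
high_pairs = [
    (first, second, float(correlations.loc[first, second]))
    for i, first in enumerate(columns)
    for second in columns[i + 1:]
    if abs(correlations.loc[first, second]) > threshold
]
print(high_pairs)
```

To plot the matrix, the `correlations` frame can be passed to matplotlib's `Axes.matshow` with `vmin=-1` and `vmax=1`, so that the color scale covers the full range of possible values.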
SOFTWARE & HARDWARE REQUIREMENTS:
OS: Windows 7 or above
Processor: Intel Core i3 or above
Programming language: Python 3.6
Distribution tool: Anaconda
RAM: 4 GB
Hard Disk: 160 GB