Diabetes Prediction Using Machine Learning Approaches and Prediction Using AI
The aim is to analyze an existing user data set and predict the chances of diabetes in the coming five years. The results are shown in the form of different graphs.
Data analysis plays an important part in examining a dataset and predicting what situations may arise in the coming years. Such analysis gives departments and organizations the option to take steps to deal with these problems. In this project, predicting diabetes in the coming years is the main problem considered.
The idea is to visualize data by applying machine learning and pandas in Python. The dataset is taken from the medical records of different people (the Pima Indians Diabetes dataset from the UCI repository). It consists of user information such as age, sex, and the types of symptoms related to diabetes. A training set and a testing set are designed, and the chances of patients having diabetes in the coming five years are predicted. The data is classified and shown in the form of different graphs.
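The train/test design described above could be sketched as follows. This is a minimal sketch under stated assumptions, not the project's actual code: it assumes scikit-learn is installed, uses logistic regression as one possible classifier, and substitutes a small synthetic table for the real dataset. The column names and the `make_stand_in_data` helper are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split


def make_stand_in_data(n_rows=200, seed=0):
    """Build a synthetic table (NOT real patient data) with hypothetical columns."""
    rng = np.random.default_rng(seed)
    glucose = rng.normal(120, 30, n_rows)
    bmi = rng.normal(32, 7, n_rows)
    age = rng.integers(21, 80, n_rows)
    # Synthetic label: higher glucose and BMI make class 1 more likely.
    score = 0.03 * (glucose - 120) + 0.05 * (bmi - 32) + rng.normal(0, 1, n_rows)
    return pd.DataFrame({"glucose": glucose, "bmi": bmi, "age": age,
                         "outcome": (score > 0).astype(int)})


df = make_stand_in_data()
X = df.drop(columns="outcome")
y = df["outcome"]

# Hold out 30% of the rows for testing; stratify keeps the class balance similar.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

# Logistic regression is an assumed choice; any classifier could be swapped in.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
# Predicted probability of the positive class for each held-out row.
probabilities = model.predict_proba(X_test)[:, 1]
print(f"test accuracy: {accuracy:.2f}")
```

With the real dataset, the synthetic table would be replaced by a `pd.read_csv` call on the downloaded file, and the feature and label columns adjusted to match it.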
Using this project for easy data analysis, we show the medical information on the chances of getting diabetes using widely used plot types.
Existing studies made no real predictions: they relied on manual analysis of existing data and did not consider analyzing large datasets.
Data analysis and machine learning libraries and algorithms are used to predict diabetes, and the information is shown in detail in different types of graphs (histograms, density plots, box and whisker plots, and correlation matrix plots).
Histograms:
A fast way to get an idea of the distribution of each attribute is to look at histograms.
Histograms group data into bins and give you a count of the number of observations in each bin. From the shape of the bins you can quickly get a feeling for whether an attribute is Gaussian, skewed, or even has an exponential distribution. Histograms can also help you see possible outliers.
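The binning step can be illustrated without any plotting library. The sketch below uses NumPy's `np.histogram` on synthetic, right-skewed values that stand in for one attribute (they are not taken from the dataset), then prints a rough text histogram.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic, right-skewed values standing in for one attribute (not real data).
values = rng.exponential(scale=30.0, size=500)

# Group the values into 10 equal-width bins and count observations per bin.
counts, bin_edges = np.histogram(values, bins=10)

# One '#' per 5 observations gives a quick view of the shape.
for left, right, count in zip(bin_edges[:-1], bin_edges[1:], counts):
    print(f"{left:7.1f} to {right:7.1f}: {'#' * (count // 5)} ({count})")
```

The counts fall away steadily from the first bin, which is the skewed shape the text describes. With pandas and matplotlib installed, `DataFrame.hist()` draws one such histogram per column.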
Density Plots:
Density plots are another way of getting a quick idea of the distribution of each attribute. The plots look like an abstracted histogram with a smooth curve drawn through the top of each bin, much as your eye tries to do with the histograms.
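One common way to obtain such a smooth curve is Gaussian kernel density estimation. The sketch below is an illustration under assumptions rather than the project's method: the data are synthetic, the `gaussian_kde_curve` helper is hypothetical, and the bandwidth uses a normal-reference rule of thumb, which is only one of several possible choices.

```python
import numpy as np


def gaussian_kde_curve(values, grid, bandwidth):
    """Average a Gaussian bump centred on each observation, evaluated on grid."""
    values = np.asarray(values, dtype=float)
    z = (grid[:, None] - values[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z ** 2) / np.sqrt(2.0 * np.pi)
    return kernels.mean(axis=1) / bandwidth


rng = np.random.default_rng(0)
values = rng.normal(loc=120.0, scale=30.0, size=400)  # synthetic stand-in

# Normal-reference rule of thumb for the bandwidth (an assumed default).
bandwidth = 1.06 * values.std(ddof=1) * len(values) ** (-1 / 5)

grid = np.linspace(values.min() - 3 * bandwidth, values.max() + 3 * bandwidth, 200)
density = gaussian_kde_curve(values, grid, bandwidth)

# Trapezoid rule: the curve should integrate to roughly 1, like any density.
area = np.sum((density[:-1] + density[1:]) / 2 * np.diff(grid))
print(f"bandwidth={bandwidth:.2f}, area under curve={area:.3f}")
```

A smaller bandwidth follows the data more closely and looks bumpier; a larger one gives a smoother but flatter curve.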
Box and Whisker Plots:
Another useful way to review the distribution of each attribute is to use Box and Whisker Plots or boxplots for short.
Boxplots summarize the distribution of each attribute, drawing a line for the median (middle value) and a box around the 25th and 75th percentiles (the middle 50% of the data). The whiskers give an idea of the spread of the data, and dots outside the whiskers show candidate outlier values (values that lie more than 1.5 times the spread of the middle 50% of the data beyond the edges of the box).
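The quantities a boxplot draws can be computed directly. The sketch below uses a small made-up sample (not real data) and the common 1.5 times interquartile range convention for flagging candidate outliers; plotting libraries may differ in how they compute the percentiles.

```python
import numpy as np

# Small made-up sample with one unusually large value (not real data).
values = np.array([18, 21, 22, 24, 25, 26, 27, 29, 31, 33, 70], dtype=float)

q1, median, q3 = np.percentile(values, [25, 50, 75])
iqr = q3 - q1  # spread of the middle 50% of the data

# Points more than 1.5 * IQR beyond the box edges are flagged as candidates.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = values[(values < lower_fence) | (values > upper_fence)]

# Whiskers end at the most extreme observations still inside the fences.
inside = values[(values >= lower_fence) & (values <= upper_fence)]
whisker_low, whisker_high = inside.min(), inside.max()

print(f"median={median}, box=({q1}, {q3}), IQR={iqr}")
print(f"whiskers=({whisker_low}, {whisker_high}), outliers={outliers}")
```

Here the value 70 lies above the upper fence and is reported as a candidate outlier, while the upper whisker stops at 33.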
Correlation Matrix Plot:
Correlation gives an indication of how related the changes are between two variables. If two variables change in the same direction, they are positively correlated. If they change in opposite directions (one goes up, one goes down), they are negatively correlated.
You can calculate the correlation between each pair of attributes. The resulting table is called a correlation matrix. You can then plot the correlation matrix and get an idea of which variables have a high correlation with each other.
This is useful to know, because some machine learning algorithms like linear and logistic regression can have poor performance if there are highly correlated input variables in your data.
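A correlation matrix of this kind can be computed with pandas' `DataFrame.corr()`. The sketch below uses synthetic columns with hypothetical names, built so that some pairs are related and one is not; the 0.7 threshold for "highly correlated" is an arbitrary choice for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 300
# Synthetic columns (hypothetical names, not real data).
bmi = rng.normal(32, 7, n)
skin = 0.8 * bmi + rng.normal(0, 3, n)       # built to move with bmi
activity = -0.3 * bmi + rng.normal(0, 3, n)  # built to move against bmi
noise = rng.normal(0, 1, n)                  # unrelated to the others

df = pd.DataFrame({"bmi": bmi, "skin": skin, "activity": activity, "noise": noise})

# Pearson correlation between every pair of columns; values lie in [-1, 1].
corr = df.corr()
print(corr.round(2))

# List pairs whose absolute correlation exceeds the chosen threshold.
threshold = 0.7
cols = corr.columns
high_pairs = [(a, b, round(corr.loc[a, b], 2))
              for i, a in enumerate(cols) for b in cols[i + 1:]
              if abs(corr.loc[a, b]) > threshold]
print(high_pairs)
```

Pairs returned in `high_pairs` are candidates for dropping or combining before fitting a model that is sensitive to correlated inputs. The matrix itself could be drawn as an image, for example with matplotlib's `matshow`.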
Software & Hardware Requirements:
- OS: Windows 7 or above
- Processor: Intel Core i3 or above
- Programming language: Python 3.6
- Distribution tool: Anaconda
- RAM: 4 GB
- Hard Disk: 160 GB