Description
KDD & Data Mining Approach for Finding Network Attacks using ML
The rapid growth in the volume of data has increased the importance of information security and data analysis systems for Big Data. An intrusion detection system (IDS) monitors and analyzes data to detect intrusions into a system or network. The high volume, variety, and velocity of the data generated in networks make it very difficult to detect attacks with traditional analysis techniques. Big Data techniques are therefore used in IDS to keep the data analysis process accurate and efficient. This paper introduces the Spark-Chi-SVM model for intrusion detection. The model uses ChiSqSelector for feature selection and builds an intrusion detection model with a support vector machine (SVM) classifier on the Apache Spark Big Data platform. The KDD99 dataset is used to train and test the model. The experiment compares the Chi-SVM classifier with a Chi-Logistic Regression classifier. The results show that the Spark-Chi-SVM model achieves high performance, reduces training time, and is efficient for Big Data.
PROPOSED METHOD
Spark-Chi-SVM proposed model
In this section, the researchers describe the proposed model and the tools and techniques used in the proposed method.
The steps of the proposed model can be summarized as follows:
1. Load the dataset and export it into Resilient Distributed Datasets (RDD) and a DataFrame in Apache Spark.
2. Data preprocessing.
3. Feature selection.
4. Train Spark-Chi-SVM with the training dataset.
5. Test and evaluate the model with the KDD dataset.
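The five steps above can be sketched with the DataFrame-based PySpark ML API. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names, column names ("raw_label", "label"), the number of selected features, and the SVM hyperparameters are all placeholders chosen for illustration, and LinearSVC stands in for whichever Spark SVM implementation the original work used. Note that ChiSqSelector is marked as deprecated in recent Spark 3.x releases in favour of UnivariateFeatureSelector.

```python
def split_feature_columns(all_columns, categorical_columns, label_column="label"):
    """Split column names into categorical and numeric feature lists."""
    categorical = [c for c in all_columns
                   if c in categorical_columns and c != label_column]
    numeric = [c for c in all_columns
               if c not in categorical_columns and c != label_column]
    return categorical, numeric


def load_dataframe(spark, path, columns):
    """Step 1: read a headerless CSV file into a DataFrame with named columns.

    `path` is a hypothetical location of the KDD99 file.
    """
    return spark.read.csv(path, header=False, inferSchema=True).toDF(*columns)


def add_binary_label(df, raw_label_column="raw_label", label_column="label"):
    """Step 2 (part): map 'normal.' to 0.0 and every attack label to 1.0."""
    from pyspark.sql import functions as F  # lazy import: needs a Spark install
    return df.withColumn(
        label_column,
        F.when(F.col(raw_label_column) == "normal.", 0.0).otherwise(1.0))


def build_pipeline(categorical, numeric, label_column="label",
                   num_top_features=20):
    """Steps 2-4: preprocessing, chi-squared selection, and a linear SVM.

    num_top_features=20 is an arbitrary placeholder, not a value from the source.
    """
    from pyspark.ml import Pipeline
    from pyspark.ml.classification import LinearSVC
    from pyspark.ml.feature import (ChiSqSelector, StandardScaler,
                                    StringIndexer, VectorAssembler)

    # Convert each categorical string column to a numeric index.
    indexers = [StringIndexer(inputCol=c, outputCol=c + "_idx",
                              handleInvalid="keep")
                for c in categorical]
    # Pack all feature columns into one vector column.
    assembler = VectorAssembler(
        inputCols=[c + "_idx" for c in categorical] + numeric,
        outputCol="raw_features")
    # Standardize the features to unit variance.
    scaler = StandardScaler(inputCol="raw_features", outputCol="scaled",
                            withMean=False, withStd=True)
    # Step 3: keep the features ranked highest by the chi-squared test.
    selector = ChiSqSelector(numTopFeatures=num_top_features,
                             featuresCol="scaled", outputCol="features",
                             labelCol=label_column)
    # Step 4: linear SVM classifier (hyperparameters are assumptions).
    svm = LinearSVC(featuresCol="features", labelCol=label_column,
                    maxIter=50, regParam=0.01)
    return Pipeline(stages=indexers + [assembler, scaler, selector, svm])


def train_and_evaluate(pipeline, train_df, test_df, label_column="label"):
    """Steps 4-5: fit on the training split, score the test split by AUROC."""
    from pyspark.ml.evaluation import BinaryClassificationEvaluator

    model = pipeline.fit(train_df)
    predictions = model.transform(test_df)
    evaluator = BinaryClassificationEvaluator(
        labelCol=label_column, rawPredictionCol="rawPrediction",
        metricName="areaUnderROC")
    return model, evaluator.evaluate(predictions)
```

Swapping LinearSVC for pyspark.ml.classification.LogisticRegression in build_pipeline would give a Chi-Logistic Regression variant for the comparison the description mentions. The Spark imports are placed inside the functions so that the column helper can be used without a Spark installation.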
DATASET DESCRIPTION
The KDD99 dataset is used to evaluate the proposed model. A total of 494,021 instances are used. The KDD99 dataset has 41 attributes and a class attribute, which indicates whether a given instance is a normal instance or an attack. The table provides a description of the KDD99 dataset attributes with class labels.
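As an illustration of the record layout, the sketch below lists the 41 attribute names as published with the KDD Cup 1999 data and parses one comma-separated record into named fields with a binary class label. The helper names (parse_record, binary_label) and the sample line are illustrative assumptions; the raw files end each label with a trailing period, which the helper strips.

```python
# The 41 KDD99 attribute names, in file order (the class label follows them).
KDD99_FEATURES = [
    "duration", "protocol_type", "service", "flag", "src_bytes", "dst_bytes",
    "land", "wrong_fragment", "urgent", "hot", "num_failed_logins",
    "logged_in", "num_compromised", "root_shell", "su_attempted", "num_root",
    "num_file_creations", "num_shells", "num_access_files",
    "num_outbound_cmds", "is_host_login", "is_guest_login", "count",
    "srv_count", "serror_rate", "srv_serror_rate", "rerror_rate",
    "srv_rerror_rate", "same_srv_rate", "diff_srv_rate",
    "srv_diff_host_rate", "dst_host_count", "dst_host_srv_count",
    "dst_host_same_srv_rate", "dst_host_diff_srv_rate",
    "dst_host_same_src_port_rate", "dst_host_srv_diff_host_rate",
    "dst_host_serror_rate", "dst_host_srv_serror_rate",
    "dst_host_rerror_rate", "dst_host_srv_rerror_rate",
]

# Attributes that hold symbolic (string) values rather than numbers.
CATEGORICAL = ("protocol_type", "service", "flag")


def binary_label(raw_label):
    """Return 0 for a normal connection and 1 for any attack type."""
    return 0 if raw_label.strip().rstrip(".") == "normal" else 1


def parse_record(line):
    """Parse one comma-separated KDD99 line into a dict of named fields."""
    fields = line.strip().split(",")
    if len(fields) != len(KDD99_FEATURES) + 1:
        raise ValueError("expected 41 attributes plus a class label")
    *values, label = fields
    row = {}
    for name, value in zip(KDD99_FEATURES, values):
        # Keep symbolic attributes as strings; convert the rest to float.
        row[name] = value if name in CATEGORICAL else float(value)
    row["label"] = binary_label(label)
    return row


# Illustrative record in the dataset's format.
SAMPLE = ("0,tcp,http,SF,181,5450,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,8,8,"
          "0.00,0.00,0.00,0.00,1.00,0.00,0.00,9,9,1.00,0.00,0.11,0.00,"
          "0.00,0.00,0.00,0.00,normal.")

record = parse_record(SAMPLE)
print(record["protocol_type"], record["src_bytes"], record["label"])
```

Collapsing every attack type into a single class gives the normal-versus-attack labelling described above; keeping the raw label instead would allow a multi-class setup.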

SOFTWARE REQUIREMENTS:
- Operating System: Windows 7, 8, and 10 (32 and 64 bit)
- Front End: Python
- Packages: NumPy, Pandas, itertools, matplotlib, sklearn, Spark
- Back End: DataSet
HARDWARE REQUIREMENTS:
- Processor - Dual Core
- Speed - 1 GHz
- RAM - 4 GB
- Hard Disk - 200 GB
- Keyboard - Standard Windows Keyboard
- Mouse - Two or Three Button Mouse
- Monitor - SVGA