You are training a machine learning model to support an anomaly-based intrusion detection system (IDS). The model can predict if an IP address is safe or not. Which of the following is the best method or technique to train the model?
A. Regression
B. Classification
C. Clustering
D. Dimensionality Reduction
Please note that the suggested answer is for your reference only. Whether you got the answer right or wrong matters less than your reasoning process and justifications.
My suggested answer is B. Classification.
Classification assigns data to predefined labels (classes). The prediction for an IP address can be labeled as either “SAFE” or “NOT SAFE,” which makes this a typical binary classification problem.
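To make the idea concrete, here is a minimal sketch of a classifier trained on labeled data. It is not how a production IDS would be built; it uses a simple nearest-centroid rule, and the two features (failed logins per hour, distinct ports contacted) and all sample values are hypothetical, chosen only for illustration.

```python
from statistics import mean

# Hypothetical labeled training data (an assumption for illustration):
# (failed_logins_per_hour, distinct_ports_contacted) -> label
training = [
    ((1, 2), "SAFE"),
    ((0, 3), "SAFE"),
    ((2, 1), "SAFE"),
    ((40, 55), "NOT SAFE"),
    ((35, 60), "NOT SAFE"),
    ((50, 48), "NOT SAFE"),
]

def fit_centroids(samples):
    """Compute the mean feature vector (centroid) for each predefined label."""
    by_label = {}
    for features, label in samples:
        by_label.setdefault(label, []).append(features)
    return {
        label: tuple(mean(column) for column in zip(*rows))
        for label, rows in by_label.items()
    }

def predict(centroids, features):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(centroids, key=lambda label: squared_distance(centroids[label]))

centroids = fit_centroids(training)
print(predict(centroids, (3, 2)))    # an address that behaves like the SAFE samples
print(predict(centroids, (45, 50)))  # an address that behaves like the NOT SAFE samples
```

The key point is that the labels exist before training, and the model's output is always one of those labels.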
Regression predicts a continuous value and models the relationship between variables. It could be adapted to this problem, for example by thresholding a predicted score, but classification fits a two-label outcome more directly.
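The difference shows up in the output type. The sketch below fits an ordinary least-squares line to hypothetical data (failed logins per hour against a 0/1 label; the values and the 0.5 threshold are assumptions). The regression output is a continuous score that is not confined to the two labels, so an extra thresholding step is needed to turn it into "SAFE" or "NOT SAFE".

```python
# Hypothetical data (an assumption): failed logins per hour, and 0 = SAFE / 1 = NOT SAFE
xs = [0, 1, 2, 35, 40, 50]
ys = [0, 0, 0, 1, 1, 1]

def fit_line(xs, ys):
    """Ordinary least squares for a single feature: y is approximated by slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = fit_line(xs, ys)

def score(x):
    """Continuous regression output; it can fall outside the range 0..1."""
    return slope * x + intercept

def label_from_score(value, threshold=0.5):
    """Extra step needed to turn a continuous score into one of the two labels."""
    return "NOT SAFE" if value >= threshold else "SAFE"

print(score(20), score(100))
print(label_from_score(score(1)), label_from_score(score(100)))
```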
Clustering groups data into “clusters” without predefined labels or buckets, whereas classification relies on predefined labels.
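For contrast, the following sketch clusters unlabeled values with a simplified two-cluster k-means on a single hypothetical feature (failed logins per hour). No "SAFE" or "NOT SAFE" labels are supplied; the algorithm only reports which values sit together, and interpreting the groups is left to the analyst.

```python
# Hypothetical unlabeled observations (an assumption): failed logins per hour
values = [1, 0, 2, 40, 35, 50, 3]

def two_means(values, iterations=10):
    """Simplified k-means with k = 2 on one-dimensional data."""
    # Deterministic start: use the smallest and largest values as initial centers
    centers = [min(values), max(values)]
    groups = [[], []]
    for _ in range(iterations):
        groups = [[], []]
        for v in values:
            # Assign each value to the nearest center
            nearest = 0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            groups[nearest].append(v)
        # Move each center to the mean of its group (keep it if the group is empty)
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups

centers, groups = two_means(values)
print(centers)
print(groups)
```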
Dimensionality reduction is typically used as a preprocessing step in machine learning to reduce the number of features and filter out noisy data, rather than to predict outcomes.
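As a rough illustration of that preprocessing role, the sketch below applies one very simple form of dimensionality reduction: a low-variance filter that drops features carrying no information. This is a feature-selection technique, not a projection method such as PCA, and the three feature columns and their values are hypothetical. Note that the output is a smaller dataset, not a prediction.

```python
from statistics import pvariance

# Hypothetical feature rows (an assumption):
# (failed_logins_per_hour, protocol_version, distinct_ports_contacted)
rows = [
    (1, 4, 2),
    (0, 4, 3),
    (40, 4, 55),
    (35, 4, 60),
]

def drop_low_variance(rows, threshold=0.0):
    """Keep only the columns whose population variance exceeds the threshold."""
    columns = list(zip(*rows))
    keep = [i for i, column in enumerate(columns) if pvariance(column) > threshold]
    reduced = [tuple(row[i] for i in keep) for row in rows]
    return keep, reduced

keep, reduced = drop_low_variance(rows)
print(keep)     # indexes of the columns that were kept
print(reduced)  # the same rows with the constant column removed
```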
Machine learning (ML)
- Machine learning is a method of data analysis that automates analytical model building. It is a branch of artificial intelligence based on the idea that systems can learn from data, identify patterns and make decisions with minimal human intervention.
Source: SAS
- Machine learning (ML) is the study of computer algorithms that improve automatically through experience. It is seen as a subset of artificial intelligence. Machine learning algorithms build a mathematical model based on sample data, known as “training data”, in order to make predictions or decisions without being explicitly programmed to do so.
Source: Wikipedia
Based on the data available and the problem to solve, we train a machine learning model (algorithm) using one of several learning approaches, such as supervised learning, unsupervised learning, and semi-supervised learning.
Supervised Learning
Learning under supervision means someone is present to guide you; during the learning process, you find out what is right or wrong, make corrections, and improve. Similarly, in supervised learning, you are the supervisor. The training data you use to train the model comes with answers (labeled data), so the model can be improved iteratively.
Once the model fits the data reliably, it can be used to predict outcomes or solve problems. There are two main areas where supervised learning is useful: classification problems and regression problems. Please refer to Leonel’s Supervised Learning for details.
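The iterative improvement described above can be sketched with a perceptron, one of the simplest supervised learners. This is only an illustration: the single scaled feature, the sample values, the learning rate, and the epoch limit are all assumptions. Each time the prediction disagrees with the label (the "answer"), the model's parameters are corrected, and the number of mistakes per pass is recorded.

```python
# Hypothetical labeled data (an assumption):
# (failed logins per hour divided by 100, label), where 0 = SAFE and 1 = NOT SAFE
samples = [(0.01, 0), (0.00, 0), (0.02, 0), (0.40, 1), (0.35, 1), (0.50, 1)]

def predict(w, b, x):
    """Predict 1 (NOT SAFE) when the weighted input is positive, else 0 (SAFE)."""
    return 1 if w * x + b > 0 else 0

def train(samples, epochs=200, lr=0.1):
    """Perceptron training: correct the weights whenever a prediction is wrong."""
    w, b = 0.0, 0.0
    mistakes_per_epoch = []
    for _ in range(epochs):
        mistakes = 0
        for x, label in samples:
            error = label - predict(w, b, x)  # 0 when correct, +1 or -1 when wrong
            if error != 0:
                w += lr * error * x
                b += lr * error
                mistakes += 1
        mistakes_per_epoch.append(mistakes)
        if mistakes == 0:  # a full pass with no corrections: the model fits the data
            break
    return w, b, mistakes_per_epoch

w, b, history = train(samples)
print(history)  # mistakes made in each pass over the training data
print(predict(w, b, 0.45), predict(w, b, 0.01))
```

Because this toy data is linearly separable, the mistake count reaches zero and training stops; real traffic data is rarely that clean.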
Unsupervised Learning
In contrast, the training data used in unsupervised learning doesn’t come with a specific desired outcome or correct answer.