## PCA: Beyond Dimensionality Reduction

Learn how to use the PCA algorithm to find variables that vary together.

A step-by-step tutorial explaining how PCA works and how to implement it from scratch in Python.

Principal Component Analysis, or PCA, is a commonly used dimensionality reduction method. It works by computing the principal components and performing a change of basis, retaining the data in the directions of maximum variance. The reduced features are uncorrelated with each other and can be used for unsupervised clustering and classification. To reduce…

I first learned about eigenvalues and eigenvectors in a university linear algebra course. It was very dry and mathematical, so I never grasped what it was all about. Here I want to present the topic to you in a more intuitive way, using many animations to illustrate it.

First, we will look at how applying a matrix to a vector **rotates** and **scales** a vector. This will show us what **eigenvalues** and **eigenvectors** are. Then we will learn about **principal components** and that they are the eigenvectors of the **covariance matrix**. This knowledge will help us understand our final topic, **principal component analysis**.

To understand eigenvalues and eigenvectors, we have to first take a look at matrix multiplication.
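As a quick numerical illustration of this idea (a NumPy sketch, not code from the original article): multiplying an arbitrary vector by a matrix both rotates and scales it, but multiplying one of the matrix's eigenvectors only scales it, by the corresponding eigenvalue.

```python
import numpy as np

# A symmetric 2x2 matrix (the same kind of matrix a covariance matrix is)
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# eigh handles symmetric matrices; eigenvalues come back in ascending order
eigenvalues, eigenvectors = np.linalg.eigh(A)
print(eigenvalues)               # [1. 3.]

# For an eigenvector v, A @ v only scales v by its eigenvalue:
v = eigenvectors[:, 1]           # eigenvector of the largest eigenvalue
print(np.allclose(A @ v, eigenvalues[1] * v))  # True

# A generic vector is rotated as well as scaled:
w = np.array([1.0, 0.0])
print(A @ w)                     # [2. 1.] -- no longer parallel to w
```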

*Table of Contents*

1. Introduction

2. Clustering Types

2.1. K-Means

-----Theory

-----The optimal number of clusters

-----Implementation

2.2. Mini-Batch K-Means

2.3. DBSCAN

2.4. Agglomerative Clustering

2.5. Mean-Shift

2.6. BIRCH

3. Image Segmentation with Clustering

4. Data Preprocessing with Clustering

5. Gaussian Mixture Model

-----Implementation

-----How to select the number of clusters?

6. Summary

Unlabeled datasets can be grouped by their shared properties using unsupervised learning. However, each algorithm takes a different view of what makes features similar. Unsupervised learning provides detailed information about the dataset as well as labels for the data.

*Table of Contents*

1. Introduction

2. Principal Component Analysis (PCA)

3. Theory

3.1. Calculating PCA

3.1.1. Rescaling (Standardization)

3.1.2. Covariance Matrix

3.1.3. Eigenvalues and Eigenvectors

3.1.4. Sorting in Descent Order

3.2. Is PCA a feature extraction or a feature selection method?

4. Implementation

4.1. Traditional Machine Learning Approaches

4.2. Deep Learning Approaches

5. PCA Types

5.1. Kernel PCA

5.2. Sparse PCA

5.3. Randomized PCA

5.4. Incremental PCA

This article covers the definition of PCA, a Python implementation of its theory without the Sklearn library, the difference between PCA and feature selection & feature extraction, machine learning and deep learning implementations, and the PCA types, each explained with an example.
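The calculation steps listed under section 3.1 can be sketched in a few lines of NumPy (a minimal illustration, not the article's actual implementation):

```python
import numpy as np

def pca_from_scratch(X, n_components):
    """PCA via covariance eigendecomposition (steps 3.1.1-3.1.4)."""
    # 3.1.1 Rescaling (standardization): zero mean, unit variance per feature
    X_std = (X - X.mean(axis=0)) / X.std(axis=0)
    # 3.1.2 Covariance matrix of the standardized features
    cov = np.cov(X_std, rowvar=False)
    # 3.1.3 Eigenvalues and eigenvectors of the covariance matrix
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    # 3.1.4 Sort in descending order of eigenvalue
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]
    # Project onto the top n_components principal components
    return X_std @ eigenvectors[:, :n_components], eigenvalues

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
X_reduced, eigvals = pca_from_scratch(X, n_components=2)
print(X_reduced.shape)  # (100, 2)
```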

**By Aaron Wang, Master of Business Analytics @ MIT | Data Science**.

This Data Science cheat sheet covers over a semester of introductory machine learning and is based on MIT’s Machine Learning courses 6.867 and 15.072. You should have at least a basic understanding of statistics and linear algebra, although beginners may still find this resource helpful.

Inspired by Maverick’s *Data Science Cheatsheet* (hence the 2.0 in the name), located here.

Topics covered:

- Linear and Logistic Regression
- Decision Trees and Random Forest
- SVM
- K-Nearest Neighbors
- Clustering
- Boosting
- Dimension Reduction (PCA, LDA, Factor Analysis)
- Natural Language Processing
- Neural Networks
- Recommender Systems
- Reinforcement Learning
- Anomaly Detection
- Time Series
- A/B Testing

This cheat sheet will be occasionally updated with new and improved info, so consider a follow or star in the GitHub repo to stay up to date.

Hyperspectral data expands the capability of image classification. It not only distinguishes different land cover types but also provides detailed characteristics of each land cover, such as minerals, soil, man-made structures (buildings, roads, etc.), and vegetation types.

One disadvantage of working with hyperspectral data is that there are too many bands to process. Storing such a large amount of data is also a challenge, and time complexity increases along with the data volume.

Thus, it becomes crucial to either reduce the amount of data or select only the relevant bands, keeping in mind that classification quality should not degrade as the number of bands shrinks.
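One common way to do this is to apply PCA across the spectral bands. The sketch below uses synthetic data standing in for a real hyperspectral cube (real bands are strongly correlated, which is simulated here with a low-rank signal); it is an illustration, not the pipeline used in the original work.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(42)

# Synthetic stand-in for a hyperspectral image: 2500 pixels x 200 bands.
# Real bands are highly correlated; a rank-5 signal simulates that here.
signal = rng.normal(size=(2500, 5)) @ rng.normal(size=(5, 200))
noise = 0.1 * rng.normal(size=(2500, 200))
pixels = signal + noise

# Keep just enough components to explain 99% of the total variance
pca = PCA(n_components=0.99, svd_solver="full")
reduced = pca.fit_transform(pixels)

print(pixels.shape[1], "bands ->", reduced.shape[1], "components")
```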

Isomap is an **Unsupervised Machine Learning** technique aimed at **Dimensionality Reduction**.

It differs from a few other techniques in the same category by using a **non-linear** approach to dimensionality reduction instead of linear mappings used by algorithms such as PCA. We will see how linear vs. non-linear approaches differ in the next section.

Isomap is a technique that combines several different algorithms, enabling it to use a non-linear way to reduce dimensions while preserving local structures.

Before we look at the example of Isomap and compare it to a linear method of Principal Components Analysis (PCA), let’s list the high-level steps that Isomap performs:

- Use a KNN approach to **find the k nearest neighbors** of every data point.
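Scikit-learn's `Isomap` performs this pipeline (k-NN graph, geodesic distances along it, then a classical MDS embedding), so it can be compared directly with PCA on a classic non-linear dataset. This is an illustrative sketch, not code from the original article:

```python
from sklearn.datasets import make_swiss_roll
from sklearn.decomposition import PCA
from sklearn.manifold import Isomap

# Swiss roll: a 2-D sheet rolled up in 3-D -- a classic non-linear manifold
X, _ = make_swiss_roll(n_samples=1000, random_state=0)

# Linear mapping: PCA projects onto a flat plane, squashing the roll
X_pca = PCA(n_components=2).fit_transform(X)

# Non-linear mapping: Isomap builds a k-NN graph, computes geodesic
# distances along it, then embeds those distances with classical MDS
X_iso = Isomap(n_neighbors=10, n_components=2).fit_transform(X)

print(X_pca.shape, X_iso.shape)  # (1000, 2) (1000, 2)
```

Plotting the two embeddings colored by position along the roll shows Isomap "unrolling" the sheet while PCA leaves it folded.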

As we move toward a digital world, cybersecurity is becoming a crucial part of our lives. When we talk about security in digital life, the main challenge is finding abnormal activity.

When purchasing a product online, many people prefer to pay by credit card. The credit limit sometimes lets us make purchases even when we don't have the money at the time. On the other hand, these features are misused by cyber attackers.

To tackle this problem, we need a system that can abort a transaction if it finds something fishy.

Hence the need for a system that tracks the pattern of all transactions and aborts any transaction whose pattern is abnormal.
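One common choice for such a system (not necessarily the one this article uses) is an isolation-based anomaly detector. The sketch below runs scikit-learn's `IsolationForest` on synthetic transaction features (amount, hour of day) with a small block of injected outliers standing in for fraud:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# Synthetic transactions: [amount, hour-of-day]; most cluster around
# typical values, a few extreme outliers stand in for fraud
normal = rng.normal(loc=[50.0, 12.0], scale=[20.0, 3.0], size=(980, 2))
fraud = rng.normal(loc=[900.0, 3.0], scale=[50.0, 1.0], size=(20, 2))
X = np.vstack([normal, fraud])

# Flag the most isolated ~2% of transactions as anomalous
model = IsolationForest(contamination=0.02, random_state=0)
labels = model.fit_predict(X)   # -1 = anomaly, 1 = normal

print("flagged:", (labels == -1).sum())
```

In a real pipeline the flagged transactions would be held or aborted pending review rather than simply counted.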

**By Indraneel Dutta Baruah, AI Driven Solutions Developer**


Today, granular data on large pools of customers and products, together with the technological capability to handle petabytes of data efficiently, is growing rapidly. This makes it possible to build strategic, meaningful clusters for effective targeting, and identifying those target segments requires a robust segmentation exercise. In this blog, we will discuss the most popular unsupervised clustering algorithms and how to implement them in Python.

In this blog, we will be working with clickstream data from an online store offering clothing for pregnant women.
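As a minimal sketch of the kind of workflow described here (using synthetic blobs in place of the real clickstream features), K-Means combined with the elbow heuristic for choosing the number of clusters might look like this:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in for customer/session features
X, _ = make_blobs(n_samples=500, centers=4, random_state=0)

# Elbow heuristic: inertia (within-cluster sum of squares) for several k;
# the "elbow" where inertia stops dropping sharply suggests a good k
inertias = {k: KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
            for k in range(1, 8)}
for k, inertia in inertias.items():
    print(k, round(inertia, 1))

# Fit the final model at the chosen number of clusters
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)
```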