Airline Survey Analysis

Leveraging Latent Space Clustering on High-Dimensional, Sparse Survey Data

Autoencoder

Below is a plot showing the reconstruction loss history for the training and validation sets. The model appears to converge after roughly 400 epochs.
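Convergence was judged from the plot. One way to make that judgment less subjective is to look for the epoch after which the windowed mean of the validation loss stops improving. The sketch below is illustrative only: the function name `convergence_epoch`, the window size, the tolerance, and the synthetic loss curve are all assumptions, not part of the original analysis.

```python
def convergence_epoch(val_losses, window=20, tol=1e-3):
    """Return the first epoch at which the mean loss over the next `window`
    epochs improves on the previous `window` epochs by less than `tol`.
    Returns None if no such epoch exists."""
    for epoch in range(window, len(val_losses) - window + 1):
        prev = sum(val_losses[epoch - window:epoch]) / window
        curr = sum(val_losses[epoch:epoch + window]) / window
        if prev - curr < tol:
            return epoch
    return None


# Synthetic, made-up loss curve: exponential decay toward a floor of 0.1.
history = [0.1 + 0.9 * (0.99 ** e) for e in range(600)]
epoch = convergence_epoch(history)
print(f"Validation loss plateaus around epoch {epoch}")
```

With a real training run, `history` would be replaced by the recorded validation losses, and `window` and `tol` would need tuning to the noise level of the curve.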

Clustering

Once the autoencoder compressed the input features into a 2-dimensional latent space, I applied the KMeans clustering algorithm to group the data points into distinct clusters.
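As a rough illustration of this step, the sketch below runs scikit-learn's `KMeans` on a synthetic 2-dimensional array standing in for the latent space. The data, the random seeds, and the choice of `n_clusters=4` are assumptions made for the example; in the actual analysis, `latent` would be the encoder output for each respondent.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in for the 2D latent space: four well-separated blobs.
rng = np.random.default_rng(0)
centers = np.array([[-5.0, -5.0], [-5.0, 5.0], [5.0, -5.0], [5.0, 5.0]])
latent = np.vstack([c + rng.normal(scale=0.5, size=(100, 2)) for c in centers])

# Fit k-means and assign every latent point to a cluster.
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0)
labels = kmeans.fit_predict(latent)

# Number of points per cluster.
counts = np.bincount(labels, minlength=4)
print(counts)
```

Because each row of `latent` corresponds to one respondent, `labels` can be attached directly to the original survey table as a new column.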

Process Overview

To identify the optimal number of clusters, I iteratively evaluated different cluster counts using:

Key Observations

The inertia plot below shows how inertia (the within-cluster sum of squared distances) falls as the number of clusters increases. Inertia drops sharply from 2 to 4 clusters, suggesting that four clusters capture the natural groupings in the data. Beyond 4 clusters, the curve flattens, indicating diminishing returns from adding more clusters.
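An elbow analysis of this kind can be sketched as a loop over candidate cluster counts, recording `inertia_` for each fit. The data below is synthetic and the range of `k` values is an assumption; the original analysis may have used a different range.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in for the 2D latent space: four well-separated blobs.
rng = np.random.default_rng(1)
centers = np.array([[-5.0, -5.0], [-5.0, 5.0], [5.0, -5.0], [5.0, 5.0]])
latent = np.vstack([c + rng.normal(scale=0.5, size=(100, 2)) for c in centers])

# Fit k-means for each candidate k and record the inertia.
inertias = {}
for k in range(2, 9):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(latent)
    inertias[k] = model.inertia_

# Reduction in inertia gained by moving from k-1 to k clusters.
drops = {k: inertias[k - 1] - inertias[k] for k in range(3, 9)}
for k, drop in drops.items():
    print(f"k={k}: inertia={inertias[k]:.1f}, drop={drop:.1f}")
```

The elbow is the value of `k` after which the drops become small relative to the earlier ones.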

Since the latent space has only two dimensions, I visualized the clusters in a 2D scatter plot. The resulting visualization revealed four distinct groupings. Data points appear to cluster into separate regions, suggesting the autoencoder effectively compressed complex relationships from the original dataset into the latent space.

The silhouette score of 0.85 (on a scale from -1 to 1) indicates that the clusters are compact and well separated.
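For reference, a silhouette score can be computed with scikit-learn's `silhouette_score`. The sketch below uses synthetic data, so the value it prints is not the score reported above.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic stand-in for the 2D latent space: four well-separated blobs.
rng = np.random.default_rng(0)
centers = np.array([[-5.0, -5.0], [-5.0, 5.0], [5.0, -5.0], [5.0, 5.0]])
latent = np.vstack([c + rng.normal(scale=0.5, size=(100, 2)) for c in centers])

labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(latent)

# Mean silhouette coefficient over all points; values near 1 mean each point
# sits much closer to its own cluster than to the nearest other cluster.
score = silhouette_score(latent, labels)
print(f"Silhouette score: {score:.2f}")
```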

Cluster Distribution

The number of respondents in each cluster is as follows:

Interpretation

Each point in the latent space corresponds to one respondent, so every cluster assignment maps back to that respondent's original survey answers. This makes it possible to compare clusters directly on the original survey features.

Airline Frequency Ratings

The survey question, “How frequently do you fly with each airline?”, asked respondents to rate how often they fly with each of nine airlines on a scale from 1 (Frequently) to 4 (Never). The heatmap below summarizes the average frequency rating for each cluster across the nine airlines.
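The table behind such a heatmap is a per-cluster mean of the rating columns. A minimal sketch with pandas follows; the column names and ratings are made up for illustration and only two airlines are shown.

```python
import pandas as pd

# Hypothetical column names and made-up ratings (1 = Frequently, 4 = Never).
survey = pd.DataFrame({
    "cluster":        [0, 0, 1, 1, 2, 2],
    "freq_airline_a": [4, 3, 1, 2, 1, 1],
    "freq_airline_b": [3, 3, 2, 2, 4, 4],
})

# One row per cluster, one column per airline: the matrix a heatmap would show.
cluster_means = survey.groupby("cluster").mean()
print(cluster_means)
```

Since 1 means "Frequently", lower averages indicate clusters that fly with that airline more often.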

This feature is critical for understanding customer airline preferences. A few interesting patterns to point out include:

Key Features Differentiating Clusters

To understand what differentiates the clusters, I analyzed survey features with the greatest variation between clusters. Only features where the difference in average ratings exceeded a specified threshold were included in the following visuals. The heatmaps below highlight these differentiating features.
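One plausible way to implement this filter is to compute the spread (maximum minus minimum) of each feature's cluster averages and keep the features whose spread exceeds the threshold. The definition of "difference", the threshold value, and the feature names below are assumptions for illustration; the original analysis may define them differently.

```python
import pandas as pd

# Made-up cluster averages for three hypothetical survey features.
cluster_means = pd.DataFrame(
    {
        "seat_comfort": [4.5, 2.0, 3.0, 4.0],
        "wifi":         [3.0, 3.1, 2.9, 3.0],
        "wait_time":    [1.5, 3.5, 2.0, 4.0],
    },
    index=pd.Index([0, 1, 2, 3], name="cluster"),
)

threshold = 1.0  # assumed cutoff, not the value used in the analysis

# Spread of each feature's average across clusters.
spread = cluster_means.max() - cluster_means.min()

# Keep only the features whose spread exceeds the threshold.
differentiating = cluster_means.loc[:, spread > threshold]
print(differentiating)
```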

Note: Each nominal variable was one-hot encoded into dummy variables in the dataset. These are binary and therefore on a different scale from the ordinal variables, so I show them in a separate heatmap.
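Dummy encoding of this kind can be done with `pandas.get_dummies`. The column names and values below are hypothetical.

```python
import pandas as pd

# Hypothetical responses: one nominal column and one ordinal column.
responses = pd.DataFrame({
    "travel_purpose": ["business", "leisure", "leisure", "family"],
    "satisfaction":   [4, 2, 5, 3],
})

# Expand the nominal column into one binary (0/1) column per category.
encoded = pd.get_dummies(responses, columns=["travel_purpose"], dtype=int)

# Separate binary dummy columns from ordinal ones so each group can be
# plotted on its own scale.
dummy_cols = [c for c in encoded.columns if c.startswith("travel_purpose_")]
ordinal_cols = [c for c in encoded.columns if c not in dummy_cols]
print(encoded)
```

The mean of a dummy column within a cluster is the share of respondents in that category, which is why these columns sit on a 0 to 1 scale.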

Cluster Profiles

Cluster 0: “Loyal but Infrequent”

Demographics & Travel Behavior:

Service Ratings:

Actionable Insights:

Engagement: Introduce loyalty incentives and special offers to encourage increased travel frequency.

Service Continuity: Maintain high service standards and monitor any emerging issues, ensuring that infrequent flyers continue to have a positive overall experience.

Cluster 1: “Frequent, Satisfied Flyers”

Demographics & Travel Behavior:

Service Ratings:

Actionable Insights:

Loyalty Programs: Focus on premium rewards and exclusive benefits to retain these brand advocates.

Sustain Excellence: Continue delivering consistent, high-quality service that meets their elevated expectations.

Cluster 2: “Frequent but Critical”

Demographics & Travel Behavior:

Service Ratings:

Actionable Insights:

Service Improvements: Prioritize enhancements in areas where they express dissatisfaction (such as drink quality and safety measures).

Targeted Communication: Clearly communicate any improvements or changes in service. Using pilot programs or direct feedback channels may help convert their critical view into satisfaction.

Cluster 3: “Infrequent and Cautious”

Demographics & Travel Behavior:

Service Ratings:

Actionable Insights:

Building Trust: Focus on reducing friction points (such as waiting times) and emphasize clear, reassuring communication about safety protocols.

Incentivization: Consider trial incentives or flexible booking options to gently encourage more frequent travel while building confidence.

Overall Strategic Recommendations

Personalize Offers:

Monitor Key Metrics: