Standard clustering often uses Euclidean distance (price/location). However, amenity data is binary (Has Pool? Yes/No). To solve this, I engineered a different approach:
- Data Engineering: Processed 500+ raw amenities into 27 core value categories (e.g., "Remote Work Ready," "Family Friendly").
- The Algorithm: Applied Hierarchical Clustering (Complete Linkage).
- Why Jaccard? I chose Jaccard Distance over Euclidean because it statistically measures dissimilarity better for binary data sets.
