Similarity
Cosine Similarity
This file uses the track features generated by the Cosine Pipeline to calculate the similarity between a playlist feature vector and a dataset of tracks. This makes it possible to select the tracks most similar to the playlist.
Cosine Similarity inherits from the Similarity Interface.
Cosine Similarity Documentation
TracksCosineSimilarity
Bases: Similarity
This class computes the cosine similarity between a playlist vector and the tracks dataset to determine which tracks are most similar to the playlist.
This class inherits the Similarity interface.
Attributes:
| Name | Type | Description |
|---|---|---|
| `additional_weighting` | `int` | The weighting factor applied to weighted columns. |
| `playlist` | `DataFrame` | The playlist tracks dataframe (before pipeline transformation). |
| `tracks` | `DataFrame` | The tracks dataset dataframe (before pipeline transformation). |
| `playlist_features` | `DataFrame` | The playlist tracks features (after the transformation pipeline). |
| `track_features` | `DataFrame` | The tracks dataset features (after the transformation pipeline). |
| `similarity` | `Series` | The ordered ranking of track similarity to the playlist vector (the index is the track URIs). |
Source code in src/similarity.py
__init__(playlist, tracks, weighted_features)
Initializes the Cosine Similarity class.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `playlist` | `DataFrame` | The playlist tracks dataframe (before pipeline transformation). | required |
| `tracks` | `DataFrame` | The tracks dataset dataframe (before pipeline transformation). | required |
| `weighted_features` | `list` | A list of features to be weighted in order to prioritize their importance in the similarity calculation. | required |
access_similarity_scores()
Getter method to access the `similarity` field of the class.
Returns:
| Type | Description |
|---|---|
| `Series` | The track similarity to the playlist vector. The Series uses track URIs as its index. |
calculate_similarity()
Method calculates the similarity between a playlist vector and the tracks feature matrix using cosine similarity.
This calculation populates the `self.similarity` field.
The playlist feature dataframe is reduced to the mean of each feature, creating a playlist vector.
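The calculation described above can be sketched roughly as follows. This is a minimal illustration, not the code in `src/similarity.py`: the function name `cosine_scores`, the feature names, and the URIs are hypothetical, and it assumes the track features are indexed by URI.

```python
import numpy as np
import pandas as pd


def cosine_scores(track_features: pd.DataFrame, playlist_vector: np.ndarray) -> pd.Series:
    """Cosine similarity of every track row against a single playlist vector."""
    matrix = track_features.to_numpy(dtype=float)
    dot = matrix @ playlist_vector
    norms = np.linalg.norm(matrix, axis=1) * np.linalg.norm(playlist_vector)
    # Rows with zero norm get a score of 0 instead of a division-by-zero result.
    scores = np.divide(dot, norms, out=np.zeros_like(dot), where=norms > 0)
    # Highest similarity first, keeping the URI index.
    return pd.Series(scores, index=track_features.index).sort_values(ascending=False)


# Hypothetical example data (assumption: features are already normalized).
track_features = pd.DataFrame(
    {"energy": [1.0, 0.0, 1.0], "valence": [0.0, 1.0, 1.0]},
    index=["uri:a", "uri:b", "uri:c"],
)
playlist_vector = np.array([1.0, 0.0])  # stand-in for the per-feature playlist mean
similarity = cosine_scores(track_features, playlist_vector)
```

In this toy data, `uri:a` points in the same direction as the playlist vector and ranks first, while `uri:b` is orthogonal to it and ranks last.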
get_top_n(n)
This method returns the top-n most similar tracks as a DataFrame with the essential features included.
Note: with cosine similarity, a value near 1 indicates high similarity, while a value near 0 indicates low similarity.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `n` | `int` | The number of most similar tracks to the playlist vector to return. | required |
Returns:
| Type | Description |
|---|---|
| `DataFrame` | A dataframe containing the top-n tracks. |
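One way such a top-n selection could work is sketched below. The helper `top_n_tracks`, the `name` column, and the URIs are hypothetical, and the sketch assumes both the tracks dataframe and the similarity Series are indexed by URI; the actual method may select different columns.

```python
import pandas as pd


def top_n_tracks(tracks: pd.DataFrame, similarity: pd.Series, n: int) -> pd.DataFrame:
    """Return the n tracks with the highest similarity score, best first."""
    top = similarity.nlargest(n)
    # Look up the selected URIs in the original (untransformed) tracks dataframe.
    result = tracks.loc[top.index].copy()
    result["similarity"] = top
    return result


# Hypothetical example data.
tracks = pd.DataFrame({"name": ["A", "B", "C"]}, index=["uri:a", "uri:b", "uri:c"])
similarity = pd.Series({"uri:a": 0.2, "uri:b": 0.9, "uri:c": 0.5})
top_two = top_n_tracks(tracks, similarity, 2)
```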
separate_playlist_from_tracks(features)
Method separates the feature dataframe (from the pipeline) into separate tracks and playlist feature dataframes.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `features` | `DataFrame` | The track dataset features dataframe (this contains the playlist tracks too). | required |
Returns:
| Name | Type | Description |
|---|---|---|
| `playlist_features` | `DataFrame` | The playlist track features dataframe. |
| `tracks_features` | `DataFrame` | The track dataset features dataframe. |
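The split could be done along the following lines. This is only a sketch under stated assumptions: the documentation does not say how playlist rows are identified, so the helper `split_features` assumes the features are indexed by URI and that the playlist URIs are known, and it assumes playlist rows are excluded from the tracks side.

```python
import pandas as pd


def split_features(
    features: pd.DataFrame, playlist_uris: list[str]
) -> tuple[pd.DataFrame, pd.DataFrame]:
    """Split pipeline output into (playlist_features, tracks_features)."""
    # Boolean mask marking the rows that belong to the playlist.
    in_playlist = features.index.isin(playlist_uris)
    return features[in_playlist], features[~in_playlist]


# Hypothetical example data.
features = pd.DataFrame(
    {"energy": [0.1, 0.9, 0.4]}, index=["uri:a", "uri:b", "uri:c"]
)
playlist_part, tracks_part = split_features(features, ["uri:b"])
```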
vectorize_playlist()
Method vectorizes the playlist track features by taking the mean value of each track feature.
Returns:
| Type | Description |
|---|---|
| NumPy vector | The playlist feature vector. |
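As a small illustration of the per-feature mean described above (the function name `vectorize` and the feature names are hypothetical, and the real method operates on the class fields rather than on an argument):

```python
import numpy as np
import pandas as pd


def vectorize(playlist_features: pd.DataFrame) -> np.ndarray:
    """Collapse the playlist rows into one vector holding each feature's mean."""
    return playlist_features.mean(axis=0).to_numpy()


# Hypothetical example: two playlist tracks, two features.
playlist_features = pd.DataFrame({"energy": [0.2, 0.8], "valence": [0.5, 0.7]})
playlist_vector = vectorize(playlist_features)
```

The resulting vector has one entry per feature column, in column order, so it can be compared directly against each row of the track feature matrix.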
weight_features(weighted_columns)
Method weights the track dataset features (all features are normalized to [0, 1]) by a scalar value.
Weighting a feature increases its impact on the cosine similarity calculation.
Note that this method directly alters the `self.track_features` field.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `weighted_columns` | `list` | The columns to be weighted by the additional weighting factor. | required |
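The weighting step might look roughly like this. The helper `apply_weights` is hypothetical, and unlike the documented method, which alters `self.track_features` in place, this sketch returns a weighted copy so the input stays untouched.

```python
import pandas as pd


def apply_weights(
    track_features: pd.DataFrame, weighted_columns: list[str], additional_weighting: int
) -> pd.DataFrame:
    """Multiply the chosen feature columns by a scalar weighting factor."""
    weighted = track_features.copy()
    weighted[weighted_columns] = weighted[weighted_columns] * additional_weighting
    return weighted


# Hypothetical example: emphasize "energy" by a factor of 2.
track_features = pd.DataFrame({"energy": [0.5, 1.0], "valence": [0.4, 0.2]})
weighted_features = apply_weights(track_features, ["energy"], 2)
```

Because cosine similarity depends on the direction of each vector, stretching one column makes differences along that feature count for more in the final score.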