Visualization thoughts

Recommendation system is useful to individual user — so focus on visualization that works for the individual
Parallels between fashion world and music world: trends and styles occur in both — tags represent, in some ways, the trends of music.
Engagement - want something interactive so user will spend some time playing around with vis.
Artistic - individuals may or may not be attuned to data visualization best practices. Trend toward artistic interpretation (where exact numbers may or may not matter so much) to draw them in more.
Don’t dump too much data on them. Find balance between interesting and too much.

Notes on final vis

generated for 2,000 users with top number of songs listened to.
showing top 10 tags
showing max of top 35 songs played for each tag

Initial processing

Filtered out songs with song / track mismatches
- as recommended by msd
Converted song id to its first available track
- easier to connect with other datasets

Initial Analysis

General workflow:

parse data with python.
Output counts
Visualize with R

Counts collected:

number of users listen to song - song_counts
number of times song has been played - song_plays
number of songs user has listened to - user_songs
number of play counts associated with user - user_plays

Basic analysis:

song_counts - how many people have listened to a song

min: 1
median: 13
max: 90,444

average: 121

takeaway: Most songs not played very often. Some have been heard by a lot of folks. Mean is skewed by the long tail.

user_songs - how many songs a user has listened to

min: 3
median: 26
max: 4,316

average: 45

takeaway: most users have listened to 10 - 100 songs. Some power users have listened to many more.

user songs vs play counts

what happens when we plot the number of songs a user has listened to vs the total number of play counts for that user?

dot for every user. x-axis is number of songs user listened to. y-axis is total play count for user.

takeaway: Most users listen to a few songs a few times. Some users listen to a ton of songs only a few times. Some listen to very few songs many times.

song play count vs audience

what happens when we plot the number of users a song has been listened to by vs the total number of play counts for that song?

dot for every song. x-axis is the number of users listened to the song. y-axis is the total play count for song.

takeaway: two trends: songs that are heard by many people - but only a few times. These are popular songs that aren’t worth listening to very much.

songs that are heard by many people - and most listen to the song many times. These are songs that seem to have some staying power - users listen to them a bunch.

could be just a few users listening to these songs a whole bunch? Possible - but unlikely given the previous graph that very few users have very high play counts.

what makes these songs in the higher trend stand out from songs that have a similar number of users?

Top Song Visual

Revisions

Use raw play counts. Connect matches with straight lines

Revisions

idea: tree of tags / songs.

collapsible. similar to:

prototype:

collapsed - just tags:

expanded with songs:

problem: many songs have many different tags. Don’t work well with tree structure.

idea: network of tags / songs

Similar to traditional force-directed network.

problem: becomes hairball fast.

idea: multiple foci

avoid showing all connections at same time

keep tag nodes in consistent location

idea: use size of bubbles in circle plot to represent percentage of songs with that tag

problem: difficult to see what is going on. difficult to compare between tags.