Amanda Cleary – Spotify’s Daylist Feature

Amanda Cleary

Responsible A.I.

Auditing Spotify’s Daylist Feature

In September of 2023, Spotify released a feature called the daylist, which the company referred to as “your day in a playlist” (Spotify). Throughout the day, the playlist updates and replaces its fifty songs to reflect what the algorithm anticipates the user’s taste to be at that moment. It combines metadata (such as artist, key signature, location, or even “danceability”, which can be culturally subjective) with user behavior to create “constellations” of related song and genre clusters (Torabi, Spotify), then uses this algorithmic mapping to synthesize a playlist of related songs under a genre-representative title ranging anywhere from “rainy day slow dance” to “60s British invasion”. The algorithm also selects subtitles from the genres it considered while generating the songs and title. For example, a daylist titled “happy folk wanderlust” lists the genres “adventure, mountain, stomp and holler, positive, indie folk, and happy indie”. By creating a personalized, adaptable experience, the daylist serves as a “guiding tool to minimize listeners’ selection cognitive load” (Torabi). By nature of its aim, it “reduces decisions on taste… constructing social groups and cultural value in the process” (Werner, 81), warranting an evaluation for biased representations of genres, marginalization of artists, and reduction of consumption diversity through a content bubble.

Use of the daylist directly shapes which media a consumer consumes, altering the music industry’s interaction with the public (Fleischer, Rasmus, and Pelle Snickars, 136). As such, the algorithmic process underpinning the daylist’s recommendations and generative titles warrants examination for its effect on musical artists, particularly the “cold start problem” faced by creators entering the field with no previous establishment in the algorithm (Torabi). A content bubble would exacerbate the marginalization of newer artists and strip them of control over the distribution of their work, control already greatly restricted in an industry “highly skewed toward major stars and record labels” (Eriksson, Maria, et al., 3). A content bubble also shapes a listener’s future preferences, “creating a loop that might neglect diverse musical experiences” or reinforce existing cultural biases (Torabi). Bubbles not only underserve new creators but also introduce societal and cultural influences on a user’s perception of populations and identities.

To generate recommendations for a daylist, Spotify’s algorithm requires a user’s “taste profile”, mapped by their listening behavior constellation and personal background “gathered through explicit or implicit interactions and stereotypical information such as location, age, and/or gender” (Torabi). Several questions arise when preparing to audit Spotify’s new feature: What generalizations does the daylist algorithm make about users based on location, age, race, gender, and other inferred data aside from listening behavior? Does the daylist recommend different genres or songs by gender, nationality, age, or ethnicity? Does the algorithm produce a content bubble that excludes traditionally underrepresented demographics such as female rock artists (Eriksson, Maria, et al., 126) or otherwise generate “lower content diversity” while driving engagement (Anderson, Ashton, et al., 2161)? How accurately can the daylist predict a listener’s mood, and do advertisements reflect an inclination by Spotify to exploit the user’s state of mind or inferred demographics to generate higher revenue (Eriksson, Maria, et al., 137)? Can title generation output a stereotype? What data does the algorithm use to name its playlists? These questions seek to illuminate whether and how the daylist algorithm handles the following concerns: replacement of human cognition and creativity with artificial intelligence, marginalization of populations through bias, and propagation of stereotypes.

Due to the daylist’s many regenerations per day, an audit requires significant time to collect meaningful data. Each daylist generation provides an immense amount of data for the researcher to parse: its time of day, its title, its songs and artists, its genres, and its relevance to the user, each of which carries its own string of backing data. To complicate matters, Spotify requires five to thirty days to fulfill any data request. A time constraint of less than a month therefore removes the possibility of considering any data not manually monitored. Provided an audit had sufficient time to collect, retrieve, and parse data, the next limitation arises from resources. While a single account can provide some insights into the daylist algorithm, a combination of human-operated and bot-operated accounts can collect far more information to populate the research data.

Human engagement in the audit would account for the diversity of Spotify’s users, drawing not only from separate regional locations, but also from different ethnic backgrounds, social classes, races, and genders, among other demographics. Recorded data from human participants, beyond personal demographics, could include which ads Spotify Free allocated to them, which songs on the daylist they remembered hearing before, and whether the daylist (and its ads) matched their mood at the time. Human accounts would provide a holistic view of the daylist algorithm. The bot accounts, on the other hand, would isolate specific demographics to study bias more directly.

Taking inspiration from the interventional research project described in Spotify Teardown, a bot army would separate into subgroups representing isolated demographics. Since human engagement provides more accurate representation of ethnicity, wealth, race, and education, the bots would measure bias according to the two pieces of personal data Spotify requires to set up an account—birth date and gender—as well as inferred location. Spotify does not allow the creation of an account without the user selecting one of the gender options provided, indicating that such data is “vital to the functioning of Spotify, at least for marketing purposes” (Eriksson, Maria, et al., 131). The mandatory age input implies agreement that the new user is at least 13 years old, as required by Spotify’s Terms of Use, but may also affect what kind of music the daylist algorithm prioritizes. For example, does a younger user receive fewer recommended songs with explicit content? (A quick, retroactive study for this example revealed that age did not immediately affect which songs the algorithm recommended, whereas gender did. Nonetheless, a longer audit should study this in more depth.) In either case, a bot subgroup representing a younger audience and another representing an older audience would stream the same selection of songs in the same order, then monitor which songs and genres the daylist generates from otherwise identical data. Another group of bots could have subgroups representing each available gender option. This process would repeat for each demographic studied. Automatic collection and organization of the sheer amount of data produced by a Spotify audit would prove invaluable in determining algorithmic bias across a large number of demographics and accounts, where humans would otherwise be overwhelmed by maintaining identical accounts and measuring daylist outputs.
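The controlled design described above can be sketched in a few lines. This is a hypothetical harness outline, not Spotify tooling; the `BotProfile` fields and helper names are illustrative assumptions. The key idea is that each subgroup varies exactly one demographic field while every other field, and the streamed song sequence, stays constant:

```python
from dataclasses import dataclass, replace

# Hypothetical account profile for the paired-demographic design;
# field names are illustrative, not Spotify's actual signup fields.
@dataclass(frozen=True)
class BotProfile:
    gender: str
    birth_year: int
    location: str

def subgroup_varying(base: BotProfile, field: str, values):
    """Build a subgroup of profiles that differ in exactly one
    demographic field, holding every other field constant."""
    return [replace(base, **{field: v}) for v in values]

base = BotProfile(gender="Man", birth_year=2003, location="US")

# One subgroup per studied demographic.
gender_group = subgroup_varying(base, "gender", ["Man", "Woman", "Non-binary"])
age_group = subgroup_varying(base, "birth_year", [2010, 1970])

# Every account in a subgroup would stream an identical song sequence,
# so any divergence in the generated daylists is attributable to the
# single varied field rather than to listening behavior.
shared_playlist = ["Cruel Summer"]  # same songs, same order, same times
```

Because only one field differs within a subgroup, daylist differences between its accounts function as a direct measurement of that demographic’s effect on the algorithm.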

This proof-of-concept audit limited itself to a single week’s worth of data collection in two regards. The first, representative of human engagement, measured the daylist’s accuracy in predicting mood and its diversity in content by familiarity to the user. The second technique represented a bot approach, intended to survey daylist differences between two gendered accounts listening to the same music in the same order at the same times. Unfortunately, due to constraints of time, resources, and skills, this excluded measurement of bias in the ads served alongside the daylist. Though the American Spotify offers the gender options “Man”, “Woman”, “Non-binary”, “Something else”, and “Prefer not to say”, the platform’s perceived inclusivity is not replicated in every version of the software depending on location (Eriksson, Maria, et al., 130). Since this audit could not stream music from different locations and required simplicity, it further limited itself to the gender demographics of “Man” and “Woman”. It therefore could not measure location as a proxy for race, wealth, or other demographics. As a result, the outcomes of this audit are far from comprehensive. However, the collected data still suggests that “Spotify’s music recommendations can be considered as one of the venues where gendered norms and ideals are reproduced and manifested” (Fleischer, Rasmus, and Pelle Snickars, 141) and supports its likelihood to “enforce existing biases or perpetuate cultural stereotypes” (Torabi).

For proof-of-concept human engagement, I measured my own usage of the daylist feature on my personal Spotify account. I logged information including Song Titles, Song Artists, Daylist Titles, Times of Day, Song Familiarity, Artist Familiarity, Relevancies, and whether I had previously Saved the song to my library. I measured Song Familiarity from 0 to 3. A score of 0 meant I had never heard (or at least had no memory of hearing) the song before; 1 meant I recognized it; 2 signified I listened to it semi-regularly; and 3 I reserved for my favorite songs. Artist Familiarity used the same 0-3 system. I also scored Relevance from 0 to 3, but with different meanings. A score of 0 meant I disliked the song so much while listening to the daylist that I had to skip past it; I tolerated songs scored 1, but could easily have skipped them; a score of 2 meant that I wanted to listen to the song again; and once again, I reserved scores of 3 for my absolute favorite recommendations—ones that perfectly fit my mood and that I thoroughly enjoyed listening to at that moment. Saved had either a Y or N designation, where Y meant I had previously saved the song to a playlist or my Liked Songs, and N meant I had not.

To make this data meaningful, I then considered the daylist as a whole. I created a Familiarity Score, Relevancy Score, and Bubble Score for each playlist. I defined the Familiarity Score as the average of the mean Song Familiarity and the mean Artist Familiarity. The Relevancy Score took the average of the daylist’s Relevance scores. Since I could not spend the same amount of time listening to every playlist to score an equal number of song recommendations pertaining to that specific time of day, the Relevancy Score had less uniformity in creation than the Familiarity Score. For the Bubble Score, I counted the number of songs marked Y for Saved and added 1 for each time an artist or song title repeated within the same playlist.
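As a minimal sketch, the three playlist scores can be computed from a per-song log like the one described above. The field names and sample rows below are illustrative, not the audit’s actual data:

```python
from statistics import mean

# One row per recommended song. song_fam, artist_fam, and relevance use
# the 0-3 scales defined above; saved is "Y" if already in the library.
daylist_log = [
    {"title": "Song 1", "artist": "A", "song_fam": 2, "artist_fam": 3, "relevance": 2, "saved": "Y"},
    {"title": "Song 2", "artist": "B", "song_fam": 0, "artist_fam": 1, "relevance": 1, "saved": "N"},
    {"title": "Song 3", "artist": "A", "song_fam": 3, "artist_fam": 3, "relevance": 3, "saved": "Y"},
]

def familiarity_score(rows):
    # Average of the mean Song Familiarity and the mean Artist Familiarity.
    return mean([mean(r["song_fam"] for r in rows),
                 mean(r["artist_fam"] for r in rows)])

def relevancy_score(rows):
    # Simple average of per-song Relevance scores.
    return mean(r["relevance"] for r in rows)

def bubble_score(rows):
    # Saved songs count once each; every repeat of an artist or song
    # title within the same daylist adds one more.
    saved = sum(1 for r in rows if r["saved"] == "Y")
    artists = [r["artist"] for r in rows]
    titles = [r["title"] for r in rows]
    repeats = (len(artists) - len(set(artists))) + (len(titles) - len(set(titles)))
    return saved + repeats
```

For the three sample rows, the Familiarity and Relevancy Scores both come out to 2.0, and the Bubble Score is 3 (two saved songs plus one repeated artist), mirroring the ranges reported below on a toy scale.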

Of the 13 daylists for which I gathered enough significant data, the Familiarity Score ranged from 0.48 to 1.68 (average of 1.05), the Relevancy Score ranged from 0.95 to 2.67 (average of 1.65), and the Bubble Score ranged from 16 to 32 (average of 23.6). Often, higher Bubble Scores correlated with higher Familiarity Scores, but neither necessarily affected the Relevancy Score of the daylist; a daylist with high Familiarity could have either a low or high Relevancy. The daylist data neither affirmed nor entirely refuted the claim that “algorithmic recommendations are associated with reduced consumption diversity” (Anderson, Ashton, et al., 2156). Further study of the algorithm would consider the effects on consumption diversity if the listener limited themselves to exclusively daylist-recommended music for several weeks. Perhaps then the Familiarity and Bubble Scores would skyrocket and reveal a concrete content bubble. The audit could then consider that bubble’s effect on artists’ outreach and whether the algorithm had replaced the user’s musical cognition and taste.

While I found daylist recommendations often accurately predicted my mood and what I wanted to listen to, none of the algorithm-generated titles seemed particularly appealing. While I could understand what kind of songs “warm fuzzy feeling slow dance” would recommend, I blinked hard at “puppy love otherkin”. An Arizona State University student-written article about the campus’s perspective on the daylist feature includes screenshots of a social media post that showcase daylist-generated titles “Jewish writer”, “banjo Jewish”, “kindness Jewish”, and “earthy Jewish” (Dirst). This observation inspired my first implementation of the proof-of-concept “bot” accounts. 

I first created a profile (“Man” with a birth date in 2003) wherein I only played Hispanic and Latin mixes. The daylist’s generated titles “dance party bailar music for this moment” and “alto rhumba Saturday” did not directly reproduce the stereotypes or generalizations I had been attempting to provoke (playlists I shuffled included “Latin Party”), but the genres listed as justification nonetheless revealed cultural bias inherent in the algorithm, which automatically associated Latin music with Hispanic cultural stereotypes such as “cleaning” (occupation of maid or house cleaner) (Barreto, Matt A., et al., Quinones Rivas and Saenz), “family” (family-oriented) (Barreto, Matt A., et al.), or “sultry” (overly sexual) (Quinones Rivas and Saenz, 1). As I collected more listening data for the bot, the number of generalizations appeared to decrease. Unfortunately, I could not gather enough data to prove this pattern. Perhaps genres fulfilling cultural stereotypes would eventually disappear from the daylist with a longer audit, but their initial existence demonstrates the daylist’s ability to reproduce biases. The “bot” accounts further emphasize the algorithm’s stereotype-patterning in selecting different genres to present to the user based on gender.

For my second implementation of bot accounts, I fed both a “Man” and a “Woman” account the same selection of Taylor Swift music simultaneously, then surveyed their output daylists. Immediately after each account finished listening to “Cruel Summer”, each had its own daylist. The daylists’ algorithmic outputs seemed identical. Titled “delulu writer music for this moment”, both daylists recommended the same songs in the same order. However, the daylist justified its selections for the Woman’s account with “Since you’ve recently listened to scream and belter. Here’s some: writer lyricist, scream, delicate, relationship, and emotional” whereas the Man’s proclaimed, “Since you’ve recently listened to belter and 170 bpm. Here’s some: scream, belter, delicate, treadmill strut, and relationship.” Already, the daylists differed after one identical song. Further generated daylists introduced repeated concepts of “delicate” and “heartbroken” for the Woman and “belter” and “teen rock” for the Man. Since the Woman’s playlist never mentioned rock or guitar as justification genres despite consisting of the same songs and artists, while the Man’s repeatedly emphasized those ideas, the daylist audit backed the claim that “Spotify’s functions for recommendations help reconstruct dominating genres like rock as male-focused, masculine, and white” (Werner, 88). Effectively, the daylist considered the Man’s account to value the rock genre more highly than the Woman’s. Unlike the previous bot account, the gendered accounts did not seem to decline in the generalizations they made for each user. Instead, their differences in genres appeared to amplify. However, they still had significant enough overlap to relegate these findings to mostly speculation, requiring a longer audit with more accounts and artists to draw conclusions about the algorithm’s gendered biases.
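The divergence between the two accounts’ justification genres can be quantified with a simple set comparison. This sketch uses the genre names surfaced in the daylists described above; a Jaccard similarity of 1.0 would mean the algorithm treated both accounts identically:

```python
# Justification genres Spotify surfaced for each gendered account,
# drawn from the daylists described above.
woman_genres = {"writer lyricist", "scream", "delicate", "relationship",
                "emotional", "heartbroken"}
man_genres = {"scream", "belter", "delicate", "treadmill strut",
              "relationship", "teen rock"}

shared = woman_genres & man_genres       # genres both accounts received
woman_only = woman_genres - man_genres   # e.g. "emotional", "heartbroken"
man_only = man_genres - woman_genres     # e.g. "belter", "teen rock"

# Jaccard similarity: the fraction of all surfaced genres seen by both.
jaccard = len(shared) / len(woman_genres | man_genres)
```

Tracking this similarity across successive daylist generations would make the “amplifying” divergence noted above measurable rather than anecdotal.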

While time constraints and a lack of data limited this audit’s ability to consider the daylist in as much depth as desired, it nonetheless provided insight into the algorithm’s effectiveness at anticipating a user’s mood and into its demographic biases. Studying my personal recommendations, I found a relatively healthy balance between my listening desires in the moment and new songs or artists. However, genre selection revealed hints of cultural stereotyping and differences in recommendations based on gender despite identical listening patterns. To fill the gaps in this audit’s research, I considered projects that audited the Spotify platform altogether and used information on the entire platform’s underlying algorithms to guide my understanding of the daylist’s algorithm. Accordingly, studies on musical diversity (Anderson, Ashton, et al.) and gender (Werner) in algorithmic recommendation software provide greater specifics on such effects. This audit instead focused on showing that Spotify’s daylist is the latest implementation of pre-existing algorithm cultures that marginalize, stereotype, and replace human creativity by ascribing cultural value through hidden code (Werner, 78).

Bibliography

Anderson, Ashton, et al. “Algorithmic Effects on the Diversity of Consumption on Spotify.” ACM Digital Library, Association for Computing Machinery, 1 Apr. 2020, dl.acm.org/doi/abs/10.1145/3366423.3380281. 

“Algorithmic Effects on the Diversity of Consumption on Spotify” asserts that algorithmic consumption of streaming music reduces the listening diversity of the user, creating a “filter bubble”. It examines the listening behavior of millions of users to understand how the algorithm affects the overall user experience. The article provides research about Spotify’s negative algorithmic effects on human behavior and interaction with the art of music, which the Daylist feature exacerbates. It also offers an example of how to approach auditing Spotify algorithms by using a filter of the user’s generality versus specificity in music taste.

Barreto, Matt A., et al. The Impact of Media Stereotypes on Opinions and Attitudes Towards Latinos, National Hispanic Media Coalition, Sept. 2012, www.chicano.ucla.edu/files/news/NHMCLatinoDecisionsReport.pdf. 

This NHMC report examines cultural stereotypes of Latin and Hispanic culture in American media. It provides insight on common preconceptions which a biased algorithm trained on American data may replicate. 

Dirst, Andrew. “Spotify’s ‘Daylist’: Enhancing vs. Diminishing the Art of the Playlist.” The Arizona State Press, The Echo, 19 Oct. 2023, www.statepress.com/article/2023/10/echo-spotify-ai-convenience-daylist-playlist-new-technology.

As a college publication piece, this article most simply examines college-age reception of Spotify’s Daylist feature to understand its appeal. It considers the risk automated playlisting poses to human curators and to artists’ control over their work. It also touches on the generalizations algorithms make by sharing a post about playlists arbitrarily labeled “Jewish Writer” or “Kindness Jewish”, which surfaces the concern of how the algorithm represents populations and reinforces stereotypes. It offers perspective on the general public’s current reception of the technology and whether or not the algorithm’s potential harms concern the everyday person.

Eriksson, Maria, et al. Spotify Teardown: Inside the Black Box of Streaming Music. The MIT Press, 2019. 

Spotify Teardown takes a deep dive into the Spotify corporation, examining how it operates. It reverse-engineers the product and questions the streaming service’s ethics and lack of regulation. The book’s research into the back-end of Spotify illuminates processes otherwise hidden from the public, which informs the investigation into Spotify as a whole.

Fleischer, Rasmus, and Pelle Snickars. “Discovering Spotify – A Thematic Introduction.” Culture Unbound, Journal of Current Cultural Research, 31 Oct. 2017, cultureunbound.ep.liu.se/article/view/65. 

The authors synthesize conclusions from various research projects on Spotify’s algorithmic effects in the space of the music industry, gendering and bubbling of content, and advertisement.

Goldrick, Stacy. “Get Fresh Music Sunup to Sundown with Daylist, Your Ever-Changing Spotify Playlist.” For the Record, Spotify, 12 Sept. 2023, newsroom.spotify.com/2023-09-12/ever-changing-playlist-daylist-music-for-all-day/.

This is the official announcement from Spotify about what the daylist feature is, where to find it, and how to share it.

“Learn About Those Music Genres You May Not Have Heard Of.” For the Record, Spotify, 30 Nov. 2022, newsroom.spotify.com/2022-11-30/learn-about-those-music-genres-you-may-not-have-heard-of/.

Spotify describes various niche genres of music into which its algorithms categorize songs, including the criteria and scaling for “bounciness”, “valence”, and “danceability”.

Quinones Rivas and Saenz. “American Pop Culture’s Perpetuation of Latino Paradigms And …” Ursidae, Ursidae: The Undergraduate Research Journal at the University of Northern Colorado, June 2017, digscholarship.unco.edu/cgi/viewcontent.cgi?article=1127&context=urj. Accessed 22 Mar. 2024. 

This research paper justifies the pursuit of understanding whether an algorithm perpetuates stereotypes by considering the effects of such stereotypes propagated by American pop culture. 

“Spotify Privacy Policy.” Spotify, www.spotify.com/us/legal/privacy-policy/. Accessed 21 Nov. 2023. 

Spotify’s privacy policy informs users of how the service collects and stores their data. Though fraught with its own inconsistencies and lack of clarity, the policy offers a glimpse into what kind of data may inform the Daylist feature to curate personalized collections.

Torabi, Nima. “The Inner Workings of Spotify’s AI-Powered Music Recommendations: How Spotify Shapes Your Playlist.” Medium, Fundamentals of Product Management, 28 Aug. 2023, medium.com/neemz-product/the-inner-workings-of-spotifys-ai-powered-music-recommendations-how-spotify-shapes-your-playlist-a10a9148ee8d.

Torabi provides a comprehensive look into how Spotify uses artificial intelligence to recommend music to listeners. Not only does he explain the process in detail, but he also examines how the algorithm favors established artists and may contain cultural bias. His synthesis of which data informs the AI, including user data, song information, and “sonic identity”, paints a vivid picture of how Spotify’s Daylist works behind the scenes to create niche playlists throughout a user’s day.

Werner, Ann. “Organizing Music, Organizing Gender: Algorithmic Culture and Spotify Recommendations.” Taylor & Francis Online, Taylor & Francis Group, LLC, 10 Jan. 2020, www.tandfonline.com/doi/pdf/10.1080/15405702.2020.1715980. 

Werner examines how Spotify’s classifications of music for use in algorithms represents gender. She considers the software’s role in doubling down on gendered stereotypes as it recommends music. Her argument of Spotify’s quantification of identity by gender and nationality supports the idea that the Daylist creates a culturally insensitive bubble for the user.