A data science approach to movies and film director analysis
First Monday

A data science approach to movies and film director analysis by Chris May and Lior Shamir

The creation of movies involves a careful process of planning, recording, and editing of the visual content. Here we propose a quantitative computer-based analysis of movies to identify similarities that can indicate influential links between films, genres, or film directors. The method works by first extracting a comprehensive set of numerical image content descriptors from a large number of frames of each movie. Then, the most informative descriptors are selected, and the values of the frames are compared to each other to create a similarity matrix between the movies. The similarity matrix is visualized using a phylogeny to show a network of similarities between movies. Experimental results with a dataset of 104 movies show that the method is able to predict the movie based on a single frame with an accuracy of ∼74 percent, and the similarity analysis tends to cluster movies of the same directors or of the same movie series. These results show that computer analysis is able to detect similarities between movies, providing a quantitative approach to film studies. The automatic association of the movies by directors shows that the film director has a stronger influence on the visual outcome than cinematographers or actors.


1. Introduction
2. Data
3. Movie analysis method
4. Results
5. Conclusions



1. Introduction

Cinema is one of the most popular forms of art, with millions and even billions of viewers around the world (Mulvey, 1975). As cinema has progressed since the nineteenth century, different film genres have developed, as well as signs and symbols that define how films are communicated (Metz, 1991; Monaco, 2000). The development of different movie genres led to different ways in which movies could be communicated to viewers.

Movies and movie genres have a high impact on culture, and those effects can be attributed to Hollywood (Belton, 1996; Giroux, 2002) as well as non-Hollywood movie industries (Hogan, 2008). For instance, the Vietnam war and the American policy and culture related to it were highly influenced by the movie genre that it sparked, and the way the Vietnam war was communicated through these movies (Corrigan, 1991). Films made after the war, such as Full Metal Jacket by Stanley Kubrick and Apocalypse Now by Francis Ford Coppola, directly tapped into the mood during the Vietnam War, while the latter also worked in themes and motifs from other works, such as the novella Heart of Darkness by Joseph Conrad. Another example is the impact of movies on the perception of occupations such as lawyers or accountants through the way these occupations were communicated to the viewers (Beard, 1994).

Although more modern ways of film making allow a group of people to create a movie in collaboration (Newman, 2008), the commercial movie industry is still dominated by the auteur theory (Santas, 2001; Caughie, 1981), according to which the movie director has a substantial impact on the created movie (Monaco, 2000), equivalent to the impact of an author on a novel. The cinematographer is responsible for visual elements such as the camera and lighting, but the final decisions are made by the director of the movie, who also supervises the editing of the movie before its final production (Monaco, 2000). It should be noted, however, that the auteur theory is controversial (Staples, 1966), and alternative approaches based on a collaborative process (Kael, 1996) proposed that the impact of the director is not necessarily more substantial than that of the script writer (Kipen, 2006), the actors, or the producer (Sadoul, 1962).

The work described in this paper aims at providing an automatic method for analyzing the visual data of films, which can be used for studying films in a manner that does not depend on the subjectivity of manual interpretation. Such methods can utilize automatic video analysis and machine vision foundations to provide a tool that can be used within the humanities, and specifically within film studies.

Substantial work has been done in the past on video analysis in the context of engineering-oriented tasks such as action recognition (Soomro, et al., 2012; Kuehne, et al., 2011), or analysis of video data for surveillance (Hsieh, et al., 2006; Brutzer, et al., 2011). However, that kind of analysis aims at identifying specific and well-defined activities in the video, rather than its artistic aspects. Other previous work aimed at analyzing movies through metadata and other related information that is not the visual content of the movies. These include automatic analysis of movie data for rating movies based on social media analytics (Oghina, et al., 2012). Some related work focused on recommendation frameworks for movies (Melville, et al., 2002; Debnath, et al., 2008; Jung, 2012; Lamprecht, et al., 2015). Collections of movies have also been analyzed as networks through actors participating in the same movies (Gallos, et al., 2013), or critics who review and rate the same movies (Fatemi and Tokarchuk, 2012). Social networks have also been widely used as tools to analyze movies without the need to analyze the audio or video data directly (Weng, et al., 2009).

Here we apply image analysis algorithms to analyze and profile movies by their visual content. The algorithms are based on previous work on the analysis of visual art, which demonstrated the ability of the algorithms to distinguish between different schools of art (Shamir, et al., 2010), identify similarities between painters (Shamir, 2012b), and profile art history in a manner that is largely in agreement with the analysis of art historians (Shamir and Tarakhovsky, 2012). The quantitative analysis of the visual content also made it possible to profile changes in artistic style over time (Burcoff and Shamir, 2017), and identified unique styles of certain painters such as Vincent van Gogh and Jackson Pollock (Shamir, 2015, 2012b). That is, machine analysis and algorithms developed initially for engineering-oriented tasks were used to address problems within the humanities, and specifically the analysis of visual art.

Since computer algorithms are able to analyze visual art to a certain degree, the research question of this study is whether such algorithms are able to analyze the art expressed in the process of movie making. For that purpose, each frame in the movie can be considered a work of art, and the analysis of a large number of frames can be used to profile potential links of similarity or influence between movies. Quantitative analysis of movies can initiate a new quantitative approach to film studies, where movies are analyzed by computers that can measure a wide range of visual cues reflecting the artistic style of the filmmaker.



2. Data

The dataset used in the experiment contains 104 Hollywood films from 24 different directors. The films that were selected are among the most popular films in Hollywood’s recent history, were made between the years 1960 and 2017, and had a substantial impact on a large and diverse population of viewers (Bordwell, et al., 2003; Maltby, 2003). The genre of each movie was also taken into account, although the movie-genre relationship is not necessarily one-to-one, as a movie can belong to more than one genre. For instance, some movies can be considered action or drama movies, but at the same time be considered a comedy or horror film.

In this study the genre was determined based on the primary intent of the movie and its target audience, and was categorized into one of seven genres: comedy, drama, action, horror, thriller, western, and science fiction (SciFi). For example, Evil Dead 2 is a horror film that is also considered to be a comedy, but its primary target audience is the horror audience. The complete list of movies, genres, and directors is summarized in Table 1.


Table 1: Movies used in this experiment.
Title | Director | Year | Genre
2001: A Space Odyssey | Stanley Kubrick | 1968 | SciFi
The Adventures of Tintin | Steven Spielberg | 2011 | Action
Alien | Ridley Scott | 1979 | SciFi
Aliens | James Cameron | 1986 | SciFi
Alien: Covenant | Ridley Scott | 2017 | SciFi
Angels & Demons | Ron Howard | 2009 | Thriller
Ant-Man | Peyton Reed | 2015 | SciFi
Apollo 13 | Ron Howard | 1995 | Drama
Army of Darkness | Sam Raimi | 1992 | Horror
The Avengers | Joss Whedon | 2012 | Action
Avengers: Age of Ultron | Joss Whedon | 2015 | Action
The Aviator | Martin Scorsese | 2004 | Drama
The Birds | Alfred Hitchcock | 1963 | Horror
Baby Driver | Edgar Wright | 2017 | Action
Backdraft | Ron Howard | 1991 | Drama
A Beautiful Mind | Ron Howard | 2001 | Drama
Blade Runner | Ridley Scott | 1982 | SciFi
Blade Runner 2049 | Denis Villeneuve | 2017 | SciFi
Bottle Rocket | Wes Anderson | 1996 | Comedy
Cape Fear | Martin Scorsese | 1991 | Thriller
Captain America | Joe Johnston | 2011 | Action
Captain America: Civil War | Joe Russo and Anthony Russo | 2016 | Action
Captain America: The Winter Soldier | Joe Russo and Anthony Russo | 2014 | Action
The Color Purple | Steven Spielberg | 1985 | Drama
Darjeeling Limited | Wes Anderson | 2007 | Comedy
Darkman | Sam Raimi | 1990 | Action
The Da Vinci Code | Ron Howard | 2006 | Thriller
The Departed | Martin Scorsese | 2006 | Drama
Drag Me to Hell | Sam Raimi | 2009 | Horror
The Dark Knight Rises | Christopher Nolan | 2012 | Action
The Dark Knight | Christopher Nolan | 2008 | Action
Doctor Strange | Scott Derrickson | 2016 | Action
Dunkirk | Christopher Nolan | 2017 | Drama
E.T. the Extra-Terrestrial | Steven Spielberg | 1982 | SciFi
Evil Dead (2013) | Fede Álvarez | 2013 | Horror
Evil Dead | Sam Raimi | 1981 | Horror
Evil Dead 2 | Sam Raimi | 1987 | Horror
Eyes Wide Shut | Stanley Kubrick | 1999 | Thriller
Fantastic Mr. Fox | Wes Anderson | 2009 | Comedy
Following | Christopher Nolan | 1998 | Thriller
Friday the 13th | Sean S. Cunningham | 1980 | Horror
Friday the 13th Part 2 | Steve Miner | 1981 | Horror
Friday the 13th Part III | Steve Miner | 1982 | Horror
Friday the 13th: The Final Chapter | Joseph Zito | 1984 | Horror
Friday the 13th Part VII: The New Blood | John Carl Buechler | 1988 | Horror
Full Metal Jacket | Stanley Kubrick | 1987 | Drama
Gangs of New York | Martin Scorsese | 2002 | Drama
Grand Budapest Hotel | Wes Anderson | 2014 | Comedy
The Gift | Sam Raimi | 2000 | Thriller
Goodfellas | Martin Scorsese | 1990 | Drama
Guardians of the Galaxy | James Gunn | 2014 | Action
Guardians of the Galaxy Vol. 2 | James Gunn | 2017 | Action
Hateful Eight | Quentin Tarantino | 2015 | Western
In the Heart of the Sea | Ron Howard | 2015 | Drama
Hotel Chevalier | Wes Anderson | 2007 | Drama
Hot Fuzz | Edgar Wright | 2007 | Comedy
How the Grinch Stole Christmas | Ron Howard | 2000 | Comedy
Hugo | Martin Scorsese | 2011 | Drama
Inception | Christopher Nolan | 2010 | Thriller
Incredible Hulk | Louis Leterrier | 2008 | Action
Indiana Jones and the Raiders of the Lost Ark | Steven Spielberg | 1981 | Action
Inferno | Ron Howard | 2016 | Thriller
Inglourious Basterds | Quentin Tarantino | 2009 | Action
Insomnia | Christopher Nolan | 2002 | Thriller
Interstellar | Christopher Nolan | 2014 | SciFi
Iron Man | Jon Favreau | 2008 | Action
Iron Man 2 | Jon Favreau | 2010 | Action
Iron Man 3 | Shane Black | 2013 | Action
Jackie Brown | Quentin Tarantino | 1997 | Action
Jaws | Steven Spielberg | 1975 | Horror
Jurassic Park | Steven Spielberg | 1993 | Action
Kill Bill: Vol. 1 | Quentin Tarantino | 2003 | Action
Kill Bill: Vol. 2 | Quentin Tarantino | 2004 | Action
The Life Aquatic with Steve Zissou | Wes Anderson | 2004 | Comedy
The Martian | Ridley Scott | 2015 | SciFi
Memento | Christopher Nolan | 2001 | Thriller
Moonrise Kingdom | Wes Anderson | 2012 | Comedy
Oz the Great and Powerful | Sam Raimi | 2013 | Action
The Prestige | Christopher Nolan | 2006 | Thriller
Psycho | Alfred Hitchcock | 1960 | Horror
Pulp Fiction | Quentin Tarantino | 1994 | Action
The Quick and the Dead | Sam Raimi | 1995 | Western
Reservoir Dogs | Quentin Tarantino | 1992 | Action
The Royal Tenenbaums | Wes Anderson | 2001 | Comedy
Rush | Ron Howard | 2013 | Drama
Rushmore | Wes Anderson | 1998 | Comedy
Saving Private Ryan | Steven Spielberg | 1998 | Drama
Scott Pilgrim vs. the World | Edgar Wright | 2010 | Action
Schindler’s List | Steven Spielberg | 1993 | Drama
The Shining | Stanley Kubrick | 1980 | Horror
Shutter Island | Martin Scorsese | 2010 | Horror
Silence | Martin Scorsese | 2016 | Drama
Spider-Man | Sam Raimi | 2002 | Action
Spider-Man 3 | Sam Raimi | 2007 | Action
Spider-Man: Homecoming | Jon Watts | 2017 | Action
Shaun of the Dead | Edgar Wright | 2004 | Comedy
The Sugarland Express | Steven Spielberg | 1974 | Drama
Taxi Driver | Martin Scorsese | 1976 | Drama
Thor | Kenneth Branagh | 2011 | Action
Thor: The Dark World | Alan Taylor | 2013 | Action
Thor: Ragnarok | Taika Waititi | 2017 | Action
The World’s End | Edgar Wright | 2013 | Comedy
Willow | Ron Howard | 1988 | Action
The Wolf of Wall Street | Martin Scorsese | 2013 | Comedy


The films were encoded using the H.264/AVC video coding standard at a resolution of 720p. Each film was separated into a set of 100 frames, such that the time intervals between each two frames were equal throughout the movie. For instance, a 100-minute film was separated into 100 frames such that the interval between each two frames is 60 seconds, and the first frame is extracted at 00:01:00. Each frame has a resolution of 1280×720, and was converted into a TIFF (Tagged Image File Format) file.
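The timestamp arithmetic of this sampling scheme can be sketched as follows. The helper below is a hypothetical illustration, not the authors’ actual extraction pipeline (which presumably used a video decoding tool):

```python
def frame_timestamps(duration_seconds, n_frames=100):
    """Equally spaced sampling as described above: the interval is the
    film duration divided by the number of frames, and the first frame
    is taken one full interval into the film."""
    interval = duration_seconds / n_frames
    return [interval * (i + 1) for i in range(n_frames)]

# A 100-minute film: one frame every 60 seconds, starting at 00:01:00
ts = frame_timestamps(100 * 60)
```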



3. Movie analysis method

The frames extracted from movies as described in Section 2 were analyzed using an image analysis method that can analyze large and complex visual content in a comprehensive manner. The method is based on the Wndchrm feature set (Shamir, et al., 2008), which has been used for a wide variety of tasks requiring comprehensive image analysis (Shamir, et al., 2008; Orlov, et al., 2008; Shamir, et al., 2013, 2010; Shamir, 2012a, 2012b; Kuminski, et al., 2014; Shamir, et al., 2016).

In particular, it has been used widely to study visual art in a quantitative fashion by applying computational analysis to complex visual content (Shamir, et al., 2010). For instance, it showed that computer analysis of art is largely in agreement with the way art historians view influential links between different schools of European art (Shamir and Tarakhovsky, 2012). It was also used to identify artistic elements typical of Jackson Pollock (Shamir, 2015), and to show evidence of mathematical similarities between Jackson Pollock and Vincent van Gogh (Shamir, 2012b). Another use of the Wndchrm scheme related to automatic analysis of art is the study of art perception, showing patterns of differences between abstract expressionist art and paintings made by young children and animals (Shamir, et al., 2016).

Wndchrm computes a large set of 2881 numerical visual content descriptors from each frame. These descriptors are numerical values that reflect the visual content, and change when the visual content changes. Since measuring visual content is a complex task, the extraction of numerical values that reflect the content enables quantitative analysis. The numerical image content descriptors include various characteristics of the visual content such as fractals, textures (Haralick, Tamura, Gabor), polynomial decomposition of the pixel intensities (Chebyshev polynomial statistics, Zernike polynomials, Chebyshev-Fourier spectral analysis, Radon features), statistics of the pixel intensities (multi-scale histograms, first four moments, image entropy), and high-contrast features (Prewitt edge statistics, object statistics, Euler number). The feature set is described in full detail in Shamir, et al. (2008); Orlov, et al. (2008); Shamir, et al. (2013, 2010); Shamir (2012a); Shamir, et al. (2016). The source code of the method is also publicly available (Shamir, 2017).

In summary, fractals measure the degree to which the visual content is made of shapes that are parts of similar larger shapes. Textures reflect the repetitive patterns of variation of the frame’s surface. Polynomial decomposition of pixel intensities measures the consistency of the changes of the pixel intensities by fitting them to a high-order polynomial function. The statistical distribution of the pixel intensities is a simple way of reflecting the changes in the brightness of the pixels in the frame. Object statistics measure the number and distribution of objects that are substantially brighter or dimmer than the frame’s background. Edge statistics measure the frequency, direction, and distribution of lines that separate neighboring areas of the frame with substantially different brightness. The mathematical definitions of these numerical content descriptors are available in Shamir, et al. (2008).
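As a concrete illustration, the sketch below computes two of the simplest descriptor families named above (the first four moments and the entropy of the pixel intensities) using NumPy. This is a minimal example, not the Wndchrm implementation, which computes a far larger set of descriptors:

```python
import numpy as np

def pixel_statistics(frame):
    """First four moments and entropy of a grayscale frame,
    a small subset of the descriptor families described above."""
    x = frame.astype(float).ravel()
    mean = x.mean()
    std = x.std()
    # Standardized third and fourth moments (skewness and kurtosis)
    z = (x - mean) / std if std > 0 else np.zeros_like(x)
    skew = (z ** 3).mean()
    kurt = (z ** 4).mean()
    # Shannon entropy of the 256-bin intensity histogram
    hist, _ = np.histogram(x, bins=256, range=(0, 256))
    p = hist / hist.sum()
    p = p[p > 0]
    entropy = -(p * np.log2(p)).sum()
    return {"mean": mean, "std": std, "skewness": skew,
            "kurtosis": kurt, "entropy": entropy}
```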

To extract more information from each frame, the content descriptors are extracted not only from the raw pixels, but also from transforms of the frame and transforms of transforms (Shamir, et al., 2009). An image transform represents the same content as the raw pixels, but in a different form that provides different relationships between the pixels, and therefore its analysis provides different information than the analysis of the raw pixels. The transforms used by the scheme are the Fast Fourier Transform (FFT), the Wavelet (Symlet 5, level 1) two-dimensional decomposition of the image, the Chebyshev transform, and the edge transform, which is the magnitude component of the image’s Prewitt gradient. Figure 1 shows frames from movies used in this study, and the different transforms of these frames. Some of these transforms are not understandable to the human eye, but contain information that can be used by machines. The output of some of the transforms, such as the Chebyshev transform, is of different dimensions than the original image.
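A minimal sketch of two of these transforms, assuming a grayscale frame stored as a 2-D NumPy array; the Wavelet and Chebyshev transforms are omitted for brevity:

```python
import numpy as np

def fft_transform(frame):
    """Magnitude of the 2-D Fourier transform (log-scaled, zero-centered)."""
    f = np.fft.fftshift(np.fft.fft2(frame.astype(float)))
    return np.log1p(np.abs(f))

def edge_transform(frame):
    """Magnitude of the Prewitt gradient, used here as the edge transform."""
    kx = np.array([[1, 0, -1]] * 3, dtype=float)  # horizontal Prewitt kernel
    ky = kx.T                                     # vertical Prewitt kernel
    pad = np.pad(frame.astype(float), 1, mode="edge")
    h, w = frame.shape
    gx = np.zeros((h, w))
    gy = np.zeros((h, w))
    for i in range(3):                            # 3x3 correlation
        for j in range(3):
            win = pad[i:i + h, j:j + w]
            gx += kx[i, j] * win
            gy += ky[i, j] * win
    return np.hypot(gx, gy)
```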


Figure 1: Frames from The Avengers (left) and Eyes Wide Shut, and the transforms of these frames. The transforms are (from top to bottom) Fourier Transform, Chebyshev transform, Wavelet (Symlet 5) transform, edge transform, Fourier transform of the edge transform, Fourier transform of the Chebyshev transform, and Fourier transform of the Wavelet transform.


To extract as much information as possible about the complex visual content of movies, the numerical content descriptors are computed not only from the raw images, but also from the transforms. The content descriptors extracted from all transforms are the statistics and texture features, which include the first four moments, Haralick textures, multi-scale histograms, Tamura textures, and Radon features. The polynomial decomposition descriptors (Zernike features, Chebyshev statistics, and Chebyshev-Fourier spectral features) are extracted from all transforms, except the Fourier and Wavelet transforms of the Chebyshev transform, and the Wavelet and Chebyshev transforms of the Fourier transform. The high-contrast features (edge statistics, object statistics, and Gabor features) are extracted only from the raw pixels, and not from the transforms.

To ignore non-informative features, the numerical content descriptors are ranked by their Fisher discriminant scores (Bishop, 2006). The Fisher discriminant score of a feature can be conceptualized as the ratio between the variance of the per-movie means of that feature and the mean variance of the feature within each movie. That is, the Fisher score gets higher as the values computed from each movie are closer to each other, and the means of the values of each movie are more different from the means of the values of the other movies. A detailed description of Fisher discriminant scores can be found in Bishop (2006).
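A minimal sketch of this ranking, assuming a feature matrix with one row per frame and a list of movie labels; this illustrates the Fisher score concept, not the Wndchrm implementation:

```python
import numpy as np

def fisher_scores(features, labels):
    """Fisher discriminant score per feature column: the variance of the
    per-class means divided by the mean within-class variance, so features
    that are consistent within a movie but differ between movies rank high."""
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    classes = sorted(set(labels.tolist()))
    means = np.array([features[labels == c].mean(axis=0) for c in classes])
    within = np.array([features[labels == c].var(axis=0) for c in classes]).mean(axis=0)
    return means.var(axis=0) / (within + 1e-12)  # epsilon avoids division by zero
```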

The similarity MI,X between a certain frame I and a certain movie X is determined by the minimum Euclidean distance between the features of frame I and any of the frames of movie X, as shown by Equation 1:


$$M_{I,X} = \min_i \sqrt{\sum_f \left(I_f - X_{i,f}\right)^2} \qquad \text{(1)}$$


where If is the value of feature f extracted from the frame I, and Xi,f is the value of feature f extracted from the frame i of movie X. Naturally, when classifying a certain frame the predicted movie is determined by the movie X that has the minimum MI,X. The similarity between any pair of movies X and Y is determined by the mean of MI,X, such that I is all test set frames of movie Y.
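The similarity measure of Equation 1 and the nearest-movie classification rule can be sketched as follows, assuming each frame is represented by its selected feature vector:

```python
import numpy as np

def frame_movie_similarity(frame_features, movie_frames):
    """M_{I,X}: minimum Euclidean distance between a frame's feature
    vector and the feature vectors of any frame of movie X (Equation 1)."""
    diffs = np.asarray(movie_frames, dtype=float) - np.asarray(frame_features, dtype=float)
    return np.sqrt((diffs ** 2).sum(axis=1)).min()

def classify_frame(frame_features, movies):
    """Predict the movie X with the minimum M_{I,X}; `movies` maps
    a movie name to the feature vectors of its frames."""
    return min(movies, key=lambda m: frame_movie_similarity(frame_features, movies[m]))
```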

Analyzing the similarities between all pairs of movies in the dataset provides a matrix of the similarities between all pairs of movies. The values in the similarity matrix are normalized by dividing all values in each row by the similarity value of the movie to itself, so that all similarity values are between 0 and 1. The similarity matrix is then visualized using a phylogeny, which is a method for visualizing networks of similarities. The phylogeny visualization package used in this experiment is PHYLIP (Felsenstein, 1989), applied with a randomized input order of sequences (seed 97, 10 jumbles) and Equal-Daylight arc optimization (Felsenstein, 1989).
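The row normalization described above, dividing each row by that movie’s diagonal (self-similarity) entry, can be sketched as:

```python
import numpy as np

def normalize_similarity_matrix(S):
    """Divide each row of the movie-to-movie similarity matrix by the
    movie's similarity to itself (the diagonal entry of that row)."""
    S = np.asarray(S, dtype=float)
    return S / np.diag(S)[:, None]
```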



4. Results

The method described earlier was applied to the data, 104 Hollywood films from 24 different directors. In the first experiment 10 movies from the dataset were analyzed such that 90 frames from each movie were used for training and the remaining 10 frames for testing. The experiment was repeated 20 times such that in each run different samples were randomly allocated to the training and test sets.

The results show that the algorithm was able to associate a test frame with its movie, based on the other training frames, with an accuracy of 79 percent. Since the dataset of the experiment has 10 movies, random chance would have associated a frame with the movie it is part of in 10 percent of the cases. The accuracy of the algorithm is therefore much higher than mere chance, showing that the algorithm is capable of identifying image content that is characteristic of a certain movie. Table 2 shows the confusion matrix of the experiment.


Table 2: Confusion matrix of the experiment with 10 different movies.



Table 3: Similarity matrix of the 10 movies.


The confusion matrix shows that the highest numbers are along the diagonal of the matrix, in agreement with the classification accuracy, as most frames are classified with the correct movie. Some pairs of movies show higher confusion, such as The Shining and Eyes Wide Shut, or Full Metal Jacket and Eyes Wide Shut.

As described earlier, the algorithm can provide the measured similarity between different movies. Table 3 shows the matrix of similarities between the 10 different movies. Due to the different samples used by the different classes, and also due to the imperfect accuracy of the algorithm, the measured similarity between movie i and movie j is not always identical to the measured similarity between movie j and movie i, but the values are in most cases close, as the similarity matrix shows.

The similarity matrix is visualized using a phylogeny as described earlier. Figure 2 shows the phylogeny that visualizes the similarity matrix of Table 3.


Figure 2: Phylogeny of the similarity matrix of Table 3.


The phylogeny shows that the three movies by Stanley Kubrick (Eyes Wide Shut, Full Metal Jacket, The Shining) were clustered close to each other. The two movies by the director Edgar Wright (The World’s End, Hot Fuzz) were also placed close to each other, with a third Wright movie (Baby Driver) placed close to that pair. The movies The World’s End and Hot Fuzz have several things in common in addition to being made by the same director: the two movies are part of a deliberate trilogy developed by the director, and they also share some of the same actors. However, each movie has a different cinematographer, Bill Pope for The World’s End and Jess Hall for Hot Fuzz.

The three movies directed by Kubrick span three different genres (horror, drama, thriller), do not share any actors, and are not part of any specific movie series or set. The three movies also do not share a cinematographer, as they were shot by three different cinematographers: Larry Smith, Douglas Milsome, and John Alcott.

The observation that these movies were clustered together by the algorithm indicates that the algorithm was able to detect visual cues that reflect the way the films were made. The fact that all of these films were made by the same director provides an indication that the director has a clear effect on the visual content of the film, and that this impact can be identified by the algorithm. Since the movies were shot by different cinematographers, the result suggests that the director, rather than the cinematographer, has the strongest impact on the visual content of the movie.

The similarities between the movies were determined in an unsupervised manner, meaning that the algorithm had no prior knowledge of similarities between the movies. Therefore, the clusters of movies made by the same directors indicate that the algorithm is driven by the visual content of the movies, and not by artifacts (Shamir, 2011, 2008). When an algorithm is trained using defined classes, it can use artifacts such as the compression algorithm or image format to associate frames with the movies they were taken from (Shamir, 2011, 2008). In this experiment the film director was not part of the training process, but the algorithm was still able to identify similarities between movies of the same directors, showing that the analysis is driven by the visual content of the movies.

The same methodology was also tested with a dataset of 23 films. The accuracy of automatic association of a frame to a movie was 74 percent, which is far higher than the expected mere chance accuracy of ∼4.3 percent. Figure 3 shows the resulting phylogeny when using a dataset of 23 movies. As the figure shows, the Friday the 13th films are clearly clustered together, showing that the computer analysis was able to identify that these films are part of the same set, although not all of them were made by the same director. Close to that cluster, three movies are grouped together: The Shining, Eyes Wide Shut, and Full Metal Jacket. These films were directed by Stanley Kubrick, indicating that the computer analysis could detect visual similarities that result from the work of the same director on different films.


Figure 3: Phylogeny of the experiment when using a dataset of 23 movies.


Another small cluster contained the movies The World’s End and Hot Fuzz, two films made by the director Edgar Wright, again showing that the similarities between movies as identified by the algorithm are closely linked to the director who made the films. The pair of movies Moonrise Kingdom and Fantastic Mr. Fox, made by film director Wes Anderson, are placed on the same branch, as are the two films Army of Darkness and Evil Dead 2, which were directed by Sam Raimi. It should be noted that the film Evil Dead (2013) was made by a different director, Fede Álvarez, and is placed in the phylogeny far from the movie Evil Dead 2.


Figure 4: Phylogeny of the experiment when using a dataset of 104 movies.


Figure 4 displays the phylogeny generated after applying the same methodology to the full dataset of 104 movies. The classification accuracy of associating a frame with its movie is ∼74 percent, far higher than the mere chance accuracy of ∼1 percent. As the phylogeny shows, horror movies are grouped together around the top left part of the phylogeny. The movies Evil Dead, Evil Dead 2, and Army of Darkness are placed close to each other, and all of them were made by the same director, Sam Raimi. The two lower-right nodes, and the node directly above them, form a cluster of several movies that are part of the Marvel Cinematic Universe (MCU). These movies were made by different directors, but share characters, genre, and a movie franchise.

Some of the movies are loosely grouped in what Marvel, the company that produced the movies, calls “phases”: sets of movies released around the same time that are considered to share the same plot thread. Two of the three completed phases were clustered together by the algorithm. For instance, Thor, Iron Man, Iron Man 2, The Incredible Hulk, and Captain America are all part of “Phase I”, and these movies are all part of the same cluster. Just one movie from that phase, The Avengers, was not clustered together with the other Phase I movies.

Iron Man 3, Thor: The Dark World, Captain America: The Winter Soldier, Guardians of the Galaxy, Avengers: Age of Ultron, and Ant-Man are part of “Phase II”. With the exception of Ant-Man, all other movies of the phase are placed in the same cluster.

For some movies, such as Thor, Iron Man, Incredible Hulk, and Captain America, there are no shared actors, cinematographers, or directors. The two Avengers movies have some common elements such as shared actors. As described earlier, however, the frames are analyzed by a collection of general global descriptors, without detection of large objects such as faces, so the presence or absence of shared actors is not expected to impact the analysis. Figure 4 nevertheless shows a tight grouping of the Marvel movies.

In Figure 4, in the dense cluster in the upper left, several movies by director Sam Raimi are grouped together (Evil Dead, Evil Dead 2, Darkman, The Gift, and Spider-Man). It is important to note that there are two iterations of Spider-Man in the phylogeny that are not clustered together: Spider-Man and Spider-Man 3 were directed by Sam Raimi, while Spider-Man: Homecoming was directed by Jon Watts. This shows that the algorithm could distinguish between movies made by different directors, although the topic of the movies is the same. It could also be due to the fact that Spider-Man: Homecoming is a Marvel Cinematic Universe film and shares a few actors with the Iron Man films, although those characters are not dominant in the film, and the algorithm uses low-level features and does not identify specific actors.

In the upper branch in the lower left side of Figure 4 there is a cluster of films by Martin Scorsese (The Aviator, Gangs of New York, Silence, and The Departed).

The analysis is based on a large number of numerical image content descriptors, from which the most informative descriptors are selected by computing the Fisher discriminant score of each feature, and then using the features with the highest Fisher discriminant scores as described earlier. That is, features with higher Fisher discriminant scores are assumed to be more informative than features with lower scores, and therefore have more impact on the results of the analysis. Figure 5 shows the sums of the Fisher discriminant scores of the features used in the analysis, extracted from the different transforms.


Figure 5: Fisher discriminant scores of the features used to classify and measure similarities between the movies.


The graph shows that numerical image content descriptors computed from the raw pixels of the frames provide relatively little information about the movies compared to numerical content descriptors extracted from the transforms of the frames. For instance, the Zernike, first four moments, and Haralick texture features computed from the Fourier transform of the frames provide substantial information about the movie, as well as other features such as the Tamura textures extracted from the Chebyshev transform. Since these features are extracted from transforms of the frames, they are not intuitive, showing the complex nature of the visual content expressed in modern films.

The contention that image transforms are more informative about the visual content than the raw image has been studied and demonstrated in the past (Shamir, et al., 2009). Empirical analysis shows that when the visual content is more complex, such as in the case of visual art, the raw pixels do not capture all of the information, and relationships between pixels can be meaningful even for pixels in different parts of the frame (Shamir, et al., 2010). Image transforms, and transforms of image transforms, make it possible to capture relationships between the pixels that often cannot be captured easily by analyzing the raw pixels directly. For instance, the distribution of repetitive or similar elements that appear in different parts of the frame might be better analyzed through the Fourier transform of the raw pixels than through the raw pixels directly.
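As a toy illustration of this point (a NumPy demonstration, not part of the authors’ method), a repetitive pattern spread across a synthetic frame collapses to a single sharp peak in the Fourier domain at the pattern’s frequency:

```python
import numpy as np

# A vertical grating with 8 cycles across a 64x64 frame: the repetition
# is spread over the whole frame in pixel space, but concentrates at the
# single non-DC frequency bin 8 in the Fourier domain.
n = 64
x = np.arange(n)
grating = np.tile(np.sin(2 * np.pi * 8 * x / n), (n, 1))
spectrum = np.abs(np.fft.fft2(grating))
row = spectrum[0]                       # variation is only horizontal
peak = int(np.argmax(row[1:n // 2])) + 1
```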

The information extracted by using transforms and transforms of transforms is specifically useful when analyzing complex visual content such as visual art (Shamir, et al., 2010; Shamir and Tarakhovsky, 2012; Shamir, et al., 2016; Shamir, 2012b; Burcoff and Shamir, 2017). Because film directors also prioritize and carefully craft the visual content of their films, it is expected that image features extracted from image transforms and compound image transforms provide useful information about the visual content of films.



5. Conclusions

While movies are one of the most common forms of popular art and culture, little work has yet been done on quantitative analysis of the visual content of movies. Here we used comprehensive quantitative analysis to analyze and compare different movies. The results show that the algorithm is able to associate a frame with the movie it is part of with accuracy far higher than mere chance, indicating a visual consistency in the movies that is detectable by computer algorithms. The method can also analyze and visualize the similarities between movies, allowing the visual content of movies to be compared in a quantitative manner. It can be used to study influential links between films through a quantitative approach that does not depend on subjective human perception, and can therefore augment film analysis by providing another form of analysis in addition to the human impression, which is often subjective and difficult to quantify.

Online databases such as IMDB and on-line streaming services can be used for objective comparison of movies based on genres, actors’ names, and directors. These services can be used to categorize and recommend movies to their users, but in most cases that information is based on metadata rather than the visual content of the movies. The results shown in this study provide evidence that movies can be categorized not just by metadata such as actors, year of production, genres, or directors, but also based on the visual content of the films.

That comparison showed that in many cases the algorithm grouped together movies that were created by the same director, even in cases where the genre, cinematographer, and actors were different. That provides quantitative evidence that movie directors have visual impact on their films in a way that makes them visually similar regardless of their topic or genre.

The study is focused on popular Hollywood movies aimed at larger audiences, rather than artistic movies created for a smaller number of viewers. Also, movies combine visual information with the audio information of the film’s soundtrack; this study analyzes only the visual content of the movie and not the audio data. The results of these experiments indicate that the visual content created by film directors and cinematographers can be quantified, introducing a quantitative approach to the study of filmmaking. Instead of relying on the human impression of the movie, the approach proposed in this paper is based on quantitative analysis of the movie’s visual content. While human analysis can reflect cognitive, emotional, and visual aspects of the movie perceived by the human brain, the computer analysis is not subjective, and can analyze visual elements such as the composition of colors or textures that are more difficult to measure and quantify manually.

The approach described in this study can be used as a methodology to study films, providing a new quantitative approach to the study of the history of cinema. Such methods can join similar approaches used to study other fields of human creation, such as art and music, in which the use of computers is more prevalent. The source code of the method is publicly available (Shamir, 2017).


About the authors

Christopher May, Lawrence Technological University in Southfield, Michigan.

Lior Shamir is Associate Professor of Computer Science at Kansas State University in Manhattan, Kansas.
E-mail: lshamir [at] ksu [dot] edu



The study was funded in part by National Science Foundation grant IIS-1546079 and Howard Hughes Medical Institute grant 52008705. We would like to thank First Monday’s anonymous reviewers for their insightful comments that helped us to improve the manuscript.



V. Beard, 1994. “Popular culture and professional identity: Accountants in the movies,” Accounting, Organizations and Society, volume 19, number 3, pp. 303–318.
doi: https://doi.org/10.1016/0361-3682(94)90038-8, accessed 20 May 2019.

J. Belton (editor), 1996. Movies and mass culture. New Brunswick, N.J.: Rutgers University Press.

C. Bishop, 2006. Pattern recognition and machine learning. Berlin: Springer-Verlag.

D. Bordwell, J. Staiger, and K. Thompson, 2003. “An excessively obvious cinema,” In: The classical Hollywood cinema: Film style and mode of production to 1960. Abingdon, Oxfordshire: Taylor and Francis, pp. 21–29.

S. Brutzer, B. Hoferlin, and G. Heidemann, 2011. “Evaluation of background subtraction techniques for video surveillance,” CVPR ’11: Proceedings of the 2011 IEEE Conference on Computer Vision and Pattern Recognition, pp. 1,937–1,944.
doi: https://doi.org/10.1109/CVPR.2011.5995508, accessed 20 May 2019.

A. Burcoff and L. Shamir, 2017. “Computer analysis of Pablo Picasso’s artistic style,” International Journal of Art, Culture and Design Technologies, volume 6, number 1, pp. 1–18.
doi: https://doi.org/10.4018/IJACDT.2017010101, accessed 20 May 2019.

J. Caughie (editor), 1981. Theories of authorship: A reader. Boston: Routledge & Kegan Paul in association with the British Film Institute.

T. Corrigan, 1991. A cinema without walls: Movies and culture after Vietnam. New Brunswick, N.J.: Rutgers University Press.

S. Debnath, N. Ganguly, and P. Mitra, 2008. “Feature weighting in content based recommendation system using social network analysis,” WWW ’08: Proceedings of the 17th International Conference on World Wide Web, pp. 1,041–1,042.
doi: https://doi.org/10.1145/1367497.1367646, accessed 20 May 2019.

M. Fatemi and L. Tokarchuk, 2012. “An empirical study on IMDb and its communities based on the network of co-reviewers,” MPM ’12: Proceedings of the First Workshop on Measurement, Privacy, and Mobility, article 7.
doi: https://doi.org/10.1145/2181196.2181203, accessed 20 May 2019.

J. Felsenstein, 1989. “PHYLIP — Phylogeny Inference Package (Version 3.2),” Cladistics, volume 5, number 2, pp. 164–166.

L. Gallos, F. Potiguar, J. Andrade, Jr., and H. Makse, 2013. “IMDB network revisited: Unveiling fractal and modular properties from a typical small-world network,” PLoS ONE, volume 8, number 6, e66443.
doi: https://doi.org/10.1371/journal.pone.0066443, accessed 20 May 2019.

H. Giroux, 2002. Breaking in to the movies: Film and the culture of politics. Malden, Mass.: Blackwell.

P. Hogan, 2008. Understanding Indian movies: Culture, cognition, and cinematic imagination. Austin: University of Texas Press.

J.-W. Hsieh, S.-H. Yu, Y.-S. Chen, and W.-F. Hu, 2006. “Automatic traffic surveillance system for vehicle tracking and classification,” IEEE Transactions on Intelligent Transportation Systems, volume 7, number 2, pp. 175–187.
doi: https://doi.org/10.1109/TITS.2006.874722, accessed 20 May 2019.

J. Jung, 2012. “Attribute selection-based recommendation framework for short-head user group: An empirical study by MovieLens and IMDB,” Expert Systems with Applications, volume 39, number 4, pp. 4,049–4,054.
doi: https://doi.org/10.1016/j.eswa.2011.09.096, accessed 20 May 2019.

P. Kael, 1996. Raising Kane and other essays. New York: M. Boyars.

D. Kipen, 2006. The Schreiber theory: A radical rewrite of American film history. Hoboken, N.J.: Melville House.

H. Kuehne, H. Jhuang, E. Garrote, T. Poggio, and T. Serre, 2011. “HMDB: A large video database for human motion recognition,” Proceedings of the 2011 IEEE International Conference on Computer Vision, pp. 2,556–2,563.
doi: https://doi.org/10.1109/ICCV.2011.6126543, accessed 20 May 2019.

E. Kuminski, J. George, J. Wallin, and L. Shamir, 2014. “Combining human and machine learning for morphological analysis of galaxy images,” Publications of the Astronomical Society of the Pacific, volume 126, number 944, pp. 959–967.
doi: https://doi.org/10.1086/678977, accessed 20 May 2019.

D. Lamprecht, F. Geigl, T. Karas, S. Walk, D. Helic, and M. Strohmaier, 2015. “Improving recommender system navigability through diversification: A case study of IMDb,” i-KNOW ’15: Proceedings of the 15th International Conference on Knowledge Technologies and Data-driven Business, article number 21.
doi: https://doi.org/10.1145/2809563.2809603, accessed 20 May 2019.

R. Maltby, 2003. Hollywood cinema. Second edition. Malden, Mass.: Blackwell.

P. Melville, R. Mooney, and R. Nagarajan, 2002. “Content-boosted collaborative filtering for improved recommendations,” Proceedings of the Eighteenth National Conference on Artificial Intelligence. pp. 187–192.

C. Metz, 1991. Film language: A semiotics of the cinema. Translated by M. Taylor. Chicago: University of Chicago Press.

J. Monaco, 2000. How to read a film: The world of movies, media, and multimedia: Language, history, theory. Third edition, completely revised and expanded. New York: Oxford University Press.

L. Mulvey, 1975. “Visual pleasure and narrative cinema,” Screen, volume 16, number 3, pp. 6–18.
doi: https://doi.org/10.1093/screen/16.3.6, accessed 20 May 2019.

S. Newman, 2008. “Making sense of Atlantic world histories: A British perspective,” Nuevo Mundo Mundos Nuevos, at https://journals.openedition.org/nuevomundo/42413, accessed 20 May 2019.
doi: https://doi.org/10.4000/nuevomundo.42413, accessed 20 May 2019.

A. Oghina, M. Breuss, M. Tsagkias, and M. de Rijke, 2012. “Predicting IMDB movie ratings using social media,” In: R. Baeza-Yates, A. de Vries, H. Zaragoza, B. Cambazoglu, V. Murdock, R. Lémpel, and F. Silvestri (editors). Advances in information retrieval. Lecture Notes in Computer Science, volume 7224. Berlin: Springer-Verlag, pp. 503–507.
doi: https://doi.org/10.1007/978-3-642-28997-2_51, accessed 20 May 2019.

N. Orlov, L. Shamir, T. Macura, J. Johnston, D. Eckley, and I. Goldberg, 2008. “WND-CHARM: Multi-purpose image classification using compound image transforms,” Pattern Recognition Letters, volume 29, number 11, pp. 1,684–1,693.
doi: https://doi.org/10.1016/j.patrec.2008.04.013, accessed 20 May 2019.

G. Sadoul, 1962. Histoire du cinéma. Paris: Librairie Flammarion.

C. Santas, 2001. Responding to film: A text guide for students of cinema art. Chicago: Burnham Publishers.

L. Shamir, 2017. “UDAT: A multi-purpose data analysis tool,” Astrophysics Source Code Library, record ascl:1704.002.

L. Shamir, 2015. “What makes a Pollock Pollock: A machine vision approach,” International Journal of Arts and Technology, volume 8, number 1, pp. 1–10.
doi: https://doi.org/10.1504/IJART.2015.067389, accessed 20 May 2019.

L. Shamir, 2012a. “Automatic detection of peculiar galaxies in large datasets of galaxy images,” Journal of Computational Science, volume 3, number 3, pp. 181–189.
doi: https://doi.org/10.1016/j.jocs.2012.03.004, accessed 20 May 2019.

L. Shamir, 2012b. “Computer analysis reveals similarities between the artistic styles of van Gogh and Pollock,” Leonardo, volume 45, number 2, pp. 149–154.
doi: https://doi.org/10.1162/LEON_a_00281, accessed 20 May 2019.

L. Shamir, 2011. “Assessing the efficacy of low-level image content descriptors for computer-based fluorescence microscopy image analysis,” Journal of Microscopy, volume 243, number 3, pp. 284–292.
doi: https://doi.org/10.1111/j.1365-2818.2011.03502.x, accessed 20 May 2019.

L. Shamir, 2008. “Evaluation of face datasets as tools for assessing the performance of face recognition methods,” International Journal of Computer Vision, volume 79, number 3, pp. 225–230.
doi: https://doi.org/10.1007/s11263-008-0143-7, accessed 20 May 2019.

L. Shamir and J. Tarakhovsky, 2012. “Computer analysis of art,” Journal on Computing and Cultural Heritage, volume 5, number 2, article number 7.
doi: https://doi.org/10.1145/2307723.2307726, accessed 20 May 2019.

L. Shamir, J. Nissel, and E. Winner, 2016. “Distinguishing between abstract art by artists vs. children and animals: Comparison between human and machine perception,” ACM Transactions on Applied Perception, volume 13, number 3, article number 17.
doi: https://doi.org/10.1145/2912125, accessed 20 May 2019.

L. Shamir, A. Holincheck, and J. Wallin, 2013. “Automatic quantitative morphological analysis of interacting galaxies,” Astronomy and Computing, volume 2, pp. 67–73.
doi: https://doi.org/10.1016/j.ascom.2013.09.002, accessed 20 May 2019.

L. Shamir, N. Orlov, and I. Goldberg, 2009. “Evaluation of the informativeness of multi-order image transforms,” IPCV 2009: Proceedings of the 2009 International Conference on Image Processing, Computer Vision, & Pattern Recognition, pp. 37–42.

L. Shamir, T. Macura, N. Orlov, D. Eckley, and I. Goldberg, 2010. “Impressionism, expressionism, surrealism: Automated recognition of painters and schools of art,” ACM Transactions on Applied Perception, volume 7, number 2, article number 8.
doi: https://doi.org/10.1145/1670671.1670672, accessed 20 May 2019.

L. Shamir, N. Orlov, D. Eckley, T. Macura, J. Johnston, and I. Goldberg, 2008. “Wndchrm — An open source utility for biological image analysis,” Source Code for Biology and Medicine, volume 3, number 13.
doi: https://doi.org/10.1186/1751-0473-3-13, accessed 20 May 2019.

K. Soomro, A. Zamir, and M. Shah, 2012. “UCF101: A dataset of 101 human actions classes from videos in the wild,” arXiv (3 December), at https://arxiv.org/abs/1212.0402, accessed 20 May 2019.

D. Staples, 1966. “The auteur theory reexamined,” Cinema Journal, volume 6, pp. 1–7.
doi: https://doi.org/10.2307/1225411, accessed 20 May 2019.

C.-Y. Weng, W.-T. Chu, and J.-L. Wu, 2009. “RoleNet: Movie analysis from the perspective of social networks,” IEEE Transactions on Multimedia, volume 11, number 2, pp. 256–271.
doi: https://doi.org/10.1109/TMM.2008.2009684, accessed 20 May 2019.


Editorial history

Received 23 January 2019; revised 15 March 2019; accepted 19 May 2019.

This paper is licensed under a Creative Commons Attribution 4.0 International License.

A data science approach to movies and film director analysis
by Chris May and Lior Shamir.
First Monday, Volume 24, Number 6 - 3 June 2019
doi: http://dx.doi.org/10.5210/fm.v24i6.9629


© First Monday, 1995-2019. ISSN 1396-0466.