# Introducing Quasi-Experimental Plus/Minus (QEPM)

In this article, I introduce a variant of Plus/Minus that might be useful to basketball analysts (even though it is not a metric).

## MEASURING THE VALUE OF A BASKETBALL PLAYER

The Holy Grail of sports analytics is player evaluation. From a managerial point of view, being able to assess the real value of players allows one to avoid those who are overvalued and to recruit those who are undervalued.

A player’s worth is usually assessed through subjective judgment (including conventional wisdom) and his individual statistics. However, one can reasonably doubt that these two components capture the real value of a player, a doubt that has become widespread since the Moneyball story. The usual individual statistics (first-serve percentage in tennis, batting average in baseball, goals scored in soccer, etc.) are thought to reflect some aspects of a player's value, but not his overall value. This is why some statisticians look for new stats that would reveal the true value of players.

Basketball is no exception. Everybody knows that the basic individual stats (points, rebounds, assists, steals, blocks, turnovers, shot attempts) do not capture the overall value of a player, especially his defensive contribution. Several stats have been proposed that are supposed to reflect the overall value of players.

The first one is the NBA’s efficiency measure, which combines the basic individual stats into a single number:

``(points + rebounds + assists + steals + blocks) − [(field goals attempted − field goals made) + (free throws attempted − free throws made) + turnovers]``
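As a quick illustration of the formula, here is a small R function. The argument names and the stat line are hypothetical, chosen only for the example.

```r
# NBA efficiency (EFF) from box-score totals; argument names are illustrative
nba_eff <- function(pts, reb, ast, stl, blk, fga, fgm, fta, ftm, tov) {
  (pts + reb + ast + stl + blk) - ((fga - fgm) + (fta - ftm) + tov)
}

# Hypothetical stat line: 25 pts, 8 reb, 6 ast, 2 stl, 1 blk,
# 9/20 field goals, 5/6 free throws, 3 turnovers
nba_eff(pts = 25, reb = 8, ast = 6, stl = 2, blk = 1,
        fga = 20, fgm = 9, fta = 6, ftm = 5, tov = 3)
# 42 positive contributions minus 15 (11 missed FG + 1 missed FT + 3 TO) = 27
```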

The second stat is the player efficiency rating (PER), proposed by ESPN analyst John Hollinger. It is also an all-in-one measure, but more elaborate than the NBA’s efficiency measure (see here for the formula).

The third stat is Plus/Minus which we are going to consider in detail.

## THE PLUS/MINUS APPROACH

The core idea of the Plus/Minus approach is that the overall value of a player is his effect on his team’s performance. How can one measure the effect of a player on his team’s performance? By comparing the team’s performance when the player is on the court with the team’s performance when he is off the court. This is basic experimental science: if you want to test the effect of X on Y, you vary X and observe how this variation affects Y. So, experimentally speaking, the target player is the independent variable and his team’s performance is the dependent variable.

There are two main Plus/Minus stats: Net Plus/Minus (NPM) and Adjusted Plus/Minus (APM).

Net Plus/Minus was defined by Roland Beech (who launched 82games during the 2002-03 season). Its calculation is straightforward, and there are two equivalent ways to compute it. Consider the following example: the table below shows the points scored and allowed by a given player’s team when he was on the court and when he was off the court:

| | Player on the court | Player off the court |
|---|---|---|
| Points scored (per 100 possessions) | 115 | 98 |
| Points allowed (per 100 possessions) | 110 | 105 |

First manner: NPM is the difference between the point differential (points scored – points allowed) when the player is on the court and the point differential when he is off the court.

NPM = (115 – 110) – (98 – 105) = 5 – (– 7) = 12

Second manner: NPM is the difference between a) points scored when the player is on the court minus points scored when he is off the court (Net Offensive Plus/Minus) and b) points allowed when the player is on the court minus points allowed when he is off the court (Net Defensive Plus/Minus).

NPM = (115 – 98) – (110 – 105) = 17 – (5) = 12

The more a player contributes to his team’s performance (offensively and defensively), the higher his NPM. In this example, the player’s team does better when he is on the court than when he is not.
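The two manners always agree, because (a − b) − (c − d) = (a − c) − (b − d). A quick check in R with the per-100-possession numbers from the table:

```r
# Per-100-possession figures from the table
on  <- c(scored = 115, allowed = 110)  # player on the court
off <- c(scored = 98,  allowed = 105)  # player off the court

# First manner: difference between the two point differentials
npm_1 <- (on[["scored"]] - on[["allowed"]]) - (off[["scored"]] - off[["allowed"]])

# Second manner: Net Offensive Plus/Minus minus Net Defensive Plus/Minus
net_off <- on[["scored"]]  - off[["scored"]]   # 17
net_def <- on[["allowed"]] - off[["allowed"]]  # 5
npm_2 <- net_off - net_def

c(npm_1, npm_2)  # both equal 12
```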

It is easy to see that Net Plus/Minus has a serious limitation: comparing a team’s performance when a player is on the court with its performance when he is off the court is fundamentally flawed if the teammates and opponents are not the same in both conditions. For instance, a player who is surrounded by good teammates when he is on the court will get a higher NPM than a player who is often surrounded by bad teammates (all other things being equal).

Experimentally speaking, if you want to test the effect of X on Y, the conditions that you compare must differ only regarding X. Indeed, if the conditions also differ on Z, any observed variation on Y could be attributed either to X or Z (which is then called a confounding variable). If one wants to draw a valid causal inference, conditions must differ on X all other things being equal.

Adjusted Plus/Minus is Plus/Minus controlling for teammates and opponents. This is done through a common statistical method called multiple regression. APM was proposed by Dan Rosenbaum in this 2004 article. Eli Witus (Basketball Operations Analyst for the Houston Rockets) has described in detail how to compute APM here. I have also described how to compute NPM and APM in this article on my blog.
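To make “controlling for teammates and opponents” concrete, here is a deliberately simplified sketch of an APM-style regression in R. It is not the exact specification used by Rosenbaum or Witus: the stints are synthetic and noise-free, there is no home-court term, and it leaves out the refinements of the published versions. Each row of the design matrix is one stint, with +1 for each home player on the court, −1 for each away player on the court and 0 otherwise; the response is the home team's point margin per 100 possessions. Because every row sums to zero, one player (here the hypothetical "A1") is dropped as a reference, and the other coefficients are ratings relative to him.

```r
# Minimal APM-style regression on synthetic, noise-free stint data.
# All players, ratings and stints are hypothetical.
set.seed(1)
players_A <- paste0("A", 1:7)  # team always at home in this toy example
players_B <- paste0("B", 1:7)  # team always away
all_players <- c(players_A, players_B)
true_rating <- setNames(c(3, 1, 0, -1, 2, -2, 0.5,
                          1, -1, 0, 2, -3, 0.5, 1.5), all_players)

# Design matrix: +1 home player on court, -1 away player on court, 0 otherwise
n_stints <- 300
X <- matrix(0, nrow = n_stints, ncol = length(all_players),
            dimnames = list(NULL, all_players))
for (s in seq_len(n_stints)) {
  X[s, sample(players_A, 5)] <- 1
  X[s, sample(players_B, 5)] <- -1
}

# Stint margin (per 100 possessions) generated exactly from the ratings
margin <- as.vector(X %*% true_rating)

# Rows sum to zero, so the columns are collinear: drop a reference player
ref <- "A1"
X_ref <- X[, colnames(X) != ref]
fit <- lm(margin ~ 0 + X_ref)
apm <- setNames(coef(fit), colnames(X_ref))
round(apm, 3)  # equals true_rating minus the rating of A1
```

With real data the margins are noisy and many players share the court most of the time, so the estimates are far less precise than in this toy example.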

## QUASI-EXPERIMENTAL PLUS/MINUS

As mentioned above, the main issue regarding Plus/Minus is to control for teammates and opponents. In APM, this issue is addressed statistically through multiple regression. With QEPM, I propose to address it experimentally (more precisely, quasi-experimentally). What does this mean?

The goal is to compare how a team performs when a given player is on the court (condition A) with how the team performs when he is off the court (condition B), with the same teammates and opponents in both conditions. In other words, we want to compare a given player with another player of the same team, all other things being equal. The question is therefore: in an NBA game, are there pairs of game segments that differ only in that one player of a team is replaced by a teammate, all other things (teammates and opponents) being equal? A typical NBA game includes about 30 game segments, and in some games such pairs can indeed be found.

Let’s take a look at the raw data file that allows us to identify these pairs of game segments. The file used in this example covers the 2007-08 season and lists all game segments of the 1,230 games played that season. The file (and those for other seasons) can be found on BasketballValue (see “Calculating Quasi-Experimental Plus/Minus” below).

Here is a sample of the file. Each row contains the following columns: GameID, StartTime, EndTime, ElapsedTime, ElapsedSecs, HomePlayer1ID to HomePlayer5ID, AwayPlayer1ID to AwayPlayer5ID, HomePlayer1Name to HomePlayer5Name, AwayPlayer1Name to AwayPlayer5Name, StartScoreHome, StartScoreAway, EndScoreHome, EndScoreAway, PointsScoredHome, PointsScoredAway, PlusMinusHome, PlusMinusAway, PossessionsHome, PossessionsAway, OffensiveRtgHome, OffensiveRtgAway, OverallRtgHomevsAway, OverallRtgAwayvsHome, OffensiveReboundsHome, OffensiveReboundsAway, DefensiveReboundsHome, DefensiveReboundsAway, OffensiveReboundingRateHome, OffensiveReboundingRateAway, DefensiveReboundingRateHome, DefensiveReboundingRateAway.

The rows below all have GameID 20071129DENLAL; the player ID columns and most of the rating and rebounding columns are left out:

| StartTime | EndTime | ElapsedSecs | Home players | Away players | Start score (home-away) | End score (home-away) | Points scored (home/away) | Possessions (home/away) | OverallRtgHomevsAway |
|---|---|---|---|---|---|---|---|---|---|
| 00:15:13 | 00:12:52 | 141 | Bryant, Farmar, Odom, Turiaf, Vujacic | Anthony, Camby, Iverson, Najera, Smith | 74-67 | 81-73 | 7/6 | 5/5 | 20 |
| 00:17:43 | 00:15:13 | 150 | Bryant, Bynum, Fisher, Odom, Walton | Anthony, Camby, Iverson, Najera, Smith | 74-62 | 74-67 | 0/5 | 4/4 | -125 |
| 00:24:00 | 00:17:43 | 377 | Bryant, Bynum, Fisher, Odom, Walton | Anthony, Camby, Carter, Iverson, Martin | 55-57 | 74-62 | 19/5 | 12/12 | 116.667 |

The sample shows game segments of a Lakers vs. Nuggets game that took place on November 29, 2007 (that game has 24 game segments in total). Each row represents one game segment. For each game segment, there are three main pieces of information: the players who were on the court, the points scored and allowed by each team, and the number of possessions of each team. One can thus calculate the per-possession Plus/Minus of each lineup for each game segment.
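For instance, with the points and possessions of the three sample rows (home = LAL, away = DEN), the per-100-possession Plus/Minus of each home lineup is the home offensive rating minus the away offensive rating. A minimal sketch in R, reusing the file's column names:

```r
# Points and possessions of the three sample rows
seg <- data.frame(
  PointsScoredHome = c(7, 0, 19),
  PointsScoredAway = c(6, 5, 5),
  PossessionsHome  = c(5, 4, 12),
  PossessionsAway  = c(5, 4, 12)
)

# Offensive ratings: points per 100 possessions
seg$OffRtgHome <- 100 * seg$PointsScoredHome / seg$PossessionsHome
seg$OffRtgAway <- 100 * seg$PointsScoredAway / seg$PossessionsAway

# Per-100-possession Plus/Minus of the home lineup in each segment
seg$NetRtgHome <- seg$OffRtgHome - seg$OffRtgAway
round(seg$NetRtgHome, 3)
```

The results (20, −125 and about 116.667) match the OverallRtgHomevsAway column of the sample.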

Let’s see if we can find couples of game segments that allow for a comparison all other things being equal. For instance, consider the two following game segments:

5th game segment (5:04 to 6:18)
home players: Farmar, Radmanovic, Turiaf, Vujacic, Walton
away players: Camby, Iverson, Jones, Kleiza, Smith

11th game segment (12:52 to 15:13)
home players: Bryant, Farmar, Odom, Turiaf, Vujacic
away players: Anthony, Camby, Iverson, Najera, Smith

Regarding the home players, there are two differences between the lineups: Radmanovic/Odom and Walton/Bryant. And regarding the away players, there are also two differences between the lineups: Jones/Anthony and Kleiza/Najera. So the comparison of these two game segments is not “all other things being equal”.

Now consider the two following game segments:

6th game segment (6:32 to 8:29)
home players: Farmar, Radmanovic, Turiaf, Vujacic, Walton
away players: Anthony, Iverson, Kleiza, Najera, Smith

10th game segment (12:00 to 12:52)
home players: Bryant, Farmar, Radmanovic, Turiaf, Vujacic
away players: Anthony, Iverson, Kleiza, Najera, Smith

There is only one difference between the home players lineups (Walton/Bryant) while the away players lineups are identical. Therefore, this comparison is valid because it involves only one difference, all other things being equal. There is another valid comparison, between the 9th game segment (10:31 to 12:00) and the 10th game segment (12:00 to 12:52): the only difference is also Walton/Bryant.
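The validity check itself can be written as a small R function. This is a hypothetical helper, not part of the author's code; it uses player names for readability, whereas real code should compare player IDs.

```r
# Returns the two swapped players (segment A's player first) if segments A and B
# form a valid comparison, or NULL otherwise.
valid_pair <- function(team_a, opp_a, team_b, opp_b) {
  same_opponents <- setequal(opp_a, opp_b)
  only_in_a <- setdiff(team_a, team_b)  # on the court in A only
  only_in_b <- setdiff(team_b, team_a)  # on the court in B only
  if (same_opponents && length(only_in_a) == 1 && length(only_in_b) == 1) {
    c(only_in_a, only_in_b)
  } else {
    NULL
  }
}

seg5  <- list(team = c("Farmar", "Radmanovic", "Turiaf", "Vujacic", "Walton"),
              opp  = c("Camby", "Iverson", "Jones", "Kleiza", "Smith"))
seg11 <- list(team = c("Bryant", "Farmar", "Odom", "Turiaf", "Vujacic"),
              opp  = c("Anthony", "Camby", "Iverson", "Najera", "Smith"))
seg6  <- list(team = c("Farmar", "Radmanovic", "Turiaf", "Vujacic", "Walton"),
              opp  = c("Anthony", "Iverson", "Kleiza", "Najera", "Smith"))
seg10 <- list(team = c("Bryant", "Farmar", "Radmanovic", "Turiaf", "Vujacic"),
              opp  = c("Anthony", "Iverson", "Kleiza", "Najera", "Smith"))

valid_pair(seg5$team, seg5$opp, seg11$team, seg11$opp)  # NULL: two swaps per team
valid_pair(seg6$team, seg6$opp, seg10$team, seg10$opp)  # "Walton" "Bryant"
```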

In summary, the principle of QEPM is to find, for a given player over a season, all pairs of game segments that allow for comparisons “all other things being equal”. Such an approach is called a quasi-experiment.

In an experiment, one tests whether X has a causal influence on Y by comparing conditions that differ only on X. Crucially, subjects are randomly assigned to the conditions. Random assignment ensures that, apart from chance variation, the conditions differ only on X. This is essentially what is done when the effect of a new drug is tested (a type of study called a randomized controlled trial): participants are randomly assigned to the experimental (drug) or control (placebo) condition, and the two groups are then compared on one or several criteria.

But there are cases in which random assignment to conditions is not feasible (for practical or ethical reasons). For instance, imagine that we want to test whether the sexual orientation of parents influences children’s personality. We would compare two conditions: children raised by a mother and a father on the one hand, and children raised by parents of the same sex on the other. Obviously, it would not be ethical to randomly assign newborns to these two conditions and assess their personality afterwards.

In such cases, a typical procedure is to select a group of children raised by a mother and a father and a group of children raised by parents of the same sex. Because there is no random assignment, the two groups might differ on features other than the parents' sexual orientation (age, socioeconomic status, etc.), and some of these confounding variables could account for any observed difference in children’s personality. Therefore, the two groups must be matched on several important features, so that the comparison comes as close as possible to “all other things being equal”.

Such a study is called a quasi-experiment: like an experiment, its goal is to test a causal relationship between two variables, but unlike an experiment, it lacks random assignment, so one cannot be entirely sure that a difference on the dependent variable is truly caused by the independent variable.

## CALCULATING QUASI-EXPERIMENTAL PLUS/MINUS

I have created an algorithm that implements QEPM. The algorithm is a function coded in R. Basically, it takes a player’s ID as input, scans the raw data file, and provides a data frame as output which shows all the comparisons “all other things being equal” for that player.
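The core of this search can be illustrated with a much smaller sketch. The function `qepm_game_sketch()` below is hypothetical and is not the author's `qepm()`: it scans the segments of a single game, keeps the pairs of segments whose opposing lineups are identical and whose team lineups differ by one substitution involving the target player, and reports the net rating (points scored per 100 own possessions minus points allowed per 100 opponent possessions) of each side. Lineups are vectors of player IDs, and the toy data are made up.

```r
# Hypothetical sketch (not the author's qepm()): all valid comparisons
# for one target player within one game.
qepm_game_sketch <- function(team, opp, pts_for, pts_against,
                             poss_for, poss_against, target) {
  # Net rating of segment s: points per 100 possessions, scored minus allowed
  net <- function(s) 100 * pts_for[s] / poss_for[s] - 100 * pts_against[s] / poss_against[s]
  out <- list()
  n <- length(team)
  for (a in seq_len(n - 1)) {
    for (b in (a + 1):n) {
      if (!setequal(opp[[a]], opp[[b]])) next               # same opponents only
      only_a <- setdiff(team[[a]], team[[b]])
      only_b <- setdiff(team[[b]], team[[a]])
      if (length(only_a) != 1 || length(only_b) != 1) next  # one substitution
      if (!(target %in% c(only_a, only_b))) next            # must involve target
      tp <- if (only_a == target) a else b                  # target on the court
      cp <- if (tp == a) b else a                           # comparison player on
      out[[length(out) + 1]] <- data.frame(
        TP_segment = tp, CP_segment = cp,
        CP = if (tp == a) only_b else only_a,
        TP_net = net(tp), CP_net = net(cp), QEPM = net(tp) - net(cp))
    }
  }
  do.call(rbind, out)  # NULL when no valid comparison exists
}

# Toy game with four segments (player IDs are made up)
team <- list(c(1, 2, 3, 4, 5), c(1, 2, 3, 4, 6), c(1, 2, 3, 4, 6), c(1, 2, 3, 7, 8))
opp  <- list(c(11, 12, 13, 14, 15), c(11, 12, 13, 14, 15),
             c(11, 12, 13, 14, 16), c(11, 12, 13, 14, 15))
qepm_game_sketch(team, opp,
                 pts_for = c(10, 4, 6, 8), pts_against = c(6, 6, 2, 9),
                 poss_for = c(10, 8, 5, 9), poss_against = c(10, 8, 5, 9),
                 target = 5)
# One comparison: segment 1 (player 5) vs segment 2 (player 6), QEPM = 40 - (-25) = 65
```

Pooling the `QEPM` column over all games of a season would give one way to summarize a player, but that aggregation step is left out here.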

The raw data file can be downloaded from BasketballValue. Go to “Downloads”, then “2007-2008” (for instance), and download the folder “List of each matchup of one unit against another”. In that folder, the file is “matchups20072008reg20081211.text”. Open the file and save it in CSV format. Here is the file: matchups20072008reg20081211

Here are the columns of the file (their content is described above):

GameID, StartTime, EndTime, ElapsedTime, ElapsedSecs, HomePlayer1ID, HomePlayer2ID, HomePlayer3ID, HomePlayer4ID, HomePlayer5ID, AwayPlayer1ID, AwayPlayer2ID, AwayPlayer3ID, AwayPlayer4ID, AwayPlayer5ID, HomePlayer1Name, HomePlayer2Name, HomePlayer3Name, HomePlayer4Name, HomePlayer5Name, AwayPlayer1Name, AwayPlayer2Name, AwayPlayer3Name, AwayPlayer4Name, AwayPlayer5Name, StartScoreHome, StartScoreAway, EndScoreHome, EndScoreAway, PointsScoredHome, PointsScoredAway, PlusMinusHome, PlusMinusAway, PossessionsHome, PossessionsAway, OffensiveRtgHome, OffensiveRtgAway, OverallRtgHomevsAway, OverallRtgAwayvsHome, OffensiveReboundsHome, OffensiveReboundsAway, DefensiveReboundsHome, DefensiveReboundsAway, OffensiveReboundingRateHome, OffensiveReboundingRateAway, DefensiveReboundingRateHome, DefensiveReboundingRateAway

First, load the file in R. When you execute the command below, a dialog box opens and you must select the file “matchups20072008reg20081211.csv”. The data are stored in a data frame called “d”:

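```
# read.csv2() expects ";" as field separator and "," as decimal mark;
# use read.csv() instead if your CSV file uses commas and dots
d <- read.csv2(file.choose())
```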

Second, let’s rework the file a little: we split the variable “GameID” into the date, the away team and the home team, and we number the games.

```
nrow_d <- nrow(d)

## Split GameID (e.g. "20071129DENLAL") into date, away team and home team
d$Date     <- substr(as.character(d$GameID), start = 1,  stop = 8)
d$AwayTeam <- substr(as.character(d$GameID), start = 9,  stop = 11)
d$HomeTeam <- substr(as.character(d$GameID), start = 12, stop = 14)
d$GameName <- paste(d$Date, d$AwayTeam, d$HomeTeam, sep = ", ")

d <- d[,-which(names(d) %in% c("GameID"))]

## Number the games: a new game starts whenever GameName changes
d$GameID <- cumsum(c(TRUE, d$GameName[-1] != d$GameName[-nrow_d]))

## Save the reformatted raw file
save(d, file = "d.RData")
```
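To see what this recoding produces, here is the same logic applied to a few hypothetical rows (the GameID values are made up, but follow the file's date + away team + home team format):

```r
# Toy input mimicking the raw GameID column (hypothetical rows)
toy <- data.frame(GameID = c("20071129DENLAL", "20071129DENLAL", "20071130BOSNYK"),
                  stringsAsFactors = FALSE)

toy$Date     <- substr(toy$GameID, 1, 8)
toy$AwayTeam <- substr(toy$GameID, 9, 11)
toy$HomeTeam <- substr(toy$GameID, 12, 14)
toy$GameName <- paste(toy$Date, toy$AwayTeam, toy$HomeTeam, sep = ", ")

# A new game starts whenever GameName changes from one row to the next
toy$GameNumber <- cumsum(c(TRUE, toy$GameName[-1] != toy$GameName[-nrow(toy)]))
toy
```

The first two rows get GameNumber 1 (same game) and the third row gets 2. Note that the numbering relies on the segments of a game being stored in consecutive rows, as in the original file.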

Now the file has the following columns:

StartTime, EndTime, ElapsedTime, ElapsedSecs, HomePlayer1ID, HomePlayer2ID, HomePlayer3ID, HomePlayer4ID, HomePlayer5ID, AwayPlayer1ID, AwayPlayer2ID, AwayPlayer3ID, AwayPlayer4ID, AwayPlayer5ID, HomePlayer1Name, HomePlayer2Name, HomePlayer3Name, HomePlayer4Name, HomePlayer5Name, AwayPlayer1Name, AwayPlayer2Name, AwayPlayer3Name, AwayPlayer4Name, AwayPlayer5Name, StartScoreHome, StartScoreAway, EndScoreHome, EndScoreAway, PointsScoredHome, PointsScoredAway, PlusMinusHome, PlusMinusAway, PossessionsHome, PossessionsAway, OffensiveRtgHome, OffensiveRtgAway, OverallRtgHomevsAway, OverallRtgAwayvsHome, OffensiveReboundsHome, OffensiveReboundsAway, DefensiveReboundsHome, DefensiveReboundsAway, OffensiveReboundingRateHome, OffensiveReboundingRateAway, DefensiveReboundingRateHome, DefensiveReboundingRateAway, Date, AwayTeam, HomeTeam, GameName, GameID

Here is the algorithm that computes QEPM. It is an R function called “qepm”. Note that you must install the two R packages “psych” and “prettyR”.

```
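rm(list = ls())    # clears the workspace, including d
load("d.RData")    # reload the preprocessed file saved in the previous step

library(psych)
library(prettyR)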

############################## FUNCTION START #############################
qepm <- function(x){

#################### Select the games in which the target player (TP) played

## TRUE for every segment in which the TP is one of the ten players on the court
player_cols <- c(paste0("HomePlayer", 1:5, "ID"), paste0("AwayPlayer", 1:5, "ID"))
TP_on_court <- rowSums(d[, player_cols] == x, na.rm = TRUE) > 0

## Flag (1/0) every segment of the games in which the TP played
d$selection <- as.numeric(d$GameID %in% unique(d$GameID[TP_on_court]))

d2 <- subset(d, d$selection == 1)

###################################################################################

#################### Store each game as a separate object
game_ID <- unique(d2$GameID)   # games played by the TP

data_raw <- list()
for (i in seq_along(game_ID)){
data_raw[[i]] <- d2[d2$GameID == game_ID[i],]
}

#################### MAIN LOOP

result <- list()

for (k in seq_along(game_ID)) {

result[[k]] <- list()

#################### Determine whether the TP's team is "Home" or "Away"
TP_home_away <- ifelse(x %in% c(data_raw[[k]]$HomePlayer1ID, data_raw[[k]]$HomePlayer2ID, data_raw[[k]]$HomePlayer3ID, data_raw[[k]]$HomePlayer4ID, data_raw[[k]]$HomePlayer5ID), "Home", "Away")

team_name <- if(TP_home_away == "Home") {as.character(unique(data_raw[[k]]$HomeTeam))} else {as.character(unique(data_raw[[k]]$AwayTeam))}
opp_name <- if(TP_home_away == "Home") {as.character(unique(data_raw[[k]]$AwayTeam))} else {as.character(unique(data_raw[[k]]$HomeTeam))}

date <- as.character(unique(data_raw[[k]]$Date))

gameID <- as.character(unique(data_raw[[k]]$GameID))

#################### Identify the different lineups

## Lineup keys: the five sorted player IDs joined with "-". They are kept as
## character strings: converting the concatenated IDs to a number can lose
## precision, and concatenating IDs without a separator can give two
## different lineups the same key.
nrow_data_raw <- nrow(data_raw[[k]])

home_cols <- c("HomePlayer1ID", "HomePlayer2ID", "HomePlayer3ID", "HomePlayer4ID", "HomePlayer5ID")
away_cols <- c("AwayPlayer1ID", "AwayPlayer2ID", "AwayPlayer3ID", "AwayPlayer4ID", "AwayPlayer5ID")
lineup_key <- function(ids) paste(sort(as.numeric(ids)), collapse = "-")

## Lineups of the TP's team (the team of both the TP and the CP)
data_raw[[k]]$T_l_h <- apply(data_raw[[k]][, if (TP_home_away == "Home") home_cols else away_cols], 1, lineup_key)

## Lineups of the opposing team
data_raw[[k]]$O_l_h <- apply(data_raw[[k]][, if (TP_home_away == "Home") away_cols else home_cols], 1, lineup_key)

################ Number the lineups
T_lineups <- cbind(data.frame(data_raw[[k]]$T_l_h), class=as.numeric(as.factor(do.call(paste, data.frame(data_raw[[k]]$T_l_h)))))

O_lineups <- cbind(data.frame(data_raw[[k]]$O_l_h), class=as.numeric(as.factor(do.call(paste, data.frame(data_raw[[k]]$O_l_h)))))

data_raw[[k]]$T_l <- T_lineups$class

data_raw[[k]]$O_l <- O_lineups$class

################ Identify the unique lineups
T_l_unique_raw <- data_raw[[k]][!duplicated(data_raw[[k]]$T_l),]

O_l_unique_raw <- data_raw[[k]][!duplicated(data_raw[[k]]$O_l),]

T_l_unique <- T_l_unique_raw[order(T_l_unique_raw$T_l),]

O_l_unique <- O_l_unique_raw[order(O_l_unique_raw$O_l),]

## c1: pair (TP-team lineup, opposing lineup) of each segment; c2: number of that pair
for (i in 1:nrow_data_raw)
data_raw[[k]]$c1[i] <- toString(c(data_raw[[k]]$T_l[i],data_raw[[k]]$O_l[i]))

c1_id <- cbind(data.frame(data_raw[[k]]$c1), class=as.numeric(as.factor(do.call(paste, data.frame(data_raw[[k]]$c1)))))

data_raw[[k]]$c2 <- c1_id$class

## occ: occurrence number of the segment among segments with the same pair (c2);
## c4: unique segment key "T_l, O_l, occ"
for (i in 1:nrow_data_raw) {
data_raw[[k]]$occ[i] <- ifelse(length(which(data_raw[[k]]$c2 %in% data_raw[[k]]$c2[i])) == 1, 1, which(which(data_raw[[k]]$c2 %in% data_raw[[k]]$c2[i]) %in% i))

data_raw[[k]]$c4[i] <- toString(c(data_raw[[k]]$T_l[i],data_raw[[k]]$O_l[i],data_raw[[k]]$occ[i]))
}

################ Generate all pairwise comparisons of segments
comp_all_raw <- data.frame(t(combn(data_raw[[k]]$c4, 2)))

comp_all_raw$X1 <- as.character(comp_all_raw$X1)
comp_all_raw$X2 <- as.character(comp_all_raw$X2)

nrow_comp_all_raw <- nrow(comp_all_raw)

## Split each key "T_l, O_l, occ" back into its three parts (split once, outside the loop)
parts_A <- strsplit(comp_all_raw$X1, ", ")
parts_B <- strsplit(comp_all_raw$X2, ", ")

for(i in 1:nrow_comp_all_raw) {
comp_all_raw$PA_T_l[i] <- parts_A[[i]][1]
comp_all_raw$PB_T_l[i] <- parts_B[[i]][1]
comp_all_raw$PA_O_l[i] <- parts_A[[i]][2]
comp_all_raw$PB_O_l[i] <- parts_B[[i]][2]
comp_all_raw$PA_T_l_O_l_occ[i] <- parts_A[[i]][3]
comp_all_raw$PB_T_l_O_l_occ[i] <- parts_B[[i]][3]
}

########### Remove the irrelevant comparisons (i.e. comparisons of identical lineups of the TP's team: PA_T_l == PB_T_l)
comp_all_raw$h <- ifelse(comp_all_raw$PA_T_l == comp_all_raw$PB_T_l, 1, 0)

comp_all <- subset(comp_all_raw, comp_all_raw$h != 1)

nrow_comp_all <- nrow(comp_all)

comp_all$PA_T_l <- as.numeric(comp_all$PA_T_l)
comp_all$PB_T_l <- as.numeric(comp_all$PB_T_l)
comp_all$PA_O_l <- as.numeric(comp_all$PA_O_l)
comp_all$PB_O_l <- as.numeric(comp_all$PB_O_l)

################ Identify the players of the unique lineups
T_l_unique_players <- if(TP_home_away == "Home") {T_l_unique[,c('HomePlayer1ID','HomePlayer2ID','HomePlayer3ID','HomePlayer4ID','HomePlayer5ID')]} else {T_l_unique[,c('AwayPlayer1ID','AwayPlayer2ID','AwayPlayer3ID','AwayPlayer4ID','AwayPlayer5ID')]}

O_l_unique_players <- if(TP_home_away == "Away") {O_l_unique[,c('HomePlayer1ID','HomePlayer2ID','HomePlayer3ID','HomePlayer4ID','HomePlayer5ID')]} else {O_l_unique[,c('AwayPlayer1ID','AwayPlayer2ID','AwayPlayer3ID','AwayPlayer4ID','AwayPlayer5ID')]}

## One column per unique lineup, one row per player slot
T_l_unique_players_transp <- data.frame(t(T_l_unique_players))

O_l_unique_players_transp <- data.frame(t(O_l_unique_players))

## seq_len() skips the loop when there is no comparison (1:0 would not)
for(i in seq_len(nrow_comp_all)) {
comp_all$PA_T_l_players[i] <- toString(T_l_unique_players_transp[,comp_all$PA_T_l[i]])

comp_all$PB_T_l_players[i] <- toString(T_l_unique_players_transp[,comp_all$PB_T_l[i]])

comp_all$PA_O_l_players[i] <- toString(O_l_unique_players_transp[,comp_all$PA_O_l[i]])

comp_all$PB_O_l_players[i] <- toString(O_l_unique_players_transp[,comp_all$PB_O_l[i]])
}

################ Identify the players who differ between the two lineups (of the TP's team)

## Elements that belong to exactly one of x and y
outersect <- function(x, y) {
sort(c(setdiff(x, y),
setdiff(y, x)))
}

for(i in seq_len(nrow_comp_all)) {
comp_all$T_players_differing[i] <- toString(outersect(T_l_unique_players_transp[,comp_all$PA_T_l[i]], T_l_unique_players_transp[,comp_all$PB_T_l[i]]))

comp_all$n_differing[i] <- length(strsplit(comp_all$T_players_differing[i], ", ")[[1]])
}

################ Select the "1" comparisons (only two players differ between the lineups, i.e. one substitution)
comp_2 <- subset(comp_all, comp_all$n_differing == 2)

nrow_comp_2 <- nrow(comp_2)

################ Identify the two players in each comparison
## seq_len() avoids an error when no comparison is left (1:0 would iterate)
for(i in seq_len(nrow_comp_2)) {
comp_2$player_A[i] <- (strsplit(comp_2$T_players_differing, ", ")[[i]])[1]

comp_2$player_B[i] <- (strsplit(comp_2$T_players_differing, ", ")[[i]])[2]
}

comp_2$player_A <- as.numeric(comp_2$player_A)
comp_2$player_B <- as.numeric(comp_2$player_B)

################ Select the comparisons that involve the TP
comp_2$h2 <- ifelse(comp_2$player_A == x | comp_2$player_B == x, 1 ,0)

comp_2_TP <- subset(comp_2, comp_2$h2 == 1)

nrow_comp_2_TP <- nrow(comp_2_TP)

if(nrow_comp_2_TP == 0) {result[[k]] <- 0} else {

################ Identify the comparison player (CP) in each comparison
comp_2_TP$TP_ID <- rep(c(x), nrow_comp_2_TP)

comp_2_TP$CP_ID <- ifelse(comp_2_TP$player_A == x, comp_2_TP$player_B, comp_2_TP$player_A)

################ Extract the players of the four lineups involved in each comparison
## Creates the numeric columns PA_T_l_p1 to PA_T_l_p5, PB_T_l_p1 to PB_T_l_p5,
## PA_O_l_p1 to PA_O_l_p5 and PB_O_l_p1 to PB_O_l_p5
for (lineup in c("PA_T_l", "PB_T_l", "PA_O_l", "PB_O_l")) {
lineup_players <- strsplit(comp_2_TP[[paste0(lineup, "_players")]], ", ")
for (p in 1:5)
comp_2_TP[[paste0(lineup, "_p", p)]] <- as.numeric(sapply(lineup_players, `[`, p))
}

################ Associate with TP_ID and CP_ID the players of their own lineup and of the opposing lineup
## TRUE when the player is in lineup PA_T_l (otherwise he is in PB_T_l)
PA_cols <- c("PA_T_l_p1", "PA_T_l_p2", "PA_T_l_p3", "PA_T_l_p4", "PA_T_l_p5")
TP_in_A <- rowSums(as.matrix(comp_2_TP[, PA_cols]) == comp_2_TP$TP_ID) > 0
CP_in_A <- rowSums(as.matrix(comp_2_TP[, PA_cols]) == comp_2_TP$CP_ID) > 0

comp_2_TP$TP_T_l_players <- ifelse(TP_in_A, comp_2_TP$PA_T_l_players, comp_2_TP$PB_T_l_players)
comp_2_TP$CP_T_l_players <- ifelse(CP_in_A, comp_2_TP$PA_T_l_players, comp_2_TP$PB_T_l_players)
comp_2_TP$TP_O_l_players <- ifelse(TP_in_A, comp_2_TP$PA_O_l_players, comp_2_TP$PB_O_l_players)
comp_2_TP$CP_O_l_players <- ifelse(CP_in_A, comp_2_TP$PA_O_l_players, comp_2_TP$PB_O_l_players)

comp_2_TP$TP_T_l <- ifelse(TP_in_A, comp_2_TP$PA_T_l, comp_2_TP$PB_T_l)
comp_2_TP$CP_T_l <- ifelse(CP_in_A, comp_2_TP$PA_T_l, comp_2_TP$PB_T_l)
comp_2_TP$TP_O_l <- ifelse(TP_in_A, comp_2_TP$PA_O_l, comp_2_TP$PB_O_l)
comp_2_TP$CP_O_l <- ifelse(CP_in_A, comp_2_TP$PA_O_l, comp_2_TP$PB_O_l)

comp_2_TP$TP_T_l_O_l_occ <- ifelse(TP_in_A, comp_2_TP$PA_T_l_O_l_occ, comp_2_TP$PB_T_l_O_l_occ)
comp_2_TP$CP_T_l_O_l_occ <- ifelse(CP_in_A, comp_2_TP$PA_T_l_O_l_occ, comp_2_TP$PB_T_l_O_l_occ)

################ Create the columns that allow the comparisons: one new column in TP_raw per row of
################ comp_2_TP, holding the TP's ID on the TP's segment, the CP's ID on the CP's segment, and 0 elsewhere
TP_raw <- data_raw[[k]]

nrow_TP_raw <- nrow(TP_raw)

ncol_TP_raw <- ncol(TP_raw)

for(i in 1:nrow_comp_2_TP)
for(j in 1:nrow_TP_raw)
TP_raw[j,i+ncol_TP_raw] <- ifelse((TP_raw$T_l[j] == comp_2_TP$TP_T_l[i] & TP_raw$O_l[j] == comp_2_TP$TP_O_l[i] & TP_raw$occ[j] == comp_2_TP$TP_T_l_O_l_occ[i]), comp_2_TP$TP_ID[i], ifelse((TP_raw$T_l[j] == comp_2_TP$CP_T_l[i] & TP_raw$O_l[j] == comp_2_TP$CP_O_l[i] & TP_raw$occ[j] == comp_2_TP$CP_T_l_O_l_occ[i]), comp_2_TP$CP_ID[i], 0) )

#### New object: H (same number of rows as comp_2_TP)
col1 <- ncol_TP_raw + 1   # first comparison column created above (was located via the name "V61")
col_last <- ncol(TP_raw)

H1 <- data.frame(t(TP_raw[,c(col1:col_last)]))

nrow_H1 <- nrow(H1)
ncol_H1 <- ncol(H1)

H2 <- H1

################
H2$TP_ID <- comp_2_TP$TP_ID

H2$CP_ID <- comp_2_TP$CP_ID

## Number of segments flagged for the TP and for the CP in each comparison
for(i in 1:nrow_H1)
H2$TP_n[i] <- length(which(as.numeric(as.vector(H1[i,])) == H2$TP_ID[i]))

for(i in 1:nrow_H1)
H2$CP_n[i] <- length(which(as.numeric(as.vector(H1[i,])) == H2$CP_ID[i]))

################ Extract the different measures (one value per game segment)

##
H_StartTime <- as.character(TP_raw$StartTime)

##
H_EndTime <- as.character(TP_raw$EndTime)

##
H_ElapsedTime <- TP_raw$ElapsedSecs

## Possessions of the TP's team (T) and of the opposing team (O)
H_Possessions_T <- if(TP_home_away == "Home") {as.vector(TP_raw$PossessionsHome)} else {as.vector(TP_raw$PossessionsAway)}

H_Possessions_O <- if(TP_home_away == "Home") {as.vector(TP_raw$PossessionsAway)} else {as.vector(TP_raw$PossessionsHome)}

## Points scored by the TP's team (T) and by the opposing team (O)
H_PointsScored_T <- if(TP_home_away == "Home") {as.vector(TP_raw$PointsScoredHome)} else {as.vector(TP_raw$PointsScoredAway)}

H_PointsScored_O <- if(TP_home_away == "Home") {as.vector(TP_raw$PointsScoredAway)} else {as.vector(TP_raw$PointsScoredHome)}

## Plus/Minus of the TP's team
H_PlusMinus <- if(TP_home_away == "Home") {as.vector(TP_raw$PlusMinusHome)} else {as.vector(TP_raw$PlusMinusAway)}

################COMPUTE THE DIFFERENT MEASURES FOR TP AND CP
#in each section, x2 stores, for each comparison (row), the value of the segment flagged for TP (or CP)

######START TIME

##TP
H_StartTime_TP <- H1
H_StartTime_TP

for(i in 1:nrow_H1)
for(j in 1:ncol_H1)
H_StartTime_TP[i,j] <- ifelse(H2[i,j] == H2$TP_ID[i], H_StartTime[j], 0)

H_StartTime_TP

for(i in 1:nrow_H1)
H_StartTime_TP$x2[i] <- H_StartTime[which(H1[i,] %in% H2$TP_ID[i])]

##CP
H_StartTime_CP <- H1
H_StartTime_CP

for(i in 1:nrow_H1)
for(j in 1:ncol_H1)
H_StartTime_CP[i,j] <- ifelse(H2[i,j] == H2$CP_ID[i], H_StartTime[j], 0)

H_StartTime_CP

for(i in 1:nrow_H1)
H_StartTime_CP$x2[i] <- H_StartTime[which(H1[i,] %in% H2$CP_ID[i])]

##
H2$TP_StartTime <- H_StartTime_TP$x2
H2$CP_StartTime <- H_StartTime_CP$x2

H2

######END TIME

##TP
H_EndTime_TP <- H1
H_EndTime_TP

for(i in 1:nrow_H1)
for(j in 1:ncol_H1)
H_EndTime_TP[i,j] <- ifelse(H2[i,j] == H2$TP_ID[i], H_EndTime[j], 0)

H_EndTime_TP

for(i in 1:nrow_H1)
H_EndTime_TP$x2[i] <- H_EndTime[which(H1[i,] %in% H2$TP_ID[i])]

##CP
H_EndTime_CP <- H1
H_EndTime_CP

for(i in 1:nrow_H1)
for(j in 1:ncol_H1)
H_EndTime_CP[i,j] <- ifelse(H2[i,j] == H2$CP_ID[i], H_EndTime[j], 0)

H_EndTime_CP

for(i in 1:nrow_H1)
H_EndTime_CP$x2[i] <- H_EndTime[which(H1[i,] %in% H2$CP_ID[i])]

##
H2$TP_EndTime <- H_EndTime_TP$x2
H2$CP_EndTime <- H_EndTime_CP$x2

H2

######ELAPSED TIME

##TP
H_ElapsedTime_TP <- H1
H_ElapsedTime_TP

for(i in 1:nrow_H1)
for(j in 1:ncol_H1)
H_ElapsedTime_TP[i,j] <- ifelse(H2[i,j] == H2$TP_ID[i], H_ElapsedTime[j], 0)

H_ElapsedTime_TP

#x1: row sums (equal to the segment's value when exactly one segment is flagged)
H_ElapsedTime_TP$x1 <- rowSums(H_ElapsedTime_TP)
H_ElapsedTime_TP

#ALTERNATIVE METHOD
for(i in 1:nrow_H1)
H_ElapsedTime_TP$x2[i] <- H_ElapsedTime[which(H1[i,] %in% H2$TP_ID[i])]

##CP
H_ElapsedTime_CP <- H1
H_ElapsedTime_CP

for(i in 1:nrow_H1)
for(j in 1:ncol_H1)
H_ElapsedTime_CP[i,j] <- ifelse(H2[i,j] == H2$CP_ID[i], H_ElapsedTime[j], 0)

H_ElapsedTime_CP

H_ElapsedTime_CP$x1 <- rowSums(H_ElapsedTime_CP)
H_ElapsedTime_CP

#ALTERNATIVE METHOD
for(i in 1:nrow_H1)
H_ElapsedTime_CP$x2[i] <- H_ElapsedTime[which(H1[i,] %in% H2$CP_ID[i])]

##
H2$TP_ElapsedTime <- H_ElapsedTime_TP$x2
H2$CP_ElapsedTime <- H_ElapsedTime_CP$x2

H2

######POSSESSIONS

##TP_T
H_Possessions_TP_T <- H1
H_Possessions_TP_T

for(i in 1:nrow_H1)
for(j in 1:ncol_H1)
H_Possessions_TP_T[i,j] <- ifelse(H2[i,j] == H2$TP_ID[i], H_Possessions_T[j], 0)

H_Possessions_TP_T

H_Possessions_TP_T$x1 <- rowSums(H_Possessions_TP_T)
H_Possessions_TP_T

#ALTERNATIVE METHOD
for(i in 1:nrow_H1)
H_Possessions_TP_T$x2[i] <- H_Possessions_T[which(H1[i,] %in% H2$TP_ID[i])]

##TP_O
H_Possessions_TP_O <- H1
H_Possessions_TP_O

for(i in 1:nrow_H1)
for(j in 1:ncol_H1)
H_Possessions_TP_O[i,j] <- ifelse(H2[i,j] == H2$TP_ID[i], H_Possessions_O[j], 0)

H_Possessions_TP_O

H_Possessions_TP_O$x1 <- rowSums(H_Possessions_TP_O)
H_Possessions_TP_O

#ALTERNATIVE METHOD
for(i in 1:nrow_H1)
H_Possessions_TP_O$x2[i] <- H_Possessions_O[which(H1[i,] %in% H2$TP_ID[i])]

##CP_T
H_Possessions_CP_T <- H1
H_Possessions_CP_T

for(i in 1:nrow_H1)
for(j in 1:ncol_H1)
H_Possessions_CP_T[i,j] <- ifelse(H2[i,j] == H2$CP_ID[i], H_Possessions_T[j], 0)

H_Possessions_CP_T

H_Possessions_CP_T$x1 <- rowSums(H_Possessions_CP_T)
H_Possessions_CP_T

#ALTERNATIVE METHOD
for(i in 1:nrow_H1)
H_Possessions_CP_T$x2[i] <- H_Possessions_T[which(H1[i,] %in% H2$CP_ID[i])]

##CP_O
H_Possessions_CP_O <- H1
H_Possessions_CP_O

for(i in 1:nrow_H1)
for(j in 1:ncol_H1)
H_Possessions_CP_O[i,j] <- ifelse(H2[i,j] == H2$CP_ID[i], H_Possessions_O[j], 0)

H_Possessions_CP_O

H_Possessions_CP_O$x1 <- rowSums(H_Possessions_CP_O)
H_Possessions_CP_O

#ALTERNATIVE METHOD
for(i in 1:nrow_H1)
H_Possessions_CP_O$x2[i] <- H_Possessions_O[which(H1[i,] %in% H2$CP_ID[i])]

##
H2$TP_Possessions_T <- H_Possessions_TP_T$x2
H2$TP_Possessions_O <- H_Possessions_TP_O$x2
H2$CP_Possessions_T <- H_Possessions_CP_T$x2
H2$CP_Possessions_O <- H_Possessions_CP_O$x2

H2

######POINTS SCORED

##TP_T
H_PointsScored_TP_T <- H1
H_PointsScored_TP_T

for(i in 1:nrow_H1)
for(j in 1:ncol_H1)
H_PointsScored_TP_T[i,j] <- ifelse(H2[i,j] == H2$TP_ID[i], H_PointsScored_T[j], 0)

H_PointsScored_TP_T

H_PointsScored_TP_T$x1 <- rowSums(H_PointsScored_TP_T)
H_PointsScored_TP_T

#ALTERNATIVE METHOD
for(i in 1:nrow_H1)
H_PointsScored_TP_T$x2[i] <- H_PointsScored_T[which(H1[i,] %in% H2$TP_ID[i])]

##TP_O
H_PointsScored_TP_O <- H1
H_PointsScored_TP_O

for(i in 1:nrow_H1)
for(j in 1:ncol_H1)
H_PointsScored_TP_O[i,j] <- ifelse(H2[i,j] == H2$TP_ID[i], H_PointsScored_O[j], 0)

H_PointsScored_TP_O

H_PointsScored_TP_O$x1 <- rowSums(H_PointsScored_TP_O)
H_PointsScored_TP_O

#ALTERNATIVE METHOD
for(i in 1:nrow_H1)
H_PointsScored_TP_O$x2[i] <- H_PointsScored_O[which(H1[i,] %in% H2$TP_ID[i])]

##CP_T
H_PointsScored_CP_T <- H1
H_PointsScored_CP_T

for(i in 1:nrow_H1)
for(j in 1:ncol_H1)
H_PointsScored_CP_T[i,j] <- ifelse(H2[i,j] == H2$CP_ID[i], H_PointsScored_T[j], 0)

H_PointsScored_CP_T

H_PointsScored_CP_T$x1 <- rowSums(H_PointsScored_CP_T)
H_PointsScored_CP_T

#ALTERNATIVE METHOD
for(i in 1:nrow_H1)
H_PointsScored_CP_T$x2[i] <- H_PointsScored_T[which(H1[i,] %in% H2$CP_ID[i])]

##CP_O
H_PointsScored_CP_O <- H1
H_PointsScored_CP_O

for(i in 1:nrow_H1)
for(j in 1:ncol_H1)
H_PointsScored_CP_O[i,j] <- ifelse(H2[i,j] == H2$CP_ID[i], H_PointsScored_O[j], 0)

H_PointsScored_CP_O

H_PointsScored_CP_O$x1 <- rowSums(H_PointsScored_CP_O)
H_PointsScored_CP_O

#ALTERNATIVE METHOD
for(i in 1:nrow_H1)
H_PointsScored_CP_O$x2[i] <- H_PointsScored_O[which(H1[i,] %in% H2$CP_ID[i])]

##
H2$TP_PointsScored_T <- H_PointsScored_TP_T$x2
H2$TP_PointsScored_O <- H_PointsScored_TP_O$x2
H2$CP_PointsScored_T <- H_PointsScored_CP_T$x2
H2$CP_PointsScored_O <- H_PointsScored_CP_O$x2

H2

######PLUS/MINUS RAW

##TP
H_PlusMinus_TP <- H1
H_PlusMinus_TP

for(i in 1:nrow_H1)
for(j in 1:ncol_H1)
H_PlusMinus_TP[i,j] <- ifelse(H2[i,j] == H2$TP_ID[i], H_PlusMinus[j], 0)

H_PlusMinus_TP

H_PlusMinus_TP$x1 <- rowSums(H_PlusMinus_TP)
H_PlusMinus_TP

#ALTERNATIVE METHOD
for(i in 1:nrow_H1)
H_PlusMinus_TP$x2[i] <- H_PlusMinus[which(H1[i,] %in% H2$TP_ID[i])]

##CP
H_PlusMinus_CP <- H1
H_PlusMinus_CP

for(i in 1:nrow_H1)
for(j in 1:ncol_H1)
H_PlusMinus_CP[i,j] <- ifelse(H2[i,j] == H2$CP_ID[i], H_PlusMinus[j], 0)

H_PlusMinus_CP

H_PlusMinus_CP$x1 <- rowSums(H_PlusMinus_CP)
H_PlusMinus_CP

#ALTERNATIVE METHOD
for(i in 1:nrow_H1)
H_PlusMinus_CP$x2[i] <- H_PlusMinus[which(H1[i,] %in% H2$CP_ID[i])]

##
H2$TP_PM_raw <- H_PlusMinus_TP$x2
H2$CP_PM_raw <- H_PlusMinus_CP$x2

######PLUS/MINUS PER-POSSESSION
#points scored per possession minus points allowed per possession
H2$TP_PM_perposs <- (H2$TP_PointsScored_T/H2$TP_Possessions_T) - (H2$TP_PointsScored_O/H2$TP_Possessions_O)
H2$CP_PM_perposs <- (H2$CP_PointsScored_T/H2$CP_Possessions_T) - (H2$CP_PointsScored_O/H2$CP_Possessions_O)

################FORMAT THE RESULTS
TP_results <- data.frame(as.vector(rep(c(gameID), nrow_H1)), as.vector(rep(c(date), nrow_H1)), as.vector(rep(c(team_name), nrow_H1)), as.vector(rep(c(opp_name), nrow_H1)), as.vector(rep(c(TP_home_away), nrow_H1)), H2$TP_ID, H2$CP_ID, comp_2_TP$TP_T_l, comp_2_TP$CP_T_l, comp_2_TP$TP_O_l, comp_2_TP$CP_O_l, comp_2_TP$TP_T_l_players, comp_2_TP$CP_T_l_players, comp_2_TP$TP_O_l_players, comp_2_TP$CP_O_l_players, H2$TP_n, H2$CP_n, H2$TP_StartTime, H2$TP_EndTime, H2$TP_ElapsedTime, H2$CP_StartTime, H2$CP_EndTime, H2$CP_ElapsedTime, H2$TP_Possessions_T, H2$TP_Possessions_O, H2$CP_Possessions_T, H2$CP_Possessions_O, H2$TP_PointsScored_T, H2$TP_PointsScored_O, H2$CP_PointsScored_T, H2$CP_PointsScored_O, H2$TP_PM_raw, H2$CP_PM_raw, H2$TP_PM_perposs, H2$CP_PM_perposs)
names(TP_results) <- c("gameID", "date", "team_name", "opp_name", "TP_Home/Away", "TP", "CP", "TP_team", "CP_team", "TP_opp", "CP_opp", "TP_team_players", "CP_team_players", "TP_opp_players", "CP_opp_players", "TP_n", "CP_n", "TP_StartTime", "TP_EndTime", "TP_ElapsedTime", "CP_StartTime", "CP_EndTime", "CP_ElapsedTime", "TP_Possessions_T", "TP_Possessions_O", "CP_Possessions_T", "CP_Possessions_O", "TP_PointsScored_T", "TP_PointsScored_O", "CP_PointsScored_T", "CP_PointsScored_O", "TP_PM_raw", "CP_PM_raw", "TP_PM_perposs", "CP_PM_perposs")
TP_results

TP_results_raw <- TP_results[order(TP_results$CP),]
TP_results_raw

##ALL THINGS BEING EQUAL ("ATBE"): BOTH SEGMENTS FACE THE SAME OPPOSING LINEUP
TP_results_ATBE <- subset(TP_results_raw, TP_results_raw$TP_opp == TP_results_raw$CP_opp)
TP_results_ATBE

##
result[[k]] <- list(gameID = gameID, TP_home_away = TP_home_away, team_name = team_name, opp_name = opp_name, date = date, T_lineups = T_lineups, O_lineups = O_lineups, T_l_unique = T_l_unique, O_l_unique = O_l_unique, comp_all_raw = comp_all_raw, comp_all = comp_all, comp_2 = comp_2, comp_2_TP = comp_2_TP, TP_raw = TP_raw, H2 = H2, TP_results_raw = TP_results_raw, TP_results_ATBE = TP_results_ATBE)

}

}

##SUMMARY: GAMES WITHOUT ANY "1" COMPARISON
w1 <- NULL
for (k in 1:length(game_ID))
if(is.atomic(result[[k]]) == TRUE) {w1[k] <- 1} else {w1[k] <- 0}

W <- data.frame(game_ID, w1)
W

sum(w1) ##number of games excluded under this criterion

no_comparison_at_all <- which(W$w1 == 1)
no_comparison_at_all

##SUMMARY: GAMES WITH A "1" COMPARISON BUT NO ATBE COMPARISON
w2 <- NULL
for (k in 1:length(game_ID))
if(k %in% no_comparison_at_all == FALSE) {w2[k] <- nrow(result[[k]]$TP_results_ATBE)} else {w2[k] <- NA}

W <- data.frame(game_ID, w1, w2)
W

w3 <- NULL
w4 <- NULL
for (k in 1:length(game_ID)) {
if(W$w2[k] == 0 & is.na(W$w2[k]) == FALSE) {w3[k] <- 1} else {w3[k] <- 0}
if(W$w2[k] != 0 & is.na(W$w2[k]) == FALSE) {w4[k] <- W$w2[k]} else {w4[k] <- 0}
}

W <- data.frame(game_ID, w1, w2, w3, w4)
W

sum(w3) ##number of games excluded under this criterion

no_comparison_ATBE <- which(W$w3 == 1)
no_comparison_ATBE

sum(w4) ##final number of ATBE comparisons

##
results_ATBE_raw <- list()

for (k in 1:length(game_ID))
if(k %in% no_comparison_at_all == FALSE & k %in% no_comparison_ATBE == FALSE) {results_ATBE_raw[[k]] <- result[[k]]$TP_results_ATBE} else {results_ATBE_raw[[k]] <- NA}

results_ATBE_raw

results_ATBE_raw_OK <- do.call(rbind, results_ATBE_raw[!is.na(results_ATBE_raw)])
results_ATBE_raw_OK

nrow(results_ATBE_raw_OK)

##REMOVE COMPARISONS INVOLVING ZERO POSSESSIONS (PER-POSSESSION PM = NaN OR Inf)
results_ATBE_OK <- subset(results_ATBE_raw_OK, results_ATBE_raw_OK$TP_Possessions_T != 0 & results_ATBE_raw_OK$TP_Possessions_O != 0 & results_ATBE_raw_OK$CP_Possessions_T != 0 & results_ATBE_raw_OK$CP_Possessions_O != 0)

####################
return(results_ATBE_OK)

}

############################## FUNCTION END #############################

```
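The last two filtering steps of the function (keeping comparisons where both segments face the same opposing lineup, then dropping those with zero possessions) can be tried on a toy data frame. This is a minimal sketch: the data frame below is hypothetical, with invented values and only the columns these two steps use; in the function, the same logic runs on the full results table.

```r
# Hypothetical toy results (invented values, only the columns the filters need)
toy <- data.frame(
  CP = c(488, 16, 486, 7),
  TP_opp = c(2, 8, 4, 13),   # opposing lineup number in the TP segment
  CP_opp = c(2, 5, 4, 13),   # opposing lineup number in the CP segment
  TP_Possessions_T = c(3, 2, 0, 2), TP_Possessions_O = c(2, 2, 1, 2),
  CP_Possessions_T = c(1, 7, 1, 3), CP_Possessions_O = c(1, 6, 1, 4)
)

# Step 1: "all things being equal": both segments face the same opposing lineup
atbe <- subset(toy, TP_opp == CP_opp)

# Step 2: drop comparisons where any segment has zero possessions,
# since per-possession Plus/Minus would then be NaN or Inf
poss_cols <- c("TP_Possessions_T", "TP_Possessions_O",
               "CP_Possessions_T", "CP_Possessions_O")
atbe_ok <- atbe[rowSums(atbe[, poss_cols] == 0) == 0, ]

atbe_ok$CP   # 488 7
```

Counting zero-possession columns with `rowSums()` is equivalent to chaining four `!= 0` conditions in `subset()`; it simply scales more easily if columns are added.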

Now, to get the results of a given player, you just have to execute the “qepm” function. For instance, Steve Nash’s ID is #8. To get his results, execute:

```
x <- qepm(8)
x
```

(The computation may take a while: about 5 minutes on my computer.)

Here is a sample of the output of the algorithm:

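| gameID | date | team_name | opp_name | TP_Home/Away | TP | CP | TP_team | CP_team | TP_opp | CP_opp | TP_team_players | CP_team_players | TP_opp_players | CP_opp_players | TP_n | CP_n | TP_StartTime | TP_EndTime | TP_ElapsedTime | CP_StartTime | CP_EndTime | CP_ElapsedTime | TP_Possessions_T | TP_Possessions_O | CP_Possessions_T | CP_Possessions_O | TP_PointsScored_T | TP_PointsScored_O | CP_PointsScored_T | CP_PointsScored_O | TP_PM_raw | CP_PM_raw | TP_PM_perposs | CP_PM_perposs |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 14 | 20071101 | PHX | SEA | Away | 8 | 488 | 6 | 3 | 2 | 2 | 16, 7, 4, 8, 537 | 488, 16, 7, 4, 537 | 172, 714, 715, 168, 173 | 172, 714, 715, 168, 173 | 1 | 1 | 00:32:55 | 00:31:50 | 65 | 00:33:18 | 00:32:55 | 23 | 3 | 2 | 1 | 1 | 0 | 2 | 0 | 3 | -2 | -3 | -1.00000000 | -3.00000000 |
| 38 | 20071104 | PHX | CLE | Home | 8 | 16 | 5 | 1 | 8 | 8 | 7, 12, 486, 4, 8 | 16, 7, 12, 486, 4 | 561, 257, 260, 264, 265 | 561, 257, 260, 264, 265 | 1 | 1 | 00:00:42 | 00:00:00 | 42 | 00:14:46 | 00:12:00 | 166 | 2 | 2 | 7 | 6 | 2 | 3 | 7 | 5 | -1 | 2 | -0.50000000 | 0.16666667 |
| 38 | 20071104 | PHX | CLE | Home | 8 | 486 | 4 | 1 | 8 | 8 | 16, 7, 12, 4, 8 | 16, 7, 12, 486, 4 | 561, 257, 260, 264, 265 | 561, 257, 260, 264, 265 | 1 | 1 | 00:15:16 | 00:14:46 | 30 | 00:14:46 | 00:12:00 | 166 | 2 | 2 | 7 | 6 | 3 | 3 | 7 | 5 | 0 | 2 | 0.00000000 | 0.16666667 |
| 50 | 20071106 | PHX | CHA | Away | 8 | 488 | 8 | 3 | 12 | 12 | 16, 486, 4, 8, 78 | 488, 16, 486, 4, 78 | 307, 126, 117, 90, 111 | 307, 126, 117, 90, 111 | 1 | 1 | 00:31:56 | 00:30:40 | 76 | 00:32:44 | 00:31:56 | 48 | 3 | 3 | 2 | 3 | 2 | 0 | 0 | 4 | 2 | -4 | 0.66666667 | -1.33333333 |
| 75 | 20071109 | PHX | MIA | Away | 8 | 486 | 3 | 2 | 4 | 4 | 16, 7, 12, 4, 8 | 16, 7, 12, 486, 4 | 243, 201, 206, 185, 196 | 243, 201, 206, 185, 196 | 1 | 1 | 00:41:00 | 00:39:06 | 114 | 00:15:18 | 00:14:58 | 20 | 4 | 4 | 1 | 1 | 3 | 4 | 0 | 0 | -1 | 0 | -0.25000000 | 0.00000000 |
| 84 | 20071110 | PHX | ORL | Away | 8 | 488 | 7 | 4 | 8 | 8 | 16, 12, 4, 8, 78 | 488, 16, 12, 4, 78 | 245, 277, 419, 149, 170 | 245, 277, 419, 149, 170 | 1 | 1 | 00:15:38 | 00:14:42 | 56 | 00:14:42 | 00:12:00 | 162 | 2 | 2 | 5 | 5 | 4 | 0 | 2 | 4 | 4 | -2 | 2.00000000 | -0.40000000 |
| 84 | 20071110 | PHX | ORL | Away | 8 | 488 | 7 | 4 | 7 | 7 | 16, 12, 4, 8, 78 | 488, 16, 12, 4, 78 | 277, 419, 161, 149, 170 | 277, 419, 161, 149, 170 | 1 | 1 | 00:40:11 | 00:37:16 | 175 | 00:37:16 | 00:36:01 | 75 | 6 | 5 | 2 | 3 | 8 | 2 | 0 | 4 | 6 | -4 | 0.93333333 | -1.33333333 |
| 101 | 20071113 | PHX | NYK | Home | 8 | 16 | 8 | 4 | 15 | 15 | 12, 4, 8, 131, 78 | 16, 12, 4, 131, 78 | 597, 679, 249, 154, 255 | 597, 679, 249, 154, 255 | 1 | 1 | 00:15:45 | 00:14:54 | 51 | 00:14:54 | 00:12:00 | 174 | 2 | 2 | 5 | 5 | 0 | 1 | 4 | 6 | -1 | -2 | -0.50000000 | -0.40000000 |
| 101 | 20071113 | PHX | NYK | Home | 8 | 16 | 8 | 4 | 2 | 2 | 12, 4, 8, 131, 78 | 16, 12, 4, 131, 78 | 597, 249, 235, 154, 254 | 597, 249, 235, 154, 254 | 1 | 1 | 00:39:20 | 00:38:21 | 59 | 00:38:21 | 00:36:00 | 141 | 1 | 1 | 6 | 6 | 2 | 3 | 6 | 8 | -1 | -2 | -1.00000000 | -0.33333333 |
| 101 | 20071113 | PHX | NYK | Home | 8 | 131 | 6 | 4 | 15 | 15 | 16, 12, 4, 8, 78 | 16, 12, 4, 131, 78 | 597, 679, 249, 154, 255 | 597, 679, 249, 154, 255 | 1 | 1 | 00:16:22 | 00:15:45 | 37 | 00:14:54 | 00:12:00 | 174 | 1 | 1 | 5 | 5 | 3 | 0 | 4 | 6 | 3 | -2 | 3.00000000 | -0.40000000 |
| 117 | 20071115 | PHX | CHI | Home | 8 | 7 | 6 | 1 | 3 | 3 | 16, 12, 486, 4, 8 | 16, 7, 12, 486, 4 | 122, 113, 110, 112, 559 | 122, 113, 110, 112, 559 | 1 | 1 | 00:08:42 | 00:08:00 | 42 | 00:12:43 | 00:12:00 | 43 | 1 | 2 | 2 | 1 | 2 | 4 | 0 | 2 | -2 | -2 | 0.00000000 | -2.00000000 |
| 117 | 20071115 | PHX | CHI | Home | 8 | 7 | 7 | 2 | 13 | 13 | 16, 12, 4, 8, 78 | 16, 7, 12, 4, 78 | 122, 113, 119, 383, 559 | 122, 113, 119, 383, 559 | 1 | 1 | 00:39:49 | 00:38:55 | 54 | 00:38:55 | 00:37:13 | 102 | 2 | 2 | 3 | 4 | 0 | 4 | 0 | 2 | -4 | -2 | -2.00000000 | -0.50000000 |
| 117 | 20071115 | PHX | CHI | Home | 8 | 486 | 4 | 1 | 9 | 9 | 16, 7, 12, 4, 8 | 16, 7, 12, 486, 4 | 122, 113, 119, 112, 559 | 122, 113, 119, 112, 559 | 1 | 1 | 00:15:10 | 00:14:25 | 45 | 00:14:25 | 00:12:43 | 102 | 1 | 2 | 3 | 3 | 0 | 4 | 0 | 3 | -4 | -3 | -2.00000000 | -1.00000000 |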

For instance, the first row reads as follows: in Phoenix’s away game at Seattle on November 1, 2007, there is one comparison “all other things being equal” involving player #8, Steve Nash (“TP” column), the other player involved in the comparison being player #488, Marcus Banks (“CP” column).

The two game segments corresponding to that comparison are described in detail. The lineup including Steve Nash is 16, 7, 4, 8, 537 (“TP_team_players” column), the opposing lineup being 172, 714, 715, 168, 173 (“TP_opp_players” column). The lineup including Marcus Banks is 488, 16, 7, 4, 537 (“CP_team_players” column), the opposing lineup being 172, 714, 715, 168, 173 (“CP_opp_players” column).
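The quasi-experimental design behind this row can be checked directly: the two team lineups should differ by exactly one player (the target player versus the comparison player), while the opposing lineups are identical. The sketch below uses the lineups of the first row. `is_atbe_pair()` is a hypothetical helper written only for this illustration; it is not part of the `qepm` function, which works with lineup numbers (`TP_opp == CP_opp`) built earlier in the script.

```r
# Lineups from the first row of the sample output (Phoenix at Seattle, 2007-11-01)
tp_team <- c(16, 7, 4, 8, 537)        # lineup including the target player (8)
cp_team <- c(488, 16, 7, 4, 537)      # lineup including the comparison player (488)
tp_opp  <- c(172, 714, 715, 168, 173)
cp_opp  <- c(172, 714, 715, 168, 173)

# Hypothetical helper: TRUE when both segments face the same five opponents
# and the team lineups differ by exactly one player (order of IDs ignored)
is_atbe_pair <- function(tp_team, cp_team, tp_opp, cp_opp) {
  same_opp <- setequal(tp_opp, cp_opp)
  one_swap <- length(setdiff(tp_team, cp_team)) == 1 &&
              length(setdiff(cp_team, tp_team)) == 1
  same_opp && one_swap
}

setdiff(tp_team, cp_team)   # only in the TP lineup: 8
setdiff(cp_team, tp_team)   # only in the CP lineup: 488
is_atbe_pair(tp_team, cp_team, tp_opp, cp_opp)   # TRUE
```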

The per-possession Plus/Minus of the lineup including Steve Nash is -1.0 (“TP_PM_perposs” column) while that of the lineup including Marcus Banks is -3.0 (“CP_PM_perposs” column).
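These two figures follow from the formula the function uses for `TP_PM_perposs` and `CP_PM_perposs`: points scored per possession minus points allowed per possession. In the first row, the lineup with Nash scored 0 points on 3 possessions and allowed 2 points on 2 opposing possessions, while the lineup with Banks scored 0 points on 1 possession and allowed 3 points on 1. The helper name `pm_perposs()` is hypothetical and used only in this sketch.

```r
# Hypothetical helper mirroring the per-possession formula in the function
pm_perposs <- function(pts_T, poss_T, pts_O, poss_O) {
  pts_T / poss_T - pts_O / poss_O
}

# Values from the first row of the sample output
tp <- pm_perposs(pts_T = 0, poss_T = 3, pts_O = 2, poss_O = 2)  # lineup with Nash
cp <- pm_perposs(pts_T = 0, poss_T = 1, pts_O = 3, poss_O = 1)  # lineup with Banks
tp   # -1
cp   # -3
```

The raw Plus/Minus of the same segments is -2 and -3; dividing by possessions puts segments of different lengths on the same scale.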

As one might expect, most games contain no comparisons “all other things being equal”, while some contain several.

The command below lists the players to whom the target player can be compared:

```
describe(as.factor(x$CP))
```

Here is the output in our example with Steve Nash:

```
Value   Count   Percent
16      27      31.03
486     16      18.39
7       11      12.64
12      8       9.2
133     8       9.2
488     6       6.9
4       4       4.6
78      3       3.45
131     1       1.15
203     1       1.15
mode=16  Valid n=87   12 categories - only first 10 shown
```

Steve Nash can be compared “all other things being equal” to:
• player #16 (Leandro Barbosa) in 27 observations
• player #486 (Grant Hill) in 16 observations
• player #7 (Raja Bell) in 11 observations
etc.
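`describe()` is not part of base R. The same counts can be obtained with base R’s `table()`. The sketch below assumes a toy stand-in for `x$CP`, rebuilt from the ten categories listed above (85 of the 87 comparisons; the two unlisted partners are left out); with real results, you would pass `x$CP` directly.

```r
# Toy stand-in for x$CP: the ten listed partners repeated by their counts
cp_ids <- rep(c(16, 486, 7, 12, 133, 488, 4, 78, 131, 203),
              times = c(27, 16, 11, 8, 8, 6, 4, 3, 1, 1))

# Number of ATBE comparisons per comparison player, most frequent first
counts <- sort(table(cp_ids), decreasing = TRUE)
counts

# Share of comparisons per partner, in percent (of the 85 toy rows here)
round(100 * prop.table(counts), 2)
```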

In some cases, there are enough observations to test whether the difference in per-possession Plus/Minus between the two conditions is statistically significant. The relevant test is a paired t-test, since each observation pairs two game segments that share the same four teammates and the same opponents. Here is a good example. Let’s take Manu Ginobili (player #24) as the target player. It turns out that Ginobili can be compared to Michael Finley (player #29) “all other things being equal” in 83 observations, which is a substantial number (both players are shooting guards).
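A paired t-test works on the within-pair differences: it is a one-sample t-test asking whether the mean of (TP value minus CP value) differs from zero. The sketch below checks this on synthetic data invented for illustration (not the Ginobili/Finley comparisons), comparing `t.test(..., paired = TRUE)` with the same statistic computed by hand.

```r
set.seed(1)
n <- 83
# Synthetic per-possession Plus/Minus for n paired segments (invented values)
tp_pm <- rnorm(n, mean = 0.1, sd = 1)
cp_pm <- tp_pm - rnorm(n, mean = 0.3, sd = 1.4)

# Built-in paired t-test
paired <- t.test(tp_pm, cp_pm, paired = TRUE)

# Same test by hand: one-sample t-test on the within-pair differences
d <- tp_pm - cp_pm
t_manual  <- mean(d) / (sd(d) / sqrt(n))
df_manual <- n - 1
p_manual  <- 2 * pt(-abs(t_manual), df = df_manual)

c(manual = t_manual, t.test = unname(paired$statistic))
c(manual = p_manual, t.test = paired$p.value)
```

Pairing removes the variation shared by both segments of a comparison (same game, same opponents), which an unpaired two-sample test would treat as noise.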

Let’s calculate the mean per-possession Plus/Minus of both players:

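```
x <- qepm(24)                # Manu Ginobili (player #24) as the target player
comp_P29 <- x[x$CP == 29,]   # comparisons with Michael Finley (player #29)
comp_P29
nrow(comp_P29)
mean(comp_P29$TP_PM_perposs)
mean(comp_P29$CP_PM_perposs)
```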

Ginobili: 0.078, Finley: -0.222. Is that difference statistically significant? The command below performs the paired t-test to compare the two values:

```
t.test(comp_P29$TP_PM_perposs, comp_P29$CP_PM_perposs, paired = TRUE)
```

Here is the result:

```
t = 1.9192, df = 82, p-value = 0.05844
alternative hypothesis: true difference in means is not equal to 0
```

The difference between the mean per-possession Plus/Minus of Ginobili (0.078) and that of Finley (-0.222) is close to, but not quite at, conventional statistical significance (p ≈ 0.058, just above 0.05). It suggests that during the 2007-08 season, the Spurs did better when Ginobili was on the court than when Finley was, all other things being equal.
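The reported p-value can be recovered from the t statistic and the degrees of freedom (83 paired observations, hence 82 degrees of freedom). This small sketch also shows the critical value the statistic would have needed to reach the conventional 5% threshold.

```r
# Two-sided p-value implied by the reported statistic
t_stat <- 1.9192
dof <- 82
p_value <- 2 * pt(-abs(t_stat), df = dof)
p_value          # close to the reported 0.05844 (t is rounded in the output)
p_value < 0.05   # FALSE: just above the 5% threshold

# |t| needed for two-sided significance at the 5% level with 82 df
qt(0.975, df = dof)   # a little under 1.99
```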

## CONCLUSION

Unlike Net Plus/Minus (NPM) and Adjusted Plus/Minus (APM), Quasi-Experimental Plus/Minus (QEPM) is not a metric. Rather, QEPM is a (quasi-)experimental approach to Plus/Minus: it allows one to compare how a team performs when a given player is on the court with how it performs when another player is on the court, all other things (teammates and opponents) being equal. I think this information can be quite useful to basketball analysts.

Fundamentally, QEPM shows that in certain sports like basketball, it is possible to rigorously test causal relationships, even though many people think this is impossible. For instance, Hirsch and Hirsch (2011) published a book in which they strongly criticize the scientific approach to sports. Here is what they write (p. 49) about the impossibility of testing causal relationships in baseball:

“Take the suggestion, as discussed above, that Roger Maris hit better with Mickey Mantle batting behind him. We supplied data to support that conclusion, but the data hardly prove that manager Ralph Houk was right to bat Mantle behind Maris. The first question he’d need to ask is whether Maris’s success with Mantle batting behind him amounts to a real finding or a random phenomenon rooted in a small sample size. Assuming he determined that it was real, he’d need to consider the converse: might not Maris protect Mantle equally well? How could he establish which way they were better off?

“The obvious answer is to switch the order for awhile, and see how Maris, Mantle and the Yanks fare with Maris behind Mantle as compared to how they’d done previously. In fact, this experiment would not yield reliable information. Experiments provide the most useful information when the scientific method is used – all relevant factors kept constant except one (in this case, where Maris and Mantle bat). If you vary only the factor in question, the results can confidently be attributed to the change in the single variable. Baseball does not permit such a controlled experiment. The flipping of Maris and Mantle in the lineup will necessarily be accompanied by numerous other changes as well – the teams the Yankees play, the pitchers they face, the weather, the performance of the batters before and after Maris and Mantle and, most importantly, the players themselves. […] In short, any hypothesis about which batting order best serves the Yankees can never reliably be tested.”

This does not hold for basketball, which allows for quasi-experiments.