Tuesday, October 8, 2013

Sab-R-Metrics Data Links

For those of you still visiting this website to practice your R skills using my Sab-R-Metrics series, I want to let you know that the data sets are no longer available at the links provided in each article.  I have moved my personal website off of the University of Michigan servers (finally) now that I am at Florida, and that site had hosted the data sets for the posts.  Unfortunately, I do not have time to go back through and re-link all of the data.

Nearly all of the data I used should be publicly accessible and you can modify the code for your own data formats.  If you are dying to get a hold of the original data files, please send me an email or note in this post what you are looking for.  I will see what I can do on a person by person basis.

While posting is sparse at this blog these days, it is still something I would like to get back to in the future.  Don't think I have completely abandoned my post.


Thursday, July 18, 2013

Advanced sab-R-metrics: Parallelization with the 'mgcv' Package

Carson Sievert (creator of the really neat pitchRx package) and Steamer Projections posed a question about reasonable run times of the mgcv package on large data in R yesterday, and I promised my Pitch F/X friends I would post here with a quick tip on speeding things up.  Most of this came from Google searching, so I do not claim to be the original creator of the code used below.

A while back, I posted on using Generalized Additive Models instead of the basic loess package in R to fit Pitch F/X data, specifically with a binary dependent variable (for example, probability of some pitch being called a strike).  Something went weird with the gam package, so I switched over to the mgcv package, which has now provided the basis of analysis for my 2 most recent academic publications.  I like to fancy this the best way to work with binary F/X data--but I am biased.  Note that the advantages of the mgcv package can also be leveraged in fitting other data distributions besides the binomial.  This includes the negative binomial distribution, which can be more helpful for data that are zero-inflated (probably most of the other binomial data we want to model in baseball pitches).

One advantage of mgcv is that it uses generalized cross-validation in order to estimate the smoothing parameter.  Why is this important?  Well, because we have different sample sizes when making comparisons--for example, across umpire strike zones--and we also have different variances, we might not want to fit each one the same.  Additionally, smoothing by looking at the plot until it "looks good" can create biases.  Therefore, this allows a more objective way to fit the data.  I also like the ability to fit interactions of the vertical and horizontal location variables.  If you fit them separately and additively, you end up missing out on some of the asymmetries of the strike zone.  These ultimately tend to be pretty important, with the zone tipping down and away from the batter (See the Appendix for comparison code, see below for picture of tilt; figure from this paper).


One thing that I did note on Twitter is that for the binary models, a larger sample size tends to be necessary.  My rule of thumb--by no means a hard line--is that about 5,000 pitches are necessary to make a reasonable model of the strike zone.  This is close to the bare minimum without having things go haywire and look funky like the example below, but depending on the data you might be able to fit a model with fewer observations.


Also, if you know a good way to integrate regression to the mean in some sort of Bayesian fashion in these models, that might help (simply some weighted average of all umpire calls and the pitches called by Umpire X that does not have enough experience yet).
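
As a rough illustration of the weighted-average idea only, and not a proper Bayesian model, here is a minimal sketch.  Every number in it is made up, and the prior weight 'k' (how many pitches' worth of trust we give to the all-umpire rate) is a hypothetical tuning choice, not something estimated from data:

```r
###shrink one umpire's strike rate toward the all-umpire rate
###all values below are invented for illustration

p_all <- 0.80   # strike rate at some location, all umpires pooled
p_ump <- 0.95   # strike rate at the same location for Umpire X
n_ump <- 400    # number of pitches Umpire X has called there
k     <- 5000   # hypothetical prior weight, in pitches

###weight on the umpire's own data grows with his sample size
w <- n_ump / (n_ump + k)

###shrunken estimate is a weighted average of the two rates
p_shrunk <- w * p_ump + (1 - w) * p_all
p_shrunk
```

With few pitches the estimate sits close to the pooled rate, and it moves toward the umpire's own rate as 'n_ump' grows relative to 'k'.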

Because R tends to work on a single thread, instead of using all the cores on your computer, the models can become rather cumbersome.  Believe me, I know.  For a while, I was fitting models with 1.3 million pitches, 125 dummy fixed effects, and some 30 other control variables at a time for this paper.  It took anywhere from 1-3 hours, depending on whether my computer felt happy that day--and I kept forgetting to include a variable here, change something there, etc.

OK, so parallelization.  It's actually incredibly easy in the mgcv package.  You first want to know if your computer has multiple cores, and if so, how many.  You can do this through the code below (note that I first load all the necessary packages for what I want to do):

###load libraries
 
library(mgcv)
library(parallel)
 
###see if you have multiple cores
 
detectCores()
 
###indicate number of cores used for parallel processing
if (detectCores()>1) {
cl <- makeCluster(detectCores()-1)
} else cl <- NULL
 
cl

That last 'cl' just tells you how many cores you will be using.  Note that this leaves one of your cores ready for processing other things.  You can use all of them, but it could end up keeping you from being able to do anything else on your computer while your model is running.  You can also use fewer.  Simply change the '-1' in the makeCluster() line to '-2', or whatever you want to do.  From here, mgcv has a single argument for using multiple cores.  You'll want to pass 'cl' as the set of cores to use.

One should also note that, in R, large data sets and massive matrix inversions take up a significant amount of RAM.  When I came to Florida I had to convince our IT people that I needed at least 32 GB of RAM, specifically to run the models in the paper linked above.  Running the single model got me up to 8-10 GB, while doing multiple models in a single instance in R subsequently maxed me out at around 28 GB before I closed R and opened another instance.  This is a limitation that can be addressed to some extent with mgcv, but if you're not running every single pitch available in the F/X database, you probably won't have to worry about this. 

In case you do, mgcv also has a nice option that breaks the data up into chunks and has a much lower memory footprint.  It is called bam() and works just as the gam() function does, but allows analysis on larger data sets when you have more limited memory by breaking it into chunks.  The help file claims that it can work much faster on its own in addition to saving memory.  And--most relevant to this post--this is the function that includes the option to parallelize your analyses.  The code is exactly the same with the extra command using our 'cl' defined above.  Note that I use the combined smooth and limit the degrees of freedom of the smooth to 50.  Those are, of course, choices of the modeler and dependent on the type of data you are analyzing.

###fit your model
 
strikeFit1 <- bam(strike_call ~ s(px, pz, k=51), method="GCV.Cp", data=called, 
   family=binomial(link="logit"), cluster=cl)
 
summary(strikeFit1)

Boom.  That's it.  You can also consider fitting smooths based on handedness.  You can do one for each type of batter by breaking up the data and the modeling, or you can do the following below:

###fit model while controlling for batter handedness
 
strikeFit2 <- bam(strike_call ~ s(px, pz, by=factor(batter_stand), k=51) + factor(batter_stand), 
   method="GCV.Cp", data=called, family=binomial(link="logit"), cluster=cl)
 
summary(strikeFit2)

And of course you can add covariates to your model that you want to estimate parametrically, such as the impact of count or pitch type:

###fit model controlling for batter handedness, count, and pitch type
strikeFit3 <- bam(strike_call ~ s(px, pz, by=factor(batter_stand), k=51) + factor(batter_stand) + 
   factor(pitch_type) + factor(BScount), method="GCV.Cp", data=called, family=binomial(link="logit"), cluster=cl)
 
summary(strikeFit3)

With the model, creating figures is as easy as using the predict() function and using code as I have shown here before.  And, thanks to Carson, much of the figure production is now automated in the pitchRx package.
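
For anyone who has not seen the earlier posts, here is a minimal sketch of the predict() step.  It runs on simulated data so that it is self-contained: the strike zone shape, the sample size, and the grid limits are all invented for illustration, and only the variable names (px, pz, strike_call) follow the code above.  No cluster is used here, to keep the example small.

```r
###predict strike probability over a grid (simulated data)
library(mgcv)

set.seed(42)
n <- 5000

###made-up stand-in for the 'called' data
called <- data.frame(px = runif(n, -2, 2), pz = runif(n, 0.5, 4.5))

###hypothetical zone: strike probability drops off away from (0, 2.5)
eta <- 4 - 3 * (called$px / 0.9)^2 - 3 * (called$pz - 2.5)^2
called$strike_call <- rbinom(n, 1, plogis(eta))

strikeFit <- bam(strike_call ~ s(px, pz, k=51), method="GCV.Cp",
   data=called, family=binomial(link="logit"))

###grid of locations to predict over
xs <- seq(-2, 2, length.out=50)
zs <- seq(0.5, 4.5, length.out=50)
grid <- expand.grid(px=xs, pz=zs)

###type="response" returns probabilities rather than log-odds
grid$p_strike <- as.numeric(predict(strikeFit, newdata=grid, type="response"))

###expand.grid varies px fastest, so rows are px and columns are pz
pmat <- matrix(grid$p_strike, nrow=length(xs), ncol=length(zs))

###draw the 50% called-strike contour
contour(xs, zs, pmat, levels=0.5, xlab="Horizontal location (px)",
   ylab="Vertical location (pz)")
```

Swapping in your own fitted model and data ranges should be all that is needed; the 50% contour is just one convenient way to draw the edge of the zone.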

Note that much of my reading about this package comes from an excellent book by its creator, Simon Wood, called Generalized Additive Models: An Introduction with R.  If these models are interesting to you, this is a must-have resource.


Appendix: The reason I use the interaction term is that the UBRE score is significantly better by doing so, as suggested in the previously cited text.  The code to compare the two models is also included below.  Note that your variable names and data name may differ, so change accordingly:
###Model with separate smooths
fit <- bam(strike_call ~ s(px, k=51) + s(pz, k=51), method="GCV.Cp", data=called, 
   family=binomial(link="logit"), cluster=cl)
summary(fit)
 
###Model with combined smooth
fit.add <- bam(strike_call ~ s(px, pz, k=51), method="GCV.Cp", 
   data=called, family=binomial(link="logit"), cluster=cl)
summary(fit.add)
###combined smooth UBRE score is lower
 
###compare models with Wald test
anova(fit, fit.add)
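
One housekeeping step not shown above: once all of the models are fit, it is good practice to shut the worker processes down with stopCluster() from the parallel package, so they do not keep holding memory.  A minimal sketch, using a small two-worker cluster and a toy calculation in place of the model fitting (both are assumptions for illustration):

```r
###start a cluster, use it, then release it
library(parallel)

cl <- makeCluster(2)

###toy stand-in for the bam(..., cluster=cl) calls
squares <- parSapply(cl, 1:4, function(i) i^2)
squares

###shut down the workers when finished
stopCluster(cl)
```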

Friday, July 5, 2013

What is Big Data, anyway?

My graduate school advisor, Rod Fort, posed the question in this post's title on Twitter today.  I gave answers and, as he usually does, he made me think more about my answers and their precision.  Technically, what I was trying to get across was that the use of Big Data, in most cases, is terribly imprecise.  I should have been able to explain the use of the term quickly, but it took a while and a number of "well, we've always done that" from Rod.  It is thrown around a lot, and in most cases not in any meaningful way.  I got a similar reaction to my mention of a prospective certificate in Complex Systems while at Michigan (which I did not pursue--mainly because my mathematical background wasn't strong enough and I had time constraints pursuing other things).

So, assuming we want to separate the use of "Big Data" from "Analytics", I think we can amply sum up the term with the following:

Big Data describes the relationship between the ability to collect data, and the ability to do something with it.  Data is BIG at the margin at which one more unit of data would leave us unable to analyze it all with the given technological capability.

This leaves Big Data flexible for the given tool.  The growth in our ability to collect and store large amounts of data has outpaced our ability to do anything meaningful with it.  This isn't anything new, in the same way that dynamic pricing isn't really a new idea, just a new implementation; in the same way that analytics aren't new, just a clear recognition of the integration between statisticians, programmers, and managers in the use of the term today; and in the same way that Moneyball, the idea, isn't new.  All tend to improve over time, just as in any field.

When it comes to analysis of Big Data--not the term big data itself--the holy grail is to have the ability to push a button and have the answer go directly to the decision maker, what I called "streamlining" on Twitter.  But this isn't Big Data itself (and it's really a fantasy at its extreme).  Certainly we can get closer to this, but data changes, behavior changes, the world changes.  These will always have to be updated, and in many ways I don't know that Big Data and Analytics as terms are completely separable.  In this case, though, let's be specific:

Analytics is the pursuit of simplistic, streamlined statistical information in a context understandable to the decision maker.

Again, unless we believe the movie Paycheck, this won't be 100% possible.  But the fantasy idea is that the computer and its data will tell us the answer to everything.  I enjoy this quote from the Big Data article linked above:  

"May 2012 danah boyd and Kate Crawford publish “Critical Questions for Big Data” in Information, Communications, and Society...(3) Mythology: the widespread belief that large data sets offer a higher form of intelligence and knowledge that can generate insights that were previously impossible, with the aura of truth, objectivity, and accuracy.”

However, we can do things to ease the use of large amounts of information to make decisions.  This requires the cooperation of statisticians, programmers, and managers.  Managers need to pose the problem in a way that is tractable and understandable.  Statisticians need to know the best methods for the distribution and variability in some given set of information, and be able to communicate this back to the manager.  The programmer needs to be able to collect the data accurately--possibly with the help of many other experts--and deliver the methodology in a way that can happen in real time or as close to it as possible.  In many ways, Dynamic Pricing is an outcome of these things--but not a new idea.  Big Data commentaries and discussions are referring to closing the gap between availability and implementation.



Tuesday, June 25, 2013

Dynamic Pricing and Sports Business Models

I just began to follow the Sports Analytics Blog on Twitter.  I click through every now and again.  Interestingly, they have a multi-part piece going up currently about dynamic pricing.  However, I have some qualms with the apparent misunderstanding of business models in certain sports.  Therefore, I figured I would go ahead and use this blog for something and write a short blurb about it.

First, let's talk about dynamic pricing.  The term itself, I guess, is rather new and comes along with technology and data management improvements.  However, the idea is not new, and much of the roots of its theory come from economics.  I am going to talk about dynamic pricing here without using the words "analytics" or "moneyball", because those words just get thrown around all over the place.

Dynamic pricing has its roots in economic theory on price discrimination and product differentiation.  Price discrimination discusses the ability of a firm to charge different prices to different people based on their willingness to pay.  However, this used to be hard for firms to do, as you can't just ask someone how much they want to pay for a product when they come up to the register (they'll tell you they value it at a penny, so the theory goes).  This is what car salespeople try to do when you buy a car: get you to signal to them how much you are willing to pay.  Once you do, you are toast.

But it is important to remember that price discrimination is not necessarily a bad thing (well, the European Union would like you to think so).  From a neutral standpoint--where we as good little economists do not favor the consumer or the firm--price discrimination is more efficient.  The firm no longer has to choose a single price to charge to maximize profits.  It can charge more to those willing to pay more, and less to those willing to pay less.  This probably isn't good for those who will pay more for the product, but now--at least down to marginal cost--those people that could not buy the product at the higher, single price can now purchase it.  Just as with redistribution, some are worse off and some are better off.  But if the cost of figuring out who is willing to pay what is low enough, this is much more efficient than taxation and redistribution.
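
To make the efficiency argument concrete, here is a small worked example in R.  The six willingness-to-pay values and the marginal cost are invented for illustration, and the second case assumes perfect price discrimination (each buyer pays exactly what they are willing to pay), which is the textbook extreme rather than anything a real firm achieves:

```r
###single price versus perfect price discrimination
###all numbers are hypothetical

wtp <- c(60, 50, 40, 30, 20, 10)   # willingness to pay of six consumers
mc  <- 15                          # marginal cost per unit

###profit at each candidate single price
single_profit <- sapply(wtp, function(p) (p - mc) * sum(wtp >= p))
best_price    <- wtp[which.max(single_profit)]
buyers_single <- sum(wtp >= best_price)

###perfect discrimination: serve everyone willing to pay at least mc
served    <- wtp[wtp >= mc]
profit_pd <- sum(served - mc)
buyers_pd <- length(served)

c(best_price=best_price, profit_single=max(single_profit),
   buyers_single=buyers_single, profit_pd=profit_pd, buyers_pd=buyers_pd)
```

In this toy market the best single price is 40, which serves three consumers for a profit of 75.  Under discrimination five consumers are served and profit is 125.  The two extra buyers are the ones who "could not buy the product at the higher, single price," while the consumers who valued the product at 60 and 50 lose the surplus they kept under the single price.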

Product differentiation is one way in which a firm can charge different prices to different types of consumers without asking them, but it technically differs in that it uses multiple products instead of different prices for a single product.  In this case, the firm varies the type of product it sells.  An easy story is IBM and its printers.  Simply, IBM had two types of printers, each for a different price.  One printed fast.  For the other, IBM actually put a special chip in it specifically to slow down the printing speed, despite it being the same exact printer.  They charged more for the faster printers, and allowed consumers to sort themselves out by differentiating these products and letting the consumer decide how much printing they need.  Yes, the slow down chip cost a few marginal pennies for IBM, and they still went out of their way to charge less for this printer (about half, from what I can gather).  If they had sold the better printer to everyone for cheaper (so more people could buy them), they would not have maximized profits.  Additionally, more people were happy, as they could buy the cheaper, lower level printer for their basic home needs.

Interestingly, there is built-in product differentiation when buying tickets to athletic events.  Do you think the Yankees front row box seats are the same product as the upper level bleachers?  What about a Yankees game against the Red Sox versus a game against the Royals?  Games on Saturday and games on Monday?  Teams have been doing this for a while.  Even season and single game tickets are different products.  And the dichotomy of these latter two products is the key to various business models in different leagues.

One of the important aspects of sport is the fixed supply: you can't make more seats if people want them, and you can't reduce costs by taking them away.  The stadium is built, the seats are there.  If you are not going to sell out a game, you may as well give out your tickets for free at the last minute.  If you are selling out, you can increase prices to suck up some surplus (or increase concession and merchandise prices inside the stadium--remember this is a joint maximization problem for the team).  This way, you can get concession revenues and make the stadium full (which may be more fun for the fans anyway).  Additionally, you can also charge more for in-arena/stadium advertising because you have more eyes on the ads.  In fact, teams do this.  The Detroit Pistons were handing out tickets in droves (to local youth groups, etc.) a day before game time in order to continue their sellout streak.  Maybe they should have called it a "give out" streak.  But it doesn't matter how they got into the stadium to those putting billboards inside, and the team's marginal cost of having more people in the stadium is essentially zero (give or take a few extra staff, maybe).

So where does the term "dynamic pricing" come from?  Well, this simply comes from real-time price discrimination and product differentiation.  The key here is "real-time."  Without the real time inclusion, then we're back to boring old economics.

Hotels and airlines are notorious for real-time price changes.  They know that, depending on the time of day and amount of time before the day of your stay/flight that you make your purchase, they can figure out your likely willingness to pay for a given room or seat.  Of course, this is a rough estimate, but it turns out that they are very good at this.  They do this with data they collect on all of their past sales.

More recently, teams have been delving into the use of these real time pricing models to--as the theory goes--simultaneously increase profits and allow more people into the stadium.  Those that wouldn't spend $50 on a Yankees ticket now get to buy it for $40.  Those that really wanted to attend that game bought it way in advance, and perhaps paid $60 for the same seat.  The key to the MLB business model here is that its ticket sales depend heavily on single game tickets.  The fact that there are many tickets to sell for each game--and they are not all sold preseason--allows MLB to do this.  NBA and NHL also have the ability to do this to some extent, though probably not to that of MLB.  I also heard that recently some college football teams are doing this (South Florida), but not the ones that sell pretty much all of their tickets preseason (i.e. Michigan and Florida--note that they do price discriminate through donations and student tickets).

OK, so back to Sports Analytics Blog.  What irked me a bit was this article, which gives the impression that NFL is not currently using dynamic pricing and is therefore making a poor business decision.  They use the example of the team losing a high profile player, which is fine: the Patriots games are arguably a very different product without Tom Brady on the field.  So far so good.  But if Brady is injured midseason, the team cannot suddenly change its prices!  NFL is almost a completely season ticket league (or tickets for single games purchased preseason).  Therefore, the real-time changes within the season aren't tractable, unless you are the Jacksonville Jaguars.

But, Brian, they could just hang on to them and sell them throughout the season (or keep prices super high early on), you say!?!?!

Well, you are correct.  But in a short season like the NFL's, with huge revenue dependencies on television contracts and selling out games, the uncertainty involved in doing that may outweigh any benefit they get from pricing.  The short season allows for only 8 chances to get things right.  They let fans take the risk on purchasing the tickets and possibly seeing a down season.  That's a reasonable business decision, given the broadcasting structure and short season with lots of uncertainty (many times, pretty good teams don't make the playoffs when you have a 16 game season--think about who would make the playoffs if the MLB season were only 16 games).  Fans can sell off the tickets on the secondary market later if they decide they won't make the game.

To be fair, the SAB article puts things in terms of losing a player to free agency, etc.  But if that is the case, we're really not talking about anything "dynamic."  We are back to pretty basic pricing decisions.  And let's also remember this ignores economic theory in general.  Economic theory on sports says that teams choose their talent level before the season begins (with short term adjustments), based on what they know they can charge to their fans for a team with X wins (i.e. their objective function, assumed to be profit maximization).  Obviously these short term adjustments are important--like losing a player to free agency--but this isn't really dynamic.  It's just pricing.*

Now with that said, I certainly agree that teams should be considering these short term changes in talent levels if they can do so at a minimum cost (this seems likely).  But the timing of these decisions and the timing of ticket purchases in NFL as a whole would result in a relatively low use of a full-on dynamic pricing model.  The blogger at SAB, who I am sure is a sharp business person, seems to have a slight misunderstanding of dynamic pricing as real-time pricing, and of the business structure of NFL.  That doesn't mean teams shouldn't fully consider their business decisions with the information at hand.  They absolutely should!  But it doesn't make them poor businessmen for not being as quick to adopt these techniques.**

Lots can be said about sports pricing, and there is plenty of research to be done.  For now, I'll leave you with some good reading on pricing (these are gated, sorry).  Note that this is an extremely limited list of papers, and there is plenty more out there to be read (including basic texts on pricing in fixed supply industries like sports, entertainment, hotels, and airlines).


Berri, D. & Krautmann, A. (2007).  Can we find it at the concessions?  Understanding price elasticity in professional sports.  Journal of Sports Economics, 8, 183-191.

Salaga, S. & Winfree, J. (2013).  Determinants of secondary market sales prices for National Football League personal seat licenses and season ticket rights.  Journal of Sports Economics, DOI: 10.1177/1527002513477662.

Fort, R. (2004).  Sports pricing.  Managerial and Decision Economics, 25, 87-94.

Soebbing, B. & Humphreys, B. (2012).  A test of monopoly price dispersion under demand uncertainty.  Economics Letters, 114, 304-307.


*In my brief experience working with a sports ticket sales department and their analytics, there is plenty of room for improvement here.  I still can't understand how cold calling people to purchase game tickets actually works.  Has anyone ever gotten a call from a representative at their local pro sports team and been persuaded to go ahead and buy those tickets for this weekend's game?

**Now I do ignore a few interesting things about sports pricing (or at least just gloss over them).  Note that usually some sort of monopolistic firm is required for this tactic.  Otherwise, firms will bid each other down to cost.  At the very least, there would need to be differentiation of these competing products for price discrimination to happen.  Both of these conditions likely hold in pro sports.  The second is that there is a secondary market for tickets.  These are important considerations for teams, as fans selling these to other fans for more money could cut into some of the additional revenues that those fans would otherwise spend inside the stadium.  Thirdly, I ignore directly addressing the inelastic pricing of tickets across many sports.  Remember that there is a joint maximization problem (parking, merchandise, concessions, BEER) not just maximization of gate revenue--most likely dynamic pricing could be used in some sense for these other considerations in the NFL.  I also ignore the use of price dispersion, which could be an important tool for teams (especially in the data collection phase of willingness to pay in given situations).  Finally, there are interesting applications of luxury products and keeping prices high (i.e. Yankee front row box seats) and reference prices (prices that a consumer uses as a baseline for "high" or "low" price for a given product).

Tuesday, May 21, 2013

Graduate Student Research Assistant Position in Sports Economics

I am currently looking to fill an opening for a graduate student research assistant here at the University of Florida beginning in the Fall semester of 2014.  The student should have interest in topics relating to Sports Economics as well as a strong quantitative background.  The position will include tuition remission and a stipend renewable for up to four years.

The deadline for applying to this position is March 1, 2014.

For more information on this position and how to apply, please see this flyer.

Thursday, May 9, 2013

Revisiting Umpire Discrimination: New Paper at JSE

Two colleagues (Scott Tainsky and Jason Winfree) and I have a new paper just posted online at the Journal of Sports Economics.  We revisit the findings of Parsons et al. from 2011 (though, the working version of their paper caught press much earlier than this).  The paper was rather controversial and claimed important influences of umpires on game outcomes based on race.

Our paper uses a different data set and looks to replicate the findings from the original AER paper.  We were able to replicate the original findings from their provided data and code, but find that odd uses of fixed effects are at the root of some of the findings.  A large majority of the paper looks at the robustness of the results, and implements Pitch F/X data to empirically derive the edge of the strike zone.  At best, the results initially presented in AER are mixed based on our analysis and re-analysis.

One thing to note is that the main interest of the Parsons et al. paper was not baseball.  The point was that detecting discrimination could be influenced by others that impact the performance of those of a given race (i.e. umpires in this context).  This point is still well taken, and makes up the most important contribution.  In fact, this is why the paper was published in the prestigious journal American Economic Review.

The link directly to the paper and abstract is below.  Unfortunately it is gated.  However, I am going to double-check my rights for including a link on my personal page (usually OK, but journals can sometimes be a pain on this issue).  If you have access, feel free to send along questions or comments to my email address or leave them in the comments.  Please make these comments and/or criticism constructive.

http://jse.sagepub.com/content/early/2013/05/02/1527002513487740.abstract

Saturday, May 4, 2013

Times Change, Or Why Steroids Don't Ruin Baseball for Me

Just a list of links without commentary other than this: I honestly don't care about the Steroid Debate beyond making clear how stupid it is.

Mantle Corked His Bat (insert asterisk here, right...right?)

Athletes Have Gotten Better, Mostly Without Steroids (imagine that!)

The Hall of Fame is Biased (well, I never!)

Friday, April 12, 2013

Brawling Costs Teams Money

I honestly don't know where to begin with the stupidity involved in this:

http://www.usatoday.com/story/sports/mlb/2013/04/12/quentin-charges-greinke-after-being-hit-by-pitch/2076525/

This idiocy cost the Dodgers a whole lot of money.  If I were running the team, I would at the very least seek legal counsel in order to evaluate the chance of getting some of Greinke's contract dollars back.  Yes, MLB contracts are guaranteed.  But he was injured because he assaulted someone.  Unpaid suspensions happen for PED users, so there must be some way to reconcile this.  Has there been any precedent to this sort of thing?  I don't see this behavior as assumed risk on the part of the Dodgers, though I guess one could argue that their supervisors (i.e. Mattingly) could have prevented it.

Greinke claims he did not mean to hit him.  Sure.  The catcher was set up outside and Greinke is a Cy Young contender.  Spots don't get missed like that.

Lest we forget the impact on the individual 2-1 game itself.  San Diego tied it up in that inning.  While it ended up in favor of the Dodgers, that's not something I want my players flirting with after spending over $200 million on them.

Tuesday, March 5, 2013

Refs Complicit in Fighting?

To begin, I'm not much of a hockey fan.  I just don't get it.  That doesn't mean it's not entertaining to you, to the entire population of cold-weather living people, that these guys aren't incredible athletes, or that it has no value.  It just means I don't enjoy watching it.  I watched it plenty as a kid, going to 5-10 Capitals games a year for a number of years.  I think it is an extremely interesting league from the standpoint of my academic interests.  But I never really enjoyed watching.  People say the same to me about baseball.  That's fine, it's not for everyone.  So you can take my following comments with a grain of salt if you like, or as blatant ignorance of what goes on in the sport on the ice.

Despite the idea that hockey has attempted to get rid of fighting, it is obvious to me that this is pure theater by Bettman and the owners.  In fact, given the video below, I suspect there is an explicit instruction to referees to not actually break up the fights until someone hits the ice.  (Hat Tip to Charlie Brown for the video)

http://www.youtube.com/watch?v=KJqN522mkFM&feature=player_embedded

These guys took up their boxing positions with everyone watching, including fellow players and referees, and not a single person bothered to try and separate them or stand between them.  In fact, the referee takes the initiative to pick up the debris (stick, etc.) and get it out of the way for when they throw down.  There is little question in my mind about the referees' complicity in these events on the ice, and I would not be surprised if they were given explicit instructions to let these things play out for the entertainment of the fans.  There is not a real safety concern for the refs or the players in breaking these two up as they are standing 10 feet from each other in their boxing poses.  This one looks almost to the point that it was staged.

McGinn had a broken orbital bone, likely having to do with his face-plant into the ice.  That is not a minor injury.   Not even close.  I know it has been said before, but if this happened in the stands someone would be on their way to prison.  This on-ice fight is no more acceptable to me than the video below, though I imagine there is more outrage there than the hockey fight.  At least in the baseball game, everyone didn't stand around and look the other way for a full 30 seconds while the batter/runner punched the pitcher in the face (Hat Tip to Tangotiger for the video below).

http://www.youtube.com/watch?v=DeKp8e88ZyI&feature=player_embedded

Note that I have the same issue with throwing at batters.  For a long time, I loved Pedro Martinez as a player, but after his many escapades with throwing at batters (not just throwing inside, but his throwing AT them and then talking about it) I no longer had any interest.  I feel the same way about Cole Hamels after the Bryce Harper beaning.  I don't think Selig did enough.  Hamels should have been suspended for the season.

Congress chastises leagues for PED use (particularly baseball for whatever stupid reasons they may have).  But why don't authorities bother with these sorts of incidents, where the league (with questionable antitrust status) is complicit in injuring its employees?  Assumed risk does not include violent assaults in any profession (and I would argue that this even includes boxing and MMA).

Let's make a comparison.  Wikipedia reports that the rate of reported aggravated assaults yearly in Detroit, Michigan is about 0.18% (1,334 assaults per 713,000 or so people).  Detroit isn't exactly a peachy place to live, in terms of crime.  In fact, the shortage of police there is becoming a huge problem.  Some calls take hours before an officer arrives at the scene.  In Los Angeles, a safer city but also a place where violent gang crime has been a serious issue in the past, there are 230 aggravated assaults per 3.84 million people.  That's a rate of 0.006%.

From Hockey Fight Statistics, in the 2011-2012 season (the lowest fight penalty rate since 06-07), there were 546 fights.  With roughly 700 total NHL players in a given season, that works out to a rate of about 78% (fights per player).  That is more than 400 times the aggravated assault rate in the city of Detroit.  It is about 13,000 times the rate in Los Angeles.
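For anyone who wants to check the arithmetic, here is a minimal sketch in Python.  It uses only the counts quoted above; the variable and function names are mine, and the 700-player figure is the same rough estimate as in the text.  Using unrounded inputs, the Detroit multiple comes out a bit lower than you get by dividing the rounded percentages (78 / 0.18), but the order of magnitude does not change.

```python
# Counts as quoted in the post (reported aggravated assaults, population).
detroit_assaults, detroit_population = 1_334, 713_000
la_assaults, la_population = 230, 3_840_000

# NHL fights in 2011-2012 and a rough player count (an estimate, not an exact roster total).
nhl_fights, nhl_players = 546, 700


def rate(events, population):
    """Events per person (or per player) per year, as a fraction."""
    return events / population


detroit_rate = rate(detroit_assaults, detroit_population)
la_rate = rate(la_assaults, la_population)
nhl_rate = rate(nhl_fights, nhl_players)

# Print each rate as a percentage.
for label, value in [("Detroit", detroit_rate),
                     ("Los Angeles", la_rate),
                     ("NHL", nhl_rate)]:
    print(f"{label}: {value:.4%}")

# Print how many times larger the NHL rate is than each city's rate.
print(f"NHL vs. Detroit: {nhl_rate / detroit_rate:,.0f}x")
print(f"NHL vs. Los Angeles: {nhl_rate / la_rate:,.0f}x")
```

Swap in your own counts to compare other cities or seasons; the calculation is just events divided by people.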

Incentives tend to work.  If you are caught breaking people's skulls in Detroit--even given the lack of police force there--you go to jail.  Same goes for LA.  The incentives against fighting in the NHL (and hitting batters in MLB) are laughable, at best.


**Note: Yes, there are probably differences in the severity of crimes that are reported in Detroit and LA, versus all "fight penalties" in hockey.  But even assuming that unreported aggravated assault in these cities is ten times what is reported, and assuming that only a quarter of NHL fights would be up to the standards of aggravated assault, the differences are still astonishing to me.
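The adjustment in the note can be sketched the same way.  This is a rough check under the note's two hypothetical assumptions (ten times as many unreported assaults in the cities, and a quarter of NHL fights counting as aggravated assault); neither factor is a measured value, and the names below are my own.

```python
# Rates from the counts quoted in the post.
detroit_rate = 1_334 / 713_000
la_rate = 230 / 3_840_000
nhl_rate = 546 / 700  # fights per player, using the rough 700-player estimate

# Hypothetical adjustment factors taken from the note, not from data.
UNREPORTED_MULTIPLIER = 10  # true city assaults assumed to be 10x the reported count
NHL_SERIOUS_SHARE = 0.25    # share of fights assumed to meet the aggravated assault bar

# Scale the city rates up and the NHL rate down.
adjusted_detroit = detroit_rate * UNREPORTED_MULTIPLIER
adjusted_la = la_rate * UNREPORTED_MULTIPLIER
adjusted_nhl = nhl_rate * NHL_SERIOUS_SHARE

detroit_multiple = adjusted_nhl / adjusted_detroit
la_multiple = adjusted_nhl / adjusted_la

print(f"Adjusted NHL vs. Detroit: {detroit_multiple:.1f}x")
print(f"Adjusted NHL vs. Los Angeles: {la_multiple:.0f}x")
```

Together the two adjustments shrink each multiple by a factor of 40, which still leaves the NHL rate well above both cities.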

Tuesday, February 12, 2013

Employment Bias Toward Athleticism

Something I have always suspected happening in labor markets does, in fact, seem to be happening: hiring managers tend to give a premium to those signalling athletic ability or sport participation.  The paper reporting these results, by Dan-Olof Rooth, is linked below:

http://www.sciencedirect.com/science/article/pii/S0927537110001272

I think this is extremely interesting (and mirrors the studies that randomized "African American sounding names" on resumes, finding bias there).  Of course, there are different signals sent with athleticism vs. the sound of names.  A name isn't likely to signal much beyond, perhaps, skin color, which we all know is not a valid way to exclude someone from a job.

However, athleticism could signal something else: motivation and time management.  My undergraduate thesis (unpublished) found that student athletes felt (self-reported) much better about their time management skills than non-athletes.  This could be a useful signal for someone hiring a prospective employee. 

Secondly, those who participate in sports tend to be more active and have more energy than those who do not.  These would also seem to be desirable traits to an employer.

Lastly, being athletic could signal motivation or initiative from the person applying.  This is similar to participating in a club or being president of the young business leaders organization at your university.  I don't know that athletics would give an advantage above and beyond something like this, but it would seem to be at least a useful signal about involvement and social skills.  Team sports are social, and can provide opportunity to grow just as other clubs do.

All of these things are difficult to observe in an interview, so using sport participation as an implicit signal can be useful both for the employee to relay this information, and for the employer to get a bit more information about the prospective hire.  Of course, there is also the possibility of overt bias toward playing football or some other sport at a large university that the employer is a fan of.  This would not be a valid way to make a hire, but I suspect it does happen.  There is always a "buddy network" influencing many areas.

This is why I always tend to put on my CV or resume that I participated in college athletics and currently continue to play softball and golf.  While it says nothing about my skills as an academic researcher (leaving aside the fact that I research sport), I suspect that at worst it will do nothing for me and at best make the employer slightly more interested.

What say you?