Here are the answers as I remember them from class. I had to rewrite them from memory because I couldn't find my paper from high school. I'll repost the question.
Cut It Down to 3:05
IQ Test Question
QUESTION INTRO
I am the entertainer,
I come to do my show.
You've heard my latest record,
It's been on the radio.
Ah, it took me years to write it,
They were the best years of my life.
It was a beautiful song.
But it ran too long.
If you're gonna have a hit,
You gotta make it fit--
So they cut it down to 3:05. ~Billy Joel
Not merely a catchy lyric, this was Billy's personal dig at a recording industry policy of the 1970s. The industry decided that since the average length of #1 hits had been 3 minutes, 5 seconds in the previous decade, it would by extension be a good idea to hold future songs to this magical musical time limit. The study the recording industry did was actually inspired by the concept of the Golden Section (described by Euclid); they thought a similar magical formula for creating hits might exist in music. Here was their methodology.
1. The average was a simple average, calculated by taking the length of each song that had made it to #1 in the last decade and dividing by the total number of #1 songs.
2. They got their stats from the Billboard charts and radio call-in requests. The rules were simple: the most requested song during a particular week was considered to be the #1 song for that week.
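In code, the "methodology" in step 1 amounts to nothing more than this (the song lengths below are invented purely for illustration):

```python
# Hypothetical lengths, in seconds, of songs that hit #1 during the decade.
number_one_lengths = [172, 201, 185, 190, 178, 201, 165, 188]

# The industry's entire method: add up every #1 song's length, divide by the count.
average_seconds = sum(number_one_lengths) / len(number_one_lengths)

minutes, seconds = divmod(round(average_seconds), 60)
print(f"Average #1 length: {minutes}:{seconds:02d}")  # prints "Average #1 length: 3:05"
```

That single number is the whole empirical basis of the policy, which is exactly the problem the answers below pick apart.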
Now put yourself in the shoes of an up-and-coming executive in the music industry who has a chance to be promoted if he shows where his current boss, who came up with this “cut it down to 3:05” policy, has erred.
List each individual flaw or error separately in its own paragraph. Grading is on the number of flaws you find.
Where They Messed Up
One almost doesn't know where to start, since the entire endeavor, the conclusions drawn from it, and the policy it created are one giant kerfuffle from beginning to end. Keep in mind this question is entirely based on a real-life event. Scary thought. It is perhaps best to separate the errors into three categories: sampling, premise, and execution.
Sampling
If one wanted to determine the most popular songs of a particular decade, one could not actually do it with the sampling method the industry used, because the song that gets the most requests in any given week is not necessarily the most popular song. Here's why:
1. Missed it by that much: Say song A gets 50,000 requests and song B gets 49,999. Song A gets listed as a #1 hit and song B gets no mention whatsoever. The following week's #1 hit could get only 30,000 requests and actually be far less popular than song B from the week before. Yet the former makes the grade while the latter gets kicked to the curb and is absent from the data.
2. How to feel thin: Hang out with people fatter than you. Because call-ins are tallied week by week rather than over the life of the song, the competition a song goes up against plays a larger role than almost anything else in whether or not it makes #1. Imagine how few songs made it to the hallowed first-place position during the period when Stairway to Heaven was released.
3. Duration Blind: Are you really going to tell me that Hey Mickey and Hotel California are equal in popularity because they both made it to #1? I didn't think so. For any accurate assessment of a song's popularity, the length of time it stayed at #1 matters as much as the fact that it got there. Songs that barely slip in for one week are nowhere near as popular as songs that stay on top for months. A weighted average could have accounted for this.
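A sketch of the weighted average point 3 suggests, weighting each song's length by its weeks at #1 (all numbers are invented for illustration):

```python
# (length_seconds, weeks_at_number_one) pairs; invented data for illustration.
hits = [
    (391, 8),   # a long song that dominated the chart for two months
    (155, 1),   # a short song that slipped in for a single week
    (185, 2),
    (240, 4),
]

# The plain average treats the one-week wonder the same as the two-month juggernaut.
plain = sum(length for length, _ in hits) / len(hits)

# Weighting each length by weeks at #1 lets staying power count toward the "typical" hit.
weighted = (sum(length * weeks for length, weeks in hits)
            / sum(weeks for _, weeks in hits))

print(f"plain: {plain:.1f}s  weighted: {weighted:.1f}s")
```

On these made-up numbers the weighted figure lands more than a minute away from the plain one, because the long song's eight weeks on top actually count for something.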
4. No need to request it: I would have requested my favorite song, but you played it ten times in the last hour. Popular songs often get so much radio play that no one bothers to request them, creating a negative feedback loop: the more popular a song is, the fewer requests it gets, biasing its measured popularity downward.
5. The missing demographic: Not everyone who listens to music and buys albums listens to the radio. Many avid album listeners don't even own radios and buy all their music outright. Assuming album sales are the recording industry's primary motivation, would they really want to cater to only a small slice of their client base? I'm in this category: I LOVE music, do not own a radio, and have never requested a song.
6. But you don't do requests: Not every radio station takes requests, and not every radio listener requests songs; even among music lovers who do listen to the radio, a good many never call in. If a song was popular among listeners of no-call-in stations, or among listeners who don't call in, you'd have no way to tell.
7. I dedicate this to you: Songs of certain types, such as love songs, are requested more because people can dedicate them. This creates a slight bias toward dedicatable songs, making them seem more popular than they really are. They are more requested; they are not necessarily more popular.
8. I got paid to do it: People can, of course, lie. It's common practice for people to be paid to call in and request songs, making the number of requests a better measure of how much the distributor spent on marketing than of the song's true popularity.
9. Same time next year: If a song is released during the holiday season or during a major election, people's listening and request habits are altered. Low requests during these times might have nothing to do with the song's popularity or construction.
Premise
Correlation does not imply causation, and the music industry's entire quest for a hit formula is rife with the post hoc ergo propter hoc logical fallacy.
10. We'll always have average: Any set of numerical data can be averaged. That in no way proves, or even suggests, that the resulting average is meaningful. There were several things they could have done to test the hypothesis; they did none of them. As I see it, the biggest error here is assuming that a song's length has anything to do with its hit potential in the first place.
11. Base rate neglect: No attempt was made to see whether songs of 3 minutes, 5 seconds were more likely to make it to #1, because only the #1 hits were sampled. Even assuming there is any causation, one would need to sample the lengths of non-#1 songs to establish a baseline. For all we know, the average length of all songs could turn out to be 3:05, hit or not.
12. Out of focus group: They could have produced multiple versions of the same songs and played them for three groups: one group gets the 3:05 version and rates it, one group gets the longer version and rates that, and the last group gets both and picks the one it likes better. With groups large enough to randomize away differences in subject selection, they could have tested for a statistically significant effect of length on listener preference and gathered real evidence about causation.
13. So why are you averaging, anyway?: Without the data we can only guess, but I'd bet that very few of the #1 hits from the 60's were actually 3:05 long. Yes, the average of the short songs and the long songs might land in the middle; that's kinda how averaging works. The fundamental flaw is assuming there were a lot of 3:05 songs, when the number was derived by adding all the lengths up and dividing. It is entirely possible that no #1 hit from the sixties was 3:05 and that the only place this duration occurs is as a mathematical construct. A much better approach would have been to calculate the probability of success for songs of different lengths and keep averages out of it.
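The "probability of success for songs of different lengths" suggestion in point 13 could look like this: bucket every released song by length and compute the share that hit #1 in each bucket (all counts are invented for illustration):

```python
# Invented counts: (songs_released, number_one_hits) per length bucket.
buckets = {
    "under 2:30": (400, 3),
    "2:30-3:30":  (900, 12),
    "3:30-4:30":  (500, 9),
    "over 4:30":  (200, 4),
}

# Hit rate per bucket: the baseline comparison the industry never made.
rates = {label: hits / released for label, (released, hits) in buckets.items()}

for label, rate in rates.items():
    released, hits = buckets[label]
    print(f"{label:>10}: {hits:>2}/{released} = {rate:.2%}")
```

On these made-up numbers, the 2:30-3:30 bucket produces the most #1 hits in absolute terms, yet the longest bucket has the best hit rate per release, which is exactly the distinction a single average obscures.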
14. Requests on trial: As far as I know, there is no hard data showing that the popular songs people shell out money to buy are requested in proportion to their popularity. The entire premise of gauging a song's popularity by number of requests is an unsubstantiated supposition.
15. So why do you want me to play this song anyway?: Well, if you must know, it's because I didn't like it enough to buy the album. If you really liked a song and the group who sang it, wouldn't you buy it? And if you had it sitting in your CD player, would you still request it? I'm guessing no. It is not at all clear that the songs people like most are the ones they request most.
16. Is the past really what we want to base the future on?: Even if the average length of #1 hits over the last decade was 3:05, is that really predictive of what people will want in the next decade? Trends in music change faster than underwear.
Execution and Conclusions
17. Heisenberg Uncertainty: Confirmation bias by active modification of the sample set. As soon as you start cutting songs to fit a certain length, all future stats you gather will be biased by your own actions, creating a self-fulfilling prophecy you can never escape. If you offer a house guest a choice of milk or milk, you cannot infer from their picking milk that it is their favorite drink.
18. Cultured Pearls: The songs of the previous decade that they averaged were not cut to a certain length; they were that way naturally. There is no reason to believe that artificially truncating songs longer than 3:05 would preserve their quality. They might have considered encouraging musicians to write songs to that length; cutting already-written songs down is a major error in execution.
19. You could have just asked: Though it is done now, back then simply asking people what their favorite songs were apparently never occurred to anybody. Request data was readily available, so it was favored over randomly sampled studies and listener-choice lists. When listener-choice lists did start popping up, little correlation was found between what were considered #1 hits and what people actually liked... not surprising!
20. Unrelated: Though it is covered by some of the other answers, it bears restating that there is no evidence whatsoever that the length of a song affects its chances of becoming a #1 hit. The entire exercise was a post hoc ergo propter hoc logical fallacy.
P.S. If you use this yourselves, grade it on the number of answers, not necessarily the accuracy of those answers. It is primarily a test of creativity. It also tests cognitive bias.