debating the long tail – purple motes

Getting a good, well-understood model often takes you three-quarters of the way toward solving a class of problems. Persons’ choices among a large number of symbolic items still lack a good, well-understood model for business analysis.

Among a set of similarly instantiated symbolic items in a given domain of choice, item popularity vs item rank in log-log coordinates typically can be well described by a straight line. Put differently, a power-law distribution typically provides a reasonably good model for the aggregate pattern of choices. In a comment referring to a graph of Facebook app popularity, Chris Anderson seems to describe his Long Tail theory as equivalent to observing a straight line in log-log space:

A Long Tail is a powerlaw distribution, which looks exactly like what you’ve shown. All powerlaws have a huge drop-off like that–but the tail being long (get it?) the area under what appears to almost nothing adds up to a lot. The only way you can tell whether it really does conform to the theory or not is to plot it log-log and see if it’s a straight line.

At least under one definition of heavy-tailed distribution, all power laws are heavy-tailed. This meaning of heavy-tailed largely concerns the interpretation of observables and the management of risk. Heavy-tailed distributions are associated with rarely observed, difficult-to-predict outcomes that can dominate values of concern, such as aggregate profits. From this perspective, power laws and other extreme-value distributions are characteristic features of blockbuster-oriented, highly unpredictable businesses.[1]

As Anand Rajaraman insightfully observes, the Internet has produced powerful tools for communicating among persons and for observing and aggregating users’ choices and ratings. The process that produces blockbusters now depends more on influence among users. Experiments indicate that increasing social influence increases the unpredictability of success.[2] But blockbusters have always been highly unpredictable, even when they depended on more centralized, directed marketing campaigns.[3]

The slope of the approximating straight line for item popularity vs item rank in log-log coordinates offers a rough means for distinguishing between a blockbuster-oriented business and a niche-oriented business. The smaller the absolute value of the slope, the more business is distributed across relatively low-popularity items. An even simpler index is the popularity of the most popular item. That’s the value of the item-popularity vs item-rank line at rank 1, where the rank axis runs from 1 to the number of items available; in log-log coordinates it is the line’s intercept, because log(1) = 0. But this extremely simple index ignores most of the data: unusual circumstances may determine the popularity of the most popular item and make the approximating line fit badly for the top-ranked item. Thus the slope of the approximating line is probably a better simple description of the business.[4]
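As a rough sketch of how the slope (and the rank-1 intercept) might be estimated from raw counts, here is an ordinary least-squares fit of log popularity against log rank. The function name `loglog_slope` and the choice of plain least squares are my assumptions for illustration, not a method taken from any of the sources cited here. Least squares on log-log data is simple, but it can be pulled around by the sparse, noisy counts far down the tail, so fitting only a range of ranks of practical interest may serve better.

```python
import math


def loglog_slope(popularity):
    """Least-squares fit of log(popularity) against log(rank).

    `popularity` is a list of counts (plays, rentals, name bearers, ...).
    Counts are sorted in descending order so that rank 1 is the most
    popular item; zero counts are dropped because log(0) is undefined.
    Returns (slope, intercept) of the fitted line in log-log coordinates.
    """
    counts = sorted((c for c in popularity if c > 0), reverse=True)
    if len(counts) < 2:
        raise ValueError("need at least two items with nonzero popularity")
    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(c) for c in counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    # Because log(1) = 0, the intercept is the fitted log popularity at rank 1.
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: an exact power law, popularity = 1000 / rank.
slope, intercept = loglog_slope([1000 / r for r in range(1, 101)])
print(round(slope, 3), round(math.exp(intercept), 1))  # -1.0 1000.0
```

A flatter fitted line (slope closer to zero) indicates a more niche-oriented pattern of choice; a steeper one indicates a more blockbuster-oriented pattern.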

Evidence is mixed on the evolving importance of blockbuster businesses relative to niche businesses in symbolic economies. Because a large number of possible names has been freely available since the invention of language (supply side), studying the distribution of chosen names is a good way to isolate demand-side factors in mass symbolic choice. In England over the past thousand years, given names show a remarkable flattening in the approximating power law beginning about the time of the Industrial Revolution and continuing to the present. On the other hand, experiments indicate that increasing social influence increases the steepness of the slope, meaning social influence makes popular items relatively more popular.[5] The Internet is plausibly associated with greater social influence, which may be sufficient to reverse apparent long-term trends toward diversification in symbolic choices.

A recent study of business data would have made a greater contribution to understanding symbolic economics with more attention to defining useful statistics. The study reported that among more than a million tracks offered through Rhapsody in 2006, “the top 10% of titles accounted for 78% of all plays, and the top 1% of titles for 32% of all plays.” For just under 16,000 movie titles offered through Quickflix in 2006, “the top 10% of DVDs accounted for 48% of all rentals, and the top 1% for 18% of all rentals.”[6] A problem with these statistics is that the total number of titles on offer is changing greatly, so the set of items counted as “the top 10%” changes with it. Hence statistics such as “the top 10% of titles” and “the top 10% of DVDs” lack enduring significance. Because humans have physically limited brains and communication capabilities, rapidly increasing the total number of symbolic items that persons could choose isn’t likely to affect the aggregate pattern of actual choices among relatively popular items.
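To illustrate why a statistic like “the top 10% of titles” depends on catalog size, the sketch below holds the slope of the log-log line fixed and varies only the number of titles. It assumes, hypothetically, that an exact power law holds all the way down the catalog; `top_fraction_share` is an illustrative name, not anything used in the study.

```python
def top_fraction_share(n_items, exponent, fraction):
    """Share of total popularity held by the top `fraction` of items.

    Assumes popularity proportional to rank ** -exponent for ranks
    1..n_items, i.e. a straight line with slope -exponent in log-log
    coordinates.
    """
    weights = [rank ** -exponent for rank in range(1, n_items + 1)]
    k = max(1, round(fraction * n_items))  # number of items counted as "top"
    return sum(weights[:k]) / sum(weights)


# Same slope (exponent 1.0) throughout; only the number of titles changes.
for n_items in (1_000, 16_000, 100_000):
    share = top_fraction_share(n_items, 1.0, 0.10)
    print(f"{n_items:>7} titles: top 10% hold {share:.0%} of demand")
```

Under this assumption the share held by the top 10% climbs as the catalog grows, even though the slope, and hence the shape of the pattern of choice, never changes.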

Nielsen VideoScan data indicate that an increasing number of titles are rarely chosen:

The number of titles that sold only a few copies almost doubled for any given week from 2000 to 2005. In the same period, however, the number of titles with no sales at all in a given week quadrupled. Thus the tail represents a rapidly increasing number of titles that sell very rarely or never. … Moreover, we determined that this is not simply a function of the sharp increase in the number of titles that have come onto the market in recent years, or of the transition from VHS to DVD; it is the truth of the long tail.[7]

The author did not describe how the analysis separated the effects of the sharp increase in total titles from changes in actual choices. Disentangling the effects of an increase in titles is a difficult problem. That the form of the author’s statistics depends strongly on the total number of titles suggests that the author hasn’t actually figured out how to do that.

Some data indicate a growing business in a small number of titles. With respect to Nielsen VideoScan data:

success is concentrated in ever fewer best-selling titles at the head of the distribution curve. From 2000 to 2005 the number of titles in the top 10% of weekly sales dropped by more than 50%—an increase in concentration that is common in winner-take-all markets.

The evidence on individual best sellers seems not to be consistent with a linear popularity model. With respect to Nielsen VideoScan data, the author observes:

The importance of individual best sellers is not diminishing over time. It is growing.

But with respect to Nielsen SoundScan data, the author notes:

although today’s hits may no longer reach the sales volumes typical of the pre-piracy era, an ever smaller set of top titles continues to account for a large chunk of the overall demand for music.

If individual hits are decreasing in popularity while an ever smaller set of top titles continues to account for the same large share of demand, then a linear popularity model doesn’t describe what’s happening well. Perhaps the decrease in sales volume for hits (individual best sellers) reflects a decrease in the overall demand for (commercially sold) music.
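The tension can be checked with a small sketch. Under a linear (power-law) popularity model, a smaller set of top titles can hold the same share of demand only if the line gets steeper, and a steeper line also raises the top title’s share. The sketch holds total demand and catalog size fixed at hypothetical values and varies only the exponent; the function and constant names are illustrative assumptions.

```python
def zipf_shares(n_items, exponent):
    """Popularity shares proportional to rank ** -exponent, summing to one."""
    weights = [rank ** -exponent for rank in range(1, n_items + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def items_needed(shares, target):
    """Smallest number of top-ranked items whose shares sum to at least target."""
    running = 0.0
    for k, share in enumerate(shares, start=1):
        running += share
        if running >= target:
            return k
    return len(shares)


TOTAL_DEMAND = 1_000_000  # hypothetical total units sold, held fixed
N_ITEMS = 10_000          # hypothetical catalog size, held fixed

for exponent in (0.8, 1.0, 1.2):  # larger exponent = steeper log-log line
    shares = zipf_shares(N_ITEMS, exponent)
    print(
        f"exponent {exponent}: {items_needed(shares, 0.5)} titles hold half of demand; "
        f"top title sells {shares[0] * TOTAL_DEMAND:,.0f} units"
    )
```

In this sketch, as the exponent rises, fewer titles are needed to reach half of demand and the top title’s unit sales rise. So within a linear model, falling hit sales alongside rising concentration point toward falling total demand, the possibility raised above.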

The aggregate characteristics of persons’ choices among a nearly infinite set of symbolic goods aren’t well understood. But the importance of such choices is clearly increasing. As is conventional, I’ll end this post with a call for more research, and for more support for regulators.

* * * * *

Notes:

[1] See Arthur S. De Vany, Hollywood Economics: How Extreme Uncertainty Shapes the Film Industry, Contemporary Political Economy Series (London: Routledge, 2004), and Nassim Taleb, The Black Swan: The Impact of the Highly Improbable (New York: Random House, 2007).

[2] See Matthew J. Salganik, Peter Sheridan Dodds, and Duncan J. Watts, “Experimental study of inequality and unpredictability in an artificial cultural market,” Science, 311, 854–856 (2006).

[3] Extensive marketing and promotion may be necessary for traditional-media blockbusters, but they are not sufficient. See, e.g., De Vany (2004).

[4] Viewed as a distribution of popularity shares, a power-law approximation for item popularity vs. item rank has only one free parameter once the number of items is fixed. The minimum item rank is necessarily one (the most popular item), and the total popularity shares must sum to one. But remember that an approximating line is a model, a tool for analysis, a means for organizing fruitful comparison and discussion. The slope of a linear approximation to the popularity distribution for a range of items of practical interest seems to me to best serve this purpose.
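For concreteness, here is one way to write that normalization, assuming the range of interest contains N items and N is treated as known. The symbols (exponent α, share s_r of the item at rank r, normalizing sum H_{N,α}) are my notation, not the original’s.

```latex
\begin{aligned}
s_r &= \frac{r^{-\alpha}}{H_{N,\alpha}}, \qquad
H_{N,\alpha} = \sum_{j=1}^{N} j^{-\alpha}, \qquad r = 1, \dots, N,\\
\log s_r &= -\alpha \log r - \log H_{N,\alpha}.
\end{aligned}
```

The shares sum to one by construction, and the log-log intercept, -log H_{N,α}, is fixed once α and N are; with N given, the slope -α is the only free parameter.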

[5] Salganik, Dodds, and Watts (2006).

[6] See Anita Elberse, “Should You Invest in the Long Tail?” Harvard Business Review, July–Aug. 2008.

[7] This and subsequent quotes are from Elberse (2008).
