Hill Climbing in “Feature Subspace”
Looking at the search for profitable products as a Multi-Armed Bandit (MAB) problem illuminates the general complexity of the firm’s challenge (see previous posts in this series: one, two, three). But in terms of analyzing specific firm behaviors, I think it’s important to acknowledge that we don’t have a pure MAB here. It seems pretty clear there’s more causal structure in Market Space.
Here’s how I visualize the situation. V(F) produces a fitness landscape over Feature Subspace. This idea of applying evolutionary dynamics to the economics of innovation is far from original. Here’s a whole blog devoted to the concept, complete with a very nice illustration:
In my model, the x and y axes would be a low-dimensional representation of Feature Subspace. The z axis would be V, the value function (aka, the “fitness function”). But this fitness function is stochastic; I imagine each of the points having a little one-armed bandit on it, with each bandit’s payoff proportional to its altitude.*
This situation differs fundamentally from the classic MAB because the payoffs of neighboring machines aren’t independent. If a particular machine has a high payoff, the machines near it are more likely to have high payoffs too. This property gives the firm some leverage in its search. If it can establish that a particular bandit has a high payoff, it knows that the bandits around it are also promising prospects. Moreover, if it can estimate the gradient around a bandit with a positive payoff, it can “hill climb” to find the highest-payoff bandit in the vicinity.
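To make the “hill climb” idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the single Gaussian hill, its location at (3, -2), the noise level, and the function names are all hypothetical, not part of the model above. The firm never sees the expected payoff directly; it only averages noisy pulls at neighboring bandits to estimate the gradient.

```python
import math
import random

def expected_payoff(x, y):
    """Hypothetical smooth landscape: one Gaussian hill centred at (3, -2)."""
    return 10.0 * math.exp(-((x - 3.0) ** 2 + (y + 2.0) ** 2) / 20.0)

def pull(x, y, rng, noise=1.0):
    """One noisy pull of the bandit sitting at (x, y)."""
    return expected_payoff(x, y) + rng.gauss(0.0, noise)

def mean_payoff(x, y, rng, pulls):
    """Average several pulls to damp the noise."""
    return sum(pull(x, y, rng) for _ in range(pulls)) / pulls

def estimate_gradient(x, y, rng, step=0.5, pulls=200):
    """Central finite-difference gradient estimate from neighbouring bandits."""
    gx = (mean_payoff(x + step, y, rng, pulls)
          - mean_payoff(x - step, y, rng, pulls)) / (2 * step)
    gy = (mean_payoff(x, y + step, rng, pulls)
          - mean_payoff(x, y - step, rng, pulls)) / (2 * step)
    return gx, gy

def hill_climb(start, rng, iterations=60, rate=0.5):
    """Repeatedly move a short distance in the estimated uphill direction."""
    x, y = start
    for _ in range(iterations):
        gx, gy = estimate_gradient(x, y, rng)
        x += rate * gx
        y += rate * gy
    return x, y
```

Calling `hill_climb((0.0, 0.0), random.Random(0))` should end up close to the hypothetical peak. With only one hill there is no local-maximum trap, so this shows the leverage that correlated payoffs provide and nothing more.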
Unfortunately, the landscape isn’t static. Remember that its structure derives from all the non-feature variables in Market Space: production technologies, customer preferences, competitor behavior, partner behavior, government regulations, environment characteristics, etc. These other variables evolve over time and the firm cannot predict or control them very well. I couldn’t find a good animation, but imagine the peaks in the above diagram gradually sinking and new peaks gradually rising.**
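In place of an animation, a rough sketch of a drifting landscape might look like the following. It is one-dimensional for brevity, and the random-walk dynamics, the peak parameters, and the `DriftingLandscape` name are my own assumptions, chosen only to illustrate peaks that sink and rise over time.

```python
import math
import random

class DriftingLandscape:
    """1-D landscape whose peak heights follow a random walk floored at zero.

    The random-walk dynamics are an illustrative assumption, not a claim
    about how real markets evolve.
    """

    def __init__(self, peaks, drift=0.1, seed=0):
        # peaks: iterable of (centre, height, width)
        self.peaks = [list(p) for p in peaks]
        self.drift = drift
        self.rng = random.Random(seed)

    def step(self):
        """Advance one period: every peak rises or sinks a little."""
        for peak in self.peaks:
            peak[1] = max(0.0, peak[1] + self.rng.gauss(0.0, self.drift))

    def expected_payoff(self, pos):
        """Sum of Gaussian bumps evaluated at position pos."""
        return sum(h * math.exp(-((pos - c) ** 2) / (2 * w ** 2))
                   for c, h, w in self.peaks)

    def best_position(self, size=100):
        """Grid position with the highest expected payoff right now."""
        return max(range(size), key=self.expected_payoff)
```

Starting from, say, `DriftingLandscape([(20, 4.0, 5.0), (70, 9.0, 6.0)], drift=0.3)`, the best position begins at the taller peak. Calling `step()` repeatedly and re-checking `best_position()` shows whether the summit has migrated.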
“Climb” as a Strategic Option
Moreover, there’s the famous problem of getting stuck at a “local maximum”: a firm hill climbs to the top of a peak but doesn’t realize there’s a higher peak just across a valley. Remember, when it searches the terrain, it has only some information about other bandits. I envision a thick fog across the landscape. So the firm can sort of feel the incline of the terrain and may catch a glimpse of what appears to be a faraway peak if the fog clears momentarily (or maybe it’s just a mirage), but it doesn’t have a satellite view.
So unlike the classic MAB, the choices aren’t just “earn” and “learn”. “Learn” actually comes in two rather different flavors, “climb” and “explore”. With “climb”, the firm estimates the local gradient based on its internal model of the terrain and evidence from recent outcomes. With “explore”, it launches a probe into pretty much unknown territory. The firm has to allocate its scarce resources across all three activities every period.
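A toy simulation of the three-way allocation could look like this. It is only a sketch under stated assumptions: a static one-dimensional landscape with two hypothetical peaks, fixed allocation weights, and the simplification that a period spent on “climb” or “explore” earns nothing. The names `run_firm`, `PEAKS`, and `sample` are made up for illustration.

```python
import math
import random

# Hypothetical landscape: (centre, height, width) of each peak.
PEAKS = [(20, 4.0, 5.0), (70, 9.0, 6.0)]

def expected_payoff(pos):
    return sum(h * math.exp(-((pos - c) ** 2) / (2 * w ** 2))
               for c, h, w in PEAKS)

def sample(pos, rng, pulls=1, noise=1.0):
    """Average of `pulls` noisy pulls of the bandit at `pos`."""
    total = sum(expected_payoff(pos) + rng.gauss(0.0, noise)
                for _ in range(pulls))
    return total / pulls

def run_firm(weights, periods=500, start=15, size=100, seed=0):
    """Simulate a firm that picks one action per period.

    weights maps "earn", "climb", and "explore" to the share of periods
    devoted to each (assumed fixed over time).
    """
    rng = random.Random(seed)
    actions = ["earn", "climb", "explore"]
    shares = [weights[a] for a in actions]
    pos, earned = start, 0.0
    counts = {a: 0 for a in actions}
    for _ in range(periods):
        action = rng.choices(actions, weights=shares)[0]
        counts[action] += 1
        if action == "earn":
            # Exploit the current position.
            earned += sample(pos, rng)
        elif action == "climb":
            # Feel the local incline: compare averaged pulls nearby.
            nearby = [p for p in (pos - 1, pos, pos + 1) if 0 <= p < size]
            pos = max(nearby, key=lambda p: sample(p, rng, pulls=20))
        else:
            # Explore: probe a random spot; jump only if it looks better.
            probe = rng.randrange(size)
            if sample(probe, rng, pulls=20) > sample(pos, rng, pulls=20):
                pos = probe
    return earned, pos, counts
```

For example, `run_firm({"earn": 0.6, "climb": 0.3, "explore": 0.1})` returns the total earned, the final position, and how many periods went to each action. Comparing different weightings on different landscapes is a crude way to poke at the heuristics discussed here, not a way to compute an optimal allocation.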
Given this problem framing, I have some initial high-level thoughts.
- With three possible classes of action instead of two, I’m somewhat confident that the firm’s problem as I’ve described it is at least as complex as the PSPACE-hard “restless bandit” (though my math isn’t good enough to prove it). So it seems rather unlikely that any firm will be able to reliably compute the optimal allocation, even with very sophisticated systems and procedures. Instead, à la Nassim Taleb, firms will likely rely on heuristics.
- The success of different heuristics will depend substantially on the topology of local peaks: how common they are, how high they are, how steep they are, how fast they sink or rise, etc. So successful heuristics may not be very portable. Strategies that work in one industry or firm may not work in another. Similarly, strategies that worked at one time in a given industry or firm may not work in the future.
- “Earn”, “climb”, and “explore” have different risk profiles. There’s some risk to earn, due to the shape of the payoff distribution at that point and the potential for that distribution to change. Climb has significantly greater risk than earn, because the firm measures the local gradient with error and because that gradient changes over time. Explore has significantly greater risk than climb because prior information is extremely diffuse. These differences present a thorny issue for compensating employees. If one person’s job involves aspects of more than one class of strategy, or if social norms inhibit vastly different compensation schemes across workers implementing different classes of strategies, the firm will find it difficult to incentivize optimal risk taking.
- Specialization could help firms overcome the problem of appropriate risk incentives. It would also allow them to accumulate capital specific to different classes of action. So it seems likely that some firms will attempt to specialize to some degree in earn, climb, or explore. This interpretation is certainly consistent with the rise of outsourced manufacturing. The brand focuses on climb and explore, while the contract manufacturer focuses on earn and climb.
- It seems like the rate of change in the landscape could be extremely important. It would probably take some very heavy-duty math and/or simulation to confirm, but my suspicion is that a higher rate of change decreases the relative payoff of earn and climb versus explore. If true, this result would help explain why the average lifespan of companies in the S&P 500 is declining.
These general insights “feel” like they match my real-world observations of firms. So I’ll press on and explore specific implications for small startups and large enterprises in the next two posts.
* Formally, I think we could model this situation by saying that V(F) is a random variable described in turn by the vector-valued function S(F). Some of the S vector’s values would characterize the functional form of the V distribution, while others would describe this distribution’s parameters. Alternatively, as an approximation, the vector could simply be the moments of V.
** Formally, I think we would express the dynamic landscape by adding a time term. So we’d have V(F,t), a random variable whose parameters are determined by S(F,t).