"Sparse coding" (SC), cf. Olshausen & Field, is a widely embraced and, at some level, clearly correct principle of intelligence. However, virtually all instantiations of SC are couched in terms of optimization and involve adding a sparsifying term to the objective function. Despite its popularity and successes, this means that computation must be expended specifically for sparsification on every evaluation of the objective, during learning and inference.
Wouldn't it be far more efficient if zero computational time / power were expended explicitly on achieving / enforcing sparsity? In other words, why not structurally (architecturally) impose a fixed sparseness? There is one model (literally, one model, to my knowledge) that does this. It is the sparse distributed representation (SDR) based model, Sparsey, in which the SDR coding field consists of Q (e.g., Q=100) winner-take-all (WTA) modules, each having K (e.g., K=20) binary units. Thus, every code that ever becomes active in the coding field, during learning or inference, consists of exactly Q active units (one per module, out of Q×K total). No computation is ever expended on handling/enforcing sparsity for the lifetime of the system. Sparsey's learning method discovers the similarity/statistical structure of the input domain via an unsupervised, single-trial learning protocol. In fact, that statistical structure is automatically embedded as the pattern of intersections of the SDR codes, as a computationally free side effect of the act of storing memory traces of the individual inputs experienced. It is not an optimization approach.
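Here is a minimal NumPy sketch of the architectural idea only. It is not Sparsey's actual code-selection procedure, and the random weights W and input size N_IN are placeholders for what Sparsey would learn; the sketch exists purely to show that with one winner taken per module, every code has exactly Q of the Q×K units active by construction, with no sparsity term evaluated anywhere:

```python
import numpy as np

rng = np.random.default_rng(0)
Q, K, N_IN = 100, 20, 256      # Q WTA modules x K binary units; N_IN is illustrative
W = rng.random((Q, K, N_IN))   # random stand-in for weights Sparsey would learn

def encode(x):
    # Structural sparsity: take exactly one winner per WTA module.
    # The code is Q-of-(Q*K) sparse by construction -- no penalty
    # term, no thresholding, no extra computation for sparsity.
    scores = W @ x                       # (Q, K) input summations
    winners = scores.argmax(axis=1)      # WTA step within each module
    code = np.zeros((Q, K), dtype=bool)
    code[np.arange(Q), winners] = True
    return code

def overlap(code_a, code_b):
    # Intersection size of two codes. With learned weights, more
    # similar inputs map to codes with larger intersections, which
    # is how similarity structure gets embedded as a side effect
    # of storing the individual traces.
    return int(np.count_nonzero(code_a & code_b))

x = rng.random(N_IN)
assert encode(x).sum() == Q   # exactly Q active units, always
```

The assertion at the end is the whole point: the Q-of-Q×K sparsity is an invariant of the architecture, not a property that an objective function has to maintain.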
Whether or not Sparsey is the only model that requires no time/power expenditure for enforcing sparsity (and thus ultimately for compression), the larger point is simply that structural enforcement of sparsity is a potentially rich resource to exploit, and one which to my knowledge is absent from the mainstream deep learning literature.
There is a deeper point here. The intent of SC is to force the model towards more compressed explanations of the data, i.e., smaller numbers of latent variables (cf. principal components, factors, causes). However, to my knowledge, all current SC models are in fact localist models, i.e., they end up with 1-to-1 associations of latent variables to units. That is, while whole inputs are typically described as having distributed representations, the latent variables underlying / composing those whole inputs are in fact represented localistically. Therefore, I would claim that the representations learned by existing SC models are not distributed but rather compositional, i.e., distributedness is being conflated with compositionality. In its original intention, distributedness meant sub-symbolic. However, the latent variables, or factors, produced by SC correspond to exactly the kinds of concepts/regularities to which names, i.e., symbols, are typically assigned in language.
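A toy contrast, with purely illustrative indices and sizes, may make the distinction concrete. In an SC code, the support of the code vector reads as a list of discrete, nameable factors, i.e., a composition of symbols; in an SDR, no single unit names anything, and graded similarity lives in the size of code intersections:

```python
import numpy as np

# Localist / compositional: each active index IS one nameable factor.
z = np.zeros(512)
z[[17, 203]] = [1.3, 0.6]             # "atom 17" + "atom 203" are present
factors_present = np.flatnonzero(z)   # the code reads as a list of symbols

# Distributed (SDR-style): meaning is carried by the whole Q-unit
# pattern; no individual unit denotes a factor, and similarity between
# two inputs is the graded overlap of their codes (see overlap() above).
```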