Sunday, August 23, 2015

The Inventory Problem

We don't quite know how many cells are in the human body, because counting them is a hard problem, but current estimates put the number in the tens of trillions, roughly two to the forty-fifth. But, even if we had a 'better number', we know that the human body is more than "just cells" ... and of course cells are more than "just molecules", molecules are more than "just atoms", and atoms are more than "just energy". We know that there is a human psychological tendency, and a strong tendency among people who think they are being 'scientific', to assert that a complex system consists of "nothing but" some studied factor in its makeup. You know the sort of thing: the human body is just water and chemicals, biology is nothing but physics, etc. We used to call this 'reductionist' and 'materialist', but I think that's giving too much credit to what amounts to blind dogmatism among people who have no sense of just how little we understand about the universe.

What's the higher-order structure 'above' the level of cells? Well, it's just anything we call a 'system' that we believe is interesting. When we explore the real body we find that the boundaries and the coherence of our chosen 'interesting system', say a kidney or an immune system, are not what we expected. But, then, we should expect such surprise. Humans perceive certain things in certain ways, often in several conflicting ways, and when we decide to look at 'the visual system', or 'the nose', we are making use of some still mysterious human faculty that assigns importance to certain aspects of its environment. On examination, we are always surprised, because our unexamined intuition tends to be wrong. In fact, even our attempts to divorce ourselves from our intuition tend to be heavily suffused with human tendencies. It's a tough game to find out what's going on outside of your own perception, to turn the things your intuition 'knows' into mysteries about what's 'out there'. But that's natural science. It's hard work, especially when we're dealing with complex systems.

This means there really is no ontology of the body which is more than a kind of convenience, so that we can talk about what we're studying. Ontology is important in science, because these are our current assertions about what exists in the world. Of course they are constantly changing, and much of an ontology is itself based on tacit knowledge that we do not understand, which is why I don't think many kinds of science make sense without a parallel study of human psychology. At any stage, our assertions are still human mental constructs. They may be more enlightening, better integrated with other theories, or more carefully constructed to avoid unnecessary stipulations, but they are only 'better'. They aren't 'complete'. Ontologies are works-in-progress, at best.

This epistemological story can be found everywhere in computing. Let's take the issue of testing a software system. In our example, let's say that we primarily care about a consistent user experience, and so the tests take place against the user interface. What is the inventory of features against which we are testing? It certainly is not the set of features we set out to build in the first place: in order to make a good product, we had to change these. The closest thing to an accurate description of the final system is the work done by the documentation team. If you have such a thing. The team has used human judgment to decide what is important for someone learning about the system. They have organized what they consider to be the 'features' of the system, and explained their purpose and behaviors, as best as they could. This is the closest thing the software company has to an inventory of features and properties against which a QA team can build a testing system. In a system where the interface is everything, and there are a lot of systems like that, and a lot of systems that should be considered like that but aren't, the only way to build reasonable tests is after-the-fact. There is a discipline called 'test-driven development', but this is only appropriate to certain internal aspects of the system; it cannot address the 'logic' that is 'externalized' for the users. There is no such logic in the code. It's a perception of the system, used to guide its development.
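The distinction above can be made concrete with a small sketch. The function and its rule here are hypothetical, not from any real system: test-driven development works well for internal logic like this, where the 'correct' behavior can be pinned down before the code exists, and not for judgments like "the interface feels consistent".

```python
# A minimal sketch (hypothetical names). TDD suits internal contracts
# like this one, where correctness can be stated before coding begins.

def normalize_username(raw):
    """Internal rule: trim whitespace, lowercase, collapse inner spaces."""
    return " ".join(raw.split()).lower()

# Tests written first, TDD-style, against an internal contract:
assert normalize_username("  Ada Lovelace ") == "ada lovelace"
assert normalize_username("ADA") == "ada"

# By contrast, there is no comparable internal contract for "the user
# interface is consistent"; such judgments can only be encoded after
# the fact, by humans observing the running system.
```

The point is not that the unit test is bad; it is that nothing like it exists for the externalized 'logic' users perceive.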

If this is true, there is no way to take a 'feature inventory' from within the software. The best one can do is study the user-interface, find out how it responds, talk to developers and product designers to work out their intentions when they're unclear, and keep a coherent-looking list that is easy-to-understand. This is literally not an inventory in any mechanistic sense. It is a thorough set of very-human judgments upon something that others have created.

The 'inventory' will be acceptable, and have descriptive adequacy, when the appropriate group of people can understand it. This might be a very different inventory for a quality-assurance team than for a training team or a support team. There are things the designers and the engineers find important that produce yet another 'inventory'. There are other kinds of inventories, for accessibility issues. The best you can do, in all these cases, is the most human job you can do, to explain the right things to the right audience. The idea that there is any kind of 'logically correct' software, achievable without human judgment, is absurd. A person needs to judge what is correct! We couldn't do any of this work without human judgment.

Because of this epistemological fact, we rarely have the time for inventories of features. Instead, we look to eliminate 'problems', humanly judged, and polish the software system until it makes sense and does what the team and the users want it to do. The task of describing it, of explaining it, is done in a minimally descriptive way, taking advantage of innate and learned human understanding, and the ability of users to explore things for themselves. The quality-assurance team finds some set of tests that satisfies them: tests for problems that have been fixed, and regression tests to make sure the problems don't recur. The notion of a 'complete' description of the system is considered 'just too much work', when, in fact, a 'complete description' is impossible. Such a description cannot exist; it can only be adequate to our current purposes.
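A minimal sketch of the QA practice described above, with a wholly hypothetical function and bug: once a humanly judged 'problem' is fixed, a regression test pins the fix in place, without any pretense of a complete description of the system.

```python
# Hypothetical example: an earlier version of this function dropped
# the final, partial page. The regression test keeps that bug fixed.

def paginate(items, page_size):
    """Split items into consecutive pages of at most page_size items."""
    return [items[i:i + page_size] for i in range(0, len(items), page_size)]

# Regression test: the last partial page must not be silently dropped.
assert paginate([1, 2, 3, 4, 5], 2) == [[1, 2], [3, 4], [5]]

# Edge case reported by a user, humanly judged important enough to pin:
assert paginate([], 2) == []
```

Note that the test suite describes only the problems people have noticed and cared about, which is exactly the 'adequate to our current purposes' stance.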

This epistemological problem shows up in simpler ways. One approach to preventing virus infection in computers is to add to a growing 'blacklist' of behaviors or data that indicate an 'infection'. The other approach is to make a 'whitelist': only these operations should be possible on the system. The list is only expanded when you want to do something new, not when someone else wants to attack you. The whitelist, in effect, sidesteps the inventory problem: you never need a complete catalogue of possible attacks, only a catalogue of your own intentions.
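The two postures can be sketched in a few lines. The operation names here are hypothetical; the structural point is that the blacklist must grow whenever attackers invent something, while the whitelist grows only when we decide to do something new.

```python
# Hypothetical operation names illustrating the two security postures.

BLACKLIST = {"format_disk", "overwrite_boot_sector"}  # known-bad, ever-growing
WHITELIST = {"read_file", "write_file", "print"}      # known-good, deliberate

def blacklist_allows(op):
    # Permits anything not yet catalogued as bad.
    return op not in BLACKLIST

def whitelist_allows(op):
    # Permits only what we have deliberately chosen to permit.
    return op in WHITELIST

# A never-before-seen operation slips past the blacklist, not the whitelist:
assert blacklist_allows("exfiltrate_keys") is True
assert whitelist_allows("exfiltrate_keys") is False
```

The blacklist is an attempted inventory of the attacker's world; the whitelist is an inventory of our own, which is the only one we can actually hope to maintain.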

Even more, it's reminiscent of the difference between natural science and natural history. Natural history, zoology in its older form, and even structuralism, are about cataloguing and classifying things in nature. Explain why things are the way they are? That's natural science. In derogatory terms, natural science looks to generalize and idealize and abstract, ignoring as many differences as possible. Natural history embraces diversity, and is more like butterfly collecting-and-organizing. In general, we need integrated approaches that allow for collecting diverse facts in the context of an ever-improving explanatory theory.

Approaches to building software are ever-expanding, and we are spending no effort trying to understand why, primarily because computer science is not a natural science, and doesn't approach the problem of explaining why things are one way, and not another. Most of the answers to those questions lie in a study of the human mind, not in a study of the machines that humans build. Studying software without studying cognition is like studying animal tracks without studying the animals.

Thursday, August 20, 2015

The Wrong Tools

We are using the wrong tools to program, and the wrong criteria to judge good programs, and good programming practices. These bad practices, and bad approaches to thinking about the nature of programming, have emerged together over the last 70 years.

Our first mistake is the emphasis on code itself. I understand how high-level languages can seem very empowering, and so it seems natural that 'polishing code' is a means of achieving quality, and 'code standards' are means to improve group collaboration.

But even though these are accepted practices, they are not correct. When we make any improvements at all, we are not actually following these practices. What we are actually doing is not even a topic of discussion, and it's not examined in any kind of computing institution, academic or industrial. This is true despite the fact that everyone with any experience or sensitivity is absolutely certain that there's some fundamental expressiveness problem, on which they can't quite put their finger.

Let's say that code has two purposes.

On one hand, we have built machines and programs that can read the code, which does something, based on various kinds of agreements, mostly unspoken, mostly not understood, not studied, not explicit, and incomprehensible, but which maintain an illusion of explicitness, precision, consistency, and predictability -- probably because there are symbols involved, and we instinctively tend to respect symbols and construe them as meaningful in and of themselves. 

The other purpose of code is to express and explain your thoughts, your hopes, and your desires, to yourself, and to your colleagues, specifically regarding what you would like this system of loose engineering agreements to do in all kinds of circumstances.

At the heart of both of these 'uses of code', the operational and the expressive, are human meaning and ideas. These are not understood by the machine, in any sense. We take subsets of human ideas and create "instructions" for "operations" in the machine that in some way are reminiscent of these ideas, usually highly-constrained in a way that requires a great deal of explanation.

That's on the operational side! This is just as true on the expressive side, where we have new ideas that we are trying to express in highly-constrained ways that still can be read by humans, on these interlocking systems and platforms of strange high-constraint ideas. And of course -- most of you can guess -- these "two purposes" of code really are the same, because most programmers build various layers of code that are essentially new machines that define the operational platform on which they then express their application ideas.
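A toy illustration of the layering described above, wholly hypothetical: first we build a tiny 'machine' in code, and then we express an 'application' idea in the little language that machine defines, rather than in the host language itself.

```python
# A minimal "machine": each instruction name maps to a transformation.

def run(program, value):
    """Interpret a list of instruction names over a starting value."""
    ops = {"DOUBLE": lambda v: v * 2, "INC": lambda v: v + 1}
    for instruction in program:
        value = ops[instruction](value)
    return value

# An "application" expressed on the new platform, not in raw Python:
double_then_bump = ["DOUBLE", "INC"]
assert run(double_then_bump, 3) == 7
```

Even at this toy scale, the program `["DOUBLE", "INC"]` only means anything relative to the layer beneath it, and that layer only means anything relative to the intentions of whoever built it.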

This means that the code is the least important part of computing. The mind-internal meaning of all these constraints needs to be explained so that they can inspire the correct meaning in the minds of the humans taking various roles relative to various parts of the software.

Without explanation, code is meaningless. Without human minds, symbols are meaningless. Code is "operational", so we are fooled into thinking the meaning is 'in the machine'. But that meaning is also in the heads of people, who either made the machines or use them.

If this is true, then good explanation -- explanation that is genuinely useful, which genuinely makes the life of people involved easier and better -- is the heart of computing. This needs to be recognized, emphasized, and facilitated. Code is merely a kind of weak shorthand that we use, badly, to pass hopeful, incoherent indications to artifacts that other people have created.

Existing formal languages and their tools -- based on a uselessly-constrained approach to symbolic representation -- are woefully inappropriate for this, based as they are on a rather trivial formal definition of computation, which has been accepted because of the rather amusing belief -- no more than an unexamined dogma -- that anything can be 'precisely described' with symbols, boolean logic, parameterized functions, and digital representations. None of this is "true", or even sensible, and the belief in these dogmas shows how far computing is from the natural sciences.

In the meantime, we need new programming tools with which we can more completely express new concepts, and more easily express our feelings and desires.