Prowling the Digital Forest: 2008-04

2008-04-22

Legislative Programming

Don Knuth's Literate Programming has shown us how to match the way a program's entrails are documented with the way the programmer thinks of them, time-wise. When the programmer thinks well, this can result in better programs than what would happen when the programmer would strictly follow the language-compelled structure. Unfortunately, the original WEB's paradigm comes from a simpler era, when a program was just a program, and the programmer's primary work was to write the program.

But within the intervening 30 years, the scope of a programmer's work has widened considerably. Programmer doesn't just write The Program anymore; programmers develop systems via applying the art of programming. This means they also:

design inter-program interfaces
write add-ons to existing programs
even write throwaway code for the express purpose of generating "real" code
construct white box test cases

All of that work is a part of programmer's work, may introduce complicated constructs that should be documented, and merits documented along with The Program itself. Thus, while Knuth's original intention --

The practitioner of literate programming can be regarded as an essayist, whose main concern is with exposition and excellence of style. Such an author, with thesaurus in hand, chooses the names of variables carefully and explains what each variable means. He or she strives for a program that is comprehensible because its concepts have been introduced in an order that is best for human understanding, using a mixture of formal and informal methods that reinforce each other.

-- is good, it is not complete enough.

Given that the scope of a programming job is considerably wider than the construction of a source file, I believe the scope of a literate programming document should be expanded to transcript of whatever the programmer needs to do to resolve this problem. The main difference from a mere log of programmer's activities would be that the document wouldn't need to describe false starts or bugs in detail, although it could caution against the more devious ones. Because such a document could easily be thought of in the style of An Act of the College of Programmers to Develop a System to Gleefully Greet the Globe, I'm jokingly calling this Legislative Programming for now, but a better name will probably be found eventually.

A legislative programming system should be capable of both tangling and weaving, as is Knuth's original system and almost all of its known descendants. However, it should go somewhat further:

It should allow specifying how external commands -- such as the C translator -- are to be invoked. Integration with make or an analogous tool might be desirable. This would define a process of executing the job; the results of such an execution would be all the files the programmer sought to produce -- including the executable, possibly within a container such as JAR or WAR for Java projects.
It should allow specifying several text files in various languages. Many WEB's descendants, such as noweb, already do this.
While it's essential that the programming job be weavable into a single, all-encompassing architectural document, the system should also support creation of daughter documents which should, of course, appropriately appear in the architectural document. Such documents should be used to document specific interfaces of the software system in question -- such as the command line, any configuration parameters or extensibility.
It should provide a convenient approach for building on earlier projects. WEB's change mechanism solves a special case of this problem, offering a way to adapt the underlying system to specific platforms. When a Program Is Just a Program, but there are many systems, this is appropriate. However, by 2008, everybody and his grandma runs some sort of POSIX/Java/ANSI C capable system, which provides for fairly common shared platforms, so adaptation is no longer the primary way of code reuse. Nowadays, the primary ways of code reuse are software extension and aggregation, and neither is easy (or even appropriate) to do via WEB's change mechanism.
It should provide a convenient approach for building on later projects. For example, consider Forth, a language commonly used for software extensibility. Let's suppose I've built a framework providing the common Forth services. A later project exploiting this framework would necessarily want to add new variables, words or perhaps a new interface to virtual files. To that end, the framework project should be structured to use clean interface for such new definitions, and it's the responsibility of the legislative programming system to facilitate this. The best way to attain this I currently know involves treating every 'programming job' as an object of a Self-like programming language, and defining composition and prototype-based "inheritance" on these objects.

In Knuth's system, there is a considerable emphasis on prettyprinting. While it's a useful feature, I do not consider it essential, as its influence on the way the programmer works is not very significant. Furthermore, supporting a wide number of languages makes proper prettyprinting expensive to implement.

I used to consider it important to reveal the structure of the literate programming 'source' file through a DOM-like API to client code, but experience has now convinced me this, too, is a luxury of minor practical utility. Besides, most benefits of such a system can be achieved by a good throwaway script support.

2008-04-17

How to fix Wikipedia, part 1: edit wars

(This post is a part of the series How to fix Wikipedia.)

What's the problem?

From the horse's mouth:

Edit warring occurs when individual editors or groups of editors repeatedly revert content edits to a page or subject area. Such hostile behavior is prohibited, and considered a breach of Wikiquette. Since it is an attempt to win a content dispute through brute force, edit warring undermines the consensus-building process that underlies the ideal wiki collaborative spirit.

Analysis

The necessary ingredients for a sustained edit war are:

There are two or more persons or factions with strong yet conflicting beliefs about a particular topic.
There's only one article per point of conflict to contain the belief.
The persons or factions involved persist.
Wikipedia's policies deliberately lack a terminal resolution mechanism for such conflicts.

Removing any of these four factors would make edit wars unsustainable. For example, removing The Other Factions through blocks and bans would guarantee a particular side a victory, and end the particular edit war. (This is a popular pastime for many Wikipedians, and a fundamental reason for most of the Wikipedia politicking.) Of course, this solution has the important downside of reducing the pool of Wikipedians.

Interesting previously used solutions

As I pointed out above, many people on Wikipedia like to use the solution of getting The Other Faction blocked or banned.

A more "orderly" approach sometimes used on Wikipedia is page protection. A well-known quip is that protection results in the Wrong Version of the page remaining unchanged and unchangeable for extended periods of time.

Some people have forked Wikipedia in its entirety, trying to retain the mission, impose their additional guidelines or rules, and replace the content. Consider Conservapedia and Veropedia, for example.

Citizendium is another fork that involves thorough redesign of the underlying sociopolitical system.

My solution

I believe the best way to permanently end edit wars, with the least actual downsides, is to discard the concept of article uniquity, and replace it with a fork-at-will scheme. This would mean:

For every article there is, every user can create a new version he believes is improved over the previous version.
Such a new version would not replace the previous version but appear parallel to it.
Every reader would be able to see all the outstanding branch versions.

An obvious problem with implementing only this bare minimum is that there are a lot of articles, and many of them have a lot of versions -- and this can easily lead to information overload. This can be tackled in two ways:

Machine-assisted comparison: the engine should provide a clear and readable overview on how do the versions differ.
Personal approvals and network trust: users should be able to indicate which version (or which versions) they believe is best. Furthermore, the engine should enable automated processing of the social trust network by allowing users to delegate preference to other people whom they trust to make good decisions. This formal preference should give the engine data it could use to prioritise the branches by relevance, and according to the particular user's preferences.

This would result automatic formal factionalising on all controversial topics, allowing each side to work on a version of their preference on any article without disruption from other sides. Furthermore, because the trust network is formalised and accessible to the engine, the engine might be able to assess the sizes of various factions, the similarities and differences between various people, and even match people with similar understandings on What Should Be in the Wiki. Like Amazon's "Other people who bought these books also bought ..." feature, the wiki engine could offer users "Other people who approved these versions also trusted these people ..." feature.

Solved issues

This approach obsoletes page protection and replaces it by withholding approval of new version. If such withholding is automatic unless explicit (although possibly indirect) trust has been invoked, it also obsoletes the weird social construct of vandalism patrol as nobody would approve random vandalism (or if they did, it would be easy to distrust them). Of course, organised trolling groups such as GNAA might be able to pull of more notable things, but again, it would be easy to keep them off your network of trust. In essence, every singular troll would become a faction of one, and organised trolls would just become their own factions.

This approach allows the users the benefits of their information being properly gatekept, but does it without forcing any gatekeeper and his biases on any user. Everybody who is unsatisfied with the articles there are could become a gatekeeper of their own, or nominate a gatekeeper for them.

One persistent kind of conflict in Wikipedia is the continuous battle between the so-called 'deletionists' and 'inclusionists'. One camp wants to enforce a high bar of entry for Wikipedia's articles; the other, in turn, wants to enforce a very low bar of entry. If we recognise "this article shouldn't appear as all" as one possible versions, allowing people to "prefer not having an article", we should be able to make both inclusionists and deletionists if not happy then at least content.

When Wikipedia started, it imported the bulk of the 1911 edition of Encyclopedia Britannica. This has had a number of interesting consequences, and some problems. Similarly, when Citizendium started, it pondered for quite a while on whether to start from a carte blanche or import the then-current state of Wikipedia. A wiki engine conducive to branching would solve this dilemma by offering the possibility of including any earlier encyclopedic work as one fork. Where, say, the 1911 Encyclopedia of Britannica is still adequate, people could check it over and approve its version; where it isn't -- such as nuclear physics --, new articles could be written, and the knowledgeable people's approval of these over the old encyclopædia.

Downsides

Fork-at-will will lead to compartmentalisation among people, not consensus. On its face value, it's true that such a proposal will enable high level of compartmentalisation or many parallel Walled Gardens. However, the ability to fork is not the cause of that.

We already live in a world of many, sometimes conflicting, beliefs. There have already been forks of Wikipedia. It appears it's unreasonable to expect for, say, a staunch Muslim and a devoted Catholic to find a consensus on the position of Jesus' status. At best, they may be able to agree to disagree, but this is not always the case -- especially on Wikipedia. Forking-at-will will facilitate such agreements to disagree, by allowing people of different beliefs to have their own -- necessarily different -- wiki in the same space.

Furthermore, when it is possible for different factions to work out a consensus -- for example, by developing a joint version of an article that would discuss all involved sides' approaches --, I fully believe they'd do it even with fork-at-will. (For example, it's not inconceivable that a Catholic and a Presbyterian will find a consensus on Jesus' status, even if they differ widely on the relevance of Mary's sainthood.) Thus, I wouldn't say that fork-at-will leads to new compartmentalisation; instead, it will provide a new layer of flexibility.

Conspiracy theorists, pseudoscientists, and other fringers will be able to push their message. To an extent, this is true. However, to an extent, this is also an inevitable aspect of crowd-sourcing. In consolation, the non-conspiracy theorists will always be able to disapprove of conspiracy theories, and will not have to suffer from them. For pseudoscience, I anticipate a science team arising to check out on science articles, approve the good ones, disapprove the bad ones, and allow other people to trust them. Should -- hypothetically -- the team somehow be overtaken by, say, homœopaths or other woos, the science-minded people's trust in them would wane, and be carried over to a new and better science team.

Unsolved issues

How best represent trust? I believe this is the issue that will require most further thought (and careful experimentation) in this whole solution. In the beginning, I would give each user the buttons "I consider this version good" and "I consider this version bad" as well as "I expects this user's new edits will be good/bad" and "I trust this user to have good judgement on article goodness". First further developments would allow users to form groups, to trust groups, and to limit trust to particular topics ("I trust this user to have good judgement on article goodness for astronomy-related articles"). Further developments should depend on how well this works in practice.

What to offer to random passersby? One of the strong points of Wikipedia is that it makes a handy one-stop link for bloggers, making its articles very high among most search engine's results. This strength depends, to a considerable degree, on article uniquity, and discarding this uniquity may thus handicap the solution. However, considering that we don't have to discard the uniquity of article topics but only of the best versions, I suspect this problem can be overcome in a combination of two approaches. First, as I mentioned before, the software should be able to easily point out the differences between major versions. (I admit that 'major' is somewhat vague here, but 'approved by largest amount of users' should make a good first approximation until we figure out a better approach.) Second, once we have groups of users and group approvals, it'll be easy for the link author's to specify which group's version he wants to link to. Thus, WingNutDaily could always link to the Conservapedia-like version of the article on, say, evolution.

Social issues

If the basic mission of Wikipedia is to be followed, initial groups on natural sciences, arts, and history should be set up. Of course, the spirit of non-exclusivism will allow other groups of similar scope to also be formed. For example, there will almost certainly be a general pseudoscience and ufology group.

Coöperation between various factions will still be useful, but since non-exclusivism will avoid forcing it on people, the social system may have to provide stronger incentives towards coöperation.

If the system becomes popular, there will probably be quite a number of groups of different structure.

Depending on the way groups are formed, some may become suspectible to hostile takeovers. Experience will hopefully teach us how to deal with this danger.

How to fix Wikipedia, part 0

Wikipedia being broken beyond repair, it can't be fixed, per se. However, a thorough understanding of how exactly it is broken may aid people in developing a better replacement.

I have studied a number of incidents, and I think I now understand a large part, although not yet all, of the problems that make Wikipedia suck. For some, I can recommend remedies; for some -- not yet. Although a lot of the problems are social or epistemological (or misological, as the case may be), and not technical, my focus is on technical solutions. Some of them should be able to fix some of the social problems, but there are still issues worth considering.

I'm planning to cover roughly these topics:

Edit wars
Class conflicts
Misgovernance
Cabalism
Scope of crowd-sourcing
Lack of gnomes
Fringe faiths
The fallacy of consensus
Systemic misology
Consistency

However, I'm not yet done with all the writing, and the structure may yet to change. This post should become an index, though.

2008-04-08

How to Buy a Standard in 10 Days

A nice overview of the whole OOXML saga.

Prowling the Digital Forest