Prowling the Digital Forest

2009-12-15

Abkhazian independence marches on

Today, the independence ambitions of Abkhazia and South Ossetia were further validated when Nauru recognised their sovereignty for Russian humanitarian aid worth $50M, Telegraph writes. This brings the count to a whopping four after Russia, Venezuela and Nicaragua.

Nauru is known for its economic difficulties ever since it mined off and sold all its guano, and has been trying to find alternative sources of revenue for decades.

2009-11-20

Bullypedia

Interesting blog post by Gene McKenna: Bullypedia, A Wikipedian Who's Tired of Getting Beat Up. The topic is Wikipedia's treatment of newbies, and some supporting evidence has been collected in this on-Wikipedia experiment.

In a related item, I asked a few friends whether and how the Kick Ass Curve, a construct originally proposed for assessing coolness of programming languages, could be applied to people's Wikipedia careers. An insightful response came from Alexia Death, and here it is:

Nope. Wickipedi[a] does not start with "this sucks" phase. It starts with "WOW, awesome. I can do things here!" evolves to "Ok, there's some catches, but I can cope." and ends in "This place is f*cking nuts. I quit."

Evidently, Wikipedia's culture has descended into a Manichean battle between "us defenders of Wikipedia" and "them newbies, presumed vandals". Because only people who aren't chased away speak up in Wikipedia, there is a bias encouraging misidentification of newbies as vandals and chasing them away before they become visible. Moreover, people who chase away newbies get socially rewarded for "standing up to them vandals".

2009-10-21

Forking is one of the earliest traditions of Wikipedia

It is news to many people, but Wikipedia has been forking from the very beginning. One of the earliest forks, and a reasonably successful one at that, is the Enciclopedia Libre Universal. The ELU forked from a Spanish Wikipedia in early 2002, about a year after Wikipedia's official launch, and while people on both side of the chasm have been occasionally raising the discussion about merging, it has so far remained only an object of discussion.

It might be illuminating that Wikipedia's related article, [[Enciclopedia_Libre_Universal_en_Español]], mentions as the first factor that "may contribute to a merger" the creation of Wikimedia Foundation. This event happened years ago in the middle of 2003 and, interestingly, may have been prodded by ELU's split as can be seen from an old speculative list post and Mister Wales' followup confirmation.

2008-09-25

Implementing imperation in Prolog

Two blog posts: this presents the core of a simple Algol-like language implemented in Prolog, and this attempts to fix errors in it.

2008-05-29

Another step towards human-faced Java: single binary

I've found a nice way to produce classical 'single binary' applications developed in Java without sacrificing their portability.
Java apps are typically packaged as JAR files. Format-wise, a JAR is actually a ZIP. ZIP file format is end-anchored; a ZIP file remains processible if random bytes are prepended to it.
Therefore, if we package a Java application as a standard standalone JAR file, make sure the Main-Class manifest attribute is properly set up, and prepend these two lines to the file:

#! /bin/sh
exec java -jar "$0" "$@"

we get a file that is, at the same file, a JAR containing all the original content, and a shell script that, when executed, invokes JVM instructing it to then interpret the JAR as a standard Java app. It also passes on all the original arguments and the exit code of the JVM, as expected.
The result behaves pretty much like a standard binary. The primary restrictions are that JRE needs to be installed in a PATH-findable way, and suid doesn't work on such pseudo-binaries on most systems. The primary advantage is that such a binary does not depend on the host architecture or even the OS (as long as it is a Unix, that is). An interesting secondary advantage is the standard resource handling mechanism inherent in the concept of using JARs for applications.

2008-04-22

Legislative Programming

Don Knuth's Literate Programming has shown us how to match the way a program's entrails are documented with the way the programmer thinks of them, time-wise. When the programmer thinks well, this can result in better programs than what would happen when the programmer would strictly follow the language-compelled structure. Unfortunately, the original WEB's paradigm comes from a simpler era, when a program was just a program, and the programmer's primary work was to write the program.

But within the intervening 30 years, the scope of a programmer's work has widened considerably. Programmer doesn't just write The Program anymore; programmers develop systems via applying the art of programming. This means they also:

design inter-program interfaces
write add-ons to existing programs
even write throwaway code for the express purpose of generating "real" code
construct white box test cases

All of that work is a part of programmer's work, may introduce complicated constructs that should be documented, and merits documented along with The Program itself. Thus, while Knuth's original intention --

The practitioner of literate programming can be regarded as an essayist, whose main concern is with exposition and excellence of style. Such an author, with thesaurus in hand, chooses the names of variables carefully and explains what each variable means. He or she strives for a program that is comprehensible because its concepts have been introduced in an order that is best for human understanding, using a mixture of formal and informal methods that reinforce each other.

-- is good, it is not complete enough.

Given that the scope of a programming job is considerably wider than the construction of a source file, I believe the scope of a literate programming document should be expanded to transcript of whatever the programmer needs to do to resolve this problem. The main difference from a mere log of programmer's activities would be that the document wouldn't need to describe false starts or bugs in detail, although it could caution against the more devious ones. Because such a document could easily be thought of in the style of An Act of the College of Programmers to Develop a System to Gleefully Greet the Globe, I'm jokingly calling this Legislative Programming for now, but a better name will probably be found eventually.

A legislative programming system should be capable of both tangling and weaving, as is Knuth's original system and almost all of its known descendants. However, it should go somewhat further:

It should allow specifying how external commands -- such as the C translator -- are to be invoked. Integration with make or an analogous tool might be desirable. This would define a process of executing the job; the results of such an execution would be all the files the programmer sought to produce -- including the executable, possibly within a container such as JAR or WAR for Java projects.
It should allow specifying several text files in various languages. Many WEB's descendants, such as noweb, already do this.
While it's essential that the programming job be weavable into a single, all-encompassing architectural document, the system should also support creation of daughter documents which should, of course, appropriately appear in the architectural document. Such documents should be used to document specific interfaces of the software system in question -- such as the command line, any configuration parameters or extensibility.
It should provide a convenient approach for building on earlier projects. WEB's change mechanism solves a special case of this problem, offering a way to adapt the underlying system to specific platforms. When a Program Is Just a Program, but there are many systems, this is appropriate. However, by 2008, everybody and his grandma runs some sort of POSIX/Java/ANSI C capable system, which provides for fairly common shared platforms, so adaptation is no longer the primary way of code reuse. Nowadays, the primary ways of code reuse are software extension and aggregation, and neither is easy (or even appropriate) to do via WEB's change mechanism.
It should provide a convenient approach for building on later projects. For example, consider Forth, a language commonly used for software extensibility. Let's suppose I've built a framework providing the common Forth services. A later project exploiting this framework would necessarily want to add new variables, words or perhaps a new interface to virtual files. To that end, the framework project should be structured to use clean interface for such new definitions, and it's the responsibility of the legislative programming system to facilitate this. The best way to attain this I currently know involves treating every 'programming job' as an object of a Self-like programming language, and defining composition and prototype-based "inheritance" on these objects.

In Knuth's system, there is a considerable emphasis on prettyprinting. While it's a useful feature, I do not consider it essential, as its influence on the way the programmer works is not very significant. Furthermore, supporting a wide number of languages makes proper prettyprinting expensive to implement.

I used to consider it important to reveal the structure of the literate programming 'source' file through a DOM-like API to client code, but experience has now convinced me this, too, is a luxury of minor practical utility. Besides, most benefits of such a system can be achieved by a good throwaway script support.

2008-04-17

How to fix Wikipedia, part 1: edit wars

(This post is a part of the series How to fix Wikipedia.)

What's the problem?

From the horse's mouth:

Edit warring occurs when individual editors or groups of editors repeatedly revert content edits to a page or subject area. Such hostile behavior is prohibited, and considered a breach of Wikiquette. Since it is an attempt to win a content dispute through brute force, edit warring undermines the consensus-building process that underlies the ideal wiki collaborative spirit.

Analysis

The necessary ingredients for a sustained edit war are:

There are two or more persons or factions with strong yet conflicting beliefs about a particular topic.
There's only one article per point of conflict to contain the belief.
The persons or factions involved persist.
Wikipedia's policies deliberately lack a terminal resolution mechanism for such conflicts.

Removing any of these four factors would make edit wars unsustainable. For example, removing The Other Factions through blocks and bans would guarantee a particular side a victory, and end the particular edit war. (This is a popular pastime for many Wikipedians, and a fundamental reason for most of the Wikipedia politicking.) Of course, this solution has the important downside of reducing the pool of Wikipedians.

Interesting previously used solutions

As I pointed out above, many people on Wikipedia like to use the solution of getting The Other Faction blocked or banned.

A more "orderly" approach sometimes used on Wikipedia is page protection. A well-known quip is that protection results in the Wrong Version of the page remaining unchanged and unchangeable for extended periods of time.

Some people have forked Wikipedia in its entirety, trying to retain the mission, impose their additional guidelines or rules, and replace the content. Consider Conservapedia and Veropedia, for example.

Citizendium is another fork that involves thorough redesign of the underlying sociopolitical system.

My solution

I believe the best way to permanently end edit wars, with the least actual downsides, is to discard the concept of article uniquity, and replace it with a fork-at-will scheme. This would mean:

For every article there is, every user can create a new version he believes is improved over the previous version.
Such a new version would not replace the previous version but appear parallel to it.
Every reader would be able to see all the outstanding branch versions.

An obvious problem with implementing only this bare minimum is that there are a lot of articles, and many of them have a lot of versions -- and this can easily lead to information overload. This can be tackled in two ways:

Machine-assisted comparison: the engine should provide a clear and readable overview on how do the versions differ.
Personal approvals and network trust: users should be able to indicate which version (or which versions) they believe is best. Furthermore, the engine should enable automated processing of the social trust network by allowing users to delegate preference to other people whom they trust to make good decisions. This formal preference should give the engine data it could use to prioritise the branches by relevance, and according to the particular user's preferences.

This would result automatic formal factionalising on all controversial topics, allowing each side to work on a version of their preference on any article without disruption from other sides. Furthermore, because the trust network is formalised and accessible to the engine, the engine might be able to assess the sizes of various factions, the similarities and differences between various people, and even match people with similar understandings on What Should Be in the Wiki. Like Amazon's "Other people who bought these books also bought ..." feature, the wiki engine could offer users "Other people who approved these versions also trusted these people ..." feature.

Solved issues

This approach obsoletes page protection and replaces it by withholding approval of new version. If such withholding is automatic unless explicit (although possibly indirect) trust has been invoked, it also obsoletes the weird social construct of vandalism patrol as nobody would approve random vandalism (or if they did, it would be easy to distrust them). Of course, organised trolling groups such as GNAA might be able to pull of more notable things, but again, it would be easy to keep them off your network of trust. In essence, every singular troll would become a faction of one, and organised trolls would just become their own factions.

This approach allows the users the benefits of their information being properly gatekept, but does it without forcing any gatekeeper and his biases on any user. Everybody who is unsatisfied with the articles there are could become a gatekeeper of their own, or nominate a gatekeeper for them.

One persistent kind of conflict in Wikipedia is the continuous battle between the so-called 'deletionists' and 'inclusionists'. One camp wants to enforce a high bar of entry for Wikipedia's articles; the other, in turn, wants to enforce a very low bar of entry. If we recognise "this article shouldn't appear as all" as one possible versions, allowing people to "prefer not having an article", we should be able to make both inclusionists and deletionists if not happy then at least content.

When Wikipedia started, it imported the bulk of the 1911 edition of Encyclopedia Britannica. This has had a number of interesting consequences, and some problems. Similarly, when Citizendium started, it pondered for quite a while on whether to start from a carte blanche or import the then-current state of Wikipedia. A wiki engine conducive to branching would solve this dilemma by offering the possibility of including any earlier encyclopedic work as one fork. Where, say, the 1911 Encyclopedia of Britannica is still adequate, people could check it over and approve its version; where it isn't -- such as nuclear physics --, new articles could be written, and the knowledgeable people's approval of these over the old encyclopædia.

Downsides

Fork-at-will will lead to compartmentalisation among people, not consensus. On its face value, it's true that such a proposal will enable high level of compartmentalisation or many parallel Walled Gardens. However, the ability to fork is not the cause of that.

We already live in a world of many, sometimes conflicting, beliefs. There have already been forks of Wikipedia. It appears it's unreasonable to expect for, say, a staunch Muslim and a devoted Catholic to find a consensus on the position of Jesus' status. At best, they may be able to agree to disagree, but this is not always the case -- especially on Wikipedia. Forking-at-will will facilitate such agreements to disagree, by allowing people of different beliefs to have their own -- necessarily different -- wiki in the same space.

Furthermore, when it is possible for different factions to work out a consensus -- for example, by developing a joint version of an article that would discuss all involved sides' approaches --, I fully believe they'd do it even with fork-at-will. (For example, it's not inconceivable that a Catholic and a Presbyterian will find a consensus on Jesus' status, even if they differ widely on the relevance of Mary's sainthood.) Thus, I wouldn't say that fork-at-will leads to new compartmentalisation; instead, it will provide a new layer of flexibility.

Conspiracy theorists, pseudoscientists, and other fringers will be able to push their message. To an extent, this is true. However, to an extent, this is also an inevitable aspect of crowd-sourcing. In consolation, the non-conspiracy theorists will always be able to disapprove of conspiracy theories, and will not have to suffer from them. For pseudoscience, I anticipate a science team arising to check out on science articles, approve the good ones, disapprove the bad ones, and allow other people to trust them. Should -- hypothetically -- the team somehow be overtaken by, say, homœopaths or other woos, the science-minded people's trust in them would wane, and be carried over to a new and better science team.

Unsolved issues

How best represent trust? I believe this is the issue that will require most further thought (and careful experimentation) in this whole solution. In the beginning, I would give each user the buttons "I consider this version good" and "I consider this version bad" as well as "I expects this user's new edits will be good/bad" and "I trust this user to have good judgement on article goodness". First further developments would allow users to form groups, to trust groups, and to limit trust to particular topics ("I trust this user to have good judgement on article goodness for astronomy-related articles"). Further developments should depend on how well this works in practice.

What to offer to random passersby? One of the strong points of Wikipedia is that it makes a handy one-stop link for bloggers, making its articles very high among most search engine's results. This strength depends, to a considerable degree, on article uniquity, and discarding this uniquity may thus handicap the solution. However, considering that we don't have to discard the uniquity of article topics but only of the best versions, I suspect this problem can be overcome in a combination of two approaches. First, as I mentioned before, the software should be able to easily point out the differences between major versions. (I admit that 'major' is somewhat vague here, but 'approved by largest amount of users' should make a good first approximation until we figure out a better approach.) Second, once we have groups of users and group approvals, it'll be easy for the link author's to specify which group's version he wants to link to. Thus, WingNutDaily could always link to the Conservapedia-like version of the article on, say, evolution.

Social issues

If the basic mission of Wikipedia is to be followed, initial groups on natural sciences, arts, and history should be set up. Of course, the spirit of non-exclusivism will allow other groups of similar scope to also be formed. For example, there will almost certainly be a general pseudoscience and ufology group.

Coöperation between various factions will still be useful, but since non-exclusivism will avoid forcing it on people, the social system may have to provide stronger incentives towards coöperation.

If the system becomes popular, there will probably be quite a number of groups of different structure.

Depending on the way groups are formed, some may become suspectible to hostile takeovers. Experience will hopefully teach us how to deal with this danger.