The general idea is that you take code written in some high level language, and rather than compiling it into "native" code for a specific hardware architecture, you compile it into a sort of "virtual assembly language," the instruction set for some sort of generic processor. This has quite a number of merits:
Code is parsed by the "bytecompiling" process and is transformed into some form that may be read in quickly without a need for complex parsing;
By removing whitespace and the likes, there is sometimes a savings of space as compared to the source code form (this commonly occurs when compiling ELisp code).
More importantly, there is almost always a huge savings of space as compared to compiling to machine code.
For example, the calendrica code compiles in various forms to the following sizes:
Table 1. Compiling calendrica.lisp
|File|Form|Size (bytes)|
|calendrica.x86f|CMUCL Machine Code|472649|
|calendrica.lbytef.gz|CMUCL Bytecompiled, compressed|34941|
|calendrica.fas.gz|CLISP Bytecode, compressed|30290|
The critical comparison here is that the bytecoded forms are a whole lot smaller than the roughly 472K of calendrica.x86f.
It is far more difficult to measure this, but bytecode is also likely to be stored more compactly in memory than machine code. This is one of the purposes of the way CMUCL combines native compilation with a bytecode compiler: code that is executed a lot will benefit from compilation to native code, whilst by bytecode-compiling those parts of a system that are seldom executed, substantial memory savings are attained. The compactness here comes from the fact that the "machine language" is designed not for the computer hardware, but rather for the application.
Hand in hand with the diminished size come convenience of implementation and improved computational efficiency.
All three walk in together as joint merits of designing a "computational engine" specifically for the application.
Consider that if the application is intended to process strings, it makes sense to have strings as basic data types. Parrot has "string" operations such as tostring, which work with strings far more conveniently than the operators you would get with "real machine language." That convenience can make it easier to write compact, efficient code; expanding the same logic to "real machine code" would increase its size considerably.
A simulated "virtual machine" can be manipulated in ways that would be prohibitively complex on "bare hardware." For instance, in the Parrot system, it is easy enough to save sets of registers by pushing a few pointers onto a stack. On "bare hardware," the equivalent behaviour requires copying a whole set of registers out to memory. This has the unexpected result that bytecoding can, here and there, actually be faster than coding to bare hardware.
Bytecode machines have traditionally been stack-oriented machines, where objects would be drawn in and out of memory onto a stack where they would then be processed.
The Parrot virtual machine is a little different, having a register architecture with four sets of 32 registers, one set for each of four data types: integers, floats, strings, and Parrot Magic Cookies. The designers figure that this will lead to less stack thrashing.
It is convenient to create operations that do extremely complex processing.
Such operations will provide a compact representation for something that is complex, which reduces the size of a program; they also substantially improve performance by allowing a lot of work to be done within the optimized code of the "virtual machine simulator."
The classic, arguably mistaken, example of this is the CRC and POLY operations on the old VAX architecture. Calculating CRC checksums and evaluating polynomials are wonderful examples of "extremely complex processing." Rather a lot of microcode silicon was likely consumed on these operations, and few compilers made use of them. At least not the C compiler! As a result, code implemented in C, such as popular bytecoded language interpreters, is unlikely to use these operations.
In an application where you expect to evaluate a lot of polynomials, a POLY operator will certainly be of great value, as would, very likely, a whole set of matrix math operations.
CLISP is known for having unusually good performance when processing BIGNUMs (quasi-infinite precision integers). Other Common Lisp implementations tend to beat its pants off when working with small integers, where they can render code into native 32 bit arithmetic operations, as you might find with crypto applications; but once you cross the line to BIGNUMs, all the implementations wind up invoking function calls, and behave little differently from a bytecode interpreter. CLISP has an unusually good BIGNUM library, and so works better than many others in this area.
As for the CRC function on the VAX being a "mistake," it's a mistake when it consumes silicon on the CPU that would have been better used for something else, and then remains unused because your favorite compilers don't use it. The same is not true for rarely-used bytecode instructions. If there are 160000 gates on a CPU that aren't being used, that feels wasteful. If there is 16K of code in the bytecode interpreter that never gets used, and perhaps never even gets paged into memory, the waste is nowhere near as serious.
In the hardware world, RISC may have become "king," in that it allows silicon to be devoted to having more registers and in improving the ability to execute code in parallel. In a bytecode interpreter, CISC is virtually always a win.
The BCPL compiler generated O-code, which was then interpreted or compiled to native code.
Implementations of the Icon language compile to bytecode, allowing deployment of compiled code on any supported platform.
Emacs ELisp code is commonly compiled into bytecode to speed load time, and reduce both disk and memory consumption.
OCAML includes a portable bytecode compiler.
Various Scheme implementations use bytecode interpreters.
Smalltalk systems typically are bytecoded; one interesting claim I have seen is that some Smalltalk implementations include Java bytecode integration.
Many implementations of FORTH use "bytecompiled" code.
See also Mono development platform
There have periodically been some rather hysterical reports and theories about the relationship between MONO, GNOME, and Microsoft, many of them quite wild, with rather incoherent explanations as to why anyone would have thought it sensible to implement MONO.
Contrary to some of the wild theories floating out on Slashdot, the reasoning has little to do with "using Microsoft code," or Microsoft Passport authentication, or anything else of the sort.
The real reasoning has to do with language. Microsoft is implementing all sorts of things as "part of .NET;" the parts MONO is looking at are:
A dynamic language
Using a more dynamic language that offers garbage collection means there is no need to write hordes of free() calls, which would allow an application like Evolution to be both smaller and more easily and quickly written.
A bytecoded (perhaps JIT-compiled) platform to provide some independence of platform.
This also would disconnect application code somewhat from the deep details of the many C-based libraries of GNOME. Apparently the not-always-organized growth of libraries in GNOME has led to it becoming somewhat difficult to make concurrent use of many of the services offered.
Again, Java offers a "JVM." A number of other languages offer language-specific bytecoding schemes that somewhat parallel this.
Language- and platform-independence
One of the important characteristics of the GNOME project is that it intends to be relatively agnostic about what languages are used (in contrast with the somewhat C++-partisan KDE and Objective C-partisan GNUStep ).
The various "bytecode execution machines" that are presently available are generally not terribly friendly to the use of multiple languages. The JVM is for Java, for instance.
There is a "never-accomplished Holy Grail" quality to this; witness UNCOL.
In practice, while there are around a dozen language compilers available that could be used with Mono, nearly all code is written in C#.
In effect, MONO represents something rather like the Java platform, with the conspicuous difference that it is specifically intended to be language neutral.
Here are some links to interviews and commentary from sundry GNOME folk about what they're about:
In effect, MONO may be summed up as "programmers get to use a new compiler and a new garbage collector." In a way, it's not vastly more profound than that.
Basically, "it's at least a couple years away, so speculating about its functionality and importance now is fairly silly."
Which tries to puncture some of the as-of-2010 hysteria and disinformation.
An IDE tool for use with Mono