Unix components

Friday, May 16, 2008 by ckwong

Source: http://en.wikipedia.org/wiki/Unix

The Unix system is composed of several components that are normally packaged together. By including — in addition to the kernel of an operating system — the development environment, libraries, documents, and the portable, modifiable source-code for all of these components, Unix was a self-contained software system. This was one of the key reasons it emerged as an important teaching and learning tool and has had such a broad influence.

The inclusion of these components did not make the system large — the original V7 UNIX distribution, consisting of copies of all of the compiled binaries plus all of the source code and documentation occupied less than 10MB, and arrived on a single 9-track magnetic tape. The printed documentation, typeset from the on-line sources, was contained in two volumes.

The names and filesystem locations of the Unix components has changed substantially across the history of the system. Nonetheless, the V7 implementation is considered by many to have the canonical early structure:

  • Kernel — source code in /usr/sys, composed of several sub-components:
    • conf — configuration and machine-dependent parts, including boot code
    • dev — device drivers for control of hardware (and some pseudo-hardware)
    • sys — operating system “kernel”, handling memory management, process scheduling, system calls, etc.
    • h — header files, defining key structures within the system and important system-specific invariables
  • Development Environment — Early versions of Unix contained a development environment sufficient to recreate the entire system from source code:
    • cc — C language compiler (first appeared in V3 Unix)
    • as — machine-language assembler for the machine
    • ld — linker, for combining object files
    • lib — object-code libraries (installed in /lib or /usr/lib) libc, the system library with C run-time support, was the primary library, but there have always been additional libraries for such things as mathematical functions (libm) or database access. V7 Unix introduced the first version of the modern “Standard I/O” library stdio as part of the system library. Later implementations increased the number of libraries significantly.
    • make - build manager (introduced in PWB/UNIX), for effectively automating the build process
    • include — header files for software development, defining standard interfaces and system invariants
    • Other languages — V7 Unix contained a Fortran-77 compiler, a programmable arbitrary-precision calculator (bc, dc), and the awk “scripting” language, and later versions and implementations contain many other language compilers and toolsets. Early BSD releases included Pascal tools, and many modern Unix systems also include the GNU Compiler Collection as well as or instead of a proprietary compiler system.
    • Other tools — including an object-code archive manager (ar), symbol-table lister (nm), compiler-development tools (e.g. lex & yacc), and debugging tools.
  • Commands — Unix makes little distinction between commands (user-level programs) for system operation and maintenance (e.g. cron), commands of general utility (e.g. grep), and more general-purpose applications such as the text formatting and typesetting package. Nonetheless, some major categories are:
    • sh — The “shell” programmable command-line interpreter, the primary user interface on Unix before window systems appeared, and even afterward (within a “command window”).
    • Utilities — the core tool kit of the Unix command set, including cp, ls, grep, find and many others. Subcategories include:
      • System utilities — administrative tools such as mkfs, fsck, and many others
      • User utilities — environment management tools such as passwd, kill, and others.
    • Document formatting — Unix systems were used from the outset for document preparation and typesetting systems, and included many related programs such as nroff, troff, tbl, eqn, refer, and pic. Some modern Unix systems also include packages such as TeX and Ghostscript.
    • Graphics — The plot subsystem provided facilities for producing simple vector plots in a device-independent format, with device-specific interpreters to display such files. Modern Unix systems also generally include X11 as a standard windowing system and GUI, and many support OpenGL.
    • Communications — Early Unix systems contained no inter-system communication, but did include the inter-user communication programs mail and write. V7 introduced the early inter-system communication system UUCP, and systems beginning with BSD release 4.1c included TCP/IP utilities.
The 'man' command can display a 'man page' for every command on the system, including itself.

The ‘man’ command can display a ‘man page’ for every command on the system, including itself.

  • Documentation — Unix was the first operating system to include all of its documentation online in machine-readable form. The documentation included:
    • man — manual pages for each command, library component, system call, header file, etc.
    • doc — longer documents detailing major subsystems, such as the C language and troff

Unix overview

Friday, May 16, 2008 by ckwong

Unix operating systems are widely used in both servers and workstations. The Unix environment and the client-server program model were essential elements in the development of the Internet and the reshaping of computing as centered in networks rather than in individual computers.

Both Unix and the C programming language were developed by AT&T and distributed to government and academic institutions, causing both to be ported to a wider variety of machine families than any other operating system. As a result, Unix became synonymous with “open systems“.

Unix was designed to be portable, multi-tasking and multi-user in a time-sharing configuration. Unix systems are characterized by various concepts: the use of plain text for storing data; a hierarchical file system; treating devices and certain types of inter-process communication (IPC) as files; and the use of a large number of software tools, small programs that can be strung together through a command line interpreter using pipes, as opposed to using a single monolithic program that includes all of the same functionality. These concepts are known as the Unix philosophy.

Under Unix, the “operating system” consists of many of these utilities along with the master control program, the kernel. The kernel provides services to start and stop programs, handle the file system and other common “low level” tasks that most programs share, and, perhaps most importantly, schedules access to hardware to avoid conflicts if two programs try to access the same resource or device simultaneously. To mediate such access, the kernel was given special rights on the system, leading to the division between user-space and kernel-space.

The microkernel concept was introduced in an effort to reverse the trend towards larger kernels and return to a system in which most tasks were completed by smaller utilities. In an era when a “normal” computer consisted of a hard disk for storage and a data terminal for input and output (I/O), the Unix file model worked quite well as most I/O was “linear”. However, modern systems include networking and other new devices. As graphical user interfaces developed, the file model proved inadequate to the task of handling asynchronous events such as those generated by a mouse, and in the 1980s non-blocking I/O and the set of inter-process communication mechanisms was augmented (sockets, shared memory, message queues, semaphores), and functionalities such as network protocols were moved out of the kernel.

Source: http://en.wikipedia.org/wiki/Unix

Why C++?

Friday, May 9, 2008 by ckwong

Source: http://www.gamearchitect.net/Articles/WhyC++.html

By Kyle Wilson
Saturday, July 15, 2006

A number of different on-line forums (most notably the SWEng-GameDev mailing list) have seen recent discussion of whether C++ is a wise language choice for game development.  When people have to make that decision in the real world, the dominance of C++ has been overwhelming.  But if on-line discussion is any indication, there’s also a good deal of dissatisfaction with that choice.

The attacks on C++ come on two flanks.  From the low-level side comes the argument that the language supports too high a level of abstraction, and that we should go back to programming in straight C.  This argument holds that writing efficient code in C is easier than writing efficient code in C++ because the level of information hiding afforded by C++ hides inefficiencies from cursory perusal.  For example, in C, you can read the line

    a = b + c;

and know that a, b and c are primitive types and that the addition and the assignment will generate, at most, a few lines of assembly code. In C++, a, b and c could be objects of any size calling overloaded addition and assignment operators that might generate still more temporary objects, allocate memory, call sleep(), reformat your hard drive, etc. The only safe way to prevent such shenanigans is to disable compiler support for C++ language features entirely.

The other camp of C++ criticism compares it to higher-level languages and finds C++ wanting. This camp holds that C++ suffers under the burden of backwards compatibility and slow ANSI/ISO standard evolution. It lacks features found in more modern languages, such as aspects, closures, garbage collection, reflection, and native support for concurrency. The paleolithic C++ physical structure–.h include files and .cpp implementation files–results in achingly slow build times and forces programmers to write every function definition twice, once where it’s declared in the header and once where it’s defined. We would be better off, according to this argument, writing our games in C#, Eiffel or Objective Caml.

Remarkably, the two sides don’t even disagree. Instead, the sentiment is commonly expressed that straight C is the way to go for core engine code while less-performance-critical gameplay code should be written in a high-level scripting language (for a cogent expression of this opinion, see this post by Robert Blum in the SWEng-GameDev thread). C++ falls awkwardly in the middle, a jack of all trades and master of none.

Given all this discontent… Why C++? Why has it been so successful, and does the language deserve to see that success continue?

First, I’ll get the meta-reason out of the way: Nothing succeeds like success. The fact that C++ is the dominant language for game development gives it a lot of inertia. If you want to hire experienced programmers, it’s much easier to recruit C++ gurus than it is to find OCaml talent. If you’re going to develop for consoles, Microsoft and Sony provide C++ compilers, but for any other language you’re on your own. And if you want to link with middleware, most modern game middleware packages–including Gamebryo, Havok, FMOD, SpeedTree and the Unreal Engine–are written in C++. Developing in another language, even C, is a lonely path.

The power of the C++ network effect is not to be underestimated. For example, Naughty Dog used an in-house LISP variant called GOAL (Game Oriented Assembly Lisp) for the Jak and Daxter titles. They had a significant investment in GOAL, for which they’d written their own compiler, linker and debugger. They supported dynamic reloading of code by a running game and reportedly generated tight, efficient assembly. After their acquisition by Sony, however, Naughty Dog were forced to transition to C++ development. “Sony wants us to be able to share code with other studios, and this works both ways - both other studios using our code and vice versa,” posted Naughty Dog lead programmer Scott Shumaker.

But rather than dwelling on the advantages that derive from its existing popularity, I’d like to focus on the inherent strengths of C++ for game development. In doing that, it’s important to understand how and why C++ evolved into the language it is today. C++ grew out of C. C was a systems programming language that co-evolved with the Unix operating system and that gained popularity as Unix spread. C was tight, close to the metal. You can look at a C function and tell almost exactly what assembly will result from compilation.

C++ adds layers of abstraction. C++ began as a set of extensions to C to add support for the object-oriented concepts found in Simula, classes and inheritance. As creator Bjarne Stroustrup writes in his C++ FAQ, “Since 1987 or so, the focus of development the C++ language and its associated programming styles have been the use of templates, static polymorphism, generic programming, and multiparadigm programming.” C++ continues to evolve to support new design paradigms.

The growth and evolution of C++ has been a mixed blessing. On the one hand, C++ is truly a language for multi-paradigm development: the language supports generic, imperative and object-oriented programming. Any C++ program, or part of a C++ program, can be written in the appropriate style to best model a particular problem domain. On the other hand, C++ has been a victim of its own success. As new core language and standard library features have been added to meet new needs, programming in C++ has become a powerful but complicated business. It’s easy to hire people who call themselves C++ programmers, but engineers who understand both the low-level details of the C++ object model and the compile-time power of C++ template metaprogramming are hard to come by. And C++, in the hands of an programmer without a good grasp of the language, is a dangerous thing.

Despite this risk, I think that the answer is better education, not a wholesale rush back to programming in C. Because in the right hands, C++ is a more powerful, safer, and–yes, really–more efficient language than C is.

More powerful? Every C game I’ve seen has re-implemented virtual function tables with structs stuffed full of function pointers. C++ has native support for object-oriented code. The language’s native syntax allows expression that’s closer to a programmer’s intention. C++ template metaprogramming is vastly more capable than macro coding with the C preprocessor. And the C++ standard library offers a broad array of containers and algorithms. You’re likely to be able to implement any given task with less code in C++, from low-level bit shuffling to high-level state logic.

Safer? In C, if we write

    #define MAX(A,B) ((A) > (B)) ? (A) : (B))

    MAX(x++,2)

then x will be incremented twice after macro expansion. The inlined templatized C++ std::max() function is equally efficient and free of such unpleasant side-effects. The lack of templates in C encourages passing structs through void pointers, as in the unfortunate standard library qsort() function. This predilection for void* undercuts the protections of the type-safety system and turns easily-caught compile-time errors into hard-to-find runtime errors. In C, the Resource Acquisition Is Initialization idiom doesn’t exist. Scoped locks and smart pointers can’t be implemented, so leaks of memory and other resources become much more likely.

More efficient! The C++ standard’s support of type-based alias analysis allows C++ compilers to elide load and store instructions that compliant C compilers cannot. (A full discussion of type-based alias analysis is beyond the scope of this article, but see “Type-Based Alias Analysis” in Dr. Dobb’s Journal.) Inlining allows C++ compilers to generate programs that make fewer function calls and therefore waste less time on function-call overhead. In combination with templates, functors allow C++ to easily generate efficient inlined custom code at compile time. This is why the C++ std::sort() will beat the perfomance of the C qsort() function every time. Finally, the C++ standard library containers, though much-criticized by developers, are likely to outperform equivalent code written by junior C coders. (For that matter, STL containers have outperformed the custom code they replaced in every case where I’ve seen data structures switched to use the STL.)

While I prefer C++ to C, I do agree that a C++ engine core works best in combination with gameplay code written in a language that supports a higher level of abstraction. That’s not because C++ lacks programming constructs of more modern languages, though. I confess that I haven’t worked with aspects or closures enough to even know whether they give great productivity improvements. I have done enough procedural programming to doubt that code written in a procedural language is going to be inherently more clean and elegant than code written in C++. And I’m generally skeptical of the benefits of garbage collection. Memory is one resource among many (file handles, critical section locks, D3D resource reference counts, etc.), and a resource management paradigm that only covers memory is of limited use. (If you think garbage collection “just works,” read this explanation of the complexities inherent in C# finalization.)

No, the compelling advantage that a more abstract language has over C++ is faster iteration time. The efficiency and power of C++ make it the best language choice for core game engine technology, where runtime efficiency is important and iteration time is not. For gameplay code, however, those requirements are reversed. The ability to iterate quickly on gameplay and get immediate feedback on design changes is paramount. But gameplay logic executes infrequently compared to physics or graphics routines, so highly-optimized native-code inner loops are less important.

For rapid iteration, you want a simple language that compiles quickly or doesn’t need to be compiled at all. You want to be able to make changes and see their effect immediately in-game, without having to restart, or even reload a level. You do not want the daunting features of C++ that make experienced programmers pause to think and make novice programmers write buggy code: pointers, a menagerie of different integer and floating-point types, multiple string representations and manual iteration over containers.

It’s because of these considerations that Lua has become almost as popular in the game industry as C++–at least among those game companies that elect to use an off-the-shelf scripting language. Lua is lightweight, dynamically typed and interpreted. It’s garbage collected, but as long as all your other resources are owned and managed by code and not script, garbage collection works just fine and saves you the headache of dealing with memory allocation.

So I hate to admit it, but I come down on the side of conventional wisdom. Write your game engine in C++. Write your gameplay in Lua. Those might not be the right answers for the next generation. I can imagine a language with better support for concurrency stealing the application domain from C++ sometime in the next decade. And C#/Mono is nipping at Lua’s heels as a game scripting language already. But for now, I think the C++/Lua combination is as good as it gets.

Algorithms and Data Structures - Links to Implementations [NIST]

Thursday, May 8, 2008 by ckwong

Source: http://www.nist.gov/dads/termsImpl.html

Other great source of implementations:

Stony Brook Algorithm Repository (organized by area and reviewed for quality)

Guide to Available Mathematical Software or GAMS (source of implementations of mathematical functions)


Algorithms and Data Structures - Index by Type [NIST]

Thursday, May 8, 2008 by ckwong

Types

Source: http://www.nist.gov/dads/termsType.html

Here covers only the “general” computer science algorithms, algorithmic techniques, data structures, archetypal problems, and related definitions.

Here not includes algorithms particular to fields such as business data processing, communications, operating systems or distributed algorithms, programming languages, AI, graphics, or numerical analysis.

Some terms with a leading variable, such as n-way, m-dimensional, or p-branching, are under k-.

Terms dealing with hardware, the computer industry, slang, etc., can be found in the Free On-Line Dictionary Of Computing or in A Glossary of Computer Oriented Abbreviations and Acronyms.


Data Structures