Part One

Part One

How Atari
BASIC Works

Atari BASIC: A High-Level Language Translator

The programming language which has become the de facto standard for the Atari Home Computer is the Atari 8K BASIC Cartridge, known simply as Atari BASIC. It was designed to serve the programming needs of hoth the computer novice and the experienced programmer who is interested in developing sophisticated applications programs. In order to meet such a wide range of programming needs, Atari BASIC was designed with some unique features. In this chapter we will introduce the concepts of high level language translators and examine the design features of Atari BASIC that allow it to satisfy such a wide variety of needs. Language Translators Atari BASIC is what is known as a high level language translator. A language, as we ordinarily think of it, is a system for communication. Most languages are constructed around a set of symbols and a set of rules for combining those symbols. The English language is a good example. The symbols are the words you see on this page. The rules that dictate how to combine these words are the patterns of English grammar. Without these patterns, communication would be very difficult, if not impossible: Out sentence this believe, of make don't this trying if sense you to! If we don't use the proper symbols, the results are also disastrous: @twu2 yeggopt gjsiem, keorw? In order to use a computer, we must somehow communicate with it. The only language that our machine really understands is that strange but logical sequence of ones and zeros known as machine language. In the case of the Atari, this is known as 6502 machine language. When the 6502 central processing unit (CPU) "sees" the sequence 01001000 in just the right place according to its rules of syntax, it knows that it should push the current contents of 1

Chapter One the accumulator onto the CPU stack. (If you don't know what an "accumulator" or a "CPU stack" is' don't worry about it. For the discussion which follows, it is sufficient that you be aware of their existence.) Language translators are created to make it simpler for humans to communicate with computers. There are very few 6502 programmers, even among the most expert of them, who would recognize 01001000 as the push-the-accumulator instruction. There are more 6502 programmers, but still not very many, who would recognize the hexadecimal form of 01001000, $48, as the push-the-accumulator instruction. However, most, if not all, 6502 programmers will recognize the symbol PHA as the instruction which will cause the 6502 to push the accumulator. PHA, $48, and even 01001000, to some extent, are translations from the machine's language into a language that humans can understand more easily. We would like to be able to communicate to the computer in symbols like PHA; but if the machine is to understand us, we need a language translator to translate these symbols into machine language. The Debug Mode of Atari's Editor/Assembler cartridge, for example, can be used to translate the symbols $48 and PHA to the ones and zeros that the machine understands. The debugger can also translate the machine's ones and zeros to $48 and PHA. The assembler part of the Editor/Assembler cartridge can be used to translate entire groups of symbols like PHA to machine code. Assemblers An assembler - for example, the one contained in the Assembler/Editor cartridge - is a program which is used to translate symbols that a human can easily understand into the ones and zeros that the machine can understand. In order for the assembler to know what we want it to do, we must communicate with it by using a set of symbols arranged according to a set of rules. The assembler is a translator, and the language it understands is 6502 assembly language. The purpose of 6502 assembly language is to aid program authors in writing machine language code. The designers of the 6502 assembly language created a set of symbols and rules that matches 6502 machine language as closely as possible. This means that the assembler retains some of the 2

Chapter One disadvantages of machine language. For instance, the process of adding two large numbers takes dozens of instructions in 6502 machine language. If human programmers had to code those dozens of instructions in the ones and zeros of machine language, there would be very few human programmers. But the process of adding two large numbers in 6502 assembly language also takes dozens of instructions. The assembly language instructions are easier for a programmer to read and remember, but they still have a One-to-one cor respondence with the dozens of machine language instructions. The programming is easier, but the process remains the same. High Level Languages High level languages, like Atari BASIC, Atari PILOT, and Atari Pascal, are simpler for people to use because they more closely approximate human speech and thought patterns. However, the computer still understands only machine language. So the high level languages, while seeming simple to their users, are really much more complex in their internal operations than assembly language. Each high level language is designed to meet the specific need of some group of people. Atari Pascal is designed to implement the concept of structured programming. Atari PILOT is designed as a teaching tool. Atari BASIC is designed to serve both the needs of the novice who is just learning to program a computer and the needs of the expert programmer who is writing a sophisticated application program, but wants the program to be accessible to a large number of users. Each of these languages uses a different set of symbols and symbol-combining rules. But all these language translators were themselves written in assembly language. Language Translation Methods There are two different methods of performing language translation - compilation and interpretation. Languages which translate via interpretation are called interpreters. Languages which translate via compilation are called compilers. Interpreters examine the program source text and simulate the operations desired. Compilers translate the program source text into machine language for direct machine execution. 3

Chapter One The compilation method tends to produce faster, more efficient programs than does the interpretation method. However, the interpretation method can make programming easier. Problems with the Compiler Method The compiler user first creates a program source file on a disk, using a text editing program. Then the compiler carefully examines the source program text and generates the machine language as required. Finally, the machine language code is loaded and executed. While this three-step process sounds fairly simple, it has several serious 'gotchas." Language translators are very particular about their symbols and symbol-combining rules. If a symbol is misspelled, if the wrong symbol is used, or if the symbol is not in exactly the right place, the language translator will reject it. Since a compiler examines the enure program in one gulp, one misplaced symbol can prevent the compiler from understanding any of the rest of the program - even though the rest of the program does not violate any rules! The result is that the user often has to make several trips between the text editor and the compiler before the compiler successfully generates a machine language program. But this does not guarantee that the program will work. If the programmer is very good or very lucky, the program will execute perfectly the very first time. Usually, however, the user must debug the program. This nearly always involves changing the source program, usually many times. Each change in the source program sends the user back to step one: after the text editor changes the program, the compiler still has to agree that the changes are valid, and then the machine code version must be tested again. This process can be repeated dozens of times if the program is very complex. Faster Programming or Faster Programs? The interpretation method of language translation avoids many of these problems. Instead of translating the source code into machine language during a separate compiling step, the interpreter does all the translation while the program is running. This means that whenever you want to test the program you're writing, you merely have to tell the interpreter to run it. If things don't work right; stop the program, make a few changes, and run the program again at once. 4

Chapter One You must pay a few penalties for the convenience of using the interpreter's interactive process, but you can generally develop a complex program much more quickly than the compiler user can. However, an interpreter is similar to a compiler in that the source code fed to the interpreter must conform to the rules of the language. The difference between a compiler and an interpreter is that a compiler has to verify the symbols and symbol-combining rules only once - when the program is compiled. No evaluation goes on when the program is running. The interpreter, however, must verify the symbols and symbol-combining rules every time it attempts to run the program. If two identical programs are written, one for a compiler and one for an interpreter, the compiled program will generally execute at least ten to twenty times faster than the interpreted program. Pre-compiling Interpreter Atari BASIC has been incorrectly called an interpreter. It does have many of the advantages and features of an interpretive language translator, but it also has some of the useful features of a compiler. A more accurate term for Atari's BASIC Language Translator is pre-compiIing interpreter. Atari BASIC, like an interpreter, has a text editor built into it. When the user enters a source line, though, the line is not stored in text form, but is translated into an intermediate code, a set of symbols called tokens. The program is stored by the editor in token form as each program line is enterred. Syntax and symbol errors are weeded out at that time. Then, when you run the program, these tokens are examined and their functions simulated; but hecause much of the evaluation has already been done, the execution of an Atari BASIC program is faster than-that of a pure interpreter. Yet Atari BASIC's program-building process is much simpler than that of a compiler. Atari BASIC has advantages over compilers and interpreters alike. With Atari BASIC, every time you enter a line it is verified for language correctness. You don't have to wait until compilation; you don't even have to wait until a test run. When you type RUN you already know there are no syntax errors in your program. 5

Chapter One

Internal Design Overview

Atari BASIC is divided into two major functional areas: the Program Constructor and the Program Executor. The Program Constructor is used when you enter and edit a BASIC program. The source line pre-compiler, also part of the Program Constructor, translates your BASIC program source text lines into tokenized lines. The Program Executor is used to execute the tokenized program - when you type RUN, the Program Executor takes over. Both the Program Constructor and the Program Executor are designed to use data tables. Some of these tables are already contained in BASIC's ROM (read-only memory). Others are constructed by BASIC in the user RAM (random- access memory). Understanding these various tables is an important key to understanding the design of Atari BASIC. Tokens In Atari BASIC, tokens are the intermediate code into which the source text is translated. They represent source-language symbols that come in various lengths - some as long as 100 characters (a long variable name) and others as short as one character ("+" or "-"). Every token, however, is exactly one eight-bit byte in length. Since most BASIC Language Symbols are more than one character long, the representation of a multi-character BASIC Language Symbol with a single-byte token can mean a considerable saving of program storage space. A single-byte token symbol is also easier for the Program Executor to recognize than a multi-character symbol, since it can be evaluated by machine language routines much more quickly. The SEARCH routine - 76 bytes long - located at $A462 isa good example of how much assembly language it takes to recognize a multi-character symbol. On the other hand, the two instructions located at $AB42 are enough to 7

Chapter Two determine if a one-byte token is a variable. Because routines to recognize Atari BASIC'so one-byte tokens take so much less machine language, they execute relatively quickly. The 256 possible tokens are divided into logical numerical groups that also make them simpler to deal with in assembly language. For example, any token whose value is 128 ($80) or greater represents a variable name. The logical grouping of the token values also means faster execution speeds, since, in effect, the computer only has to check bit 7 to recognize a variable. The numerical grouping of the tokens is shown below: Token Value (Hex) Description 00-0D Unused 0E Floating Point Numeric Constant. The next six bytes will hold its value. 0F String Constant. The next byte is the string length. A string of that length follows. 10-3C Operators. See table starting at $A7E3 for specific operators and values. 39-54 Functions. See table starting at $A820 for specific functions and values. 55-7F Unused. 80-FF Variables. In addition to the tokens listed above, there is another set of single-byte tokens, the Statement Name Tokens. Every statement in BASIC starts with a unique statement name, such as LET, PRINT, and POKE. (An assignment statement such as "A = B + C," without the word LET, is considered to begin with an implied LET.) Each of these unique statement names is represented by a unique Statement Name Token. The Program Executor does not confuse Statement Name Tokens with the other tokens because the Statement Name Tokens are always located in the same place in every statement - at the beginning. The Statement Name Token value is derived from its entry number, starting with zero, in the Statement Name Table at $A4AF. 8

Chapter Two Tables A table is a systematic arrangement of data or inforrnation. Tables in Atari BASIC fall into two distinct types: tables that are part of the Atari BASIC ROM and tables that Atari BASIC builds in the user RAM area. ROM Tables The following is a brief description of the various tables in the Atari BASIC ROM. The detailed use of these tables will be explained in subsequent chapters. Statement Name Table ($A4AF). The first two bytes in each entry point to the information in the Statement Syntax Table for this statement. The rest of the entry is the name of the statement name in ATASCII. Since name lengths vary, the last character of the statement name has the most signiflcant bit turned on to indicate the end of the entry. The value of the Statement Name Token is derived from the relative (from zero) entry number of the statement name in this table. Statement Execution Table ($AA00). Each entry in this table is the two-byte address of the 6502 machine language code which will simulate the execution of the statement. This table is organized with the statements in the same order as the statements in the Statement Name Table. Therefore, the Statement Name Token can be used as an index to this table. Operator Name Table ($A7E3). Each entry comprises the ATASCII text of an Operator Symbol. The last character of each entry has the most significant bit turned on to indicate the end of the entry. The relative (from zero) entry number, plus 16 ($10), is the value of the token for that entry. Each of the entries is also given a label whose value is the value of the token for that symbol. For example, the ";" symbol at $A7E8 is the fifth (from zero) entry in the table. The label for the ";" token is CSC, and the value of CSC is $15, or 21 decimal (1^l6+5). Operator Execution Table ($AA70). Each two-byte entry points to the address, minus one, of the routine which simulates the execution of an operator. The token value, minus 16, is used to access the entries in this table during execution time: The entries in this table are in the same order as in the Operator Name Table. Operator Precedence Table ($AC3F). Each entry represents the relative execution precedence of an individual operator. The table entries are accessed by the operator tokens, 9

Chapter Two minus 16. Entries correspond with the entries in the Operator Name Table. (See Chapter 7.) Statement Syntax Table ($A60D). Entries in this table are used in the process of translating the source program to tokens. The address pointer in the first part of each entry in the Statement Name Table is used to access the specific syntax information for that statement in this table. (See Chapter 5.) RAM Tables The tables that BASIC builds in the user RAM area will be explained in detail in Chapter 3. The following is a brief description of these tables: Variable Name Table. Each entry contains the source ATASCII text for the corresponding user variable symbol in the program. The relative (from zero) entry number of each entry in this table, plus 128, becomes the value of the token representing the variable. Variable Value Table. Each entry either contains or points to the current value of a variable. The entries are accessed by the token value, minus 128. Statement Table. Each entry is one tokenized BASIC program line. The tokenized lines are kept in this table in ascending numerical order by line number. Array/String Table. This table contains the current values for all strings and numerical arrays. The location of the specific values for each string and/or array variable is accessed from information in the Variable Value Table. Runtime Stack. This is the LIFO Runtime Stack, used to control the execution of GOSUB/RETURN and similar statements. Pre-compiler Atari BASIC translates the BASIC source lines from text to tokens as soon as they are entered. To do this, Atari BASIC must recognize the symbols of the BASIC Language. BASIC also requires that its symbols be combined in certain specific patterns. If the symbols don't follow the required patterns, then Atari BASIC cannot translate the line. The process of checking a source line for the required symbol patterns is called syntax checking. BASIC performs syntax checking as part of the tokenizing process. When the Program Editor receives a completed line of 10