Stack-based languages simplify the language's parser considerably because the data for an instruction always appears in the source code before the instruction that will use it. To see why this helps, consider this typical line of Basic:
A = 10 + 20 * B
To execute this line, the interpreter has to read the entire line, look up the value of B (let's say 3), recognize that the * has to be performed before the + and reorder the operations accordingly, and then finally convert the line into a series of instructions, something like:
get(B,temp1) - get the value in B and store it in temp1
multiply(20,temp1,temp2) - multiply that value by 20 and store the result in temp2
add(10,temp2,temp3) - add 10 to temp2 and store the result in temp3
put(temp3,A) - store the value of temp3 into the variable A
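The reordering step can be sketched in a few lines of Python. This is only an illustration of the kind of work an infix parser has to do, not the code of any particular Basic interpreter; the names to_postfix and PRECEDENCE are hypothetical, and parentheses are deliberately not handled.

```python
# Hypothetical precedence table: a higher number binds more tightly.
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}

def to_postfix(tokens):
    """Reorder infix tokens so that each operator follows its operands."""
    output, pending = [], []
    for tok in tokens:
        if tok in PRECEDENCE:
            # Emit any waiting operator that binds at least as tightly.
            while pending and PRECEDENCE[pending[-1]] >= PRECEDENCE[tok]:
                output.append(pending.pop())
            pending.append(tok)
        else:
            output.append(tok)  # operand: a number or a variable name
    # Flush the operators that are still waiting.
    while pending:
        output.append(pending.pop())
    return output

postfix = to_postfix("10 + 20 * B".split())
print(postfix)  # ['10', '20', 'B', '*', '+']
```

Even this small sketch needs a precedence table and a second list of pending operators, and it cannot emit the multiplication until it has read past the + and the 20.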
In contrast, in a stack-based system, the programmer writes the code in the order in which it will ultimately be executed. The equivalent would be something like:
B 20 mul
10 add
When this code is executed, the interpreter pushes the value of B onto the stack, then 20. It then encounters mul, which removes the top two items, the 3 and the 20, multiplies them, and pushes the result, 60, back onto the stack. Next, it pushes 10 onto the stack, leaving the top two locations containing 60 and 10. It then encounters add, which removes those two values, adds them, and pushes the result back onto the stack. The top of the stack now contains the result, 70. Unlike the Basic line, this fragment leaves the result on the stack rather than storing it in A.
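The walkthrough above can be reproduced with a minimal stack interpreter, sketched here in Python. The function run and its variables argument are hypothetical names chosen for illustration; only the two words used in the example, mul and add, are supported.

```python
def run(source, variables):
    """Execute whitespace-separated words one at a time on a single stack."""
    stack = []
    for word in source.split():
        if word == "mul":
            b, a = stack.pop(), stack.pop()   # remove the top two items
            stack.append(a * b)               # push their product
        elif word == "add":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)               # push their sum
        elif word in variables:
            stack.append(variables[word])     # push the variable's value
        else:
            stack.append(int(word))           # push a numeric literal
    return stack

result = run("B 20 mul 10 add", {"B": 3})
print(result)  # [70]
```

Note that the loop never looks ahead or behind: each word is handled as soon as it is read, and the only working storage is the stack itself.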
Notice that the stack-based version has no named temporary values, and the interpreter reads only a single instruction at a time, not an entire line of code. As a result, the parser is much simpler and smaller, and requires less memory to run. This, in turn, generally makes it much faster, in some cases approaching the speed of compiled programs.
Another key aspect of the language is Forth's inherently multitasking design. A program could set up separate stacks and feed different code into each one. The Forth kernel would run each of these stacks in turn, and this facility was available to every Forth program. This made writing multithreaded code very easy: one could, for instance, have one thread reading the joystick as it moved, while a game loop running on another stack read that value.
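The idea of running several stacks in turn can be approximated with the following Python sketch. It is an analogy only: Python generators stand in for Forth tasks, the scheduler is a simple round-robin loop, and all names (joystick_task, game_task, run_round_robin, shared) are hypothetical rather than taken from any Forth kernel.

```python
from collections import deque

def joystick_task(shared, readings):
    """Task with its own stack: publishes each joystick reading, then yields."""
    stack = []
    for value in readings:
        stack.append(value)
        shared["joystick"] = stack.pop()
        yield  # hand control back to the scheduler

def game_task(shared, frames, log):
    """Task with its own stack: samples the latest reading once per frame."""
    stack = []
    for _ in range(frames):
        stack.append(shared.get("joystick", 0))
        log.append(stack.pop())
        yield  # hand control back to the scheduler

def run_round_robin(tasks):
    """Give each task one turn in order until every task has finished."""
    queue = deque(tasks)
    while queue:
        task = queue.popleft()
        try:
            next(task)
            queue.append(task)  # still running: put it at the back of the queue
        except StopIteration:
            pass                # finished: drop it

shared, log = {}, []
run_round_robin([joystick_task(shared, [1, 0, -1]), game_task(shared, 3, log)])
print(log)  # [1, 0, -1]
```

Each task keeps its own private stack and communicates only through the shared value, which mirrors the joystick example in the text: one task writes the reading, and the game loop picks it up on its next turn.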