General Information
Author: various from Usenet
Assembler: generic
Published: Usenet
Download: from comp.sys.apple2.programmer
I've seen code that does this:
        LDA #<data1     ;get low address and modify code below
        STA load+1
        LDA #>data1     ;get high address and modify code below
        STA load+2
load    LDA $ABCD
;table of addresses
data1
data2
data3
what other self-modifying code schemes exist?
I was thinking about this last night: non modifying version:
        <do stuff>
        lda wait            ;wait is a flag, 1 = make the program wait for open apple to be pressed
        beq nowait
        jsr wait4OpenApple  ;wait = 1
nowait  RTS
modifying version:
        <do stuff>
modify  jsr wait4openApple  ;this routine waits for the open apple button to
                            ;be pressed, then just RTS's
Include a flag, "wait" that is checked at the beginning of the program:
begin program
        LDA WAIT
        BNE yeswait     ;wait = 1, meaning code above is ok as it is
;here, wait = 0, meaning we do NOT want to JSR the Wait4OpenApple code
        lda #$60        ;opcode for RTS
        sta modify      ;this changes the JSR at label MODIFY to an RTS
yeswait (rest of program)
thoughts, comments, or other self modifying code schemes?
Rich
aiiad...@gmail.com wrote:
> thoughts, comments, or other self modifying code schemes?
Self-modifying code can be difficult to debug. If you need it for speed in a critical section of code, by all means use it, but I wouldn't make a habit of it unless you're trying to be deliberately obscure.
I'm using it in some of the graphics code because STA absolute,Y is 1 cycle faster than STA (dp),Y. It takes time to do the LDA/STA to set up the address, but since I'm repeating the operation hundreds or thousands of times it's a net win.
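A minimal sketch of the kind of loop described above, not the poster's actual graphics code. The names ptr (a two-byte zero-page pointer), count (a one-byte length) and the fill value are assumptions made for illustration. On a 6502, STA absolute,Y takes 5 cycles and STA (zp),Y takes 6.

```asm
; Assumptions: ptr is a 2-byte zero-page pointer, count is 1..255
; (a count of 0 would write 256 bytes). Both names are hypothetical.
        lda ptr         ; copy the pointer into the operand of the STA below
        sta store+1
        lda ptr+1
        sta store+2
        lda #$00        ; value to store (hypothetical)
        ldy count
loop    dey
store   sta $FFFF,y     ; operand patched above; 5 cycles instead of 6
        bne loop        ; tests the result of DEY, since STA leaves the flags alone
```

With ptr in zero page the setup costs about 14 cycles, so the patched version should only come out ahead once the loop body runs more than about 14 times.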
If you use it for control of execution decisions, like stuffing an RTS to short-circuit a function, you're probably asking for trouble. At the very least, you have to ensure that everything is initialized on every execution, or you won't be able to run the code more than once without reloading it. The real trouble though is that you can't just look at the code and know what's going to happen, because what eventually gets executed is different. If all you're doing is changing the address of a STA, it's not so bad, because you know that it's storing a byte
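One way to deal with the initialization problem is to write the opcode on every entry instead of only patching in the RTS. This is a sketch under the original poster's naming (wait, modify); the label setop is made up here. $20 and $60 are the 6502 opcodes for JSR absolute and RTS.

```asm
; Assumptions: wait is a one-byte flag and modify labels the JSR being patched.
begin   ldx #$60        ; default: RTS, i.e. skip the wait routine
        lda wait
        beq setop       ; wait = 0, keep the RTS
        ldx #$20        ; wait = 1, put the JSR back
setop   stx modify      ; written on every entry, so no stale patch survives a re-run
```

Because the byte at modify is rewritten each time, the program can be run again with a different flag value without reloading it.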
>If you use it for control of execution decisions, like stuffing an RTS to short-circuit a function, you're probably asking for trouble. At the very least, you have to ensure that everything is initialized on every execution, or you won't be able to run the code more than once without reloading it.
the idea is to include/not include a JSR to a "wait for button press" routine for debugging.... and to use self modifying code for the fun of it.
I could add a command within the program:
        lda key
        cmp #enablewait
        beq addwait
        cmp #disablewait
        beq removewait
        <etc, rest of commands of my program>
addwait
        lda #$XX        ;opcode for JSR ($20 on the 6502)
        sta modify
        sta modify2
        sta modify3     ;(etc)
        jmp end
removewait
        lda #$60        ;opcode for RTS
        sta modify
        sta modify2
        sta modify3     ;(etc)
        jmp end
subroutine1
        <do stuff>
modify  jsr $XXXX
        RTS
subroutine2
        <do stuff>
modify2 jsr $XXXX
        RTS
subroutine3
modify3 jsr $XXXX
        RTS
I've used self-modifying code to ( 1 ) preserve registers without using extra space, ( 2 ) unroll loops and ( 3 ) eliminate branches. I've also dynamically built inner loops as outlined by Andy. There are a lot of ways to use self-modifying code. Some of them might even be practical. :)
Lucas
Example 1: This modifies an operand
        sty restore+1
        jsr do_stuff
restore ldy #00
Example 2: This modifies an operand
        lda #end
        sec
        sbc num_loops
        sta disp+1
        lda #00
disp    jmp $0000
        sta $FF
        .
        .
        sta $01
        sta $00
end     rts
Example 3: (if x = 0 save Y, if x = 1 do nothing). This modifies an opcode
        lda opcode,x
        sta patch
        tya
patch   sta save
opcode  db $8D, $AD     ; $8D = sta, $AD = lda
I like example 1... that is a lot faster than pulling registers off the stack.
Example 2 is a loop unrolling example, right? So you'll store 0 in addresses 0 thru num_loops without any branch logic -- but this extra speed is of course at the expense of additional memory for the code.
Example 3 seems like great obfuscation, but since you're clobbering the accumulator, couldn't you just do:
        txa
        bne no_save
        sty save
no_save ...
It's 12 bytes for your method (including the opcode table) and 6 bytes for mine. However, yours is a very cool way to throw someone WAAAAAY off the scent of what you're actually doing, so it would be good in a copy-protection scheme. Unless I've lost the point of what it's also capable of doing (very likely :-)
>It's 12 bytes for your method (including the opcode table) and 6 bytes for mine. However, yours is a very cool way to throw someone WAAAAAY off the scent of what you're actually doing, so it would be good in a copy-protection scheme. Unless I've lost the point of what it's also capable of doing (very likely :-)
Well, I tried just to present the skeleton of each idea. For a more 'real world' usage of something like Example 3, consider trying to look something up in a table where the index is x + y. The 'slow' way of doing this is
        stx tmp
        tya
        clc
        adc tmp
        tax
        lda table,x
Using self-modifying code (assuming the table is page-aligned)
        sty patch+1
patch   lda table,x
Granted, this patches an operand instead of the opcode but it does let you index further than 255 bytes, e.g. x = 200, y = 200 will do the right thing.
Hmmm...let's try again. The one time I actually used Example 3, I chained several of these together. Here's code that implements the following bit of logic (assume x = 0 or 1, y = 0 or 1, paddle = 0 to 3, and action is at memory location $60XX)
if x = 0 then
    return
else if paddle+y > 1 then
    return action,y
else
    return 7
end
        lda opcode1,x   ; patch in RTS or LDA paddle
        sta patch1
patch1  lda paddle
        sta patch2+1    ; patch in offset (opcode2 is page aligned)
patch2  lda opcode2,y
        sta patch3
patch3  lda action,y
        rts

        org $6000
opcode2 db $A9, $A9, $B9, $B9, $B9
opcode1 db $60, $AD
action  db $5A, $A5
The part around patch3 deserves a little more explanation. Since action = $6007, the four bytes at patch3 will either be
$B9 $07 $60 $60 = LDA $6007,y; RTS
or
$A9 $07 $60 $60 = LDA #$07; RTS; RTS.
So the correct behavior really depends on the location of the action table. I guess you could use something like that for copy-protection.
-Lucas
lscha...@d.umn.edu wrote:
> Example 1: This modifies an operand
>         sty restore+1
>         jsr do_stuff
> restore ldy #00
"STY absolute" + "LDY #imm" takes 4+2=6 cycles and 5 bytes.
        phy
        jsr do_stuff
        ply
takes 3+4=7 cycles and 2 bytes, and you won't burn in hell for all eternity.
> Example 2: This modifies an operand
>         lda #end
>         sec
>         sbc num_loops
>         sta disp+1
>         lda #00
> disp    jmp $0000
>         sta $FF
>         .
>         .
>         sta $01
>         sta $00
> end     rts
A "computed goto" is a reasonable use, though there is an indirect form of JMP that may work better. (It looks like JMP(addr,X) didn't exist until the 65c02 though.) The above doesn't actually work, if I understand your intention -- you have to multiply the value by 2 to line it up on a STA instruction, and you have to do a 16-bit add because "disp" crosses at least one page.
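For what it's worth, here is a sketch of how the two fixes mentioned above (doubling the count and doing a 16-bit subtract) might look. It is not the original poster's code. The two-byte scratch location tmp is an assumption, and the unrolled block is assumed to consist only of two-byte "sta zp" instructions ending at the label end.

```asm
; Assumptions: num_loops is one byte (0..255), tmp is a hypothetical
; 2-byte scratch area, and every unrolled store is a 2-byte "sta zp".
        lda num_loops
        asl a           ; A = low byte of 2*num_loops, carry = bit 8
        sta tmp
        lda #$00        ; LDA does not touch the carry
        rol a           ; A = high byte of 2*num_loops (0 or 1)
        sta tmp+1
        sec
        lda #<end
        sbc tmp
        sta disp+1      ; low byte of jump target
        lda #>end       ; carry from the low-byte subtract is still intact
        sbc tmp+1
        sta disp+2      ; high byte of jump target
        lda #$00        ; value to store
disp    jmp $0000       ; operand patched above
        sta $FF
;       ... remaining unrolled stores elided ...
        sta $01
        sta $00
end     rts
```

With num_loops = 0 the jump lands on end and nothing is stored. With num_loops = 1 it lands on the final "sta $00", and so on up the block.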
> Example 3: (if x = 0 save Y, if x = 1 do nothing). This modifies an
> opcode
>         lda opcode,x
>         sta patch
>         tya
> patch   sta save
> opcode  db $8D, $AD     ; $8D = sta, $AD = lda
This is the sort of thing that scares me, for the reasons mentioned earlier: it's hard to tell what's going to happen by reading the code.
Don't forget you can do tricks like this:
        [test something, branch to load+1]
        lda #$00
load    bit $03a9
        sta somewhere
There's a "lda #$03" embedded in the BIT instruction. It's generally more sane to code it like this:
        lda #$00
        dfb $2c         ;BIT abs
load    lda #$03
        sta somewhere
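The same trick chains nicely when several entry points need to load different values and then share a tail. A small sketch follows. The labels one, two and three are made up for illustration, and somewhere is the placeholder used above.

```asm
; Hypothetical entry points: each loads a different value, then falls
; through into the common code.
one     lda #$01
        dfb $2c         ; BIT abs opcode: swallows the next two bytes as its operand
two     lda #$02
        dfb $2c         ; same again, hiding "lda #$03"
three   lda #$03
        sta somewhere   ; A still holds the value loaded at the entry point
        rts
```

Two things to keep in mind: BIT changes the N, V and Z flags, and it really does read the address formed by the hidden bytes ($02A9 and $03A9 here), so the hidden operand should not happen to point at a soft switch or other I/O location.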
It worries me a little that I'm doing most of this off the top of my head.
>> Example 1: This modifies an operand
>>         sty restore+1
>>         jsr do_stuff
>> restore ldy #00
>"STY absolute" + "LDY #imm" takes 4+2=6 cycles and 5 bytes.
>         phy
>         jsr do_stuff
>         ply
>takes 3+4=7 cycles and 2 bytes, and you won't burn in hell for all eternity.
Well, the place I actually use constructs like this is IIgs specific where I need to restore the stack pointer after some PEA slamming. My code really looks like this
        tsc
        sta patch+1
loop    anop
        jmp mess_up_stack
patch   lda #$5A5A
        tcs
This lets me replace a save/restore with just a restore.
>A "computed goto" is a reasonable use, though there is an indirect form of JMP that may work better. (It looks like JMP(addr,X) didn't exist until the 65c02 though.) The above doesn't actually work, if I understand your intention -- you have to multiply the value by 2 to line it up on a STA instruction, and you have to do a 16-bit add because "disp" crosses at least one page.
Agreed. Also, for an unrolled loop the jmp (addr,x) instruction is not too useful since you need a jump table as big as the unrolled loop itself! And yes, I did miss some of the details.
>> Example 3: (if x = 0 save Y, if x = 1 do nothing). This modifies an
>> opcode
>>         lda opcode,x
>>         sta patch
>>         tya
>> patch   sta save
>> opcode  db $8D, $AD     ; $8D = sta, $AD = lda
>This is the sort of thing that scares me, for the reasons mentioned
Wait until you read my other post!
>earlier: it's hard to tell what's going to happen by reading the code.
>Don't forget you can do tricks like this:
>         [test something, branch to load+1]
>         lda #$00
>load     bit $03a9
>         sta somewhere
>There's a "lda #$03" embedded in the BIT instruction. It's generally more sane to code it like this:
>         lda #$00
>         dfb $2c         ;BIT abs
>load     lda #$03
>         sta somewhere
I personally like the "jump into instruction" obfuscation. I don't know exactly why -- there's just something satisfying in writing code so synergistic that every byte is fulfilling multiple functions. ;)
I think that quite a few "self-modifying tricks" can be much more useful when you can wrap them in some good semantics, as you illustrated.
>It worries me a little that I'm doing most of this off the top of my head.
Nah, it just shows off your geek quotient.
-Lucas
If only Von Neumann knew what kind of insanity we'd get ourselves into with the whole stored program concept. Maybe he would have reworked things so that program code could only touch memory segments flagged as data-only. Then again, maybe Von Neumann would write enough self-modifying code to make someone like Dijkstra want to take a flying leap off the UT clock tower. ;-D
BLuRry wrote:
> If only Von Neumann knew what kind of insanity we'd get ourselves into with the whole stored program concept. Maybe he would have reworked things so that program code could only touch memory segments flagged as data-only. Then again, maybe Von Neumann would write enough self-modifying code to make someone like Dijkstra want to take a flying leap off the UT clock tower. ;-D
Before the invention of the index register (or the "B-Box" as it was called at Manchester), modifying code was the *only* way to write useful loops. The possibility of programmatic code modification was not an unintended consequence of storing code and data in the same memory, but a primary motivation for doing so.
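To make that concrete in 6502 terms, here is roughly what a loop looks like if you pretend the X and Y registers don't exist. This is only an illustration of the idea, not historical code. The names buffer and count are hypothetical, and the buffer is assumed not to cross a page boundary.

```asm
; Assumptions: buffer is a 16-byte area within one page, count is a
; one-byte scratch variable. No index registers are used.
        lda #<buffer
        sta store+1     ; reset the operand so the loop can be re-run
        lda #>buffer
        sta store+2
        lda #16
        sta count
        lda #$00        ; value to store
store   sta $FFFF       ; operand is rewritten on each pass
        inc store+1     ; step the address held inside the instruction
        dec count
        bne store
```

INC and DEC on memory leave the accumulator alone, so the zero stays in A for the whole loop.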
The very possibility of programs creating other programs was created, and with it, endless possibilities.
Today, with most instruction modification formalized and abstracted as indexing and indirection, we tend to regard code that re-writes code as a problem, not a solution. But where would we be without compilers, optimizers, JIT code generators, dynamic optimizers, etc., all of which spring from the concept of code being able to be data, too.
So the *real* rule must be: "Don't modify code unless you really know what you're doing, and all of its implications." (Like playing with fire. ;-)
-michael