The simplest BSL only needs to erase Flash and load the image as generated by the compiler/assembler/linker. There is no need to load RAM, peripherals, the PC, or other registers. Once the Flash is loaded correctly, the BSL code can enter an infinite loop or go into LPM and do nothing. After that, whenever you cycle the power or cause a Reset, the chip will automatically start running the program that currently resides in Flash.
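To make the erase-then-load order concrete, here is a minimal host-side model in C. It is only a sketch, not real BSL code: the `flash` array, `FLASH_SIZE`, and the function names are all made up for illustration. It relies on the usual Flash behavior that an erase sets every bit to 1 and programming can only clear bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FLASH_SIZE 64u              /* hypothetical size, for this model only */

static uint8_t flash[FLASH_SIZE];   /* host-side stand-in for the Flash array */

/* Erasing sets every bit to 1, so every byte reads back as 0xFF. */
static void flash_erase(void)
{
    memset(flash, 0xFF, sizeof flash);
}

/* Programming can only clear bits (1 -> 0), which is why the erase comes first. */
static void flash_program(size_t addr, const uint8_t *src, size_t len)
{
    assert(addr <= FLASH_SIZE && len <= FLASH_SIZE - addr);
    for (size_t i = 0; i < len; i++)
        flash[addr + i] &= src[i];
}

/* Erase, load the image at offset 0, then verify.
 * Returns 0 on success, -1 if the image does not fit or verification fails. */
static int bsl_load(const uint8_t *image, size_t len)
{
    if (len > FLASH_SIZE)
        return -1;
    flash_erase();
    flash_program(0, image, len);
    return memcmp(flash, image, len) == 0 ? 0 : -1;
}
```

On a real part the two helpers would go through the Flash controller instead of a RAM array, but the sequence (erase, program, verify, then idle) stays the same.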
If you want the BSL to invoke the program without going through a power cycle, that is very simple too. Instead of entering an infinite loop or going into LPM, the BSL code can do something that causes a Reset (such as WDTCTL = 0;, a write to the watchdog control register without its password).
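The reason a write like that resets the chip is the watchdog password check: a write to WDTCTL whose upper byte is not 0x5A is treated as a key violation and triggers a PUC. Below is a small host-side model of that check. The `WDTPW` and `WDTHOLD` values match the MSP430 headers, while `wdtctl_write`, `puc_count`, and `bsl_start_application` are hypothetical names used only in this sketch.

```c
#include <assert.h>
#include <stdint.h>

#define WDTPW   0x5A00u   /* watchdog password, upper byte of every valid write */
#define WDTHOLD 0x0080u   /* stop-watchdog bit */

static int puc_count;     /* model only: number of resets triggered so far */

/* Model of a 16-bit write to WDTCTL. Returns 1 if the write causes a PUC. */
static int wdtctl_write(uint16_t value)
{
    if ((value & 0xFF00u) != WDTPW) {
        puc_count++;      /* wrong password: key violation, chip resets */
        return 1;
    }
    return 0;             /* correct password: write accepted, no reset */
}

/* What the BSL could do once the image is loaded: force a reset on purpose. */
static void bsl_start_application(void)
{
    int reset = wdtctl_write(0);   /* on hardware: WDTCTL = 0; */
    assert(reset);
    (void)reset;
}
```

In this model `wdtctl_write(WDTPW | WDTHOLD)` is accepted without a reset, while `bsl_start_application()` always counts one. On the real chip, execution then restarts from the reset vector of the freshly loaded image.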