
Forum Post: TMS320F28388D: Optimization Behavior with -O4 in TI C2000 Compiler

Part Number: TMS320F28388D
Tool/software:

I am using the TI C2000 compiler with the following optimization flags:

    /opt/ti/ti-cgt-c2000_22.6.1.LTS/bin/cl2000 \
        --issue_remarks --gen_opt_info=2 -v28 -ml -O4 -op=3 \
        --c_src_interlist --auto_inline --verbose_diagnostics \
        --advice:performance=all --opt_for_speed=5 \
        --preproc_with_compile --keep_asm \
        -I/opt/ti/ti-cgt-c2000_22.6.1.LTS/include \
        main.cpp

However, I have noticed a few inefficiencies in the generated code at the -O4 optimization level, and I would like to ask whether there are any recommendations or insights to address them.

Static Table Optimization

In the first example, where I have a static table[] in the function foo(), I expected the compiler to optimize this table away and remove unnecessary memory accesses. However, the table is still being accessed directly, even though the value of a is within the bounds of the table. In comparison, GCC at optimization level -O1 would handle this more efficiently. Is there any way to ensure that the table is properly optimized away?

    /*
        MOVZ      AR6,AL              ; [CPU_ALU] |4|
        MOVL      XAR4,#_table$1      ; [CPU_ARAU] |6|
        SETC      SXM                 ; [CPU_ALU]
        MOVL      ACC,XAR4            ; [CPU_ALU] |6|
        ADD       ACC,AR6             ; [CPU_ALU] |6|
        MOVL      XAR4,ACC            ; [CPU_ALU] |6|
        MOV       AL,*+XAR4[0]        ; [CPU_ALU] |6|
        LRETR                         ; [CPU_ALU]
    */
    int foo(char a)
    {
        static const int table[] = { 1,2,3,4,5 };
        return table[a];
    }

Auto inline

In the second example, the read() function is simple and should ideally be inlined, especially given the --auto_inline flag. However, the compiler does not seem to inline this function. GCC at -O1 inlines it automatically. Is there a reason why this function is not inlined by the C2000 compiler even with -O4, and are there additional flags that can ensure this?

Also, I observed that the compiler generates an unnecessary call to memcpy(), which is not ideal for performance. The code essentially just moves values around, which could be done with simpler instructions, so I was surprised to see the memcpy call. How can I avoid this issue, or is there a setting that can better optimize this pattern?

    /*
        MOVL      *SP++,XAR1          ; [CPU_ALU]
        ADDB      SP,#6               ; [CPU_ARAU]
        MOVZ      AR4,SP              ; [CPU_ALU] |21|
        MOVZ      AR1,AL              ; [CPU_ALU] |20|
        MOVL      XAR5,#_$P$T0$2      ; [CPU_ARAU] |21|
        MOVB      ACC,#5              ; [CPU_ALU] |21|
        SUBB      XAR4,#5             ; [CPU_ARAU] |21|
        MOVZ      AR4,AR4             ; [CPU_ALU] |21|
        LCR       #_memcpy            ; [CPU_ALU] |21|
        MOVZ      AR4,SP              ; [CPU_ALU] |22|
        SUBB      XAR4,#5             ; [CPU_ARAU] |22|
        MOVZ      AR4,AR4             ; [CPU_ALU] |22|
        SETC      SXM                 ; [CPU_ALU]
        MOVL      ACC,XAR4            ; [CPU_ALU] |22|
        ADD       ACC,AR1             ; [CPU_ALU] |22|
        MOVL      XAR4,ACC            ; [CPU_ALU] |22|
        MOV       AL,*+XAR4[0]        ; [CPU_ALU] |22|
        SUBB      SP,#6               ; [CPU_ARAU]
        MOVL      XAR1,*--SP          ; [CPU_ALU]
        LRETR                         ; [CPU_ALU]
    */
    int foo2(char a)
    {
        const int table[] = { 1,2,3,4,5 };
        return table[a];
    }

I would appreciate any insights or suggestions on improving the optimization for these cases with the TI C2000 compiler.
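
One observation follows from comparing the two listings above. The foo() listing, whose table is static const, contains no memcpy call, while foo2(), whose table has automatic storage, copies the initializer onto the stack on every call. A minimal sketch of a possible source-level workaround is therefore to give the table in foo2() static storage as well. The name foo2_static is hypothetical and used only for illustration, and the cl2000 output for this variant is not shown here, so treat it as an assumption to check against the generated assembly rather than a confirmed fix.

```c
/* Original pattern from the post: the table has automatic storage, so it
   is (conceptually) re-initialized on every call. The listing in the post
   shows this being done through memcpy. */
int foo2(char a)
{
    const int table[] = { 1, 2, 3, 4, 5 };
    /* The cast only avoids char-subscript warnings on host compilers. */
    return table[(int)a];
}

/* Hypothetical variant (name chosen for illustration): the table has
   static storage, as in foo() from the first example, so no per-call
   copy of the initializer is needed. */
int foo2_static(char a)
{
    static const int table[] = { 1, 2, 3, 4, 5 };
    return table[(int)a];
}
```

Both functions return the same values for indices 0 through 4, so the static variant is a drop-in replacement as long as nothing relies on the table being a distinct object per call.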
