The simplest way to optimize your C code is to use the '-O' flag during compilation. The '-O' flag tells the compiler to optimize the code. This also means compilation will take longer, as the compiler applies various optimization algorithms to the code. This optimization is supposed to be conservative, in that it guarantees the code will still perform the same function as it did when compiled without optimization (unless there are bugs in our compiler, or the program relies on undefined behavior, which the optimizer is allowed to assume never happens). Usually you can select an optimization level by adding a number to the '-O' flag. The higher the number, the more aggressively the program is optimized, and the longer the compiler takes to finish.

One should note that because optimization alters the code in various ways, the higher the optimization level, the greater the chance that an improper optimization will actually change our program's behavior, as some optimizations tend to be non-conservative, or are simply rather complex and contain bugs. For example, for a long time it was commonly said that using an optimization level higher than 2 (or 3, depending on who you asked) with gcc resulted in buggy executables. If, after this warning, we still want to use a different optimization level (let's say 4; gcc treats any level above 3 the same as -O3), we can do it this way:
cc -O4 helloworld.c
Besides this, there are many other ways to make C code more efficient.
General Tips
Avoid using ++ and -- etc. within loop expressions, e.g. while(n--){}, as some compilers find these harder to optimize.
Minimize the use of global variables.
Declare anything within a file (external to functions) as static, unless it is intended to be global.
Use word-size variables (typically int) where you can, as the machine works with these most efficiently, instead of char, short, double, bit fields, etc.
Avoid recursion where performance matters. Recursion can be very elegant and neat, but it creates many more function calls, which can add up to a large overhead; an iterative version is usually faster.
Avoid calling sqrt() in loops: a square root is much more expensive than basic arithmetic, and comparisons can often be done on squared values instead.
Single-dimension arrays can be faster than multi-dimension arrays, especially compared with arrays of pointers to rows.
Compilers can often optimize a whole file at once, so avoid splitting closely related functions into separate files: the compiler does a better job when it can see both of them together (it might be able to inline one into the other, for example).
Single precision math may be faster than double precision - there is often a compiler switch for this.
Floating point multiplication is often faster than division - use val * 0.5 instead of val / 2.0.
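Addition may be quicker than multiplication: use val + val + val instead of val * 3 (modern compilers often make this kind of substitution themselves).
puts() is quicker than printf(), although less flexible (note that puts() always appends a newline).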
Use #defined macros instead of commonly used tiny functions - sometimes the bulk of CPU usage can be tracked down to a small external function being called thousands of times in a tight loop. Replacing it with a macro to perform the same job will remove the overhead of all those function calls, and allow the compiler to be more aggressive in its optimization.
Binary/unformatted file access is faster than formatted access, as the machine does not have to convert between human-readable ASCII and machine-readable binary. If you don't actually need to read the data in a file yourself, consider making it a binary file.
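Three of the tips above can be sketched together: avoiding sqrt() in a loop by comparing squared distances, multiplying by a precomputed reciprocal instead of dividing, and using a static inline function (C99) as an alternative to a tiny #defined macro. This is a minimal sketch; the Point type and all function names are hypothetical.

```c
#include <stddef.h>

/* Hypothetical point type used for illustration. */
typedef struct {
    double x, y;
} Point;

/* static inline instead of a macro such as
 *   #define SQUARE(v) ((v) * (v))
 * The call overhead disappears when the compiler inlines it, and unlike
 * the macro, the argument is evaluated exactly once. */
static inline double square(double v)
{
    return v * v;
}

/* Counts points within 'radius' of 'centre' without calling sqrt():
 * for a non-negative radius, dist <= radius is equivalent to
 * dist * dist <= radius * radius. */
size_t count_within(const Point *pts, size_t n, const Point *centre, double radius)
{
    const double r2 = square(radius);   /* computed once, outside the loop */
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        double d2 = square(pts[i].x - centre->x) + square(pts[i].y - centre->y);
        if (d2 <= r2)
            count++;
    }
    return count;
}

/* Divides every element by 'divisor' using one division and n
 * multiplications. The results may differ from v[i] / divisor in the
 * last bit unless divisor is a power of two. */
void scale_down(double *v, size_t n, double divisor)
{
    const double inv = 1.0 / divisor;
    for (size_t i = 0; i < n; i++)
        v[i] *= inv;
}
```

With optimization enabled, compilers usually turn division by a power-of-two constant into a multiplication on their own, but for other divisors they typically only do so under options such as gcc's -ffast-math, because the result can differ slightly; that is where doing it by hand can still pay off.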
Memory
If your library supports the mallopt() function (for controlling malloc), use it. The M_MXFAST setting can significantly improve code that does a lot of malloc work. If a particular structure is created and destroyed many times a second, try setting the mallopt options to work best with that size.
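A sketch of the mallopt() tip for glibc, where the option is spelled M_MXFAST and declared in <malloc.h>. Other C libraries may not provide mallopt() at all, so the call is guarded. According to the glibc mallopt(3) documentation, the default limit is 64 * sizeof(size_t) / 4 bytes and the maximum is 80 * sizeof(size_t) / 4, so raising it only helps structures larger than the default. The function name tune_fast_allocations is hypothetical.

```c
#include <stdlib.h>
#if defined(__GLIBC__)
#include <malloc.h>   /* glibc: declares mallopt() and M_MXFAST */
#endif

/* Sets the size limit for glibc's "fastbins" (free lists that recycle
 * small blocks quickly) to 'object_size' bytes. Returns 1 if the setting
 * was accepted, 0 otherwise, including on C libraries without M_MXFAST. */
int tune_fast_allocations(size_t object_size)
{
#if defined(__GLIBC__) && defined(M_MXFAST)
    return mallopt(M_MXFAST, (int)object_size);
#else
    (void)object_size;   /* not supported here */
    return 0;
#endif
}
```

Calling it early, before the allocation-heavy work starts, is the safest choice; measure with and without it, since the best value depends on the workload.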
Static buffers - fast, but they cannot grow when needed.
malloc() - slightly slower than static buffers, may fragment memory, and most importantly requires the memory to be freed, which can sometimes be very annoying. It's easy to forget to free the memory and cause memory leaks. free()ing already freed memory can also be an exploitable security flaw; always setting the free()d pointer to NULL helps quite a lot, though (create a macro to do it).
Garbage collector - this would be the best way to manage memory. For example, OCaml's garbage collector is quite smart, treating long-lived allocations differently from temporary ones. With C, however, that isn't really possible: you can't move allocated memory elsewhere unless you write your program in a special way. So there are a few simple, non-portable garbage collector implementations for C, but they don't differ much from malloc() other than that you don't need to free() memory. They're not foolproof either.
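The malloc() item above suggests a macro that sets a pointer to NULL after freeing it. A minimal sketch, with the macro name FREE chosen here for illustration:

```c
#include <stdlib.h>

/* Frees the memory and sets the pointer to NULL, so a second FREE() on
 * the same pointer is harmless: free(NULL) does nothing. It does not
 * protect other copies of the pointer. 'ptr' must be an lvalue and is
 * evaluated more than once. The do/while (0) wrapper makes the macro
 * behave like a single statement after an unbraced if. */
#define FREE(ptr) do { free(ptr); (ptr) = NULL; } while (0)
```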
Strings
strncpy(), strncat(), snprintf() - of these, only snprintf() is easy to use safely (apart from its return value), but it's still somewhat unportable (older Windows compilers lacked it). strncpy() doesn't necessarily NUL-terminate the destination, so that must be done explicitly. strncat() was probably never meant as a way to prevent buffer overflows; its behaviour is just too insane for that. For example, the correct call is: strncat(buf, "foo", sizeof(buf)-strlen(buf)-1); Then again, there are people happily using strncat() by passing the full buffer size as its third argument. While that sounds logical, it's completely wrong: the third argument limits how many characters are appended, and a terminating NUL is always written after them.
strlcpy(), strlcat() - much better replacements for the above, from OpenBSD. Historically very unportable (glibc only added them in version 2.38), but you can easily write your own. These can still be used unsafely, though, if the buffer size parameter is wrong or if the programmer manipulates the buffer indirectly, e.g. by appending single characters and missing size checks (yes, I've seen this in software that had "secure" in its name).
Dynamically allocating the required amount of memory and then using strcpy(), strcat(), sprintf() and direct access. This requires you to be very careful with the string size calculations. I don't understand why so many people think that's not a problem; they have this "If you can't calculate the sizes correctly, you're stupid and you shouldn't be coding at all" attitude. Why waste time on that when you could be doing more important things?
Dynamically growing buffers, used for example by GLib, vsftpd, qmail, djbdns and Postfix. This is definitely the right way: string manipulation is done through an API that discourages, or even disallows, direct buffer manipulation.
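The last approach can be sketched as a minimal growing buffer. The strbuf type and strbuf_* functions are hypothetical, not the API of GLib or of any program named above (GLib's real equivalent is GString); the point is that all writes go through functions that check and grow the capacity.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Callers never write to 'data' directly; every append goes through
 * strbuf_append_n(), which checks and grows the capacity. */
typedef struct {
    char  *data;   /* NUL-terminated while len > 0 or after init */
    size_t len;    /* characters in use, excluding the NUL */
    size_t cap;    /* bytes allocated for data */
} strbuf;

int strbuf_init(strbuf *sb)
{
    sb->cap = 16;
    sb->len = 0;
    sb->data = malloc(sb->cap);
    if (sb->data == NULL)
        return -1;
    sb->data[0] = '\0';
    return 0;
}

/* Appends n bytes from s. Returns 0 on success, -1 on overflow or
 * allocation failure (the buffer is left unchanged in that case). */
int strbuf_append_n(strbuf *sb, const char *s, size_t n)
{
    size_t need, newcap;
    char *p;

    if (n > SIZE_MAX - sb->len - 1)
        return -1;                          /* size would overflow */
    need = sb->len + n + 1;                 /* +1 for the terminating NUL */
    if (need > sb->cap) {
        newcap = sb->cap ? sb->cap : 16;
        while (newcap < need)               /* double until it fits */
            newcap = (newcap > SIZE_MAX / 2) ? need : newcap * 2;
        p = realloc(sb->data, newcap);      /* realloc(NULL, n) acts as malloc */
        if (p == NULL)
            return -1;
        sb->data = p;
        sb->cap = newcap;
    }
    memcpy(sb->data + sb->len, s, n);
    sb->len += n;
    sb->data[sb->len] = '\0';
    return 0;
}

int strbuf_append(strbuf *sb, const char *s)
{
    return strbuf_append_n(sb, s, strlen(s));
}

void strbuf_free(strbuf *sb)
{
    free(sb->data);
    sb->data = NULL;
    sb->len = sb->cap = 0;
}
```

Doubling the capacity keeps the number of realloc() calls logarithmic in the final string length, so repeated appends stay cheap on average.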
Global variables
The compiler usually cannot keep a global variable in a register. A global variable can be changed indirectly through a pointer, or by a function call, so the compiler often cannot cache its value in a register, resulting in extra (often unnecessary) loads and stores when globals are used. We should therefore avoid using global variables inside critical loops.
If a function uses global variables heavily, it is beneficial to copy those global variables into local variables so that they can be assigned to registers. This is possible only if those global variables are not used by any of the functions that are called, and are not modified through pointers while the local copies are in use.
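A minimal sketch of copying a global into a local; g_scale and both functions are hypothetical. Because out is a plain int pointer, it might point at g_scale itself, so in the first version the compiler may have to reload the global after every store.

```c
#include <stddef.h>

/* Hypothetical global configuration value. */
int g_scale = 3;

/* Reads g_scale inside the loop: since out[] could alias g_scale,
 * the compiler may reload it from memory on every iteration. */
void scale_global(int *out, const int *in, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = in[i] * g_scale;
}

/* Copies the global into a local first, so it can live in a register.
 * Only equivalent to scale_global() if nothing in the loop changes g_scale. */
void scale_local(int *out, const int *in, size_t n)
{
    const int scale = g_scale;
    for (size_t i = 0; i < n; i++)
        out[i] = in[i] * scale;
}
```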
Loop jamming
Never use two loops where one will suffice. But if you do a lot of work in the loop, it might not fit into your processor's instruction cache. In that case, two separate loops may actually be faster, as each one can run completely in the cache. The following code is bad....
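for (i = 0; i < 100; i++) { stuff(); }      /* stuff(), morestuff(): placeholder loop bodies */
for (i = 0; i < 100; i++) { morestuff(); }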
would be better written as:
for (i = 0; i < 100; i++) { stuff(); morestuff(); }   /* both loop bodies in one loop */
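As a runnable illustration of the same idea, the sketch below computes the sum and maximum of an array first with two loops, then with one jammed loop. The function names are hypothetical, and both functions assume n >= 1.

```c
#include <stddef.h>

/* Two separate passes: the array is traversed twice. */
void stats_two_loops(const int *v, size_t n, long *sum_out, int *max_out)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += v[i];

    int max = v[0];
    for (size_t i = 1; i < n; i++)
        if (v[i] > max)
            max = v[i];

    *sum_out = sum;
    *max_out = max;
}

/* Jammed version: one pass computes both results. */
void stats_one_loop(const int *v, size_t n, long *sum_out, int *max_out)
{
    long sum = 0;
    int max = v[0];
    for (size_t i = 0; i < n; i++) {
        sum += v[i];
        if (v[i] > max)
            max = v[i];
    }
    *sum_out = sum;
    *max_out = max;
}
```

Whether the jammed version is actually faster depends on the size of the loop bodies and on the caches, as noted above, so it is worth timing both.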
Pointers
If possible, you should pass structures by reference, that is, pass a pointer to the structure; otherwise the whole thing will be copied onto the stack and passed, which slows things down. I've seen programs that pass structures several kilobytes in size by value, when a simple pointer would do the same job.
Functions receiving pointers to structures as arguments should declare them as pointer to constant if the function is not going to alter the contents of the structure. As an example:
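#include <stdio.h>
typedef struct { int id; double value; } Thestruct;  /* hypothetical members */
void print_data_of_a_structure(const Thestruct *data_pointer)
{
    /* reading is fine; data_pointer->id = 0; would be a compile error */
    printf("id=%d value=%f\n", data_pointer->id, data_pointer->value);
}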
This example states that the function does not alter the contents of the external structure through this pointer (as it is using a pointer to a constant structure). In principle this can let the compiler avoid re-reading the contents each time they are accessed, although in practice it can only do so when it can prove the structure is not modified through some other pointer. It also ensures that the compiler will trap any accidental attempt by your code to write to the read-only structure, giving the contents of the structure additional protection.
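To make the cost of passing by value concrete, here is a sketch with a hypothetical Record structure of roughly 4 KB. average_by_value() receives a full copy of the structure on every call, while average_by_pointer() receives only an address and, because of const, cannot modify the caller's data.

```c
#include <stddef.h>

/* Hypothetical structure of roughly 4 KB on typical platforms. */
typedef struct {
    char   name[64];
    double samples[512];
    size_t count;
} Record;

/* Passing by value: the whole Record is copied for every call. */
double average_by_value(Record r)
{
    double total = 0.0;
    for (size_t i = 0; i < r.count; i++)
        total += r.samples[i];
    return r.count ? total / r.count : 0.0;
}

/* Passing a pointer to const: only an address is passed, and the
 * compiler rejects any attempt to modify the record through 'r'. */
double average_by_pointer(const Record *r)
{
    double total = 0.0;
    for (size_t i = 0; i < r->count; i++)
        total += r->samples[i];
    return r->count ? total / r->count : 0.0;
}
```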
Integers
You should use unsigned int instead of int if you know the value will never be negative. Some processors can handle unsigned integer arithmetic considerably faster than signed (this is also good practice, and helps make for self-documenting code). So, the best declaration for an int variable in a tight loop would be:
register unsigned int variable_name;
However, the compiler is not obliged to take any notice of register (modern compilers generally do their own register allocation), and unsigned may make no difference on a particular processor. Remember that integer arithmetic is usually faster than floating-point arithmetic, especially on processors without a hardware FPU, where floating point has to be handled by software math libraries.
When you need to be accurate to two decimal places, scale everything up by 100 and convert it back to floating point as late as possible.
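A minimal sketch of this scaling for money, assuming amounts are non-negative and fit in a long long. The cents_t type and the function names are hypothetical. All arithmetic stays in integers, and floating point appears only in print_amount().

```c
#include <stdio.h>

/* Hypothetical money type: amounts held as whole cents (value * 100). */
typedef long long cents_t;

/* Sums prices exactly; no rounding error accumulates. */
cents_t total_cents(const cents_t *prices, size_t n)
{
    cents_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += prices[i];
    return total;
}

/* Adds 'percent' percent, rounding to the nearest cent (halves round up).
 * Assumes a non-negative amount and percent. */
cents_t add_percent(cents_t amount, int percent)
{
    return amount + (amount * percent + 50) / 100;
}

/* Converts to floating point only at the very end, for output. */
void print_amount(cents_t amount)
{
    printf("%.2f\n", amount / 100.0);
}
```

For example, total_cents() of {1999, 1, 250} (19.99 + 0.01 + 2.50) is exactly 2250, which print_amount() shows as 22.50.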
Division and Remainder
On processors without a hardware divide instruction (such as many ARM cores), a 32-bit division takes 20-140 cycles to execute, depending on the numerator and denominator. The division routine takes a constant time plus a time for each bit to divide:
Time (numerator / denominator) = C0 + C1* log2 (numerator / denominator) = C0 + C1 * (log2 (numerator) - log2 (denominator))
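The current ARM division routine takes about 20 + 4.3N cycles, where N is the number of bits in the quotient, log2(numerator / denominator); that is, C0 is about 20 and C1 about 4.3 in the formula above. Since division is expensive, it is desirable to avoid it where possible. Sometimes such expressions can be rewritten by replacing the division with a multiplication. For example, if a is non-negative and b is positive, (a / b) > c can be rewritten as a >= (c + 1) * b, provided (c + 1) * b fits in an integer. (With exact, real-valued division the simpler a > c * b would do, but C's integer division truncates: with a = 7, b = 2, c = 3, (a / b) > c is false while a > c * b is true.) It is also better to use unsigned division, by ensuring that one of the (int-sized) operands is unsigned, as this is faster than signed division.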
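A sketch of both points, assuming unsigned int operands; the function names are hypothetical. quotient_greater() implements the division-free comparison, and div_by_8() shows the cheapest unsigned division of all: a constant power-of-two divisor becomes a shift. For signed operands that shortcut needs extra correction instructions, because / truncates toward zero while an arithmetic right shift rounds toward minus infinity (-7 / 2 is -3, but shifting -7 right by one usually gives -4).

```c
#include <stdbool.h>

/* For unsigned a, b with b > 0:  a / b > c  <=>  a >= (c + 1) * b,
 * because integer division truncates. The caller must ensure that
 * (c + 1) * b does not overflow. */
bool quotient_greater(unsigned a, unsigned b, unsigned c)
{
    return a >= (c + 1u) * b;   /* no division needed */
}

/* Unsigned division by a power of two is a plain right shift; compilers
 * do this automatically when the divisor is a constant. */
unsigned div_by_8(unsigned x)
{
    return x >> 3;              /* same result as x / 8u */
}
```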