"Jan Richter" <vincent00...@yahoo.de> writes:
> the Code below shows DJBs own implementation of strlen (str_len):
> unsigned int str_len(char *s)
> {
> register char *t;
> t = s;
> for (;;) {
> if (!*t) return t - s; ++t;
> if (!*t) return t - s; ++t;
> if (!*t) return t - s; ++t;
> if (!*t) return t - s; ++t;
> }
> }
> I understand the code so far, but I have a question about the loops.
> To me, it seems he used loop unrolling (aka duff's device) to optimize
> the code while it is executed. But why did he use four loops? When
> the function is invoked, he didn't know how big "s" is. Or am I wrong here?
> I always thought, to unroll a loop I need to know how often the loop is
> used.
Duff's device is a specific technique, not a synonym for loop
unrolling. See question 20.35 of the C FAQ.
The use of unsigned int is questionable. The return type should be
size_t. It's possible that this implementation predates the existence
of size_t (though it's been modernized with a prototype). I would
guess that it might be more efficient than strlen() only on a very old
implementation.
A matter of terminology: it doesn't have four loops. It has a single
loop containing four if statements.
The statement that's executed N times is
if (!*t) return t - s; ++t;
N needn't be a multiple of 4 because it can break out of the loop
partway through the body of the for loop.
In the following, I'll use the term "iteration" for an execution of
the entire body of the loop, and "sub-iteration" for an execution of a
single line within the loop.
The idea of loop unrolling is that it avoids the overhead of a test on
each iteration, falling through from one statement to the next and
performing the test and branch only once every 4 elements. The number
of times a loop is to be unrolled is a tradeoff between speed and code
size -- and if it's unrolled too much (say, 1024 times), the code size
itself can make it run slower due to cache issues. This is all
*extremely* system-specific, which is why the whole thing is best left
to the compiler.
Furthermore, the "beauty" of Duff's Device is that it jumps into the
middle of the loop, so the loop never has to exit from the middle, and
it really does avoid the overhead on each sub-iteration. The code
above doesn't do this; instead, it performs a test and branch on each
sub-iteration.
I'd be surprised to see the above code performing better than strlen()
on any modern implementation, particularly since an implementation is
free to implement strlen() using whatever non-portable tricks it likes
to squeeze out the last clock cycle.
--
Keith Thompson (The_Other_Keith) k...@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.
关闭本页