今日はどんなコード書くと速くなるよと言うちょっとした小ネタを話すです。 ちなみにベンチマークは Fedora Core 6 (Intel(R) Pentium(R) 4 CPU 2.80GHz) の Perl 5.8.8 で取ってます。
B::Concise までは手を出さない。めんどくさい。
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(:all);
my $limit = 10000;
my $sets = {
add1 => sub {
my $a = 0;
for ( 0 .. $limit - 1 ) {
$a++;
}
},
add2 => sub {
my $a = 0;
for ( 0 .. $limit - 1 ) {
++$a;
}
},
add3 => sub {
my $a = 0;
for ( 0 .. $limit - 1 ) {
$a = $a + 1;
}
},
add4 => sub {
my $a = 0;
for ( 0 .. $limit - 1 ) {
$a += 1;
}
}
};
my $result = timethese(
10000,
$sets
);
cmpthese($result);
一般的には ++$a が最速と言われてるけど?
Benchmark: timing 10000 iterations of add1, add2, add3, add4...
add1: 18 wallclock secs (18.12 usr + 0.00 sys = 18.12 CPU) @ 551.88/s (n=10000)
add2: 17 wallclock secs (17.62 usr + 0.00 sys = 17.62 CPU) @ 567.54/s (n=10000)
add3: 22 wallclock secs (21.70 usr + 0.00 sys = 21.70 CPU) @ 460.83/s (n=10000)
add4: 22 wallclock secs (22.09 usr + 0.00 sys = 22.09 CPU) @ 452.69/s (n=10000)
Rate add4 add3 add1 add2
add4 453/s -- -2% -18% -20%
add3 461/s 2% -- -16% -19%
add1 552/s 22% 20% -- -3%
add2 568/s 25% 23% 3% --
ここで B::Deparse を使う。 B::Deparse を使うと、Perl コンパイラがソースコードをどのように最適化しているかを伺い知る事が出来ます。
$ perl -MO=Deparse,-P -e 'my $a = 1; for (0..100) { $a++ }'
my $a = 1;
foreach $_ (0 .. 100) {
++$a;
}
お!++$a に置換されてるじゃないですか。と言う訳で ++$a と $a++ の違いはコンパイラの最適化に掛かるコストだと考えられます。
元から配列に格納されて居ると言う前提。
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(:all);
my @chars = ('a' .. 'z');
my $sets = {
concat1 => sub {
my $concat = '';
$concat .= $_ for @chars;
},
concat2 => sub {
my $concat = join( '' => @chars );
},
concat3 => sub {
my $concat = "@chars";
},
concat4 => sub {
my $concat = $chars[0] . $chars[1] . $chars[2] . $chars[3] . $chars[4] . $chars[5] . $chars[6] . $chars[7] . $chars[8] . $chars[9] . $chars[10] . $chars[11] . $chars[12] . $chars[13] . $chars[14] . $chars[15] . $chars[16] . $chars[17] . $chars[18] . $chars[19] . $chars[20] . $chars[21] . $chars[22] . $chars[23] . $chars[24] . $chars[25];
}
};
my $result = timethese(
100000,
$sets
);
cmpthese($result);
んで結果。
Benchmark: timing 100000 iterations of concat1, concat2, concat3, concat4...
concat1: 1 wallclock secs ( 0.83 usr + 0.00 sys = 0.83 CPU) @ 120481.93/s (n=100000)
concat2: 1 wallclock secs ( 0.28 usr + 0.00 sys = 0.28 CPU) @ 357142.86/s (n=100000)
(warning: too few iterations for a reliable count)
concat3: 0 wallclock secs ( 0.53 usr + 0.00 sys = 0.53 CPU) @ 188679.25/s (n=100000)
concat4: 1 wallclock secs ( 0.43 usr + 0.00 sys = 0.43 CPU) @ 232558.14/s (n=100000)
Rate concat1 concat3 concat4 concat2
concat1 120482/s -- -36% -48% -66%
concat3 188679/s 57% -- -19% -47%
concat4 232558/s 93% 23% -- -35%
concat2 357143/s 196% 89% 54% --
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dump qw(dump);
use Benchmark qw(:all);
my ($a, $b, $c, $d) = ("a" .. "d");
my $sets = {
concat1 => sub {
return "$a$b$c$d";
},
concat2 => sub {
return $a . $b . $c . $d;
},
concat3 => sub {
return join('', $a, $b, $c, $d);
}
};
my $result = timethese(
1000000,
$sets
);
cmpthese($result);
さてどうか。
Benchmark: timing 1000000 iterations of concat1, concat2, concat3...
concat1: 1 wallclock secs ( 0.60 usr + 0.00 sys = 0.60 CPU) @ 1666666.67/s (n=1000000)
concat2: 0 wallclock secs ( 0.55 usr + 0.00 sys = 0.55 CPU) @ 1818181.82/s (n=1000000)
concat3: 0 wallclock secs ( 0.67 usr + 0.00 sys = 0.67 CPU) @ 1492537.31/s (n=1000000)
Rate concat3 concat1 concat2
concat3 1492537/s -- -10% -18%
concat1 1666667/s 12% -- -8%
concat2 1818182/s 22% 9% --
ほう。B::Deparse してみると、
perl -MO=Deparse,-P,-q -e '"$a$b$c"' $a . $b . $c;
なるほど。
もう少し変数の数を増やしてみよう。
Rate concat1 concat3 concat2
concat1 1111111/s -- -2% -8%
concat3 1136364/s 2% -- -6%
concat2 1204819/s 8% 6% --
Rate concat2 concat1 concat3
concat2 628931/s -- -8% -16%
concat1 680272/s 8% -- -10%
concat3 751880/s 20% 11% --
| -- | "$a$b$c..." | $a . $b . $c ... | join(, $a, $b, $c) | 備考 |
| 4 | 1666667/s | 1818182/s | 1492537/s | . で連結が最速 |
| 6 | 1111111/s | 1204819/s | 1136364/s | join が二位に浮上だがまだ . で連結のが速い |
| 10 | 680272/s | 628931/s | 751880/s | join が一位になった |
と言う事で、連結する変数が多くなると途中から join が最速になる。"$a$b$c" のような形式は最適化されて $a . $b . $c となるようだがやはり数に関わらず最も遅いので使わない。
http://www.ibm.com/developerworks/jp/linux/library/l-optperl/ より
他にも色々拡張してみた。
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(:all);
my $limit = 100;
my $sets = {
concat1 => sub {
my $string = 'abcdefghijklmnopqrstuvwxyz';
my $concat = '';
for (1..$limit) {
$concat .= $string;
}
return $concat;
},
concat2 => sub {
my $string = 'abcdefghijklmnopqrstuvwxyz';
my $concat = '';
my @stack = ();
for (1 .. $limit) {
push(@stack, $string);
}
return join('', @stack);
},
concat3 => sub {
my $string = 'abcdefghijklmnopqrstuvwxyz';
my $concat = '';
my @stack = ();
$#stack = $limit - 1;
for (0 .. $limit - 1) {
$stack[$_] = $string;
}
return join('', @stack);
},
concat4 => sub {
my $string = 'abcdefghijklmnopqrstuvwxyz';
return $string x $limit;
},
concat5 => sub {
my $string = 'abcdefghijklmnopqrstuvwxyz';
join '' => map { $string } ( 1 .. $limit );
},
};
my $result = timethese(
100000,
$sets
);
cmpthese($result);
結果発表。
Benchmark: timing 100000 iterations of concat1, concat2, concat3, concat4, concat5...
concat1: 3 wallclock secs ( 3.13 usr + 0.00 sys = 3.13 CPU) @ 31948.88/s (n=100000)
concat2: 7 wallclock secs ( 7.86 usr + 0.01 sys = 7.87 CPU) @ 12706.48/s (n=100000)
concat3: 6 wallclock secs ( 7.81 usr + 0.00 sys = 7.81 CPU) @ 12804.10/s (n=100000)
concat4: 1 wallclock secs ( 0.34 usr + 0.00 sys = 0.34 CPU) @ 294117.65/s (n=100000)
(warning: too few iterations for a reliable count)
concat5: 8 wallclock secs ( 7.72 usr + 0.01 sys = 7.73 CPU) @ 12936.61/s (n=100000)
Rate concat2 concat3 concat5 concat1 concat4
concat2 12706/s -- -1% -2% -60% -96%
concat3 12804/s 1% -- -1% -60% -96%
concat5 12937/s 2% 1% -- -60% -96%
concat1 31949/s 151% 150% 147% -- -89%
concat4 294118/s 2215% 2197% 2174% 821% --
ダントツで "x" で連結した方が速い。
スタックに積んで連結するタイプは $#list を事前に指定しておく方式はしないで push するのとほとんど速度の優劣は見られなかった。map とかも遅い。
指定回数、空ループを回す
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(:all);
my $limit = 100000;
my $sets = {
for1 => sub {
for (my $i = 0; $i < $limit; $i++) { }
},
for2 => sub {
my $i = 0;
for (;;) {
last if ($i > $limit);
$i++;
}
},
for3 => sub {
for (my $i = 0; $i < $limit; ) {
$i++;
}
},
for4 => sub {
for (0 .. $limit - 1) { }
},
while1 => sub {
my $i = 0;
while ($i < $limit) {
$i++;
}
},
while2 => sub {
my $i = 0;
while (1) {
last if ($i > $limit);
$i++;
}
},
while3 => sub {
my @loop = (0 .. $limit - 1);
while (shift @loop) { }
},
until1 => sub {
my $i = 0;
until ($i > $limit) { $i++ }
},
until2 => sub {
my $i = 0;
until (0) {
if ($i > $limit) {
last;
}
$i++;
}
},
do_while => sub {
my $i = 0;
do { $i++ } while ($i < $limit);
},
do_until => sub {
my $i = 0;
do { $i++ } until ($i > $limit);
},
map => sub {
map { $_ } (0 .. $limit-1);
},
};
my $result = timethese(
1000,
$sets
);
cmpthese($result);
結果〜、
Benchmark: timing 1000 iterations of do_until, do_while, for1, for2, for3, for4, map, until1, until2, while1, while2, while3...
do_until: 18 wallclock secs (18.67 usr + 0.00 sys = 18.67 CPU) @ 53.56/s (n=1000)
do_while: 19 wallclock secs (18.72 usr + 0.00 sys = 18.72 CPU) @ 53.42/s (n=1000)
for1: 19 wallclock secs (18.46 usr + 0.01 sys = 18.47 CPU) @ 54.14/s (n=1000)
for2: 20 wallclock secs (20.31 usr + 0.00 sys = 20.31 CPU) @ 49.24/s (n=1000)
for3: 18 wallclock secs (18.48 usr + 0.00 sys = 18.48 CPU) @ 54.11/s (n=1000)
for4: 11 wallclock secs (10.74 usr + 0.00 sys = 10.74 CPU) @ 93.11/s (n=1000)
map: 30 wallclock secs (29.80 usr + 0.01 sys = 29.81 CPU) @ 33.55/s (n=1000)
until1: 18 wallclock secs (18.34 usr + 0.00 sys = 18.34 CPU) @ 54.53/s (n=1000)
until2: 21 wallclock secs (20.47 usr + 0.00 sys = 20.47 CPU) @ 48.85/s (n=1000)
while1: 18 wallclock secs (18.48 usr + 0.00 sys = 18.48 CPU) @ 54.11/s (n=1000)
while2: 21 wallclock secs (20.27 usr + 0.00 sys = 20.27 CPU) @ 49.33/s (n=1000)
while3: 32 wallclock secs (31.93 usr + 0.00 sys = 31.93 CPU) @ 31.32/s (n=1000)
Rate while3 map until2 for2 while2 do_while do_until for3 while1 for1 until1 for4
while3 31.3/s -- -7% -36% -36% -37% -41% -42% -42% -42% -42% -43% -66%
map 33.5/s 7% -- -31% -32% -32% -37% -37% -38% -38% -38% -38% -64%
until2 48.9/s 56% 46% -- -1% -1% -9% -9% -10% -10% -10% -10% -48%
for2 49.2/s 57% 47% 1% -- -0% -8% -8% -9% -9% -9% -10% -47%
while2 49.3/s 58% 47% 1% 0% -- -8% -8% -9% -9% -9% -10% -47%
do_while 53.4/s 71% 59% 9% 8% 8% -- -0% -1% -1% -1% -2% -43%
do_until 53.6/s 71% 60% 10% 9% 9% 0% -- -1% -1% -1% -2% -42%
for3 54.1/s 73% 61% 11% 10% 10% 1% 1% -- -0% -0% -1% -42%
while1 54.1/s 73% 61% 11% 10% 10% 1% 1% 0% -- -0% -1% -42%
for1 54.1/s 73% 61% 11% 10% 10% 1% 1% 0% 0% -- -1% -42%
until1 54.5/s 74% 63% 12% 11% 11% 2% 2% 1% 1% 1% -- -41%
for4 93.1/s 197% 178% 91% 89% 89% 74% 74% 72% 72% 72% 71% --
変数展開があるかもしれないので、当然ダブルクオートの方が遅いと予想される。
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(:all);
my $limit = 1000;
my $sets = {
single_quote => sub {
for ( 0 .. $limit ) {
my $string = "abcdefgjijklmnopqrstuvwxyz";
}
},
double_quote => sub {
for ( 0 .. $limit ) {
my $string = 'abcdefgjijklmnopqrstuvwxyz';
}
},
};
my $result = timethese(
100000,
$sets
);
cmpthese($result);
結果
Benchmark: timing 100000 iterations of double_quote, single_quote...
double_quote: 56 wallclock secs (55.72 usr + 0.03 sys = 55.75 CPU) @ 1793.72/s (n=100000)
single_quote: 52 wallclock secs (52.08 usr + 0.04 sys = 52.12 CPU) @ 1918.65/s (n=100000)
Rate double_quote single_quote
double_quote 1794/s -- -7%
single_quote 1919/s 7% --
ほんの数%だけ速くはなる。
いじょ。