MySQLにおける、GeohashとGeometry型による検索の比較

後、id:kokogiko さんにGeometry型 × SPATIAL インデックスはどーよってツッコミをもらったので、そちらも一緒にベンチをとります。

Geometry型のテーブル定義

前回のGeohashのテーブルを一つ。SPATIALインデックスはMyISAM専用なので、そのテーブルを一つ。後、InnoDBでインデックス貼っていない物を用意しました。InnoDBでno-SPATIALなインデックス張った物も試したかったんですが、手元の環境だとMySQLが落ちてしまうっぽいので今回は省略しました。

CREATE TABLE location (
    id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    geohash VARCHAR(10) NOT NULL,
    KEY (geohash)
) ENGINE=InnoDB;

CREATE TABLE location_myisam (
    id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    point POINT NOT NULL,
    SPATIAL KEY (point)
) ENGINE=MyISAM;

CREATE TABLE location_nokey (
    id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    point POINT NOT NULL
) ENGINE=InnoDB;

データ投入は前回のスクリプトと同等です。

my $gh = Geo::Hash::XS->new;

my $ins_myi = $dbh->prepare(
    "INSERT INTO location_myisam(point) VALUES (GeomFromText(?))"
) or die $dbh->errstr;
my $ins_no = $dbh->prepare(
    "INSERT INTO location_nokey(point) VALUES (GeomFromText(?))"
) or die $dbh->errstr;
my @geo_sths = ($ins_myi, $ins_no);

my $sth = $dbh->prepare('INSERT INTO location(geohash) VALUES (?)')
          or die $dbh->errstr;

my $gh = Geo::Hash::XS->new;
for (1 .. 100000) {
    my $lat = rand(37 - 34) + 34;
    my $lng = rand(141 - 138) + 138;

    $sth->execute($gh->encode($lat, $lng, 10)) or die $sth->errstr;
    my $point = "POINT($lng $lat)";
    $_->execute($point) or die $_->errstr for @geo_sths;
}

ベンチ用のスクリプト

Geohashの検索ロジックは前回とまったく一緒です。Geometry型の検索は以下のようにMBRContainsを使いました。

sub search_points_geometry($) {
    my $table = shift;
    my $sth = $dbh->prepare(qq{
        SELECT id, X(point), Y(point) FROM $table
        WHERE MBRContains(GeomFromText('Polygon((
        139.716654  35.727353,
        139.716654  35.827353,
        139.816654  35.827353,
        139.816654  35.727353,
        139.716654  35.727353
        ))'), point)
    }) or die $dbh->errstr;
    $sth->execute or die $sth->errstr;

    my @ret;
    while (my $ref = $sth->fetch) {
        push @ret, $ref;
    }
    return @ret;
}

測定は、Benchmark.pmではなくてTime::HiResを利用した自前のルーチンで行いました。

use Time::HiRes qw/time/;

sub benchmark($$) {
    my ($count, $codes) = @_;

    my %ret;
    for my $name (keys %$codes) {
        my $code = $codes->{$name};

        my $time = time;
        $code->() for 1 .. $count;
        $ret{$name} = time - $time;
    }

    my $base;
    for (sort {$ret{$b} <=> $ret{$a}} keys %ret) {
        my $count = $count / $ret{$_};
        my $rate = 0;
        if (! $base) {
            $base = $count;
        } else {
            $rate = $count / $base - 1;
        }
        printf "%6s %2.1f/s %5.1f %%\n", $_, $count, $rate * 100;
    }
}

benchmark(10, {
    (map { my $p = $_; "p=$p" => sub { search_points($p) } } 1 .. 7),
    myisam => sub { search_points_geometry('location_myisam') },
    nokey  => sub { search_points_geometry('location_nokey') },
});

結果

まず、それぞれ10回で試行します。

   p=2 2.3/s   0.0 %
 nokey 2.4/s   1.9 %
   p=1 2.5/s   6.1 %
   p=7 5.1/s 119.0 %
   p=3 11.3/s 383.8 %
   p=6 130.1/s 5450.8 %
   p=4 299.7/s 12684.5 %
myisam 403.5/s 17112.9 %
   p=5 530.0/s 22508.9 %

前回の誤ったベンチと同様に、やはりprecisionの範囲は4〜6でないと使い物にはならないようです。また、InnoDBでキーなしでMBRContainsを使った場合が論外ってのも予想通り。

次、上位を1,000回で試行します。

   p=6 130.1/s   0.0 %
   p=4 299.0/s 129.8 %
myisam 402.8/s 209.6 %
   p=5 538.1/s 313.6 %

意外だったのは、MyISAMでSPARTIALインデックスを使うよりGeohashで適切な粒度に分割した方が速いってことです。ただ、これはMySQLのバージョンによっても結果が変わるかもしれません(当方は5.0.45)。

まとめ

やはりGeohashで検索をする時は、数個〜数百で領域を覆える粒度のprecisionで検索をするとよいようです。

また、MySQLにおいてはSPARTIALインデックスと同等かそれ以上のパフォーマンスが期待でき、Geohashによる検索が十分使い物になることを示していると言えるでしょう。

Pixel Pedals of Tomakomai

北海道苫小牧市出身の初老の日常

MySQLにおける、GeohashとGeometry型による検索の比較

Geometry型のテーブル定義

ベンチ用のスクリプト

結果

まとめ