Eigen 3.4.0 Seems to Have Simply Gotten Faster

by tokagi_rikka · Published August 29, 2021 · Updated August 29, 2021

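This is Rikka.
It's so hot that my PC feels like it's about to catch fire. Help.

Anyway, the official release of Eigen 3.4.0, a new version of Eigen, came out the other day.
The introduction of the reduced-precision floating-point types bfloat16 and half caught my interest, but
bfloat16 was far slower than float in the source code shown later, and
half took so long to build that it didn't seem practical (the computation itself is faster than with bfloat16, so it may prove useful there).
For now, half-precision floating point feels like something to look forward to.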

Also, according to the release notes (https://eigen.tuxfamily.org/index.php?title=3.4),

By using half- and quater-packets the performance of matrix multiplications of small to medium sized matrices has been improved

so I decided to check whether computation has actually gotten faster.

・Environment
Ryzen 5950X
Microsoft Visual Studio Community 2019 Version 16.10.1

#include <algorithm>   // std::generate
#include <functional>  // std::ref
#include <iostream>
#include <random>
#include <array>
#include <chrono>

//#include "../Eigen-3.3.9/Core"
#include "../Eigen-3.4.0/Core"

int main()
{

    std::cout << "Eigen version : " << EIGEN_WORLD_VERSION << "." << EIGEN_MAJOR_VERSION << "." << EIGEN_MINOR_VERSION << std::endl;

    // Prepare the random number generator
    std::random_device rd;
    std::array<std::seed_seq::result_type, std::mt19937_64::state_size> seed_data;
    std::generate(seed_data.begin(), seed_data.end(), std::ref(rd));
    std::seed_seq s_seq(seed_data.begin(), seed_data.end());
    std::mt19937_64 mt(s_seq);
    auto dist_input = std::uniform_real_distribution<float>(-1.0, +1.0);

    Eigen::MatrixXf A(5, 5), B(5, 5), C(5, 5);
    //Eigen::Matrix<Eigen::half, -1, -1> A(5, 5), B(5, 5), C(5, 5);
    //Eigen::Matrix<Eigen::bfloat16, -1, -1> A(5, 5), B(5, 5), C(5, 5);

    // Explicitly randomize the initial values
    A.setRandom();
    B.setRandom();
    C.setRandom();

    // Measure 10 times
    for (int i = 0; i < 10; ++i)
    {
        const auto now_begin = std::chrono::high_resolution_clock::now();
        //-------------------------------------------------------------------
        // Run a large number of computations
        for (int n = 0; n < 100'000; ++n)
        {
            // Initialize with random numbers (this is itself an expensive operation)
            for (int x = 0; x < 5; ++x)
            {
                for (int y = 0; y < 5; ++y)
                {
                    A(y, x) = dist_input(mt);
                    B(y, x) = dist_input(mt);
                    //A(y, x) = Eigen::half(dist_input(mt));
                    //B(y, x) = Eigen::half(dist_input(mt));
                    //A(y, x) = Eigen::bfloat16(dist_input(mt));
                    //B(y, x) = Eigen::bfloat16(dist_input(mt));
                }
            }

            C = A * B;
        }
        //-------------------------------------------------------------------
        const auto now_end = std::chrono::high_resolution_clock::now();
        const auto timecount = std::chrono::duration_cast<std::chrono::microseconds>(now_end - now_begin).count();
        std::cout << timecount << std::endl;
    }

    // Keep the console window open until Enter is pressed
    std::cin.get();

    return 0;
}

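Averaged over the runs, the results in microseconds were as follows.
3.3.9 188883
3.4.0 163540

On its own, this looks like only a modest reduction, but the random initialization is very expensive. After commenting out the random-initialization block (the nested x/y loops), the results were as follows.
3.3.9 57026
3.4.0 29630

For the matrix product alone, the time is nearly halved.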
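For reference, the half-precision floating-point types gave the following results.
half 190695
bfloat16 290134

half is the faster of the two in raw computation, but I think the build time makes it painful to use in practice.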
