Others | Cuda-DGEMM-MIPS



Task Details

1. Open an empty C++ project in Visual Studio and run the DGEMM code from your reference book in 5 versions and with 5 matrix sizes:
(10 points) V1: Unoptimized (code is provided at the end of this directive)
(10 points) V2: AVX (code is provided in your book)
(10 points) V3: AVX + Unroll (code is provided in your book)
(10 points) V4: AVX + Unroll + Blocked (code is provided in your book)
(10 points) V5: GPU (CUDA or OpenCL) (code from the midterm assignment can be used)
Matrix sizes: 256x256, 512x512, 768x768, 1024x1024, 1280x1280
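
To make the "blocked" part of V4 concrete, the following is a minimal plain-C sketch of cache blocking on its own, without AVX or unrolling. It is not the listing from the reference book. The function name dgemm_blocked, the tile size BLOCK = 32, and the row-major layout (chosen to match the unoptimized code in this directive) are all assumptions made for illustration.

```c
#define BLOCK 32  /* assumed tile edge; tune it to your cache size */

static int min_int(int a, int b) { return a < b ? a : b; }

/* C = A * B for n x n row-major matrices, computed tile by tile so that
   the parts of A, B and C in use stay in cache while they are reused. */
void dgemm_blocked(int n, const double *A, const double *B, double *C)
{
    for (int i = 0; i < n * n; i++)
        C[i] = 0.0;

    for (int ii = 0; ii < n; ii += BLOCK)
        for (int kk = 0; kk < n; kk += BLOCK)
            for (int jj = 0; jj < n; jj += BLOCK) {
                /* Clamp the tile so n need not be a multiple of BLOCK. */
                int i_end = min_int(ii + BLOCK, n);
                int k_end = min_int(kk + BLOCK, n);
                int j_end = min_int(jj + BLOCK, n);

                for (int i = ii; i < i_end; i++)
                    for (int k = kk; k < k_end; k++) {
                        double a = A[i * n + k];
                        for (int j = jj; j < j_end; j++)
                            C[i * n + j] += a * B[k * n + j];
                    }
            }
}
```

The result is the same as the unoptimized triple loop; only the order in which the products are accumulated changes.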

Measure the runtime of each version at each matrix size. Time only the DGEMM function calls, not the printf() calls.
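
One way to keep printing out of the measurement is to wrap the call in a small timing helper. The sketch below is only an illustration: the names dgemm_fn, dgemm_naive and time_dgemm are hypothetical, and dgemm_naive is a stand-in with the same signature as the unoptimized code in this directive.

```c
#include <time.h>

/* Signature shared by the CPU versions (hypothetical typedef name). */
typedef void (*dgemm_fn)(int n, const double *A, const double *B, double *C);

/* Stand-in kernel: the unoptimized row-major triple loop. */
void dgemm_naive(int n, const double *A, const double *B, double *C)
{
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++) {
            double sum = 0.0;
            for (int k = 0; k < n; k++)
                sum += A[i * n + k] * B[k * n + j];
            C[i * n + j] = sum;
        }
}

/* Returns the time in seconds taken by one call of fn.
   Only the call itself sits between the two clock() reads, so
   allocation, initialization and printing are not measured. */
double time_dgemm(dgemm_fn fn, int n,
                  const double *A, const double *B, double *C)
{
    clock_t start = clock();
    fn(n, A, B, C);
    clock_t stop = clock();
    return (double)(stop - start) / CLOCKS_PER_SEC;
}
```

A call such as time_dgemm(dgemm_naive, 256, A, B, C1), followed by a single printf() of the returned value, keeps the output outside the timed interval. For the GPU version, the device should be synchronized (for example with cudaDeviceSynchronize()) before the second clock() read, because a kernel launch returns before the kernel has finished.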
All of the code, including the different optimizations, is given in your reference book; the GPU code from the midterm assignment can be used for V5. Run each version on your own computer and measure the runtimes in seconds. Draw a comparison table and plot a comparison graph, then compare them with the results shown below, which are taken from the reference book. Use seconds or milliseconds as the unit of measurement, not GFLOPS.
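
If the reference figures you compare against are expressed in GFLOPS, they can be converted to seconds by using the fact that an n x n DGEMM performs n^3 multiplications and n^3 additions, that is 2n^3 floating-point operations. The helper names below are hypothetical and the sketch assumes square matrices.

```c
/* Floating-point operations in one n x n DGEMM: n^3 multiplies + n^3 adds. */
double dgemm_flops(int n)
{
    double d = (double)n;
    return 2.0 * d * d * d;
}

/* Runtime in seconds that corresponds to a given GFLOPS rate. */
double gflops_to_seconds(int n, double gflops)
{
    return dgemm_flops(n) / (gflops * 1e9);
}

/* GFLOPS rate that corresponds to a measured runtime in seconds. */
double seconds_to_gflops(int n, double seconds)
{
    return dgemm_flops(n) / (seconds * 1e9);
}
```

For example, a 1000x1000 multiplication running at 2 GFLOPS corresponds to 2 * 10^9 / (2 * 10^9) = 1 second.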
Important note 1: Any copy-pasted work will receive zero points. Do not share your homework.
Note 2: #include <immintrin.h> is needed for running AVX intrinsics.
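
As an illustration of what the intrinsics from immintrin.h look like in use, here is a minimal AVX sketch that computes four elements of a row of C at a time. It is not the book's listing: it assumes the row-major layout of the unoptimized code in this directive, assumes n is a multiple of 4, and uses unaligned loads and stores because memory returned by calloc() is not guaranteed to be 32-byte aligned. The names dgemm_avx and AVX_FN are hypothetical.

```c
#include <immintrin.h>

/* GCC and Clang can enable AVX for a single function; other compilers
   are assumed to accept the intrinsics without an attribute. */
#if defined(__GNUC__) || defined(__clang__)
#define AVX_FN __attribute__((target("avx")))
#else
#define AVX_FN
#endif

/* C = A * B for n x n row-major matrices; n must be a multiple of 4. */
AVX_FN void dgemm_avx(int n, const double *A, const double *B, double *C)
{
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j += 4) {
            __m256d c = _mm256_setzero_pd();                    /* C[i][j..j+3] */
            for (int k = 0; k < n; k++) {
                __m256d a = _mm256_broadcast_sd(&A[i * n + k]); /* A[i][k] in all 4 lanes */
                __m256d b = _mm256_loadu_pd(&B[k * n + j]);     /* B[k][j..j+3] */
                c = _mm256_add_pd(c, _mm256_mul_pd(a, b));
            }
            _mm256_storeu_pd(&C[i * n + j], c);
        }
}
```

All five matrix sizes in this assignment are multiples of 4, so the sketch needs no scalar remainder loop for them.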

2. Write an academic report. It should have the following sections:
A. Abstract
B. Introduction
C. Literature Review
D. Hardware Acceleration Methods
1. Unoptimized DGEMM Code
2. AVX Optimization
3. AVX + Unrolling Optimization
4. AVX + Unrolling + Blocking Optimization
5. GPU Optimization
E. Experimental Setup
F. Results and Discussion
G. Conclusion
H. References

Academic Report Evaluation:
(10 points) Abstract and Introduction
(10 points) Literature Review and References
(20 points) Explanation of Hardware Acceleration Methods and Source Codes
(10 points) Commenting on Obtained Results

Submission Material
(Prepare 5 separate source code files, one for each method.)
1. Unoptimized Source Code
2. AVX Optimization Source Code
3. AVX + Unrolling Optimization Source Code
4. AVX + Unrolling + Blocking Optimization Source Code
5. GPU Optimization Source Code
6. Academic report in MS Word format
7. A video link that clearly shows the face of one group member and a demo run of each method's source code.

Unoptimized Code:
/* The CUDA headers are not used by this version; they are only
   relevant to the GPU version (V5). */
#include "cuda_runtime.h"
#include "device_launch_parameters.h"
#include <cuda.h>
#include <device_functions.h>
#include <cuda_runtime_api.h>
#include <immintrin.h>

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define ARRAY_SIZE 256

/* C = A * B for n x n matrices stored in row-major order. */
void dgemm(int n, const double* A, const double* B, double* C)
{
    int i, j, k;
    for (i = 0; i < n; i++)
    {
        for (j = 0; j < n; j++)
        {
            C[i*n + j] = 0;
            for (k = 0; k < n; k++)
                C[i*n + j] += A[k + i * n] * B[k*n + j];
        }
    }
}

int main()
{
    double *A = (double*)calloc(ARRAY_SIZE * ARRAY_SIZE, sizeof(double));
    double *B = (double*)calloc(ARRAY_SIZE * ARRAY_SIZE, sizeof(double));
    double *C1 = (double*)calloc(ARRAY_SIZE * ARRAY_SIZE, sizeof(double));
    double *C2 = (double*)calloc(ARRAY_SIZE * ARRAY_SIZE, sizeof(double));

    /* Fill both inputs with pseudo-random values in [0, 99]. */
    for (int i = 0; i < ARRAY_SIZE * ARRAY_SIZE; i++)
    {
        A[i] = rand() % 100;
        B[i] = rand() % 100;
    }

    clock_t t;

    /* Time only the dgemm() call, not the printf() below. */
    t = clock();
    dgemm(ARRAY_SIZE, A, B, C1);
    t = clock() - t;

    double elapsed_time = ((double)t) / CLOCKS_PER_SEC;
    printf("Unoptimized DGEMM code took %.6f seconds to execute.\n", elapsed_time);

    free(A);
    free(B);
    free(C1);
    free(C2);
    return 0;
}
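
The code above allocates C2 but never uses it. One plausible use, assumed here rather than stated in the directive, is to hold the output of an optimized version so that it can be checked against C1 before the runtimes are compared. A minimal sketch of such a check follows; the name max_abs_diff is hypothetical.

```c
#include <stddef.h>

/* Largest absolute element-wise difference between two arrays of
   `count` doubles. A value of 0 means the results are identical. */
double max_abs_diff(const double *X, const double *Y, size_t count)
{
    double worst = 0.0;
    for (size_t i = 0; i < count; i++) {
        double d = X[i] - Y[i];
        if (d < 0.0)
            d = -d;
        if (d > worst)
            worst = d;
    }
    return worst;
}
```

With integer-valued inputs such as rand() % 100 and these matrix sizes, every product and partial sum is exactly representable as a double, so max_abs_diff(C1, C2, ARRAY_SIZE * ARRAY_SIZE) is expected to be exactly 0 for a correct CPU version even when the summation order differs.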

Budget: 100 TL

Job location: ONLINE
Task start date: 11-06-2020
Task end date: 20-06-2020
Category: Software


Looking for candidates.