Does it work if you declare everything as PetscScalar instead of cuDoubleComplex?
> El 2 oct 2024, a las 11:23, 刘浪天 <[email protected]> escribió: > > Hi Jose, > > Since my matrix is two large, I cannot create the Mat on GPU. So I still want > to create and compute the eigenvalues of this matrix on CPU using SLEPc. > > Best, > -------------------- Langtian Liu Institute for Theorectical Physics, > Justus-Liebig-University Giessen Heinrich-Buff-Ring 16, 35392 Giessen Germany > email: [email protected] Tel: (+49)641 99 33342 > >> On Oct 2, 2024, at 11:18 AM, Jose E. Roman <[email protected]> wrote: >> >> >> For the CUDA case you should use MatCreateDenseCUDA() instead of >> MatCreateDense(). With this you pass a pointer with the data on the GPU >> memory. But I guess "new cuDoubleComplex[dim*dim]" is allocating on the CPU, >> you should use cudaMalloc() instead. >> >> Jose >> >> >>> El 2 oct 2024, a las 10:56, 刘浪天 via petsc-users <[email protected]> >>> escribió: >>> >>> Hi all, >>> >>> I am using the PETSc and SLEPc to solve the Faddeev equation of baryons. I >>> encounter a problem of function MatCreateDense when changing from CPU to >>> CPU-GPU computations. >>> At first, I write the codes in purely CPU computation in the following way >>> and it works. >>> ``` >>> Eigen::MatrixXcd H_KER; >>> Eigen::MatrixXcd G0; >>> printf("\nCompute the propagator matrix.\n"); >>> prop_matrix_nucleon_sc_av(Mn, pp_nodes, cos1_nodes); >>> printf("\nCompute the propagator matrix done.\n"); >>> printf("\nCompute the kernel matrix.\n"); >>> bse_kernel_nucleon_sc_av(Mn, pp_nodes, pp_weights, cos1_nodes, >>> cos1_weights); >>> printf("\nCompute the kernel matrix done.\n"); >>> printf("\nCompute the full kernel matrix by multiplying kernel and >>> propagator matrix.\n"); >>> MatrixXcd kernel_temp = H_KER * G0; >>> printf("\nCompute the full kernel matrix done.\n"); >>> >>> // Solve the eigen system with SLEPc >>> printf("\nSolve the eigen system in the rest frame.\n"); >>> // Get the size of the Eigen matrix >>> int nRows = (int) kernel_temp.rows(); >>> int nCols = (int) kernel_temp.cols(); >>> // Create PETSc matrix and share the data of kernel_temp >>> Mat kernel; >>> PetscCall(MatCreateDense(PETSC_COMM_WORLD, PETSC_DECIDE, PETSC_DECIDE, >>> nRows, nCols, kernel_temp.data(), &kernel)); >>> PetscCall(MatAssemblyBegin(kernel, MAT_FINAL_ASSEMBLY)); >>> PetscCall(MatAssemblyEnd(kernel, MAT_FINAL_ASSEMBLY)); >>> ``` >>> Now I change to compute the propagator and kernel matrices in GPU and then >>> compute the largest eigenvalues in CPU using SLEPc in the ways below. >>> ``` >>> cuDoubleComplex *h_propmat; >>> cuDoubleComplex *h_kernelmat; >>> int dim = EIGHT * NP * NZ; >>> printf("\nCompute the propagator matrix.\n"); >>> prop_matrix_nucleon_sc_av_cuda(Mn, pp_nodes.data(), cos1_nodes.data()); >>> printf("\nCompute the propagator matrix done.\n"); >>> printf("\nCompute the kernel matrix.\n"); >>> kernel_matrix_nucleon_sc_av_cuda(Mn, pp_nodes.data(), pp_weights.data(), >>> cos1_nodes.data(), cos1_weights.data()); >>> printf("\nCompute the kernel matrix done.\n"); >>> printf("\nCompute the full kernel matrix by multiplying kernel and >>> propagator matrix.\n"); >>> // Map the raw arrays to Eigen matrices (column-major order) >>> auto *h_kernel_temp = new cuDoubleComplex [dim*dim]; >>> matmul_cublas_cuDoubleComplex(h_kernelmat,h_propmat,h_kernel_temp,dim,dim,dim); >>> printf("\nCompute the full kernel matrix done.\n"); >>> >>> // Solve the eigen system with SLEPc >>> printf("\nSolve the eigen system in the rest frame.\n"); >>> int nRows = dim; >>> int nCols = dim; >>> // Create PETSc matrix and share the data of kernel_temp >>> Mat kernel; >>> auto* h_kernel = (std::complex<double>*)(h_kernel_temp); >>> PetscCall(MatCreateDense(PETSC_COMM_WORLD, PETSC_DECIDE, PETSC_DECIDE, >>> nRows, nCols, h_kernel_temp, &kernel)); >>> PetscCall(MatAssemblyBegin(kernel, MAT_FINAL_ASSEMBLY)); >>> PetscCall(MatAssemblyEnd(kernel, MAT_FINAL_ASSEMBLY)); >>> But in this case, the compiler told me that the MatCreateDense function >>> uses the data pointer as type of "thrust::complex<double>" instead of >>> "std::complex<double>". >>> I am sure I only configured and install PETSc in purely CPU without GPU and >>> this codes are written in the host function. >>> Why the function changes its behavior? Did you also meet this problem when >>> writing the cuda codes and how to solve this problem. >>> I tried to copy the data to a new thrust::complex<double> matrix but this >>> is very time consuming since my matrix is very big. Is there a way to >>> create the Mat from the original data without changing the data type to >>> thrust::complex<double> in the cuda applications? Any response will be >>> appreciated. Thank you! >>> >>> Best wishes, >>> Langtian Liu >>> >>> ------ >>> Institute for Theorectical Physics, Justus-Liebig-University Giessen >>> Heinrich-Buff-Ring 16, 35392 Giessen Germany >
