--
Dr. Bobby R. Bruce
Room 3050,
Kemper Hall, UC Davis
Davis,
CA, 95616
web: https://www.bobbybruce.net
On Jul 18, 2023, at 6:08 PM, Matt Sinclair <mattdsinclair.w...@gmail.com> wrote:
For what it's worth, one of the students working with me (Marco, CC'd) is hitting the same failure right now on the head of develop (plus this fix: https://github.com/gem5/gem5/pull/99), except that in our case it is a tiny GPU microbenchmark that definitely does not use PyTorch or any higher-level library.
We are working on getting a backtrace to understand what's going on (and will push a fix if applicable); it's possible our problem has the same symptom but a different root cause. I just wanted to chime in that this error is currently showing up in multiple cases on develop in SE mode.
Matt
I’m afraid I don’t know exactly what’s causing this error, but just to make sure: does the binary you built and passed as a `CustomResource` execute on your host? This looks like an error coming from PyTorch, not the simulator. That said, I don’t understand why the “build/X86/sim/faults.cc:61: panic: panic condition !FullSystem occurred: fault (General-Protection) detected @ PC” panic occurs afterwards either; that could also be the issue. Personally, I’m always a bit wary of linking against dynamic libraries on the host as well.
If you want to get around this, the annoying advice is to use FS mode. It’s slower and requires creating a disk image, but it isn’t nearly as error-prone as SE mode. If your binary works on your host, you should be able to get it working in FS mode. Checkpoints and (if you have the right hardware and are simulating X86) KVM cores can speed things up for you too.
Also, as a side note: if you want to simulate PyTorch, don’t you want to simulate a GPU too?
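A quick way to answer the first question before involving gem5 at all is to run the same binary natively and check that it exits cleanly and prints what it should. Below is a minimal Python sketch, assuming the binary is `build/main` from the setup script (which prints "Hello World"); `runs_on_host` and the `check_host.py` file name are hypothetical. Passing this check does not prove gem5 will cope with the binary, but failing it means the gem5 panic is not the first thing to debug.

```python
import subprocess
import sys


def runs_on_host(argv, expected_substring, timeout=60):
    """Run argv natively; return True only if it exits with status 0
    and its stdout contains expected_substring."""
    try:
        result = subprocess.run(
            argv,
            capture_output=True,  # collect stdout/stderr instead of printing
            text=True,            # decode output as str
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        # Missing/non-executable binary, or it hung: treat as a failure.
        return False
    return result.returncode == 0 and expected_substring in result.stdout


if __name__ == "__main__":
    # Hypothetical usage: python3 check_host.py [binary] [expected-output]
    binary = sys.argv[1] if len(sys.argv) > 1 else "build/main"
    expected = sys.argv[2] if len(sys.argv) > 2 else "Hello World"
    ok = runs_on_host([binary], expected)
    print(f"{binary}: {'runs on host' if ok else 'does NOT run cleanly on host'}")
```

If this reports a failure natively, the "expected eof but found 'ident'" exception is coming from the libtorch setup itself rather than from gem5's SE-mode emulation.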
On Jul 14, 2023, at 3:02 AM, Caio Vieira via gem5-users <gem5-users@gem5.org> wrote:
Hi everyone,
I'm trying to run gem5 SE-mode simulations of a program that uses libtorch.
However, I get the following error message:
--- Error message ---
...
terminate called after throwing an instance of 'std::runtime_error'
what(): expected eof but found 'ident' here:
aten::quantized_lstm.inpr input, Tensor[]
orch.classes.rnn.CellPara[] params, bool has_biases, int
num_layers, float dropout, bool train, bool bidirectional, bool
batch_first, *, ScalarType? dtype=None, bool use_dynamic=False) dy
namic=False) -> (Tensor, Tensor, Tenso Tensor, Tensor)
~~~~~ <--- HERE
build/X86/sim/syscall_emul.cc:86: warn: ignoring syscall rt_sigprocmask(...)
(further warnings will be suppressed)
build/X86/sim/syscall_emul.cc:86: warn: ignoring syscall rt_sigaction(...)
(further warnings will be suppressed)
build/X86/sim/faults.cc:61: panic: panic condition !FullSystem occurred: fault (General-Protection) detected @ PC (0x7fff7a3d5898=>0x7fff7a3d5899).(0=>1)
Memory Usage: 11842716 KBytes
Program aborted at tick 294083905383
--- BEGIN LIBC BACKTRACE ---
...
The simulation fails before the first line of the main function; I believe it is failing to load the libtorch library.
Unfortunately, it is not possible to build libtorch with "-static", since its static build has been broken for quite a long time: https://github.com/pytorch/pytorch/issues/21737
I've tested with gem5 v22.1.0.0 and also v22.0.0.2. I've also tested using different GCC versions to build the simulated binary.
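Since the binary has to be dynamically linked, it can help to list exactly which shared libraries it pulls in: in SE mode the simulated dynamic loader opens these files from the host filesystem through emulated syscalls, so each one is code that gem5 has to load and execute before `main` runs. The sketch below is an assumption-laden helper, not part of gem5 or libtorch: it assumes a glibc-based Linux host where `ldd` is available, and the names `parse_ldd`, `missing_libs` and `shared_libs` are hypothetical.

```python
import subprocess


def parse_ldd(output):
    """Map each library named in `ldd` output to its resolved path.

    A value of None means ldd reported the library as "not found".
    Virtual entries with no file on disk (e.g. linux-vdso.so.1) are skipped.
    """
    libs = {}
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        if "=>" in line:
            # Form: "libtorch.so => /path/to/libtorch.so (0x...)"
            name, _, rest = line.partition("=>")
            rest = rest.strip()
            if rest.startswith("not found"):
                libs[name.strip()] = None
            else:
                libs[name.strip()] = rest.split(" (")[0].strip()
        else:
            # Form: "/lib64/ld-linux-x86-64.so.2 (0x...)" (the loader itself)
            token = line.split(" (")[0].strip()
            if token.startswith("/"):
                libs[token] = token
    return libs


def missing_libs(libs):
    """Return the sorted names of libraries the loader could not resolve."""
    return sorted(name for name, path in libs.items() if path is None)


def shared_libs(binary):
    """Run `ldd` on binary (assumes ldd is on PATH) and parse its output."""
    out = subprocess.run(
        ["ldd", binary], capture_output=True, text=True, check=False
    ).stdout
    return parse_ldd(out)
```

For example, `missing_libs(shared_libs("build/main"))` should come back empty on a correctly set-up host; any libtorch or libc10 entry listed there would also be unresolvable for the simulated loader.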
For anyone interested in reproducing the error, I'm including a "setup.sh" script that creates a minimal reproducible environment.
Simply copy the script below into a file named "setup.sh" in a new directory, then run:
source setup.sh
cmake -B build -S .
cmake --build build
./<gem5> config.py build/main
Best regards,
Caio Vieira
--- setup.sh ---
#!/bin/bash
# Bash script to create a minimal reproducible environment for the libtorch
# simulation bug. This script creates the necessary files: a CMakeLists.txt,
# a minimal main.cpp, and a gem5 config.py. The CMakeLists.txt file downloads
# and manages libtorch by saving it in a "_deps" folder.
# Steps to reproduce the bug:
#   ./<this-script>
#   cmake -B build -S .
#   cmake --build build
#   ./<gem5> config.py build/main
function create_cmake() {
cat > CMakeLists.txt <<- \EOF
cmake_minimum_required(VERSION 3.22 FATAL_ERROR)
# Download and manage libtorch dependency
set(DEPENDENCY_DIR "${CMAKE_CURRENT_LIST_DIR}/_deps")
file(MAKE_DIRECTORY "${DEPENDENCY_DIR}")
if(NOT EXISTS "${DEPENDENCY_DIR}/libtorch")
file(DOWNLOAD
https://download.pytorch.org/libtorch/cpu/libtorch-cxx11-abi-shared-with-deps-2.0.0%2Bcpu.zip
"${DEPENDENCY_DIR}/libtorch.zip")
file(ARCHIVE_EXTRACT
INPUT "${DEPENDENCY_DIR}/libtorch.zip"
DESTINATION "${DEPENDENCY_DIR}")
file(REMOVE "${DEPENDENCY_DIR}/libtorch.zip")
endif()
set(CMAKE_PREFIX_PATH "${DEPENDENCY_DIR}/libtorch")
find_package(Torch REQUIRED)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${TORCH_CXX_FLAGS}")
project(main)
# Create executable
add_executable(${CMAKE_PROJECT_NAME} main.cpp)
target_include_directories(${CMAKE_PROJECT_NAME} PUBLIC ${TORCH_INCLUDE_DIRS})
target_link_libraries(${CMAKE_PROJECT_NAME} ${TORCH_LIBRARIES})
set_property(TARGET ${CMAKE_PROJECT_NAME} PROPERTY CXX_STANDARD 14)
message("Torch Libraries: ${TORCH_LIBRARIES}")
EOF
}
function create_main() {
cat > main.cpp <<- \EOF
#include <iostream>
#include <torch/torch.h>
int main(int argc, char *argv[]) {
std::cout << "Hello World" << std::endl;
}
EOF
}
function create_gem5_config() {
cat > config.py <<- \EOF
# This script must be executed by gem5
# Usage: ./<gem5> <this-script> <binary>
import sys
from gem5.utils.requires import ISA
from gem5.components.boards.simple_board import SimpleBoard
from gem5.components.cachehierarchies.classic.no_cache import NoCache
from gem5.components.memory.single_channel import SingleChannelDDR3_1600
from gem5.components.processors.simple_processor import SimpleProcessor
from gem5.components.processors.cpu_types import CPUTypes
from gem5.resources.resource import CustomResource
from gem5.simulate.simulator import Simulator
if len(sys.argv) == 1:
    print("Provide a binary as argument", file=sys.stderr)
    sys.exit(1)
cache_hierarchy = NoCache()
memory = SingleChannelDDR3_1600("8GiB")
processor = SimpleProcessor(cpu_type=CPUTypes.ATOMIC, num_cores=1, isa=ISA.X86)
board = SimpleBoard(
    clk_freq="3GHz",
    processor=processor,
    memory=memory,
    cache_hierarchy=cache_hierarchy,
)
binary = CustomResource(sys.argv[1])
board.set_se_binary_workload(binary)
simulator = Simulator(board=board)
simulator.run()
EOF
}
create_cmake
create_main
create_gem5_config
_______________________________________________ gem5-users mailing list -- gem5-users@gem5.org To unsubscribe send an email to gem5-users-le...@gem5.org