For what it's worth, one of the students working with me (Marco, CC'd) is
seeing the same failure right now at the head of develop (plus this fix:
https://github.com/gem5/gem5/pull/99), except with a tiny GPU microbenchmark
that definitely does not use PyTorch or any higher-level library.

We are working on getting a backtrace to understand what's going on for us
(and then push a fix as applicable), and it's possible our problems have
the same symptom but a different root cause.  But just wanted to chime in
that there are multiple cases where this error is happening on develop
right now with SE mode.

Matt


On Tue, Jul 18, 2023 at 7:58 PM Bobby Bruce via gem5-users <
gem5-users@gem5.org> wrote:

> I’m afraid I don’t know exactly what’s causing this error, but just to
> make sure: does the binary you built and passed as a `CustomResource`
> execute on your host? This looks like an error coming from PyTorch, not the
> simulator. That being said, I don’t understand why
> "build/X86/sim/faults.cc:61: panic: panic condition !FullSystem occurred:
> fault (General-Protection) detected @ PC" is occurring afterwards either;
> that could also be the issue. Personally, I’m always a bit scared of
> linking to dynamic libraries on the host as well.
>
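One way to check the host-side question above before involving gem5 is to list the shared libraries the binary depends on and confirm that each one resolves. The following is only a minimal sketch, assuming a Linux host with `ldd` on the PATH. The helper names `parse_ldd` and `unresolved_libs` are made up for illustration and are not part of gem5 or PyTorch.

```python
import re
import subprocess


def parse_ldd(output):
    """Map each library name in ldd-style output to its resolved path.

    A library reported as "not found" maps to None. Entries without a
    "=>" mapping (for example the ELF interpreter) are skipped.
    """
    libs = {}
    for line in output.splitlines():
        line = line.strip()
        if "=>" not in line:
            continue
        name, _, target = line.partition("=>")
        name, target = name.strip(), target.strip()
        if target.startswith("not found"):
            libs[name] = None
            continue
        # Drop the trailing load address, e.g. "(0x00007f...)".
        path = re.sub(r"\s*\(0x[0-9a-fA-F]+\)$", "", target)
        if path:
            libs[name] = path
    return libs


def unresolved_libs(binary):
    """Run ldd on `binary` and return the sorted names it could not resolve."""
    result = subprocess.run(["ldd", binary], capture_output=True, text=True)
    return sorted(n for n, p in parse_ldd(result.stdout).items() if p is None)
```

Calling `unresolved_libs("build/main")` on the host would return an empty list if every dependency resolves there. That only says the host loader can find the libraries; it does not say anything about how SE mode handles them.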
> If you want to get around this the annoying advice is to use FS mode. It’s
> slower, and requires creation of a disk image, but it isn’t nearly as
> error-prone as SE mode. If your binary works on your host then you should
> be able to get it to work in FS mode. Using checkpoints and (if you have
> the right hardware and are using X86) KVM cores can speed things up for you
> too.
>
> Also, as a sidenote: If you’re wanting to simulate PyTorch, don’t you want
> to simulate a GPU too?
>
> --
> Dr. Bobby R. Bruce
> Room 3050,
> Kemper Hall, UC Davis
> Davis,
> CA, 95616
>
> web: https://www.bobbybruce.net
>
> On Jul 14, 2023, at 3:02 AM, Caio Vieira via gem5-users <
> gem5-users@gem5.org> wrote:
>
>
> Hi everyone,
>
> I'm trying to execute gem5 simulations using libtorch in SE mode. However,
> I get the following error message:
>
> --- Error message ---
> ...
> terminate called after throwing an instance of 'std::runtime_error'
>   what():  expected eof but found 'ident' here:
> aten::quantized_lstm.inpr input, Tensor[] orch.classes.rnn.CellPara[]
> params, bool has_biases, int num_layers, float dropout, bool train, bool
> bidirectional, bool batch_first, *, ScalarType? dtype=None, bool
> use_dynamic=False) dy
> namic=False) -> (Tensor, Tensor, Tenso Tensor, Tensor)
>                           ~~~~~ <--- HERE
>
> build/X86/sim/syscall_emul.cc:86: warn: ignoring syscall
> rt_sigprocmask(...)
>       (further warnings will be suppressed)
> build/X86/sim/syscall_emul.cc:86: warn: ignoring syscall rt_sigaction(...)
>       (further warnings will be suppressed)
> build/X86/sim/faults.cc:61: panic: panic condition !FullSystem occurred:
> fault (General-Protection) detected @ PC
> (0x7fff7a3d5898=>0x7fff7a3d5899).(0=>1)
> Memory Usage: 11842716 KBytes
> Program aborted at tick 294083905383
> --- BEGIN LIBC BACKTRACE ---
> ...
>
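Since more than one report shows this panic, it may help to pull the fault type and faulting PC out of a saved log so they can be compared between runs. This is a small sketch, assuming the panic line has the shape shown in the log above (possibly wrapped across lines and carrying "> " quote prefixes, as it is here). `parse_fault` is a hypothetical helper, not part of gem5.

```python
import re

# Matches e.g. "fault (General-Protection) detected @ PC (0x...=>0x...)".
FAULT_RE = re.compile(
    r"fault\s+\((?P<fault>[^)]+)\)\s+detected\s+@\s+PC\s+"
    r"\((?P<pc>0x[0-9a-fA-F]+)=>(?P<npc>0x[0-9a-fA-F]+)\)"
)


def parse_fault(log_text):
    """Return (fault_name, pc, next_pc) from a gem5 panic log, or None."""
    # Remove email quote prefixes so a wrapped panic line still matches.
    unquoted = re.sub(r"^>\s?", "", log_text, flags=re.MULTILINE)
    match = FAULT_RE.search(unquoted)
    if match is None:
        return None
    return (
        match.group("fault"),
        int(match.group("pc"), 16),
        int(match.group("npc"), 16),
    )
```

Two logs that report the same fault name but different PCs would be a hint, though not proof, that the root causes differ.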
> The simulation fails before the first line of the main function. I believe
> that it is failing to load the libtorch library.
> Unfortunately, it is not possible to build libtorch with "-static" since
> its static build has been broken for quite a long
> time: https://github.com/pytorch/pytorch/issues/21737
> I've tested with gem5 v22.1.0.0 and also 22.0.0.2. I've also tested using
> different GCC versions to build the simulated binary.
>
> For anyone interested in reproducing the error, I'm sending a "setup.sh"
> script to create a minimal reproducible environment.
> Simply copy and paste the script below and name it as "setup.sh" in a new
> directory, then:
>
> source setup.sh
> cmake -B build -S .
> cmake --build build
> ./<gem5> config.py build/main
>
> Best regards,
> Caio Vieira
>
> --- setup.sh ---
>
> #!/bin/bash
>
> # Bash script to create minimal reproducible environment for libtorch
> # simulation bug. This script creates necessary files such as a
> # CMakeLists.txt and a minimal main.cpp. The CMakeLists.txt file downloads
> # and manages libtorch by saving it in a "_deps" folder. Steps to reproduce
> # the bug:
> # ./<this-script>
> # cmake -B build -S .
> # cmake --build build
> # ./<gem5> config.py build/main
>
> function create_cmake() {
>     cat > CMakeLists.txt <<- \EOF
> cmake_minimum_required(VERSION 3.22 FATAL_ERROR)
>
> # Download and manage libtorch dependency
> set(DEPENDENCY_DIR "${CMAKE_CURRENT_LIST_DIR}/_deps")
>
> file(MAKE_DIRECTORY "${DEPENDENCY_DIR}")
> if(NOT EXISTS "${DEPENDENCY_DIR}/libtorch")
>     file(DOWNLOAD
>         https://download.pytorch.org/libtorch/cpu/libtorch-cxx11-abi-shared-with-deps-2.0.0%2Bcpu.zip
>         "${DEPENDENCY_DIR}/libtorch.zip")
>     file(ARCHIVE_EXTRACT
>         INPUT "${DEPENDENCY_DIR}/libtorch.zip"
>         DESTINATION "${DEPENDENCY_DIR}")
>     file(REMOVE "${DEPENDENCY_DIR}/libtorch.zip")
> endif()
> set(CMAKE_PREFIX_PATH "${DEPENDENCY_DIR}/libtorch")
> find_package(Torch REQUIRED)
> set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${TORCH_CXX_FLAGS}")
>
> project(main)
>
> # Create executable
> add_executable(${CMAKE_PROJECT_NAME} main.cpp)
> target_include_directories(${CMAKE_PROJECT_NAME} PUBLIC
> ${TORCH_INCLUDE_DIRS})
> target_link_libraries(${CMAKE_PROJECT_NAME} ${TORCH_LIBRARIES})
> set_property(TARGET ${CMAKE_PROJECT_NAME} PROPERTY CXX_STANDARD 14)
> message("Torch Libraries: ${TORCH_LIBRARIES}")
> EOF
> }
>
> function create_main() {
>     cat > main.cpp <<- \EOF
> #include <iostream>
> #include <torch/torch.h>
>
> int main(int argc, char *argv[]) {
>     std::cout << "Hello World" << std::endl;
> }
> EOF
> }
>
> function create_gem5_config() {
>     cat > config.py <<- \EOF
> # This script must be executed by gem5
> # Usage: ./<gem5> <this-script> <binary>
>
> import sys
>
> from gem5.utils.requires import ISA
> from gem5.components.boards.simple_board import SimpleBoard
> from gem5.components.cachehierarchies.classic.no_cache import NoCache
> from gem5.components.memory.single_channel import SingleChannelDDR3_1600
> from gem5.components.processors.simple_processor import SimpleProcessor
> from gem5.components.processors.cpu_types import CPUTypes
> from gem5.resources.resource import CustomResource
> from gem5.simulate.simulator import Simulator
>
> if len(sys.argv) == 1:
>     print("Provide a binary as argument", file=sys.stderr)
>     sys.exit(1)
>
> cache_hierarchy = NoCache()
> memory = SingleChannelDDR3_1600("8GiB")
> processor = SimpleProcessor(
>     cpu_type=CPUTypes.ATOMIC, num_cores=1, isa=ISA.X86
> )
>
> board = SimpleBoard(
>         clk_freq="3GHz",
>         processor=processor,
>         memory=memory,
>         cache_hierarchy=cache_hierarchy
> )
>
> binary = CustomResource(sys.argv[1])
> board.set_se_binary_workload(binary)
>
> simulator = Simulator(board=board)
> simulator.run()
> EOF
> }
>
> create_cmake
> create_main
> create_gem5_config
> _______________________________________________
> gem5-users mailing list -- gem5-users@gem5.org
> To unsubscribe send an email to gem5-users-le...@gem5.org