I have deployed the RPC tracker to a k8s cluster. At the beginning it looked like it was working:

- devices can connect to the RPC tracker
- querying the RPC tracker returns the free devices
- the RPC tracker is behind a k8s Service and is configured in a Deployment with one Pod

But when I do a simple benchmark run, I get the following error:

```
Traceback (most recent call last):
  File "/workspace/tcl_scripts/benchmark.py", line 106, in <module>
    main(args)
  File "/workspace/tcl_scripts/benchmark.py", line 64, in main
    compile_upload_benchmark_model(args, mod, params, target)
  File "/workspace/tcl_scripts/benchmark.py", line 35, in compile_upload_benchmark_model
    args.rpc_key, args.rpc_tracker, args.rpc_port, timeout=500)
  File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 735, in request_remote
    remote = tracker.request(device_key, priority=priority, session_timeout=timeout)
  File "/workspace/python/tvm/rpc/client.py", line 418, in request
    "Cannot request %s after %d retry, last_error:%s" % (key, max_retry, str(last_err))
RuntimeError: Cannot request android after 5 retry, last_error:Traceback (most recent call last):
  3: TVMFuncCall
  2: std::_Function_handler<void (tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*), tvm::runtime::{lambda(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)#1}>::_M_invoke(std::_Any_data const&, tvm::runtime::TVMArgs&&, tvm::runtime::TVMRetValue*&&)
  1: tvm::runtime::RPCClientConnect(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, tvm::runtime::TVMArgs)
  0: tvm::runtime::RPCConnect(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, tvm::runtime::TVMArgs)
  File "/workspace/src/runtime/rpc/rpc_socket_impl.cc", line 72
TVMError:
---------------------------------------------------------------
An error occurred during the execution of TVM.
For more information, please see: https://tvm.apache.org/docs/errors.html
---------------------------------------------------------------
  Check failed: (sock.Connect(addr)) is false: Connect to 10.70.227.3:5001 failed
```

Service:

```
apiVersion: v1
kind: Service
metadata:
  name: tvm-rpc-tracker-service
spec:
  type: LoadBalancer
  selector:
    app: tvm-rpc-tracker
  ports:
    - name: rpc1
      protocol: TCP
      port: 9190
      targetPort: 9190
    - name: rpc2
      protocol: TCP
      port: 5000
      targetPort: 5000
    - name: rpc3
      protocol: TCP
      port: 5001
      targetPort: 5001
    - name: rpc4
      protocol: TCP
      port: 5002
      targetPort: 5002
    - name: rpc5
      protocol: TCP
      port: 5003
      targetPort: 5003
```

Deployment:

```
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tvm.rpc-tracker-deployment
  labels:
    app: tvm-rpc-tracker
spec:
  replicas: 1
  selector:
    matchLabels:
      app: tvm-rpc-tracker
  template:
    metadata:
      labels:
        app: tvm-rpc-tracker
    spec:
      nodeSelector:
        location: dc
      containers:
        - name: tvm
          image: tvm:0.0.3
          command: ["/bin/bash", "-ec", "/usr/bin/python3 -m tvm.exec.rpc_tracker --host=0.0.0.0 --port=9190"]
          ports:
            - containerPort: 9190
            - containerPort: 5000
            - containerPort: 5001
            - containerPort: 5002
            - containerPort: 5003
```

Questions:

- How many 500* ports should I open/forward?
- Should all of them be TCP?
- Any ideas on how to debug this?
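
As a possible first debugging step, here is a minimal sketch that checks whether the address from the error message (`10.70.227.3:5001`) accepts a plain TCP connection from the machine running `benchmark.py`. It uses only the Python standard library. The helper names `can_connect` and `report` are made up for this example and are not part of TVM, and the sketch assumes that the failing address is the one the tracker handed back for the device, which I have not verified.

```python
import socket
from typing import Dict, Iterable


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a plain TCP connection to host:port succeeds."""
    try:
        # create_connection resolves the host, connects, and raises OSError
        # (refused, unreachable, timed out) if the connection cannot be made.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def report(host: str, ports: Iterable[int], timeout: float = 3.0) -> Dict[int, bool]:
    """Probe several ports on one host and return {port: reachable}."""
    return {port: can_connect(host, port, timeout) for port in ports}
```

Calling `report("10.70.227.3", [5000, 5001, 5002, 5003])` from the client machine, and again from inside the tracker Pod, would show whether the address is unreachable only from outside the cluster. This only tests TCP reachability; it says nothing about whether a TVM RPC server is actually listening behind the port.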