Re: [dmlc/tvm] [RFC] Relay C++ Frontend (#2685)

2019-04-02 Thread Lianmin Zheng
Should we open an RFC to discuss how to port autotvm and topi to C++?

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/dmlc/tvm/issues/2685#issuecomment-478954829

Re: [dmlc/tvm] [RFC] Register Relay VM design (#2915)

2019-04-02 Thread Tianqi Chen
I want to highlight that, due to the public archive principle, the summary of 
an in-person discussion only serves as summary information and suggestions 
rather than the final design decision.

The design decision should be made in this thread, allowing everyone to 
participate. So at this moment, the discussion is still open as an RFC.

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/dmlc/tvm/issues/2915#issuecomment-479086374

[dmlc/tvm] [RFC][AUTOTVM] Auto-Schedule from Compute Declaration (#2954)

2019-04-02 Thread Lianmin Zheng
# Auto-Scheduler
TVM decouples kernel implementation into compute and schedule. The compute part 
is a friendly DSL that can describe algorithms intuitively. However, the 
schedule part still requires strong expert knowledge and time-consuming tuning 
to provide decent performance. The tuning process is partially automated by the 
existing autotvm package, but a human-engineered template is still required.
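
For background, this decoupling looks as follows in plain TVM (a standard 
vector-add example; nothing here is specific to this RFC). The schedule lines 
at the bottom are exactly the expert-written part that this proposal aims to 
generate automatically:

```python
import tvm

# Compute: declare *what* to calculate (a 1-D vector add).
n = 1024
A = tvm.placeholder((n,), name='A')
B = tvm.placeholder((n,), name='B')
C = tvm.compute((n,), lambda i: A[i] + B[i], name='C')

# Schedule: declare *how* to calculate it. Writing these lines well
# requires expert knowledge; the auto-scheduler generates them instead.
s = tvm.create_schedule(C.op)
xo, xi = s[C].split(C.op.axis[0], factor=8)
s[C].parallel(xo)
s[C].vectorize(xi)
```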

This RFC proposes a "real" autotvm, which we can call the auto-scheduler. It 
aims to remove all human effort from the schedule part.

# Proposed Design 
The auto-scheduler is built on the existing autotvm package. It generates a 
template from the compute declaration. Then this template can either be

* Statically filled by heuristic rules and cost functions to provide reasonable 
performance, or
* Dynamically tuned by autotvm to provide better performance with some time 
budget

The auto-scheduler takes a computation graph described by the tvm DSL as input, 
then classifies the read/write patterns and the type of computation. It 
dispatches the declarations to different "meta templates", which generate 
autotvm templates from the declarations. There are four types of meta 
templates: simple reduction, complex reduction, direct compute, and 
location-tunable compute. The auto-scheduler will do parallelization, 
vectorization, tiling, and operator fusion.
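
For concreteness, the dispatch step looks roughly like the sketch below. The 
names here (`classify`, the pattern tags, and the `access_info` fields) are 
illustrative placeholders, not the actual identifiers in the branch:

```python
# Hypothetical sketch of the meta-template dispatch; names are illustrative.
SIMPLE_REDUCTION, COMPLEX_REDUCTION, DIRECT, LOCATION_TUNABLE = range(4)

def classify(op, access_info):
    """Map one compute op to one of the four meta-template kinds."""
    if op.reduce_axis:
        # Reductions with data reuse across spatial axes (conv2d, matmul)
        # need tiling templates; elementwise-style reductions do not.
        return COMPLEX_REDUCTION if access_info['has_reuse'] else SIMPLE_REDUCTION
    # Pure elementwise/broadcast ops are either scheduled in place or
    # moved to a tunable compute location so they can be fused.
    return LOCATION_TUNABLE if access_info['movable'] else DIRECT
```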

The code is available on [my 
branch](https://github.com/merrymercy/tvm/tree/auto-scheduler). The current 
implementation is in pure Python because autotvm is mainly written in Python. 
But moving the whole autotvm package to C++ is part of the long-term plan. The 
code is
organized as follows.
* Analysis on access pattern 
[python/tvm/autotvm/auto_schedule/stage_analysis.py](https://github.com/merrymercy/tvm/blob/auto-scheduler/python/tvm/autotvm/auto_schedule/stage_analysis.py)
* CPU backend 
[python/tvm/autotvm/auto_schedule/backend/cpu.py](https://github.com/merrymercy/tvm/blob/auto-scheduler/python/tvm/autotvm/auto_schedule/backend/cpu.py)
* GPU backend 
[python/tvm/autotvm/auto_schedule/backend/gpu.py](https://github.com/merrymercy/tvm/blob/auto-scheduler/python/tvm/autotvm/auto_schedule/backend/gpu.py)
* Configuration for the auto-scheduler 
[python/tvm/autotvm/auto_schedule/common.py](https://github.com/merrymercy/tvm/blob/auto-scheduler/python/tvm/autotvm/auto_schedule/common.py)
* Experimental auto-packing for optimizing vectorization and locality 
[python/tvm/autotvm/auto_schedule/auto_packing.py](https://github.com/merrymercy/tvm/blob/auto-scheduler/python/tvm/autotvm/auto_schedule/auto_packing.py)
* Test case 
[tests/python/unittest/test_auto_scheduler.py](https://github.com/merrymercy/tvm/blob/auto-scheduler/tests/python/unittest/test_auto_scheduler.py)

## API
There are only two user-oriented API calls:

* `autotvm.AutoSchedulerOptions(**kwargs)`
This is used to configure the auto-scheduler. The arguments include hardware 
configurations (vector lanes, number of threads, size of shared memory, etc.) 
and tuning configurations (how many tuning knobs to generate).
* `autotvm.create_schedule(tensors)`
This is similar to `tvm.create_schedule`, but returns an already optimized 
schedule.

```python
import tvm
from tvm import autotvm

A = tvm.placeholder((128,), name='A')
B = tvm.placeholder((128,), name='B')
C = tvm.compute((128,), lambda i: A[i] + B[i] * 2)

with tvm.target.create('llvm'):
    with autotvm.AutoSchedulerOptions(vec_size=8, num_threads=16):
        s, bufs = autotvm.create_schedule([A, B, C])

# NO SCHEDULE REQUIRED

func = tvm.build(s, bufs)
```
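
For the dynamic path, the generated template should plug into the existing 
autotvm tuning loop unchanged. As a hedged sketch, this is how a hand-written 
template is tuned with the current autotvm API; in this RFC's flow the template 
body would come from the generator instead of being written by hand (the exact 
entry point for that is still open):

```python
import tvm
from tvm import autotvm

# Stand-in template: in the proposed flow this function body would be
# generated from the compute declaration rather than written manually.
@autotvm.template
def vecadd(n):
    A = tvm.placeholder((n,), name='A')
    B = tvm.placeholder((n,), name='B')
    C = tvm.compute((n,), lambda i: A[i] + B[i], name='C')

    s = tvm.create_schedule(C.op)
    cfg = autotvm.get_config()
    cfg.define_knob('factor', [4, 8, 16])  # one generated tuning knob
    xo, xi = s[C].split(C.op.axis[0], factor=cfg['factor'].val)
    s[C].vectorize(xi)
    return s, [A, B, C]

task = autotvm.task.create(vecadd, args=(1024,), target='llvm')
measure_option = autotvm.measure_option(
    builder=autotvm.LocalBuilder(),
    runner=autotvm.LocalRunner(number=5))

tuner = autotvm.tuner.XGBTuner(task)
tuner.tune(n_trial=10, measure_option=measure_option,
           callbacks=[autotvm.callback.log_to_file('vecadd.log')])
```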

# Examples
1. 
[Tutorial](https://github.com/merrymercy/tvm/blob/auto-scheduler/tutorials/autotvm/auto_scheduler.py)
   This is a tutorial on how to use the auto-scheduler statically or auto-tune 
it.
2. [Schedule a whole 
network](https://github.com/merrymercy/tvm/blob/auto-scheduler/scripts/training-with-tvm.py)
   This example is adapted from #2498. It is a LeNet-like convolutional neural 
network written purely in TVM (without the graph IR). The auto-scheduler also 
provides basic operator fusion for it. Right now we can only run the forward 
pass; I am working on fixing the backward pass.

# Performance
One reachable performance goal is to replace more than 90% of the schedule code 
in the existing TOPI with this auto-scheduler. I haven't done the experiments, 
but I believe the generated templates cover the existing search space for most 
operators (including conv2d, reduction, ...).

Another part of the goal is to provide reasonable static performance. In the 
"Schedule a whole network" example, for the batched forward pass, the current 
performance is 1.2x slower than out-of-the-box TF + Keras, and 10x faster than 
a naive schedule (fuse and parallelize the outer loop) on an Intel i7-8750H. 
For static usage, the inputs of the auto-scheduler are parameters for heuristic 
rules and hardware configurations. We will gather all the inputs into a global 
config, so users can still do some quick "tuning", as sketched below.
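
To make the quick "tuning" concrete, here is a hedged sketch of sweeping one 
static parameter with the API from this RFC (reusing `vec_size` and 
`num_threads` from the example above; the final knob set is not decided yet):

```python
import numpy as np
import tvm
from tvm import autotvm

A = tvm.placeholder((128,), name='A')
B = tvm.placeholder((128,), name='B')
C = tvm.compute((128,), lambda i: A[i] + B[i] * 2)

ctx = tvm.cpu(0)
a = tvm.nd.array(np.random.rand(128).astype('float32'), ctx)
b = tvm.nd.array(np.random.rand(128).astype('float32'), ctx)
c = tvm.nd.array(np.zeros(128, dtype='float32'), ctx)

best = (None, float('inf'))
for vec_size in (4, 8, 16):
    # Re-run the static auto-scheduler under a different hardware config.
    with tvm.target.create('llvm'):
        with autotvm.AutoSchedulerOptions(vec_size=vec_size, num_threads=16):
            s, bufs = autotvm.create_schedule([A, B, C])
    func = tvm.build(s, bufs)
    mean = func.time_evaluator(func.entry_name, ctx, number=100)(a, b, c).mean
    if mean < best[1]:
        best = (vec_size, mean)
print('best vec_size:', best[0])
```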

# Todo list
 - [ ] Performance test and improvement 

Re: [dmlc/tvm] [RFC][Graph Tuner] Graph level auto-tuning (#1585)

2019-04-02 Thread Yao Wang
@FrozenGene The default schedule here for x86 eliminates most layout 
transformations. It should have performance similar to "apply_history_best". 
I'll update the data for "apply_history_best".

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/dmlc/tvm/issues/1585#issuecomment-479215581

Re: [dmlc/tvm] [RFC][AUTOTVM] Auto-Schedule from Compute Declaration (#2954)

2019-04-02 Thread Yao Wang
Thank you for opening this RFC! I have a question regarding the user API. Is 
the hardware information needed by the autotvm.AutoSchedulerOptions(**kwargs) 
function pre-defined for different hardware architectures? If so, how much 
extra information does a user need to provide to differentiate between minor 
variants of the same device target, such as Intel Xeon Platinum vs. Xeon 
Haswell, or Nvidia K80 vs. V100? Today we have a single template for minor 
device types. Will the auto-scheduler provide different templates?

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/dmlc/tvm/issues/2954#issuecomment-479219911

Re: [dmlc/tvm] [RFC] Relay C++ Frontend (#2685)

2019-04-02 Thread Jared Roesch
I think we should consider it. Having the tuner sit in Python is okay; the
more important bit is having the schedules and other compiler pieces in C++
for integrating the compiler. I talked with some PyTorch people today, and
they suggested a Python-free version of the compiler would be important if
they use it for JITing.

- Jared

On Tue, Apr 2, 2019 at 4:28 AM Lianmin Zheng 
wrote:

> Should we open an RFC to discuss how to port autotvm and topi to C++?
>
> --
> You are receiving this because you are subscribed to this thread.
> Reply to this email directly or view it on GitHub:
> https://github.com/dmlc/tvm/issues/2685#issuecomment-478954829


Re: [dmlc/tvm] [RFC][AUTOTVM] Auto-Schedule from Compute Declaration (#2954)

2019-04-02 Thread Jared Roesch
@merrymercy How much work is there per backend? Looking over the code now; 
will follow up with more questions later.

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/dmlc/tvm/issues/2954#issuecomment-47939

Re: [dmlc/tvm] [RFC] Structured Error Handling Mechanism (#2279)

2019-04-02 Thread Tianqi Chen
Closing as the first part of the scaffolding is in. Let us open new RFCs for 
new error class proposals.

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/dmlc/tvm/issues/2279#issuecomment-479313558

Re: [dmlc/tvm] [RFC] Structured Error Handling Mechanism (#2279)

2019-04-02 Thread Tianqi Chen
Closed #2279.

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/dmlc/tvm/issues/2279#event-2248518245

Re: [dmlc/tvm] [RFC][AUTOTVM] Auto-Schedule from Compute Declaration (#2954)

2019-04-02 Thread Yizhi Liu
@merrymercy Could you elaborate a bit on the four types (simple reduction, 
complex reduction, direct compute, and location-tunable compute)? Also, it 
would be helpful if you could give an example of what the DAG looks like.

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/dmlc/tvm/issues/2954#issuecomment-479345162

[TVM Discuss] [Development] ONNX model compilation fails with a model that previously worked

2019-04-02 Thread mnboos via TVM Discuss


Okay, for some reason there was `opt_level=3` set here. I changed it to 2, and 
now it fails with:

```python
  File "/home/martin/Dev/xyz/src/tvm/compile_model.py", line 112, in compile_model
    lib.export_library(lib_name)
  File "/home/martin/.local/lib/python3.6/site-packages/tvm-0.6.dev0-py3.6-linux-x86_64.egg/tvm/module.py", line 128, in export_library
    fcompile(file_name, files, **kwargs)
  File "/home/martin/.local/lib/python3.6/site-packages/tvm-0.6.dev0-py3.6-linux-x86_64.egg/tvm/contrib/cc.py", line 33, in create_shared
    _linux_shared(output, objects, options, cc)
  File "/home/martin/.local/lib/python3.6/site-packages/tvm-0.6.dev0-py3.6-linux-x86_64.egg/tvm/contrib/cc.py", line 90, in _linux_shared
    raise RuntimeError(msg)
RuntimeError: Compilation error:
/usr/bin/ld: /tmp/tmp_i8odkrm/lib.o: relocation R_X86_64_32S against `.rodata.cst4' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: final link failed: nonrepresentable section on output
collect2: error: ld returned 1 exit status
```

---
[Visit 
Topic](https://discuss.tvm.ai/t/onnx-model-compilation-fails-with-a-model-that-previously-worked/2081/5)
 to respond.

You are receiving this because you enabled mailing list mode.

