[Apache TVM Discuss] [Development/RFC] [RFC][Quantization] A new quantization framework in TVM: initial RFC (1/4)

2021-04-26 Thread Animesh Jain via Apache TVM Discuss
Thanks, that makes sense. I was thinking that during calibration, you could use different attributes for `simulated_quantize` and `simulated_dequantize` ops. In the callback for calibrating an operator, one can simulate the affine space and reason about scales and zero points. But for capturing r

[Apache TVM Discuss] [Development/RFC] [RFC][Quantization] A new quantization framework in TVM: initial RFC (1/4)

2021-04-26 Thread Animesh Jain via Apache TVM Discuss
I apologize for the long delay. Thanks @electriclilies and team for the nicely written RFC. I support the idea. Reading through the comments, it seems that many of us are in agreement about AutoQ and its reliance on the QNN extension. The mentioned pain points mostly revolve around * The inconsi

[Apache TVM Discuss] [Development] Role of the LLVM autovectorizer in TVM

2020-11-09 Thread Animesh Jain via Apache TVM Discuss
@kevinthesun Pinging in case you have wondered about this before --- [Visit Topic](https://discuss.tvm.apache.org/t/role-of-the-llvm-autovectorizer-in-tvm/8388/2) to respond.

[Apache TVM Discuss] [Development] Quantized models and legalization pass

2020-10-28 Thread Animesh Jain via Apache TVM Discuss
Sorry for the late reply. Can you try this? tinfo is nothing but the te placeholder. ~~~ diff --git a/python/tvm/relay/qnn/op/legalizations.py b/python/tvm/relay/qnn/op/legalizations.py index 50e5a02f8..8add434c1 100644 --- a/python/tvm/relay/qnn/op/legalizations.py +++ b/python/tvm/relay/qnn/op/

[TVM Discuss] [Development] Loop partitioning, padding and tensorization

2020-08-28 Thread Animesh Jain via TVM Discuss
How about using the Relay Legalize pass to add explicit padding at the graph level? --- [Visit Topic](https://discuss.tvm.ai/t/loop-partitioning-padding-and-tensorization/7753/2) to respond.
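
A minimal sketch of the kind of rewrite such a legalization could produce, assuming an NCHW conv2d with a 3x3 kernel; the `pad_then_conv` helper is illustrative, not an existing TVM hook:

~~~
# Hypothetical sketch of what a Legalize-style rewrite could produce for a conv2d
# whose spatial padding we want made explicit at the graph level. NCHW assumed.
from tvm import relay

def pad_then_conv(data, weight, pad=1):
    # Pad only the two spatial dims with an explicit nn.pad operator ...
    padded = relay.nn.pad(data, pad_width=((0, 0), (0, 0), (pad, pad), (pad, pad)))
    # ... and drop the implicit padding from the conv itself.
    return relay.nn.conv2d(padded, weight, padding=(0, 0), kernel_size=(3, 3))

data = relay.var("data", shape=(1, 16, 56, 56), dtype="float32")
weight = relay.var("weight", shape=(32, 16, 3, 3), dtype="float32")
func = relay.Function([data, weight], pad_then_conv(data, weight))
print(func)
~~~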

[TVM Discuss] [Development/RFC] [RFC] Using arm intrinsics to implement fixed point multiplication in TVM

2020-07-01 Thread Animesh Jain via TVM Discuss
Hi @giuseros You are correct that `qnn.conv2d` and `qnn.requantize` are different operators, and both of them are lowered to a sequence of Relay operators. But here is where the strength of Relay comes in: Relay fuses `nn.conv2d` followed by a large number of elemwise ops into one operator. This can

[TVM Discuss] [Development/RFC] [RFC] Using arm intrinsics to implement fixed point multiplication in TVM

2020-07-01 Thread Animesh Jain via TVM Discuss
@tqchen The problem arises because LLVM codegen is not able to use suitable instructions. A fixed point multiply at the Relay level will have to upcast the input tensors to int64. The ARM instructions that @giuseros shared take int32 tensors and perform the upcasting internally in the HW (please corre
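
A plain NumPy illustration of the data-type point being made, under the Q31-multiplier convention commonly used for requantization; this only shows why the intermediate needs int64, not TVM's exact rounding rule:

~~~
import numpy as np

# Multiplying two int32 values needs an int64 intermediate before the high
# 32 bits can be extracted. Sketch of the dtype issue only.
a = np.int64(np.iinfo(np.int32).max)   # 2**31 - 1, held in int64
b = np.int64(1 << 30)                  # a Q31 multiplier of roughly 0.5

product = a * b                        # ~2**61: far outside the int32 range
approx_half = product >> 31            # high part, roughly a * 0.5

print(product, approx_half)
# SQRDMULH-style ARM instructions perform this widening and high-half extract
# in hardware on int32 lanes, so no explicit int64 tensor is needed there.
~~~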

[TVM Discuss] [Development/RFC] [RFC] Using arm intrinsics to implement fixed point multiplication in TVM

2020-07-01 Thread Animesh Jain via TVM Discuss
Thanks for the nice RFC. Trying to understand whether I missed anything: what will happen for non-ARM machines? Are we going to use the `fixed_point_multiply` Relay operator for non-ARM machines and then use the injective schedule?

[TVM Discuss] [Development/RFC] [RFC][BYOC] Data Calibration Flow

2020-06-30 Thread Animesh Jain via TVM Discuss
I think we are getting confused because of the overloaded term quantization. To be precise, maybe we can stick to certain terms:
* *QNN Dialect* - The framework (like TF/PyTorch/MXNet) performs quantization. The Relay parser reads this pre-quantized model and creates a QNN-dialect graph. QNN ops are l

[TVM Discuss] [Development/RFC] [RFC][BYOC] Data Calibration Flow

2020-06-30 Thread Animesh Jain via TVM Discuss
LGTM. I think we can rename to `get_calibration_data` or `get_profiling_data` instead of `calibrate_partition_gaph`. I think calibration means more than collecting i/o tensors (for quantization, it means choosing min/max such that quantized data representation is similar to float32 data repres
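
A small NumPy sketch of the distinction drawn here, assuming a simple asymmetric uint8 affine scheme: profiling just records tensors, while calibration searches for a (min, max) that keeps the quantized representation close to the float32 data.

~~~
import numpy as np

# Score a candidate (min, max) range by the error of a quantize/dequantize
# round trip. Asymmetric uint8 assumed; this is an illustrative metric only.
def quantization_error(data, fmin, fmax):
    scale = (fmax - fmin) / 255.0
    zero_point = np.round(-fmin / scale)
    q = np.clip(np.round(data / scale) + zero_point, 0, 255)
    deq = (q - zero_point) * scale
    return np.abs(data - deq).mean()

data = np.random.randn(10000).astype("float32")
# Calibration would search over candidate ranges and keep the best one.
for fmin, fmax in [(-4.0, 4.0), (-2.0, 2.0), (float(data.min()), float(data.max()))]:
    print(fmin, fmax, quantization_error(data, fmin, fmax))
~~~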

Re: [apache/incubator-tvm] [RFC] Improve quantized convolution performance for armv8 architectures (#5754)

2020-06-22 Thread Animesh Jain
I pushed an empty commit to retrigger the CI - https://coderwall.com/p/vkdekq/git-commit-allow-empty -- View it on GitHub: https://github.com/apache/incubator-tvm/pull/5754#issuecomment-647622695

Re: [apache/incubator-tvm] [RFC] Improve quantized convolution performance for armv8 architectures (#5754)

2020-06-16 Thread Animesh Jain
@FrozenGene Can you please review when you get time? -- View it on GitHub: https://github.com/apache/incubator-tvm/pull/5754#issuecomment-644902827

Re: [apache/incubator-tvm] [RFC] Improve quantized convolution performance for armv8 architectures (#5754)

2020-06-12 Thread Animesh Jain
@FrozenGene @giuseros If QNN Legalization is causing issues, we can remove QNN legalization for ARM CPUs altogether and move the logic to AlterOpLayout. AlterOpLayout might become more complicated (for example, we might have to handle uint8 x int8 input and kernel dtypes in AlterOpLayout now). Just

[TVM Discuss] [Development/RFC] [RFC] Improve quantized convolution performance for armv8 architectures

2020-06-09 Thread Animesh Jain via TVM Discuss
Also cc @FrozenGene @thierry @masahi --- [Visit Topic](https://discuss.tvm.ai/t/rfc-improve-quantized-convolution-performance-for-armv8-architectures/6920/2) to respond.

[TVM Discuss] [Development/RFC] [RFC] Search-based Automated Quantization

2020-04-27 Thread Animesh Jain via TVM Discuss
Ping @ziheng, I was wondering if you are pursuing this direction and have any update. --- [Visit Topic](https://discuss.tvm.ai/t/rfc-search-based-automated-quantization/5483/18) to respond.

[TVM Discuss] [Development/RFC] [RFC] Search-based Automated Quantization

2020-04-09 Thread Animesh Jain via TVM Discuss
Hi @ziheng, I was wondering if you got a chance to work on this further. Any kind of update? --- [Visit Topic](https://discuss.tvm.ai/t/rfc-search-based-automated-quantization/5483/17) to respond.

[TVM Discuss] [Development] Separate Relay Depthwise Conv operator

2020-03-27 Thread Animesh Jain via TVM Discuss
Currently, Relay conv2d internally decides whether a Relay conv2d operator is depthwise or not. This makes the code somewhat messy - lots of if conditions and indirections, and the quite confusing HWIO vs HWOI kernel layouts. In addition, it is difficult to understand from the debug runtime if the conv ope

Re: [dmlc/tvm] [RFC] Add AVX512VNNI support for TVM (#3388)

2019-10-11 Thread Animesh Jain
Hi @jianyuh I am getting the following error when I try to run my benchmark: ~~~ LLVM ERROR: Cannot select: 0x23809ef0: v16i32 = X86ISD::VPDPBUSD 0x210a09a8, 0x210a02c0, 0x19eb81b0 0x210a09a8: v16i32 = BUILD_VECTOR Constant:i32<0>, Constant:i32<0>, Constant:i32<0>, Cons

Re: [dmlc/tvm] [QNN] [RFC] - Adding QNN operators with simple lowering (#3617)

2019-09-25 Thread Animesh Jain
Closed #3617. -- View it on GitHub: https://github.com/dmlc/tvm/issues/3617#event-2663467814

Re: [dmlc/tvm] [QNN] [RFC] - Adding QNN operators with simple lowering (#3617)

2019-09-25 Thread Animesh Jain
This is solved. Closing. -- View it on GitHub: https://github.com/dmlc/tvm/issues/3617#issuecomment-535267980

Re: [dmlc/tvm] [RFC] AlterOpLayout Pass Refactoring (#3670)

2019-08-16 Thread Animesh Jain
@yzhliu Gave it some more thought over the last few days. I think there might be a slightly better way to deal with layouts. * Instead of directly using the `transpose` operators, maybe we can use some new annotation ops like `annotate.change_layout` (or maybe use `layout_transform`). This will hav

Re: [dmlc/tvm] [RFC] AlterOpLayout Pass Refactoring (#3670)

2019-08-09 Thread Animesh Jain
I see. I missed the implementation detail point. My first preference is to place it inside `Type` (but I guess that maybe is not the preferred choice as of now, given how frameworks handle layout). The second option that you give is pretty good too. However, how do we read the layout, for example, i

Re: [dmlc/tvm] [RFC] AlterOpLayout Pass Refactoring (#3670)

2019-08-09 Thread Animesh Jain
If it's ok, I will give a couple of reasons why I think treating layout as a first-class citizen is important (the world can do with one *more* opinion :) ) * It seems to me that layout was an afterthought for the frameworks. They started with just one layout; as deep learning progressed, we reali

Re: [dmlc/tvm] [RFC] AlterOpLayout Pass Refactoring (#3670)

2019-08-06 Thread Animesh Jain
What do you guys think about having `Layout` as a member of `TensorType`? Currently `Type` basically means dtype and shape. I think it is very useful to have `Layout` there as well. If that's the case, the `Legalize` API will get arg_dtypes, and thus layout, enabling transformation based on input

Re: [dmlc/tvm] [QNN] [RFC] - Adding QNN operators with simple lowering (#3617)

2019-08-01 Thread Animesh Jain
Thanks @jackwish and @FrozenGene I understand your points. This can be treated as an optimization then: if the input zero point is zero OR if the input and output quantization params are the same, don't cast; directly apply maxpool. Generally, we would like to keep QNN APIs generic. So, if MxNet for s

Re: [dmlc/tvm] [QNN] [RFC] - Adding QNN operators with simple lowering (#3617)

2019-08-01 Thread Animesh Jain
Thanks @jackwish for confirming the Python lowering looks good. For max pooling, we used casting because we have to subtract the zero point from the quantized tensor. That subtract needs to happen in higher precision than (u)int8. Correct me if I am wrong.
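
A small NumPy illustration of why the subtract is done after an upcast, assuming a uint8 input with a nonzero zero point; values and the int16 intermediate are illustrative:

~~~
import numpy as np

# (u)int8 minus a zero point can leave the (u)int8 range, so the lowering
# casts up before subtracting.
x = np.array([0, 3, 250], dtype=np.uint8)
zero_point = 128

shifted = x.astype(np.int16) - zero_point   # [-128, -125, 122], safe in int16
pooled = shifted.max()                      # max pooling on the shifted values
print(shifted, pooled)

# Subtracting directly in uint8 would wrap around instead:
print(x - np.uint8(zero_point))             # e.g. 0 - 128 wraps to 128
~~~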

Re: [dmlc/tvm] [QNN] [RFC] - Adding QNN operators with simple lowering (#3617)

2019-08-01 Thread Animesh Jain
@FrozenGene Can you please review #3627? -- View it on GitHub: https://github.com/dmlc/tvm/issues/3617#issuecomment-517491952

Re: [dmlc/tvm] [QNN] [RFC] - Adding QNN operators with simple lowering (#3617)

2019-07-26 Thread Animesh Jain
@tqchen @FrozenGene Did you get a chance to take a look at this? Please let us know your thoughts. We have some more QNN ops in the works and are following this proposal for now. It will be good if we can have some feedback here.

Re: [dmlc/tvm] [QNN] [RFC] QNN Dialect -- Prequantize Models (#3591)

2019-07-25 Thread Animesh Jain
@jnorwood We are using intrinsics for Skylake and there is already a PR to take advantage of VNNI intrinsics #3388

Re: [dmlc/tvm] [QNN] [RFC] - Adding QNN operators with simple lowering (#3617)

2019-07-25 Thread Animesh Jain
We added a QNN max_pool2d operator to show the file changes required in this proposal. Please share your thoughts!

Re: [dmlc/tvm] [QNN] [RFC] QNN Dialect -- Prequantize Models (#3591)

2019-07-25 Thread Animesh Jain
Thanks @jackwish This is a very good analysis. Everything makes sense. I vote for restricting to `(u)int8` for now for `Quantize` and `Dequantize`. If in the future we see `(u)int16`, we can tackle it then. `int32` is highly unlikely (why not just go to `FP32`, as you say).

[dmlc/tvm] [QNN] [RFC] - Adding QNN operators with simple lowering (#3617)

2019-07-24 Thread Animesh Jain
Relevant QNN Dialect RFC - #3591 Some QNN operators like Requantize and Conv2D are more amenable to going through a C++ lowering pass. A couple of factors where a C++ implementation seems better are: when the new operator is conceptually very different from existing operators (Requantize), when input/

Re: [dmlc/tvm] [QNN] [RFC] QNN Dialect -- Prequantize Models (#3591)

2019-07-23 Thread Animesh Jain
@FrozenGene I don't think `requantize` should take output_min and output_max. We can use `requantize` after/before any operator, where `relu` might not be applicable at all. Instead, I would suggest having two clip operators and then relying on Relay passes to optimize the graph - in this case c
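
A hedged Relay sketch of the composition being suggested: keep requantize free of output_min/output_max and attach an explicit clip, letting Relay passes fold redundant clips later. The argument style here follows the present-day `qnn.requantize` API (scales and zero points as constants) and may differ from the 2019 proposal.

~~~
from tvm import relay

x = relay.var("x", shape=(1, 64), dtype="int32")
# Requantize without any built-in output range.
requantized = relay.qnn.op.requantize(
    x,
    input_scale=relay.const(0.25, "float32"),
    input_zero_point=relay.const(0, "int32"),
    output_scale=relay.const(0.5, "float32"),
    output_zero_point=relay.const(0, "int32"),
    out_dtype="int8",
)
# Explicit saturation as a separate op; a fused relu would simply tighten these bounds.
clipped = relay.clip(requantized, a_min=-128, a_max=127)
func = relay.Function([x], clipped)
~~~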

Re: [dmlc/tvm] [QNN] [RFC] QNN Dialect -- Prequantize Models (#3591)

2019-07-23 Thread Animesh Jain
@jnorwood Yes, bias is kept outside as a separate operator. But this can be fused with qnn.conv2d. Regarding the accumulation point, if we perform fusion and add the bias in `int32` in the accumulator at the end, is it any different than preloading the accumulator? We need to ensure that op

Re: [dmlc/tvm] [QNN] [RFC] QNN Dialect -- Prequantize Models (#3591)

2019-07-23 Thread Animesh Jain
@FrozenGene Updated the Conv2D API. Also, added a diagram explaining how to go from TFLite to Relay operators. -- View it on GitHub: https://github.com/dmlc/tvm/issues/3591#issuecomment-514265170

Re: [dmlc/tvm] [QNN] [RFC] QNN Dialect -- Prequantize Models (#3591)

2019-07-22 Thread Animesh Jain
### QNN Conv2D operator

Tensorflow
~~~
tf.nn.quantized_conv2d(
    input,
    filter,
    min_input,
    max_input,
    min_filter,
    max_filter,
    strides,
    padding,
    out_type=tf.dtypes.qint32,
    dilations=[1, 1, 1, 1],
    name=None
)
~~~

MxNet ~~~ mxnet.symbol.contrib.qua

Re: [dmlc/tvm] [QNN] [RFC] QNN Dialect -- Prequantize Models (#3591)

2019-07-22 Thread Animesh Jain
@jnorwood Thanks for the comment. Both are good points. I will keep those abilities, though, outside the scope of the requantize op. Another function (not necessarily a Relay operator) can take min/max and a config (like nudging so that zero is exactly representable) and generate the scale and zero point as per
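
A minimal NumPy sketch of the kind of helper described (not an existing TVM function), following the common TFLite-style asymmetric uint8 scheme in which real 0.0 is made exactly representable:

~~~
import numpy as np

# Turn an observed (min, max) into (scale, zero_point) for uint8, forcing the
# range to contain zero so that 0.0 maps onto an integer zero point.
def choose_qparams(fmin, fmax, qmin=0, qmax=255):
    fmin = min(fmin, 0.0)
    fmax = max(fmax, 0.0)
    scale = (fmax - fmin) / (qmax - qmin)
    zero_point = int(np.round(qmin - fmin / scale))
    zero_point = max(qmin, min(qmax, zero_point))   # keep it inside [qmin, qmax]
    return scale, zero_point

print(choose_qparams(-1.0, 3.0))   # e.g. scale 4/255, zero_point 64
~~~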

Re: [dmlc/tvm] [QNN] [RFC] QNN Dialect -- Prequantize Models (#3591)

2019-07-22 Thread Animesh Jain
@FrozenGene @tqchen @u99127 Can you please approve the above API, so that we can move to the next discussion? I have so many things to discuss :)

Re: [dmlc/tvm] [QNN] [RFC] QNN Dialect - Supporting pre-quantized models in TVM (#3591)

2019-07-19 Thread Animesh Jain
Let's start with just Requantize to keep it focused.

### QNN proposal
~~~
def requantize(data,
               input_scale,
               input_zero_point,
               output_scale,
               output_zero_point,
               rounding="AWAY_FROM_ZERO",
               out_dtype="int8"):
~~~
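
A plain-NumPy reference of what this signature computes, as a hedged float sketch; the integer-only multiplier/shift variant discussed later in the thread replaces the float math:

~~~
import numpy as np

# Float reference of requantize: move int32 values from one (scale, zero_point)
# pair to another and saturate into out_dtype.
def requantize_ref(data, input_scale, input_zero_point,
                   output_scale, output_zero_point, out_dtype="int8"):
    real = (data.astype(np.float64) - input_zero_point) * input_scale
    q = np.round(real / output_scale) + output_zero_point
    info = np.iinfo(out_dtype)
    return np.clip(q, info.min, info.max).astype(out_dtype)

acc = np.array([-1200, 0, 5000], dtype=np.int32)   # e.g. int32 conv accumulators
print(requantize_ref(acc, 0.0005, 0, 0.05, 0))     # -> int8 values
~~~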

Re: [dmlc/tvm] [RFC][Quantization] Support quantized models from TensorflowLite (#2351)

2019-07-19 Thread Animesh Jain
@tqchen Thanks for reminding. Just created one :) -- View it on GitHub: https://github.com/dmlc/tvm/issues/2351#issuecomment-513414742

[dmlc/tvm] [QNN] [RFC] QNN Dialect - Supporting pre-quantized models in TVM (#3591)

2019-07-19 Thread Animesh Jain
We are proposing a new dialect named `QNN` that introduces quantized versions of existing Relay operators. The goal is to support models that have been pre-quantized in the framework. Some important notes about the QNN dialect are * QNN operators are lowered to existing Relay operators to ens

Re: [dmlc/tvm] [RFC][Quantization] Support quantized models from TensorflowLite (#2351)

2019-07-18 Thread Animesh Jain
I agree, we should move the proposal to a new thread. Yes, I can lead the proposal discussion. -- View it on GitHub: https://github.com/dmlc/tvm/issues/2351#issuecomment-513025651

Re: [dmlc/tvm] [RFC] Add AVX512VNNI support for TVM (#3388)

2019-07-15 Thread Animesh Jain
> http://ci.tvm.ai:8080/blue/organizations/jenkins/tvm/detail/PR-3388/6/pipeline/
> Not sure why "llvm.x86.avx512.pmaddubs.w.512" (AVX512 instruction, not VNNI instruction) is not recognized as an LLVM intrinsic.

This is happening because the LLVM version is 6.0 in the CI, as Tianqi mentioned. You

Re: [dmlc/tvm] [RFC] Add AVX512VNNI support for TVM (#3388)

2019-07-15 Thread Animesh Jain
> I will update the CI to add LLVM8 this week.

Hi @tqchen, is there any update on the LLVM8 front? We are also looking into this and have a similar test issue.

Re: [dmlc/tvm] [RFC][Quantization] Designing and lowering of quantized ops (#3512)

2019-07-08 Thread Animesh Jain
We have made good progress on the Quantization RFC, achieving clarity and convergence on many points. For this PR specifically, @tqchen and @FrozenGene, can you please comment on whether this looks in line with our quantization RFC?

Re: [dmlc/tvm] [RFC][Quantization] Support quantized models from TensorflowLite (#2351)

2019-07-07 Thread Animesh Jain
> slight difference in a single point (0.5) is fine and likely won't have an impact on final acc

Yeah, I was planning to add a rounding param to the op. For "ceil", we could just add a 0.5 rounding without worrying about negative values. For "round", we can be more precise. By default, we can

Re: [dmlc/tvm] [RFC][Quantization] Support quantized models from TensorflowLite (#2351)

2019-07-07 Thread Animesh Jain
> One thing to be careful about is that when using shift and normalize, right shift corresponds to round down as opposed to round to nearest, an additional 0.5 equivalence needs to be added to get the round behavior

Yes, I think it is a little more complicated. The std::round of -2.5 is -3.
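
A small NumPy illustration of the rounding subtlety mentioned here; the values and shift amount are arbitrary examples:

~~~
import numpy as np

# An arithmetic right shift rounds toward negative infinity, adding half the
# shift amount gives round-half-up, and the std::round behavior (half away
# from zero) needs a sign-dependent correction.
x = np.array([5, -5], dtype=np.int64)     # stands for 2.5 and -2.5 with shift = 1
shift = 1
half = 1 << (shift - 1)

print(x >> shift)                          # [ 2 -3]  plain shift: floor
print((x + half) >> shift)                 # [ 3 -2]  round half up
away = np.where(x >= 0, (x + half) >> shift, -((-x + half) >> shift))
print(away)                                # [ 3 -3]  half away from zero, like std::round
~~~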

Re: [dmlc/tvm] [RFC][Quantization] Support quantized models from TensorflowLite (#2351)

2019-07-06 Thread Animesh Jain
Thanks everybody for the fruitful discussion. I think we are gradually reaching convergence :) I have been prototyping qnn.conv2d and qnn.requantize at https://github.com/dmlc/tvm/pull/3367 I still have a few loose ends to fix. I will update once I am done and then we can discuss if the im

Re: [dmlc/tvm] [RFC][Quantization] Support quantized models from TensorflowLite (#2351)

2019-07-06 Thread Animesh Jain
> And in the case when the scale is a power of two, use shift and normalize might be better than float scale and round

Yes, the shift and normalize can be done completely in integer arithmetic instead of going to floating point (even if the scales are not a power of 2). I have been prototyping that.

Re: [dmlc/tvm] [RFC][Quantization] Support quantized models from TensorflowLite (#2351)

2019-07-06 Thread Animesh Jain
> I can see that you might want the graph to represent all the operations prior to optimizing the implementation. I just want to point out that the qrelu implementation can avoid the lowered resolution and can be completely cost free by revising the downscale multiplier and zero point of a

Re: [dmlc/tvm] [RFC][Quantization] Support quantized models from TensorflowLite (#2351)

2019-07-06 Thread Animesh Jain
Thanks @tqchen for the detailed explanation. Actually, my proposal is simpler. My `qnn.relu` does not convert to the three stages that you mentioned; it only performs the `relu_int_i8`. The frameworks (at least TFLite and MxNet) do not go back to FP32 unless the operator is not supported in `i8`
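
A small NumPy sketch of what a relu performed purely in the integer affine domain looks like, assuming asymmetric int8 with an arbitrary zero point of -10 for illustration:

~~~
import numpy as np

# Real 0.0 maps to the zero point, so the clamp floor is the zero point
# rather than 0; no trip back to FP32 is needed.
def relu_int_i8(x, zero_point):
    return np.maximum(x, np.int8(zero_point))

x = np.array([-100, -10, 0, 50], dtype=np.int8)
print(relu_int_i8(x, zero_point=-10))   # [-10 -10   0  50]
~~~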

Re: [dmlc/tvm] [RFC][Quantization] Support quantized models from TensorflowLite (#2351)

2019-07-06 Thread Animesh Jain
> In particular, refer to the current quantization pass, every value could sit in a domain, which could be fixed point with an implied scale, or floating point. Conversion between domains might be necessary and should be conducted in a minimum way. The default way always convert integer do

Re: [dmlc/tvm] [RFC][Quantization] Support quantized models from TensorflowLite (#2351)

2019-07-06 Thread Animesh Jain
> Do we allow mix of standard ops and qnn ones?

The framework-parsed graph might have a mix (as shown in the lowering of qconv2d). But in the `relay.build` function, my first pass would be the quantize_rewrite pass, which will convert all the `qnn` ops to existing Relay ops, resulting in whole graph

Re: [dmlc/tvm] [RFC][Quantization] Support quantized models from TensorflowLite (#2351)

2019-07-06 Thread Animesh Jain
@tqchen Added the case for qrelu. (I think the asymmetric lowering can be improved further, but that's not the point.) Similarly, for quantized avg_pool2d, as @FrozenGene mentioned, we will still need to upcast the tensor to int32 to avoid saturation. Additionally, we would need to handle the zer
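
A NumPy sketch of the saturation problem driving the int32 upcast, assuming a 2x2 uint8 pooling window with illustrative values:

~~~
import numpy as np

# Summing a window of (u)int8 values easily exceeds the 8-bit range, so the
# sum is accumulated in int32 and only the final average is cast back.
window = np.array([[200, 220], [210, 230]], dtype=np.uint8)

bad_sum = window.sum(dtype=np.uint8)       # wraps around modulo 256
good_sum = window.astype(np.int32).sum()   # 860, no saturation
avg = np.uint8(np.round(good_sum / window.size))

print(bad_sum, good_sum, avg)              # 92 860 215
~~~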

Re: [dmlc/tvm] [RFC][Quantization] Support quantized models from TensorflowLite (#2351)

2019-07-06 Thread Animesh Jain
![q_conv2d](https://user-images.githubusercontent.com/13822661/60761466-2ef79d80-9ffe-11e9-9895-8707d4d8c100.jpg)

Re: [dmlc/tvm] [RFC][Quantization] Support quantized models from TensorflowLite (#2351)

2019-07-05 Thread Animesh Jain
@tqchen What are your thoughts? Seems like we are agreeing on the proposed design abstraction. There is a concern of not being able to achieve the best schedule performance. We can try to tackle it with fusion and schedule_tagging.

Re: [dmlc/tvm] [RFC][Quantization] Support quantized models from TensorflowLite (#2351)

2019-07-05 Thread Animesh Jain
@jnorwood Yes, I understand your point. We can use the clip to saturate the values even if Relu was not fused. It fits in the design and the proposed abstractions.

Re: [dmlc/tvm] [RFC][Quantization] Support quantized models from TensorflowLite (#2351)

2019-07-05 Thread Animesh Jain
@FrozenGene Thanks for the quick feedback on the design. I understand the performance concern. Let's try to tackle it in fusion. Fusion already performs compute_inline to bring the computation to the right location. Hopefully, with some tagging and with some arm-twisting, we can achieve same tens

Re: [dmlc/tvm] [RFC][Quantization] Support quantized models from TensorflowLite (#2351)

2019-07-04 Thread Animesh Jain
![0001](https://user-images.githubusercontent.com/13822661/60703346-6d824080-9eb6-11e9-9ad3-bc01f5e17451.jpg) ![0002](https://user-images.githubusercontent.com/13822661/60703347-6d824080-9eb6-11e9-89ed-e34f270d3f00.jpg) ![0003](https://user-images.githubusercontent.com/13822661/60703349-6e1ad700-9e

Re: [dmlc/tvm] [RFC][Quantization] Support quantized models from TensorflowLite (#2351)

2019-07-01 Thread Animesh Jain
All of the above `qnn` ops will be lowered to existing Relay primitive ops using some Relay pass (for example, using the ForwardRewrite infra). For example, `relay.op.qnn.conv2d` can be lowered to ~~~ fn (%quantized_data: Tensor[(2, 1, 2, 4), uint8], %weight: Tensor[(3, 1, 2, 2), uint8]) -> Tensor[(2
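
The truncated snippet shows the start of such a lowering. The algebra behind it, for asymmetric quantization, is the standard four-term expansion of a convolution on zero-point-shifted values; the NumPy sketch below demonstrates the identity on a dense (matmul) slice for brevity and is not claimed to be the exact Relay expression TVM emits:

~~~
import numpy as np

# conv(A - za, W - zw) expands into four terms; only the first is a full
# int8 x int8 -> int32 workload, the rest are reductions or constants.
A = np.random.randint(0, 255, size=(2, 8)).astype(np.int32)   # quantized data
W = np.random.randint(0, 255, size=(4, 8)).astype(np.int32)   # quantized weight
za, zw = 120, 130                                              # zero points

reference = (A - za) @ (W - zw).T

term1 = A @ W.T                                # the heavy compute
term2 = zw * A.sum(axis=1, keepdims=True)      # depends only on the data
term3 = za * W.sum(axis=1)                     # constant-foldable at compile time
term4 = za * zw * A.shape[1]                   # scalar constant

print(np.array_equal(reference, term1 - term2 - term3 + term4))   # True
~~~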

Re: [dmlc/tvm] [RFC][Quantization] Support quantized models from TensorflowLite (#2351)

2019-07-01 Thread Animesh Jain
Finally, we are starting to converge :) I am proposing these on the basis of the Resnet network for now.
* `relay.op.qnn.conv2d`
* `relay.op.qnn.dense`
* `relay.op.qnn.relu`
* `relay.op.qnn.max_pool2d`
* `relay.op.qnn.avg_pool2d`
* `relay.op.qnn.concat` (used in Inception)
* `relay.op.qnn.quantize`
* `relay.op.qnn.d

Re: [dmlc/tvm] [RFC][Quantization] Support quantized models from TensorflowLite (#2351)

2019-06-30 Thread Animesh Jain
@jackwish Yes, `qnn` stands for a generic quantized nn, and not QNNPACK. I think @tqchen also means the same thing.

Re: [dmlc/tvm] [RFC][Quantization] Support quantized models from TensorflowLite (#2351)

2019-06-30 Thread Animesh Jain
I completely agree with breaking things down into primitive ops. Even the `relay.op.qnn` ops should be broken down into primitive ops. If a primitive op does not exist, we will discuss and maybe create one. I understand the Relay fusion part. I am trying to make another point: I am trying to understand

Re: [dmlc/tvm] [RFC][Quantization] Support quantized models from TensorflowLite (#2351)

2019-06-30 Thread Animesh Jain
Thanks @tqchen Of the two choices, I am inclined towards `relay.op.qnn`. My hope is that different frameworks converge to the same `qnn` ops. The `relay.op.tflite` namespace seems to be very specific as of now. I agree that these new ops should have a special op_level. I am still unclear about where to d

Re: [dmlc/tvm] [RFC][Quantization] Designing and lowering of quantized ops (#3457)

2019-06-28 Thread Animesh Jain
@tqchen @FrozenGene @ZihengJiang @zhiics @wweic @eqy -- View it on GitHub: https://github.com/dmlc/tvm/pull/3457#issuecomment-506844165

Re: [dmlc/tvm] [RFC][Quantization] Support quantized models from TensorflowLite (#2351)

2019-06-16 Thread Animesh Jain
> Although the quantized conv result is held in uint8, it could be static casted to signed int8, or even fewer than 8 bit quantization. That would require both min and max saturations, as in the reference tflite quantized conv implementation

Ah, I see. That finally makes sense. So, this i

Re: [dmlc/tvm] [RFC][Quantization] Support quantized models from TensorflowLite (#2351)

2019-06-16 Thread Animesh Jain
> I think it is ok. If we do this way, we should insert one clamp if we have activation.
> Like our tflite frontend

Yes, I agree with that. That's exactly what I was thinking.

Re: [dmlc/tvm] [RFC][Quantization] Support quantized models from TensorflowLite (#2351)

2019-06-15 Thread Animesh Jain
@FrozenGene For the output_min and max, isn't the out_dtype enough? If it's uint8, we can clamp at 0 and 255. If it's int8, we can clamp at -128 and 127. I don't see any reason the values will be any different, unless you want to fuse the quantized relu into the quantized convolution from the starti
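
The same point as a tiny illustrative helper (not part of any proposed API): the default saturation bounds follow directly from out_dtype, so separate range arguments only matter when a tighter relu-style activation is fused in.

~~~
import numpy as np

# Default clamp bounds derived purely from the output dtype.
def default_clamp_bounds(out_dtype):
    info = np.iinfo(out_dtype)
    return info.min, info.max

print(default_clamp_bounds("uint8"))   # (0, 255)
print(default_clamp_bounds("int8"))    # (-128, 127)
~~~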

Re: [dmlc/tvm] [RFC][Quantization] Support quantized models from TensorflowLite (#2351)

2019-06-14 Thread Animesh Jain
@FrozenGene Thanks for replying. I might be wrong, but I don't think it is a good design to take one codegen backend like QNNPACK and make changes all the way into the Relay APIs to make the connection. In my opinion, APIs must be minimal. But your point about using QNNPACK is completely valid. I have

Re: [dmlc/tvm] [RFC][Quantization] Support quantized models from TensorflowLite (#2351)

2019-06-13 Thread Animesh Jain
@tqchen @FrozenGene @jackwish I have added a prototype patch. I think it will be helpful to use that patch to drive the discussion further.

Re: [dmlc/tvm] [RFC][Quantization] Support quantized models from TensorflowLite (#2351)

2019-06-04 Thread Animesh Jain
Ok, let's try to finalize the high-level design points. Let's first discuss the

# Namespace for the tflite quantize style dialect

### Requirements
* This should support both symmetric and asymmetric.
* These ops should never go through codegen. They will be lowered to low-level Relay ops (like exi

[TVM Discuss] [Development] TFLite an internal invariant was violated while typechecking your program

2019-05-31 Thread Animesh Jain via TVM Discuss
It seems like an NHWC problem. The conv should have an arg called `data_layout = "NHWC"` here. --- [Visit Topic](https://discuss.tvm.ai/t/tflite-an-internal-invariant-was-violated-while-typechecking-your-program/2784/3) to respond.

Re: [dmlc/tvm] [RFC][Quantization] Support quantized models from TensorflowLite (#2351)

2019-05-31 Thread Animesh Jain
This is most probably outside the context of this issue, but is it possible for all of the people commenting here to join a conference call for an hour and figure out the next steps? I can take notes and document them here for everybody else to see. I think it will be more productive.

Re: [dmlc/tvm] [RFC][Quantization] Support quantized models from TensorflowLite (#2351)

2019-05-30 Thread Animesh Jain
I would suggest designing the infrastructure so that it supports both symmetric and asymmetric quantization. We can certainly start with symmetric to flush out the flow, while keeping in mind that we can share as much infrastructure as possible between them.

> * namespace for the tflite quantize style dialec

Re: [dmlc/tvm] [RFC][Quantization] Support quantized models from TensorflowLite (#2351)

2019-05-29 Thread Animesh Jain
> For the `q_conv2d`, we will add two more arguments.
>
> ```python
> output_min=0,
> output_max=0
> ```
>
> These will be used for restrict the output range, which could be calculated previously.

I see what you are saying, but I am not sure if this is the right approach. In my opinion,

Re: [dmlc/tvm] [RFC][Quantization] Support quantized models from TensorflowLite (#2351)

2019-05-29 Thread Animesh Jain
> > > For the `q_conv2d`, we will add two more arguments.
> > > ```python
> > > output_min=0,
> > > output_max=0
> > > ```
> > >
> > > These will be used for restrict the output range, which could be calculated previously.
> >
> > I see what you are saying, but I am not su

Re: [dmlc/tvm] [RFC][Quantization] Support quantized models from TensorflowLite (#2351)

2019-05-29 Thread Animesh Jain
> Yes, I believe the MobilenetV2 relu_6 is effectively fused in by the downscale saturation. You might need it if you want to support their way of training, though.
>
> Yes Mobilenet has the q_add, but I suggest the Inceptionv3 for q_concatenate, since it also has concat nodes feeding int

Re: [dmlc/tvm] [RFC][Quantization] Support quantized models from TensorflowLite (#2351)

2019-05-29 Thread Animesh Jain
> Hi @anijain2305 regarding the requantization, if the it is not going to put in conv op, the op may suppose to output FP32, otherwise the semantic is confusing. The requantization can convert FP32 to INT8. The multiplier/shift based reuantization approach introduced by TFLite is also adop

Re: [dmlc/tvm] [RFC][Quantization] Support quantized models from TensorflowLite (#2351)

2019-05-28 Thread Animesh Jain
Thanks. Let's lay down the high-level API design for some of the quantized operators. A large portion of this is coming from the following relevant discussions. Thanks to @jackwish, @FrozenGene and @jnorwood for sharing their experiences with quantization, and also @shoubhik for helping design t

Re: [dmlc/tvm] [RFC] Reading quantized models from TFLite and MxNet - operators API (#3252)

2019-05-28 Thread Animesh Jain
Adding others who might be interested in this @ajtulloch @eqy @ZihengJiang @tqchen -- View it on GitHub: https://github.com/dmlc/tvm/issues/3252#issuecomment-496631850

[dmlc/tvm] [RFC] Reading quantized models from TFLite and MxNet - operators API (#3252)

2019-05-28 Thread Animesh Jain
To increase quantization support in TVM, it is necessary to support the pre-quantized models, i.e., the models that have been quantized in the framework itself (outside of Relay). In this issue, we are laying down the high-level API design for some of the quantized operators. A large portion of

Re: [dmlc/tvm] [RFC][Quantization] Support quantized models from TensorflowLite (#2351)

2019-05-14 Thread Animesh Jain
@FrozenGene I am interested in contributing to this Issue. Is it possible to share the progress? -- View it on GitHub: https://github.com/dmlc/tvm/issues/2351#issuecomment-492433018