Thanks @tqchen 

Of the two choices, I am leaning towards `relay.op.qnn`. My hope is that 
different frameworks will converge on the same `qnn` ops. `relay.op.tflite` seems 
too framework-specific as of now. I agree that these new ops should have a special 
op_level.

I am still unclear about where to draw the boundary between directly translating 
to lower ops and creating a new `qnn` op. For example, if we are targeting 
devices that do not have any FP32 compute units, we might have to create a long 
sequence of existing Relay ops to approximate the FP32 computation with 
fixed-point/integer computation. In such cases, encapsulating the sequence in a 
single op would be a good idea.
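
To make this concrete, here is a rough sketch in plain Python (not TVM code) of one common way to replace an FP32 scale multiplication with integer-only arithmetic. The function names `quantize_multiplier` and `fixed_point_multiply` are hypothetical, and the Q31 mantissa plus right shift scheme is only one possible choice, assumed here for illustration. Even this small piece expands into a multiply, an add, and a shift, which is the kind of sequence I would want to hide behind a single op.

```python
import math


def quantize_multiplier(real_multiplier):
    """Split a real multiplier in (0, 1) into (Q31 integer mantissa, right shift).

    Hypothetical helper for illustration; not part of any TVM API.
    """
    assert 0.0 < real_multiplier < 1.0
    # real_multiplier == mantissa * 2**exponent, with mantissa in [0.5, 1)
    mantissa, exponent = math.frexp(real_multiplier)
    fixed = round(mantissa * (1 << 31))
    if fixed == (1 << 31):
        # Rounding pushed the mantissa up to 1.0; renormalize.
        fixed //= 2
        exponent += 1
    return fixed, -exponent


def fixed_point_multiply(value, fixed, right_shift):
    """Approximate value * real_multiplier using only integer ops.

    On a real device the product would need a 64-bit accumulator.
    Ties are rounded towards positive infinity.
    """
    total_shift = 31 + right_shift
    rounding = 1 << (total_shift - 1)
    return (value * fixed + rounding) >> total_shift


if __name__ == "__main__":
    fixed, shift = quantize_multiplier(0.0003)
    # Integer-only result next to the floating-point reference.
    print(fixed_point_multiply(12345, fixed, shift), 12345 * 0.0003)
```

If each framework frontend emitted this sequence directly, the converted graph would contain several low-level ops for every scale multiplication, which is the readability concern I have.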

Basically, we need some kind of abstraction that can be shared across 
frameworks for these framework operations. For now, I was treating this 
abstraction as a new `qnn` Relay op. The rationale behind this choice is that 
once we convert from the framework to a Relay graph, we can eyeball the graph 
and make sense of it. Directly translating to lower ops would lose the 
readability of the quantized Relay graph.

However, given the tradeoffs, we could just as well create a new class that can be 
shared across frameworks. What are your thoughts on this?
