Prior to GNA 3.0, only 1D convolutions are natively supported by the hardware; 2D convolutions have specific limitations (see the tables below). The Intel® GNA 1.0 and 2.0 hardware natively supports only 1D convolutions, although a 2D convolution can be mapped to 1D when its kernel moves in a single direction. The number of output channels for convolutions must be a multiple of 4, and the maximum number of filters is 65532 for GNA 2.0.

Transpose layer support is limited to cases where no data reordering is needed, or where reordering happens across two dimensions, at least one of which is not greater than 8. For Multiply, Add, and Subtract layers, auto broadcasting is supported only for constant inputs. Splits and concatenations are supported for contiguous portions of memory (e.g., a split of 1,2,3,4 into 1,1,3,4 and 1,1,3,4, or a concat of 1,2,3,4 and 1,2,3,4 into 2,2,3,4).

Initially, a limited subset of Intel® GNA 3.0 features is added on top of the previous feature set, including 2D VALID convolution with small 2D kernels: two-dimensional convolutions with a fixed set of supported kernel dimensions. Input tensor dimensions are bounded, up to 384 input channels (Ci) may be used with a subset of kernel sizes, up to 256 kernels (output channels) are supported, and pooling is limited to a small set of pool shapes. Not all combinations of kernel shape and input tensor shape are supported: the exact limitation on the input tensor width W depends on the number of input channels Ci and the kernel shape, and the limits depend strongly on both. There is much more freedom in the choice of input tensor height and number of output channels. The tables below give a more explicit listing of the 2D convolution operations initially supported by Intel® GNA 3.0.
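As a rough illustration of the output-channel constraints quoted above (up to 256 kernels, and a count that is a multiple of 4), a validity check for GNA 3.0 might look like the sketch below. The function name is my own for illustration, not part of the plugin API:

```python
def check_gna3_conv_out_channels(c_out: int) -> bool:
    # GNA 3.0 limits quoted in the text above: up to 256 kernels
    # (output channels), and the count must be a multiple of 4.
    return 1 <= c_out <= 256 and c_out % 4 == 0
```

A model exporter could use such a check to decide whether to pad the output-channel dimension up to the next multiple of 4 before targeting the device.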
Intel® GNA essentially operates in low-precision mode, which represents a mix of 8-bit (i8), 16-bit (i16), and 32-bit (i32) integer computations. GNA plugin users are therefore encouraged to use the Post-Training Optimization Tool (POT) to get a model with quantization hints based on statistics for the provided dataset. Unlike other plugins supporting low-precision execution, the GNA plugin can calculate quantization factors at model loading time, so a model can be run without calibration. However, this mode may not provide satisfactory accuracy, because the internal quantization algorithm is based on heuristics whose efficiency depends on the model and the dynamic range of the input data; this mode is going to be deprecated soon. The GNA plugin supports the i16 and i8 quantized data types as the inference precision of internal primitives; the Hello Query Device C++ Sample can be used to print out the supported data types for all detected devices. Due to the specifics of the hardware architecture, Intel® GNA supports a limited set of operations (including their kinds and combinations). For a POT-quantized model, the ov::hint::inference_precision property has no effect, except in the cases described in Support for 2D Convolutions using POT. The POT API usage sample for GNA demonstrates how a model can be quantized for GNA using the POT API in two modes. The GNA Plugin should not be expected to run computer vision models, because the plugin does not fully support 2D convolutions; the exception is models specifically adapted for the GNA Plugin.
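The idea of deriving a quantization factor from the dynamic range of the data, as described above, can be sketched in a few lines of plain Python. This is an illustrative symmetric-quantization toy under my own assumptions (peak-based scaling, round-to-nearest), not the plugin's actual heuristic, and `quantize_symmetric` is a made-up name:

```python
def quantize_symmetric(values, bits=16):
    # Toy symmetric linear quantization: pick a scale factor from the
    # dynamic range (peak magnitude) of `values`, then map each float
    # to a signed integer, mimicking the i8/i16 internal precision
    # mentioned in the text. Illustrative only.
    qmax = 2 ** (bits - 1) - 1            # 32767 for i16, 127 for i8
    peak = max(abs(v) for v in values) or 1.0
    scale = qmax / peak
    return [int(round(v * scale)) for v in values], scale
```

The accuracy caveat in the text follows directly from this picture: a single peak-derived scale factor for a whole tensor can waste precision when the dynamic range is dominated by a few outliers, which is why calibration on a representative dataset (as POT does) tends to give better results.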