14.2. Case Study

14.2.1. Analysis of the SSD300 Object Detection Network

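The computation in the SSD300 object detection network is concentrated in its convolutional layers. The table below gives the per-layer analysis of SSD300, including the input and output sizes, the parameter count (params), the multiply-accumulate count (fused multiply-add, MAdd), the floating-point operation count (FLOPs), and the memory read/write volume.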

| No. | module name | input shape | output shape | params | memory (MB) | MAdd | FLOPs | MemRead (B) | MemWrite (B) | duration (%) | MemR+W (B) |
|---|---|---|---|---:|---:|---:|---:|---:|---:|---:|---:|
| 0 | vgg.0 | 3 300 300 | 64 300 300 | 1792.0 | 21.97 | 311,040,000.0 | 161,280,000.0 | 1087168.0 | 23040000.0 | 9.40% | 24127168.0 |
| 1 | vgg.1 | 64 300 300 | 64 300 300 | 0.0 | 21.97 | 5,760,000.0 | 5,760,000.0 | 23040000.0 | 23040000.0 | 0.30% | 46080000.0 |
| 2 | vgg.2 | 64 300 300 | 64 300 300 | 36928.0 | 21.97 | 6,635,520,000.0 | 3,323,520,000.0 | 23187712.0 | 23040000.0 | 9.76% | 46227712.0 |
| 3 | vgg.3 | 64 300 300 | 64 300 300 | 0.0 | 21.97 | 5,760,000.0 | 5,760,000.0 | 23040000.0 | 23040000.0 | 0.26% | 46080000.0 |
| 4 | vgg.4 | 64 300 300 | 64 150 150 | 0.0 | 5.49 | 4,320,000.0 | 5,760,000.0 | 23040000.0 | 5760000.0 | 21.20% | 28800000.0 |
| 5 | vgg.5 | 64 150 150 | 128 150 150 | 73856.0 | 10.99 | 3,317,760,000.0 | 1,661,760,000.0 | 6055424.0 | 11520000.0 | 7.03% | 17575424.0 |
| 6 | vgg.6 | 128 150 150 | 128 150 150 | 0.0 | 10.99 | 2,880,000.0 | 2,880,000.0 | 11520000.0 | 11520000.0 | 0.16% | 23040000.0 |
| 7 | vgg.7 | 128 150 150 | 128 150 150 | 147584.0 | 10.99 | 6,635,520,000.0 | 3,320,640,000.0 | 12110336.0 | 11520000.0 | 6.33% | 23630336.0 |
| 8 | vgg.8 | 128 150 150 | 128 150 150 | 0.0 | 10.99 | 2,880,000.0 | 2,880,000.0 | 11520000.0 | 11520000.0 | 0.04% | 23040000.0 |
| 9 | vgg.9 | 128 150 150 | 128 75 75 | 0.0 | 2.75 | 2,160,000.0 | 2,880,000.0 | 11520000.0 | 2880000.0 | 4.44% | 14400000.0 |
| 10 | vgg.10 | 128 75 75 | 256 75 75 | 295168.0 | 5.49 | 3,317,760,000.0 | 1,660,320,000.0 | 4060672.0 | 5760000.0 | 2.32% | 9820672.0 |
| 11 | vgg.11 | 256 75 75 | 256 75 75 | 0.0 | 5.49 | 1,440,000.0 | 1,440,000.0 | 5760000.0 | 5760000.0 | 0.04% | 11520000.0 |
| 12 | vgg.12 | 256 75 75 | 256 75 75 | 590080.0 | 5.49 | 6,635,520,000.0 | 3,319,200,000.0 | 8120320.0 | 5760000.0 | 3.26% | 13880320.0 |
| 13 | vgg.13 | 256 75 75 | 256 75 75 | 0.0 | 5.49 | 1,440,000.0 | 1,440,000.0 | 5760000.0 | 5760000.0 | 0.02% | 11520000.0 |
| 14 | vgg.14 | 256 75 75 | 256 75 75 | 590080.0 | 5.49 | 6,635,520,000.0 | 3,319,200,000.0 | 8120320.0 | 5760000.0 | 3.01% | 13880320.0 |
| 15 | vgg.15 | 256 75 75 | 256 75 75 | 0.0 | 5.49 | 1,440,000.0 | 1,440,000.0 | 5760000.0 | 5760000.0 | 0.03% | 11520000.0 |
| 16 | vgg.16 | 256 75 75 | 256 38 38 | 0.0 | 1.41 | 1,108,992.0 | 1,440,000.0 | 5760000.0 | 1478656.0 | 1.74% | 7238656.0 |
| 17 | vgg.17 | 256 38 38 | 512 38 38 | 1180160.0 | 2.82 | 3,406,823,424.0 | 1,704,151,040.0 | 6199296.0 | 2957312.0 | 1.70% | 9156608.0 |
| 18 | vgg.18 | 512 38 38 | 512 38 38 | 0.0 | 2.82 | 739,328.0 | 739,328.0 | 2957312.0 | 2957312.0 | 0.02% | 5914624.0 |
| 19 | vgg.19 | 512 38 38 | 512 38 38 | 2359808.0 | 2.82 | 6,813,646,848.0 | 3,407,562,752.0 | 12396544.0 | 2957312.0 | 3.86% | 15353856.0 |
| 20 | vgg.20 | 512 38 38 | 512 38 38 | 0.0 | 2.82 | 739,328.0 | 739,328.0 | 2957312.0 | 2957312.0 | 0.02% | 5914624.0 |
| 21 | vgg.21 | 512 38 38 | 512 38 38 | 2359808.0 | 2.82 | 6,813,646,848.0 | 3,407,562,752.0 | 12396544.0 | 2957312.0 | 4.16% | 15353856.0 |
| 22 | vgg.22 | 512 38 38 | 512 38 38 | 0.0 | 2.82 | 739,328.0 | 739,328.0 | 2957312.0 | 2957312.0 | 0.02% | 5914624.0 |
| 23 | vgg.23 | 512 38 38 | 512 19 19 | 0.0 | 0.71 | 554,496.0 | 739,328.0 | 2957312.0 | 739328.0 | 0.99% | 3696640.0 |
| 24 | vgg.24 | 512 19 19 | 512 19 19 | 2359808.0 | 0.71 | 1,703,411,712.0 | 851,890,688.0 | 10178560.0 | 739328.0 | 1.98% | 10917888.0 |
| 25 | vgg.25 | 512 19 19 | 512 19 19 | 0.0 | 0.71 | 184,832.0 | 184,832.0 | 739328.0 | 739328.0 | 0.02% | 1478656.0 |
| 26 | vgg.26 | 512 19 19 | 512 19 19 | 2359808.0 | 0.71 | 1,703,411,712.0 | 851,890,688.0 | 10178560.0 | 739328.0 | 1.03% | 10917888.0 |
| 27 | vgg.27 | 512 19 19 | 512 19 19 | 0.0 | 0.71 | 184,832.0 | 184,832.0 | 739328.0 | 739328.0 | 0.02% | 1478656.0 |
| 28 | vgg.28 | 512 19 19 | 512 19 19 | 2359808.0 | 0.71 | 1,703,411,712.0 | 851,890,688.0 | 10178560.0 | 739328.0 | 1.00% | 10917888.0 |
| 29 | vgg.29 | 512 19 19 | 512 19 19 | 0.0 | 0.71 | 184,832.0 | 184,832.0 | 739328.0 | 739328.0 | 0.02% | 1478656.0 |
| 30 | vgg.30 | 512 19 19 | 512 19 19 | 0.0 | 0.71 | 1,478,656.0 | 184,832.0 | 739328.0 | 739328.0 | 1.40% | 1478656.0 |
| 31 | vgg.31 | 512 19 19 | 1024 19 19 | 4719616.0 | 1.41 | 3,406,823,424.0 | 1,703,781,376.0 | 19617792.0 | 1478656.0 | 6.17% | 21096448.0 |
| 32 | vgg.32 | 1024 19 19 | 1024 19 19 | 0.0 | 1.41 | 369,664.0 | 369,664.0 | 1478656.0 | 1478656.0 | 0.04% | 2957312.0 |
| 33 | vgg.33 | 1024 19 19 | 1024 19 19 | 1049600.0 | 1.41 | 757,071,872.0 | 378,905,600.0 | 5677056.0 | 1478656.0 | 2.00% | 7155712.0 |
| 34 | vgg.34 | 1024 19 19 | 1024 19 19 | 0.0 | 1.41 | 369,664.0 | 369,664.0 | 1478656.0 | 1478656.0 | 0.02% | 2957312.0 |
| 35 | L2Norm | 512 38 38 | 512 38 38 | 512.0 | 2.82 | 0.0 | 0.0 | 0.0 | 0.0 | 0.22% | 0.0 |
| 36 | extras.0 | 1024 19 19 | 256 19 19 | 262400.0 | 0.35 | 189,267,968.0 | 94,726,400.0 | 2528256.0 | 369664.0 | 0.28% | 2897920.0 |
| 37 | extras.1 | 256 19 19 | 512 10 10 | 1180160.0 | 0.20 | 235,929,600.0 | 118,016,000.0 | 5090304.0 | 204800.0 | 0.37% | 5295104.0 |
| 38 | extras.2 | 512 10 10 | 128 10 10 | 65664.0 | 0.05 | 13,107,200.0 | 6,566,400.0 | 467456.0 | 51200.0 | 0.18% | 518656.0 |
| 39 | extras.3 | 128 10 10 | 256 5 5 | 295168.0 | 0.02 | 14,745,600.0 | 7,379,200.0 | 1231872.0 | 25600.0 | 0.39% | 1257472.0 |
| 40 | extras.4 | 256 5 5 | 128 5 5 | 32896.0 | 0.01 | 1,638,400.0 | 822,400.0 | 157184.0 | 12800.0 | 0.18% | 169984.0 |
| 41 | extras.5 | 128 5 5 | 256 3 3 | 295168.0 | 0.01 | 5,308,416.0 | 2,656,512.0 | 1193472.0 | 9216.0 | 1.05% | 1202688.0 |
| 42 | extras.6 | 256 3 3 | 128 3 3 | 32896.0 | 0.00 | 589,824.0 | 296,064.0 | 140800.0 | 4608.0 | 0.13% | 145408.0 |
| 43 | extras.7 | 128 3 3 | 256 1 1 | 295168.0 | 0.00 | 589,824.0 | 295,168.0 | 1185280.0 | 1024.0 | 0.12% | 1186304.0 |
| 44 | loc.0 | 512 38 38 | 16 38 38 | 73744.0 | 0.09 | 212,926,464.0 | 106,486,336.0 | 3252288.0 | 92416.0 | 0.31% | 3344704.0 |
| 45 | loc.1 | 1024 19 19 | 24 19 19 | 221208.0 | 0.03 | 159,694,848.0 | 79,856,088.0 | 2363488.0 | 34656.0 | 0.23% | 2398144.0 |
| 46 | loc.2 | 512 10 10 | 24 10 10 | 110616.0 | 0.01 | 22,118,400.0 | 11,061,600.0 | 647264.0 | 9600.0 | 0.51% | 656864.0 |
| 47 | loc.3 | 256 5 5 | 24 5 5 | 55320.0 | 0.00 | 2,764,800.0 | 1,383,000.0 | 246880.0 | 2400.0 | 0.13% | 249280.0 |
| 48 | loc.4 | 256 3 3 | 16 3 3 | 36880.0 | 0.00 | 663,552.0 | 331,920.0 | 156736.0 | 576.0 | 0.11% | 157312.0 |
| 49 | loc.5 | 256 1 1 | 16 1 1 | 36880.0 | 0.00 | 73,728.0 | 36,880.0 | 148544.0 | 64.0 | 0.09% | 148608.0 |
| 50 | conf.0 | 512 38 38 | 84 38 38 | 387156.0 | 0.46 | 1,117,863,936.0 | 559,053,264.0 | 4505936.0 | 485184.0 | 0.70% | 4991120.0 |
| 51 | conf.1 | 1024 19 19 | 126 19 19 | 1161342.0 | 0.17 | 838,397,952.0 | 419,244,462.0 | 6124024.0 | 181944.0 | 0.57% | 6305968.0 |
| 52 | conf.2 | 512 10 10 | 126 10 10 | 580734.0 | 0.05 | 116,121,600.0 | 58,073,400.0 | 2527736.0 | 50400.0 | 0.23% | 2578136.0 |
| 53 | conf.3 | 256 5 5 | 126 5 5 | 290430.0 | 0.01 | 14,515,200.0 | 7,260,750.0 | 1187320.0 | 12600.0 | 0.14% | 1199920.0 |
| 54 | conf.4 | 256 3 3 | 84 3 3 | 193620.0 | 0.00 | 3,483,648.0 | 1,742,580.0 | 783696.0 | 3024.0 | 0.11% | 786720.0 |
| 55 | conf.5 | 256 1 1 | 84 1 1 | 193620.0 | 0.00 | 387,072.0 | 193,620.0 | 775504.0 | 336.0 | 0.09% | 775840.0 |
| 56 | softmax | 8732 21 | 8732 21 | 0.0 | 0.70 | 550,115.0 | 0.0 | 0.0 | 0.0 | 0.06% | 0.0 |
| total | | | | 26285486.0 | 207.65 | 62,782,359,651.0 | 31,435,153,596.0 | 0.0 | 0.0 | 100.00% | 542786664.0 |
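The convolution rows of the per-layer table can be reproduced by hand. The following is a minimal sketch, assuming a counting convention inferred from the rows themselves, since the source does not state the formulas: FLOPs counts one operation per kernel multiply plus one for the bias, MAdd counts multiplies and additions separately, and every value is stored as a 4-byte float. The helper `conv2d_cost` and its parameter names are hypothetical and exist only for this illustration.

```python
from typing import NamedTuple


class ConvCost(NamedTuple):
    params: int
    memory_mb: float
    madd: int
    flops: int
    mem_read: int
    mem_write: int


def conv2d_cost(c_in, h_in, w_in, c_out, h_out, w_out, k, bytes_per_elem=4):
    """Cost of one k x k convolution with bias (hypothetical helper).

    Assumed convention, inferred from the table rows:
      FLOPs = (kernel multiplies + 1 bias op) per output element
      MAdd  = (kernel multiplies + kernel additions + 1 bias op) per output element
    """
    taps = c_in * k * k                      # multiplies per output element
    out_elems = c_out * h_out * w_out        # number of output values
    params = c_out * (taps + 1)              # weights plus one bias per filter
    flops = (taps + 1) * out_elems
    madd = (taps + (taps - 1) + 1) * out_elems
    # Reads: the input feature map plus all parameters; writes: the output map.
    mem_read = (c_in * h_in * w_in + params) * bytes_per_elem
    mem_write = out_elems * bytes_per_elem
    memory_mb = mem_write / 1024**2          # output feature map size in MB
    return ConvCost(params, memory_mb, madd, flops, mem_read, mem_write)


# Row 0 (vgg.0): 3 x 300 x 300 -> 64 x 300 x 300, 3 x 3 kernel.
vgg0 = conv2d_cost(3, 300, 300, 64, 300, 300, k=3)
# Row 37 (extras.1): 256 x 19 x 19 -> 512 x 10 x 10, 3 x 3 kernel.
extras1 = conv2d_cost(256, 19, 19, 512, 10, 10, k=3)

for name, cost in (("vgg.0", vgg0), ("extras.1", extras1)):
    print(name, cost)
```

Under these assumptions the sketch matches the table entries for `vgg.0` and `extras.1`, which also explains why MAdd is roughly twice FLOPs in every convolution row. The pooling, normalization, and softmax rows follow different rules and are not covered here.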

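The table below summarizes the total cost of the SSD300 network: it has 26,285,486 parameters, a memory footprint of \(207.65 {\rm MB}\), a total of \(62.78 {\rm G}\) multiply-accumulate operations, and a total of \(31.44 {\rm GFLOPs}\) floating-point operations.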

| Total params | Total memory | Total MAdd | Total FLOPs | Total MemR+W |
|---:|---:|---:|---:|---:|
| 26,285,486 | 207.65MB | 62.78GMAdd | 31.44GFlops | 517.64MB |
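The summary figures follow from the `total` row of the per-layer table by unit conversion. The sketch below shows that arithmetic, assuming (this is inferred from the numbers, not stated in the source) that the operation counts use a decimal giga prefix (\(10^9\)) while the memory traffic uses binary megabytes (\(1024^2\) bytes). The dictionary keys and the `summarize` helper are hypothetical names.

```python
# Raw totals copied from the "total" row of the per-layer table.
totals = {
    "params": 26_285_486,
    "madd": 62_782_359_651,
    "flops": 31_435_153_596,
    "mem_rw_bytes": 542_786_664,
}


def summarize(t):
    """Format raw totals in the units used by the summary table."""
    return {
        "Total params": f"{t['params']:,}",
        # Assumption: operation counts are scaled by 1e9 (decimal giga).
        "Total MAdd": f"{t['madd'] / 1e9:.2f}GMAdd",
        "Total FLOPs": f"{t['flops'] / 1e9:.2f}GFlops",
        # Assumption: memory traffic is scaled by 1024**2 (binary megabytes).
        "Total MemR+W": f"{t['mem_rw_bytes'] / 1024**2:.2f}MB",
    }


summary = summarize(totals)
for key, value in summary.items():
    print(f"{key}: {value}")
```

With these conversions the formatted values agree with the summary table; dividing the byte count by \(10^6\) instead would give about 542.79 MB, which is why the binary interpretation is assumed.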