[1] Girshick R. Fast R-CNN[C]//Proceedings of the IEEE International Conference on Computer Vision, 2015: 1440-1448.
[2] Girshick R, Donahue J, Darrell T, et al. Rich feature hierarchies for accurate object detection and semantic segmentation[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014: 580-587.
[3] He K, Zhang X, Ren S, et al. Spatial pyramid pooling in deep convolutional networks for visual recognition[C]//European Conference on Computer Vision. Springer International Publishing, 2014: 346-361.
[4] PASCAL VOC. http://host.robots.ox.ac.uk/pascal/VOC/.
[5] Theeuwes J. Stimulus-driven capture and attentional set: selective search for color and visual abrupt onsets[J]. Journal of Experimental Psychology: Human Perception and Performance, 1994, 20(4): 799.
[6] Caffe. http://caffe.berkeleyvision.org/.
[7] Dollár P, Zitnick C L. Structured forests for fast edge detection[C]//Proceedings of the IEEE International Conference on Computer Vision, 2013: 1841-1848.
[8] http://www.cnblogs.com/louyihang-loves-baiyan/p/4903231.html.
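[1] mAP (mean Average Precision) is a standard metric for evaluating information retrieval systems. It is also the standard accuracy metric for object detection on benchmarks such as PASCAL VOC.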
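The footnote does not say how AP itself is computed. Below is a minimal Python sketch, assuming the plain (non-interpolated) information-retrieval definition: AP is the mean of precision@k over the ranks k at which a relevant item appears, divided by the total number of relevant items, and mAP is the mean of AP over queries (or, in detection, over object classes). The function names `average_precision` and `mean_average_precision` are hypothetical. PASCAL VOC uses interpolated variants (11-point interpolation in VOC2007), so values from this sketch will not exactly match the official evaluation code.

```python
from typing import Optional, Sequence


def average_precision(ranked_relevance: Sequence[bool],
                      num_relevant: Optional[int] = None) -> float:
    """Non-interpolated AP for one ranked list (illustrative sketch).

    ranked_relevance[k] is True if the (k+1)-th returned item is relevant.
    num_relevant is the total number of relevant items; it defaults to the
    number found in the list. In detection it would be the number of
    ground-truth objects, so missed objects lower the AP.
    """
    if num_relevant is None:
        num_relevant = sum(ranked_relevance)
    if num_relevant == 0:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for k, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            precision_sum += hits / k  # precision@k at each relevant rank
    return precision_sum / num_relevant


def mean_average_precision(runs: Sequence[Sequence[bool]]) -> float:
    """mAP: the mean of per-query (or per-class) AP values."""
    if not runs:
        return 0.0
    return sum(average_precision(r) for r in runs) / len(runs)


if __name__ == "__main__":
    # Query 1: relevant at ranks 1 and 3 -> AP = (1/1 + 2/3) / 2
    # Query 2: relevant at rank 2        -> AP = (1/2) / 1
    runs = [[True, False, True, False], [False, True]]
    print(round(average_precision(runs[0]), 4))   # 0.8333
    print(round(mean_average_precision(runs), 4))  # 0.6667
```

For detection, "relevant" would mean a detection that matches a not-yet-matched ground-truth box (IoU of at least 0.5 in PASCAL VOC), with detections sorted by descending confidence before building the list.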