SSD-MobileNet V2 Trained on MS-COCO Data
Detect and localize objects in an image

Resource retrieval

Get the pre-trained net:
In[]:=
NetModel["SSD-MobileNet V2 Trained on MS-COCO Data"]
Out[]=

Evaluation function

Define the label list for this model. Integers in the model's output correspond to elements in the label list:
In[]:=
labels={"person","bicycle","car","motorcycle","airplane","bus","train","truck","boat","traffic light","fire hydrant","stop sign","parking meter","bench","bird","cat","dog","horse","sheep","cow","elephant","bear","zebra","giraffe","backpack","umbrella","handbag","tie","suitcase","frisbee","skis","snowboard","sports ball","kite","baseball bat","baseball glove","skateboard","surfboard","tennis racket","bottle","wine glass","cup","fork","knife","spoon","bowl","banana","apple","sandwich","orange","broccoli","carrot","hot dog","pizza","donut","cake","chair","couch","potted plant","bed","dining table","toilet","tv","laptop","mouse","remote","keyboard","cell phone","microwave","oven","toaster","sink","refrigerator","book","clock","vase","scissors","teddy bear","hair drier","toothbrush"};
Write an evaluation function to scale the result to the input image size and suppress the least probable detections:
In[]:=
nonMaxSuppression[overlapThreshold_][detection_] := Module[{boxes, confidence},
  Fold[
    Function[{list, new},
      If[NoneTrue[list[[All, 1]], iou[#, new[[1]]] > overlapThreshold &],
        Append[list, new],
        list
      ]
    ],
    Sequence @@ TakeDrop[Reverse@SortBy[detection, Last], 1]
  ]
]
iou := iou = With[{c = Compile[{{box1, _Real, 2}, {box2, _Real, 2}},
      Module[{area1, area2, x1, y1, x2, y2, w, h, int},
        area1 = (box1[[2, 1]] - box1[[1, 1]]) (box1[[2, 2]] - box1[[1, 2]]);
        area2 = (box2[[2, 1]] - box2[[1, 1]]) (box2[[2, 2]] - box2[[1, 2]]);
        x1 = Max[box1[[1, 1]], box2[[1, 1]]];
        y1 = Max[box1[[1, 2]], box2[[1, 2]]];
        x2 = Min[box1[[2, 1]], box2[[2, 1]]];
        y2 = Min[box1[[2, 2]], box2[[2, 2]]];
        w = Max[0., x2 - x1];
        h = Max[0., y2 - y1];
        int = w*h;
        int/(area1 + area2 - int)
      ],
      RuntimeAttributes -> {Listable}, Parallelization -> True, RuntimeOptions -> "Speed"
    ]},
  c @@ Replace[{##}, Rectangle -> List, Infinity, Heads -> True] &
]
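The logic of the two definitions above is language-neutral: compute intersection over union (IoU) of two axis-aligned boxes, then greedily keep the highest-confidence detection and drop any later detection of the same class whose IoU with an already-kept box exceeds the threshold. As a rough sketch, the same arithmetic in Python (all names are illustrative, not part of the model):

```python
def iou(box1, box2):
    # Each box is ((x1, y1), (x2, y2)); returns intersection area / union area.
    (ax1, ay1), (ax2, ay2) = box1
    (bx1, by1), (bx2, by2) = box2
    w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = w * h
    area1 = (ax2 - ax1) * (ay2 - ay1)
    area2 = (bx2 - bx1) * (by2 - by1)
    return inter / (area1 + area2 - inter)

def non_max_suppression(detections, overlap_threshold=0.45):
    # detections: list of (box, label, confidence).
    # Greedy NMS, highest confidence first; suppression only within a class,
    # mirroring the per-class grouping done by GatherBy in netevaluate.
    kept = []
    for box, label, conf in sorted(detections, key=lambda d: d[2], reverse=True):
        if all(k[1] != label or iou(box, k[0]) <= overlap_threshold for k in kept):
            kept.append((box, label, conf))
    return kept
```

Two heavily overlapping "cat" boxes collapse to the higher-confidence one, while an overlapping "dog" box survives because suppression is per class.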
In[]:=
netevaluate[img_Image, detectionThreshold_ : .5, overlapThreshold_ : .45] := Module[
  {netOutputDecoder, net, decoded},
  netOutputDecoder[imageDims_, threshold_ : .5][netOutput_] := Module[
    {detections = Position[netOutput["ClassProb"], x_ /; x > threshold]},
    If[Length[detections] > 0,
      Transpose[{
        Rectangle @@@ Round@Transpose[
          Transpose[Extract[netOutput["Boxes"], detections[[All, 1 ;; 1]]], {2, 3, 1}]*imageDims/{300, 300},
          {3, 1, 2}],
        Extract[labels, detections[[All, 2 ;; 2]]],
        Extract[netOutput["ClassProb"], detections]
      }],
      {}
    ]
  ];
  net = NetModel["SSD-MobileNet V2 Trained on MS-COCO Data"];
  decoded = netOutputDecoder[ImageDimensions[img], detectionThreshold]@net[img];
  Flatten[Map[nonMaxSuppression[overlapThreshold], GatherBy[decoded, #[[2]] &]], 1]
]
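The decoder inside netevaluate does two things: it keeps every (box, class) pair whose probability exceeds the threshold, and it rescales box coordinates from the net's 300×300 input frame to the original image dimensions. A minimal Python sketch of that decoding step, with illustrative names:

```python
def decode_detections(boxes, class_probs, labels, image_dims, threshold=0.5):
    # boxes: list of ((x1, y1), (x2, y2)) in the net's 300x300 coordinate frame.
    # class_probs: for each box, a list of per-class probabilities.
    # Returns (scaled_box, label, confidence) for every score above threshold.
    sx, sy = image_dims[0] / 300.0, image_dims[1] / 300.0
    out = []
    for box, probs in zip(boxes, class_probs):
        for class_idx, p in enumerate(probs):
            if p > threshold:
                (x1, y1), (x2, y2) = box
                scaled = ((round(x1 * sx), round(y1 * sy)),
                          (round(x2 * sx), round(y2 * sy)))
                out.append((scaled, labels[class_idx], p))
    return out
```

Note that one box can yield several detections if more than one class clears the threshold; the subsequent per-class non-maximum suppression prunes the survivors.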

Basic usage

Obtain the detected bounding boxes with their corresponding classes and confidences for a given image:
In[]:=
testImage=
;
In[]:=
detection=netevaluate[testImage,.5]
Out[]=
{{Rectangle[{-1,91},{225,235}],cat,0.915414},{Rectangle[{229,9},{401,223}],dog,0.817975}}
Inspect which classes are detected:
In[]:=
classes=DeleteDuplicates@detection[[All,2]]
Out[]=
{cat,dog}
Visualize the detection:
In[]:=
HighlightImage[testImage,
 MapThread[
  {White,
    Inset[
     Style[#2, Black, FontSize -> Scaled[1/20], Background -> GrayLevel[1, .6]],
     Last[#1], {Right, Top}],
    #1} &,
  Transpose@detection]]
Out[]=

Network result

The network computes 1,917 bounding boxes and the probability that the objects in each box are of any given class:
In[]:=
res=NetModel["SSD-MobileNet V2 Trained on MS-COCO Data"][testImage]
Out[]=
Boxes{{{-0.352299,282.234},{19.2081,300.759}},{{-19.6433,275.772},{49.3239,304.458}},
⋯1914⋯
,{{-3.65158,18.8586},{278.613,285.51}}},ClassProb{
⋯1⋯
}
Visualize all the boxes predicted by the net scaled by their “objectness” measures:
In[]:=
rectangles=Rectangle@@@res["Boxes"];
In[]:=
Graphics[
 MapThread[{EdgeForm[Opacity[#1 + .01]], #2} &, {Total[res["ClassProb"], {2}], rectangles}],
 BaseStyle -> {FaceForm[], EdgeForm[{Thin, Black}]}
]
Out[]=
Visualize all the boxes scaled by the probability that they contain a cat:
In[]:=
idx=Position[labels,"cat"][[1,1]]
Out[]=
16
In[]:=
Graphics[
 MapThread[{EdgeForm[Opacity[#1 + .01]], #2} &, {res["ClassProb"][[All, idx]], rectangles}],
 BaseStyle -> {FaceForm[], EdgeForm[{Thin, Black}]}
]
Superimpose the cat prediction on top of the scaled input received by the net:

Advanced visualization

Write a function to apply a custom styling to the result of the detection:
Visualize multiple objects, using a different color for each class:

Net information

Inspect the number of parameters of all arrays in the net:
Obtain the total number of parameters:
Obtain the layer type counts:
Display the summary graphic:

Export to MXNet

Get the size of the parameter file:
The size is similar to the byte count of the resource object: