SSD-MobileNet V2 Trained on MS-COCO Data
Detect and localize objects in an image
Resource retrieval
Get the pre-trained net:
In[]:=
NetModel["SSD-MobileNet V2 Trained on MS-COCO Data"]
Out[]=
Evaluation function
Define the label list for this model. Integers in the model's output correspond to elements in the label list:
In[]:=
labels={"person","bicycle","car","motorcycle","airplane","bus","train","truck","boat","traffic light","fire hydrant","stop sign","parking meter","bench","bird","cat","dog","horse","sheep","cow","elephant","bear","zebra","giraffe","backpack","umbrella","handbag","tie","suitcase","frisbee","skis","snowboard","sports ball","kite","baseball bat","baseball glove","skateboard","surfboard","tennis racket","bottle","wine glass","cup","fork","knife","spoon","bowl","banana","apple","sandwich","orange","broccoli","carrot","hot dog","pizza","donut","cake","chair","couch","potted plant","bed","dining table","toilet","tv","laptop","mouse","remote","keyboard","cell phone","microwave","oven","toaster","sink","refrigerator","book","clock","vase","scissors","teddy bear","hair drier","toothbrush"};
Write an evaluation function to scale the result to the input image size and suppress the least probable detections:
In[]:=
nonMaxSuppression[overlapThreshold_][detection_]:=Module[{boxes,confidence},Fold[Function[{list,new},If[NoneTrue[list[[All,1]],iou[#,new[[1]]]>overlapThreshold&],Append[list,new],list]],Sequence@@TakeDrop[Reverse@SortBy[detection,Last],1]]]
iou:=iou=With[{c=Compile[{{box1,_Real,2},{box2,_Real,2}},Module[{area1,area2,x1,y1,x2,y2,w,h,int},area1=(box1[[2,1]]-box1[[1,1]])(box1[[2,2]]-box1[[1,2]]);area2=(box2[[2,1]]-box2[[1,1]])(box2[[2,2]]-box2[[1,2]]);x1=Max[box1[[1,1]],box2[[1,1]]];y1=Max[box1[[1,2]],box2[[1,2]]];x2=Min[box1[[2,1]],box2[[2,1]]];y2=Min[box1[[2,2]],box2[[2,2]]];w=Max[0.,x2-x1];h=Max[0.,y2-y1];int=w*h;int/(area1+area2-int)],RuntimeAttributes->{Listable},Parallelization->True,RuntimeOptions->"Speed"]},c@@Replace[{##},Rectangle->List,Infinity,Heads->True]&]
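As a quick sanity check of the overlap measure used by nonMaxSuppression, the following sketch evaluates iou on two hand-picked 2×2 squares. The boxes are illustrative values, not model output. They share a 1×1 region, so the intersection-over-union should be 1/(4+4-1) = 1/7:

```wolfram
(* two illustrative 2x2 boxes that overlap in a 1x1 square *)
iou[Rectangle[{0., 0.}, {2., 2.}], Rectangle[{1., 1.}, {3., 3.}]]
(* intersection = 1, union = 4 + 4 - 1 = 7, so the result is 1/7, about 0.142857 *)
```

With the default overlapThreshold of .45 used in the evaluation function, two such boxes would both be kept, since their overlap is well below the threshold.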
In[]:=
netevaluate[img_Image,detectionThreshold_:.5,overlapThreshold_:.45]:=Module[{netOutputDecoder,net,decoded},netOutputDecoder[imageDims_,threshold_:.5][netOutput_]:=Module[{detections=Position[netOutput["ClassProb"],x_/;x>threshold]},If[Length[detections]>0,Transpose[{Rectangle@@@Round@Transpose[Transpose[Extract[netOutput["Boxes"],detections[[All,1;;1]]],{2,3,1}]*imageDims/{300,300},{3,1,2}],Extract[labels,detections[[All,2;;2]]],Extract[netOutput["ClassProb"],detections]}],{}]];net=NetModel["SSD-MobileNet V2 Trained on MS-COCO Data"];decoded=netOutputDecoder[ImageDimensions[img],detectionThreshold]@net[img];Flatten[Map[nonMaxSuppression[overlapThreshold],GatherBy[decoded,#[[2]]&]],1]]
Basic usage
Obtain the detected bounding boxes with their corresponding classes and confidences for a given image:
In[]:=
testImage=
;
In[]:=
detection=netevaluate[testImage,.5]
Out[]=
{{Rectangle[{-1,91},{225,235}],cat,0.915414},{Rectangle[{229,9},{401,223}],dog,0.817975}}
Inspect which classes are detected:
In[]:=
classes=DeleteDuplicates@detection[[All,2]]
Out[]=
{cat,dog}
Visualize the detection:
In[]:=
HighlightImage[testImage,MapThread[{White,Inset[Style[#2,Black,FontSize->Scaled[1/20],Background->GrayLevel[1,.6]],Last[#1],{Right,Top}],#1}&,Transpose@detection]]
Out[]=
Network result
The network computes 1,917 bounding boxes and the probability that the objects in each box are of any given class:
In[]:=
res=NetModel["SSD-MobileNet V2 Trained on MS-COCO Data"][testImage]
Out[]=
Visualize all the boxes predicted by the net scaled by their “objectness” measures:
In[]:=
rectangles=Rectangle@@@res["Boxes"];
In[]:=
Graphics[MapThread[{EdgeForm[Opacity[#1+.01]],#2}&,{Total[res["ClassProb"],{2}],rectangles}],BaseStyle->{FaceForm[],EdgeForm[{Thin,Black}]}]
Out[]=
Visualize all the boxes scaled by the probability that they contain a cat:
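The input cell for this step is missing from this copy. A minimal sketch, assuming the columns of res["ClassProb"] follow the order of the labels list (as the evaluation function above also assumes) and reusing res and rectangles from the previous cells; the name idx is introduced here for illustration:

```wolfram
(* position of "cat" in the labels list *)
idx = Position[labels, "cat"][[1, 1]];
(* draw every box with edge opacity given by its cat probability *)
Graphics[
 MapThread[{EdgeForm[Opacity[#1 + .01]], #2} &, {res["ClassProb"][[All, idx]], rectangles}],
 BaseStyle -> {FaceForm[], EdgeForm[{Thin, Black}]}]
```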
Superimpose the cat prediction on top of the scaled input received by the net:
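The input cell for this step is also missing. One possible sketch, assuming the net works on a 300×300 input (the size the evaluation function rescales from) and that box coordinates are expressed in that frame. It overlays the boxes with Show rather than reproducing the original cell, and idx is an illustrative name:

```wolfram
idx = Position[labels, "cat"][[1, 1]];
(* resize the image to the assumed 300x300 net input, then overlay the cat-weighted boxes *)
Show[
 ImageResize[testImage, {300, 300}],
 Graphics[{FaceForm[],
   MapThread[{EdgeForm[{Thin, Red, Opacity[#1 + .01]}], #2} &,
    {res["ClassProb"][[All, idx]], rectangles}]}]]
```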
Advanced visualization
Write a function to apply a custom styling to the result of the detection:
Visualize multiple objects, using a different color for each class:
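The input cells for these two steps did not survive in this copy. The sketch below shows one way they might look; styleDetection is a hypothetical helper name, and the label offset and font size are arbitrary choices. It groups the detections by class, assigns each class a random color, and labels each box near its lower-left corner:

```wolfram
(* hypothetical helper: one random color per class, a text label per box *)
styleDetection[detection_] := Map[
  Function[group,
   {RandomColor[],
    Map[Function[d, {d[[1]], Text[Style[d[[2]], White, 12], d[[1, 1]] + {20, 20}]}], group]}],
  GatherBy[detection, #[[2]] &]]

(* apply the styling to the detections for the test image *)
HighlightImage[testImage, styleDetection[netevaluate[testImage]]]
```

Because the colors come from RandomColor, they change on every evaluation; a fixed association from class name to color would make the output reproducible.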
Net information
Inspect the number of parameters of all arrays in the net:
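The input cell is missing here. A likely form, assuming the NetInformation property "ArraysElementCounts" is available in the installed version:

```wolfram
(* element count of every array in the net, keyed by its position in the net *)
NetInformation[NetModel["SSD-MobileNet V2 Trained on MS-COCO Data"], "ArraysElementCounts"]
```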
Obtain the total number of parameters:
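The input cell is missing here. A likely form, assuming the "ArraysTotalElementCount" property:

```wolfram
(* sum of the element counts over all arrays *)
NetInformation[NetModel["SSD-MobileNet V2 Trained on MS-COCO Data"], "ArraysTotalElementCount"]
```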
Obtain the layer type counts:
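The input cell is missing here. A likely form, assuming the "LayerTypeCounts" property:

```wolfram
(* number of layers of each type, e.g. convolution or batch normalization layers *)
NetInformation[NetModel["SSD-MobileNet V2 Trained on MS-COCO Data"], "LayerTypeCounts"]
```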
Display the summary graphic:
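The input cell is missing here. A likely form, assuming the "SummaryGraphic" property:

```wolfram
(* compact diagram of the net's overall structure *)
NetInformation[NetModel["SSD-MobileNet V2 Trained on MS-COCO Data"], "SummaryGraphic"]
```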
Export to MXNet
Get the size of the parameter file:
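The export cells are missing from this copy. A sketch of the usual workflow, assuming that exporting with the "MXNet" format writes a net.json architecture file and a net.params weights file in the same directory; jsonPath and paramPath are illustrative names:

```wolfram
(* export the architecture; the parameter file is assumed to be written alongside it *)
jsonPath = Export[FileNameJoin[{$TemporaryDirectory, "net.json"}],
   NetModel["SSD-MobileNet V2 Trained on MS-COCO Data"], "MXNet"];
paramPath = FileNameJoin[{DirectoryName[jsonPath], "net.params"}];
(* size of the parameter file in bytes *)
FileByteCount[paramPath]
```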
The size is similar to the byte count of the resource object:
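The comparison cell is missing here. A likely form, assuming the resource object exposes a "ByteCount" property:

```wolfram
(* byte count recorded for the repository resource, for comparison with the exported parameter file *)
ResourceObject["SSD-MobileNet V2 Trained on MS-COCO Data"]["ByteCount"]
```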