SSD-MobileNet V2 Trained on MS-COCO Data
Detect and localize objects in an image

Resource retrieval

Get the pre-trained net:
In[]:=
NetModel["SSD-MobileNet V2 Trained on MS-COCO Data"]
Out[]=

Evaluation function

Define the label list for this model. Integers in the model's output correspond to elements in the label list:
In[]:=
labels={"person","bicycle","car","motorcycle","airplane","bus","train","truck","boat","traffic light","fire hydrant","stop sign","parking meter","bench","bird","cat","dog","horse","sheep","cow","elephant","bear","zebra","giraffe","backpack","umbrella","handbag","tie","suitcase","frisbee","skis","snowboard","sports ball","kite","baseball bat","baseball glove","skateboard","surfboard","tennis racket","bottle","wine glass","cup","fork","knife","spoon","bowl","banana","apple","sandwich","orange","broccoli","carrot","hot dog","pizza","donut","cake","chair","couch","potted plant","bed","dining table","toilet","tv","laptop","mouse","remote","keyboard","cell phone","microwave","oven","toaster","sink","refrigerator","book","clock","vase","scissors","teddy bear","hair drier","toothbrush"};
Write an evaluation function to scale the result to the input image size and suppress the least probable detections:
In[]:=
nonMaxSuppression[overlapThreshold_][detection_] := Module[{boxes, confidence},
  Fold[
    Function[{list, new},
      If[NoneTrue[list[[All, 1]], iou[#, new[[1]]] > overlapThreshold &],
        Append[list, new],
        list
      ]
    ],
    Sequence @@ TakeDrop[Reverse@SortBy[detection, Last], 1]
  ]
]
iou := iou = With[{c = Compile[{{box1, _Real, 2}, {box2, _Real, 2}},
      Module[{area1, area2, x1, y1, x2, y2, w, h, int},
        area1 = (box1[[2, 1]] - box1[[1, 1]]) (box1[[2, 2]] - box1[[1, 2]]);
        area2 = (box2[[2, 1]] - box2[[1, 1]]) (box2[[2, 2]] - box2[[1, 2]]);
        x1 = Max[box1[[1, 1]], box2[[1, 1]]];
        y1 = Max[box1[[1, 2]], box2[[1, 2]]];
        x2 = Min[box1[[2, 1]], box2[[2, 1]]];
        y2 = Min[box1[[2, 2]], box2[[2, 2]]];
        w = Max[0., x2 - x1];
        h = Max[0., y2 - y1];
        int = w*h;
        int/(area1 + area2 - int)
      ],
      RuntimeAttributes -> {Listable}, Parallelization -> True, RuntimeOptions -> "Speed"
    ]},
  c @@ Replace[{##}, Rectangle -> List, Infinity, Heads -> True] &
]
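The logic of the two definitions above is language-neutral: compute intersection over union (IoU) of two axis-aligned boxes, then greedily keep the highest-confidence detection and drop any later detection of the same class whose IoU with an already-kept box exceeds the threshold. As a rough sketch, the same arithmetic in Python (all names are illustrative, not part of the model):

```python
def iou(box1, box2):
    # Each box is ((x1, y1), (x2, y2)); returns intersection area / union area.
    (ax1, ay1), (ax2, ay2) = box1
    (bx1, by1), (bx2, by2) = box2
    w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = w * h
    area1 = (ax2 - ax1) * (ay2 - ay1)
    area2 = (bx2 - bx1) * (by2 - by1)
    return inter / (area1 + area2 - inter)

def non_max_suppression(detections, overlap_threshold=0.45):
    # detections: list of (box, label, confidence).
    # Greedy NMS, highest confidence first; suppression only within a class,
    # mirroring the per-class grouping done by GatherBy in netevaluate.
    kept = []
    for box, label, conf in sorted(detections, key=lambda d: d[2], reverse=True):
        if all(k[1] != label or iou(box, k[0]) <= overlap_threshold for k in kept):
            kept.append((box, label, conf))
    return kept
```

Two heavily overlapping "cat" boxes collapse to the higher-confidence one, while an overlapping "dog" box survives because suppression is per class.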
In[]:=
netevaluate[img_Image, detectionThreshold_ : .5, overlapThreshold_ : .45] := Module[
  {netOutputDecoder, net, decoded},
  netOutputDecoder[imageDims_, threshold_ : .5][netOutput_] := Module[
    {detections = Position[netOutput["ClassProb"], x_ /; x > threshold]},
    If[Length[detections] > 0,
      Transpose[{
        Rectangle @@@ Round@Transpose[
          Transpose[Extract[netOutput["Boxes"], detections[[All, 1 ;; 1]]], {2, 3, 1}]*imageDims/{300, 300},
          {3, 1, 2}],
        Extract[labels, detections[[All, 2 ;; 2]]],
        Extract[netOutput["ClassProb"], detections]
      }],
      {}
    ]
  ];
  net = NetModel["SSD-MobileNet V2 Trained on MS-COCO Data"];
  decoded = netOutputDecoder[ImageDimensions[img], detectionThreshold]@net[img];
  Flatten[Map[nonMaxSuppression[overlapThreshold], GatherBy[decoded, #[[2]] &]], 1]
]
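The decoder inside netevaluate does two things: it keeps every (box, class) pair whose probability exceeds the threshold, and it rescales box coordinates from the net's 300×300 input frame to the original image dimensions. A minimal Python sketch of that decoding step, with illustrative names:

```python
def decode_detections(boxes, class_probs, labels, image_dims, threshold=0.5):
    # boxes: list of ((x1, y1), (x2, y2)) in the net's 300x300 coordinate frame.
    # class_probs: for each box, a list of per-class probabilities.
    # Returns (scaled_box, label, confidence) for every score above threshold.
    sx, sy = image_dims[0] / 300.0, image_dims[1] / 300.0
    out = []
    for box, probs in zip(boxes, class_probs):
        for class_idx, p in enumerate(probs):
            if p > threshold:
                (x1, y1), (x2, y2) = box
                scaled = ((round(x1 * sx), round(y1 * sy)),
                          (round(x2 * sx), round(y2 * sy)))
                out.append((scaled, labels[class_idx], p))
    return out
```

Note that one box can yield several detections if more than one class clears the threshold; the subsequent per-class non-maximum suppression prunes the survivors.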

Basic usage

Obtain the detected bounding boxes with their corresponding classes and confidences for a given image:
In[]:=
testImage=
;
In[]:=
detection=netevaluate[testImage,.5]
Out[]=
{{Rectangle[{-1,91},{225,235}],cat,0.915414},{Rectangle[{229,9},{401,223}],dog,0.817975}}
Inspect which classes are detected:
In[]:=
classes=DeleteDuplicates@detection[[All,2]]
Out[]=
{cat,dog}
Visualize the detection:
In[]:=
HighlightImage[testImage,
 MapThread[
  {White,
    Inset[
     Style[#2, Black, FontSize -> Scaled[1/20], Background -> GrayLevel[1, .6]],
     Last[#1], {Right, Top}],
    #1} &,
  Transpose@detection]]
Out[]=

Network result

The network computes 1,917 bounding boxes and the probability that the objects in each box are of any given class:
In[]:=
res=NetModel["SSD-MobileNet V2 Trained on MS-COCO Data"][testImage]
Out[]=
Boxes{{{-0.352299,282.234},{19.2081,300.759}},{{-19.6433,275.772},{49.3239,304.458}},
⋯1914⋯
,{{-3.65158,18.8586},{278.613,285.51}}},ClassProb{
⋯1⋯
}
Visualize all the boxes predicted by the net scaled by their “objectness” measures:
In[]:=
rectangles=Rectangle@@@res["Boxes"];
In[]:=
Graphics[
 MapThread[{EdgeForm[Opacity[#1 + .01]], #2} &, {Total[res["ClassProb"], {2}], rectangles}],
 BaseStyle -> {FaceForm[], EdgeForm[{Thin, Black}]}
]
Out[]=
Visualize all the boxes scaled by the probability that they contain a cat:
In[]:=
idx=Position[labels,"cat"][[1,1]]
Out[]=
16
In[]:=
Graphics[
 MapThread[{EdgeForm[Opacity[#1 + .01]], #2} &, {res["ClassProb"][[All, idx]], rectangles}],
 BaseStyle -> {FaceForm[], EdgeForm[{Thin, Black}]}
]
Superimpose the cat prediction on top of the scaled input received by the net:

Advanced visualization

Write a function to apply a custom styling to the result of the detection:
Visualize multiple objects, using a different color for each class:

Net information

Inspect the number of parameters of all arrays in the net:
Obtain the total number of parameters:
Obtain the layer type counts:
Display the summary graphic:

Export to MXNet

Get the size of the parameter file:
The size is similar to the byte count of the resource object: