Sentiment Language Model Trained on Amazon Product Review Data
Generate text in English and analyze sentiment
Resource retrieval
Get the pre-trained net:
In[]:=
NetModel["Sentiment Language Model Trained on Amazon Product Review Data"]
Out[]=
Basic usage
Predict the next character in a piece of text:
In[]:=
result=NetModel["Sentiment Language Model Trained on Amazon Product Review Data"]["This produc"]
Out[]=
117
The output is a class index corresponding to a byte in the UTF-8 encoding, offset by 1 (the class index is the byte value plus 1). Decode the prediction:
In[]:=
FromCharacterCode[result-1,"UTF-8"]
Out[]=
t
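The same class-to-byte mapping can be cross-checked outside the Wolfram Language; a one-line sketch in Python, using the class index 117 predicted above:

```python
# The net's output classes are 1-based byte values:
# class index 117 corresponds to byte 116, i.e. the character "t".
result = 117            # class index predicted by the model above
byte_value = result - 1
print(chr(byte_value))  # → t
```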
Note that since UTF-8 is a variable-length encoding, decoding single byte values may not always make sense:
In[]:=
FromCharacterCode[240,"UTF-8"]
Out[]=
ð
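The variable-length property can be seen directly in Python (rather than the Wolfram Language used elsewhere on this page): a lone lead byte such as 240 fails to decode on its own, while a completed multi-byte sequence decodes normally. The particular 4-byte sequence below is an arbitrary example:

```python
# Byte 240 (0xF0) is the lead byte of a 4-byte UTF-8 sequence;
# on its own it is not a valid UTF-8 string.
try:
    bytes([240]).decode("utf-8")
    lone_byte_valid = True
except UnicodeDecodeError:
    lone_byte_valid = False
print(lone_byte_valid)  # → False

# Once the three continuation bytes arrive, the sequence decodes.
full = bytes([240, 159, 152, 128])  # 0xF0 0x9F 0x98 0x80 → U+1F600
print(full.decode("utf-8"))
```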
Multiplicative LSTM and UTF-8 encoding
This model features a non-standard multiplicative LSTM (mLSTM), which can be implemented using NetFoldOperator:
In[]:=
NetExtract[NetModel["Sentiment Language Model Trained on Amazon Product Review Data"],"mLSTM"]
Out[]=
Inspect the inner structure of the multiplicative LSTM:
In[]:=
NetExtract[NetModel["Sentiment Language Model Trained on Amazon Product Review Data"],{"mLSTM","Net"}]
Out[]=
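The defining feature of the mLSTM is an intermediate multiplicative state, the elementwise product of two linear projections of the input and the previous hidden state, which replaces the hidden state in the usual LSTM gate equations. A minimal NumPy sketch of one recurrence step, using random toy weights (all names hypothetical, not the trained parameters extracted above):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4  # toy hidden size for illustration; the actual model is far larger

# Hypothetical random weights, NOT the model's trained parameters
Wmx, Wmh = rng.normal(size=(d, d)), rng.normal(size=(d, d))
Wx = {g: rng.normal(size=(d, d)) for g in "ifoc"}  # input-to-gate projections
Wm = {g: rng.normal(size=(d, d)) for g in "ifoc"}  # m-to-gate projections

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mlstm_step(x, h, c):
    # Multiplicative state: elementwise product of two projections
    m = (Wmx @ x) * (Wmh @ h)
    # Standard LSTM gating, but conditioned on m instead of h
    i = sigmoid(Wx["i"] @ x + Wm["i"] @ m)  # input gate
    f = sigmoid(Wx["f"] @ x + Wm["f"] @ m)  # forget gate
    o = sigmoid(Wx["o"] @ x + Wm["o"] @ m)  # output gate
    c_new = f * c + i * np.tanh(Wx["c"] @ x + Wm["c"] @ m)
    h_new = o * np.tanh(c_new)
    return h_new, c_new

h, c = np.zeros(d), np.zeros(d)
h, c = mlstm_step(rng.normal(size=d), h, c)
print(h.shape, c.shape)
```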
This net encodes its input string into a sequence of byte values corresponding to its UTF-8 encoding. Inspect the input encoder:
In[]:=
NetExtract[NetModel["Sentiment Language Model Trained on Amazon Product Review Data"],"Input"]
Out[]=
NetEncoder
The net predicts the next byte of the sequence. UTF-8 allows single byte values in the range 0-247; hence, there are 248 possible outputs. Inspect the output decoder:
In[]:=
NetExtract[NetModel["Sentiment Language Model Trained on Amazon Product Review Data"],"Output"]
Out[]=
NetDecoder
Generation
Write a function to generate text efficiently using NetStateObject. Note that FromCharacterCode[…,"UTF-8"] is used at the end, but the output byte values are not guaranteed to form a valid UTF-8 sequence. In this case, FromCharacterCode will issue messages:
In[]:=
generateSample[start_, len_, temp_ : 1, device_ : "CPU"] :=
 Block[{enc, obj, generated, bytes},
  enc = NetExtract[
    NetModel["Sentiment Language Model Trained on Amazon Product Review Data"],
    "Input"];
  obj = NetStateObject@NetReplacePart[
     NetModel["Sentiment Language Model Trained on Amazon Product Review Data"],
     "Input" -> {"Varying", "Integer"}];
  generated = NestList[
    {obj[#, {"RandomSample", "Temperature" -> temp}, TargetDevice -> device]} &,
    enc[start], len];
  bytes = Flatten[generated] - 1;
  FromCharacterCode[bytes, "UTF-8"]]
Generate for 300 steps using “This produc” as an initial string:
In[]:=
generateSample["This produc",300]
Out[]=
This product came in at 1040 for all that I've ridden, and nothing exciting about it - a technical statement. The engine works just fine, even in castles with damp or stone areas and the tough cedar plows this speeds up riff on among ssechs and power equipment- I paint and play acoustics, as well as a walking
The third optional argument is a “temperature” parameter that scales the input to the final softmax. A high temperature flattens the distribution from which characters are sampled, increasing the probability of extracting less likely characters:
In[]:=
generateSample["This produc",300,1.3]
Out[]=
This product was set to Paper Reveals Allure theme.There are many textures and greatly made using them. Not keeping it...With limited great vampire orgasms. It is mute if I can get one, but I promise- 10* is a smart, beautifully, yet pleasant-looking flax, comb, and above an the tub, to.WofDemarkeThis is what
Decreasing the temperature sharpens the peaks of the sampling distribution, further decreasing the probability of extracting less likely characters:
In[]:=
generateSample["This produc",300,0.4]
Out[]=
This product is not for everyone. It is a product that is well made and does what it says it will do. I am a big fan of the style and the fact that it is well made. I have a big head and I can still use the handle with the handle and the compass is a great feature. I have a small collection of tools and th
Very low temperature settings are equivalent to always picking the character with maximum probability. It is typical for sampling to “get stuck in a loop”:
In[]:=
generateSample["This produc",300,0.0001]
Out[]=
This product is a great product and I would recommend it to anyone who wants to start a collection of their own collection.I have been using this product for a few years now and I love it. I have tried other products but this one is the best. I have tried other products but this one is the best. I have tri
Very high temperature settings are equivalent to random sampling. Since the output classes are byte values to be decoded using UTF-8, a very high temperature will almost certainly generate invalid sequences:
In[]:=
generateSample["This produc",300,10]
Out[]=
This produc´¢Sl1°L±i@kK-8r)#¹ç]Ƚ|t-j/á¤a)˙¥囂~ïV+Ð/k5UxLDrðYcî¶[¢ÏöPYÚhÎ=Y=µÏ1äE=@â农ɪ±¬Z¥e-.ëֽ('."4,N|vS)O©#Ã_;Tå#WóKèaA,ð¥=O(ìdfx\oCt(zÑr=6®u
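The temperature behavior observed above comes down to rescaling the logits before the softmax; a small NumPy sketch with toy logits (not the model's), showing the near-uniform and near-argmax extremes:

```python
import numpy as np

def softmax_with_temperature(logits, temp):
    # Divide the logits by the temperature before normalizing
    z = np.asarray(logits, dtype=float) / temp
    z -= z.max()  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

logits = [2.0, 1.0, 0.0]                        # toy example
print(softmax_with_temperature(logits, 1.0))    # moderately peaked
print(softmax_with_temperature(logits, 10.0))   # flattened: close to uniform sampling
print(softmax_with_temperature(logits, 0.01))   # sharpened: effectively argmax
```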