A “long short-term memory neural network” is apparently an important and recent advance in artificial
intelligence research, even though it sounds a bit like "jumbo shrimp."
The term refers to a neural network devised with “forget gates” attached to cells of memory, originally
in order to model human short-term memory. But the “forget gate” can be set for any length of time,
so the researchers can make this “short-term memory” last as long as they want it to, hence the
oxymoronic name, "long short-term memory."
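The cell-plus-forget-gate idea can be sketched in a few lines of Python. This is a simplified single-unit cell with made-up scalar weights; a real LSTM learns weight matrices and also has an output gate, which is omitted here for brevity:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Illustrative scalar parameters (assumed values; a trained LSTM learns these).
w_f, u_f, b_f = 0.5, 0.5, 1.0   # forget gate
w_i, u_i, b_i = 0.5, 0.5, 0.0   # input gate
w_g, u_g, b_g = 1.0, 1.0, 0.0   # candidate memory content

def lstm_cell_step(x, h_prev, c_prev):
    """One step of a single-unit LSTM cell (output gate omitted)."""
    f = sigmoid(w_f * x + u_f * h_prev + b_f)    # forget gate: how much old memory to keep
    i = sigmoid(w_i * x + u_i * h_prev + b_i)    # input gate: how much new input to write
    g = math.tanh(w_g * x + u_g * h_prev + b_g)  # candidate new memory
    c = f * c_prev + i * g                       # blend retained memory with new content
    h = math.tanh(c)                             # simplified hidden state
    return h, c

h, c = 0.0, 0.0
for x in [1.0, -0.5, 0.2]:
    h, c = lstm_cell_step(x, h, c)
```

The key line is `c = f * c_prev + i * g`: when the forget gate `f` stays near 1, the stored memory `c_prev` persists almost unchanged from step to step, which is what lets a "short-term" memory last as long as the network wants.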
The Vanishing Gradient
Networks that have these LSTM units as their components have certain advantages over the “vanilla
version” of recurrent neural networks (RNNs). In particular, they can overcome what is known as the
“vanishing gradient” problem. That is, vanilla RNNs were built to learn from new data, but to give that
data a weight that depended on the amount of already-stored data on the same subject the network
already possessed; in other words, to model the human process of inductive learning.
But ... once a network has already received a good deal of data, the marginal weight assigned to any
new datum can shrink toward zero, “vanishing.” Thus, such machines could become incapable of
learning further at all. This meant that networks had to be allowed to forget, in order to preserve
the salience of new data, which is the problem resolved by the newer LSTM models.
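A back-of-the-envelope calculation shows why forgetting helps. Propagating an error signal back through many time steps multiplies it by a per-step factor; the factors below are illustrative assumptions, not measurements from any real network:

```python
# Backpropagating through T steps multiplies the error signal by a
# per-step factor, so the signal scales as factor ** T.
T = 50
vanilla_factor = 0.5   # assumed: typical shrink factor through a squashing recurrence
lstm_forget = 0.99     # assumed: a forget gate held close to 1

vanilla_grad = vanilla_factor ** T   # decays exponentially toward zero
lstm_grad = lstm_forget ** T         # stays at a usable magnitude (about 0.6)
```

With the vanilla factor, fifty steps leave essentially nothing of the original signal, so earlier data stops influencing learning; with a forget gate held near 1, most of the signal survives, and the gate can be lowered whenever old memory should actually be discarded.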