https://arxiv.org/pdf/1905.05950.pdf
See why they would even consider this. Backpropagation in a model with an objective loss tries to discover functions/patterns so it can predict better, e.g. in images. It is an "intelligent brute force": it can find many rare patterns/functions very accurately, e.g. how bubbles behave (physics rules), or that dogs bark more often than they sleep. But there is no such thing as intelligent brute force; you either brute-force or you make intelligent decisions. So I'm starting to think that the backprop going on in these networks, the process that discovers complex functions/patterns, is really just throwing away unlikely things and merging together probable things. If I like/predict AI and cars, then maybe AI + car = something that is more probably worthwhile to try. That's why we merge DNA between humans instead of relying on pure mutation. (I personally think we should interbreed humans with birds, because we really need wings.)

Complexity is built from small, simple rules. Really, everything is simple; there are only a few laws in physics. We make life confusing. Only after we first learn the true simple rules do we make life complex by building larger functions/patterns. We don't want our net to predict by running a precise galaxy-sized physics simulation, but by taking camera/etc. "snapshots" of our universe and learning the most basic rules first (not physics rules, but other kinds of simple rules!). Syntactics can capture ANY pattern in any universe that is not 100% random. It is the simplest pattern, and it builds the other, larger patterns. The closer our net gets to learning functions/patterns that resemble a physics simulation, the more we know we must use that either sparingly or not learn it at all, because it is costly. A modern ANN trained by backpropagation builds larger functions/patterns (less general and more costly) out of smaller pattern/function nodes, usually the most probable nodes.
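The point that backprop only tunes weights inside a structure that was fixed in advance can be shown with a minimal sketch (pure Python, a hypothetical toy dataset I made up): the "hierarchy" here is a single linear unit y = w*x + b chosen by hand, and gradient descent only finds the two weights, never a new structure.

```python
# Gradient descent on a fixed structure: one linear unit y = w*x + b.
# Backprop here only tunes w and b; it never invents new structure.
data = [(x, 2.0 * x + 1.0) for x in range(-5, 6)]  # toy target: y = 2x + 1

w, b, lr = 0.0, 0.0, 0.01
for _ in range(2000):
    gw = gb = 0.0
    for x, y in data:
        err = (w * x + b) - y          # prediction error
        gw += 2 * err * x / len(data)  # d(MSE)/dw
        gb += 2 * err / len(data)      # d(MSE)/db
    w -= lr * gw
    b -= lr * gb

print(round(w, 2), round(b, 2))  # prints 2.0 1.0
```

The weights converge to the target exactly because the structure already matched the data; the "discovery" was done when the structure was chosen.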
It is a brute-force search, but it isn't totally brute force, because it is guided by how favored the nodes are. We can't build/learn the higher layers until backprop has first shaped the lower layers; there is nothing up there yet. Backprop is not lowering the loss by starting at the end of the net; it is doing it by building new functions out of smaller functions. When we look at the GOFAI behind e.g. PPM (Prediction by Partial Matching) or similar advanced approaches, we see syntactics, semantics (e.g. word2vec), recency, simple patterns like that. But these are what build all the other patterns/functions. IF-THEN rules are syntactic, and that is what runs our physics.

So when we pick which nodes to merge, we should choose probable ones, with only some random brute force governing it. Backprop should not be the thing deciding which nodes merge their weights and by what amount X. We get less error when we merge nodes by amount X, but what we really need to know is which nodes to merge and by how much. Hinton has said backprop isn't brain-like / isn't the way. We want to install in an AGI only the most basic patterns/keys of the universe, so it can learn by itself and get some accuracy on the trillions of rare patterns that are built from the elementary ones! Is backprop a pattern that finds basic patterns? No: it can't learn that words are similar from their contexts; it can only strengthen connections between them by luck, unless it uses basic patterns "you" added to help it do so sooner. Backprop is therefore not the intelligence, as it's clueless; the hierarchy you give it is "already" syntactics, and backprop only finds the weights on its own through a backwards pass. The Burrows-Wheeler Transform can do pretty well too, but it is also FREQUENCY: it stacks/merges the same types, only in "physical" order.

------------------------------------------
Artificial General Intelligence List: AGI
Permalink: https://agi.topicbox.com/groups/agi/T17ed4e5aa955f6c2-M861e97bba23cfdd091881799
Delivery options: https://agi.topicbox.com/groups/agi/subscription
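The Burrows-Wheeler remark can be made concrete with a minimal sketch (naive O(n² log n) construction in pure Python, using "$" as an assumed end-of-string sentinel): the transform clusters equal symbols together by sorting all rotations, i.e. by context/frequency, with no "intelligence" involved.

```python
def bwt(s):
    """Naive Burrows-Wheeler Transform: sort all rotations of s,
    then read off the last column. Equal symbols that share similar
    right-contexts end up stacked next to each other."""
    s += "$"  # sentinel, assumed lexicographically smallest character
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)

print(bwt("banana"))  # prints annb$aa -- the three a's get stacked together
```

That stacking is exactly what makes the output easier for a frequency-based coder to compress afterwards.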
