GPT-3, Jukebox, and DALL-E are extremely impressive. Google Search started 
using BERT a year ago. You may not realize it if you haven't extensively played 
with them all and viewed openai.com to see that they really do give long, novel 
completions to a diverse set of long prompts for text, music, and text2image, 
at and near human level.

Multiple attempts at 2 techno songs:
https://www.youtube.com/watch?v=LrjmBV504uA
https://www.youtube.com/watch?v=6Q3V238JmNI



A quick, easy, yet sobering primer:
As shown below in several simple mechanisms that I explain, it is also amazing 
how many abilities found in the human brain are behind my code and GPT's, like 
priming, related words, etc. I have extensively analyzed text/image datasets 
and found the major patterns in the data, and I have gotten a good prediction 
score compared to the Hutter Prize and Large Text Compression Benchmark after 
reverse-engineering 90% of the parts in GPT into simple functions, to realize 
it has to be doing (maybe) all of the following to even get close to GPT's 
prediction accuracy level. You can only use context to predict when there are 
re-occurring bits/words in a dataset; without exact matches there are no 
patterns, all patterns stem from exact matches, and we only care about 
patterns/change.

What comes next "pattern (mixing)":
The first thing you notice when using GPT etc. is that it will usually say 
'street' if you or it says 'I was walking down a'. It learns, for that context, 
how many times street, road, book, etc. follow, giving it percentages, so it 
says book rarely and street often, ex. 1% book, 20% hall, etc. Storing this in 
even a plain trie tree is efficient for memory size and search speed, giving 
you layers and branches. If a node is seen only once, we don't extend its 
branch; this is like pre-forgetting and is like Byte Pair Encoding, stopping an 
explosion of low-probability items being accessed/stored. Long branches can be 
remembered if repeated many, many times.
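A minimal sketch of the trie idea above (my own toy illustration, not GPT's 
actual code; the corpus, depth, and function names are made up): store n-grams 
with counts in a trie, then read next-word percentages off a node's children.

```python
class TrieNode:
    def __init__(self):
        self.children = {}   # next word -> TrieNode
        self.count = 0       # times this branch was observed

def train(root, words, max_depth=4):
    # Insert every n-gram (up to max_depth words) into the trie with counts.
    for i in range(len(words)):
        node = root
        for w in words[i:i + max_depth]:
            node = node.children.setdefault(w, TrieNode())
            node.count += 1

def predict(root, context):
    # Walk the trie along the context, then read off next-word percentages.
    node = root
    for w in context:
        if w not in node.children:
            return {}
        node = node.children[w]
    total = sum(c.count for c in node.children.values())
    return {w: c.count / total for w, c in node.children.items()}

root = TrieNode()
text = ("i was walking down a street . i was walking down a road . "
        "i was walking down a street .").split()
train(root, text, max_depth=5)
print(predict(root, ["walking", "down", "a"]))
# street gets 2/3, road 1/3 on this tiny corpus
```

Pre-forgetting would then just be deleting leaves whose count stayed at 1.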

Adding more data "pattern":
The more data it sees, the better it gets the probabilities. Longer prompts are 
exponentially rarer, so you'll need tons of data. If you have a faster 
computer, code, implementation approach (if that even exists), accelerator 
chip, more memory, and a parallel algorithm, you can eat more data. Diverse 
data is effectively more data too, and making it skim a dataset lets it eat a 
more diverse data intake.

Length, holes, delay "pattern":
Using a longer context is better but has fewer experiences behind it, so 
brains mix multiple matches to memory (and give exponentially little weight to 
little-experienced contexts; if there are only 3 types of words seen to follow 
and 15 observations, that distribution is learnt faster), ex. "was walking 
down a", "walking down a", "down a", etc., including hole matches like "we 
were walking ___ and __ down the ?", which allows it to combine many 
predictions for book/road/etc. to get a good set of predictions. Delay is 
having seen "123456" and now seeing "1245?" or "12hhhhh345?". If there is a 
pattern of delay or holes etc., you can be fairly sure it is the same memory, 
so you can use it to get predictions more confidently: "1hh2hh3hh4hh5hh?". 
This is how we recognize huge cars upside down and brighter when we had only 
seen a small upright car; all parts relatively have the same error. For pixels 
there is no exact match like "thanks"; the pixels may be brighter or darker, 
so there is "delay" here too, and the more it is off, the more it is 
discounted when recognized/used for prediction. Eyeballs have most color 
receptors in the middle, brightness cells ringed around the center and few 
farther out, and there is even a fully blind 2nd spot that has no receptors, 
off to the right of the center. The brain stores image and video like this: 
1___4_6789, not 123456789; farther from the center has less use.
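The mixing of multiple context lengths could be sketched like this (a toy 
illustration; the 2**n weighting is a hypothetical choice for the exponential 
weighting, not a measured value, and hole/delay matching is left out):

```python
from collections import Counter

def suffix_predictions(corpus, context, max_len=4):
    # For each suffix length n, count what follows that suffix in the corpus,
    # and weight longer (rarer, more specific) matches exponentially more.
    mixed = Counter()
    for n in range(1, max_len + 1):
        suffix = context[-n:]
        if len(suffix) < n:
            break
        counts = Counter()
        for i in range(len(corpus) - n):
            if corpus[i:i + n] == suffix:
                counts[corpus[i + n]] += 1
        total = sum(counts.values())
        if total == 0:
            continue
        weight = 2 ** n   # hypothetical: each extra matched word doubles trust
        for w, c in counts.items():
            mixed[w] += weight * c / total
    s = sum(mixed.values())
    return {w: v / s for w, v in mixed.items()} if s else {}

corpus = "we were walking down a road . i was walking down a street .".split()
print(suffix_predictions(corpus, "he was walking down a".split()))
# street wins, because the longest match "was walking down a" saw street
```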

Exponential "pattern":
If you predict "we ate yummy ?" is food 80%, things 15%, book 5%, then we 
should sharpen that to food 84%, book 2%, because big cities get bigger faster 
and it is either a yes or a no answer; things stick, but never reach 0% or 
100% either; it's an exponential S-curve. Pattern = sticks/stacks together in 
groups.
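One simple way to get that S-curve-style sharpening (an assumption on my part, 
just one of many possible curves) is to raise each probability to a power 
above 1 and renormalize; winners grow, losers shrink, and nothing ever reaches 
exactly 0 or 1:

```python
def sharpen(probs, exponent=1.5):
    # Raise each probability to a power > 1 and renormalize:
    # big probabilities grow, small ones shrink, none hit exactly 0 or 1.
    powered = {w: p ** exponent for w, p in probs.items()}
    s = sum(powered.values())
    return {w: v / s for w, v in powered.items()}

print(sharpen({"food": 0.80, "things": 0.15, "book": 0.05}))
# food rises above 0.80, book drops below 0.05
```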

Priming and related words/ phrases "pattern":
If you see cat cat cat cat, or zzzzzzz, or pig cat horse donkey wolf, what 
word comes next? Closer words have more impact on the recalled predictions. 
Usually an article is on 1 thing, ex. dogs, and only mentions related words; 
things in a dataset stick together. We write that way. You don't need more 
data, you simply mirror things. You learn related words by looking around a 
word at what [usually] comes next on both sides and how close; dog usually 
comes near leash and cat. If 'the' appears near 'cat', we largely ignore it: 
too common and too rare have no meaning, they appear near everything. That's 
useful for orderless regurgitation: if you saw dog first, then leash probably 
does come next later. But if you want to translate the context "we can cure 
cancer by ?" to get more predictions, ex. from "they could solve tumors like 
?" (with hole, delay, etc. ability too), then you need to learn "actually" 
related words by looking far away across the dataset. Ex. you see "my cat ate" 
and "my dog ate"; the more predictions the two contexts share out of all the 
predictions they have (after normalizing the smaller set of predictions, ex. 
cat's, so they have the same number of experiences), and the longer and closer 
the contexts are, the more related the predictions are (ate/swallowed), and 
the more related cat and dog are. If you translate a context then you need to 
make sure it fits; ex. 'dog'/'the thing' in the following context is not 
poodle: "can you dog the cap on the boat, the thing wants ?, and also dog the 
other ? Can you ? the bottle?". We also predict domains, not words, ex. I 
walked down the road/street; this allows getting predictions without even 
having translated the prompt. We also predict delay, holes, and variable 
length according to probability. We get bored when done with the dog-domain 
article, so we don't predict any dog words; this improves prediction, because 
no dog words come next! During dog words, we know not to predict leash again 
if we just did, knowing we have left to predict walk/pee/etc. We can make 
categories, since all dog words are exponentially similar, so if a new word is 
similar to poodle then it is similar to all of them in the dog node without 
evidence, and we can name the category "dog" using the cluster's center item.
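The shared-predictions idea could be sketched like so (a toy overlap measure 
of my own; a real system would also weight by context length and distance as 
described above):

```python
from collections import Counter

def next_word_counts(corpus, word):
    # What follows this word in the corpus, and how often.
    return Counter(corpus[i + 1] for i in range(len(corpus) - 1)
                   if corpus[i] == word)

def relatedness(corpus, a, b):
    # Two words are related in proportion to how much their
    # next-word distributions overlap.
    ca, cb = next_word_counts(corpus, a), next_word_counts(corpus, b)
    shared = sum(min(ca[w], cb[w]) for w in ca.keys() & cb.keys())
    total = max(sum(ca.values()), sum(cb.values()))
    return shared / total if total else 0.0

corpus = "my cat ate food . my dog ate food . my rock sat there .".split()
print(relatedness(corpus, "cat", "dog"))   # high: both are followed by 'ate'
print(relatedness(corpus, "cat", "rock"))  # low: different followers
```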

Multi-modal "pattern":
Like related words, if a sound and an image occur close in time or share 
similar contexts, they are stored with a higher probability of being related. 
We build a hierarchy of memories made of smaller ones, use that to make 
relational connections stronger, and build multi-sensory connections on top of 
that. Using a diverse set of data beats using a dataset all about dogs for 
training prediction; the same holds here: we use vision, sound, touch, smell, 
etc., and multiple eyes/ears.

Reflexes "pattern":
The eyes track motion automatically if you don't move them. They cut out the 
image when you saccade so you don't see the room moving when it isn't. We are 
born with reflexes/memories that helped us survive when venturing into new 
environments, like needing to cough, sneeze, breathe. We pass them on 
forcefully to our kids in "school" and at home to make clones.

Ghosting "pattern":
If you saw "Building fell on house and crushed it", then later see "Tom fell 
on ?", you can see that the two blanks in "_ fell on _" are usually exact, 
similar, or different, so you predict Sally in this case. "Cat cat cat book 
book book loop loop loop sand ?" This is a mirror of three sames; it uses a 
context (ex. 'fell on', or 'cat cat cat') that says "this one" is "this one". 
"Cats are dogs. Hats but clothes. After god before. Look and ignore. Wind 
crane gust. jog cat ?." This is sames too, but if you look, in each one only 
the first and last words are related, and similar, not exact, which may be a 
rule. A chitchat is a cat; an example sentence using that word (the rare one, 
chitchat, or the one that immediately followed "A _ is a") is: Outside I found 
a cute chitchat eating cat food. You had to use chitchat at least once here. 
Please parrot me: I walked real fast, "says it", send me the food, "says it", 
etc. "Julie Kim Lee has a mom named Taylor um Alexa [Lee]". "super superman 
and spider spiderman and bat [batman]". "A word similar to love is: [hate]". 
"threw a ball to him, threw a game to him, threw a toy to him": this is saying 
it can be any word in this slot. "What have you read recently?". "The pet cat 
ate food on its bed and cats love cats, wait ignore those, predict this 
sentence: The moon "; the code can notice that after the words 'wait ignore 
those', things don't prime across that point. On a blank 3x3 tic-tac-toe grid 
you can imaginarily poke each cell in various orders and get it right, since 
you know which you've already poked, then can flush them and start again. 
Group objects: trash bird cat kitten eagle junk = trash junk, cat kitten, 
eagle bird. Shown a man, goat, and lamp lined up upright, and beneath each 
itself upside down, it's primed to put the same beneath but upside down. "Can 
hammers break glass? Yes or No.": usually if something follows, a Yes comes 
next. If someone doesn't finish their sentence, we are unconfident about the 
new sequence, and ask "then what?".
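The "three sames" mirror example can be sketched as a tiny rule (my own toy 
illustration; it only handles the equal-run case, nothing more):

```python
def mirror_predict(tokens):
    # If earlier items each repeat k times (cat cat cat book book book ...),
    # predict the last item again until its own run also reaches k.
    k = 1
    while k < len(tokens) and tokens[k] == tokens[0]:
        k += 1                 # run length of the first item
    last = tokens[-1]
    run = 0
    for t in reversed(tokens):
        if t == last:
            run += 1
        else:
            break
    return last if run < k else None   # None: the run is already complete

print(mirror_predict("cat cat cat book book book loop loop loop sand".split()))
# predicts 'sand', to complete its run of 3
```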

Multi-Agent "pattern":
Cloning yourself and making your 'me's do jobs you're too busy to do allows 
you to work in parallel.

The Cerebellum adjusts chosen motors "pattern":
A removed cerebellum = non-smooth jitter, i.e. loss of smooth motor movements: 
the hand goes past the target object, skipping action changes and requiring 
many obvious readjustments to reach the target. My idea is that you actually 
sense/move slowly, and during your chosen motor direction & speed the 
cerebellum smoothly adjusts the speed (that you started with) toward where you 
want the target to be (ex. quickly pinching both ears of a pig), because 
although the linked actions move the limbs as much as they should, they are 
not perfect (there's error in recognition, prediction, the associated motor 
memory, and the actual body may be wobbly etc.).



Goal Forcing, Taming:
Now, see how, like Facebook's Blender chatbot and PPLM, OpenAI seems to have 
tried permanently forcing it to be more likely to say specific things:
https://openai.com/blog/improving-language-model-behavior/

Here is what Blender looks like versus GPT (I made up the completions below, 
but from my experience they accurately demonstrate the difference):
Transformer (without personas):
Prompt: We need AGI
Completion: We need AGI so we can save the people. Humans have had a hard time 
on their own, and the future will be bright. I have been looking through the 
news, I saw clothing trends have been changing. I knew clothing trends would 
change. This is like that time I was sitting at home, I was just reading news 
casually and saw some girl get on stage before the people, I knew her. When I 
went home I saw my car in the driveway, I knew this was odd because......

Facebook's Blender chatbot (Personas: AGI, loves working hard, looking for help)
Prompt: We need AGI
Completion: We need AGI so we can save humans! Can you help me recruit others? 
I'm looking for others right now. I have invented a new solution, it is 
working. How is your work going? Any progress? Can you add it to my work 
booklet? I will be waiting for you ok? The next thing I'm working on for AGI is 
the sound cortex, can you help? It has been giving me trouble, so I'm looking 
into recruiting others, it is hard work, but I love AGI, so I love working 
hard. Need help, you there? Working hard here......
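A bare-bones sketch of that kind of forcing (not Blender's or PPLM's actual 
method, which steer a neural model's internals; this just boosts persona words 
in a toy next-word distribution and renormalizes, with made-up numbers):

```python
def bias_toward_persona(probs, persona_words, boost=3.0):
    # Multiply the probability of persona-related words by a boost
    # factor and renormalize, steering completions toward the persona.
    biased = {w: p * (boost if w in persona_words else 1.0)
              for w, p in probs.items()}
    s = sum(biased.values())
    return {w: v / s for w, v in biased.items()}

model_probs = {"AGI": 0.10, "clothes": 0.30, "news": 0.35, "recruit": 0.25}
persona = {"AGI", "recruit", "work"}
print(bias_toward_persona(model_probs, persona))
# persona words like 'recruit' now dominate the distribution
```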

This is very powerful, it makes it come alive, sort of like something I made 
years ago (linked below) when I attempted to simulate a full baby learning to 
crawl by trying random movements, learning what moves it faster, then tweaking 
learnt actions associated with the senses seen (I learnt this is not the meat 
of AGI, since it can't solve a wide range of contexts (planets, cellular 
cancer removal, etc.), so I quit the project). Motors have speed/rotation 
limits and randomly activate within those limits, every ex. 0.25 seconds. My 
1-motor/legged Lego crawler and 4-motor/legged simulated spider crawler did 
learn to crawl; they probably needed to use sensory input and specific body 
parts from different semi-successful gaits. 
https://www.youtube.com/watch?v=fdMr8mAAfHM

AI Learning to crawl looks like this BTW 
https://www.youtube.com/watch?v=iNL5-0_T1D0

Blender, and Blender v2, which checks the internet and apparently has 
real-time memory updating.
------------------------------------------
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/Tac918c6cbe1faf24-M3d62558db576a4368c5730b0