It looks like you have your work cut out for you.
From: [email protected] [mailto:[email protected]]
Sent: September 22, 2021 9:18 AM
To: AGI
Subject: [agi] my 123 "primer on 90% of AGI"

GPT-3, Jukebox, and DALL-E are extremely impressive. Google Search started using BERT a year ago. You may not realize it if you haven't extensively played with them all and viewed openAI.com to see that they really do give long, novel completions to a diverse set of long prompts for text, music, and text-to-image, at and near human level. Multiple attempts at 2 techno songs:
https://www.youtube.com/watch?v=LrjmBV504uA
https://www.youtube.com/watch?v=6Q3V238JmNI

A quick, easy, yet sobering primer: As shown below in several simple mechanisms that I explain, it is amazing how many abilities found in the human brain are behind my code and GPT's, like priming, related words, etc. I have extensively analyzed text/image datasets and found the major patterns in the data, and I have gotten a good prediction score compared to the Hutter Prize and the Large Text Compression Benchmark after reverse engineering 90% of the parts of GPT into simple functions, to work out what it has to be doing (maybe) to even get close to GPT's prediction accuracy. You can only use context to predict when there are recurring bits/words in a dataset; without exact matches there are no patterns, all patterns stem from exact matches, and we only care about patterns/change.

What comes next "pattern (mixing)": The first thing you notice when using GPT etc. is that it will usually say 'street' if you or it says 'I was walking down a'. It learns, for that context, how many times street, road, book, etc. follow, giving it percentages, so it says book rarely and street often, ex. 1% book, 20% hall, etc. Storing this in even a plain trie tree is efficient for memory size and search speed, giving you layers and branches.
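Here is a minimal sketch of that trie idea in Python (illustrative only, not GPT's actual code; the 4-gram depth and the toy sentence are just example choices). Counts stored along each branch become the percentages described above:

```python
# Toy trie: store "what comes next" counts keyed by context words,
# then turn raw counts into probabilities (ex. street 50%, road 50%).

class TrieNode:
    def __init__(self):
        self.children = {}   # next word -> TrieNode
        self.count = 0       # times this branch was seen

class CountTrie:
    def __init__(self, max_depth=4):
        self.root = TrieNode()
        self.max_depth = max_depth

    def add(self, words):
        """Walk the trie along `words`, counting each prefix."""
        node = self.root
        for w in words[: self.max_depth]:
            node = node.children.setdefault(w, TrieNode())
            node.count += 1

    def next_word_probs(self, context):
        """Distribution over the word following `context`, from counts."""
        node = self.root
        for w in context:
            if w not in node.children:
                return {}
            node = node.children[w]
        total = sum(c.count for c in node.children.values())
        return {w: c.count / total for w, c in node.children.items()}

trie = CountTrie()
text = "i was walking down a street . i was walking down a road".split()
for i in range(len(text)):
    trie.add(text[i : i + 4])
print(trie.next_word_probs(["walking", "down", "a"]))  # street and road, 0.5 each
```

Only branches that get extended are ever stored, which is what keeps the memory size and lookup speed manageable.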
If nodes are seen only once, we don't extend a branch. This is like pre-forgetting, and is like Byte Pair Encoding: it stops an explosion of low-probability items being stored and accessed. Long branches can still be remembered if repeated many, many times.

Adding more data "pattern": The more data it sees, the more it gets the probabilities right. Longer prompts are exponentially rarer, so you'll need tons of data. If you have a faster computer, better code, a better implementation approach (if that even exists), an accelerator chip, more memory, and a parallel algorithm, you can eat more data. Diverse data is effectively more data too, and making it skim a dataset lets it eat a more diverse data intake.

Length, holes, delay "pattern": Using a longer context is better but has fewer experiences behind it, so brains mix multiple matches to memory (and give exponentially little weight to a little-experienced context; if only 3 types of word have been seen to follow, across 15 observations, that distribution is learnt faster), ex. "was walking down a", "walking down a", "down a", etc., including hole matches like "we were walking ___ and __ down the ?", which lets it combine many predictions for book/road/etc. to get a good set of predictions. Delay is having seen "123456" and now seeing "1245?" or "12hhhhh345?". If there is a consistent pattern of delay or holes, you can be fairly sure it is the same memory, so you can use it to get predictions more confidently: "1hh2hh3hh4hh5hh?". This is how we recognize huge cars upside down and brighter when we had only seen a small upright car; all the parts share the same relative error. For pixels there is rarely an exact match, the pixels may be brighter or darker, so there is "delay" here too; the more a match is off, the more it is discounted when recognized/used for prediction. Eyeballs have most color receptors in the middle, brightness cells are ringed around the center with few farther out, and there is even a blind spot, offset from the center, that has no receptors at all.
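A sketch of that match-length mixing in Python (my own illustration; the toy probability table, the stub lookup function, and the exponential weighting base are all assumptions, not measured values). Every suffix of the context contributes its predictions, with longer matches weighted exponentially more:

```python
# Blend next-word distributions for "walking down a", "down a", "a", ...
# A match of length n gets weight base**n, so longer (rarer but more
# specific) matches dominate when they exist.

# Stand-in for a real lookup into stored counts; values are made up.
TOY_TABLE = {
    ("down", "a"): {"street": 0.4, "hall": 0.3, "book": 0.3},
    ("walking", "down", "a"): {"street": 0.7, "hall": 0.3},
}

def next_word_probs(context):
    return TOY_TABLE.get(tuple(context), {})

def mixed_prediction(context, base=2.0):
    """Mix predictions from every suffix of `context`,
    weighting a suffix of length n by base**n."""
    mixed, total_w = {}, 0.0
    for n in range(1, len(context) + 1):
        probs = next_word_probs(context[-n:])
        if not probs:
            continue   # no experience with this suffix, skip it
        w = base ** n
        total_w += w
        for word, p in probs.items():
            mixed[word] = mixed.get(word, 0.0) + w * p
    return {word: p / total_w for word, p in mixed.items()}

print(mixed_prediction(["walking", "down", "a"]))
# street ends up at 0.6, hall 0.3, book 0.1: the longer match pulls
# street up, but the shorter match still keeps book alive.
```

Hole and delay matching would slot in as extra lookups with their own discounts; only exact-suffix matching is shown here.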
The brain stores image and video like 1___4_6789, not 123456789; the farther away, the less use.

Exponential "pattern": If you predict "we ate yummy ?" as food 80%, things 15%, book 5%, then we should sharpen that to food 84%, book 2%, because big cities get bigger faster and the answer is either a yes or a no; things stick, but never quite reach 0% or 100% either. It's an exponential S curve. Pattern = sticks/stacks together in groups.

Priming and related words/phrases "pattern": If you see cat cat cat cat, or zzzzzzz, or pig cat horse donkey wolf, what word comes next? Closer occurrences have more impact on the recalled predictions. Usually an article is about one thing, ex. dogs, and only mentions related words; things in a dataset stick together. We write that way. You don't need more data, you simply mirror things. You learn related words by looking around a word at what [usually] comes next on both sides and how close: dog usually comes near leash and cat. If 'the' appears near 'cat', we largely ignore it; too common and too rare have no meaning, they appear near everything. That's useful for orderless regurgitation - if you saw dog first, then leash probably does come next later. But if you want to translate the context "we can cure cancer by ?" to get more predictions, ex. from "they could solve tumors like ?" (with hole, delay, etc. ability too), then you need to learn "actually" related words by looking far away across the dataset. Ex. you see "my cat ate" and "my dog ate"; the more predictions the two contexts share out of all the predictions they have (after normalizing the smaller set of predictions, ex. cat's, so both have the same number of experiences), the longer the contexts are, and the closer the contexts are, the more related the contexts are (ate/swallowed), and then the more related cat and dog are. If you translate a context then you need to make sure it fits, ex. 'dog'/'the thing' in the following context is not poodle: "can you dog the cap on the boat, the thing wants ?, and also dog the other ? Can you ?
the bottle?".

We also predict domains, not words, ex. I walked down the road/street; this allows getting predictions without even having translated the prompt. We also predict delay, holes, and variable length according to probability. We get bored when done with the dog-domain article, so we stop predicting dog words, and this improves prediction, because no dog words come next! During dog words, we know not to predict leash again if we just did, knowing we still have walk/pee/etc. left to predict. We can make categories, since all dog words are exponentially similar: if a new word is similar to poodle then it is similar to everything in the dog node without further evidence, and we can name the category "dog" using the cluster's center item.

Multi-modal "pattern": Like related words, if a sound and an image occur close in time or share similar contexts, they store a higher probability of relation. We build a hierarchy of memories made of smaller ones, use that to make relational connections stronger, and build multi-sensory connections on top. Using a diverse dataset beats using a dataset all about dogs for training prediction; same here, we use vision, sound, touch, smell, etc., and multiple eyes/ears.

Reflexes "pattern": The eyes track motion automatically if you don't move them. They cut out images when you saccade so you don't see the room moving when it isn't. We are born with reflexes/memories that helped us survive when venturing into new environments, like needing to cough, sneeze, breathe. We pass them forcefully to our kids in "school" and at home to make clones.

Ghosting "pattern": If you saw "Building fell on house and crushed it", then later see "Tom fell on ?", you can see that the 2 blanks in "_ fell on _" are usually exact, similar, or different, so you predict Sally in this case. "Cat cat cat book book book loop loop loop sand ?" - this is a mirror of 3 sames; it uses a context (ex. 'fell on', or 'cat cat cat') that says "this one" goes with "this one".
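The "actually related words" learning described earlier (cat and dog are related because their contexts share predictions like ate/swallowed) can be sketched like this; the toy count table is my own made-up illustration, and real code would compare full contexts with hole/delay tolerance rather than single words:

```python
# Relatedness as shared predictions: two words are related in proportion
# to how much their next-word distributions overlap, after normalizing
# each word's counts so a less-experienced word isn't penalized.

NEXT = {   # made-up counts of what was seen to follow each word
    "cat": {"ate": 4, "slept": 3, "meowed": 2},
    "dog": {"ate": 5, "slept": 3, "barked": 4},
    "car": {"drove": 6, "parked": 3},
}

def normalize(counts):
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def relatedness(a, b):
    """Overlap of shared predictions out of all predictions each has."""
    pa, pb = normalize(NEXT[a]), normalize(NEXT[b])
    return sum(min(pa.get(w, 0.0), pb.get(w, 0.0)) for w in set(pa) | set(pb))

print(relatedness("cat", "dog"))   # high: they share ate and slept
print(relatedness("cat", "car"))   # zero: no shared predictions
```

With a relatedness score like this, translating "we can cure cancer by ?" into "they could solve tumors like ?" becomes a matter of swapping each word for a highly related one and checking the result still fits the surrounding context.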
"Cats are dogs. Hats but clothes. After god before. Look and ignore. Wind crane gust. jog cat ?." This is sames too, but if you look within one of them, only the first and last words go together - and they are similar, not exact - which may be a rule. A chitchat is a cat; an example sentence using that word (the rare one, chitchat, or whatever immediately followed "A _ is a ") is: Outside I found a cute chitchat eating cat food. You had to use chitchat at least once here. Please parrot me: I walked real fast, "says it", send me the food, "says it", etc. "Julie Kim Lee has a mom named Taylor um Alexa [Lee]". "super superman and spider spiderman and bat [batman]". "A word similar to love is: [hate]". "threw a ball to him, threw a game to him, threw a toy to him" - this is saying it can be any word in that slot. "What have you read recently?". "The pet cat ate food on its bed and cats love cats, wait ignore those, predict this sentence: The moon " - the code can notice that after the words 'wait ignore those', things don't prime across that point. On a blank 3x3 tic-tac-toe grid you can imaginarily poke each cell in various orders and get it right, since you know which are already poked, then flush them and start again. Group objects: trash bird cat kitten eagle junk = trash junk, cat kitten, eagle bird. Shown a man, goat, and lamp lined up upright, and beneath each itself upside down, it's primed to put the same beneath but upside down. "Can hammers break glass? Yes or No." - usually if something follows, a Yes comes next. If someone doesn't finish their sentence, we are unconfident about the new sequence, and ask "then what?".

Multi-Agent "pattern": Cloning yourself and making your 'me's do jobs you're too busy to do allows you to work in parallel.

The Cerebellum adjusts chosen motors "pattern": A removed cerebellum = non-smooth jitter, i.e. loss of smooth motor movement - the hand goes past the target object, skipping the action change, requiring many obvious readjustments to reach the target.
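The priming behavior above - closer occurrences weigh more, and a phrase like 'wait ignore those' stops anything before it from priming across - can be sketched as a decaying activation table (my own illustration; the decay rate and the reset-phrase matching are assumptions, not something GPT is known to do literally):

```python
# Priming sketch: each word seen adds activation for itself, older
# activations decay each step, and a reset phrase clears everything
# so earlier words can't prime predictions past that point.

def priming_weights(tokens, decay=0.8, reset_phrase=("wait", "ignore", "those")):
    weights = {}
    for i, tok in enumerate(tokens):
        if tuple(tokens[max(0, i - 2): i + 1]) == reset_phrase:
            weights.clear()          # nothing before the reset primes
            continue
        # decay every existing activation, then boost the current word
        weights = {w: v * decay for w, v in weights.items()}
        weights[tok] = weights.get(tok, 0.0) + 1.0
    return weights

text = "the pet cat ate food wait ignore those the moon".split()
print(priming_weights(text))
# cat/pet/food are gone (cleared at the reset); only "the" and "moon"
# remain, with "moon" (closer) weighted higher than "the".
```

A fuller version would also spread each boost to related words (leash priming dog), using a relatedness score like the one sketched earlier in this post.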
My idea is that you actually sense/move slowly, and during your chosen motor direction & speed, the cerebellum smoothly adjusts the speed (that you started with) toward where you want the target to end up (ex. quickly pinching both ears of a pig), because although the linked actions move the limbs roughly as much as they should, they are not perfect (there's error in recognition, prediction, the associated motor memory, and the actual body may be wobbly, etc.).

Goal Forcing, Taming: Now, see how, like Facebook's Blender chatbot and PPLM, OpenAI seems to have tried permanently forcing it to more likely say specific things: https://openai.com/blog/improving-language-model-behavior/

Here is what Blender looks like versus GPT (I made up the completions below, but from my experience they accurately demonstrate the difference):

Transformer (without personas):
Prompt: We need AGI
Completion: We need AGI so we can save the people. Humans have had a hard time on their own, and the future will be bright. I have been looking through the news, I saw clothing trends have been changing. I knew clothing trends would change. This is like that time I was sitting at home, I was just reading news casually and saw some girl get on stage before the people, I knew her. When I went home I saw my car in the driveway, I knew this was odd because......

Facebook's Blender chatbot (Personas: AGI, loves working hard, looking for help):
Prompt: We need AGI
Completion: We need AGI so we can save humans! Can you help me recruit others? I'm looking for others right now. I have invented a new solution, it is working. How is your work going? Any progress? Can you add it to my work booklet? I will be waiting for you ok? The next thing I'm working on for AGI is the sound cortex, can you help? It has been giving me trouble, so I'm looking into recruiting others, it is hard work, but I love AGI, so I love working hard. Need help, you there? Working hard here......
This is very powerful, it makes it come alive, sort of like something I made years ago (linked below) when I attempted to simulate a full baby learning to crawl by trying random movements, learning what moves it faster, then tweaking the learnt actions associated with the senses seen (and I learnt this is not the meat of AGI, due to it not being able to solve a wide range of contexts (planets, cellular cancer removal, etc.), so I quit the project). Motors have speed/rotation limits and randomly activate within those limits, ex. every 0.25 seconds. My 1-motor/legged Lego crawler and 4-motor/legged simulated spider crawler did learn to crawl; they probably needed to use sensory input and specific body parts from different semi-successful gaits.
https://www.youtube.com/watch?v=fdMr8mAAfHM
AI learning to crawl looks like this BTW: https://www.youtube.com/watch?v=iNL5-0_T1D0
Blender, and Blender v2, which checks the internet and apparently has real-time memory updating.

------------------------------------------
Artificial General Intelligence List: AGI
Permalink: https://agi.topicbox.com/groups/agi/Tac918c6cbe1faf24-Mc3694f0646fbc856b6c5c708
Delivery options: https://agi.topicbox.com/groups/agi/subscription
