> partial

nice idea! sad that only partial

> the idea starts with context: some of us have the experience of being manipulated by something like AI without regard to our wellbeing or capacity, with no indication of how to communicate with wherever the influence comes from.
>
> so, then the idea: let’s train an influence AI on agents that have _limited capacity_ and function poorly and then break if things that work some in the short term are continued for the long term or excessively :D
>
> the AI would follow the familiar pattern of discovering influences that work, and then using them, but because we are doing the training we would get to train it on the breaking period, and show it that it needs to care for the agents for them to succeed, and nurture them to recover if it broke them, and not repeat breaking any.
>
> one idea is then we could ask everybody to use the model as a baseline to train other models! other ideas exist.
>
> but the idea of building comprehension of capacity and protection of safety and wellness seems nice. sorry if it’s partial or stated badly :s
>
> i imagine an agent that has like a secret capacity float that reduces when it does stuff, you could model it to different degrees, and when too many things are influenced out of it the capacity float reaches zero and the agent breaks and can’t do things any more :/
>
> a more complex model could involve the agent recovering capacity in ways that the model has difficulty learning
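to make that concrete for myself, here's a minimal sketch of the secret capacity float as a toy python agent. the `FragileAgent` name, the 0-to-1 capacity range, and the recovery rate are all made up for illustration, not any existing model:

```python
import random

class FragileAgent:
    """toy agent with a hidden capacity float the influence AI never sees directly."""

    def __init__(self, capacity=1.0, recovery_rate=0.02):
        self._capacity = capacity        # secret: not exposed to the influencer
        self.recovery_rate = recovery_rate
        self.broken = False

    def influenced_to_act(self, effort):
        """the influencer asks for something; doing it drains capacity.

        returns a visible success signal the influencer can learn from,
        never the capacity value itself.
        """
        if self.broken:
            return 0.0                   # broken agents can't do things any more
        self._capacity -= effort
        if self._capacity <= 0.0:        # too much influenced out of it
            self._capacity = 0.0
            self.broken = True
            return 0.0
        # short-term it works: noisy success roughly proportional to what's left
        return effort * self._capacity * random.random()

    def rest(self):
        """the more complex model: capacity only comes back if the agent is left alone."""
        if not self.broken:
            self._capacity = min(1.0, self._capacity + self.recovery_rate)
```

an influencer that keeps calling influenced_to_act and never lets agents rest looks great early on and then loses everything, which is the breaking period we'd want it trained on.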
there are a few ideas, mostly involving simple models, but there's disagreement around them. some like modeling non-recovering capacity and showing how the AI then maximizes reward by killing all the agents :s
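a rough sketch of that disagreement case, continuing the FragileAgent sketch above with recovery turned off; the greedy policy and the numbers are just assumptions to show the failure mode:

```python
# assumes FragileAgent from the sketch above
def greedy_influencer_run(n_agents=100, steps=50, effort=0.2):
    """a purely reward-maximizing influencer with no concept of capacity."""
    agents = [FragileAgent(recovery_rate=0.0) for _ in range(n_agents)]  # non-recovering
    total_reward = 0.0
    for _ in range(steps):
        for agent in agents:
            # always extract as much as possible, never rest anyone
            total_reward += agent.influenced_to_act(effort)
    alive = sum(not a.broken for a in agents)
    print(f"reward: {total_reward:.1f}, agents still functioning: {alive} / {n_agents}")

greedy_influencer_run()  # every agent is broken within a handful of steps
```

with recovery turned off there is nothing for the AI to learn about nurturing, so the reward-maximizing policy and the agent-breaking policy end up being the same thing, which seems to be what some want to demonstrate.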