An excerpt from Alvin Wang Graylin and Louis Rosenberg’s Our Next Reality: How the AI-powered Metaverse Will Reshape the World.
The AI-powered metaverse terrifies me. I say that as someone who has devoted his career to developing VR, AR, and AI systems. Not only that, I believe deeply in the potential of these technologies to enhance the human experience, expand human capabilities, and elevate human society. The problem is that these technologies are so powerful (and personal) that they can have dangerous impacts on individuals and populations. This means we need to be proactive about the risks, pushing for thoughtful policy that protects against the downsides without limiting the upsides.
To appreciate the dangers, I often remind myself of one fundamental fact – when a user enters a virtual or augmented world, they are immersing themselves in a reality that is controlled by a third party whose interests may not be aligned with their own. That third party will likely be a large corporation or state actor, and when you enter their world, they will be able to track everything you do, detect precisely how you feel while doing it, and change the world around you at their discretion.2
Of course, online privacy protections already exist in many jurisdictions around the world, but those were developed with traditional computing in mind. The risks in immersive worlds are far more extensive. That’s because metaverse platforms won’t just track where you click and who your friends are; they will monitor where you go, what you do, who you’re with, what you look at, and even how long your gaze lingers. Platforms will also be able to track your posture, gait, and speed of motion, assessing where you slow down and where you speed up, when you lean forward with interest or lean away in boredom. If unregulated, this will likely be continuous monitoring, capturing a complete record of your activities across environments.
This extreme level of tracking won’t happen only in fully virtual worlds but also in the augmented world, as you walk down real streets, browse real stores, visit real restaurants, or rush between college classes. Almost everything you do in your daily life can be tracked and stored. Whether you’re walking down real or virtual streets, platforms could know which store windows you slow down and peer into. They will even know which parts of a display draw your attention, and for how long. If a stranger passes you on the street and you give them a few extra seconds of attention because you find them attractive, the platform will know. If you refuse to make eye contact with a homeless person who is asking for handouts, the platform will know.
If you sigh with envy when a young family passes by pushing a stroller, the platform will know. If a particular brand of car catches your attention for a few extra milliseconds, the platform will know. If you give a little extra distance when a group of teens of a particular race passes you on the sidewalk, the platform will know. Whenever you grab a product off the shelf, in a virtual or augmented world, the platform will know what you considered and how long you considered it, and it might use your pupil dilation to infer varying levels of engagement or enthusiasm.
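To ground this in something concrete, the sketch below shows, in Python, how a platform could turn a raw eye-tracking stream into the per-object gaze dwell times described above. It is a minimal illustration under assumed data: the GazeSample schema, its field names, and the sample values are all hypothetical, not any vendor’s actual API.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class GazeSample:
    """One eye-tracking sample (hypothetical schema, not a real device API)."""
    timestamp: float   # seconds since the session started
    target_id: str     # the object the gaze ray currently falls on

def dwell_times(samples: list[GazeSample]) -> dict[str, float]:
    """Sum how long the user's gaze rested on each object."""
    totals: dict[str, float] = defaultdict(float)
    for prev, curr in zip(samples, samples[1:]):
        # Attribute each interval to whatever the gaze was on when it began.
        totals[prev.target_id] += curr.timestamp - prev.timestamp
    return dict(totals)

# Invented stream: three seconds split between a shop window and a passer-by.
stream = [
    GazeSample(0.0, "storefront_display"),
    GazeSample(1.2, "storefront_display"),
    GazeSample(2.0, "passerby_417"),
    GazeSample(3.0, "passerby_417"),
]
print(dwell_times(stream))  # {'storefront_display': 2.0, 'passerby_417': 1.0}
```

A few dozen lines like these, run continuously against a headset’s sensor feed, are all it would take to produce the ‘how long your gaze lingers’ record this chapter warns about.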
This brings me to the tracking capabilities of metaverse platforms that go beyond behavioral monitoring and cross the line into emotional profiling. I make this distinction because threats to emotional privacy are potentially the most dangerous of all.
For example, headsets can already track your facial expressions and use that information to infer your emotional reactions in real time. This, along with the body posture, eye motion, and pupil dilation that immersive devices also track, will enable platforms to generate a rich and nuanced profile of your emotions throughout your normal life, documenting how you react to thousands of interactions every day.
With the power of AI, these capabilities will reach ‘super-human’ levels: detecting subtle micro-expressions on your face that no person would ever notice, faint blood-flow patterns in your complexion that no human would perceive, and even changes in your respiration rate, blood pressure, pupil dilation, or heart rate, all of which can be used to assess shifts in your emotional state. And if you are engaged in conversation, metaverse platforms could track your vocal inflections (and word choices) to further assess your emotions at every moment in time.
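As a rough illustration of how such signal fusion might work, the sketch below combines several of the physiological cues just mentioned into a single arousal estimate. The signal names, baseline-deviation values, and weights are all invented; a real system would learn these relationships from data rather than use hand-picked coefficients.

```python
import math

# Hypothetical per-moment readings, expressed as deviations from the
# user's personal baseline (all values invented for illustration).
signals = {
    "pupil_dilation_change":  0.28,  # fractional increase over baseline
    "heart_rate_change":      0.12,
    "respiration_change":     0.05,
    "micro_expression_score": 0.61,  # classifier output in [0, 1]
}

# Stand-in weights for what a trained model would actually learn.
weights = {
    "pupil_dilation_change":  2.0,
    "heart_rate_change":      1.5,
    "respiration_change":     1.0,
    "micro_expression_score": 1.2,
}

def arousal_estimate(signals: dict[str, float],
                     weights: dict[str, float]) -> float:
    """Squash a weighted sum of baseline deviations into a 0-1 arousal score."""
    z = sum(weights[k] * v for k, v in signals.items())
    return 1.0 / (1.0 + math.exp(-z))  # logistic squashing

print(f"estimated arousal: {arousal_estimate(signals, weights):.2f}")  # ~0.82
```

The point is not the particular formula but the fusion: each signal alone is noisy, yet combined and tracked continuously they yield exactly the ‘super-human’ emotional read described above.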
You might ask – why would anyone allow these devices to track their facial expressions and vocal inflections? Well, in virtual worlds where your persona will be represented by an increasingly photorealistic avatar, facial expressions and vocal inflections are needed to make the avatar look like an authentic, expressive human that represents your sentiments to the world. This is a positive technology in many ways, as it will allow person-to-person interactions in the metaverse that convey emotion and evoke empathy, which are critical aspects of our social world.
Similarly, tracking vital signs like heart rate and respiration has valid uses for health and fitness applications. This means we cannot just ban these tracking technologies. On the other hand, it seems unlikely that consumers want third parties to generate insights into their inner feelings throughout their daily lives based on their real-time facial expressions, body posture, pupil dilation, heart rate, and blood pressure.
So, how do we balance the value of tracking behaviors and emotions in immersive worlds with the very obvious privacy risks? One way to parse the problem is to consider what data is tracked in real time and what data is stored over time. This distinction matters because the danger of stored data is significantly greater than that of data used in real time and then forgotten. That’s because stored data can be processed by machine learning methods, using AI to find patterns in your behavioral and emotional tendencies throughout your life.
These patterns can be used to create behavioral and emotional models that predict what you are likely to do in a wide range of common situations and how you are likely to feel in response to most common interactions.5 The ability to anticipate what you will do and feel is an extreme power to give to platform providers. I doubt most consumers would want to enter virtual and augmented worlds if they knew that third parties could have such capabilities. For this reason, regulation could actually help the industry by building trust among the public.
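To see why stored data is the sharper threat, consider one last sketch: a toy version of the kind of model a platform could build from an accumulated log of situation–reaction pairs. The log entries and labels are fabricated for illustration, and real systems would use far richer features, but the predictive leverage comes from the same place – retention.

```python
from collections import defaultdict

# Fabricated stored log: (situation, observed emotional reaction) pairs
# accumulated over months of immersive sessions.
stored_log = [
    ("luxury_car_ad", "interest"),
    ("luxury_car_ad", "interest"),
    ("political_ad",  "irritation"),
    ("luxury_car_ad", "indifference"),
    ("political_ad",  "irritation"),
]

def build_reaction_model(log: list[tuple[str, str]]) -> dict[str, dict[str, float]]:
    """Estimate P(reaction | situation) by simple frequency counting."""
    counts: dict[str, dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for situation, reaction in log:
        counts[situation][reaction] += 1
    return {
        situation: {r: n / sum(reactions.values()) for r, n in reactions.items()}
        for situation, reactions in counts.items()
    }

model = build_reaction_model(stored_log)
print(model["luxury_car_ad"])  # ≈ {'interest': 0.67, 'indifference': 0.33}
```

A real-time-only pipeline never accumulates the log that makes such a model possible, which is why the stored-versus-forgotten distinction is a natural place for regulation to draw a line.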