Invented by Carlos E. GUESTRIN, Leon A. GATYS, Shreyas V. JOSHI, Gustav M. LARSSON, Kory R. WATSON, Srikrishna Sridhar, Karla P. VEGA, Shawn R. Scully, Thorsten Gernoth, Onur C Hamsici, Apple Inc
Machine learning-assisted image prediction refers to the use of algorithms to analyze and predict the content of images. This technology has a wide range of applications across various industries, including healthcare, retail, automotive, and security.
In the healthcare industry, machine learning-assisted image prediction has revolutionized medical imaging. Algorithms can now accurately detect and diagnose diseases such as cancer, cardiovascular conditions, and neurological disorders. This has led to faster and more accurate diagnoses, ultimately improving patient outcomes.
In the retail sector, machine learning-assisted image prediction has transformed the way businesses operate. Algorithms can analyze customer images and predict their preferences, allowing retailers to offer personalized recommendations and targeted advertising. This not only enhances the customer experience but also increases sales and customer loyalty.
In the automotive industry, machine learning-assisted image prediction has enabled the development of advanced driver assistance systems (ADAS) and autonomous vehicles. Algorithms can analyze real-time images from cameras and sensors to detect objects, pedestrians, and road conditions. This technology plays a crucial role in ensuring the safety and efficiency of autonomous vehicles.
In the security sector, machine learning-assisted image prediction has enhanced surveillance and threat detection capabilities. Algorithms can analyze video footage and images to identify suspicious activities, recognize faces, and detect anomalies. This technology has proven to be invaluable in preventing crime and ensuring public safety.
The market for machine learning-assisted image prediction is expected to continue growing at a rapid pace. According to a report by MarketsandMarkets, the global image recognition market is projected to reach $38.9 billion by 2025, with a compound annual growth rate (CAGR) of 19.5% from 2020 to 2025.
The growth of this market can be attributed to several factors. Firstly, the increasing availability of large datasets has provided machine learning algorithms with more training data, leading to improved accuracy and performance. Secondly, advancements in hardware, such as graphics processing units (GPUs) and specialized chips, have enabled faster and more efficient image processing. Lastly, the growing demand for automation and efficiency across industries has fueled the adoption of machine learning-assisted image prediction technologies.
However, there are also challenges associated with this market. One of the main challenges is the need for high-quality and diverse datasets for training machine learning algorithms. Without sufficient and representative data, the accuracy and reliability of image predictions may be compromised. Additionally, there are concerns regarding privacy and data security, especially when it comes to analyzing personal images and videos.
In conclusion, the market for machine learning-assisted image prediction is experiencing significant growth and has transformative potential across various industries. As technology continues to advance and datasets become more abundant, the accuracy and capabilities of machine learning algorithms will continue to improve. This opens up new opportunities for businesses and organizations to leverage image prediction technology for enhanced decision-making, efficiency, and innovation.
The Apple Inc invention works as follows
A device that implements a system for providing predicted RGB images comprises at least one processor configured to obtain an infrared image of a subject and a reference RGB image of the subject. The at least one processor is further configured to provide the infrared image and the reference RGB image to a machine learning model. The machine learning model has been trained to output predicted RGB images based on infrared images and reference RGB images. The device then provides a predicted RGB image of the subject based on the output of the machine learning model.
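The data flow in this summary can be sketched in a few lines of Python. This is only an illustrative sketch, not the patented implementation: the type aliases, the `predict_rgb` function, the dimension check, and the `toy_model` stand-in are all hypothetical names and assumptions introduced here, and a real system would use a trained neural network in place of `toy_model`.

```python
from typing import Callable, List, Tuple

# Hypothetical image types, chosen only for illustration.
RGBPixel = Tuple[int, int, int]
InfraredImage = List[List[int]]      # one 8-bit intensity per pixel
RGBImage = List[List[RGBPixel]]      # one (r, g, b) triple per pixel

# Assumption: a "model" is any callable mapping (infrared, reference RGB) -> RGB.
Model = Callable[[InfraredImage, RGBImage], RGBImage]


def predict_rgb(infrared: InfraredImage, reference: RGBImage, model: Model) -> RGBImage:
    """Provide both inputs to the model and return its predicted RGB image."""
    # Assumption: both images are aligned and share the same dimensions.
    if len(infrared) != len(reference) or any(
        len(ir_row) != len(rgb_row) for ir_row, rgb_row in zip(infrared, reference)
    ):
        raise ValueError("infrared and reference RGB images must have the same size")
    return model(infrared, reference)


def toy_model(infrared: InfraredImage, reference: RGBImage) -> RGBImage:
    """Stand-in for a trained model: keeps the color ratios of the reference
    pixel and rescales them so the brightest channel matches the infrared
    intensity. A real model would be learned from data."""
    predicted = []
    for ir_row, rgb_row in zip(infrared, reference):
        out_row = []
        for intensity, pixel in zip(ir_row, rgb_row):
            scale = intensity / max(1, max(pixel))
            out_row.append(tuple(min(255, round(c * scale)) for c in pixel))
        predicted.append(out_row)
    return predicted


result = predict_rgb([[200]], [[(10, 5, 5)]], toy_model)
```

The point of the sketch is the interface: the infrared image supplies detail, the reference RGB image supplies color, and the model combines the two.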
Background for Machine Learning assisted image prediction
The image sensor of a device (e.g. a camera) can be used by the user to take photos or videos or to join audio-video conferences with other participants. The lighting environment can affect the quality of an image in some cases.
The detailed description below is intended as a description of various configurations of the subject technology and is not intended to represent the only configurations in which the technology can be practiced. The attached drawings form part of the detailed description. The detailed description includes specific details to provide a thorough understanding of the technology. However, the subject technology is not limited to the specific details set out here and can be practiced in other ways. In one or more implementations, structures and components are shown in block diagram form to avoid obscuring the concepts of the subject technology.
The image sensor of a device (e.g. a camera) can be used to take photos or videos or to join audio-video conferences with other participants. In some cases, image quality is affected by the lighting environment. In low-light settings, for example, the details of a face (or other subject) may not be clear.
Moreover, even though the device could edit RGB images in real time using general image processing methods (e.g. adjusting contrast, brightness, or color values using preset values), such methods may not provide the level of detail that users desire (e.g. with respect to skin tones, skin textures, or face shapes). It may therefore be desirable to improve image quality when the ambient lighting is poor.
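To make the limitation concrete, here is a minimal sketch of the kind of preset-based adjustment the paragraph above refers to. The function names and the preset values (`contrast=1.5`, `brightness=40`) are assumptions chosen for illustration; the source does not specify any particular formula.

```python
def adjust_channel(value: int, contrast: float, brightness: int) -> int:
    """Linear contrast/brightness adjustment of one 8-bit channel value.
    Contrast stretches values around mid-gray (128); brightness shifts them."""
    adjusted = contrast * (value - 128) + 128 + brightness
    return max(0, min(255, round(adjusted)))  # clamp to the 8-bit range


def adjust_image(image, contrast: float = 1.5, brightness: int = 40):
    """Apply the same preset adjustment to every channel of every pixel."""
    return [
        [tuple(adjust_channel(c, contrast, brightness) for c in pixel) for pixel in row]
        for row in image
    ]


dark_pixel_row = [[(20, 128, 250)]]
adjusted = adjust_image(dark_pixel_row)
```

Because the same fixed mapping is applied to every pixel, an adjustment like this can only rescale information that is already in the image, and values pushed past the 8-bit range are clipped. It cannot recover facial detail that the RGB sensor failed to capture in low light.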
The subject system predicts RGB images from infrared image data and reference RGB image data, using machine learning models trained on such data. A server can train the machine learning models using infrared images, reference RGB images, and target (i.e. expected) RGB images spanning multiple subjects, such as the faces of different people. The reference RGB data can be captured concurrently with the infrared data (e.g. in low light) or before the infrared data (e.g. in good lighting). Separate machine learning models can be generated and trained for these two types of reference RGB data. The server can then provide the trained machine learning models to devices for local storage.
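One way to picture the training data described above is as a set of (infrared, reference RGB, target RGB) triples, partitioned by how the reference image was captured so that each group can train its own model. The sketch below is an assumption about how such data might be organized; the `TrainingSample` class, its field names, and the `"concurrent"` / `"prior"` labels are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TrainingSample:
    infrared: list        # infrared image of the subject
    reference_rgb: list   # reference RGB image of the same subject
    target_rgb: list      # expected (well-lit) RGB image the model should predict
    reference_kind: str   # hypothetical label: "concurrent" or "prior"


def split_by_reference_kind(samples: List[TrainingSample]) -> Dict[str, List[TrainingSample]]:
    """Group samples so that a separate model can be trained per reference type."""
    groups: Dict[str, List[TrainingSample]] = {"concurrent": [], "prior": []}
    for sample in samples:
        if sample.reference_kind not in groups:
            raise ValueError(f"unknown reference kind: {sample.reference_kind!r}")
        groups[sample.reference_kind].append(sample)
    return groups


samples = [
    TrainingSample([[90]], [[(8, 6, 5)]], [[(180, 140, 120)]], "concurrent"),
    TrainingSample([[95]], [[(175, 138, 118)]], [[(182, 141, 121)]], "prior"),
    TrainingSample([[70]], [[(5, 4, 4)]], [[(150, 120, 100)]], "concurrent"),
]
groups = split_by_reference_kind(samples)
```

Each group would then be passed to its own training run, yielding one model for concurrently captured low-light references and one for references captured earlier in good lighting.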
When a camera on a device (e.g. a front-facing camera used for photos or audio-video conferences) is in use, the device can employ the machine learning models to predict RGB images (e.g. reconstructed RGB images with improved image quality). The device can also select which machine learning model to use in low light based on the amount of ambient light. In dimly lit environments where the light level is not too low, low-light RGB image data captured simultaneously with the infrared images can be provided, together with the infrared images, to a machine learning model that has been trained on low-light RGB images. If the light level is too low, RGB image data captured before the infrared images in good lighting can instead be provided to a second machine learning model, trained on such reference images, to predict RGB images.
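The selection logic in the paragraph above amounts to a threshold test on the ambient light level. The sketch below illustrates that idea under stated assumptions: the threshold values (50 and 5 lux), the `ModelChoice` names, and the choice to use no model at all in adequate light are hypothetical and are not taken from the patent.

```python
from enum import Enum


class ModelChoice(Enum):
    NONE = "none"                                   # assumption: adequate light, no prediction needed
    CONCURRENT_REFERENCE = "concurrent_reference"   # model trained on low-light RGB references
    PRIOR_REFERENCE = "prior_reference"             # model trained on earlier, well-lit references


# Hypothetical thresholds in lux; real values would be tuned per device.
DIM_LUX = 50.0
DARK_LUX = 5.0


def select_model(ambient_lux: float, dim_lux: float = DIM_LUX, dark_lux: float = DARK_LUX) -> ModelChoice:
    """Pick a model based on the ambient light sensor reading."""
    if ambient_lux < 0:
        raise ValueError("ambient light level cannot be negative")
    if dark_lux > dim_lux:
        raise ValueError("dark_lux must not exceed dim_lux")
    if ambient_lux >= dim_lux:
        return ModelChoice.NONE
    if ambient_lux >= dark_lux:
        # Dim but usable: the concurrently captured RGB frame still carries color.
        return ModelChoice.CONCURRENT_REFERENCE
    # Too dark: fall back to an RGB reference captured earlier in good lighting.
    return ModelChoice.PRIOR_REFERENCE


choices = [select_model(lux) for lux in (300.0, 20.0, 1.0)]
```

Keeping the thresholds as parameters makes it straightforward to tune them per device or per sensor without touching the selection logic.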
Not all of the depicted components may be used in all implementations, and some implementations may include components that are different from or additional to those shown in the figure. The arrangement and type of the components may be varied without departing from the spirit or scope of the claims. Additional components, different components, or fewer components may be provided.
The network environment 100 includes electronic devices 102, 103 and 104 (hereafter 102-104), a network 106, a server 108 and an image training database 110. The network 106 can communicatively couple (directly or indirectly), for instance, any two or more of the electronic devices 102-104, the server 108 and/or the image training database 110. In some implementations, the network 106 can be an interconnected network of devices that may include the Internet or be communicatively connected to it. The network environment 100 is shown in the figure for explanatory purposes.
One or more of the electronic devices 102-104 can be, for instance, a portable computing device such as a laptop computer, a smartphone, a smart speaker, a peripheral device (e.g. a digital camera or headphones), a tablet device, a wearable device such as a watch or a band, or any other suitable device that includes one or more wireless interfaces, such as WLAN radios, cellular radios, Bluetooth radios, Zigbee radios, near field communication (NFC) radios, and/or other wireless radios. Each of the electronic devices 102-104 may be, and/or may include all or part of, the electronic system discussed below with respect to FIG. 7.
The server 108 can be, and/or can include all or part of, the electronic system described below with respect to FIG. 7. The server 108 can include one or more servers, for example a cloud of servers. For explanatory purposes, a single server 108 is shown and discussed with respect to the various operations. However, these operations may be carried out by one or more servers, and each operation can be performed by the same server or by different servers. The image training database 110 and the server 108 are both shown in the figure; however, all or part of the image training database 110 can be stored locally with respect to the server 108.
In one or more implementations, the server 108 may be used to create, train, and/or update machine learning models that are configured to output predicted RGB images based on infrared image data and/or RGB image data. The machine learning models may, for example, be trained with infrared image data and/or reference RGB image data (e.g. captured concurrently with and/or before the infrared images) provided by the electronic devices 102-104.
As described in this document, there may be different interactions between the electronic devices 102-104, the server 108, and the image training database 110. One class of interaction involves uploading infrared image data and reference RGB image data to the image training database 110 so that the server 108 can train and update machine learning models based on this data. Another class of interaction involves the electronic devices 102-104 downloading machine learning models. These machine learning models may be downloaded, for example, as part of software updates (e.g. of an operating system or an application that uses the device's camera) or when updated machine learning models are made available by the server 108.
For explanatory purposes, the figure is described primarily with reference to the electronic device 102. However, it may correspond to any of the electronic devices 102-104 of FIG. 1. Not all of the depicted components may be used in all implementations, and some implementations may include different or additional components. The arrangement and type of the components may be varied without departing from the spirit or scope of the claims. Additional components, different components, or fewer components may be provided.
The electronic device 102 may include a processor 202, a memory 204, a communication interface 206, and one or more image sensors 208. The processor 202 can include logic, circuitry and/or code that enable processing data and/or controlling operations of the electronic device 102. The processor 202 can provide control signals to the other components of the electronic device 102 and can control data transfers between different parts of the electronic device 102. The processor 202 can also implement an operating system, or otherwise execute code that manages the operations of the electronic device 102.
The memory 204 can include circuitry and/or logic that allows storage of different types of data such as received data or generated data. It may also contain code and/or configuration data. Memory 204 can include random access memory, read-only memory, flash and/or magnetic storage.
The communication interface 206 can include logic, circuitry and/or code to enable wired or wireless communications, for example, between any of the electronic devices 102-104, the server 108 and/or the image training database 110. The communication interface 206 can include, for instance, one or more of a Bluetooth communication interface, a cellular interface, an NFC interface, a WLAN interface, a USB communication interface, a Zigbee interface, or other communication interfaces.
The image sensors 208 can be used to capture images of subjects (e.g. the face of a person). The image sensors 208 can correspond to RGB image sensors and/or infrared image sensors. The image data captured by the image sensor(s) 208 may indicate color, depth, and/or 2D or 3D characteristics. The electronic device 102 can also be equipped with an ambient light sensor that detects the ambient light level in the current environment.
In one or more implementations, one or more of the processor 202 and the memory 204 may be implemented in software (e.g. subroutines or code), in hardware (e.g. an Application Specific Integrated Circuit, a Field Programmable Gate Array, a Programmable Logic Device, a controller, a state machine, gated logic, discrete hardware components, or other suitable devices), or in a combination of both.
For explanatory purposes, the following example images are described primarily with reference to the electronic device 102. However, the description may apply to any of the electronic devices 102-104 of FIG. 1.
As noted above, the image quality of an electronic device camera (e.g. a front-facing camera) can vary depending on lighting conditions. In dimly lit environments, image quality can degrade to the point that the image is unsuitable for the user. In the example shown in the figure, the RGB image 302 corresponds to an RGB image captured by an RGB image sensor of the electronic device 102 (e.g. one of the image sensor(s) 208) in a low-light environment. The image quality of the RGB image 302 is poor, and facial features are hard to discern. However, a faint semblance of the subject is still present in the RGB image 302.
The infrared image 304 may have been captured in the same dimly lit environment using an infrared image sensor of the electronic device 102 (e.g. one of the image sensor(s) 208). Compared with the RGB image 302, the infrared image 304 provides better visibility of facial features such as the eyes, nose, and mouth. However, the infrared image 304 lacks color and may look unnatural to the user.