# 393 UTF-8 Validation

## 393. [UTF-8 Validation](https://leetcode.com/problems/utf-8-validation/description/)

## 1. Question

A character in UTF8 can be from**1 to 4 bytes** long, subjected to the following rules:

1. For 1-byte character, the first bit is a 0, followed by its unicode code.
2. For n-bytes character, the first n-bits are all one's, the n+1 bit is 0, followed by n-1 bytes with most significant 2 bits being 10.

This is how the UTF-8 encoding would work:

```
   Char. number range  |        UTF-8 octet sequence
      (hexadecimal)    |              (binary)
   --------------------+---------------------------------------------
   0000 0000-0000 007F | 0xxxxxxx
   0000 0080-0000 07FF | 110xxxxx 10xxxxxx
   0000 0800-0000 FFFF | 1110xxxx 10xxxxxx 10xxxxxx
   0001 0000-0010 FFFF | 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
```

Given an array of integers representing the data, return whether it is a valid utf-8 encoding.

**Note:**\
The input is an array of integers. Only the **least significant 8 bits** of each integer is used to store the data. This means each integer represents only 1 byte of data.

**Example 1:**

```
data = [197, 130, 1], which represents the octet sequence: 11000101 10000010 00000001.

Return true.
It is a valid utf-8 encoding for a 2-bytes character followed by a 1-byte character.
```

**Example 2:**

```
data = [235, 140, 4], which represented the octet sequence: 11101011 10001100 00000100.

Return false.
The first 3 bits are all one's and the 4th bit is 0 means it is a 3-bytes character.
The next byte is a continuation byte which starts with 10 and that's correct.
But the second continuation byte does not start with 10, so it is invalid.
```

## 2. Implementation

**(1) Bit Manipulation**

思路: 这题基本就按照题目给的utf-8的特性去验证，我们用一个变量count检查接下来的有多少个byte需要判断其是否以10开头的。如果count是0的话，我们将data\[i]右移相应的位数，比如将其右移3位，判断右移后的数是否等于11110，如果是的话说明是一个4byte的character, 注意如果count是0，而一个byte是10000000的话，这不是valid的utf-8格式

```java
class Solution {
    public boolean validUtf8(int[] data) {
        int count = 0;

        for (int oneByte : data) {
            if (count == 0) {
                if ((oneByte >> 3) == 0b11110) {
                    count = 3;
                }
                else if ((oneByte >> 4) == 0b1110) {
                    count = 2;
                }
                else if ((oneByte >> 5) == 0b110) {
                    count = 1;
                }
                else if ((oneByte >> 7) == 0b1) {
                    return false;
                }
            }
            else {
                if ((oneByte >> 6) == 0b10) {
                    --count;
                }
                else {
                    return false;
                }
            }
        }
        return count == 0;
    }
}
```

## 3. Time & Space Complexity

**Bit Manipulation:** 时间复杂度O(N), 空间复杂度O(1)


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://protegejj.gitbook.io/algorithm-practice/google/393-utf-8-validation.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
