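This article describes the security risks faced by applications built with Netty and Netty's SSL security features, and walks through the relevant Netty source code.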

Security risks facing Netty

The following sections describe the security challenges Netty faces in its typical application scenarios.

Scenario: internal use

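This scenario covers an RPC communication framework used only internally. As the business grows and the site scales up, the traditional MVC-based vertical architecture can no longer keep up with rapid business development. Data and business logic need to be split horizontally, and an RPC-based distributed service framework becomes the best choice.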

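After the horizontal split, the internal modules need high-performance communication with each other. Traditional synchronous, blocking communication based on RMI and Hessian can no longer meet the performance and reliability requirements, so a high-performance NIO framework becomes the cornerstone of the distributed service framework. In a high-performance RPC framework, modules usually communicate over long-lived connections and use heartbeat detection to keep the links reliable.

The RPC framework is normally used only between internal modules and runs inside a trusted internal security domain. It does not expose interfaces directly to the outside. It therefore needs no handshake authentication, no black/whitelists, no SSL/TLS, and so on. As the Chinese saying goes, it "guards against gentlemen, not against villains". In this scenario, Netty's security relies on system-level measures such as the enterprise firewall and a hardened operating system, and Netty itself needs no additional security protection.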

Scenario: use by third parties

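This scenario covers a communication framework that is open to third parties. Suppose Netty is used to build an RPC framework or a private protocol stack that is opened to untrusted third parties, for example to expose some internal capabilities as services. In that case security authentication is required. If the service listens on a public IP and has very high security requirements, as with online payment or ordering, the communication must go through SSL/TLS.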

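Calls to the interfaces of a communication framework opened to third parties fall into three scenarios:

Services inside the enterprise intranet that are called by other internal modules: these usually need neither security authentication nor SSL/TLS transport.
Services inside the enterprise intranet that are called by external modules: these usually require security authentication, such as IP black/whitelists or a login handshake. Once authentication succeeds, the two sides communicate over ordinary sockets. If authentication fails, the client connection is rejected.
Services opened to third-party applications outside the enterprise: these usually listen on a public IP (typically the firewall's IP address). Third-party callers are hard or impossible to supervise effectively, so these applications are in practice untrusted. To counter this risk, sensitive services usually need SSL/TLS for secure transport.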

Scenario: security of application-layer protocols

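As a high-performance, asynchronous, event-driven NIO framework, Netty is well suited to building application-layer protocols on top of it, as illustrated in the figure below.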

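Most application-layer protocols are public, so the underlying Netty layer must provide secure transport at the communication layer to the layers above it. In other words, it must support SSL/TLS. The JDK security library provides javax.net.ssl.SSLSocket and javax.net.ssl.SSLServerSocket for SSL/TLS secure transport, but these are blocking sockets. For non-blocking NIO socket communication, the JDK offers only the low-level SSLEngine and no ready-made library that simplifies development.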

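Netty builds on the JDK's SSLEngine and provides SSL/TLS secure transport in the form of SslHandler. This greatly reduces the effort and difficulty of development for users. For the HTTP codec that Netty provides out of the box, Netty likewise uses SslHandler to support HTTPS.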
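To make the relationship concrete, here is a minimal JDK-only sketch (no Netty dependency) of the kind of SSLEngine that SslHandler drives. It assumes a context initialised with JDK defaults. The class name SslEngineBasics and the peer hints "example.com"/443 are illustrative placeholders. The point is that the engine owns no socket and only converts bytes between buffers, which is what lets Netty plug it into a non-blocking pipeline.

```java
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLEngine;
import javax.net.ssl.SSLSession;

// Sketch: the JDK SSLEngine that SslHandler builds on. It owns no socket and only
// transforms bytes between ByteBuffers through wrap() and unwrap().
final class SslEngineBasics {

    // Creates a client-mode engine. Null arguments to init(): provider defaults are used
    // for trust managers and randomness, and no key managers are supplied.
    // "example.com" and 443 are placeholder peer hints; nothing is connected.
    static SSLEngine newClientEngine() throws Exception {
        SSLContext context = SSLContext.getInstance("TLS");
        context.init(null, null, null);
        SSLEngine engine = context.createSSLEngine("example.com", 443);
        engine.setUseClientMode(true);
        return engine;
    }

    public static void main(String[] args) throws Exception {
        SSLSession session = newClientEngine().getSession();
        // Network-side buffers must also hold record headers, MACs and padding,
        // so they are sized separately from plaintext (application) buffers.
        System.out.println("packet buffer size:      " + session.getPacketBufferSize());
        System.out.println("application buffer size: " + session.getApplicationBufferSize());
    }
}
```

SslHandler's job is essentially to move bytes between the Netty pipeline and such an engine's wrap()/unwrap() calls.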

Netty SSL security features

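Netty provides SSL support through SslHandler. The protocol types it supports include SSL V2, SSL V3, and TLS, although the versions actually available depend on the underlying SSLEngine implementation. SSL V2 and SSL V3 are obsolete and insecure, and modern JDKs disable SSL V3 by default.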
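One way to keep the legacy protocols out is to restrict the engine's enabled protocols. The sketch below uses only the JDK. The allow-list (TLSv1.3, TLSv1.2) and the class name ProtocolPolicy are assumptions for illustration, not something this article prescribes. Netty's SslContextBuilder also exposes a protocols(...) setter for the same purpose.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLEngine;

final class ProtocolPolicy {

    // Assumed policy: the only protocol versions we are willing to negotiate, newest first.
    private static final List<String> ALLOWED = Arrays.asList("TLSv1.3", "TLSv1.2");

    // Enables only the allowed protocols that this JDK's engine actually supports.
    static void restrictProtocols(SSLEngine engine) {
        List<String> supported = Arrays.asList(engine.getSupportedProtocols());
        List<String> enabled = new ArrayList<>();
        for (String protocol : ALLOWED) {
            if (supported.contains(protocol)) {
                enabled.add(protocol);
            }
        }
        if (enabled.isEmpty()) {
            throw new IllegalStateException("no acceptable TLS version is supported");
        }
        engine.setEnabledProtocols(enabled.toArray(new String[0]));
    }

    public static void main(String[] args) throws Exception {
        SSLContext context = SSLContext.getInstance("TLS");
        context.init(null, null, null);
        SSLEngine engine = context.createSSLEngine();
        System.out.println("supported: " + Arrays.toString(engine.getSupportedProtocols()));
        restrictProtocols(engine);
        System.out.println("enabled:   " + Arrays.toString(engine.getEnabledProtocols()));
    }
}
```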

SSL one-way authentication

In one-way authentication, the client verifies that the server is legitimate, but the server does not verify the client.

How one-way authentication works

SSL two-way authentication

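Unlike one-way authentication, two-way authentication also requires the server to authenticate the client. This means the client's self-signed certificate must also be imported into the server's certificate store.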

How two-way authentication works

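Compared with one-way authentication, SSL two-way authentication adds one step. The server sends a certificate request message to the client, and the client sends its self-signed certificate to the server for authentication.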
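At the level of the JDK SSLEngine, the difference between the two modes comes down to one server-side flag. Below is a minimal sketch under stated assumptions. AuthModes is a hypothetical class, and the context is deliberately initialised without key or trust material, so it shows configuration only and cannot complete a real handshake. In Netty, the corresponding setting is SslContextBuilder.clientAuth(ClientAuth.REQUIRE).

```java
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLEngine;

final class AuthModes {

    // Server-side engine for one-way (mutualAuth = false) or two-way (mutualAuth = true)
    // authentication. A real server context would be initialised with a KeyManager holding
    // the server certificate and, for two-way auth, a TrustManager that trusts the client
    // certificates. Both are omitted here so the sketch runs without key files.
    static SSLEngine newServerEngine(boolean mutualAuth) throws Exception {
        SSLContext context = SSLContext.getInstance("TLS");
        context.init(null, null, null);
        SSLEngine engine = context.createSSLEngine();
        engine.setUseClientMode(false);
        // true: the server requests a client certificate and stops the handshake
        // if the client does not authenticate itself.
        engine.setNeedClientAuth(mutualAuth);
        return engine;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("one-way needClientAuth = " + newServerEngine(false).getNeedClientAuth());
        System.out.println("two-way needClientAuth = " + newServerEngine(true).getNeedClientAuth());
    }
}
```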

Third-party CA authentication

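Digital certificates generated with the JDK keytool are self-signed. A self-signed certificate can only prove that it is intact and has not been tampered with. It cannot prove who the certificate belongs to. To authenticate self-signed certificates, every client and server must exchange their own self-signed certificates, which is a very large amount of work for a large website or application server.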

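With self-signed SSL two-way authentication, whenever a client or server changes its key and certificate, the certificates must be re-signed and exchanged again. This makes debugging and maintenance very costly. Commercial systems therefore usually rely on a third-party certificate authority (CA) to sign and verify certificates. Browsers, for example, ship with a set of commonly used root CA certificates (CA_ROOT). When the browser connects to a website, the site's certificate passes verification as long as it has been signed by one of these CA_ROOTs.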
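The JDK has its own counterpart of the browser's CA_ROOT list: a default trust store of well-known root CAs. Here is a minimal sketch, assuming the default TrustManagerFactory algorithm and no custom javax.net.ssl.trustStore setting. DefaultTrust is an illustrative name. A server certificate issued by one of these CAs would then be accepted without any manual certificate exchange.

```java
import java.security.KeyStore;
import javax.net.ssl.TrustManager;
import javax.net.ssl.TrustManagerFactory;
import javax.net.ssl.X509TrustManager;

final class DefaultTrust {

    // Passing a null KeyStore makes the factory fall back to the JDK's default trust
    // store (normally the bundled cacerts file of well-known root CAs).
    static X509TrustManager defaultTrustManager() throws Exception {
        TrustManagerFactory factory =
                TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
        factory.init((KeyStore) null);
        for (TrustManager manager : factory.getTrustManagers()) {
            if (manager instanceof X509TrustManager) {
                return (X509TrustManager) manager;
            }
        }
        throw new IllegalStateException("no X509TrustManager available");
    }

    public static void main(String[] args) throws Exception {
        X509TrustManager trustManager = defaultTrustManager();
        // The count depends on the JDK distribution and its cacerts file.
        System.out.println("trusted root CAs: " + trustManager.getAcceptedIssuers().length);
    }
}
```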

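CA certificate services are usually paid. Many certification authorities in China offer them, and you can obtain certificates from these commercial organizations if needed. You can also create your own CA with OpenSSL.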

Netty SSL source code analysis

Client

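Once the TCP connection between the client and the server is established, SslHandler's channelActive is triggered. The SSL client then initiates the handshake through the SSL engine. The code is as follows: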

private void handshake() {
    // Begin handshake.
    final ChannelHandlerContext ctx = this.ctx;
    try {
        engine.beginHandshake();
        wrapNonAppData(ctx, false);
    } catch (Throwable e) {
        setHandshakeFailure(ctx, e);
    } finally {
        forceFlush(ctx);
    }
}
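The same first step can be reproduced with the plain JDK SSLEngine, outside Netty. This sketch only approximates handshake() plus the first pass of wrapNonAppData() and is not Netty's code. ClientHelloSketch and the placeholder peer "example.com"/443 are assumptions. The sketch begins a client handshake, wraps an empty buffer, and returns the TLS record the engine produced.

```java
import java.nio.ByteBuffer;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLEngine;
import javax.net.ssl.SSLEngineResult;

final class ClientHelloSketch {

    // Approximates handshake() followed by one pass of wrapNonAppData():
    // start the handshake, then wrap an EMPTY source buffer so the engine
    // emits its own handshake bytes (the ClientHello) into `out`.
    static ByteBuffer produceClientHello() throws Exception {
        SSLContext context = SSLContext.getInstance("TLS");
        context.init(null, null, null);
        SSLEngine engine = context.createSSLEngine("example.com", 443); // placeholder peer
        engine.setUseClientMode(true);
        engine.beginHandshake();

        ByteBuffer out = ByteBuffer.allocate(engine.getSession().getPacketBufferSize());
        SSLEngineResult result = engine.wrap(ByteBuffer.allocate(0), out);
        // Typically NEED_UNWRAP next: the client now waits for the server's reply.
        System.out.println("status=" + result.getStatus()
                + ", handshakeStatus=" + result.getHandshakeStatus()
                + ", bytesProduced=" + result.bytesProduced());
        out.flip(); // switch from filling to draining, ready to be written to a socket
        return out;
    }

    public static void main(String[] args) throws Exception {
        ByteBuffer hello = produceClientHello();
        // Byte 0 of a TLS record is the content type; 22 (0x16) means "handshake".
        System.out.println("record content type = " + hello.get(0) + ", length = " + hello.remaining());
    }
}
```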

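After the handshake is initiated, the handshake message created by the SSLEngine must be SSL-encoded and sent to the server. wrapNonAppData is therefore called immediately after beginHandshake. The method is analyzed below: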

private boolean wrapNonAppData(final ChannelHandlerContext ctx, boolean inUnwrap) throws SSLException {
    ByteBuf out = null;
    ByteBufAllocator alloc = ctx.alloc();
    try {
        // Only continue to loop if the handler was not removed in the meantime.
        // See https://github.com/netty/netty/issues/5860
        outer: while (!ctx.isRemoved()) {
            if (out == null) {
                // As this is called for the handshake we have no real idea how big the buffer needs to be.
                // That said 2048 should give us enough room to include everything like ALPN / NPN data.
                // If this is not enough we will increase the buffer in wrap(...).
                out = allocateOutNetBuf(ctx, 2048, 1);
            }
            SSLEngineResult result = wrap(alloc, engine, Unpooled.EMPTY_BUFFER, out);

            if (result.bytesProduced() > 0) {
                ctx.write(out).addListener(new ChannelFutureListener() {
                    @Override
                    public void operationComplete(ChannelFuture future) {
                        Throwable cause = future.cause();
                        if (cause != null) {
                            setHandshakeFailureTransportFailure(ctx, cause);
                        }
                    }
                });
                if (inUnwrap) {
                    needsFlush = true;
                }
                out = null;
            }

            HandshakeStatus status = result.getHandshakeStatus();
            switch (status) {
                case FINISHED:
                    setHandshakeSuccess();
                    return false;
                case NEED_TASK:
                    if (!runDelegatedTasks(inUnwrap)) {
                        // We scheduled a task on the delegatingTaskExecutor, so stop processing as we will
                        // resume once the task completes.
                        break outer;
                    }
                    break;
                case NEED_UNWRAP:
                    if (inUnwrap) {
                        // If we asked for a wrap, the engine requested an unwrap, and we are in unwrap there is
                        // no use in trying to call wrap again because we have already attempted (or will after we
                        // return) to feed more data to the engine.
                        return false;
                    }

                    unwrapNonAppData(ctx);
                    break;
                case NEED_WRAP:
                    break;
                case NOT_HANDSHAKING:
                    setHandshakeSuccessIfStillHandshaking();
                    // Workaround for TLS False Start problem reported at:
                    // https://github.com/netty/netty/issues/1108#issuecomment-14266970
                    if (!inUnwrap) {
                        unwrapNonAppData(ctx);
                    }
                    return true;
                default:
                    throw new IllegalStateException("Unknown handshake status: " + result.getHandshakeStatus());
            }

            // Check if did not produce any bytes and if so break out of the loop, but only if we did not process
            // a task as last action. It's fine to not produce any data as part of executing a task.
            if (result.bytesProduced() == 0 && status != HandshakeStatus.NEED_TASK) {
                break;
            }

            // It should not consume empty buffers when it is not handshaking
            // Fix for Android, where it was encrypting empty buffers even when not handshaking
            if (result.bytesConsumed() == 0 && result.getHandshakeStatus() == HandshakeStatus.NOT_HANDSHAKING) {
                break;
            }
        }
    }  finally {
        if (out != null) {
            out.release();
        }
    }
    return false;
}

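The method first checks the encoding result. If more than 0 bytes were produced, the encoded data is written to the server. The temporary buffer out is then set to null because its ownership has passed to the write; the finally block releases out only if it was never written. Next, the method inspects the SSL engine's handshake status, whose values are defined as follows: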

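FINISHED: the SSLEngine has just finished the handshake.
NEED_TASK: the SSLEngine needs the results of one (or more) delegated tasks before the handshake can continue.
NEED_UNWRAP: the SSLEngine must receive data from the remote side before the handshake can continue, so SSLEngine.unwrap() should be called.
NEED_WRAP: the SSLEngine must send data to the remote side before the handshake can continue, so SSLEngine.wrap() should be called.
NOT_HANDSHAKING: the SSLEngine is not currently handshaking.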
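These statuses reduce to a small decision table. The following hypothetical helper (HandshakeStep is not part of Netty or the JDK) only summarises how a wrap/unwrap loop such as wrapNonAppData() reacts to each value. It performs no I/O itself.

```java
import javax.net.ssl.SSLEngineResult.HandshakeStatus;

final class HandshakeStep {

    // Maps each handshake status to the action a wrap/unwrap loop takes next.
    static String nextAction(HandshakeStatus status) {
        switch (status) {
            case FINISHED:
                return "mark the handshake successful and notify listeners";
            case NEED_TASK:
                return "run the delegated tasks, then resume";
            case NEED_UNWRAP:
                return "read more bytes from the peer and call unwrap()";
            case NEED_WRAP:
                return "call wrap() again to produce bytes for the peer";
            case NOT_HANDSHAKING:
                return "no handshake in progress: exchange application data";
            default:
                // e.g. NEED_UNWRAP_AGAIN, added in Java 9 for DTLS
                return "unsupported status: " + status;
        }
    }

    public static void main(String[] args) {
        for (HandshakeStatus status : HandshakeStatus.values()) {
            System.out.println(status + " -> " + nextAction(status));
        }
    }
}
```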

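If the handshake succeeds, handshakePromise is completed successfully. SslHandshakeCompletionEvent.SUCCESS is then fired through the pipeline as a user event so that interested handlers are notified. The code is as follows: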

private void setHandshakeSuccess() {
    handshakePromise.trySuccess(ctx.channel());
    
    ctx.fireUserEventTriggered(SslHandshakeCompletionEvent.SUCCESS);

    if (readDuringHandshake && !ctx.channel().config().isAutoRead()) {
        readDuringHandshake = false;
        ctx.read();
    }
}
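Because handshakePromise is completed with trySuccess, only the first outcome of the handshake is recorded. Below is a minimal sketch of that one-shot behaviour. It uses the JDK's CompletableFuture as an assumed stand-in for Netty's Promise, and HandshakeOutcome is a hypothetical name.

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical stand-in for handshakePromise: only the first completion counts,
// and later attempts return false instead of overwriting the result.
final class HandshakeOutcome {
    private final CompletableFuture<String> promise = new CompletableFuture<>();

    boolean trySuccess(String channelId) {
        return promise.complete(channelId);
    }

    boolean tryFailure(Throwable cause) {
        return promise.completeExceptionally(cause);
    }

    CompletableFuture<String> future() {
        return promise;
    }

    public static void main(String[] args) {
        HandshakeOutcome outcome = new HandshakeOutcome();
        outcome.future().thenAccept(id -> System.out.println("handshake done on " + id));
        System.out.println("first success accepted: " + outcome.trySuccess("channel-1"));
        System.out.println("late failure accepted:  " + outcome.tryFailure(new RuntimeException("late")));
    }
}
```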

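NEED_TASK means the engine must run delegated tasks, which may be time-consuming, before the handshake can continue. Netty hands these tasks to delegatedTaskExecutor. If that executor is ImmediateExecutor.INSTANCE or the current event loop, the tasks run directly on the I/O thread and the loop continues. Otherwise the tasks are offloaded, wrapNonAppData breaks out of its loop, and processing resumes once the tasks complete. The code is as follows: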

private boolean runDelegatedTasks(boolean inUnwrap) {
    if (delegatedTaskExecutor == ImmediateExecutor.INSTANCE || inEventLoop(delegatedTaskExecutor)) {
        // We should run the task directly in the EventExecutor thread and not offload at all.
        runAllDelegatedTasks(engine);
        return true;
    } else {
        executeDelegatedTasks(inUnwrap);
        return false;
    }
}
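The branching in runDelegatedTasks() can be illustrated without Netty. In this sketch, a Supplier stands in for SSLEngine.getDelegatedTask() and a null executor means "run inline". DelegatedTasks is a hypothetical class, and the resume-after-offload step that Netty performs is deliberately left out.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

final class DelegatedTasks {

    // taskSource stands in for SSLEngine::getDelegatedTask (returns null when no task is left).
    // offload == null: run every task inline on the calling (I/O) thread and return true,
    // so the handshake loop can continue at once.
    // offload != null: hand the whole batch to that executor and return false; the caller
    // must stop and resume the handshake once the tasks are done (resumption not shown).
    static boolean runTasks(Supplier<Runnable> taskSource, Executor offload) {
        Runnable drain = () -> {
            Runnable task;
            while ((task = taskSource.get()) != null) {
                task.run();
            }
        };
        if (offload == null) {
            drain.run();
            return true;
        }
        offload.execute(drain);
        return false;
    }

    public static void main(String[] args) {
        Deque<Runnable> tasks = new ArrayDeque<>();
        tasks.add(() -> System.out.println("validating certificate chain (simulated)"));
        System.out.println("ran inline: " + runTasks(tasks::poll, null));

        ExecutorService pool = Executors.newSingleThreadExecutor();
        tasks.add(() -> System.out.println("expensive key exchange step (simulated)"));
        System.out.println("ran inline: " + runTasks(tasks::poll, pool));
        pool.shutdown(); // lets the queued task finish, then stops the worker thread
    }
}
```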

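If the status is NEED_UNWRAP, the code checks whether the call came from the unwrap path. If it did not, it calls unwrapNonAppData to read the peer's data. NEED_WRAP simply continues the loop so that wrap is called again. If the status is NOT_HANDSHAKING, the handshake is marked successful if it was still in progress. Then, unless the method was called from unwrap, it calls unwrapNonAppData to keep receiving messages from the server, as a workaround for the TLS False Start problem.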

Server

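On the SSL server, the entry point for receiving the client's handshake messages is the decode method, analyzed in detail below. decode returns immediately while a delegated task is being processed (processTask) and otherwise dispatches on jdkCompatibilityMode. In JDK-compatible mode, decodeJdkCompatible first checks how many bytes are readable in the receive buffer and works out the length of the current SSL record. Only complete records are handed to the engine: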

@Override
protected void decode(ChannelHandlerContext ctx, ByteBuf in, List<Object> out) throws SSLException {
    if (processTask) {
        return;
    }
    if (jdkCompatibilityMode) {
        decodeJdkCompatible(ctx, in);
    } else {
        decodeNonJdkCompatible(ctx, in);
    }
}

private void decodeJdkCompatible(ChannelHandlerContext ctx, ByteBuf in) throws NotSslRecordException {
    int packetLength = this.packetLength;
    // If we calculated the length of the current SSL record before, use that information.
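    // Half-packet check: if the previous read ended in the middle of a record, check whether the
    // readable bytes are still fewer than the full record length. If so, this read has not yet
    // received the whole SSL record, so return to the I/O thread and keep reading.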
    if (packetLength > 0) {
        if (in.readableBytes() < packetLength) {
            return;
        }
    } else {
        // Get the packet length and wait until we get a packets worth of data to unwrap.
        final int readableBytes = in.readableBytes();
        if (readableBytes < SslUtils.SSL_RECORD_HEADER_LENGTH) {
            return;
        }
        packetLength = getEncryptedPacketLength(in, in.readerIndex());
        if (packetLength == SslUtils.NOT_ENCRYPTED) {
            // Not an SSL/TLS packet
            NotSslRecordException e = new NotSslRecordException(
                    "not an SSL/TLS record: " + ByteBufUtil.hexDump(in));
            in.skipBytes(in.readableBytes());

            // First fail the handshake promise as we may need to have access to the SSLEngine which may
            // be released because the user will remove the SslHandler in an exceptionCaught(...) implementation.
            setHandshakeFailure(ctx, e);

            throw e;
        }
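        // If the SSL record is longer than the readable bytes, this is a partial (half) packet:
        // remember its length and return to the I/O thread to read the rest of the data.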
        assert packetLength > 0;
        if (packetLength > readableBytes) {
            // wait until the whole packet can be read
            this.packetLength = packetLength;
            return;
        }
    }

    // Reset the state of this class so we can get the length of the next packet. We assume the entire packet will
    // be consumed by the SSLEngine.
    this.packetLength = 0;
    try {
        // Decode the SSL-encrypted record back into the original plaintext
        int bytesConsumed = unwrap(ctx, in, in.readerIndex(), packetLength);
        assert bytesConsumed == packetLength || engine.isInboundDone() :
                "we feed the SSLEngine a packets worth of data: " + packetLength + " but it only consumed: " +
                        bytesConsumed;
        in.skipBytes(bytesConsumed);
    } catch (Throwable cause) {
        handleUnwrapThrowable(ctx, cause);
    }
}
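The record-length logic can be shown on a plain java.nio.ByteBuffer. This is a simplified sketch of what getEncryptedPacketLength() and the half-packet checks accomplish. It assumes only SSLv3/TLS record headers need to be recognised: 5 bytes made up of content type, version and length. TlsRecordFraming and its constants are illustrative names, not Netty's.

```java
import java.nio.ByteBuffer;

final class TlsRecordFraming {

    static final int RECORD_HEADER_LENGTH = 5; // content type (1) + version (2) + length (2)
    static final int NOT_ENOUGH_DATA = -1;     // partial ("half") record: keep reading
    static final int NOT_ENCRYPTED = -2;       // bytes do not look like a TLS record

    // Simplified form of the length check in decodeJdkCompatible(): inspects the record
    // starting at in.position() without consuming anything. Only SSLv3/TLS headers are
    // recognised; the SSLv2-compatible header that Netty also handles is ignored here.
    static int completeRecordLength(ByteBuffer in) {
        int readable = in.remaining();
        if (readable < RECORD_HEADER_LENGTH) {
            return NOT_ENOUGH_DATA;
        }
        int pos = in.position();
        int contentType = in.get(pos) & 0xFF;
        int majorVersion = in.get(pos + 1) & 0xFF;
        // 20..24: change_cipher_spec, alert, handshake, application_data, heartbeat
        if (contentType < 20 || contentType > 24 || majorVersion != 3) {
            return NOT_ENCRYPTED;
        }
        int packetLength = RECORD_HEADER_LENGTH + (in.getShort(pos + 3) & 0xFFFF);
        return packetLength <= readable ? packetLength : NOT_ENOUGH_DATA;
    }

    public static void main(String[] args) {
        ByteBuffer whole = ByteBuffer.wrap(new byte[] {22, 3, 3, 0, 2, 1, 1});
        ByteBuffer half = ByteBuffer.wrap(new byte[] {22, 3, 3, 0, 10, 1});
        System.out.println(completeRecordLength(whole)); // 7
        System.out.println(completeRecordLength(half));  // -1
    }
}
```

Just as in decodeJdkCompatible(), a NOT_ENOUGH_DATA result means "return and wait for more bytes", and only a full record length is passed on to unwrap.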

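The SSLEngine's unwrap method is then called to decode the raw SSL data, and the result is checked. A buffer overflow (BUFFER_OVERFLOW) means the out buffer is too small and must be expanded dynamically. On the first overflow, the new size is the smaller of the maximum SSL buffer size and the number of readable bytes in the raw SSL buffer, which saves as much memory as possible. A second overflow means the expanded buffer is still too small. In that case the maximum SSL buffer size is used directly so that the next decode is guaranteed to succeed.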
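The growth policy itself is simple arithmetic. Below is a hypothetical helper that encodes it; UnwrapBufferSizing is not Netty's code. It assumes the caller tracks how many times unwrap() has returned BUFFER_OVERFLOW for the current record.

```java
final class UnwrapBufferSizing {

    // Hypothetical helper for the policy above. overflowCount is the number of
    // consecutive BUFFER_OVERFLOW results for the same record (1 = first overflow).
    static int nextCapacity(int overflowCount, int maxBufferSize, int readableSslBytes) {
        if (overflowCount <= 1) {
            // First overflow: take the smaller value to save memory.
            return Math.min(maxBufferSize, readableSslBytes);
        }
        // Overflowed again: the cautious guess was too small, so use the maximum.
        return maxBufferSize;
    }

    public static void main(String[] args) {
        int max = 16384; // assumed maximum; real code would ask SSLSession.getApplicationBufferSize()
        System.out.println(nextCapacity(1, max, 1200)); // 1200
        System.out.println(nextCapacity(2, max, 1200)); // 16384
    }
}
```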

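After a successful decode, the SSL engine's result is checked:

If more data needs to be received, decoding continues.
If a handshake message needs to be sent, wrapNonAppData is called to send it.
If delegated SSL tasks must run, they are handed to the task executor.
If the handshake has finished, the SSL result is set and the handshake-success event is fired.
If the data is application-layer business data, decoding continues.
For any other result, an exception for the unexpected status is thrown.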

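Note that the SSL client and server use the same code to receive each other's handshake messages. Why, then, do they send different handshake messages? The SSL engine is responsible for that distinction. Client mode is set when the SSL engine is created, and the engine decides its behaviour based on it. The code is as follows: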

engine.setUseClientMode(isClient());
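The mode flag can be seen directly on the JDK engine. Here is a minimal sketch, assuming a context initialised with JDK defaults; EngineModes is a hypothetical class. In Netty the mode is normally fixed when the SslContext is built with SslContextBuilder.forClient() or SslContextBuilder.forServer(...), and the engines it creates inherit that mode.

```java
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLEngine;

final class EngineModes {

    // One context, two engines. The only difference is the mode flag, which decides whether
    // the engine speaks first (client: sends the ClientHello) or waits for the peer (server).
    // The flag must be set before any handshaking occurs.
    static SSLEngine newEngine(SSLContext context, boolean client) {
        SSLEngine engine = context.createSSLEngine();
        engine.setUseClientMode(client);
        return engine;
    }

    public static void main(String[] args) throws Exception {
        SSLContext context = SSLContext.getInstance("TLS");
        context.init(null, null, null);
        System.out.println("client engine uses client mode: " + newEngine(context, true).getUseClientMode());
        System.out.println("server engine uses client mode: " + newEngine(context, false).getUseClientMode());
    }
}
```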

Security features added through Netty's extension points

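ChannelHandler lets users intercept network events anywhere in the channel pipeline, which makes it easy to extend Netty's security policies, for example with an IP address blacklist or user access authentication.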
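As an example of such an extension, here is a hypothetical CIDR-based blacklist in plain Java. IpBlacklist is an illustrative name. Wiring it into a ChannelHandler is left as an assumption, for example by closing the channel in channelActive when the remote address is blocked. Netty also ships ready-made handlers for this in io.netty.handler.ipfilter, such as RuleBasedIpFilter with IpSubnetFilterRule.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical IP blacklist with CIDR ranges. A Netty ChannelHandler could consult it when a
// connection becomes active and close the channel if the remote address is blocked.
final class IpBlacklist {
    private final List<byte[]> networks = new ArrayList<>();
    private final List<Integer> prefixLengths = new ArrayList<>();

    // Accepts "a.b.c.d/len", an IPv6 literal with "/len", or a single address (full-length prefix).
    void block(String cidr) throws UnknownHostException {
        String[] parts = cidr.split("/", 2);
        byte[] network = InetAddress.getByName(parts[0]).getAddress(); // IP literals: no DNS lookup
        int prefix = parts.length == 2 ? Integer.parseInt(parts[1]) : network.length * 8;
        if (prefix < 0 || prefix > network.length * 8) {
            throw new IllegalArgumentException("bad prefix length: " + cidr);
        }
        networks.add(network);
        prefixLengths.add(prefix);
    }

    boolean isBlocked(InetAddress remote) {
        byte[] address = remote.getAddress();
        for (int i = 0; i < networks.size(); i++) {
            byte[] network = networks.get(i);
            // IPv4 rules only match IPv4 addresses, IPv6 rules only IPv6 ones.
            if (network.length == address.length && samePrefix(network, address, prefixLengths.get(i))) {
                return true;
            }
        }
        return false;
    }

    // True if the first `prefix` bits of a and b are identical.
    private static boolean samePrefix(byte[] a, byte[] b, int prefix) {
        int wholeBytes = prefix / 8;
        for (int i = 0; i < wholeBytes; i++) {
            if (a[i] != b[i]) {
                return false;
            }
        }
        int leftoverBits = prefix % 8;
        if (leftoverBits == 0) {
            return true;
        }
        int mask = (0xFF << (8 - leftoverBits)) & 0xFF;
        return (a[wholeBytes] & mask) == (b[wholeBytes] & mask);
    }

    public static void main(String[] args) throws Exception {
        IpBlacklist blacklist = new IpBlacklist();
        blacklist.block("10.0.0.0/8");
        blacklist.block("192.168.1.128/25");
        for (String ip : new String[] {"10.1.2.3", "192.168.1.200", "192.168.1.5"}) {
            System.out.println(ip + " blocked: " + blacklist.isBlocked(InetAddress.getByName(ip)));
        }
    }
}
```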

Lanchester Blog

The Path to High Security with Netty