DH_live: The King of Flat 2D Digital Humans

崎山小鹿 · posted on 2024-11-5 12:16:39

A digital human anyone can use
1. Project: https://github.com/kleinlee/DH_live

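The usual way to make a digital human today is to generate an image from text (or shoot a photo yourself), turn the image into a video with motion, and then add mouth shapes and a voice to that video to produce a talking-head video.
The approaches fall into the following types:
Type 1 "carves" expressions and mouth shapes into a still image according to the speech. The head can make small movements, but the body hardly moves at all. Representative software: SadTalker.
Type 2 "carves" expressions and mouth shapes into a video according to the speech. It is a step up from type 1: the person keeps the movements from the source video and gains lip-synced mouth shapes, so it is far more expressive, although you will notice that the movements keep repeating. Representative software: Easy_Wav2Lip.
Type 3 first trains a basic AI model (a template) on video, then lets the model generate the video of the person's movements from audio. It has movement, expressions, and mouth shapes, and, more importantly, the movements are AI-generated, which makes it more advanced than type 2. Representative software: Douyin's digital human and DH_live.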

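Types 1 and 2 are generated on the fly, with little freedom and poor results. Type 3 requires training a model, which sets a technical barrier, but DH_live lowers that barrier considerably.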

Download:
Quark: https://pan.quark.cn/s/8c30148c537b  Extraction code: Hcs9

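Tutorial:
Step 1: Generate the digital-human video template
Run AI实时音频驱动数字人工具V1.0.exe and select a video file (here we use the bundled test video), then click "数字人视频模版生成" (Generate digital-human video template).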

Click "开始生成" (Start generation).

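On success, a folder named after the video file is created under the video_data folder, containing circle.mp4 and keypoint_rotate.pkl. The program then reports that keypoint extraction is complete.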
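To confirm that step 1 really produced a usable template, a quick check like the one below can help. It is only a minimal sketch: the two file names and the video_data layout come from the description above, while the function name and the example video name are hypothetical.

```python
from pathlib import Path

# File names as described in the tutorial step above.
EXPECTED_FILES = ("circle.mp4", "keypoint_rotate.pkl")


def missing_template_files(video_data_dir, video_name):
    """Return the expected template files that are absent.

    An empty list means the template folder looks complete.
    `video_name` is the source video's file name without extension
    (assumption: the folder is named exactly after it).
    """
    template_dir = Path(video_data_dir) / video_name
    return [name for name in EXPECTED_FILES if not (template_dir / name).is_file()]
```

For example, `missing_template_files("video_data", "my_video")` returns `[]` once both files exist for a video called my_video (a made-up name).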

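Step 2: Generate the audio-driven digital human
After step 1, the video template directory is automatically set to the most recently generated template.
Select the driving audio; here we use the bundled audio1.wav.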

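Click "开始生成" (Start generation).

On success, a new video file appears in the results folder, and the video is done.

Result:

The results are quite good. If you want a custom character of your own, you need to train a model, using the project below.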

2. Project: https://github.com/v3ucn/DH_live_webui
Supports training and fine-tuning.

Main interface:

Results:

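Things to note when training:
Training can use several video files at once. The default parameter is 4, meaning videos are handled in groups of 4: 8 videos make 2 groups and 12 videos make 3 groups. The longer each video is, the heavier the GPU workload.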
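The grouping arithmetic above can be sketched in a few lines. This only illustrates the counting, not the project's actual data loader; the function name and file names are made up, and the handling of a leftover partial group (kept as a smaller last group) is an assumption.

```python
def split_into_groups(videos, group_size=4):
    """Split a list of videos into consecutive groups of at most group_size.

    group_size=4 mirrors the default mentioned above. A trailing partial
    group is kept as-is (assumption; the real trainer may behave differently).
    """
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    return [videos[i:i + group_size] for i in range(0, len(videos), group_size)]


# 12 hypothetical file names
videos = [f"video_{n:02d}.mp4" for n in range(1, 13)]
groups = split_into_groups(videos)
print(len(groups))  # 3
```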

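Project downloads:
DH_Live low-cost digital-human fine-tuning (Fine-tune), AI digital human, AI anchor, AI live-commerce, lip sync, lip synthesis, audio-driven video, pretrained model, demo of a model fine-tuned for 11,000 steps
DH_Live new fine-tuning bundle / one-click package: https://pan.quark.cn/s/e75123074599
DH_Live new fine-tuning bundle / one-click package: https://pan.baidu.com/s/1Emzt_5dwTNWDx44Lkn2Jvw?pwd=v3uc  Extraction code: v3uc
Official project: https://github.com/kleinlee/DH_live
WebUI project: https://github.com/v3ucn/DH_live_webui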

File shared via Baidu Netdisk: data_preparation.py
Link: https://pan.baidu.com/s/18gF7_AP4nfp23U-Ohyecmw?pwd=v3uc
Extraction code: v3uc

Overwrite the file of the same name to fix the garbled-checkpoint problem.

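If you want your digital human to stream live in real time, you can use the project below.

3. Real-time digital-human live streaming, integrated with DH_live:
小雕数字人 comes with 60+ character models, supports custom video digital humans, and is free to use.
File shared via Baidu Netdisk: 小雕+数字人
Link: https://pan.baidu.com/s/1E_-cIgfamPOmjgP8_zOrRA?pwd=1g62
Extraction code: 1g62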

小雕+数字人 on Quark Netdisk:
Link: https://pan.quark.cn/s/9e58ed167c7e

References:
https://www.youtube.com/watch?v=tuJoobsqxCg
https://www.bilibili.com/video/B ... id_from=333.999.0.0

https://www.bilibili.com/video/B ... 67809830e688efe473d

Reference:
Real-time digital-human live streaming, integrated with DH_live: https://www.bilibili.com/video/B ... 67809830e688efe473d

崎山小鹿 · posted on 2024-11-7 07:36:19

Command to train the model:
.\py311\python.exe train/train_render_model.py --train_data ./train/data --coarse2fine --coarse_model_path './checkpoint/epoch_120.pth' --non_decay 20000 --decay 1000

At the start of training:
learning rate = 0.0001000
learning rate = 0.0001000
===> Epoch[101](0/2): Loss_DI: 0.2479 Loss_GI: 0.2459 Loss_perception: 3.5692 lr_g = 0.0001000 lr_d = 0.0001000
===> Epoch[101](1/2): Loss_DI: 0.2495 Loss_GI: 0.2568 Loss_perception: 3.2296 lr_g = 0.0001000 lr_d = 0.0001000

After 3,000 training epochs:
learning rate = 0.0001000
learning rate = 0.0001000
===> Epoch[3546](0/2): Loss_DI: 0.2461 Loss_GI: 0.2234 Loss_perception: 1.9764 lr_g = 0.0001000 lr_d = 0.0001000
===> Epoch[3546](1/2): Loss_DI: 0.2534 Loss_GI: 0.3185 Loss_perception: 2.0884 lr_g = 0.0001000 lr_d = 0.0001000

After 10,000 training epochs:
learning rate = 0.0001000
learning rate = 0.0001000
===> Epoch[14475](0/2): Loss_DI: 0.2343 Loss_GI: 0.3087 Loss_perception: 1.7719 lr_g = 0.0001000 lr_d = 0.0001000
===> Epoch[14475](1/2): Loss_DI: 0.2311 Loss_GI: 0.3050 Loss_perception: 1.8111 lr_g = 0.0001000 lr_d = 0.0001000

After 20,000 training epochs:
learning rate = 0.0000002
learning rate = 0.0000002
===> Epoch[21000](0/2): Loss_DI: 0.2379 Loss_GI: 0.2679 Loss_perception: 1.6209 lr_g = 0.0000002 lr_d = 0.0000002
===> Epoch[21000](1/2): Loss_DI: 0.2400 Loss_GI: 0.2398 Loss_perception: 1.5501 lr_g = 0.0000002 lr_d = 0.0000002
When the character speaks, the lips still show a lot of sticky artifacts and the mouth does not open properly.
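The drop from 0.0001000 to 0.0000002 in the log above looks like a linear learning-rate decay: constant for --non_decay epochs, then falling towards zero over about --decay further epochs. The sketch below reproduces the logged values under that reading. It is an assumption fitted to the logs in this thread, not the project's confirmed scheduler code: the function name, the offset of 1, and the guess that the later Epoch[60729] log came from a run with --non_decay 60000 --decay 1000 are all hypothetical.

```python
def linear_decay_lr(epoch, non_decay, decay, base_lr=1e-4, offset=1):
    """Learning rate at a displayed epoch number.

    Constant at base_lr while epoch <= non_decay + offset, then a linear
    ramp towards zero over roughly `decay` more epochs.
    base_lr=1e-4 is the value printed in the logs; offset=1 is an
    ASSUMPTION chosen so that the two decayed log values are reproduced.
    """
    steps_into_decay = max(0, epoch - offset - non_decay)
    factor = max(0.0, 1.0 - steps_into_decay / float(decay + 1))
    return base_lr * factor


# (displayed epoch, assumed --non_decay) pairs taken from the logs in this thread
for epoch, non_decay in [(101, 20000), (21000, 20000), (60729, 60000)]:
    lr = linear_decay_lr(epoch, non_decay, decay=1000)
    print(f"epoch {epoch}: learning rate = {lr:.7f}")
```

Under these assumptions the three printed values are 0.0001000, 0.0000002 and 0.0000273, matching the corresponding log lines. If that reading is right, a run has almost stopped learning by the time it reaches non_decay + decay epochs.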

After 140,000 training epochs:
learning rate = 0.0001000
===> Epoch[140000](0/1): Loss_DI: 0.2458 Loss_GI: 0.2592 Loss_perception: 1.4434 lr_g = 0.0001000 lr_d = 0.0001000

render.pth is the official general-purpose model.
epoch_120.pth is the pretrained model.

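How to resume training a model
Add the following to train\train_render_model.py:
if __name__ == "__main__":
    opt.resume = True
    opt.resume_path = "path to the checkpoint you want to resume from"

Then uncomment the line # opt.start_epoch = checkpoint['epoch'], and that is all.

When resuming, you can set a new maximum number of training epochs and then click Start training.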

崎山小鹿 · posted on 2024-11-8 08:44:25

Useful resources:
Digital Humans in Practice, Day 6: Training Your Own Digital Human with DH_live https://blog.csdn.net/qq_34717531/article/details/142522502
DH_live real-time digital-human driving solution https://blog.csdn.net/qq_34717531/article/details/141065146

崎山小鹿 · posted on 2024-11-8 12:26:33

yumo
After 30,000 training epochs:
learning rate = 0.0001000
learning rate = 0.0001000
===> Epoch[30670](0/2): Loss_DI: 0.2398 Loss_GI: 0.2761 Loss_perception: 1.4671 lr_g = 0.0001000 lr_d = 0.0001000
===> Epoch[30670](1/2): Loss_DI: 0.2398 Loss_GI: 0.2793 Loss_perception: 1.4673 lr_g = 0.0001000 lr_d = 0.0001000

After 40,000 training epochs:
learning rate = 0.0001000
learning rate = 0.0001000
===> Epoch[40435](0/2): Loss_DI: 0.2359 Loss_GI: 0.2379 Loss_perception: 1.4120 lr_g = 0.0001000 lr_d = 0.0001000
===> Epoch[40435](1/2): Loss_DI: 0.2408 Loss_GI: 0.2385 Loss_perception: 1.3711 lr_g = 0.0001000 lr_d = 0.0001000

After 50,000 training epochs:
learning rate = 0.0001000
learning rate = 0.0001000
===> Epoch[50543](0/2): Loss_DI: 0.2331 Loss_GI: 0.2937 Loss_perception: 1.3629 lr_g = 0.0001000 lr_d = 0.0001000
===> Epoch[50543](1/2): Loss_DI: 0.2304 Loss_GI: 0.2465 Loss_perception: 1.3523 lr_g = 0.0001000 lr_d = 0.0001000

After 60,000 training epochs:
learning rate = 0.0000273
learning rate = 0.0000273
===> Epoch[60729](0/2): Loss_DI: 0.2304 Loss_GI: 0.2994 Loss_perception: 1.2715 lr_g = 0.0000273 lr_d = 0.0000273
===> Epoch[60729](1/2): Loss_DI: 0.2276 Loss_GI: 0.2997 Loss_perception: 1.3264 lr_g = 0.0000273 lr_d = 0.0000273

Training continued after switching to different source videos:
learning rate = 0.0001000
learning rate = 0.0001000
===> Epoch[69027](0/2): Loss_DI: 0.2441 Loss_GI: 0.2857 Loss_perception: 2.1383 lr_g = 0.0001000 lr_d = 0.0001000
===> Epoch[69027](1/2): Loss_DI: 0.2290 Loss_GI: 0.2503 Loss_perception: 2.2832 lr_g = 0.0001000 lr_d = 0.0001000

After 80,000 training epochs:
learning rate = 0.0001000
learning rate = 0.0001000
===> Epoch[80218](0/2): Loss_DI: 0.2479 Loss_GI: 0.2459 Loss_perception: 1.6426 lr_g = 0.0001000 lr_d = 0.0001000
===> Epoch[80218](1/2): Loss_DI: 0.2491 Loss_GI: 0.2956 Loss_perception: 1.7142 lr_g = 0.0001000 lr_d = 0.0001000

After 110,000 training epochs:
learning rate = 0.0001000
learning rate = 0.0001000
===> Epoch[110504](0/2): Loss_DI: 0.2325 Loss_GI: 0.3246 Loss_perception: 1.5816 lr_g = 0.0001000 lr_d = 0.0001000
===> Epoch[110504](1/2): Loss_DI: 0.2320 Loss_GI: 0.2790 Loss_perception: 1.5337 lr_g = 0.0001000 lr_d = 0.0001000

After 130,000 training epochs:
learning rate = 0.0001000
learning rate = 0.0001000
===> Epoch[130389](0/2): Loss_DI: 0.2270 Loss_GI: 0.2706 Loss_perception: 1.5052 lr_g = 0.0001000 lr_d = 0.0001000
===> Epoch[130389](1/2): Loss_DI: 0.2316 Loss_GI: 0.2988 Loss_perception: 1.5066 lr_g = 0.0001000 lr_d = 0.0001000
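Comparing these snapshots by eye gets tedious, so a small parser for the "===> Epoch[...]" lines can help track Loss_perception over time. This is a minimal sketch that assumes the log format is exactly the one pasted in this thread; the function names are made up for illustration.

```python
import re
from collections import defaultdict

# Matches lines such as:
# ===> Epoch[60729](0/2): Loss_DI: 0.2304 Loss_GI: 0.2994 Loss_perception: 1.2715 lr_g = 0.0000273 lr_d = 0.0000273
LOG_LINE = re.compile(
    r"Epoch\[(?P<epoch>\d+)\]\((?P<batch>\d+)/(?P<batches>\d+)\):\s*"
    r"Loss_DI:\s*(?P<loss_di>[\d.]+)\s+"
    r"Loss_GI:\s*(?P<loss_gi>[\d.]+)\s+"
    r"Loss_perception:\s*(?P<loss_perception>[\d.]+)\s+"
    r"lr_g\s*=\s*(?P<lr_g>[\d.]+)\s+"
    r"lr_d\s*=\s*(?P<lr_d>[\d.]+)"
)
INT_FIELDS = {"epoch", "batch", "batches"}


def parse_log_line(line):
    """Return one training log line as a dict, or None if it does not match."""
    match = LOG_LINE.search(line)
    if match is None:
        return None
    return {
        key: int(value) if key in INT_FIELDS else float(value)
        for key, value in match.groupdict().items()
    }


def mean_perception_by_epoch(lines):
    """Average Loss_perception over the batches logged for each epoch."""
    values = defaultdict(list)
    for line in lines:
        record = parse_log_line(line)
        if record is not None:
            values[record["epoch"]].append(record["loss_perception"])
    return {epoch: sum(v) / len(v) for epoch, v in sorted(values.items())}
```

Feeding the console output line by line into `mean_perception_by_epoch` gives one averaged value per epoch, which is easier to compare across runs than individual batches. Lines such as "learning rate = 0.0001000" are simply skipped.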

崎山小鹿 · posted on 2024-11-9 02:00:54

G:\BaiduNetdiskDownload\DH_live-Fine-tune\DH_live\py311\Lib\site-packages\torchvision\models\_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead.
  warnings.warn(
G:\BaiduNetdiskDownload\DH_live-Fine-tune\DH_live\py311\Lib\site-packages\torchvision\models\_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=VGG19_Weights.IMAGENET1K_V1`. You can also use `weights=VGG19_Weights.DEFAULT` to get the most up-to-date weights.
  warnings.warn(msg)
loading checkpoint checkpoint/Dinet_five_ref/epoch_61000.pth
G:\BaiduNetdiskDownload\DH_live-Fine-tune\DH_live\train\train_render_model_20k.py:75: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytor ... md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  checkpoint = torch.load(opt.resume_path)

崎山小鹿 · posted on 2024-11-9 11:59:31

Accessing the real-time digital human:
File "G:\BaiduNetdiskDownload\DH_live-Fine-tune\DH_live\py311\Lib\site-packages\aiohttp\client.py", line 951, in _ws_connect
raise WSServerHandshakeError(
aiohttp.client_exceptions.WSServerHandshakeError: 403, message='Invalid response status', url='wss://speech.platform.bing.com/consumer/speech/synthesize/readaloud/edge/v1?TrustedClientToken=6A5AA1D4EAFF4E9FB37E23D68491D6F4&ConnectionId=203e54b0526747aba6eacbbbb209e56b'

崎山小鹿 · posted on 2024-11-11 11:32:32

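Using TensorBoard

Installation:
pip install tensorboard

How to check that TensorBoard is installed:
Enter the following in the Python console:
from torch.utils.tensorboard import SummaryWriter
If no error is reported, the installation succeeded.

The path to pass is the directory one level above the log files.
Start TensorBoard, then open 127.0.0.1:6006 or localhost:6006 locally.

Format: tensorboard --logdir=xxx --port=6006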
tensorboard --logdir=checkpoint/DiNet_five_ref/log/train --port=6006

TensorBoard is now up and running. Open http://localhost:6006/ in a browser to view it.

崎山小鹿 · posted on 2024-11-13 19:58:45

dh_live
Data preprocessing command:
python data_preparation_face.py split_video_25fps

Training (by default, the model is extracted and saved after 10,000 training epochs):
python train_render_model.py --train_data split_video_25fps

Train for 60,000 epochs, saving a pretrained model every 1,000 epochs:
python train_render_model.py --train_data split_video_25fps --non_decay 60000 --decay 1000