|
Training on the liushiqi video: 6K steps, about 12 hours
python inference/train_mimictalk_on_a_video.py --video_id data/raw/examples/liushiqi_130s.mp4 --max_updates 2000 --work_dir checkpoints_mimictalk/liushiqi_130s
training lora...: 0%|▎ | 10/2001 [02:11<3:00:13, 5.43s/it]
Iter 11: total_loss=0.3888101190328598 v2v_occlusion_reg_l1_loss=0.607401967048645, v2v_occlusion_2_reg_l1_loss=0.3398691415786743, v2v_occlusion_2_weights_entropy_loss=0.12148157507181168, density_weight_l2_loss=0.025346789509058, density_weight_entropy_loss=0.22934022545814514, mse_loss=0.06664450466632843, head_mse_loss=0.02819095179438591, lpips_loss=0.10811140388250351, head_lpips_loss=0.029408320784568787, lip_mse_loss=0.13061849772930145, lip_lpips_loss=0.08910778164863586, blink_reg_loss=0.024365829303860664, triplane_reg_loss=0.030589034780859947, secc_reg_loss=0.0005546677857637405,
...
testing lora...: 100%|███████████████████████████████████████████████████████████████| 250/250 [02:29<00:00, 1.67it/s]
Iter 2001: total_loss=0.14735968708992003 v2v_occlusion_reg_l1_loss=0.5926839709281921, v2v_occlusion_2_reg_l1_loss=0.33623775839805603, v2v_occlusion_2_weights_entropy_loss=0.11574846506118774, density_weight_l2_loss=0.04073842987418175, density_weight_entropy_loss=0.23001746833324432, mse_loss=0.034735675901174545, head_mse_loss=0.009863680228590965, lpips_loss=0.04859258607029915, head_lpips_loss=0.006719035562127829, lip_mse_loss=0.07170353829860687, lip_lpips_loss=0.033719874918460846, blink_reg_loss=0.2683372497558594, triplane_reg_loss=3.0655908584594727, secc_reg_loss=0.00040462621836923063,
training lora...: 100%|██████████████████████████████████████████████████████████| 2001/2001 [2:33:07<00:00, 4.59s/it]
testing lora...: 100%|███████████████████████████████████████████████████████████████| 250/250 [02:52<00:00, 1.45it/s]
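The loss lines above are easier to compare across checkpoints once they are parsed into numbers. The following is a minimal sketch, assuming every loss line follows the `Iter N: name=value ...` layout seen in these notes. `parse_iter_line` is a hypothetical helper and not part of MimicTalk, and the sample lines are abbreviated copies of the logged values.

```python
import re

# "name=1.23" or "name=1.2e-05" pairs inside a logged loss line.
PAIR_RE = re.compile(r"(\w+)=([-+]?\d*\.?\d+(?:[eE][-+]?\d+)?)")
# The "Iter N:" prefix; it may follow a tqdm progress bar on the same line.
ITER_RE = re.compile(r"Iter (\d+):")


def parse_iter_line(line):
    """Return (iteration, {loss_name: value}), or None if the line has no 'Iter N:'."""
    m = ITER_RE.search(line)
    if m is None:
        return None
    # Only look at the text after "Iter N:" so the progress bar is ignored.
    losses = {name: float(value) for name, value in PAIR_RE.findall(line[m.end():])}
    return int(m.group(1)), losses


# Abbreviated sample lines (only two of the logged loss terms are kept).
sample = [
    "10/2001 [02:11<3:00:13, 5.43s/it]Iter 11: total_loss=0.3888101190328598 triplane_reg_loss=0.030589034780859947,",
    "Iter 2001: total_loss=0.14735968708992003 triplane_reg_loss=3.0655908584594727,",
]
for line in sample:
    iteration, losses = parse_iter_line(line)
    print(iteration, losses["total_loss"], losses["triplane_reg_loss"])
```

In the logs above, total_loss falls from about 0.39 to about 0.15 over the first 2000 steps while triplane_reg_loss rises from about 0.03 to about 3.07, so the individual terms may be worth tracking alongside the total.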
Command to continue training liushiqi
python inference/train_mimictalk_on_a_video.py --torso_ckpt checkpoints_mimictalk/liushiqi_130s --video_id data/raw/examples/liushiqi_130s.mp4 --max_updates 6000 --work_dir checkpoints_mimictalk/liushiqi_130s
Log at 4K steps
training lora...: 66%|█████████████████████████████████████▏ | 3990/6001 [5:56:31<3:20:08, 5.97s/it]
Iter 3991: total_loss=0.12766672372817994 v2v_occlusion_reg_l1_loss=0.5775173306465149, v2v_occlusion_2_reg_l1_loss=0.3366568684577942, v2v_occlusion_2_weights_entropy_loss=0.11709850281476974, density_weight_l2_loss=0.061210133135318756, density_weight_entropy_loss=0.2922610640525818, mse_loss=0.02765999548137188, head_mse_loss=0.009524929337203503, lpips_loss=0.033704791218042374, head_lpips_loss=0.004674407187849283, lip_mse_loss=0.05166402459144592, lip_lpips_loss=0.01862962730228901, blink_reg_loss=0.12633971869945526, triplane_reg_loss=4.3284454345703125, secc_reg_loss=0.0008574479725211859,
testing lora...: 100%|██████████████████████████████████████████████████████████████████████████████████| 250/250 [01:11<00:00, 3.51it/s]
Iter 4001: total_loss=0.13310704231262208 v2v_occlusion_reg_l1_loss=0.5834369659423828, v2v_occlusion_2_reg_l1_loss=0.3320615291595459, v2v_occlusion_2_weights_entropy_loss=0.11004718393087387, density_weight_l2_loss=0.06258229911327362, density_weight_entropy_loss=0.2984924912452698, mse_loss=0.03138240799307823, head_mse_loss=0.00828567799180746, lpips_loss=0.04263582453131676, head_lpips_loss=0.005273307207971811, lip_mse_loss=0.05257716402411461, lip_lpips_loss=0.023266607895493507, blink_reg_loss=0.13920198380947113, triplane_reg_loss=4.335314750671387, secc_reg_loss=0.0011855922639369965,
Log at 6K steps
testing lora...: 100%|██████████████████████████████████████████████████████████████████████████████████| 250/250 [02:56<00:00, 1.42it/s]
Iter 6001: total_loss=0.1162544161081314 v2v_occlusion_reg_l1_loss=0.5751290321350098, v2v_occlusion_2_reg_l1_loss=0.3346855640411377, v2v_occlusion_2_weights_entropy_loss=0.11792436242103577, density_weight_l2_loss=0.06910260021686554, density_weight_entropy_loss=0.30972737073898315, mse_loss=0.02550116553902626, head_mse_loss=0.007274949457496405, lpips_loss=0.030336204916238785, head_lpips_loss=0.002956756856292486, lip_mse_loss=0.049502819776535034, lip_lpips_loss=0.017222920432686806, blink_reg_loss=0.1471508890390396, triplane_reg_loss=5.438873291015625, secc_reg_loss=0.00046311301412060857,
training lora...: 100%|█████████████████████████████████████████████████████████| 6001/6001 [10:41:21<00:00, 6.41s/it]
testing lora...: 100%|██████████████████████████████████████████████████████████████████████████████████| 250/250 [01:54<00:00, 2.19it/s]
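The elapsed times in the final progress lines above can be converted into an average seconds-per-step figure, which helps when budgeting a longer run. This is a rough sketch, assuming the tqdm layout `done/total [elapsed<remaining, ...]` shown in these logs. `to_seconds`, `seconds_per_step`, and `estimate_hours` are hypothetical helper names.

```python
import re

# The "done/total [elapsed<" part of a tqdm progress line.
TQDM_RE = re.compile(r"(\d+)/(\d+) \[([\d:]+)<")


def to_seconds(clock):
    """Convert 'MM:SS' or 'H:MM:SS' into seconds."""
    seconds = 0
    for part in clock.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def seconds_per_step(line):
    """Average seconds per completed step for one tqdm progress line."""
    m = TQDM_RE.search(line)
    if m is None:
        raise ValueError("no tqdm progress found in line")
    done = int(m.group(1))
    return to_seconds(m.group(3)) / done


def estimate_hours(steps, sec_per_step):
    """Rough wall-clock estimate in hours for a planned number of steps."""
    return steps * sec_per_step / 3600.0


# Abbreviated copies of the two final 'training lora' lines above.
for line in ("2001/2001 [2:33:07<00:00, 4.59s/it]",
             "6001/6001 [10:41:21<00:00, 6.41s/it]"):
    rate = seconds_per_step(line)
    print(f"{rate:.2f} s/step, 10000 steps would take about {estimate_hours(10000, rate):.1f} h")
```

The two runs above average roughly 4.59 and 6.41 seconds per step, so any estimate from one run should be treated as approximate for the next.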
Retrained from scratch on the 25 s video for 10,000 steps; this took about 13 hours
python inference/train_mimictalk_on_a_video.py --video_id data/raw/examples/liushiqi_25s_clear.mp4 --max_updates 10000 --work_dir checkpoints_mimictalk/liushiqi_25s
training lora...: 100%|███████████████████████████████████████████████████████▉| 9980/10001 [13:01:04<02:53, 8.28s/it]
Iter 9981: total_loss=0.09684689939022065 v2v_occlusion_reg_l1_loss=0.5950009822845459, v2v_occlusion_2_reg_l1_loss=0.3200928568840027, v2v_occlusion_2_weights_entropy_loss=0.10128924995660782, density_weight_l2_loss=0.02812015265226364, density_weight_entropy_loss=0.18296167254447937, mse_loss=0.018398474901914597, head_mse_loss=0.006617757957428694, lpips_loss=0.014327870681881905, head_lpips_loss=0.0026736254803836346, lip_mse_loss=0.038001202046871185, lip_lpips_loss=0.008279431611299515, blink_reg_loss=0.1005098819732666, triplane_reg_loss=7.517608642578125, secc_reg_loss=0.0006871851510368288,
training lora...: 100%|███████████████████████████████████████████████████████▉| 9990/10001 [13:02:25<01:27, 7.96s/it]
Iter 9991: total_loss=0.10520399659872055 v2v_occlusion_reg_l1_loss=0.5933628082275391, v2v_occlusion_2_reg_l1_loss=0.31992822885513306, v2v_occlusion_2_weights_entropy_loss=0.10183276236057281, density_weight_l2_loss=0.03247998654842377, density_weight_entropy_loss=0.18195389211177826, mse_loss=0.022826239466667175, head_mse_loss=0.005745640955865383, lpips_loss=0.023273512721061707, head_lpips_loss=0.0026981360279023647, lip_mse_loss=0.05218761786818504, lip_lpips_loss=0.016867591068148613, blink_reg_loss=0.09263632446527481, triplane_reg_loss=7.519522666931152, secc_reg_loss=0.00048013354535214603,
testing lora...: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 250/250 [02:48<00:00, 1.49it/s]
Iter 10001: total_loss=0.10925301685929298 v2v_occlusion_reg_l1_loss=0.5926898717880249, v2v_occlusion_2_reg_l1_loss=0.32016319036483765, v2v_occlusion_2_weights_entropy_loss=0.10265837609767914, density_weight_l2_loss=0.028329573571681976, density_weight_entropy_loss=0.18770214915275574, mse_loss=0.02030119113624096, head_mse_loss=0.007713902276009321, lpips_loss=0.01553319115191698, head_lpips_loss=0.003065012628212571, lip_mse_loss=0.0584687814116478, lip_lpips_loss=0.02107074484229088, blink_reg_loss=0.0705994963645935, triplane_reg_loss=7.52163553237915, secc_reg_loss=0.0006910899537615478,
training lora...: 100%|███████████████████████████████████████████████████████| 10001/10001 [13:06:49<00:00, 4.72s/it]
testing lora...: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 250/250 [02:43<00:00, 1.53it/s]
===================================
Retraining after removing noise from the audio
Video liushiqi130s: 6K steps, about 12 hours
python inference/train_mimictalk_on_a_video.py --video_id data/raw/examples/liushiqi_130s_clear.mp4 --max_updates 6000 --work_dir checkpoints_mimictalk/liushiqi_130s_clear
Train another 4,000 steps
python inference/train_mimictalk_on_a_video.py --video_id data/raw/examples/liushiqi_130s_clear.mp4 --max_updates 4000 --torso_ckpt checkpoints_mimictalk/liushiqi_130s_clear --work_dir checkpoints_mimictalk/liushiqi_130s_10k_clear
Train another 2,000 steps
python inference/train_mimictalk_on_a_video.py --video_id data/raw/examples/liushiqi_130s_clear.mp4 --max_updates 2000 --torso_ckpt checkpoints_mimictalk/liushiqi_130s_10k_clear --work_dir checkpoints_mimictalk/liushiqi_130s_12k_clear
Iter 2001: total_loss=0.12192660719156265 v2v_occlusion_reg_l1_loss=0.5856319665908813, v2v_occlusion_2_reg_l1_loss=0.33812224864959717, v2v_occlusion_2_weights_entropy_loss=0.1212739571928978, density_weight_l2_loss=0.03360917046666145, density_weight_entropy_loss=0.2701006233692169, mse_loss=0.028573138639330864, head_mse_loss=0.008359271101653576, lpips_loss=0.03497806936502457, head_lpips_loss=0.004728983622044325, lip_mse_loss=0.04588288813829422, lip_lpips_loss=0.014381850138306618, blink_reg_loss=0.19454456865787506, triplane_reg_loss=2.170748710632324, secc_reg_loss=0.00041069742292165756,
training lora...: 100%|██████████████████████████████████████████████████████████| 2001/2001 [4:27:32<00:00, 8.02s/it]
testing lora...: 100%|███████████████████████████████████████████████████████████████| 250/250 [01:58<00:00, 2.11it/s]
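The staged runs above follow one pattern: each later stage passes the previous stage's work directory as `--torso_ckpt` and writes to a new `--work_dir`. The sketch below only builds the command strings, using the flags that appear in these notes. `build_stage_commands` is a hypothetical helper, and it assumes, as the restarted progress bars in the logs suggest, that `--max_updates` counts the steps of the current run rather than the cumulative total.

```python
import shlex

SCRIPT = "inference/train_mimictalk_on_a_video.py"


def build_stage_commands(video, stages, ckpt_root="checkpoints_mimictalk"):
    """Build one training command per stage.

    stages is a list of (work_dir_name, max_updates). Every stage after the
    first resumes from the previous stage's work_dir via --torso_ckpt.
    """
    commands = []
    previous_dir = None
    for name, max_updates in stages:
        work_dir = f"{ckpt_root}/{name}"
        args = ["python", SCRIPT, "--video_id", video,
                "--max_updates", str(max_updates)]
        if previous_dir is not None:
            args += ["--torso_ckpt", previous_dir]
        args += ["--work_dir", work_dir]
        commands.append(shlex.join(args))
        previous_dir = work_dir
    return commands


# The three stages recorded above, with their directory names as written.
stages = [
    ("liushiqi_130s_clear", 6000),
    ("liushiqi_130s_10k_clear", 4000),
    ("liushiqi_130s_12k_clear", 2000),
]
for command in build_stage_commands("data/raw/examples/liushiqi_130s_clear.mp4", stages):
    print(command)
```

Generating the chain from one list keeps each `--torso_ckpt` identical to the preceding `--work_dir`, which avoids spelling drift between stages.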
=======================================
Generating video for liushiqi
To generate video with the newly trained model, point --torso_ckpt at its checkpoint directory: --torso_ckpt checkpoints_mimictalk/liushiqi_130s12k_clear
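Checkpoint directory names in these notes are spelled in more than one way (for example with and without an underscore before `12k`), so it can help to verify the path before launching inference. This is a small sketch, assuming the checkpoint is a directory under `checkpoints_mimictalk`. `resolve_ckpt` is a hypothetical helper, not part of MimicTalk.

```python
import difflib
from pathlib import Path


def resolve_ckpt(name, root="checkpoints_mimictalk"):
    """Return root/name if that directory exists, else raise with close matches."""
    root_path = Path(root)
    candidate = root_path / name
    if candidate.is_dir():
        return candidate
    # Collect sibling directory names so near-miss spellings can be suggested.
    existing = [p.name for p in root_path.iterdir() if p.is_dir()] if root_path.is_dir() else []
    hints = difflib.get_close_matches(name, existing, n=3, cutoff=0.6)
    raise FileNotFoundError(f"{candidate} not found; similar directories: {hints or 'none'}")


if __name__ == "__main__":
    try:
        print(resolve_ckpt("liushiqi_130s12k_clear"))
    except FileNotFoundError as err:
        print(err)
```

Running the check first gives a clear message with suggested names before any inference command is started.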
Generate 29 s video a
python inference/mimictalk_infer.py --drv_aud data/raw/examples/liushiqi_jinshuangshi_29s_a.wav --drv_pose data/raw/examples/liushiqi_15s_clear.mp4 --drv_style data/raw/examples/liushiqi_15s_clear.mp4 --bg_img data/raw/examples/bg.png --torso_ckpt checkpoints_mimictalk/liushiqi_130s12k_clear --out_name infer_out/tmp/liushiqi130s12k_jinshuangshi_29s_a.mp4 --out_mode final
Generate 29 s video b
python inference/mimictalk_infer.py --drv_aud data/raw/examples/liushiqi_jinshuangshi_29s_b.wav --drv_pose data/raw/examples/liushiqi_15s_clear.mp4 --drv_style data/raw/examples/liushiqi_15s_clear.mp4 --bg_img data/raw/examples/bg.png --torso_ckpt checkpoints_mimictalk/liushiqi_130s12k_clear --out_name infer_out/tmp/liushiqi130s12k_jinshuangshi_29s_b.mp4 --out_mode final
Test result: a small amount of frame distortion and flickering is visible; lip-sync accuracy and image sharpness still need improvement
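The two inference commands above differ only in the driving audio and the output name, so they can be generated from one template. This is a minimal sketch that reuses only the flags and paths recorded in these notes. `build_infer_command` is a hypothetical helper, and the same reference video is assumed for both `--drv_pose` and `--drv_style`, as in the commands above.

```python
import shlex


def build_infer_command(audio, torso_ckpt, out_name,
                        ref_video="data/raw/examples/liushiqi_15s_clear.mp4",
                        bg_img="data/raw/examples/bg.png"):
    """Build one mimictalk_infer.py command string with the flags used in these notes."""
    args = ["python", "inference/mimictalk_infer.py",
            "--drv_aud", audio,
            "--drv_pose", ref_video,
            "--drv_style", ref_video,
            "--bg_img", bg_img,
            "--torso_ckpt", torso_ckpt,
            "--out_name", out_name,
            "--out_mode", "final"]
    return shlex.join(args)


# Checkpoint path exactly as written in the commands above.
ckpt = "checkpoints_mimictalk/liushiqi_130s12k_clear"
for tag in ("a", "b"):
    print(build_infer_command(
        audio=f"data/raw/examples/liushiqi_jinshuangshi_29s_{tag}.wav",
        torso_ckpt=ckpt,
        out_name=f"infer_out/tmp/liushiqi130s12k_jinshuangshi_29s_{tag}.mp4",
    ))
```

Keeping the checkpoint in a single variable means a later retrained checkpoint can be tested against the same audio clips by changing one line.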
=======================================
Train another 10,000 steps
python inference/train_mimictalk_on_a_video.py --video_id data/raw/examples/liushiqi_130s_clear.mp4 --max_updates 10000 --torso_ckpt checkpoints_mimictalk/liushiqi_130s12k_clear --work_dir checkpoints_mimictalk/liushiqi_130s22k_clear
Generate test videos:
python inference/mimictalk_infer.py --drv_aud data/raw/examples/liushiqi_jinshuangshi_29s_a.wav --drv_pose data/raw/examples/liushiqi_15s_clear.mp4 --drv_style data/raw/examples/liushiqi_15s_clear.mp4 --bg_img data/raw/examples/bg.png --torso_ckpt checkpoints_mimictalk/liushiqi_130s22k_clear --out_name infer_out/tmp/liushiqi130s22k_jinshuangshi_29s_a.mp4 --out_mode final
python inference/mimictalk_infer.py --drv_aud data/raw/examples/liushiqi_jinshuangshi_29s_b.wav --drv_pose data/raw/examples/liushiqi_15s_clear.mp4 --drv_style data/raw/examples/liushiqi_15s_clear.mp4 --bg_img data/raw/examples/bg.png --torso_ckpt checkpoints_mimictalk/liushiqi_130s22k_clear --out_name infer_out/tmp/liushiqi130s22k_jinshuangshi_29s_b.mp4 --out_mode final |
|