Using awk to process tagged text
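Suppose we have the string "123<em>abc</em>456<em>def</em>789<em>ghi</em>" and want to truncate it to a given number of visible characters.
For this string, truncating to length 5 should return 123ab, and truncating to length 8 should return 123<em>abc</em>45.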
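Rules:
1 The <em> and </em> tags do not count toward the length.
2 The truncated string must keep its original <em> tags; however, if the last tag is left unclosed by the cut, its opening tag is removed.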
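Rule 1 can be checked on its own before doing any truncation. The following is a minimal sketch, assuming the only tags in the text are <em> and </em>; the function name visible_length is hypothetical and is used only for illustration.

```shell
#!/bin/bash
# visible_length is a hypothetical helper name, not part of the original script.
visible_length() {
  # Delete every <em> and </em>, then measure what is left.
  printf '%s\n' "$1" | awk '{ gsub("<em>|</em>", ""); print length($0) }'
}

visible_length '123<em>abc</em>456<em>def</em>789<em>ghi</em>'   # prints 18
visible_length '123<em>abc</em>45'                                # prints 8
```

The second call uses the expected result for length 8 from the example, which confirms that its tags are not counted.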
The code is as follows:
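#!/bin/bash
a='123<em>abc</em>456<em>def</em>789<em>ghi</em>'
echo "test text is '$a'"
# Upper bound shown in the prompt: length of the text with every <em> and </em> removed.
max=$(echo "$a" | awk '{gsub("<em>|</em>","",$0); print length($0)}')
read -p "please input a number little than $max: " num
# Split on </em>; each non-empty field is assumed to contain exactly one <em> (4 characters).
echo "$a" | awk -v num="$num" -v FS='</em>' '{
  for(i=1;i<=NF;i++){
    orisum=sum; sum+=length($i)-4
    if(sum<=num){
      strlist=strlist ~ /./ ? strlist FS $i : $i
    }else{  # the limit falls inside this field: drop its <em> and cut the text
      sub("<em>","",$i); rest=substr($i,1,num-orisum)
      strlist=strlist ~ /./ ? strlist FS rest : rest; break
    }
  }
  print strlist
}'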
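The script depends on how awk splits the record when FS is set to '</em>'. The sketch below prints each field with the visible length the script assigns to it (the field length minus the 4 characters of "<em>"); the function name show_fields is hypothetical.

```shell
#!/bin/bash
# show_fields is a hypothetical helper name used only for this illustration.
show_fields() {
  printf '%s\n' "$1" | awk -v FS='</em>' '{
    for (i = 1; i <= NF; i++) {
      # length($i) - 4 discounts the 4 characters of "<em>" in the field
      printf "field %d: [%s] visible=%d\n", i, $i, length($i) - 4
    }
  }'
}

show_fields '123<em>abc</em>456<em>def</em>789<em>ghi</em>'
```

Because the string ends with the separator, awk reports a fourth, empty field. The loop reaches it only when the requested length covers the whole text, and appending FS plus that empty field is what restores the final </em>. The fixed "- 4" also means the approach assumes every non-empty field contains exactly one <em>; text without a tag would be undercounted.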
Test results:
[root@localhost ~]# bash 1.sh
test text is '123<em>abc</em>456<em>def</em>789<em>ghi</em>'
please input a number little than 18: 2
12
[root@localhost ~]# bash 1.sh
test text is '123<em>abc</em>456<em>def</em>789<em>ghi</em>'
please input a number little than 18: 5
123ab
[root@localhost ~]# bash 1.sh
test text is '123<em>abc</em>456<em>def</em>789<em>ghi</em>'
please input a number little than 18: 7
123<em>abc</em>4
[root@localhost ~]# bash 1.sh
test text is '123<em>abc</em>456<em>def</em>789<em>ghi</em>'
please input a number little than 18: 10
123<em>abc</em>456d
[root@localhost ~]# bash 1.sh
test text is '123<em>abc</em>456<em>def</em>789<em>ghi</em>'
please input a number little than 18: 9
123<em>abc</em>456
[root@localhost ~]# bash 1.sh
test text is '123<em>abc</em>456<em>def</em>789<em>ghi</em>'
please input a number little than 18: 14
123<em>abc</em>456<em>def</em>78
[root@localhost ~]# bash 1.sh
test text is '123<em>abc</em>456<em>def</em>789<em>ghi</em>'
please input a number little than 18: 17
123<em>abc</em>456<em>def</em>789gh
[root@localhost ~]# bash 1.sh
test text is '123<em>abc</em>456<em>def</em>789<em>ghi</em>'
please input a number little than 18: 18
123<em>abc</em>456<em>def</em>789<em>ghi</em>
[root@localhost ~]#
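The interactive runs above can also be reproduced in one pass. This is a sketch that wraps the same awk logic in a function so the length can be passed as an argument instead of being read from the keyboard; the name truncate_em is hypothetical.

```shell
#!/bin/bash
# truncate_em is a hypothetical wrapper name; its awk body repeats the
# logic of the script above. Usage: truncate_em TEXT LENGTH
truncate_em() {
  printf '%s\n' "$1" | awk -v num="$2" -v FS='</em>' '{
    for (i = 1; i <= NF; i++) {
      orisum = sum
      sum += length($i) - 4            # visible characters seen so far
      if (sum <= num) {
        strlist = (strlist ~ /./) ? strlist FS $i : $i
      } else {
        sub("<em>", "", $i)            # the tag would stay unclosed, so drop it
        rest = substr($i, 1, num - orisum)
        strlist = (strlist ~ /./) ? strlist FS rest : rest
        break
      }
    }
    print strlist
  }'
}

a='123<em>abc</em>456<em>def</em>789<em>ghi</em>'
for n in 2 5 7 9 10 14 17 18; do
  printf '%2d -> %s\n' "$n" "$(truncate_em "$a" "$n")"
done
```

Each printed line should match the corresponding interactive run, for example 123ab for 5 and 123<em>abc</em>456d for 10.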