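CRF++ is an open-source tool and a sharp weapon for natural language processing. Building it produces two binaries, crf_learn and crf_test. Testing puts the predicted labels next to the true labels for comparison. Unfortunately, once it has compared them it prints no statistics at all, which is really annoying. Luckily, someone wrote a Perl tool, conlleval, that can analyze the test results.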


FIX: If the official link is dead, you can use this copy instead:

Rename the downloaded txt file to whatever name you like, then chmod +x it so it can be executed.

To use it, redirect crf_test's output into a file such as output.txt, then feed that file to the script on standard input (< output.txt).



Run this way, however, it may die with:

conlleval: unexpected number of features in line XXX    XX  XX

Why? Look at the source code:

if (@features < 2) {
    die "conlleval: unexpected number of features in line $line\n";
}




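conlleval splits each line on its delimiter, which defaults to a single space. crf_test separates its columns with tabs, so each line comes back as a single field and trips the check above. The fix is to pass a tab as the delimiter with -d: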

conlleval -d "\t" < output.txt
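The field-count check above can be reproduced outside Perl. The following is a minimal awk sketch, not conlleval itself: check_fields is a hypothetical helper and the two sample rows are illustrative. Splitting on a literal single space (the awk regex [ ]) mimics conlleval's default delimiter, and splitting on a tab mimics -d "\t".

```shell
#!/bin/sh
# Sketch of conlleval's sanity check: split each non-empty line on a
# delimiter and complain when fewer than two fields come back.
# check_fields is a hypothetical helper, not part of conlleval.
check_fields() {
  # $1: awk field-separator regex, $2: file to inspect
  awk -F "$1" '
    NF > 0 && NF < 2 { printf "line %d: only %d field\n", NR, NF; bad = 1 }
    END { exit bad }' "$2"
}

sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
printf 'Rockwell\tNNP\tB-NP\tB-NP\nsaid\tVBD\tB-VP\tB-VP\n' > "$sample"

# A literal single space (conlleval's default delimiter) never matches in a
# tab-separated line, so every line comes back as a single field.
check_fields '[ ]' "$sample" || echo "space delimiter: conlleval would die here"

# A tab, which is what -d "\t" selects, splits each line into four fields.
check_fields '\t' "$sample" && echo "tab delimiter: OK"
```

Run as-is, the sketch prints two "only 1 field" lines for the space delimiter, then the two status messages.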



processed 19172 tokens with 9715 phrases; found: 9308 phrases; correct: 7755.
accuracy:  88.95%; precision:  83.32%; recall:  79.83%; FB1:  81.53
             ADJP: precision:  58.11%; recall:  25.00%; FB1:  34.96  74
             ADVP: precision:  65.44%; recall:  53.29%; FB1:  58.75  272
            CONJP: precision:   0.00%; recall:   0.00%; FB1:   0.00  0
             INTJ: precision:   0.00%; recall:   0.00%; FB1:   0.00  0
               NP: precision:  81.50%; recall:  79.54%; FB1:  80.51  4957
               PP: precision:  88.96%; recall:  95.27%; FB1:  92.01  2129
              PRT: precision:   0.00%; recall:   0.00%; FB1:   0.00  0
             SBAR: precision:  84.62%; recall:  34.38%; FB1:  48.89  78
               VP: precision:  85.32%; recall:  80.61%; FB1:  82.90  1798
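The summary figures can be checked by hand. Precision is correct/found, recall is correct/gold, and FB1 is their harmonic mean. The small awk sketch below (prf is a hypothetical helper name) recomputes the precision, recall and FB1 shown above from the three counts on the first output line:

```shell
#!/bin/sh
# Sketch: recompute the precision/recall/FB1 summary from chunk counts.
#   found   = chunks the tagger predicted ("found: N phrases")
#   gold    = chunks in the reference    ("with N phrases")
#   correct = predicted chunks that match the reference exactly
prf() {
  awk -v found="$1" -v gold="$2" -v correct="$3" 'BEGIN {
    p = (found > 0) ? 100 * correct / found : 0
    r = (gold  > 0) ? 100 * correct / gold  : 0
    # FB1 is the harmonic mean of precision and recall
    f = (p + r > 0) ? 2 * p * r / (p + r) : 0
    printf "precision: %6.2f%%; recall: %6.2f%%; FB1: %6.2f\n", p, r, f
  }'
}

# Counts from the run above: 9715 gold phrases, 9308 found, 7755 correct.
prf 9308 9715 7755
```

This prints precision:  83.32%; recall:  79.83%; FB1:  81.53, matching the summary line. With zero found or gold phrases every figure falls back to 0.00.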


Ideals are like the stars: we never reach them, but like the mariners of the sea, we chart our course by them.


李二 · August 5, 2015 at 15:19

Um, this is a bit embarrassing. I didn't really understand it; I just copied it over and used it, and I don't even know how to change it.


conlleval.txt -d "\t" -r -o 1 NOEXIST < your_file.txt
conlleval.txt -d "\t" -r -o 1 < your_file.txt
conlleval.txt -d "\t" -o 1 NOEXIST < your_file.txt
conlleval.txt -d "\t" -o 1 < your_file.txt


    yu · August 5, 2015 at 15:22

    conlleval.txt -d "\t"  -r < your_file.txt

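    -d "\t" uses a tab as the column separator
    -r treats the tags as raw (no B-/I- prefixes), so every token is counted as a chunk of its own
    -o "1" leaves tokens tagged "1" out of the chunk statistics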

    conlleval.txt -d "\t"  -r < your_file.txt
    conlleval.txt -d "\t"  -r -o "1"< your_file.txt
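Why does -o "1" change the phrase count? Under -r, each token whose gold tag is not the outside tag counts as a one-token reference chunk. The sketch below counts those tokens. gold_phrases is a hypothetical helper, the input is assumed to be tab-separated with the gold tag in the second-to-last column, and the rows are the 11-line sample posted further down in this thread.

```shell
#!/bin/sh
# Sketch: count the reference ("gold") phrases as conlleval -r sees them.
# Under -r every token is a one-token chunk unless its gold tag is the
# outside tag (the -o value) or "O". Input is assumed tab-separated with the
# gold tag in the second-to-last column. gold_phrases is a hypothetical helper.
gold_phrases() {
  # $1: outside tag, $2: file
  awk -F '\t' -v otag="$1" '
    NF >= 2 && ($(NF-1) "") != otag && $(NF-1) != "O" { n++ }
    END { print n + 0 }' "$2"
}

sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
# The 11-token sample from the comments: word, POS, gold tag, predicted tag.
printf '%s\t%s\t%s\t%s\n' \
  从 p 1 1    起点 n 7 7    向 p 1 1    正北 nd 2_B 2_B \
  方向 n 2_E 2_E    出发 v 3_1 3_1    沿 p 1 1    正义路 ns 7 7 \
  行驶 v 3_2 3_2    390 m 6 6    米 q 8 8 > "$sample"

gold_phrases NOEXIST "$sample"   # no token is "outside": 11 phrases
gold_phrases 1 "$sample"         # the three tokens tagged 1 are excluded: 8
```

It prints 11 with the placeholder outside tag NOEXIST and 8 with 1, because three of the eleven tokens carry the gold tag 1. This is the same kind of drop as the 197 to 177 change reported below.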


      李二 · August 5, 2015 at 15:31

      conlleval.txt -d "\t" -r < your_file.txt gives the same result as my first run

      conlleval.txt -d "\t" -r -o "1" < your_file.txt counts less than the two above: it skips the tokens tagged "1".
      The total also goes from "processed 197 tokens with 197 phrases;" to "processed 197 tokens with 177 phrases;"


        yu · August 5, 2015 at 15:33

        @李二 You're welcome. Really, I'm just passing along what others have written. I hope that someday I can write a tool that's useful to others, too.

          李二 · August 5, 2015 at 15:37

          Yeah. I'll open Youdao and force my way through the official documentation.

          Ha, my respect. Bookmarking this for now.

    yu · August 5, 2015 at 15:32

    The official documentation says of -o token: … This option only works when -r is used as well. In other words, -o xxx can only be used together with -r; -o xxx on its own does not work.

李二 · August 5, 2015 at 10:06

Hi, may I ask you something? When I evaluate my own data with this Perl script, why do I only get an overall accuracy, while precision, recall and FB1 are all 0?

    yu · August 5, 2015 at 11:08

    @李二 Could you post a sample to a gist so I can take a look?

      李二 · August 5, 2015 at 14:29

      Wow, that was fast. You startled me.


      从	p	1	1
      起点	n	7	7
      向	p	1	1
      正北	nd	2_B	2_B
      方向	n	2_E	2_E
      出发	v	3_1	3_1
      沿	p	1	1
      正义路	ns	7	7
      行驶	v	3_2	3_2
      390	m	6	6
      米	q	8	8


      processed 197 tokens with 0 phrases; found: 0 phrases; correct: 0.
      accuracy:  87.82%; precision:   0.00%; recall:   0.00%; FB1:   0.00
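The accuracy figure is token-level: the share of tokens whose predicted tag (last column) equals the gold tag (second-to-last column). It never looks at chunk prefixes, which is why it can be 87.82% while precision and recall are 0. A minimal awk sketch of that calculation, assuming tab-separated input (token_accuracy is a hypothetical helper):

```shell
#!/bin/sh
# Sketch of token-level accuracy: the share of non-blank lines whose last
# column (prediction) equals the second-to-last column (gold tag).
# Assumes tab-separated columns; token_accuracy is a hypothetical helper.
token_accuracy() {
  awk -F '\t' '
    NF >= 2 { n++; if (($(NF-1) "") == $NF) ok++ }
    END { if (n > 0) printf "accuracy: %6.2f%%\n", 100 * ok / n }' "$1"
}

sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
# Three tokens, one mistagged, plus a blank sentence separator.
printf 'a\tp\t1\t1\nb\tn\t7\t2_B\n\nc\tp\t1\t1\n' > "$sample"
token_accuracy "$sample"
```

On this sample it prints accuracy:  66.67% (two of three tags match); the blank separator line is not counted as a token.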


        yu · August 5, 2015 at 14:54

        -r: Assume raw output tokens, that is without the prefixes B- and I-. In this case each word will be counted as one chunk.
        -o token: Use token as output tag for items that are outside of chunks or other classes. This option only works when -r is used as well. The default value for the outside output tag is O

        I ran your data with -d "\t" -r -o NOEXIST, and this is the result:

        $ ./conlleval.txt -d "\t" -r -o NOEXIST < your_file.txt
        processed 11 tokens with 11 phrases; found: 11 phrases; correct: 11.
        accuracy: 100.00%; precision: 100.00%; recall: 100.00%; FB1: 100.00
                        1: precision: 100.00%; recall: 100.00%; FB1: 100.00  3
                      2_B: precision: 100.00%; recall: 100.00%; FB1: 100.00  1
                      2_E: precision: 100.00%; recall: 100.00%; FB1: 100.00  1
                      3_1: precision: 100.00%; recall: 100.00%; FB1: 100.00  1
                      3_2: precision: 100.00%; recall: 100.00%; FB1: 100.00  1
                        6: precision: 100.00%; recall: 100.00%; FB1: 100.00  1
                        7: precision: 100.00%; recall: 100.00%; FB1: 100.00  2
                        8: precision: 100.00%; recall: 100.00%; FB1: 100.00  1
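The two documentation entries quoted above explain the earlier 0-phrase result: without B-/I- prefixes, tags such as 1 or 2_B never start a chunk. Below is a rough sketch, based on the documented behavior, of the rewrite that -r and -o apply to the last two columns. raw_tags is a hypothetical helper, and the input is assumed to be tab-separated.

```shell
#!/bin/sh
# Sketch of the -r/-o tag rewrite: a tag equal to the outside tag (or "O")
# becomes "O"; any other tag gets a "B-" prefix, so each token is its own
# one-token chunk. raw_tags is a hypothetical helper, not conlleval code.
raw_tags() {
  # $1: outside tag (the -o value; conlleval's default is "O")
  awk -F '\t' -v OFS='\t' -v otag="$1" '
    # Same rule for the gold (second-to-last) and predicted (last) tag.
    function raw(t) {
      if ((t "") == otag || t == "O") return "O"
      return "B-" t
    }
    NF >= 2 { print raw($(NF-1)), raw($NF) }'
}

printf 'x\tp\t1\t1\ny\tnd\t2_B\t2_B\n' | raw_tags O
printf 'x\tp\t1\t1\ny\tnd\t2_B\t2_B\n' | raw_tags 1
```

With the default outside tag O, both tags of the first row become B-1. With 1 as the outside tag they become O and drop out of the chunk statistics, while 2_B still becomes B-2_B.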

          李二 · August 5, 2015 at 15:02

          Going to try it now. Sigh, English always scares me off.

        yu · August 5, 2015 at 14:58

        @李二 It means the last two columns of each line (gold tag, then predicted tag) are taken as they are, and each token is counted as one chunk.


          李二 · August 5, 2015 at 15:06

          Yep, it works now. Thanks a lot!

          李二 · August 5, 2015 at 15:21

          processed 197 tokens with 197 phrases; found: 197 phrases; correct: 173.
          accuracy:  87.82%; precision:  87.82%; recall:  87.82%; FB1:  87.82
                           : precision: 100.00%; recall: 100.00%; FB1: 100.00  4
                           : precision: 100.00%; recall: 100.00%; FB1: 100.00  4
                          1: precision: 100.00%; recall:  95.00%; FB1:  97.44  19
                          2: precision:  90.00%; recall: 100.00%; FB1:  94.74  10
                        2_B: precision:  66.67%; recall:  40.00%; FB1:  50.00  3
                        2_E: precision: 100.00%; recall:  60.00%; FB1:  75.00  3
                        2_I: precision:   0.00%; recall:   0.00%; FB1:   0.00  0
                        3_1: precision: 100.00%; recall:  50.00%; FB1:  66.67  1
                        3_2: precision:  92.86%; recall: 100.00%; FB1:  96.30  14
                        3_3: precision:   0.00%; recall:   0.00%; FB1:   0.00  0
                        3_4: precision:  91.67%; recall:  91.67%; FB1:  91.67  12
                        3_5: precision:   0.00%; recall:   0.00%; FB1:   0.00  0
                          4: precision:  77.78%; recall:  95.45%; FB1:  85.71  27
                          5: precision: 100.00%; recall:  77.78%; FB1:  87.50  7
                          6: precision: 100.00%; recall: 100.00%; FB1: 100.00  9
                          7: precision:  85.71%; recall:  60.00%; FB1:  70.59  7
                        7_B: precision:  80.00%; recall:  95.24%; FB1:  86.96  25
                        7_E: precision:  83.33%; recall:  95.24%; FB1:  88.89  24
                        7_I: precision:  78.95%; recall:  93.75%; FB1:  85.71  19
                          8: precision: 100.00%; recall: 100.00%; FB1: 100.00  9
                          9: precision: 100.00%; recall: 100.00%; FB1: 100.00  4

        yu · August 5, 2015 at 15:08

        According to the documentation, -o xxx excludes the tag xxx from the calculation. Try -o 1 and compare the results. Earlier I just copied the parameters from the docs without reading them carefully.

县长 · February 23, 2014 at 19:00

Use of uninitialized value $firstItem in string eq at line 130, line 195.
Use of uninitialized value $firstItem in string ne at line 162, line 195.
In the end $precision and $recall naturally both come out as 0. Why is this?

    yu · February 23, 2014 at 21:16

    @县长 I haven't run into that error myself.



    Rockwell NNP B-NP I-NP
    said VBD B-VP B-VP
    the DT B-NP B-NP
    agreement NN I-NP I-NP

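    The example above separates its columns with single spaces, conlleval's default. If your columns are separated by tabs, set the delimiter with the -d "\t" parameter, as described in the post.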


    Then I looked into the errors (line 130 and line 162) and constructed a piece of data that reproduces them:

    [yu@argcv demo]$ cat > a.c <<'EOF'
    #include <stdio.h>
    int main() {
        printf("Rockwell NNP\tB-NP\tB-NP\n");
        return 0;
    }
    EOF
    [yu@argcv demo]$ gcc a.c 
    [yu@argcv demo]$ ./a.out > b.txt
    [yu@argcv demo]$ cat b.txt 
    Rockwell NNP	B-NP	B-NP
    [yu@argcv demo]$ conlleval < b.txt 
    Use of uninitialized value $firstItem in string eq at /home/yu/Script/perl/conlleval line 130, <STDIN> line 1.
    Use of uninitialized value $firstItem in string ne at /home/yu/Script/perl/conlleval line 162, <STDIN> line 1.
    processed 1 tokens with 0 phrases; found: 1 phrases; correct: 0.
    accuracy:   0.00%; precision:   0.00%; recall:   0.00%; FB1:   0.00
              NP	B-NP: precision:   0.00%; recall:   0.00%; FB1:   0.00  1
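The warnings appear because, split on the default single space, this line yields only two fields ("Rockwell" and the rest). Both are apparently consumed as the gold and predicted tags, which leaves nothing for $firstItem. For this particular line, -d "\t" is enough. If a file mixes spaces and tabs more freely, one option is to normalize every run of spaces and tabs to a single tab before calling conlleval with -d "\t". The sketch below assumes no token contains an internal space, and to_tabs is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: turn every run of spaces/tabs into a single tab so a file with
# mixed delimiters can be read with -d "\t". Assumes no token contains an
# internal space. to_tabs is a hypothetical helper name.
to_tabs() {
  # awk's default field splitting treats runs of spaces and tabs as one
  # separator; assigning $1 to itself rebuilds the line using OFS (a tab).
  # Blank lines stay blank, so sentence boundaries are preserved.
  awk -v OFS='\t' '{ $1 = $1; print }'
}

printf 'Rockwell NNP\tB-NP\tB-NP\n' | to_tabs
# e.g. to_tabs < b.txt | conlleval -d "\t"
```

The example line comes out as Rockwell, NNP, B-NP and B-NP separated by tabs, four fields in all.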






      县长 · February 23, 2014 at 21:30

      @Yu Jing
      You're a really kind person: such a fast and detailed reply. Thank you very much! Wishing you all the best!

Leniy · May 11, 2013 at 16:42



    yu · May 12, 2013 at 07:50



    yu · May 12, 2013 at 08:01



Leniy · May 9, 2013 at 11:28



    yu · May 9, 2013 at 13:04



      Leniy · May 9, 2013 at 13:05



        yu · May 9, 2013 at 13:26




          Leniy · May 9, 2013 at 13:28



            yu · May 10, 2013 at 15:12




              Leniy · May 10, 2013 at 15:38



                yu · May 10, 2013 at 18:13


                Internal Server Error
                The server encountered an internal error or misconfiguration and was unable to complete your request.
                Please contact the server administrator, and inform them of the time the error occurred, and anything you might have done that may have caused the error.
                More information about this error may be available in the server error log.
                Additionally, a 500 Internal Server Error error was encountered while trying to use an ErrorDocument to handle the request.



                  Leniy · May 11, 2013 at 07:52



                    yu · May 11, 2013 at 08:01



                    Leniy · May 11, 2013 at 08:08



                    yu · May 11, 2013 at 08:17

                    The error is a 500, not a 404 or the like, so it's clearly not caused by something like what happened to me, installing 多玩 and moving wp-comment.php away.

                    A 500 is usually a code problem. Maybe you've been patching code a lot lately.


                    Leniy · May 11, 2013 at 08:44



                  Leniy · May 12, 2013 at 08:13



Leniy · May 9, 2013 at 08:38


