Volume 39 Issue 1
Feb.  2021
YU Weihong, FU Piaoyun, REN Yue, WANG Qingwu. Text Mining for Causes of Ship Accidents Based on PMI and BTM[J]. Journal of Transport Information and Safety, 2021, 39(1): 35-44. doi: 10.3963/j.jssn.1674-4861.2021.01.0005

Text Mining for Causes of Ship Accidents Based on PMI and BTM

doi: 10.3963/j.jssn.1674-4861.2021.01.0005
  • Received Date: 2020-10-23
  • Publish Date: 2021-02-28
  • This paper proposes a two-level semantic mining method, operating on words and on topics, to automatically extract water traffic safety knowledge from large collections of ship accident investigation reports. A corpus of 100 investigation reports on ship foundering accidents serves as the case study. At the word level, the PMI (pointwise mutual information) algorithm mines frequently co-occurring word patterns from the texts describing accident causes, and the co-occurrence of feature words reveals relationships between accident-causing factors. At the topic level, the BTM (biterm topic model) algorithm models the same cause-description texts, and the modeling results are evaluated by topic log-likelihood and coherence. Topic modeling clusters the feature words that represent the causes of foundering accidents, and the occurrence probability of each cause is preliminarily quantified from the distribution of topics in the corpus. The predictive ability of the topic model was tested on 500 new data sets. The model recognizes 100% of the domain-independent words and automatically ignores them. It attributes 85.6% of the words in the corpus to a topic representing a specific cause. For the remaining words, about 14.4%, the topic boundary is unclear, so they cannot be attributed to a topic with high probability.
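
The two building blocks named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state which PMI variant, co-occurrence window, or word segmentation the paper uses, so the code below assumes document-level co-occurrence, where PMI(x, y) = log2(p(x, y) / (p(x) p(y))) and each probability is the fraction of documents containing the word or word pair. The function names `pmi_scores` and `biterms` and the English tokens in `toy_docs` are hypothetical stand-ins for segmented cause descriptions.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_scores(docs, min_count=1):
    """Document-level PMI for unordered word pairs.

    Probabilities are estimated as document frequencies divided by the
    number of documents (an assumption; the paper may count differently).
    """
    n_docs = len(docs)
    word_df = Counter()  # number of documents containing each word
    pair_df = Counter()  # number of documents containing each word pair
    for tokens in docs:
        vocab = sorted(set(tokens))  # sort so every pair has one canonical order
        word_df.update(vocab)
        pair_df.update(combinations(vocab, 2))

    scores = {}
    for (x, y), n_xy in pair_df.items():
        if n_xy < min_count:
            continue  # skip rare pairs, whose PMI estimates are unstable
        # p(x, y) / (p(x) * p(y)) simplifies to n_xy * N / (n_x * n_y)
        scores[(x, y)] = math.log2(n_xy * n_docs / (word_df[x] * word_df[y]))
    return scores


def biterms(tokens):
    """All unordered pairs of token positions in one short text.

    BTM models these pairs over the whole corpus instead of modeling
    each short document separately.
    """
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


# Hypothetical toy corpus: each list is one segmented cause description.
toy_docs = [
    ["overloading", "cargo", "shift"],
    ["overloading", "cargo", "wind"],
    ["wind", "wave"],
    ["wind", "wave", "cargo"],
]

scores = pmi_scores(toy_docs)
# List pairs from strongest to weakest association.
for pair, value in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(pair, round(value, 3))
```

A positive score means the two words appear together more often than independence would predict, and a negative score means less often. In practice a `min_count` threshold above 1 is usually worth setting, because PMI overrates pairs that are seen only once.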

     
