一颗蔬菜


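Correlation Measures

  • A correlation measure quantifies how strongly two variables are linearly related.
  • This post focuses on the Pearson correlation coefficient.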

The Pearson Correlation Coefficient
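
For reference, the standard sample form of the coefficient is written out here. The notation is an assumption of this note rather than taken from the post: n paired observations (x_i, y_i) with sample means \bar{x} and \bar{y}.

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}},
\qquad -1 \le r \le 1
```

The numerator is the co-variation of the two samples around their means; the denominator rescales it by the spread of each sample, which is why r has no units and always lies between -1 and 1.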

Pearson Correlation Coefficient Example

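  • The larger the absolute value of the Pearson coefficient, the stronger the linear correlation.
  • The sign indicates whether the correlation is positive or negative.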
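
The two points above can be checked on a few made-up numbers. The following is a minimal sketch in plain Scala (no Spark needed); the function name `pearson` and the sample values are illustrative assumptions, not part of the original post. It can be pasted into the Scala REPL or spark-shell.

```scala
// Pearson correlation coefficient of two equally long samples (hypothetical helper).
def pearson(xs: Seq[Double], ys: Seq[Double]): Double = {
  require(xs.length == ys.length && xs.length > 1, "need two samples of equal length (n > 1)")
  val meanX = xs.sum / xs.length
  val meanY = ys.sum / ys.length
  val dx = xs.map(_ - meanX)                              // deviations from the mean of x
  val dy = ys.map(_ - meanY)                              // deviations from the mean of y
  val cov  = dx.zip(dy).map { case (a, b) => a * b }.sum  // co-variation (numerator)
  val varX = dx.map(d => d * d).sum
  val varY = dy.map(d => d * d).sum
  cov / math.sqrt(varX * varY)
}

val x = Seq(1.0, 2.0, 3.0, 4.0, 5.0)
val rising  = pearson(x, x.map(v => 2 * v + 1))          // exactly linear, increasing
val falling = pearson(x, x.map(v => -3 * v + 10))        // exactly linear, decreasing
val noisy   = pearson(x, Seq(2.0, 1.0, 4.0, 3.0, 5.0))   // increasing, but not exactly linear
println(f"rising=$rising%.3f falling=$falling%.3f noisy=$noisy%.3f")
// prints: rising=1.000 falling=-1.000 noisy=0.800
```

An exactly linear relationship gives ±1 regardless of slope, with the sign following the direction; the noisier sample still trends upward but lands at 0.8.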

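Pearson Correlation Coefficient in Practice

  • Compute the correlation between year and annual precipitation for Beijing to see how strongly the two are related.
  • Raw data (first line: years; second line: the precipitation value for each year, in the same order)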

    2009,2007,2006,2005,2004,2003,2002,2001,2000,1999,1998,1997,1996,1995,1994,1993,1992,1991,1990,1989,1988,1987,1986,1985,1984,1983,1982,1981,1980,1979,1978,1977,1976,1975,1974,1973,1972,1971,1970,1969,1968,1967,1966,1965,1964,1963,1962,1961,1960,1959,1958,1957,1956,1955,1954,1953,1952,1951,1950,1949
    0.4806,0.4839,0.318,0.4107,0.4835,0.4445,0.3704,0.3389,0.3711,0.2669,0.7317,0.4309,0.7009,0.5725,0.8132,0.5067,0.5415,0.7479,0.6973,0.4422,0.6733,0.6839,0.6653,0.721,0.4888,0.4899,0.5444,0.3932,0.3807,0.7184,0.6648,0.779,0.684,0.3928,0.4747,0.6982,0.3742,0.5112,0.597,0.9132,0.3867,0.5934,0.5279,0.2618,0.8177,0.7756,0.3669,0.5998,0.5271,1.406,0.6919,0.4868,1.1157,0.9332,0.9614,0.6577,0.5573,0.4816,0.9109,0.921
  • Read the text data

    scala> val txt = sc.textFile("file:///home/ml/Documents/beijing1.txt")
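  • Process the data: split on commas and parse every field as a Double; values above 1000 are years, the rest are precipitation readings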

    scala> val data = txt.flatMap(_.split(",")).map(_.toDouble)
    scala> val year = data.filter(_ > 1000)
    scala> val value = data.filter(_ <= 1000)
  • Compute the correlation

    scala> import org.apache.spark.mllib.stat._
    scala> Statistics.corr(year,value)
    res3: Double = -0.4385405496488065  
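  • Conclusion

The Pearson correlation coefficient between year and precipitation is about -0.44, a negative correlation: precipitation tended to fall as the years advanced. The linear relationship is fairly weak, though (r² ≈ 0.19, so a straight-line trend over the years accounts for less than a fifth of the variation in precipitation).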
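
One caveat on the shell session above: it builds `year` and `value` by filtering the same RDD twice. According to the MLlib documentation, `Statistics.corr` on two RDDs expects both to have the same number of partitions and the same number of elements in each partition, so that approach depends on how Spark happens to split the input file. The following is a hedged sketch of a variant that pairs the two lines explicitly. It assumes `sc` is the SparkContext provided by spark-shell and that the file contains exactly the two lines shown under the raw data; the names `years`, `values`, `yearRdd`, and `valueRdd` are introduced here for illustration.

```scala
import org.apache.spark.mllib.stat.Statistics

// Assumption: line 0 holds the years, line 1 the matching precipitation values.
val lines  = sc.textFile("file:///home/ml/Documents/beijing1.txt").collect()
val years  = lines(0).split(",").map(_.trim.toDouble)
val values = lines(1).split(",").map(_.trim.toDouble)
require(years.length == values.length, "each year needs exactly one precipitation value")

// One partition per RDD, so both are guaranteed to line up element by element.
val yearRdd  = sc.parallelize(years.toSeq, 1)
val valueRdd = sc.parallelize(values.toSeq, 1)

// "pearson" is the default method; it is spelled out here for clarity.
Statistics.corr(yearRdd, valueRdd, "pearson")
```

Collecting to the driver is reasonable here because the data set is only 60 pairs; for large inputs, keeping each year and its value together in one record would be the safer design.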


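Copyright notice: this is an original article and the copyright belongs to 一颗蔬菜. Please contact the author for permission before reposting.
Article URL: https://www.suwenjin.com/index.php/archives/258/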
