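2018-11-12 Program Development

Spark - Dynamically Building a UDF

```scala
// Assumes the column holds org.apache.spark.ml.linalg vectors;
// switch to org.apache.spark.mllib.linalg if it holds the older type.
import org.apache.spark.ml.linalg.{Vector, Vectors}
import org.apache.spark.sql.functions.{col, udf}

// Returns a function that standardizes each feature using the given
// per-feature mean and standard deviation. A feature whose standard
// deviation is not positive is passed through unchanged.
def getPreprocessFunc(meanList: List[Double], stdList: List[Double]): Vector => Vector =
  (featureVec: Vector) => {
    val scaled = Array.tabulate(featureVec.size) { i =>
      if (stdList(i) > 0) (featureVec(i) - meanList(i)) / stdList(i)
      else featureVec(i)
    }
    Vectors.dense(scaled)
  }

// meanList and stdList are only known at runtime, so the UDF is built from them here.
val preprocessUdf = udf(getPreprocessFunc(meanList, stdList))
// withColumn returns a new DataFrame with the column replaced.
df.withColumn("featureVecCol", preprocessUdf(col("featureVecCol")))
```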
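
The logic that `getPreprocessFunc` closes over can be checked without a Spark session. The sketch below is a plain-Scala stand-in, not part of the original post: `StandardizeSketch` and `makeStandardizer` are hypothetical names, and it works on `Array[Double]` instead of a Spark `Vector`. It shows the same pattern: a factory takes the statistics once and returns a function that captures them.

```scala
object StandardizeSketch {
  // Hypothetical helper: builds a standardizing function from per-feature
  // means and standard deviations, which the returned closure captures.
  def makeStandardizer(
      means: IndexedSeq[Double],
      stds: IndexedSeq[Double]
  ): Array[Double] => Array[Double] =
    (features: Array[Double]) =>
      Array.tabulate(features.length) { i =>
        // Scale only when the standard deviation is positive;
        // otherwise pass the raw feature through unchanged.
        if (stds(i) > 0) (features(i) - means(i)) / stds(i)
        else features(i)
      }
}
```

For example, `StandardizeSketch.makeStandardizer(IndexedSeq(1.0, 2.0), IndexedSeq(2.0, 0.0))(Array(3.0, 5.0))` yields `Array(1.0, 5.0)`: the first feature is scaled as (3 - 1) / 2, and the second is left alone because its standard deviation is 0.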