wangbo commented on a change in pull request #4524:
URL: https://github.com/apache/incubator-doris/pull/4524#discussion_r484158638
##########
File path:
fe/spark-dpp/src/main/java/org/apache/doris/load/loadv2/dpp/SparkDpp.java
##########
@@ -358,12 +362,44 @@ private void processRollupTree(RollupTreeNode rootNode,
return Pair.of(keyMap.toArray(new Integer[keyMap.size()]),
valueMap.toArray(new Integer[valueMap.size()]));
}
- // repartition dataframe by partitionid_bucketid
- // so data in the same bucket will be consecutive.
- private JavaPairRDD<List<Object>, Object[]> fillTupleWithPartitionColumn(SparkSession spark, Dataset<Row> dataframe,
+ /**
+ * check decimal,char/varchar
+ */
+ private boolean validateData(Object srcValue, EtlJobConfig.EtlColumn etlColumn, ColumnParser columnParser,Row row) {
+
+ switch (etlColumn.columnType.toUpperCase()) {
+ case "DECIMALV2":
+ // TODO(wb): support decimal round; see be DecimalV2Value::round
+ DecimalParser decimalParser = (DecimalParser) columnParser;
+ BigDecimal srcBigDecimal = (BigDecimal) srcValue;
+ if (srcValue != null && (decimalParser.getMaxValue().compareTo(srcBigDecimal) < 0 || decimalParser.getMinValue().compareTo(srcBigDecimal) > 0)) {
+ LOG.warn(String.format("decimal value is not valid for defination, column=%s, value=%s,precision=%s,scale=%s",
+ etlColumn.columnName, srcValue.toString(), srcBigDecimal.precision(), srcBigDecimal.scale()));
+ abnormalRowAcc.add(1);
+ return false;
+ }
+ break;
+ case "CHAR":
+ case "VARCHAR":
+ // TODO(wb) padding char type
+ if (srcValue != null && srcValue.toString().length() > etlColumn.stringLength) {
+ LOG.warn(String.format("the length of input is too long than schema. column_name:%s,input_str[%s],schema length:%s,actual length:%s",
+ etlColumn.columnName, row.toString(), etlColumn.stringLength, srcValue.toString().length()));
+ return false;
+ }
+ break;
Review comment:
We only validate char/varchar and decimal here, so it's not necessary.
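
For reference, a minimal standalone sketch of the two checks in this hunk, assuming the decimal bounds and the declared string length are passed in directly. The class `ColumnValidationSketch` and its method names are hypothetical and not part of `SparkDpp`; the real code reads the bounds from `DecimalParser` and the length from `EtlColumn`. Types other than decimal and char/varchar are treated as valid, since only those are checked here.

```java
import java.math.BigDecimal;

// Hypothetical, self-contained illustration; not the code in SparkDpp.java.
public class ColumnValidationSketch {

    // A null value is accepted; otherwise the value must lie within [min, max].
    static boolean decimalInRange(BigDecimal value, BigDecimal min, BigDecimal max) {
        if (value == null) {
            return true;
        }
        return max.compareTo(value) >= 0 && min.compareTo(value) <= 0;
    }

    // A null value is accepted; otherwise the string form must not exceed maxLength.
    // Note: String.length() counts UTF-16 code units, as in the diff above.
    static boolean fitsLength(Object value, int maxLength) {
        return value == null || value.toString().length() <= maxLength;
    }

    // Dispatch on the column type name; any other type is not validated.
    static boolean validate(String columnType, Object value,
                            BigDecimal min, BigDecimal max, int maxLength) {
        switch (columnType.toUpperCase()) {
            case "DECIMALV2":
                return decimalInRange((BigDecimal) value, min, max);
            case "CHAR":
            case "VARCHAR":
                return fitsLength(value, maxLength);
        }
        return true;
    }

    public static void main(String[] args) {
        // Example bounds for a decimal column; the values are assumptions for the demo.
        BigDecimal min = new BigDecimal("-999.99");
        BigDecimal max = new BigDecimal("999.99");
        System.out.println(validate("decimalv2", new BigDecimal("1000.00"), min, max, 0));
        System.out.println(validate("varchar", "hello", min, max, 3));
        System.out.println(validate("int", 42, min, max, 0));
    }
}
```

Returning `true` after the switch keeps the "only these types are checked" behaviour explicit without adding a separate branch for every other column type.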
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.