[GitHub] [incubator-doris] wuyunfeng opened a new issue #1525: Expose data pruned-filter-scan ability for Integrating Spark and Doris
wuyunfeng opened a new issue #1525: Expose data pruned-filter-scan ability for Integrating Spark and Doris URL: https://github.com/apache/incubator-doris/issues/1525 ### Background 1. Recently years, Machine Learning intensively are integrated to computing system such as Spark MLib for Spark etc. if Doris expose the data for Computing System such as Spark, then we can make full use of the Computing System's MLib gains a competitive edge with state-of-the art Machine Learning on large data sets, collaborate, and productionize models at massive scale 2. When Doris processing large query, if memory exceed the memory limitation, this query would fail, but Spark can write the intermediate result back to Disk gracefully, the Integration for Spark and Doris not only resolved this problem, but also Spark can take full advantage of the scan performance of the column-stride of Doris Storage Engine which can provide extra push-down filter ability save a lot of network IO . 3. Nowadays, enterprise store lots data on different Storage Service such as Mysql、Elasticsearch、Doris、HDFS、NFS、Table etc. They need a way to analyze all this data with conjunctive query,Spark already get through with almost these Service except Doris. ### How to realize 1. Doris FE is responsible for pruning the related tablet, decide which predicate can used to filter data, providing single node query plan fragment, then packed all those things and return client without scheduling these to backend instance in contrast to before. In this way, FE no longer would coordinate the query lifecycle 2. Doris BE is responsible for assembling all parameters from client, such as tabletIds、version、encoded plan fragment etc which mostly generated by `Doris FE `, and then execute this plan fragment instance, all the row results return by this plan fragment would be pushed to a attached blocking queue firstly, when client call `get_next` to iterate the result, fetch this batched result from blocking queue and answer the client. ### Additional API Doris FE HTTP Transport Protocol 1. GET Table Schema ``` GET /{cluster}/{database}/{table}/_schema ``` 2. GET Query Plan ``` POST /{cluster}/{database}/{table}/_query_plan { "sql": "select k2, k4 from table where k1 > 2 and k3 > 4" } ``` Doris BE Thrift Transport Protocol ``` // scan service expose ability of scanning data ability to other compute system service **TDorisExternalService** { // doris will build a scan context for this session, context_id returned if success TScanOpenResult open(1: TScanOpenParams params); // return the batch_size of data TScanBatchResult getNext(1: TScanNextBatchParams params); // release the context resource associated with the context_id TScanCloseResult close(1: TScanCloseParams params); } ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: dev-unsubscr...@doris.apache.org For additional commands, e-mail: dev-h...@doris.apache.org
[GitHub] [incubator-doris] imay commented on a change in pull request #1515: Add storage rowwise iterator
imay commented on a change in pull request #1515: Add storage rowwise iterator URL: https://github.com/apache/incubator-doris/pull/1515#discussion_r305700627 ## File path: be/src/olap/generic_iterators.cpp ## @@ -0,0 +1,346 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +#include "olap/iterators.h" + +#include + +#include "olap/row_block2.h" +#include "util/arena.h" + +namespace doris { + +// This iterator will generate ordered data. For example for schema +// (int, int) this iterator will generator data like +// (0, 1), (1, 2), (2, 4), (3, 4)... +// +// Usage: +// SchemaV2 schema; +// AutoIncrementIterator iter(schema, 1000); +// StorageReadOptions opts; +// RETURN_IF_ERROR(iter.init(opts)); +// RowBlockV2 block; +// do { +// st = iter.next_batch(&block); +// } while (st.ok()); +class AutoIncrementIterator : public RowwiseIterator { +public: +// Will generate num_rows rows in total +AutoIncrementIterator(const SchemaV2& schema, size_t num_rows) +: _schema(schema), _num_rows(num_rows), _rows_returned(0) { +} +~AutoIncrementIterator() override { } + +// NOTE: Currently, this function will ignore StorageReadOptions +Status init(const StorageReadOptions& opts) override; +Status next_batch(RowBlockV2* block) override; + +const SchemaV2& schema() const override { return _schema; } +private: +SchemaV2 _schema; +size_t _num_rows; +size_t _rows_returned; +}; + +Status AutoIncrementIterator::init(const StorageReadOptions& opts) { +return Status::OK(); +} + +Status AutoIncrementIterator::next_batch(RowBlockV2* block) { +int row_idx = 0; +while (row_idx < block->capacity() && _rows_returned < _num_rows) { +RowBlockRow row = block->row(row_idx); + +for (int i = 0; i < _schema.column_schemas().size(); ++i) { +row.set_is_null(i, false); +auto& col_schema = _schema.column_schemas()[i]; +switch (col_schema.type()) { +case OLAP_FIELD_TYPE_SMALLINT: +*(int16_t*)row.cell_ptr(i) = _rows_returned + i; +break; +case OLAP_FIELD_TYPE_INT: +*(int32_t*)row.cell_ptr(i) = _rows_returned + i; +break; +case OLAP_FIELD_TYPE_BIGINT: +*(int64_t*)row.cell_ptr(i) = _rows_returned + i; +break; +case OLAP_FIELD_TYPE_FLOAT: +*(float*)row.cell_ptr(i) = _rows_returned + i; +break; +case OLAP_FIELD_TYPE_DOUBLE: +*(double*)row.cell_ptr(i) = _rows_returned + i; +break; +default: +break; +} +} +row_idx++; +_rows_returned++; +} +block->resize(row_idx); +if (row_idx > 0) { +return Status::OK(); +} +return Status::EndOfFile("End of AutoIncrementIterator"); +} + +// Used to store merge state for a MergeIterator input. +// This class will iterate all data from internal iterator +// through client call advance(). +// Usage: +// MergeContext ctx(iter); +// RETURN_IF_ERROR(ctx.init()); +// while (ctx.valid()) { +// visit(ctx.current_row()); +// RETURN_IF_ERROR(ctx.advance()); +// } +class MergeContext { +public: +// This class don't take iter's ownership, client should delete it +MergeContext(RowwiseIterator* iter) +: _iter(iter), _block(iter->schema(), 1024, &_arena) { +} + +// Intialize this context and will prepare data for current_row() +Status init(const StorageReadOptions& opts); + +// Return current row which internal row index points to +// And this function won't make internal index advance. +// Before call this function, Client must assure that +// valid() return true +RowBlockRow current_row() const { +return RowBlockRow(&_block, _index_in_block); +} + +// Advance internal row index to next valid row +// Return error if error happens +// Don't call this when valid() is false, action is undefined +Status advance(); + +// Return if has remaining data in this context. +// Only when
[GitHub] [incubator-doris] imay commented on a change in pull request #1515: Add storage rowwise iterator
imay commented on a change in pull request #1515: Add storage rowwise iterator URL: https://github.com/apache/incubator-doris/pull/1515#discussion_r305700583 ## File path: be/src/olap/generic_iterators.cpp ## @@ -0,0 +1,346 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +#include "olap/iterators.h" + +#include + +#include "olap/row_block2.h" +#include "util/arena.h" + +namespace doris { + +// This iterator will generate ordered data. For example for schema +// (int, int) this iterator will generator data like +// (0, 1), (1, 2), (2, 4), (3, 4)... +// +// Usage: +// SchemaV2 schema; +// AutoIncrementIterator iter(schema, 1000); +// StorageReadOptions opts; +// RETURN_IF_ERROR(iter.init(opts)); +// RowBlockV2 block; +// do { +// st = iter.next_batch(&block); +// } while (st.ok()); +class AutoIncrementIterator : public RowwiseIterator { +public: +// Will generate num_rows rows in total +AutoIncrementIterator(const SchemaV2& schema, size_t num_rows) +: _schema(schema), _num_rows(num_rows), _rows_returned(0) { +} +~AutoIncrementIterator() override { } + +// NOTE: Currently, this function will ignore StorageReadOptions +Status init(const StorageReadOptions& opts) override; +Status next_batch(RowBlockV2* block) override; + +const SchemaV2& schema() const override { return _schema; } +private: +SchemaV2 _schema; +size_t _num_rows; +size_t _rows_returned; +}; + +Status AutoIncrementIterator::init(const StorageReadOptions& opts) { +return Status::OK(); +} + +Status AutoIncrementIterator::next_batch(RowBlockV2* block) { +int row_idx = 0; +while (row_idx < block->capacity() && _rows_returned < _num_rows) { +RowBlockRow row = block->row(row_idx); + +for (int i = 0; i < _schema.column_schemas().size(); ++i) { +row.set_is_null(i, false); +auto& col_schema = _schema.column_schemas()[i]; +switch (col_schema.type()) { +case OLAP_FIELD_TYPE_SMALLINT: +*(int16_t*)row.cell_ptr(i) = _rows_returned + i; +break; +case OLAP_FIELD_TYPE_INT: +*(int32_t*)row.cell_ptr(i) = _rows_returned + i; +break; +case OLAP_FIELD_TYPE_BIGINT: +*(int64_t*)row.cell_ptr(i) = _rows_returned + i; +break; +case OLAP_FIELD_TYPE_FLOAT: +*(float*)row.cell_ptr(i) = _rows_returned + i; +break; +case OLAP_FIELD_TYPE_DOUBLE: +*(double*)row.cell_ptr(i) = _rows_returned + i; +break; +default: +break; +} +} +row_idx++; +_rows_returned++; +} +block->resize(row_idx); +if (row_idx > 0) { +return Status::OK(); +} +return Status::EndOfFile("End of AutoIncrementIterator"); +} + +// Used to store merge state for a MergeIterator input. +// This class will iterate all data from internal iterator +// through client call advance(). +// Usage: +// MergeContext ctx(iter); +// RETURN_IF_ERROR(ctx.init()); +// while (ctx.valid()) { +// visit(ctx.current_row()); +// RETURN_IF_ERROR(ctx.advance()); +// } +class MergeContext { +public: +// This class don't take iter's ownership, client should delete it +MergeContext(RowwiseIterator* iter) +: _iter(iter), _block(iter->schema(), 1024, &_arena) { +} + +// Intialize this context and will prepare data for current_row() +Status init(const StorageReadOptions& opts); + +// Return current row which internal row index points to +// And this function won't make internal index advance. +// Before call this function, Client must assure that +// valid() return true +RowBlockRow current_row() const { +return RowBlockRow(&_block, _index_in_block); +} + +// Advance internal row index to next valid row +// Return error if error happens +// Don't call this when valid() is false, action is undefined +Status advance(); + +// Return if has remaining data in this context. +// Only when
[GitHub] [incubator-doris] imay commented on a change in pull request #1515: Add storage rowwise iterator
imay commented on a change in pull request #1515: Add storage rowwise iterator URL: https://github.com/apache/incubator-doris/pull/1515#discussion_r305701273 ## File path: be/src/olap/row_block2.h ## @@ -0,0 +1,113 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +#pragma once + +#include +#include +#include + +#include "common/status.h" +#include "olap/column_block.h" +#include "olap/schema2.h" +#include "olap/types.h" + +namespace doris { + +class Arena; +class RowCursor; + +// This struct contains a block of rows, in which each column's data is stored +// in a vector. We don't use VectorizedRowBatch because it doesn't own the data +// in block, however it is used by old code, which we don't want to change. +class RowBlockV2 { +public: +RowBlockV2(const SchemaV2& schema, uint16_t capacity, Arena* arena); +~RowBlockV2(); + +void resize(size_t num_rows) { _num_rows = num_rows; } +size_t num_rows() const { return _num_rows; } +size_t capacity() const { return _capacity; } +Arena* arena() const { return _arena; } + +// Copy the row_idx row's data into given row_cursor. +// This function will use shallow copy, so the client should +// notice the life time of returned value +Status copy_to_row_cursor(size_t row_idx, RowCursor* row_cursor); + +// Get column block for input column index. This input is the index in +// this row block, is not the index in table's schema +ColumnBlock column_block(size_t col_idx) const { +const TypeInfo* type_info = _schema.column(col_idx).type_info(); +uint8_t* data = _column_datas[col_idx]; +uint8_t* null_bitmap = _column_null_bitmaps[col_idx]; +return ColumnBlock(type_info, data, null_bitmap, _arena); +} + +RowBlockRow row(size_t row_idx) const; + +const SchemaV2& schema() const { return _schema; } + +private: +SchemaV2 _schema; +std::vector _column_datas; +std::vector _column_null_bitmaps; +size_t _capacity; +size_t _num_rows; +Arena* _arena; Review comment: arena is used to allocate variable length column, like varchar This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: dev-unsubscr...@doris.apache.org For additional commands, e-mail: dev-h...@doris.apache.org
[GitHub] [incubator-doris] imay commented on a change in pull request #1515: Add storage rowwise iterator
imay commented on a change in pull request #1515: Add storage rowwise iterator URL: https://github.com/apache/incubator-doris/pull/1515#discussion_r305702566 ## File path: be/src/olap/generic_iterators.cpp ## @@ -0,0 +1,346 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +#include "olap/iterators.h" + +#include + +#include "olap/row_block2.h" +#include "util/arena.h" + +namespace doris { + +// This iterator will generate ordered data. For example for schema +// (int, int) this iterator will generator data like +// (0, 1), (1, 2), (2, 4), (3, 4)... +// +// Usage: +// SchemaV2 schema; +// AutoIncrementIterator iter(schema, 1000); +// StorageReadOptions opts; +// RETURN_IF_ERROR(iter.init(opts)); +// RowBlockV2 block; +// do { +// st = iter.next_batch(&block); +// } while (st.ok()); +class AutoIncrementIterator : public RowwiseIterator { +public: +// Will generate num_rows rows in total +AutoIncrementIterator(const SchemaV2& schema, size_t num_rows) +: _schema(schema), _num_rows(num_rows), _rows_returned(0) { +} +~AutoIncrementIterator() override { } + +// NOTE: Currently, this function will ignore StorageReadOptions +Status init(const StorageReadOptions& opts) override; +Status next_batch(RowBlockV2* block) override; + +const SchemaV2& schema() const override { return _schema; } +private: +SchemaV2 _schema; +size_t _num_rows; +size_t _rows_returned; +}; + +Status AutoIncrementIterator::init(const StorageReadOptions& opts) { +return Status::OK(); +} + +Status AutoIncrementIterator::next_batch(RowBlockV2* block) { +int row_idx = 0; +while (row_idx < block->capacity() && _rows_returned < _num_rows) { +RowBlockRow row = block->row(row_idx); + +for (int i = 0; i < _schema.column_schemas().size(); ++i) { +row.set_is_null(i, false); +auto& col_schema = _schema.column_schemas()[i]; +switch (col_schema.type()) { +case OLAP_FIELD_TYPE_SMALLINT: +*(int16_t*)row.cell_ptr(i) = _rows_returned + i; +break; +case OLAP_FIELD_TYPE_INT: +*(int32_t*)row.cell_ptr(i) = _rows_returned + i; +break; +case OLAP_FIELD_TYPE_BIGINT: +*(int64_t*)row.cell_ptr(i) = _rows_returned + i; +break; +case OLAP_FIELD_TYPE_FLOAT: +*(float*)row.cell_ptr(i) = _rows_returned + i; +break; +case OLAP_FIELD_TYPE_DOUBLE: +*(double*)row.cell_ptr(i) = _rows_returned + i; +break; +default: +break; +} +} +row_idx++; +_rows_returned++; +} +block->resize(row_idx); +if (row_idx > 0) { +return Status::OK(); +} +return Status::EndOfFile("End of AutoIncrementIterator"); +} + +// Used to store merge state for a MergeIterator input. +// This class will iterate all data from internal iterator +// through client call advance(). +// Usage: +// MergeContext ctx(iter); +// RETURN_IF_ERROR(ctx.init()); +// while (ctx.valid()) { +// visit(ctx.current_row()); +// RETURN_IF_ERROR(ctx.advance()); +// } +class MergeContext { +public: +// This class don't take iter's ownership, client should delete it +MergeContext(RowwiseIterator* iter) +: _iter(iter), _block(iter->schema(), 1024, &_arena) { +} + +// Intialize this context and will prepare data for current_row() +Status init(const StorageReadOptions& opts); + +// Return current row which internal row index points to +// And this function won't make internal index advance. +// Before call this function, Client must assure that +// valid() return true +RowBlockRow current_row() const { +return RowBlockRow(&_block, _index_in_block); +} + +// Advance internal row index to next valid row +// Return error if error happens +// Don't call this when valid() is false, action is undefined +Status advance(); + +// Return if has remaining data in this context. +// Only when
[GitHub] [incubator-doris] imay commented on a change in pull request #1505: Add timediff function
imay commented on a change in pull request #1505: Add timediff function URL: https://github.com/apache/incubator-doris/pull/1505#discussion_r305703865 ## File path: be/src/exprs/literal.cpp ## @@ -87,6 +87,11 @@ Literal::Literal(const TExprNode& node) : _value.datetime_val.from_date_str( node.date_literal.value.c_str(), node.date_literal.value.size()); break; +case TYPE_TIME: Review comment: reuse TYPE_DOUBLE This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: dev-unsubscr...@doris.apache.org For additional commands, e-mail: dev-h...@doris.apache.org
[GitHub] [incubator-doris] imay commented on a change in pull request #1505: Add timediff function
imay commented on a change in pull request #1505: Add timediff function URL: https://github.com/apache/incubator-doris/pull/1505#discussion_r305705377 ## File path: be/src/runtime/result_writer.cpp ## @@ -104,6 +104,39 @@ Status ResultWriter::add_one_row(TupleRow* row) { buf_ret = _row_buffer->push_double(*static_cast(item)); break; +case TYPE_TIME: { Review comment: you can write a function to do this to avoid write this many times This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: dev-unsubscr...@doris.apache.org For additional commands, e-mail: dev-h...@doris.apache.org
[GitHub] [incubator-doris] imay commented on a change in pull request #1505: Add timediff function
imay commented on a change in pull request #1505: Add timediff function URL: https://github.com/apache/incubator-doris/pull/1505#discussion_r305705198 ## File path: be/src/runtime/raw_value.cpp ## @@ -267,6 +267,7 @@ void RawValue::write(const void* value, void* dst, const TypeDescriptor& type, M break; } +case TYPE_TIME: Review comment: I think this should be the same as TYPE_DOUBLE This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: dev-unsubscr...@doris.apache.org For additional commands, e-mail: dev-h...@doris.apache.org
[GitHub] [incubator-doris] imay commented on a change in pull request #1505: Add timediff function
imay commented on a change in pull request #1505: Add timediff function URL: https://github.com/apache/incubator-doris/pull/1505#discussion_r305704769 ## File path: be/src/exprs/time_operators.h ## @@ -0,0 +1,48 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +#ifndef DORIS_BE_SRC_EXPRS_TIME_OPERATORS_H +#define DORIS_BE_SRC_EXPRS_TIME_OPERATORS_H + +#include +#include "udf/udf.h" + +namespace doris { +class Expr; +struct ExprValue; +class TupleRow; + +/// Implementation of the time operators. These include the cast, +/// arithmetic and binary operators. +class TimeOperators { +public: +static void init(); + +static BooleanVal cast_to_boolean_val(FunctionContext*, const DoubleVal&); Review comment: I think we can only support time to double and to string This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: dev-unsubscr...@doris.apache.org For additional commands, e-mail: dev-h...@doris.apache.org
[GitHub] [incubator-doris] imay commented on a change in pull request #1505: Add timediff function
imay commented on a change in pull request #1505: Add timediff function URL: https://github.com/apache/incubator-doris/pull/1505#discussion_r305705811 ## File path: fe/src/main/java/org/apache/doris/catalog/Function.java ## @@ -480,6 +481,7 @@ public static String getUdfType(PrimitiveType t) { case INT: return "IntVal"; case BIGINT: +case TIME: Review comment: why BIGINT? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: dev-unsubscr...@doris.apache.org For additional commands, e-mail: dev-h...@doris.apache.org
[GitHub] [incubator-doris] imay commented on a change in pull request #1505: Add timediff function
imay commented on a change in pull request #1505: Add timediff function URL: https://github.com/apache/incubator-doris/pull/1505#discussion_r305707087 ## File path: fe/src/main/java/org/apache/doris/catalog/PrimitiveType.java ## @@ -480,6 +486,7 @@ public static boolean isImplicitCast(PrimitiveType type, PrimitiveType target) { compatibilityMatrix[DECIMALV2.ordinal()][DECIMAL.ordinal()] = DECIMALV2; compatibilityMatrix[HLL.ordinal()][HLL.ordinal()] = HLL; +compatibilityMatrix[TIME.ordinal()][TIME.ordinal()] = TIME; Review comment: you should add more relation between time and other types This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: dev-unsubscr...@doris.apache.org For additional commands, e-mail: dev-h...@doris.apache.org
[GitHub] [incubator-doris] wuyunfeng opened a new pull request #1526: Add spark-doris-connector overview
wuyunfeng opened a new pull request #1526: Add spark-doris-connector overview URL: https://github.com/apache/incubator-doris/pull/1526 spark-doris-connector architecture overview This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: dev-unsubscr...@doris.apache.org For additional commands, e-mail: dev-h...@doris.apache.org
[GitHub] [incubator-doris] wuyunfeng commented on issue #1526: Add spark-doris-connector overview
wuyunfeng commented on issue #1526: Add spark-doris-connector overview URL: https://github.com/apache/incubator-doris/pull/1526#issuecomment-513680135 [architecture](https://github.com/apache/incubator-doris/issues/1525) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: dev-unsubscr...@doris.apache.org For additional commands, e-mail: dev-h...@doris.apache.org
[GitHub] [incubator-doris] wuyunfeng opened a new pull request #1527: Expose data pruned-filter-scan ability
wuyunfeng opened a new pull request #1527: Expose data pruned-filter-scan ability URL: https://github.com/apache/incubator-doris/pull/1527 This PR is related with [Spark-Doris-Connector](https://github.com/apache/incubator-doris/issues/1525). The First Step is Expose data pruned-filter-scan ability, then implement the Spark-Doris-Connector on Spark side. below is the tremendous detailed design This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: dev-unsubscr...@doris.apache.org For additional commands, e-mail: dev-h...@doris.apache.org
[GitHub] [incubator-doris] kangpinghuang commented on a change in pull request #1409: Add dict page
kangpinghuang commented on a change in pull request #1409: Add dict page URL: https://github.com/apache/incubator-doris/pull/1409#discussion_r305778667 ## File path: be/src/olap/rowset/segment_v2/binary_dict_page.cpp ## @@ -0,0 +1,231 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +#include "olap/rowset/segment_v2/binary_dict_page.h" + +#include "util/slice.h" // for Slice +#include "gutil/strings/substitute.h" // for Substitute +#include "runtime/mem_pool.h" + +#include "olap/rowset/segment_v2/bitshuffle_page.h" +#include "olap/rowset/segment_v2/rle_page.h" + +namespace doris { +namespace segment_v2 { + +using strings::Substitute; + +BinaryDictPageBuilder::BinaryDictPageBuilder(const PageBuilderOptions& options) : +_options(options), +_finished(false), +_data_page_builder(nullptr), +_dict_builder(nullptr), +_encoding_type(DICT_ENCODING) { +// initially use DICT_ENCODING +// TODO: the data page builder type can be created by Factory according to user config +_data_page_builder.reset(new BitshufflePageBuilder(options)); +PageBuilderOptions dict_builder_options; +dict_builder_options.data_page_size = _options.dict_page_size; +_dict_builder.reset(new BinaryPlainPageBuilder(dict_builder_options)); +reset(); +} + +bool BinaryDictPageBuilder::is_page_full() { +if (_data_page_builder->is_page_full()) { +return true; +} +if (_encoding_type == DICT_ENCODING && _dict_builder->is_page_full()) { +return true; +} +return false; +} + +Status BinaryDictPageBuilder::add(const uint8_t* vals, size_t* count) { +if (_encoding_type == DICT_ENCODING) { +DCHECK(!_finished); +DCHECK_GT(*count, 0); +const Slice* src = reinterpret_cast(vals); +size_t num_added = 0; +for (int i = 0; i < *count; ++i, ++src) { +auto ret = _dictionary.find(*src); +size_t add_count = 1; +if (ret != _dictionary.end()) { +uint32_t value_code = ret->second; +RETURN_IF_ERROR(_data_page_builder->add(reinterpret_cast(&value_code), &add_count)); +num_added += add_count; +if (add_count == 0) { +// current data page is full, stop processing remaining inputs +break; +} +} else { +if (_dict_builder->is_page_full()) { +break; +} +char* item_mem = _arena.Allocate(src->size); +if (item_mem == nullptr) { +return Status::Corruption(Substitute("memory allocate failed, size:$0", src->size)); +} +Slice dict_item(src->data, src->size); +dict_item.relocate(item_mem); +uint32_t value_code = _dictionary.size(); +_dictionary.insert({dict_item, value_code}); +_dict_items.push_back(dict_item); +_dict_builder->update_prepared_size(dict_item.size); +RETURN_IF_ERROR(_data_page_builder->add(reinterpret_cast(&value_code), &add_count)); +if (add_count == 0) { +// current data page is full, stop processing remaining inputs +break; +} +num_added += 1; +} +} +*count = num_added; +return Status::OK(); +} else { +DCHECK_EQ(_encoding_type, PLAIN_ENCODING); +return _data_page_builder->add(vals, count); +} +} + +Slice BinaryDictPageBuilder::finish() { +_finished = true; + +Slice data_slice = _data_page_builder->finish(); +_buffer.append(data_slice.data, data_slice.size); +encode_fixed32_le(&_buffer[0], _encoding_type); +return Slice(_buffer.data(), _buffer.size()); +} + +void BinaryDictPageBuilder::reset() { +_finished = false; +_buffer.clear(); +_buffer.resize(BINARY_DICT_PAGE_HEADER_SIZE); +_buffer.reserve(_options.data_page_size); + +if (_encoding_type == DICT_ENCODING +&& _dict_builder->is_page_full()) { +_data_page_builder.reset(new BinaryPlainPageBuilder(_options)); +
[GitHub] [incubator-doris] kangpinghuang commented on a change in pull request #1409: Add dict page
kangpinghuang commented on a change in pull request #1409: Add dict page URL: https://github.com/apache/incubator-doris/pull/1409#discussion_r305779712 ## File path: be/src/olap/rowset/segment_v2/binary_dict_page.cpp ## @@ -0,0 +1,231 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +#include "olap/rowset/segment_v2/binary_dict_page.h" + +#include "util/slice.h" // for Slice +#include "gutil/strings/substitute.h" // for Substitute +#include "runtime/mem_pool.h" + +#include "olap/rowset/segment_v2/bitshuffle_page.h" +#include "olap/rowset/segment_v2/rle_page.h" + +namespace doris { +namespace segment_v2 { + +using strings::Substitute; + +BinaryDictPageBuilder::BinaryDictPageBuilder(const PageBuilderOptions& options) : +_options(options), +_finished(false), +_data_page_builder(nullptr), +_dict_builder(nullptr), +_encoding_type(DICT_ENCODING) { +// initially use DICT_ENCODING +// TODO: the data page builder type can be created by Factory according to user config +_data_page_builder.reset(new BitshufflePageBuilder(options)); +PageBuilderOptions dict_builder_options; +dict_builder_options.data_page_size = _options.dict_page_size; +_dict_builder.reset(new BinaryPlainPageBuilder(dict_builder_options)); +reset(); +} + +bool BinaryDictPageBuilder::is_page_full() { +if (_data_page_builder->is_page_full()) { +return true; +} +if (_encoding_type == DICT_ENCODING && _dict_builder->is_page_full()) { +return true; +} +return false; +} + +Status BinaryDictPageBuilder::add(const uint8_t* vals, size_t* count) { +if (_encoding_type == DICT_ENCODING) { +DCHECK(!_finished); +DCHECK_GT(*count, 0); +const Slice* src = reinterpret_cast(vals); +size_t num_added = 0; +for (int i = 0; i < *count; ++i, ++src) { +auto ret = _dictionary.find(*src); +size_t add_count = 1; +if (ret != _dictionary.end()) { +uint32_t value_code = ret->second; +RETURN_IF_ERROR(_data_page_builder->add(reinterpret_cast(&value_code), &add_count)); +num_added += add_count; +if (add_count == 0) { +// current data page is full, stop processing remaining inputs +break; +} +} else { +if (_dict_builder->is_page_full()) { +break; +} +char* item_mem = _arena.Allocate(src->size); +if (item_mem == nullptr) { +return Status::Corruption(Substitute("memory allocate failed, size:$0", src->size)); +} +Slice dict_item(src->data, src->size); +dict_item.relocate(item_mem); +uint32_t value_code = _dictionary.size(); +_dictionary.insert({dict_item, value_code}); +_dict_items.push_back(dict_item); +_dict_builder->update_prepared_size(dict_item.size); +RETURN_IF_ERROR(_data_page_builder->add(reinterpret_cast(&value_code), &add_count)); +if (add_count == 0) { +// current data page is full, stop processing remaining inputs +break; +} +num_added += 1; +} +} +*count = num_added; +return Status::OK(); +} else { +DCHECK_EQ(_encoding_type, PLAIN_ENCODING); +return _data_page_builder->add(vals, count); +} +} + +Slice BinaryDictPageBuilder::finish() { +_finished = true; + +Slice data_slice = _data_page_builder->finish(); +_buffer.append(data_slice.data, data_slice.size); +encode_fixed32_le(&_buffer[0], _encoding_type); +return Slice(_buffer.data(), _buffer.size()); +} + +void BinaryDictPageBuilder::reset() { +_finished = false; +_buffer.clear(); +_buffer.resize(BINARY_DICT_PAGE_HEADER_SIZE); +_buffer.reserve(_options.data_page_size); + +if (_encoding_type == DICT_ENCODING +&& _dict_builder->is_page_full()) { +_data_page_builder.reset(new BinaryPlainPageBuilder(_options)); +
[GitHub] [incubator-doris] kangpinghuang commented on a change in pull request #1409: Add dict page
kangpinghuang commented on a change in pull request #1409: Add dict page URL: https://github.com/apache/incubator-doris/pull/1409#discussion_r305781309 ## File path: be/src/olap/rowset/segment_v2/binary_dict_page.cpp ## @@ -0,0 +1,231 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +#include "olap/rowset/segment_v2/binary_dict_page.h" + +#include "util/slice.h" // for Slice +#include "gutil/strings/substitute.h" // for Substitute +#include "runtime/mem_pool.h" + +#include "olap/rowset/segment_v2/bitshuffle_page.h" +#include "olap/rowset/segment_v2/rle_page.h" + +namespace doris { +namespace segment_v2 { + +using strings::Substitute; + +BinaryDictPageBuilder::BinaryDictPageBuilder(const PageBuilderOptions& options) : +_options(options), +_finished(false), +_data_page_builder(nullptr), +_dict_builder(nullptr), +_encoding_type(DICT_ENCODING) { +// initially use DICT_ENCODING +// TODO: the data page builder type can be created by Factory according to user config +_data_page_builder.reset(new BitshufflePageBuilder(options)); +PageBuilderOptions dict_builder_options; +dict_builder_options.data_page_size = _options.dict_page_size; +_dict_builder.reset(new BinaryPlainPageBuilder(dict_builder_options)); +reset(); +} + +bool BinaryDictPageBuilder::is_page_full() { +if (_data_page_builder->is_page_full()) { +return true; +} +if (_encoding_type == DICT_ENCODING && _dict_builder->is_page_full()) { +return true; +} +return false; +} + +Status BinaryDictPageBuilder::add(const uint8_t* vals, size_t* count) { +if (_encoding_type == DICT_ENCODING) { +DCHECK(!_finished); +DCHECK_GT(*count, 0); +const Slice* src = reinterpret_cast(vals); +size_t num_added = 0; +for (int i = 0; i < *count; ++i, ++src) { +auto ret = _dictionary.find(*src); +size_t add_count = 1; +if (ret != _dictionary.end()) { +uint32_t value_code = ret->second; +RETURN_IF_ERROR(_data_page_builder->add(reinterpret_cast(&value_code), &add_count)); +num_added += add_count; +if (add_count == 0) { +// current data page is full, stop processing remaining inputs +break; +} +} else { +if (_dict_builder->is_page_full()) { +break; +} +char* item_mem = _arena.Allocate(src->size); +if (item_mem == nullptr) { +return Status::Corruption(Substitute("memory allocate failed, size:$0", src->size)); +} +Slice dict_item(src->data, src->size); +dict_item.relocate(item_mem); +uint32_t value_code = _dictionary.size(); +_dictionary.insert({dict_item, value_code}); +_dict_items.push_back(dict_item); +_dict_builder->update_prepared_size(dict_item.size); +RETURN_IF_ERROR(_data_page_builder->add(reinterpret_cast(&value_code), &add_count)); +if (add_count == 0) { +// current data page is full, stop processing remaining inputs +break; +} +num_added += 1; +} +} +*count = num_added; +return Status::OK(); +} else { +DCHECK_EQ(_encoding_type, PLAIN_ENCODING); +return _data_page_builder->add(vals, count); +} +} + +Slice BinaryDictPageBuilder::finish() { +_finished = true; + +Slice data_slice = _data_page_builder->finish(); +_buffer.append(data_slice.data, data_slice.size); +encode_fixed32_le(&_buffer[0], _encoding_type); +return Slice(_buffer.data(), _buffer.size()); +} + +void BinaryDictPageBuilder::reset() { +_finished = false; +_buffer.clear(); +_buffer.resize(BINARY_DICT_PAGE_HEADER_SIZE); +_buffer.reserve(_options.data_page_size); Review comment: ok This is an automated message from the Apache Git Service. To respond to the messag
[GitHub] [incubator-doris] wuyunfeng closed pull request #1527: Expose data pruned-filter-scan ability
wuyunfeng closed pull request #1527: Expose data pruned-filter-scan ability URL: https://github.com/apache/incubator-doris/pull/1527 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: dev-unsubscr...@doris.apache.org For additional commands, e-mail: dev-h...@doris.apache.org
[GitHub] [incubator-doris] wuyunfeng opened a new pull request #1527: Expose data pruned-filter-scan ability
wuyunfeng opened a new pull request #1527: Expose data pruned-filter-scan ability URL: https://github.com/apache/incubator-doris/pull/1527 This PR is related with [Spark-Doris-Connector](https://github.com/apache/incubator-doris/issues/1525). The First Step is Expose data pruned-filter-scan ability, then implement the Spark-Doris-Connector on Spark side. below is the tremendous detailed design This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: dev-unsubscr...@doris.apache.org For additional commands, e-mail: dev-h...@doris.apache.org
[GitHub] [incubator-doris] morningman opened a new pull request #1528: Fix bug that user with LOAD_PRIV can see load job by SHOW LOAD stmt
morningman opened a new pull request #1528: Fix bug that user with LOAD_PRIV can see load job by SHOW LOAD stmt URL: https://github.com/apache/incubator-doris/pull/1528 User should has LOAD_PRIV to use SHOW LOAD stmt, not SHOW_PRIV. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: dev-unsubscr...@doris.apache.org For additional commands, e-mail: dev-h...@doris.apache.org
Expose data pruned-filter-scan ability for Integrating Spark and Doris
Hi, all Doris Commiter Recently I will create a pull request for Expose data pruned-filter-scan ability for Integrating Spark and Doris. Please propose this proposal, thanks. The First Step is Expose data pruned-filter-scan ability, then implement the Spark-Doris-Connector on Spark side. below is the tremendous detailed design. Outline Doris for Apache Spark (Spark-Doris-Connector) allowing Spark tasks to use it as a library and interact with Doris through Doris for Apache Spark APIs Background Recently years, Machine Learning intensively are integrated to computing system such as Spark MLib for Spark etc. if Doris expose the data for Computing System such as Spark, then we can make full use of the Computing System's MLib gains a competitive edge with state-of-the art Machine Learning on large data sets, collaborate, and productionize models at massive scale When Doris processing large query, if memory exceed the memory limitation, this query would fail, but Spark can write the intermediate result back to Disk gracefully, the Integration for Spark and Doris not only resolved this problem, but also Spark can take full advantage of the scan performance of the column-stride of Doris Storage Engine which can provide extra push-down filter ability save a lot of network IO . Nowadays, enterprise store lots data on different Storage Service such as Mysql、Elasticsearch、Doris、HDFS、NFS、Table etc. They need a way to analyze all this data with conjunctive query,Spark already get through with almost these Service except Doris. If build the bridge for Spark and Doris, Spark can provided unified data manipulation platform for this scenario How to realize Doris FE is responsible for pruning the related tablet, decide which predicate can used to filter data, providing single node query plan fragment, then packed all those things and return client without scheduling these to backend instance in contrast to before. In this way, FE no longer would coordinate the query lifecycle Doris BE is responsible for assembling all parameters from client, such as tabletIds、version、encoded plan fragment etc which mostly generated by Doris FE , and then execute this plan fragment instance, all the row results return by this plan fragment would be pushed to a attached blocking queue firstly, when client call get_next to iterate the result, fetch this batched result from blocking queue and answer the client. Architecture https://github.com/wuyunfeng/incubator-doris/blob/docs/docs/resources/images/spark_doris_connector.jpg WorkFlow Spark ApplicationMaster fetch the Table schema from Doris FE Spark ApplicationMaster get the Query Plan From Doris FE. Spark ApplicationMaster transfer the Query Plan to all relevant Executor Worker Spark Executor Worker open context from Doris BE with the generated QueryPlan Spark Executor Worker iterate all data from Doris BE by get_next Spark Executor Worker close context from Doris BE Additional API Doris FE HTTP Transport Protocol GET Table Schema Request: GET /{cluster}/{database}/{table}/_schema Response: { "status": 200, "properties":{ "k1":{ "type":"SMALLINT", "comment":"this is a small SMALLINT column" }, "k2":{ "type":"LARGEINT", "comment":"this is a small LARGEINT column" }, "k3":{ "type":"FLOAT", "comment":"this is a small FLOAT column" }, "k4":{ "type":"CHAR", "comment":"this is a small CHAR column" } } } GET Query Plan Request: POST /{cluster}/{database}/{table}/_query_plan { "sql": "select k2, k4 from table where k1 > 2 and k3 > 4" } Response: { "status": 200, "encrypted_plan_desc":"This is a base64 encrypted doris FE query plan", "partitions":{ "{tabletId}":[ { "host":"192.168.0.1", "port":"9301" }, { "host":"192.168.0.2", "port":"9301" }, { "host":"192.168.0.3", "port":"9301" } ], "139":[ { "host":"192.168.0.3", "port":"9301" }, { "host":"192.168.0.4", "port":"9301" }, { "host":"192.168.0.6", "port":"9301" } ], "140":[ { "host":"192.168.0.1", "port":"9301" }, { "host":"192.168.0.2", "port":"9301" }, { "host":"192.168.0.3", "port":"9301" } ] } } Doris BE Thrift Transport Protocol External Service: // scan service expose ability of scanning data ability to other compute system service **TDorisExternalService**
[GitHub] [incubator-doris] imay closed pull request #1413: Add short key index builder, decoder, iterator
imay closed pull request #1413: Add short key index builder, decoder, iterator URL: https://github.com/apache/incubator-doris/pull/1413 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: dev-unsubscr...@doris.apache.org For additional commands, e-mail: dev-h...@doris.apache.org
[GitHub] [incubator-doris] imay merged pull request #1528: Fix bug that user with LOAD_PRIV can see load job by SHOW LOAD stmt
imay merged pull request #1528: Fix bug that user with LOAD_PRIV can see load job by SHOW LOAD stmt URL: https://github.com/apache/incubator-doris/pull/1528 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: dev-unsubscr...@doris.apache.org For additional commands, e-mail: dev-h...@doris.apache.org
[GitHub] [incubator-doris] imay opened a new pull request #1529: Refactor agg
imay opened a new pull request #1529: Refactor agg URL: https://github.com/apache/incubator-doris/pull/1529 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: dev-unsubscr...@doris.apache.org For additional commands, e-mail: dev-h...@doris.apache.org
[GitHub] [incubator-doris] gaodayue commented on a change in pull request #1409: Add dict page
gaodayue commented on a change in pull request #1409: Add dict page URL: https://github.com/apache/incubator-doris/pull/1409#discussion_r306099633 ## File path: be/src/olap/rowset/segment_v2/bitshuffle_page.h ## @@ -170,11 +148,11 @@ class BitshufflePageBuilder : public PageBuilder { _data.push_back(0); } -_buffer.resize(BITSHUFFLE_BLOCK_HEADER_SIZE + +_buffer.resize(BITSHUFFLE_PAGE_HEADER_SIZE + bitshuffle::compress_lz4_bound(num_elems_after_padding, final_size_of_type, 0)); encode_fixed32_le(&_buffer[0], _count); -int64_t bytes = bitshuffle::compress_lz4(_data.data(), &_buffer[BITSHUFFLE_BLOCK_HEADER_SIZE], +uint32_t bytes = bitshuffle::compress_lz4(_data.data(), &_buffer[BITSHUFFLE_PAGE_HEADER_SIZE], Review comment: please fix this also This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: dev-unsubscr...@doris.apache.org For additional commands, e-mail: dev-h...@doris.apache.org
[GitHub] [incubator-doris] yiguolei commented on a change in pull request #1497: Add ALTER_TABLET task in be
yiguolei commented on a change in pull request #1497: Add ALTER_TABLET task in be URL: https://github.com/apache/incubator-doris/pull/1497#discussion_r306106519 ## File path: be/src/olap/schema_change.cpp ## @@ -1121,6 +1121,207 @@ bool SchemaChangeWithSorting::_external_sorting( return true; } +OLAPStatus SchemaChangeHandler::process_alter_tablet_v2(const TAlterTabletReqV2& request) { +LOG(INFO) << "begin to validate alter tablet request. base_tablet_id=" << request.base_tablet_id + << ", base_schema_hash" << request.base_schema_hash + << ", new_tablet_id=" << request.new_tablet_id + << ", new_schema_hash=" << request.new_schema_hash + << ", alter_version=" << request.alter_version + << ", alter_version_hash=" << request.alter_version_hash; + +// Lock schema_change_lock util schema change info is stored in tablet header +if (!StorageEngine::instance()->tablet_manager()->try_schema_change_lock(request.base_tablet_id)) { +LOG(WARNING) << "failed to obtain schema change lock. " + << "base_tablet=" << request.base_tablet_id; +return OLAP_ERR_TRY_LOCK_FAILED; +} + +OLAPStatus res = _do_process_alter_tablet_v2(request); +LOG(INFO) << "finished alter tablet process, res=" << res; + StorageEngine::instance()->tablet_manager()->release_schema_change_lock(request.base_tablet_id); +return res; +} + +OLAPStatus SchemaChangeHandler::_do_process_alter_tablet_v2(const TAlterTabletReqV2& request) { Review comment: NOT_READY tablet is created during create tablet task. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: dev-unsubscr...@doris.apache.org For additional commands, e-mail: dev-h...@doris.apache.org
[GitHub] [incubator-doris] yiguolei commented on a change in pull request #1497: Add ALTER_TABLET task in be
yiguolei commented on a change in pull request #1497: Add ALTER_TABLET task in be URL: https://github.com/apache/incubator-doris/pull/1497#discussion_r306107406 ## File path: be/src/olap/schema_change.cpp ## @@ -1121,6 +1121,207 @@ bool SchemaChangeWithSorting::_external_sorting( return true; } +OLAPStatus SchemaChangeHandler::process_alter_tablet_v2(const TAlterTabletReqV2& request) { +LOG(INFO) << "begin to validate alter tablet request. base_tablet_id=" << request.base_tablet_id + << ", base_schema_hash" << request.base_schema_hash + << ", new_tablet_id=" << request.new_tablet_id + << ", new_schema_hash=" << request.new_schema_hash + << ", alter_version=" << request.alter_version + << ", alter_version_hash=" << request.alter_version_hash; + +// Lock schema_change_lock util schema change info is stored in tablet header +if (!StorageEngine::instance()->tablet_manager()->try_schema_change_lock(request.base_tablet_id)) { +LOG(WARNING) << "failed to obtain schema change lock. " + << "base_tablet=" << request.base_tablet_id; +return OLAP_ERR_TRY_LOCK_FAILED; +} + +OLAPStatus res = _do_process_alter_tablet_v2(request); +LOG(INFO) << "finished alter tablet process, res=" << res; + StorageEngine::instance()->tablet_manager()->release_schema_change_lock(request.base_tablet_id); +return res; +} + +OLAPStatus SchemaChangeHandler::_do_process_alter_tablet_v2(const TAlterTabletReqV2& request) { Review comment: Add some comment This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: dev-unsubscr...@doris.apache.org For additional commands, e-mail: dev-h...@doris.apache.org
[GitHub] [incubator-doris] yiguolei commented on a change in pull request #1497: Add ALTER_TABLET task in be
yiguolei commented on a change in pull request #1497: Add ALTER_TABLET task in be URL: https://github.com/apache/incubator-doris/pull/1497#discussion_r306107360 ## File path: be/src/olap/schema_change.cpp ## @@ -1121,6 +1121,207 @@ bool SchemaChangeWithSorting::_external_sorting( return true; } +OLAPStatus SchemaChangeHandler::process_alter_tablet_v2(const TAlterTabletReqV2& request) { +LOG(INFO) << "begin to validate alter tablet request. base_tablet_id=" << request.base_tablet_id Review comment: Should not in _do_process_alter_Tablet_v2() because the request maybe failed during acquire lock. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: dev-unsubscr...@doris.apache.org For additional commands, e-mail: dev-h...@doris.apache.org
[GitHub] [incubator-doris] yiguolei commented on a change in pull request #1497: Add ALTER_TABLET task in be
yiguolei commented on a change in pull request #1497: Add ALTER_TABLET task in be URL: https://github.com/apache/incubator-doris/pull/1497#discussion_r306107791 ## File path: be/src/olap/schema_change.cpp ## @@ -1121,6 +1121,207 @@ bool SchemaChangeWithSorting::_external_sorting( return true; } +OLAPStatus SchemaChangeHandler::process_alter_tablet_v2(const TAlterTabletReqV2& request) { +LOG(INFO) << "begin to validate alter tablet request. base_tablet_id=" << request.base_tablet_id + << ", base_schema_hash" << request.base_schema_hash + << ", new_tablet_id=" << request.new_tablet_id + << ", new_schema_hash=" << request.new_schema_hash + << ", alter_version=" << request.alter_version + << ", alter_version_hash=" << request.alter_version_hash; + +// Lock schema_change_lock util schema change info is stored in tablet header +if (!StorageEngine::instance()->tablet_manager()->try_schema_change_lock(request.base_tablet_id)) { +LOG(WARNING) << "failed to obtain schema change lock. " + << "base_tablet=" << request.base_tablet_id; +return OLAP_ERR_TRY_LOCK_FAILED; +} + +OLAPStatus res = _do_process_alter_tablet_v2(request); +LOG(INFO) << "finished alter tablet process, res=" << res; + StorageEngine::instance()->tablet_manager()->release_schema_change_lock(request.base_tablet_id); +return res; +} + +OLAPStatus SchemaChangeHandler::_do_process_alter_tablet_v2(const TAlterTabletReqV2& request) { +OLAPStatus res = OLAP_SUCCESS; +TabletSharedPtr base_tablet = StorageEngine::instance()->tablet_manager()->get_tablet( +request.base_tablet_id, request.base_schema_hash); +if (base_tablet == nullptr) { +LOG(WARNING) << "fail to find base tablet. base_tablet=" << request.base_tablet_id + << ", base_schema_hash=" << request.base_schema_hash; +return OLAP_ERR_TABLE_NOT_FOUND; +} + +// new tablet has to exist +TabletSharedPtr new_tablet = StorageEngine::instance()->tablet_manager()->get_tablet( +request.new_tablet_id, request.new_schema_hash); +if (new_tablet == nullptr) { +LOG(WARNING) << "fail to find new tablet." + << ", new_tablet=" << request.new_tablet_id + << ", new_schema_hash=" << request.new_schema_hash; +return OLAP_ERR_TABLE_NOT_FOUND; +} + +// check if tablet's state is not_ready, if it is ready, it means the tablet already finished +// check whether the tablet's max continuous version == request.version +if (new_tablet->tablet_state() != TABLET_NOTREADY) { +res = _validate_alter_result(new_tablet, request); +LOG(INFO) << "tablet's state=" << new_tablet->tablet_state() + << " the convert job alreay finished, check its version" + << " res=" << res; +return res; +} + +LOG(INFO) << "finish to validate alter tablet request. begin to convert data from base tablet to new tablet" + << " base_tablet=" << base_tablet->full_name() + << " new_tablet=" << new_tablet->full_name(); + +ReadLock base_migration_rlock(base_tablet->get_migration_lock_ptr(), TRY_LOCK); +if (!base_migration_rlock.own_lock()) { +return OLAP_ERR_RWLOCK_ERROR; +} +ReadLock new_migration_rlock(new_tablet->get_migration_lock_ptr(), TRY_LOCK); +if (!new_migration_rlock.own_lock()) { +return OLAP_ERR_RWLOCK_ERROR; +} + +// 2. Get version_to_be_changed and store into tablet header +base_tablet->obtain_push_lock(); +base_tablet->obtain_header_wrlock(); +new_tablet->obtain_header_wrlock(); Review comment: Done This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: dev-unsubscr...@doris.apache.org For additional commands, e-mail: dev-h...@doris.apache.org
[GitHub] [incubator-doris] imay merged pull request #1505: Add timediff function
imay merged pull request #1505: Add timediff function URL: https://github.com/apache/incubator-doris/pull/1505 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: dev-unsubscr...@doris.apache.org For additional commands, e-mail: dev-h...@doris.apache.org
[GitHub] [incubator-doris] morningman opened a new pull request #1530: Add more logs and metrics to trace the broker load process
morningman opened a new pull request #1530: Add more logs and metrics to trace the broker load process URL: https://github.com/apache/incubator-doris/pull/1530 1. The Operator wants to known when the job being scheduled as PENDING and LOADING. And how long it takes to finish these sub states. So I add log at the beginning of each state. 2. Add 2 metrics on BE to monitor the memtable's flush time. `memtable_flush_total` and `memtable_flush_duration_us` 3. Add a log when closing OlapTableSink, to record the add_batch() time of each Backend, cumulative. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: dev-unsubscr...@doris.apache.org For additional commands, e-mail: dev-h...@doris.apache.org