[GitHub] [incubator-doris] wuyunfeng opened a new issue #1525: Expose data pruned-filter-scan ability for Integrating Spark and Doris

2019-07-22 Thread GitBox
wuyunfeng opened a new issue #1525: Expose data pruned-filter-scan ability for 
Integrating Spark and Doris
URL: https://github.com/apache/incubator-doris/issues/1525
 
 
   ### Background
   1. Recently years,  Machine Learning  intensively are integrated to 
computing system such as Spark MLib for Spark etc.  if Doris expose the data 
for Computing System such as Spark,  then we can make full use of the Computing 
System's MLib gains a competitive edge with state-of-the art Machine Learning 
on large data sets, collaborate, and productionize models at massive scale
   
   2. When Doris processing large query, if memory exceed the memory 
limitation, this query would fail, but Spark can write the intermediate result 
back to Disk gracefully, the Integration for Spark and Doris not only resolved 
this problem, but also Spark can take full advantage of the scan performance of 
the column-stride of Doris Storage Engine which can provide extra push-down 
filter ability save a lot of network IO .
   
   3. Nowadays, enterprise  store lots data on different Storage Service such 
as Mysql、Elasticsearch、Doris、HDFS、NFS、Table etc. They need a way to analyze all 
this data with conjunctive query,Spark already get through with almost these 
Service except Doris.
   
   ### How to realize
   
   1. Doris FE  is responsible for pruning the related tablet, decide which 
predicate can used to filter data, providing single node query plan fragment, 
then packed all those things and return client without scheduling these to 
backend instance in contrast to before. In this way, FE no longer  would 
coordinate the query lifecycle
   
   2. Doris BE is responsible for assembling all parameters from client, such 
as tabletIds、version、encoded plan fragment etc which mostly generated by `Doris 
FE `, and then execute this plan fragment instance, all the row results return 
by this plan fragment would be pushed to a attached blocking queue firstly,  
when client call `get_next` to iterate the result, fetch this batched result 
from  blocking queue and answer the client.
   
   ### Additional  API 
    Doris FE  HTTP Transport Protocol
   1.  GET Table Schema
   
   ```
   GET /{cluster}/{database}/{table}/_schema
   ```
   
   2. GET Query Plan
   
   ```
   POST /{cluster}/{database}/{table}/_query_plan
   {
 "sql": "select k2, k4 from table where k1 > 2 and k3 > 4"
   }
   ```
    Doris BE  Thrift Transport Protocol
   
   ```
   // scan service expose ability of scanning data ability to other compute 
system
   service **TDorisExternalService** {
   // doris will build  a scan context for this session, context_id 
returned if success
   TScanOpenResult open(1: TScanOpenParams params);
   
   // return the batch_size of data
   TScanBatchResult getNext(1: TScanNextBatchParams params);
   
   // release the context resource associated with the context_id
   TScanCloseResult close(1: TScanCloseParams params);
   }
   ```
   
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: dev-unsubscr...@doris.apache.org
For additional commands, e-mail: dev-h...@doris.apache.org



[GitHub] [incubator-doris] imay commented on a change in pull request #1515: Add storage rowwise iterator

2019-07-22 Thread GitBox
imay commented on a change in pull request #1515: Add storage rowwise iterator
URL: https://github.com/apache/incubator-doris/pull/1515#discussion_r305700627
 
 

 ##
 File path: be/src/olap/generic_iterators.cpp
 ##
 @@ -0,0 +1,346 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include "olap/iterators.h"
+
+#include 
+
+#include "olap/row_block2.h"
+#include "util/arena.h"
+
+namespace doris {
+
+// This iterator will generate ordered data. For example for schema
+// (int, int) this iterator will generator data like
+// (0, 1), (1, 2), (2, 4), (3, 4)...
+//
+// Usage:
+//  SchemaV2 schema;
+//  AutoIncrementIterator iter(schema, 1000);
+//  StorageReadOptions opts;
+//  RETURN_IF_ERROR(iter.init(opts));
+//  RowBlockV2 block;
+//  do {
+//  st = iter.next_batch(&block);
+//  } while (st.ok());
+class AutoIncrementIterator : public RowwiseIterator {
+public:
+// Will generate num_rows rows in total
+AutoIncrementIterator(const SchemaV2& schema, size_t num_rows)
+: _schema(schema), _num_rows(num_rows), _rows_returned(0) {
+}
+~AutoIncrementIterator() override { }
+
+// NOTE: Currently, this function will ignore StorageReadOptions
+Status init(const StorageReadOptions& opts) override;
+Status next_batch(RowBlockV2* block) override;
+
+const SchemaV2& schema() const override { return _schema; }
+private:
+SchemaV2 _schema;
+size_t _num_rows;
+size_t _rows_returned;
+};
+
+Status AutoIncrementIterator::init(const StorageReadOptions& opts) {
+return Status::OK();
+}
+
+Status AutoIncrementIterator::next_batch(RowBlockV2* block) {
+int row_idx = 0;
+while (row_idx < block->capacity() && _rows_returned < _num_rows) {
+RowBlockRow row = block->row(row_idx);
+
+for (int i = 0; i < _schema.column_schemas().size(); ++i) {
+row.set_is_null(i, false);
+auto& col_schema = _schema.column_schemas()[i];
+switch (col_schema.type()) {
+case OLAP_FIELD_TYPE_SMALLINT:
+*(int16_t*)row.cell_ptr(i) = _rows_returned + i;
+break;
+case OLAP_FIELD_TYPE_INT:
+*(int32_t*)row.cell_ptr(i) = _rows_returned + i;
+break;
+case OLAP_FIELD_TYPE_BIGINT:
+*(int64_t*)row.cell_ptr(i) = _rows_returned + i;
+break;
+case OLAP_FIELD_TYPE_FLOAT:
+*(float*)row.cell_ptr(i) = _rows_returned + i;
+break;
+case OLAP_FIELD_TYPE_DOUBLE:
+*(double*)row.cell_ptr(i) = _rows_returned + i;
+break;
+default:
+break;
+}
+}
+row_idx++;
+_rows_returned++;
+}
+block->resize(row_idx);
+if (row_idx > 0) {
+return Status::OK();
+}
+return Status::EndOfFile("End of AutoIncrementIterator");
+}
+
+// Used to store merge state for a MergeIterator input.
+// This class will iterate all data from internal iterator
+// through client call advance().
+// Usage:
+//  MergeContext ctx(iter);
+//  RETURN_IF_ERROR(ctx.init());
+//  while (ctx.valid()) {
+//  visit(ctx.current_row());
+//  RETURN_IF_ERROR(ctx.advance());
+//  }
+class MergeContext {
+public:
+// This class don't take iter's ownership, client should delete it
+MergeContext(RowwiseIterator* iter)
+: _iter(iter), _block(iter->schema(), 1024, &_arena) {
+}
+
+// Intialize this context and will prepare data for current_row()
+Status init(const StorageReadOptions& opts);
+
+// Return current row which internal row index points to
+// And this function won't make internal index advance.
+// Before call this function, Client must assure that
+// valid() return true
+RowBlockRow current_row() const {
+return RowBlockRow(&_block, _index_in_block);
+}
+
+// Advance internal row index to next valid row
+// Return error if error happens
+// Don't call this when valid() is false, action is undefined
+Status advance();
+
+// Return if has remaining data in this context.
+// Only when

[GitHub] [incubator-doris] imay commented on a change in pull request #1515: Add storage rowwise iterator

2019-07-22 Thread GitBox
imay commented on a change in pull request #1515: Add storage rowwise iterator
URL: https://github.com/apache/incubator-doris/pull/1515#discussion_r305700583
 
 

 ##
 File path: be/src/olap/generic_iterators.cpp
 ##
 @@ -0,0 +1,346 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include "olap/iterators.h"
+
+#include 
+
+#include "olap/row_block2.h"
+#include "util/arena.h"
+
+namespace doris {
+
+// This iterator will generate ordered data. For example for schema
+// (int, int) this iterator will generator data like
+// (0, 1), (1, 2), (2, 4), (3, 4)...
+//
+// Usage:
+//  SchemaV2 schema;
+//  AutoIncrementIterator iter(schema, 1000);
+//  StorageReadOptions opts;
+//  RETURN_IF_ERROR(iter.init(opts));
+//  RowBlockV2 block;
+//  do {
+//  st = iter.next_batch(&block);
+//  } while (st.ok());
+class AutoIncrementIterator : public RowwiseIterator {
+public:
+// Will generate num_rows rows in total
+AutoIncrementIterator(const SchemaV2& schema, size_t num_rows)
+: _schema(schema), _num_rows(num_rows), _rows_returned(0) {
+}
+~AutoIncrementIterator() override { }
+
+// NOTE: Currently, this function will ignore StorageReadOptions
+Status init(const StorageReadOptions& opts) override;
+Status next_batch(RowBlockV2* block) override;
+
+const SchemaV2& schema() const override { return _schema; }
+private:
+SchemaV2 _schema;
+size_t _num_rows;
+size_t _rows_returned;
+};
+
+Status AutoIncrementIterator::init(const StorageReadOptions& opts) {
+return Status::OK();
+}
+
+Status AutoIncrementIterator::next_batch(RowBlockV2* block) {
+int row_idx = 0;
+while (row_idx < block->capacity() && _rows_returned < _num_rows) {
+RowBlockRow row = block->row(row_idx);
+
+for (int i = 0; i < _schema.column_schemas().size(); ++i) {
+row.set_is_null(i, false);
+auto& col_schema = _schema.column_schemas()[i];
+switch (col_schema.type()) {
+case OLAP_FIELD_TYPE_SMALLINT:
+*(int16_t*)row.cell_ptr(i) = _rows_returned + i;
+break;
+case OLAP_FIELD_TYPE_INT:
+*(int32_t*)row.cell_ptr(i) = _rows_returned + i;
+break;
+case OLAP_FIELD_TYPE_BIGINT:
+*(int64_t*)row.cell_ptr(i) = _rows_returned + i;
+break;
+case OLAP_FIELD_TYPE_FLOAT:
+*(float*)row.cell_ptr(i) = _rows_returned + i;
+break;
+case OLAP_FIELD_TYPE_DOUBLE:
+*(double*)row.cell_ptr(i) = _rows_returned + i;
+break;
+default:
+break;
+}
+}
+row_idx++;
+_rows_returned++;
+}
+block->resize(row_idx);
+if (row_idx > 0) {
+return Status::OK();
+}
+return Status::EndOfFile("End of AutoIncrementIterator");
+}
+
+// Used to store merge state for a MergeIterator input.
+// This class will iterate all data from internal iterator
+// through client call advance().
+// Usage:
+//  MergeContext ctx(iter);
+//  RETURN_IF_ERROR(ctx.init());
+//  while (ctx.valid()) {
+//  visit(ctx.current_row());
+//  RETURN_IF_ERROR(ctx.advance());
+//  }
+class MergeContext {
+public:
+// This class don't take iter's ownership, client should delete it
+MergeContext(RowwiseIterator* iter)
+: _iter(iter), _block(iter->schema(), 1024, &_arena) {
+}
+
+// Intialize this context and will prepare data for current_row()
+Status init(const StorageReadOptions& opts);
+
+// Return current row which internal row index points to
+// And this function won't make internal index advance.
+// Before call this function, Client must assure that
+// valid() return true
+RowBlockRow current_row() const {
+return RowBlockRow(&_block, _index_in_block);
+}
+
+// Advance internal row index to next valid row
+// Return error if error happens
+// Don't call this when valid() is false, action is undefined
+Status advance();
+
+// Return if has remaining data in this context.
+// Only when

[GitHub] [incubator-doris] imay commented on a change in pull request #1515: Add storage rowwise iterator

2019-07-22 Thread GitBox
imay commented on a change in pull request #1515: Add storage rowwise iterator
URL: https://github.com/apache/incubator-doris/pull/1515#discussion_r305701273
 
 

 ##
 File path: be/src/olap/row_block2.h
 ##
 @@ -0,0 +1,113 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#pragma once
+
+#include 
+#include 
+#include 
+
+#include "common/status.h"
+#include "olap/column_block.h"
+#include "olap/schema2.h"
+#include "olap/types.h"
+
+namespace doris {
+
+class Arena;
+class RowCursor;
+
+// This struct contains a block of rows, in which each column's data is stored
+// in a vector. We don't use VectorizedRowBatch because it doesn't own the data
+// in block, however it is used by old code, which we don't want to change.
+class RowBlockV2 {
+public:
+RowBlockV2(const SchemaV2& schema, uint16_t capacity, Arena* arena);
+~RowBlockV2();
+
+void resize(size_t num_rows) { _num_rows = num_rows; }
+size_t num_rows() const { return _num_rows; }
+size_t capacity() const { return _capacity; }
+Arena* arena() const { return _arena; }
+
+// Copy the row_idx row's data into given row_cursor.
+// This function will use shallow copy, so the client should
+// notice the life time of returned value
+Status copy_to_row_cursor(size_t row_idx, RowCursor* row_cursor);
+
+// Get column block for input column index. This input is the index in
+// this row block, is not the index in table's schema
+ColumnBlock column_block(size_t col_idx) const {
+const TypeInfo* type_info = _schema.column(col_idx).type_info();
+uint8_t* data = _column_datas[col_idx];
+uint8_t* null_bitmap = _column_null_bitmaps[col_idx];
+return ColumnBlock(type_info, data, null_bitmap, _arena);
+}
+
+RowBlockRow row(size_t row_idx) const;
+
+const SchemaV2& schema() const { return _schema; }
+
+private:
+SchemaV2 _schema;
+std::vector _column_datas;
+std::vector _column_null_bitmaps;
+size_t _capacity;
+size_t _num_rows;
+Arena* _arena;
 
 Review comment:
   arena is used to allocate variable length column, like varchar


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: dev-unsubscr...@doris.apache.org
For additional commands, e-mail: dev-h...@doris.apache.org



[GitHub] [incubator-doris] imay commented on a change in pull request #1515: Add storage rowwise iterator

2019-07-22 Thread GitBox
imay commented on a change in pull request #1515: Add storage rowwise iterator
URL: https://github.com/apache/incubator-doris/pull/1515#discussion_r305702566
 
 

 ##
 File path: be/src/olap/generic_iterators.cpp
 ##
 @@ -0,0 +1,346 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include "olap/iterators.h"
+
+#include 
+
+#include "olap/row_block2.h"
+#include "util/arena.h"
+
+namespace doris {
+
+// This iterator will generate ordered data. For example for schema
+// (int, int) this iterator will generator data like
+// (0, 1), (1, 2), (2, 4), (3, 4)...
+//
+// Usage:
+//  SchemaV2 schema;
+//  AutoIncrementIterator iter(schema, 1000);
+//  StorageReadOptions opts;
+//  RETURN_IF_ERROR(iter.init(opts));
+//  RowBlockV2 block;
+//  do {
+//  st = iter.next_batch(&block);
+//  } while (st.ok());
+class AutoIncrementIterator : public RowwiseIterator {
+public:
+// Will generate num_rows rows in total
+AutoIncrementIterator(const SchemaV2& schema, size_t num_rows)
+: _schema(schema), _num_rows(num_rows), _rows_returned(0) {
+}
+~AutoIncrementIterator() override { }
+
+// NOTE: Currently, this function will ignore StorageReadOptions
+Status init(const StorageReadOptions& opts) override;
+Status next_batch(RowBlockV2* block) override;
+
+const SchemaV2& schema() const override { return _schema; }
+private:
+SchemaV2 _schema;
+size_t _num_rows;
+size_t _rows_returned;
+};
+
+Status AutoIncrementIterator::init(const StorageReadOptions& opts) {
+return Status::OK();
+}
+
+Status AutoIncrementIterator::next_batch(RowBlockV2* block) {
+int row_idx = 0;
+while (row_idx < block->capacity() && _rows_returned < _num_rows) {
+RowBlockRow row = block->row(row_idx);
+
+for (int i = 0; i < _schema.column_schemas().size(); ++i) {
+row.set_is_null(i, false);
+auto& col_schema = _schema.column_schemas()[i];
+switch (col_schema.type()) {
+case OLAP_FIELD_TYPE_SMALLINT:
+*(int16_t*)row.cell_ptr(i) = _rows_returned + i;
+break;
+case OLAP_FIELD_TYPE_INT:
+*(int32_t*)row.cell_ptr(i) = _rows_returned + i;
+break;
+case OLAP_FIELD_TYPE_BIGINT:
+*(int64_t*)row.cell_ptr(i) = _rows_returned + i;
+break;
+case OLAP_FIELD_TYPE_FLOAT:
+*(float*)row.cell_ptr(i) = _rows_returned + i;
+break;
+case OLAP_FIELD_TYPE_DOUBLE:
+*(double*)row.cell_ptr(i) = _rows_returned + i;
+break;
+default:
+break;
+}
+}
+row_idx++;
+_rows_returned++;
+}
+block->resize(row_idx);
+if (row_idx > 0) {
+return Status::OK();
+}
+return Status::EndOfFile("End of AutoIncrementIterator");
+}
+
+// Used to store merge state for a MergeIterator input.
+// This class will iterate all data from internal iterator
+// through client call advance().
+// Usage:
+//  MergeContext ctx(iter);
+//  RETURN_IF_ERROR(ctx.init());
+//  while (ctx.valid()) {
+//  visit(ctx.current_row());
+//  RETURN_IF_ERROR(ctx.advance());
+//  }
+class MergeContext {
+public:
+// This class don't take iter's ownership, client should delete it
+MergeContext(RowwiseIterator* iter)
+: _iter(iter), _block(iter->schema(), 1024, &_arena) {
+}
+
+// Intialize this context and will prepare data for current_row()
+Status init(const StorageReadOptions& opts);
+
+// Return current row which internal row index points to
+// And this function won't make internal index advance.
+// Before call this function, Client must assure that
+// valid() return true
+RowBlockRow current_row() const {
+return RowBlockRow(&_block, _index_in_block);
+}
+
+// Advance internal row index to next valid row
+// Return error if error happens
+// Don't call this when valid() is false, action is undefined
+Status advance();
+
+// Return if has remaining data in this context.
+// Only when

[GitHub] [incubator-doris] imay commented on a change in pull request #1505: Add timediff function

2019-07-22 Thread GitBox
imay commented on a change in pull request #1505: Add timediff function
URL: https://github.com/apache/incubator-doris/pull/1505#discussion_r305703865
 
 

 ##
 File path: be/src/exprs/literal.cpp
 ##
 @@ -87,6 +87,11 @@ Literal::Literal(const TExprNode& node) :
 _value.datetime_val.from_date_str(
 node.date_literal.value.c_str(), node.date_literal.value.size());
 break;
+case TYPE_TIME:
 
 Review comment:
   reuse TYPE_DOUBLE


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: dev-unsubscr...@doris.apache.org
For additional commands, e-mail: dev-h...@doris.apache.org



[GitHub] [incubator-doris] imay commented on a change in pull request #1505: Add timediff function

2019-07-22 Thread GitBox
imay commented on a change in pull request #1505: Add timediff function
URL: https://github.com/apache/incubator-doris/pull/1505#discussion_r305705377
 
 

 ##
 File path: be/src/runtime/result_writer.cpp
 ##
 @@ -104,6 +104,39 @@ Status ResultWriter::add_one_row(TupleRow* row) {
 buf_ret = _row_buffer->push_double(*static_cast(item));
 break;
 
+case TYPE_TIME: {
 
 Review comment:
   you can write a function to do this to avoid write this many times


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: dev-unsubscr...@doris.apache.org
For additional commands, e-mail: dev-h...@doris.apache.org



[GitHub] [incubator-doris] imay commented on a change in pull request #1505: Add timediff function

2019-07-22 Thread GitBox
imay commented on a change in pull request #1505: Add timediff function
URL: https://github.com/apache/incubator-doris/pull/1505#discussion_r305705198
 
 

 ##
 File path: be/src/runtime/raw_value.cpp
 ##
 @@ -267,6 +267,7 @@ void RawValue::write(const void* value, void* dst, const 
TypeDescriptor& type, M
 break;
 }
 
+case TYPE_TIME:
 
 Review comment:
   I think this should be the same as TYPE_DOUBLE


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: dev-unsubscr...@doris.apache.org
For additional commands, e-mail: dev-h...@doris.apache.org



[GitHub] [incubator-doris] imay commented on a change in pull request #1505: Add timediff function

2019-07-22 Thread GitBox
imay commented on a change in pull request #1505: Add timediff function
URL: https://github.com/apache/incubator-doris/pull/1505#discussion_r305704769
 
 

 ##
 File path: be/src/exprs/time_operators.h
 ##
 @@ -0,0 +1,48 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#ifndef DORIS_BE_SRC_EXPRS_TIME_OPERATORS_H
+#define DORIS_BE_SRC_EXPRS_TIME_OPERATORS_H
+
+#include 
+#include "udf/udf.h"
+
+namespace doris {
+class Expr;
+struct ExprValue;
+class TupleRow;
+
+/// Implementation of the time operators. These include the cast,
+/// arithmetic and binary operators.
+class TimeOperators {
+public:
+static void init();
+
+static BooleanVal cast_to_boolean_val(FunctionContext*, const DoubleVal&);
 
 Review comment:
   I think we can only support time to double and to string


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: dev-unsubscr...@doris.apache.org
For additional commands, e-mail: dev-h...@doris.apache.org



[GitHub] [incubator-doris] imay commented on a change in pull request #1505: Add timediff function

2019-07-22 Thread GitBox
imay commented on a change in pull request #1505: Add timediff function
URL: https://github.com/apache/incubator-doris/pull/1505#discussion_r305705811
 
 

 ##
 File path: fe/src/main/java/org/apache/doris/catalog/Function.java
 ##
 @@ -480,6 +481,7 @@ public static String getUdfType(PrimitiveType t) {
 case INT:
 return "IntVal";
 case BIGINT:
+case TIME:
 
 Review comment:
   why BIGINT?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: dev-unsubscr...@doris.apache.org
For additional commands, e-mail: dev-h...@doris.apache.org



[GitHub] [incubator-doris] imay commented on a change in pull request #1505: Add timediff function

2019-07-22 Thread GitBox
imay commented on a change in pull request #1505: Add timediff function
URL: https://github.com/apache/incubator-doris/pull/1505#discussion_r305707087
 
 

 ##
 File path: fe/src/main/java/org/apache/doris/catalog/PrimitiveType.java
 ##
 @@ -480,6 +486,7 @@ public static boolean isImplicitCast(PrimitiveType type, 
PrimitiveType target) {
 compatibilityMatrix[DECIMALV2.ordinal()][DECIMAL.ordinal()] = 
DECIMALV2;
 
 compatibilityMatrix[HLL.ordinal()][HLL.ordinal()] = HLL;
+compatibilityMatrix[TIME.ordinal()][TIME.ordinal()] = TIME;
 
 Review comment:
   you should add more relation between time and other types


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: dev-unsubscr...@doris.apache.org
For additional commands, e-mail: dev-h...@doris.apache.org



[GitHub] [incubator-doris] wuyunfeng opened a new pull request #1526: Add spark-doris-connector overview

2019-07-22 Thread GitBox
wuyunfeng opened a new pull request #1526: Add spark-doris-connector overview
URL: https://github.com/apache/incubator-doris/pull/1526
 
 
   spark-doris-connector architecture overview


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: dev-unsubscr...@doris.apache.org
For additional commands, e-mail: dev-h...@doris.apache.org



[GitHub] [incubator-doris] wuyunfeng commented on issue #1526: Add spark-doris-connector overview

2019-07-22 Thread GitBox
wuyunfeng commented on issue #1526: Add spark-doris-connector overview
URL: https://github.com/apache/incubator-doris/pull/1526#issuecomment-513680135
 
 
   [architecture](https://github.com/apache/incubator-doris/issues/1525)


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: dev-unsubscr...@doris.apache.org
For additional commands, e-mail: dev-h...@doris.apache.org



[GitHub] [incubator-doris] wuyunfeng opened a new pull request #1527: Expose data pruned-filter-scan ability

2019-07-22 Thread GitBox
wuyunfeng opened a new pull request #1527: Expose data pruned-filter-scan 
ability
URL: https://github.com/apache/incubator-doris/pull/1527
 
 
   This PR is related with  
[Spark-Doris-Connector](https://github.com/apache/incubator-doris/issues/1525).
   
   The First Step is Expose data pruned-filter-scan ability, then implement the 
Spark-Doris-Connector on Spark side. below is the tremendous detailed design
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: dev-unsubscr...@doris.apache.org
For additional commands, e-mail: dev-h...@doris.apache.org



[GitHub] [incubator-doris] kangpinghuang commented on a change in pull request #1409: Add dict page

2019-07-22 Thread GitBox
kangpinghuang commented on a change in pull request #1409: Add dict page
URL: https://github.com/apache/incubator-doris/pull/1409#discussion_r305778667
 
 

 ##
 File path: be/src/olap/rowset/segment_v2/binary_dict_page.cpp
 ##
 @@ -0,0 +1,231 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include "olap/rowset/segment_v2/binary_dict_page.h"
+
+#include "util/slice.h" // for Slice
+#include "gutil/strings/substitute.h" // for Substitute
+#include "runtime/mem_pool.h"
+
+#include "olap/rowset/segment_v2/bitshuffle_page.h"
+#include "olap/rowset/segment_v2/rle_page.h"
+
+namespace doris {
+namespace segment_v2 {
+
+using strings::Substitute;
+
+BinaryDictPageBuilder::BinaryDictPageBuilder(const PageBuilderOptions& 
options) :
+_options(options),
+_finished(false),
+_data_page_builder(nullptr),
+_dict_builder(nullptr),
+_encoding_type(DICT_ENCODING) {
+// initially use DICT_ENCODING
+// TODO: the data page builder type can be created by Factory according to 
user config
+_data_page_builder.reset(new 
BitshufflePageBuilder(options));
+PageBuilderOptions dict_builder_options;
+dict_builder_options.data_page_size = _options.dict_page_size;
+_dict_builder.reset(new BinaryPlainPageBuilder(dict_builder_options));
+reset();
+}
+
+bool BinaryDictPageBuilder::is_page_full() {
+if (_data_page_builder->is_page_full()) {
+return true;
+}
+if (_encoding_type == DICT_ENCODING && _dict_builder->is_page_full()) {
+return true;
+}
+return false;
+}
+
+Status BinaryDictPageBuilder::add(const uint8_t* vals, size_t* count) {
+if (_encoding_type == DICT_ENCODING) {
+DCHECK(!_finished);
+DCHECK_GT(*count, 0);
+const Slice* src = reinterpret_cast(vals);
+size_t num_added = 0;
+for (int i = 0; i < *count; ++i, ++src) {
+auto ret = _dictionary.find(*src);
+size_t add_count = 1;
+if (ret != _dictionary.end()) {
+uint32_t value_code = ret->second;
+RETURN_IF_ERROR(_data_page_builder->add(reinterpret_cast(&value_code), &add_count));
+num_added += add_count;
+if (add_count == 0) {
+// current data page is full, stop processing remaining 
inputs
+break;
+}
+} else {
+if (_dict_builder->is_page_full()) {
+break;
+}
+char* item_mem = _arena.Allocate(src->size);
+if (item_mem == nullptr) {
+return Status::Corruption(Substitute("memory allocate 
failed, size:$0", src->size));
+}
+Slice dict_item(src->data, src->size);
+dict_item.relocate(item_mem);
+uint32_t value_code = _dictionary.size();
+_dictionary.insert({dict_item, value_code});
+_dict_items.push_back(dict_item);
+_dict_builder->update_prepared_size(dict_item.size);
+RETURN_IF_ERROR(_data_page_builder->add(reinterpret_cast(&value_code), &add_count));
+if (add_count == 0) {
+// current data page is full, stop processing remaining 
inputs
+break;
+}
+num_added += 1;
+}
+}
+*count = num_added;
+return Status::OK();
+} else {
+DCHECK_EQ(_encoding_type, PLAIN_ENCODING);
+return _data_page_builder->add(vals, count);
+}
+}
+
+Slice BinaryDictPageBuilder::finish() {
+_finished = true;
+
+Slice data_slice = _data_page_builder->finish();
+_buffer.append(data_slice.data, data_slice.size);
+encode_fixed32_le(&_buffer[0], _encoding_type);
+return Slice(_buffer.data(), _buffer.size());
+}
+
+void BinaryDictPageBuilder::reset() {
+_finished = false;
+_buffer.clear();
+_buffer.resize(BINARY_DICT_PAGE_HEADER_SIZE);
+_buffer.reserve(_options.data_page_size);
+
+if (_encoding_type == DICT_ENCODING
+&& _dict_builder->is_page_full()) {
+_data_page_builder.reset(new BinaryPlainPageBuilder(_options));
+   

[GitHub] [incubator-doris] kangpinghuang commented on a change in pull request #1409: Add dict page

2019-07-22 Thread GitBox
kangpinghuang commented on a change in pull request #1409: Add dict page
URL: https://github.com/apache/incubator-doris/pull/1409#discussion_r305779712
 
 

 ##
 File path: be/src/olap/rowset/segment_v2/binary_dict_page.cpp
 ##
 @@ -0,0 +1,231 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include "olap/rowset/segment_v2/binary_dict_page.h"
+
+#include "util/slice.h" // for Slice
+#include "gutil/strings/substitute.h" // for Substitute
+#include "runtime/mem_pool.h"
+
+#include "olap/rowset/segment_v2/bitshuffle_page.h"
+#include "olap/rowset/segment_v2/rle_page.h"
+
+namespace doris {
+namespace segment_v2 {
+
+using strings::Substitute;
+
+BinaryDictPageBuilder::BinaryDictPageBuilder(const PageBuilderOptions& 
options) :
+_options(options),
+_finished(false),
+_data_page_builder(nullptr),
+_dict_builder(nullptr),
+_encoding_type(DICT_ENCODING) {
+// initially use DICT_ENCODING
+// TODO: the data page builder type can be created by Factory according to 
user config
+_data_page_builder.reset(new 
BitshufflePageBuilder(options));
+PageBuilderOptions dict_builder_options;
+dict_builder_options.data_page_size = _options.dict_page_size;
+_dict_builder.reset(new BinaryPlainPageBuilder(dict_builder_options));
+reset();
+}
+
+bool BinaryDictPageBuilder::is_page_full() {
+if (_data_page_builder->is_page_full()) {
+return true;
+}
+if (_encoding_type == DICT_ENCODING && _dict_builder->is_page_full()) {
+return true;
+}
+return false;
+}
+
+Status BinaryDictPageBuilder::add(const uint8_t* vals, size_t* count) {
+if (_encoding_type == DICT_ENCODING) {
+DCHECK(!_finished);
+DCHECK_GT(*count, 0);
+const Slice* src = reinterpret_cast(vals);
+size_t num_added = 0;
+for (int i = 0; i < *count; ++i, ++src) {
+auto ret = _dictionary.find(*src);
+size_t add_count = 1;
+if (ret != _dictionary.end()) {
+uint32_t value_code = ret->second;
+RETURN_IF_ERROR(_data_page_builder->add(reinterpret_cast(&value_code), &add_count));
+num_added += add_count;
+if (add_count == 0) {
+// current data page is full, stop processing remaining 
inputs
+break;
+}
+} else {
+if (_dict_builder->is_page_full()) {
+break;
+}
+char* item_mem = _arena.Allocate(src->size);
+if (item_mem == nullptr) {
+return Status::Corruption(Substitute("memory allocate 
failed, size:$0", src->size));
+}
+Slice dict_item(src->data, src->size);
+dict_item.relocate(item_mem);
+uint32_t value_code = _dictionary.size();
+_dictionary.insert({dict_item, value_code});
+_dict_items.push_back(dict_item);
+_dict_builder->update_prepared_size(dict_item.size);
+RETURN_IF_ERROR(_data_page_builder->add(reinterpret_cast(&value_code), &add_count));
+if (add_count == 0) {
+// current data page is full, stop processing remaining 
inputs
+break;
+}
+num_added += 1;
+}
+}
+*count = num_added;
+return Status::OK();
+} else {
+DCHECK_EQ(_encoding_type, PLAIN_ENCODING);
+return _data_page_builder->add(vals, count);
+}
+}
+
+Slice BinaryDictPageBuilder::finish() {
+_finished = true;
+
+Slice data_slice = _data_page_builder->finish();
+_buffer.append(data_slice.data, data_slice.size);
+encode_fixed32_le(&_buffer[0], _encoding_type);
+return Slice(_buffer.data(), _buffer.size());
+}
+
+void BinaryDictPageBuilder::reset() {
+_finished = false;
+_buffer.clear();
+_buffer.resize(BINARY_DICT_PAGE_HEADER_SIZE);
+_buffer.reserve(_options.data_page_size);
+
+if (_encoding_type == DICT_ENCODING
+&& _dict_builder->is_page_full()) {
+_data_page_builder.reset(new BinaryPlainPageBuilder(_options));
+   

[GitHub] [incubator-doris] kangpinghuang commented on a change in pull request #1409: Add dict page

2019-07-22 Thread GitBox
kangpinghuang commented on a change in pull request #1409: Add dict page
URL: https://github.com/apache/incubator-doris/pull/1409#discussion_r305781309
 
 

 ##
 File path: be/src/olap/rowset/segment_v2/binary_dict_page.cpp
 ##
 @@ -0,0 +1,231 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include "olap/rowset/segment_v2/binary_dict_page.h"
+
+#include "util/slice.h" // for Slice
+#include "gutil/strings/substitute.h" // for Substitute
+#include "runtime/mem_pool.h"
+
+#include "olap/rowset/segment_v2/bitshuffle_page.h"
+#include "olap/rowset/segment_v2/rle_page.h"
+
+namespace doris {
+namespace segment_v2 {
+
+using strings::Substitute;
+
+BinaryDictPageBuilder::BinaryDictPageBuilder(const PageBuilderOptions& 
options) :
+_options(options),
+_finished(false),
+_data_page_builder(nullptr),
+_dict_builder(nullptr),
+_encoding_type(DICT_ENCODING) {
+// initially use DICT_ENCODING
+// TODO: the data page builder type can be created by Factory according to 
user config
+_data_page_builder.reset(new 
BitshufflePageBuilder(options));
+PageBuilderOptions dict_builder_options;
+dict_builder_options.data_page_size = _options.dict_page_size;
+_dict_builder.reset(new BinaryPlainPageBuilder(dict_builder_options));
+reset();
+}
+
+bool BinaryDictPageBuilder::is_page_full() {
+if (_data_page_builder->is_page_full()) {
+return true;
+}
+if (_encoding_type == DICT_ENCODING && _dict_builder->is_page_full()) {
+return true;
+}
+return false;
+}
+
+Status BinaryDictPageBuilder::add(const uint8_t* vals, size_t* count) {
+if (_encoding_type == DICT_ENCODING) {
+DCHECK(!_finished);
+DCHECK_GT(*count, 0);
+const Slice* src = reinterpret_cast(vals);
+size_t num_added = 0;
+for (int i = 0; i < *count; ++i, ++src) {
+auto ret = _dictionary.find(*src);
+size_t add_count = 1;
+if (ret != _dictionary.end()) {
+uint32_t value_code = ret->second;
+RETURN_IF_ERROR(_data_page_builder->add(reinterpret_cast(&value_code), &add_count));
+num_added += add_count;
+if (add_count == 0) {
+// current data page is full, stop processing remaining 
inputs
+break;
+}
+} else {
+if (_dict_builder->is_page_full()) {
+break;
+}
+char* item_mem = _arena.Allocate(src->size);
+if (item_mem == nullptr) {
+return Status::Corruption(Substitute("memory allocate 
failed, size:$0", src->size));
+}
+Slice dict_item(src->data, src->size);
+dict_item.relocate(item_mem);
+uint32_t value_code = _dictionary.size();
+_dictionary.insert({dict_item, value_code});
+_dict_items.push_back(dict_item);
+_dict_builder->update_prepared_size(dict_item.size);
+RETURN_IF_ERROR(_data_page_builder->add(reinterpret_cast(&value_code), &add_count));
+if (add_count == 0) {
+// current data page is full, stop processing remaining 
inputs
+break;
+}
+num_added += 1;
+}
+}
+*count = num_added;
+return Status::OK();
+} else {
+DCHECK_EQ(_encoding_type, PLAIN_ENCODING);
+return _data_page_builder->add(vals, count);
+}
+}
+
+Slice BinaryDictPageBuilder::finish() {
+_finished = true;
+
+Slice data_slice = _data_page_builder->finish();
+_buffer.append(data_slice.data, data_slice.size);
+encode_fixed32_le(&_buffer[0], _encoding_type);
+return Slice(_buffer.data(), _buffer.size());
+}
+
+void BinaryDictPageBuilder::reset() {
+_finished = false;
+_buffer.clear();
+_buffer.resize(BINARY_DICT_PAGE_HEADER_SIZE);
+_buffer.reserve(_options.data_page_size);
 
 Review comment:
   ok


This is an automated message from the Apache Git Service.
To respond to the messag

[GitHub] [incubator-doris] wuyunfeng closed pull request #1527: Expose data pruned-filter-scan ability

2019-07-22 Thread GitBox
wuyunfeng closed pull request #1527: Expose data pruned-filter-scan ability
URL: https://github.com/apache/incubator-doris/pull/1527
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: dev-unsubscr...@doris.apache.org
For additional commands, e-mail: dev-h...@doris.apache.org



[GitHub] [incubator-doris] wuyunfeng opened a new pull request #1527: Expose data pruned-filter-scan ability

2019-07-22 Thread GitBox
wuyunfeng opened a new pull request #1527: Expose data pruned-filter-scan 
ability
URL: https://github.com/apache/incubator-doris/pull/1527
 
 
   This PR is related with  
[Spark-Doris-Connector](https://github.com/apache/incubator-doris/issues/1525).
   
   The First Step is Expose data pruned-filter-scan ability, then implement the 
Spark-Doris-Connector on Spark side. below is the tremendous detailed design
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: dev-unsubscr...@doris.apache.org
For additional commands, e-mail: dev-h...@doris.apache.org



[GitHub] [incubator-doris] morningman opened a new pull request #1528: Fix bug that user with LOAD_PRIV can see load job by SHOW LOAD stmt

2019-07-22 Thread GitBox
morningman opened a new pull request #1528: Fix bug that user with LOAD_PRIV 
can see load job by SHOW LOAD stmt
URL: https://github.com/apache/incubator-doris/pull/1528
 
 
   User should has LOAD_PRIV to use SHOW LOAD stmt, not SHOW_PRIV.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: dev-unsubscr...@doris.apache.org
For additional commands, e-mail: dev-h...@doris.apache.org



Expose data pruned-filter-scan ability for Integrating Spark and Doris

2019-07-22 Thread B哥
Hi, all Doris Commiter


Recently I will create a pull request for Expose data pruned-filter-scan 
ability for Integrating Spark and Doris.  Please propose this proposal, thanks.


The First Step is Expose data pruned-filter-scan ability, then implement the 
Spark-Doris-Connector on Spark side. below is the tremendous detailed design.


Outline

Doris for Apache Spark (Spark-Doris-Connector) allowing Spark tasks to use it 
as a library and interact with Doris through Doris for Apache Spark APIs

Background

Recently years, Machine Learning intensively are integrated to computing system 
such as Spark MLib for Spark etc. if Doris expose the data for Computing System 
such as Spark, then we can make full use of the Computing System's MLib gains a 
competitive edge with state-of-the art Machine Learning on large data sets, 
collaborate, and productionize models at massive scale

When Doris processing large query, if memory exceed the memory limitation, this 
query would fail, but Spark can write the intermediate result back to Disk 
gracefully, the Integration for Spark and Doris not only resolved this problem, 
but also Spark can take full advantage of the scan performance of the 
column-stride of Doris Storage Engine which can provide extra push-down filter 
ability save a lot of network IO .

Nowadays, enterprise store lots data on different Storage Service such as 
Mysql、Elasticsearch、Doris、HDFS、NFS、Table etc. They need a way to analyze all 
this data with conjunctive query,Spark already get through with almost these 
Service except Doris. If build the bridge for Spark and Doris, Spark can 
provided unified data manipulation platform for this scenario

How to realize

Doris FE is responsible for pruning the related tablet, decide which predicate 
can used to filter data, providing single node query plan fragment, then packed 
all those things and return client without scheduling these to backend instance 
in contrast to before. In this way, FE no longer would coordinate the query 
lifecycle

Doris BE is responsible for assembling all parameters from client, such as 
tabletIds、version、encoded plan fragment etc which mostly generated by Doris FE 
, and then execute this plan fragment instance, all the row results return by 
this plan fragment would be pushed to a attached blocking queue firstly, when 
client call get_next to iterate the result, fetch this batched result from 
blocking queue and answer the client.

Architecture
https://github.com/wuyunfeng/incubator-doris/blob/docs/docs/resources/images/spark_doris_connector.jpg
WorkFlow

Spark ApplicationMaster fetch the Table schema from Doris FE

Spark ApplicationMaster get the Query Plan From Doris FE.

Spark ApplicationMaster transfer the Query Plan to all relevant Executor Worker

Spark Executor Worker open context from Doris BE with the generated QueryPlan

Spark Executor Worker iterate all data from Doris BE by get_next

Spark Executor Worker close context from Doris BE

Additional API
Doris FE HTTP Transport Protocol
GET Table Schema

Request:

GET /{cluster}/{database}/{table}/_schema


Response:

{
"status": 200,
"properties":{
"k1":{
"type":"SMALLINT",
"comment":"this is a small SMALLINT column"
},
"k2":{
"type":"LARGEINT",
"comment":"this is a small LARGEINT column"
},
"k3":{
"type":"FLOAT",
"comment":"this is a small FLOAT column"
},
"k4":{
"type":"CHAR",
"comment":"this is a small CHAR column"
}
}
}

GET Query Plan

Request:

POST /{cluster}/{database}/{table}/_query_plan
{
  "sql": "select k2, k4 from table where k1 > 2 and k3 > 4"
}


Response:

{
"status": 200,
"encrypted_plan_desc":"This is a base64 encrypted doris FE query plan",
"partitions":{
"{tabletId}":[
{
"host":"192.168.0.1",
"port":"9301"
},
{
"host":"192.168.0.2",
"port":"9301"
},
{
"host":"192.168.0.3",
"port":"9301"
}
],
"139":[
{
"host":"192.168.0.3",
"port":"9301"
},
{
"host":"192.168.0.4",
"port":"9301"
},
{
"host":"192.168.0.6",
"port":"9301"
}
],
"140":[
{
"host":"192.168.0.1",
"port":"9301"
},
{
"host":"192.168.0.2",
"port":"9301"
},
{
"host":"192.168.0.3",
"port":"9301"
}
]
}
}

Doris BE Thrift Transport Protocol

External Service:

// scan service expose ability of scanning data ability to other compute system
service **TDorisExternalService** 

[GitHub] [incubator-doris] imay closed pull request #1413: Add short key index builder, decoder, iterator

2019-07-22 Thread GitBox
imay closed pull request #1413: Add short key index builder, decoder, iterator
URL: https://github.com/apache/incubator-doris/pull/1413
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: dev-unsubscr...@doris.apache.org
For additional commands, e-mail: dev-h...@doris.apache.org



[GitHub] [incubator-doris] imay merged pull request #1528: Fix bug that user with LOAD_PRIV can see load job by SHOW LOAD stmt

2019-07-22 Thread GitBox
imay merged pull request #1528: Fix bug that user with LOAD_PRIV can see load 
job by SHOW LOAD stmt
URL: https://github.com/apache/incubator-doris/pull/1528
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: dev-unsubscr...@doris.apache.org
For additional commands, e-mail: dev-h...@doris.apache.org



[GitHub] [incubator-doris] imay opened a new pull request #1529: Refactor agg

2019-07-22 Thread GitBox
imay opened a new pull request #1529: Refactor agg
URL: https://github.com/apache/incubator-doris/pull/1529
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: dev-unsubscr...@doris.apache.org
For additional commands, e-mail: dev-h...@doris.apache.org



[GitHub] [incubator-doris] gaodayue commented on a change in pull request #1409: Add dict page

2019-07-22 Thread GitBox
gaodayue commented on a change in pull request #1409: Add dict page
URL: https://github.com/apache/incubator-doris/pull/1409#discussion_r306099633
 
 

 ##
 File path: be/src/olap/rowset/segment_v2/bitshuffle_page.h
 ##
 @@ -170,11 +148,11 @@ class BitshufflePageBuilder : public PageBuilder {
 _data.push_back(0);
 }
 
-_buffer.resize(BITSHUFFLE_BLOCK_HEADER_SIZE +
+_buffer.resize(BITSHUFFLE_PAGE_HEADER_SIZE +
 bitshuffle::compress_lz4_bound(num_elems_after_padding, 
final_size_of_type, 0));
 
 encode_fixed32_le(&_buffer[0], _count);
-int64_t bytes = bitshuffle::compress_lz4(_data.data(), 
&_buffer[BITSHUFFLE_BLOCK_HEADER_SIZE],
+uint32_t bytes = bitshuffle::compress_lz4(_data.data(), 
&_buffer[BITSHUFFLE_PAGE_HEADER_SIZE],
 
 Review comment:
   please fix this also


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: dev-unsubscr...@doris.apache.org
For additional commands, e-mail: dev-h...@doris.apache.org



[GitHub] [incubator-doris] yiguolei commented on a change in pull request #1497: Add ALTER_TABLET task in be

2019-07-22 Thread GitBox
yiguolei commented on a change in pull request #1497: Add ALTER_TABLET task in 
be
URL: https://github.com/apache/incubator-doris/pull/1497#discussion_r306106519
 
 

 ##
 File path: be/src/olap/schema_change.cpp
 ##
 @@ -1121,6 +1121,207 @@ bool SchemaChangeWithSorting::_external_sorting(
 return true;
 }
 
+OLAPStatus SchemaChangeHandler::process_alter_tablet_v2(const 
TAlterTabletReqV2& request) {
+LOG(INFO) << "begin to validate alter tablet request. base_tablet_id=" << 
request.base_tablet_id
+  << ", base_schema_hash" << request.base_schema_hash
+  << ", new_tablet_id=" << request.new_tablet_id
+  << ", new_schema_hash=" << request.new_schema_hash
+  << ", alter_version=" << request.alter_version
+  << ", alter_version_hash=" << request.alter_version_hash;
+
+// Lock schema_change_lock util schema change info is stored in tablet 
header
+if 
(!StorageEngine::instance()->tablet_manager()->try_schema_change_lock(request.base_tablet_id))
 {
+LOG(WARNING) << "failed to obtain schema change lock. "
+ << "base_tablet=" << request.base_tablet_id;
+return OLAP_ERR_TRY_LOCK_FAILED;
+}
+
+OLAPStatus res = _do_process_alter_tablet_v2(request);
+LOG(INFO) << "finished alter tablet process, res=" << res;
+
StorageEngine::instance()->tablet_manager()->release_schema_change_lock(request.base_tablet_id);
+return res;
+}
+
+OLAPStatus SchemaChangeHandler::_do_process_alter_tablet_v2(const 
TAlterTabletReqV2& request) {
 
 Review comment:
   NOT_READY tablet is created during create tablet task.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: dev-unsubscr...@doris.apache.org
For additional commands, e-mail: dev-h...@doris.apache.org



[GitHub] [incubator-doris] yiguolei commented on a change in pull request #1497: Add ALTER_TABLET task in be

2019-07-22 Thread GitBox
yiguolei commented on a change in pull request #1497: Add ALTER_TABLET task in 
be
URL: https://github.com/apache/incubator-doris/pull/1497#discussion_r306107406
 
 

 ##
 File path: be/src/olap/schema_change.cpp
 ##
 @@ -1121,6 +1121,207 @@ bool SchemaChangeWithSorting::_external_sorting(
 return true;
 }
 
+OLAPStatus SchemaChangeHandler::process_alter_tablet_v2(const 
TAlterTabletReqV2& request) {
+LOG(INFO) << "begin to validate alter tablet request. base_tablet_id=" << 
request.base_tablet_id
+  << ", base_schema_hash" << request.base_schema_hash
+  << ", new_tablet_id=" << request.new_tablet_id
+  << ", new_schema_hash=" << request.new_schema_hash
+  << ", alter_version=" << request.alter_version
+  << ", alter_version_hash=" << request.alter_version_hash;
+
+// Lock schema_change_lock util schema change info is stored in tablet 
header
+if 
(!StorageEngine::instance()->tablet_manager()->try_schema_change_lock(request.base_tablet_id))
 {
+LOG(WARNING) << "failed to obtain schema change lock. "
+ << "base_tablet=" << request.base_tablet_id;
+return OLAP_ERR_TRY_LOCK_FAILED;
+}
+
+OLAPStatus res = _do_process_alter_tablet_v2(request);
+LOG(INFO) << "finished alter tablet process, res=" << res;
+
StorageEngine::instance()->tablet_manager()->release_schema_change_lock(request.base_tablet_id);
+return res;
+}
+
+OLAPStatus SchemaChangeHandler::_do_process_alter_tablet_v2(const 
TAlterTabletReqV2& request) {
 
 Review comment:
   Add some comment


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: dev-unsubscr...@doris.apache.org
For additional commands, e-mail: dev-h...@doris.apache.org



[GitHub] [incubator-doris] yiguolei commented on a change in pull request #1497: Add ALTER_TABLET task in be

2019-07-22 Thread GitBox
yiguolei commented on a change in pull request #1497: Add ALTER_TABLET task in 
be
URL: https://github.com/apache/incubator-doris/pull/1497#discussion_r306107360
 
 

 ##
 File path: be/src/olap/schema_change.cpp
 ##
 @@ -1121,6 +1121,207 @@ bool SchemaChangeWithSorting::_external_sorting(
 return true;
 }
 
+OLAPStatus SchemaChangeHandler::process_alter_tablet_v2(const 
TAlterTabletReqV2& request) {
+LOG(INFO) << "begin to validate alter tablet request. base_tablet_id=" << 
request.base_tablet_id
 
 Review comment:
   Should not in _do_process_alter_Tablet_v2() because the request maybe failed 
during acquire lock.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: dev-unsubscr...@doris.apache.org
For additional commands, e-mail: dev-h...@doris.apache.org



[GitHub] [incubator-doris] yiguolei commented on a change in pull request #1497: Add ALTER_TABLET task in be

2019-07-22 Thread GitBox
yiguolei commented on a change in pull request #1497: Add ALTER_TABLET task in 
be
URL: https://github.com/apache/incubator-doris/pull/1497#discussion_r306107791
 
 

 ##
 File path: be/src/olap/schema_change.cpp
 ##
 @@ -1121,6 +1121,207 @@ bool SchemaChangeWithSorting::_external_sorting(
 return true;
 }
 
+OLAPStatus SchemaChangeHandler::process_alter_tablet_v2(const 
TAlterTabletReqV2& request) {
+LOG(INFO) << "begin to validate alter tablet request. base_tablet_id=" << 
request.base_tablet_id
+  << ", base_schema_hash" << request.base_schema_hash
+  << ", new_tablet_id=" << request.new_tablet_id
+  << ", new_schema_hash=" << request.new_schema_hash
+  << ", alter_version=" << request.alter_version
+  << ", alter_version_hash=" << request.alter_version_hash;
+
+// Lock schema_change_lock util schema change info is stored in tablet 
header
+if 
(!StorageEngine::instance()->tablet_manager()->try_schema_change_lock(request.base_tablet_id))
 {
+LOG(WARNING) << "failed to obtain schema change lock. "
+ << "base_tablet=" << request.base_tablet_id;
+return OLAP_ERR_TRY_LOCK_FAILED;
+}
+
+OLAPStatus res = _do_process_alter_tablet_v2(request);
+LOG(INFO) << "finished alter tablet process, res=" << res;
+
StorageEngine::instance()->tablet_manager()->release_schema_change_lock(request.base_tablet_id);
+return res;
+}
+
+OLAPStatus SchemaChangeHandler::_do_process_alter_tablet_v2(const 
TAlterTabletReqV2& request) {
+OLAPStatus res = OLAP_SUCCESS;
+TabletSharedPtr base_tablet = 
StorageEngine::instance()->tablet_manager()->get_tablet(
+request.base_tablet_id, request.base_schema_hash);
+if (base_tablet == nullptr) {
+LOG(WARNING) << "fail to find base tablet. base_tablet=" << 
request.base_tablet_id
+ << ", base_schema_hash=" << request.base_schema_hash;
+return OLAP_ERR_TABLE_NOT_FOUND;
+}
+
+// new tablet has to exist
+TabletSharedPtr new_tablet = 
StorageEngine::instance()->tablet_manager()->get_tablet(
+request.new_tablet_id, request.new_schema_hash);
+if (new_tablet == nullptr) {
+LOG(WARNING) << "fail to find new tablet."
+ << ", new_tablet=" << request.new_tablet_id
+ << ", new_schema_hash=" << request.new_schema_hash;
+return OLAP_ERR_TABLE_NOT_FOUND;
+}
+
+// check if tablet's state is not_ready, if it is ready, it means the 
tablet already finished 
+// check whether the tablet's max continuous version == request.version
+if (new_tablet->tablet_state() != TABLET_NOTREADY) {
+res = _validate_alter_result(new_tablet, request);
+LOG(INFO) << "tablet's state=" << new_tablet->tablet_state()
+  << " the convert job alreay finished, check its version"
+  << " res=" << res;
+return res;
+}
+
+LOG(INFO) << "finish to validate alter tablet request. begin to convert 
data from base tablet to new tablet" 
+  << " base_tablet=" << base_tablet->full_name()
+  << " new_tablet=" << new_tablet->full_name();
+
+ReadLock base_migration_rlock(base_tablet->get_migration_lock_ptr(), 
TRY_LOCK);
+if (!base_migration_rlock.own_lock()) {
+return OLAP_ERR_RWLOCK_ERROR;
+}
+ReadLock new_migration_rlock(new_tablet->get_migration_lock_ptr(), 
TRY_LOCK);
+if (!new_migration_rlock.own_lock()) {
+return OLAP_ERR_RWLOCK_ERROR;
+}
+
+// 2. Get version_to_be_changed and store into tablet header
+base_tablet->obtain_push_lock();
+base_tablet->obtain_header_wrlock();
+new_tablet->obtain_header_wrlock();
 
 Review comment:
   Done


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: dev-unsubscr...@doris.apache.org
For additional commands, e-mail: dev-h...@doris.apache.org



[GitHub] [incubator-doris] imay merged pull request #1505: Add timediff function

2019-07-22 Thread GitBox
imay merged pull request #1505: Add timediff function
URL: https://github.com/apache/incubator-doris/pull/1505
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: dev-unsubscr...@doris.apache.org
For additional commands, e-mail: dev-h...@doris.apache.org



[GitHub] [incubator-doris] morningman opened a new pull request #1530: Add more logs and metrics to trace the broker load process

2019-07-22 Thread GitBox
morningman opened a new pull request #1530: Add more logs and metrics to trace 
the broker load process
URL: https://github.com/apache/incubator-doris/pull/1530
 
 
   1. The Operator wants to known when the job being scheduled as PENDING
   and LOADING. And how long it takes to finish these sub states. So I add log
   at the beginning of each state.
   
   2. Add 2 metrics on BE to monitor the memtable's flush time.
   `memtable_flush_total` and `memtable_flush_duration_us`
   
   3. Add a log when closing OlapTableSink, to record the add_batch() time of 
each
   Backend, cumulative.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: dev-unsubscr...@doris.apache.org
For additional commands, e-mail: dev-h...@doris.apache.org