GitHub user xinyuiscool opened a pull request:
https://github.com/apache/samza/pull/883
SAMZA-2072: Update guava to 23.0
Startpoint is relying on an old version of guava, which should be updated
to 23.0 for the newer api.
You can merge this pull request into a Git repository by
GitHub user xinyuiscool opened a pull request:
https://github.com/apache/samza/pull/881
SAMZA-2068: Separating container launch logic into util class
The container launch logic needs to be invoked for beam-runner to run beam
containers. This is a small refactoring of
GitHub user xinyuiscool reopened a pull request:
https://github.com/apache/samza/pull/867
SAMZA-2048: Add guide to run Beam wordcount example
Use the maven archetype to generate the example project for beam wordcount
examples. Add the steps to set it up and run the examples.
You
Github user xinyuiscool closed the pull request at:
https://github.com/apache/samza/pull/867
---
GitHub user xinyuiscool opened a pull request:
https://github.com/apache/samza/pull/867
SAMZA-2048: Add guide to run Beam wordcount example
Use the maven archetype to generate the example project for beam wordcount
examples. Add the steps to set it up and run the examples.
You can
GitHub user xinyuiscool opened a pull request:
https://github.com/apache/samza/pull/805
SAMZA-1972: Make Operator Timer metrics calculation configurable
This patch introduces two changes:
1. Make the timer metrics in OperatorImpl optional and disabled by
default. Adding
GitHub user xinyuiscool opened a pull request:
https://github.com/apache/samza/pull/704
SAMZA-1911: Add documentation for quick start
md file for quick start.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/xinyuiscool/samza
GitHub user xinyuiscool opened a pull request:
https://github.com/apache/samza/pull/658
SAMZA-1907: Add metrics to monitor watermarks
Add initial metric to monitor the aggregated watermark time.
You can merge this pull request into a Git repository by running:
$ git pull https
GitHub user xinyuiscool opened a pull request:
https://github.com/apache/samza/pull/595
SAMZA-1796: PassthroughJobCoordinator doesn't create changelog streams
Currently only the ClusterBasedJobCoordinator and ZkJobCoordinator are
creating changelog streams. The Passthroug
GitHub user xinyuiscool opened a pull request:
https://github.com/apache/samza/pull/588
SAMZA-1768: Handle corrupted OFFSET file
This patch addresses the following tickets:
SAMZA-1778: SIGSEGV when reading properties (metrics) on a closed RocksDB
store
SAMZA-1777
GitHub user xinyuiscool opened a pull request:
https://github.com/apache/samza/pull/566
SAMZA-1762: Fix Memory leak in the Timer Registry Map
Found a memory leak in the SystemTimerScheduler which does not remove the
timers from scheduledFutures after the timers are fired. This
Github user xinyuiscool closed the pull request at:
https://github.com/apache/samza/pull/505
---
GitHub user xinyuiscool opened a pull request:
https://github.com/apache/samza/pull/516
Remove the iterable interface from KeyValueSnapshot
The iterable interface makes it hard for users to close it after use.
You can merge this pull request into a Git repository by running
GitHub user xinyuiscool opened a pull request:
https://github.com/apache/samza/pull/510
SAMZA-1705: Switch to use snapshot in iterable impl of RocksDb
We should use the rocksDb.snapshot() method to keep the snapshot and create a
new iterator from it each time. The perf shows a
GitHub user xinyuiscool opened a pull request:
https://github.com/apache/samza/pull/508
SAMZA-1704: Fix compatibility issues with scala 2.12
Need to add the override keyword when overriding a method in Scala 2.12.
You can merge this pull request into a Git repository by running
GitHub user xinyuiscool opened a pull request:
https://github.com/apache/samza/pull/507
SAMZA-1703: Disable flaky test
TestEmbeddedTaggedRateLimiter.testAcquireWithTimeout
You can merge this pull request into a Git repository by running:
$ git pull https://github.com
GitHub user xinyuiscool opened a pull request:
https://github.com/apache/samza/pull/506
SAMZA-1702: Prepare 0.14.1 release on the master branch
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/xinyuiscool/samza SAMZA-1702-master
GitHub user xinyuiscool opened a pull request:
https://github.com/apache/samza/pull/505
SAMZA-1702: Prepare 0.14.1 release on the 0.14.1 branch
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/xinyuiscool/samza SAMZA-1702
GitHub user xinyuiscool opened a pull request:
https://github.com/apache/samza/pull/492
SAMZA-1691: Support get iterable from KeyValueStore
Right now for KeyValueStore we have a range query to return an iterator.
For usage in BEAM, we need an iterable which will 1) create the
GitHub user xinyuiscool opened a pull request:
https://github.com/apache/samza/pull/469
SAMZA-1645: A few issues found by BEAM stress test
1. Revert the priority set to intermediate streams.
2. Fix a watermark propagation condition
You can merge this pull request into a Git
GitHub user xinyuiscool opened a pull request:
https://github.com/apache/samza/pull/456
SAMZA-1627: Watermark broadcast enhancements
Currently each upstream task needs to broadcast to every single partition
of intermediate streams in order to aggregate watermarks in the consumers
GitHub user xinyuiscool opened a pull request:
https://github.com/apache/samza/pull/444
SAMZA-1615: Fix a couple of issues in ControlMessageSender
Two issues I found during testing: 1)
medaDataCache.getSystemStreamMetadata(): if we pass in partitionOnly to be
true, it will
GitHub user xinyuiscool opened a pull request:
https://github.com/apache/samza/pull/419
SAMZA-1498: Support arbitrary system clock timer in operators
This patch adds the capability to register arbitrary timers for both
high-level and low-level api.
For high-level
GitHub user xinyuiscool opened a pull request:
https://github.com/apache/samza/pull/415
SAMZA-1578: Fix watermark bug found by BEAM tests
The problem is that getOutputWatermark() does not return the real
outputWatermark. This caused a problem in the user-overridden watermark function.
You can
GitHub user xinyuiscool opened a pull request:
https://github.com/apache/samza/pull/410
SAMZA-1557: Broadcast operator
This patch adds a Broadcast operator that allows broadcasting messages to all
tasks. It's the counterpart of the Samza broadcast stream in low level api, and
GitHub user xinyuiscool opened a pull request:
https://github.com/apache/samza/pull/402
SAMZA-1553: Add log4j for latest Kafka build
Add it so Samza compiles with the latest kafka.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com
GitHub user xinyuiscool opened a pull request:
https://github.com/apache/samza/pull/400
SAMZA-1550: Update master to use 0.14.1-SNAPSHOT version
Update master to use 0.14.1-SNAPSHOT version.
You can merge this pull request into a Git repository by running:
$ git pull https
Github user xinyuiscool closed the pull request at:
https://github.com/apache/samza/pull/399
---
GitHub user xinyuiscool opened a pull request:
https://github.com/apache/samza/pull/399
[SAMZA-1550]: replace snapshot with release version in 0.14.0 branch
Prepare the doc for the 0.14.0 branch.
You can merge this pull request into a Git repository by running:
$ git pull
GitHub user xinyuiscool opened a pull request:
https://github.com/apache/samza/pull/396
SAMZA-1550: Doc for 0.14.0 release
Docs update for both master and 0.14.0 branch.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/xinyuiscool
GitHub user xinyuiscool opened a pull request:
https://github.com/apache/samza/pull/385
SAMZA-1534: Fix the visualization in job graph with the new PartitionBy Op
Seems the stream and the partitionBy op have the same id. So in rendering I
added the stream as the id for the node
GitHub user xinyuiscool opened a pull request:
https://github.com/apache/samza/pull/381
SAMZA-1512: Documentation on the multi-stage batch processing
Documentation to explain how partitionBy(), checkpoint and state work in
batch.
You can merge this pull request into a Git
GitHub user xinyuiscool opened a pull request:
https://github.com/apache/samza/pull/370
SAMZA-1516: Another round of issues found by BEAM tests
A couple more fixes: 1. fix a bug in identifying input streams for an
operator. 2. for partitionBy, set the partitionKey to 0L when key
GitHub user xinyuiscool opened a pull request:
https://github.com/apache/samza/pull/364
SAMZA-1505: Fix CheckpointTool writing only one ssp per task
Currently when using CheckpointTool to write checkpoints, it only writes a
checkpoint of a single ssp per task. By debugging the code
GitHub user xinyuiscool opened a pull request:
https://github.com/apache/samza/pull/361
SAMZA-1504: Allow user to register container-level metrics
This change allows users to register metrics on a per-container basis.
Tested in beam runner and works as expected
GitHub user xinyuiscool opened a pull request:
https://github.com/apache/samza/pull/345
SAMZA-1477: Fix issues found by BEAM tests
A bunch of issues were found by BEAM tests, including:
1) WatermarkFunction needs to be able to return output after
processWatermark
GitHub user xinyuiscool opened a pull request:
https://github.com/apache/samza/pull/328
SAMZA-1457: Set retention for internal streams for Batch application
For intermediate streams, checkpoint and changelog, we need to set a short
retention period for batch.
You can merge this
GitHub user xinyuiscool opened a pull request:
https://github.com/apache/samza/pull/307
SAMZA-1434: Fix issues found in Hadoop
Fix the following bugs found when running Samza on Hadoop:
1. HDFS allows output partitions to be 0 (empty folder)
2. Add null check for the
Github user xinyuiscool closed the pull request at:
https://github.com/apache/samza/pull/297
---
GitHub user xinyuiscool opened a pull request:
https://github.com/apache/samza/pull/297
SAMZA-1417: Clear and recreate intermediate and metadata streams for batch
processing
For each run of a batch application, we need to clear the internal streams
from the previous run and
Github user xinyuiscool closed the pull request at:
https://github.com/apache/samza/pull/225
---
Github user xinyuiscool closed the pull request at:
https://github.com/apache/samza/pull/236
---
GitHub user xinyuiscool opened a pull request:
https://github.com/apache/samza/pull/292
SAMZA-1415: Add clearStream API in SystemAdmin and remove deprecated APIs
The patch does the following:
1) add clearStream() API in SystemAdmin. Currently it's only supported in
GitHub user xinyuiscool opened a pull request:
https://github.com/apache/samza/pull/277
SAMZA-1386: Inline End-of-stream and Watermark logic inside OperatorImpl
This patch contains the following changes:
1. Refactor watermark and end-of-stream logic. The aggregation/handling has
GitHub user xinyuiscool opened a pull request:
https://github.com/apache/samza/pull/236
SAMZA-1321: Propagate end-of-stream and watermark messages
The patch completes the end-of-stream workflow across a multi-stage
pipeline. It also contains the initial commit for supporting watermarks
GitHub user xinyuiscool opened a pull request:
https://github.com/apache/samza/pull/225
SAMZA-1321: Propagate end-of-stream messages
The patch completes the end-of-stream propagation across intermediate
streams. It does the following:
1) EndOfStreamManager aggregates the
GitHub user xinyuiscool opened a pull request:
https://github.com/apache/samza/pull/207
SAMZA-1312: Add Control Messages and Intermediate Stream Serde
In this patch, we add the control message types, which include:
* EndOfStreamMessage
* WatermarkMessage
To support
GitHub user xinyuiscool opened a pull request:
https://github.com/apache/samza/pull/189
SAMZA-1289: Default id generator if not configured
Right now in standalone deployment we require the user to provide an id
generator. Since most of the time the users can simply use the UUID
GitHub user xinyuiscool opened a pull request:
https://github.com/apache/samza/pull/188
SAMZA-1288: Add null check for sink OutputStream
The logic to generate json for Sink operator does not check whether the
output stream is null. This causes a null pointer exception.
You can merge
GitHub user xinyuiscool opened a pull request:
https://github.com/apache/samza/pull/186
Increase the plan graph size
Increase the canvas size to a standard 24 inch resolution and also the
scaling factor.
You can merge this pull request into a Git repository by running:
$ git
GitHub user xinyuiscool opened a pull request:
https://github.com/apache/samza/pull/184
SAMZA-1283: Expose the buffered-message-size metric
Regardless of whether we enable a size limit for the consumer buffer, this
metric helps to see what the buffer size is and make configuring
GitHub user xinyuiscool opened a pull request:
https://github.com/apache/samza/pull/172
SAMZA-1273: Make StreamConfig.getStreamIds() public
Making StreamConfig.getStreamIds() public so config provider can scan
through all the configured streams and expand some properties if needed
GitHub user xinyuiscool opened a pull request:
https://github.com/apache/samza/pull/168
SAMZA-1267: ApplicationRunner#getLocalRunner returns null
Remove ApplicationRunner#getLocalRunner and clean up any usage examples.
You can merge this pull request into a Git repository by
GitHub user xinyuiscool opened a pull request:
https://github.com/apache/samza/pull/154
SAMZA-1246: ApplicationRunner.stats() should include exception in case of
failure
Currently, ApplicationRunner.stats() only returns the enum representing
the status. It also needs to include
GitHub user xinyuiscool opened a pull request:
https://github.com/apache/samza/pull/145
SAMZA-1245: Make stream samza.physical.name config name string public
For certain systems such as HDFS, the physical stream name might need to be
finalized during the config generation. In order
GitHub user xinyuiscool opened a pull request:
https://github.com/apache/samza/pull/135
SAMZA-1222: Clean up LocalApplicationRunner
Clean up the LocalApplicationRunner based on the further feedback in
https://github.com/apache/samza/pull/117.
You can merge this pull request into a
GitHub user xinyuiscool opened a pull request:
https://github.com/apache/samza/pull/127
SAMZA-1204: Visualize StreamGraph and ExecutionPlan
First look: https://xinyuiscool.github.io/visualizer/plan.html. This is
based on the example graph JSON generated in TestJobGraphJsonGenerator
GitHub user xinyuiscool opened a pull request:
https://github.com/apache/samza/pull/117
SAMZA-1132: LocalApplicationRunner for StreamApplication
LocalApplicationRunner runs the StreamApplication locally on every node
that the application is deployed to. LocalRunner.start() is
GitHub user xinyuiscool opened a pull request:
https://github.com/apache/samza/pull/110
SAMZA-1178: Generate JSON from StreamPlan
As the first step to visualize the StreamGraph/Plan, this patch generates a
json representation of it. For the example StreamGraph in
GitHub user xinyuiscool opened a pull request:
https://github.com/apache/samza/pull/109
Samza 1186: Rename Processor to Job
Now that we have the top-level Samza application, and each stage is called a
job, the previously introduced "processor" naming should be renamed as
GitHub user xinyuiscool opened a pull request:
https://github.com/apache/samza/pull/100
SAMZA-1172: Fix for the topological sort to handle single-node loop
In the processor graph, the topological sort missed adding to the visited
set during graph traversal. This caused wrong graph
GitHub user xinyuiscool opened a pull request:
https://github.com/apache/samza/pull/98
SAMZA-1171: Rewrite config in ApplicationRunnerMain when creating
ApplicationRunner
The config needs to be rewritten before passing down to the
ApplicationRunner. This is a bug that was
GitHub user xinyuiscool opened a pull request:
https://github.com/apache/samza/pull/94
SAMZA-1137: Instantiate ApplicationRunner in SamzaContainer
Create an ApplicationRunner in SamzaContainer to provide StreamSpecs for
fluent API.
You can merge this pull request into a Git
GitHub user xinyuiscool opened a pull request:
https://github.com/apache/samza/pull/88
SAMZA-1131: RemoteApplicationRunner for cluster-based Samza applications
RemoteApplicationRunner starts the Samza StreamApplication on the remote
cluster, e.g. Yarn. It uses ExecutionPlanner for
GitHub user xinyuiscool opened a pull request:
https://github.com/apache/samza/pull/79
Samza 1123: Create intermediate stream in partitionBy() operator
For partitionBy() operator, Samza generates an intermediate stream with id
based on operator name and id, and system based on
GitHub user xinyuiscool opened a pull request:
https://github.com/apache/samza/pull/76
SAMZA-1122: Rename ExecutionEnvironment to ApplicationRunner
Some refactoring/cleanup:
- rename ExecutionEnvironment to ApplicationRunner, including all the
subclasses.
- rename the
GitHub user xinyuiscool opened a pull request:
https://github.com/apache/samza/pull/75
SAMZA-1067: Physical execution graph and planner for fluent API
Initial commit for the physical graph and plan. The commit includes:
1) Physical ProcessorGraph, where each processor
GitHub user xinyuiscool opened a pull request:
https://github.com/apache/samza/pull/41
SAMZA-1078: Add my gpg key to KEYS
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/xinyuiscool/samza KEYS
Alternatively you can review and
GitHub user xinyuiscool opened a pull request:
https://github.com/apache/samza/pull/37
SAMZA-1069: Fix Deadlock between KafkaSystemProducer and KafkaProducer
Moving the producer.close() and sources.flush() outside the lock so it
won't have a race condition with the kafka ne