Hi,
I am using 1.5.0 with the hdfs sink to write files to S3. The process
writes fine for a while, but eventually I start getting the following
message:
INFO [pool-5-thread-1]
(org.apache.flume.client.avro.ReliableSpoolingFileEventReader.readEvents:224)
- Last read was never committed - resetting mark position.
Hi,
Is there a way to get the hdfs sink to signal that a file was just closed,
and then use that signal to add a partition to Hive if one does not exist
already?
Right now, what I do is:
- move files to S3
- run recover partitions <--- this step takes forever
But given that I have so much historical …
now I have removed the idleTimeout, since a file will eventually get
closed by the LRU eviction, but this is a known issue.
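A per-file alternative to a full recover-partitions pass is to add only the partition that was just closed; ALTER TABLE ... ADD IF NOT EXISTS PARTITION is cheap and idempotent. A minimal sketch, assuming a table named events partitioned by dt (table, column, and bucket names are all hypothetical):

    #!/usr/bin/env python
    # Hedged sketch: add a single Hive partition for a just-closed file
    # instead of rescanning everything with "recover partitions".
    # Table, partition column, and bucket names are hypothetical.
    import subprocess

    def add_partition(dt):
        hql = (
            "ALTER TABLE events ADD IF NOT EXISTS "
            "PARTITION (dt='{0}') "
            "LOCATION 's3n://my-bucket/events/dt={0}/';".format(dt)
        )
        # IF NOT EXISTS makes it safe to fire this once per closed file.
        subprocess.check_call(["hive", "-e", hql])

    add_partition("2014-07-27")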
Should I open a JIRA for this?
Thanks,
Viral
On Sun, Jul 27, 2014 at 7:25 PM, Viral Bajaria wrote:
> Hi,
>
> I am using 1.5.0 with the hdfs sink to write files to S3. …
I have a similar use case that cropped up yesterday. I saw the archive and
found that there was a recommendation to build it as Sharninder suggested.
For now, I went down the route of writing a Python script which downloads
from S3 and puts the files in a directory which is configured to be picked
up by the spooling directory source.
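A minimal sketch of that kind of download script, assuming boto3 and a spooling directory at /var/spool/flume (bucket, prefix, and paths are made up, and the source's ignorePattern would need to skip *.part files; a real script also has to remember what it already fetched, since the source renames completed files):

    #!/usr/bin/env python
    # Hedged sketch: pull S3 objects into the directory watched by a Flume
    # spooling directory source. Bucket, prefix, and paths are hypothetical.
    import os
    import boto3

    SPOOL_DIR = "/var/spool/flume"
    s3 = boto3.client("s3")

    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket="my-bucket", Prefix="incoming/"):
        for obj in page.get("Contents", []):
            name = os.path.basename(obj["Key"])
            if not name:  # skip directory placeholder keys
                continue
            dest = os.path.join(SPOOL_DIR, name)
            if os.path.exists(dest):
                continue
            # Download under a temp name so the source never reads a
            # half-written file, then rename into place atomically.
            tmp = dest + ".part"
            s3.download_file("my-bucket", obj["Key"], tmp)
            os.rename(tmp, dest)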
On … at 2:40 PM, Viral Bajaria wrote:
> Hi,
>
> Is there a way to get the hdfs sink to signal that a file was just closed,
> and then use that signal to add a partition to Hive if one does not exist
> already?
>
> Right now, what I do is:
>
> - move files to S3
> - run recover partitions …
… command.
>
>
> On Thursday, July 31, 2014, Viral Bajaria wrote:
>
>> Any suggestions on this? Still trying to figure out how I can get a
>> notification that a new partition is being created by the HDFS sink so
>> that I can add it via an ALTER TABLE statement on a separate …
>>>> …ing config files
>>>> stored on disk) stop fetching data from some S3 buckets
>>>>
>>>> Would you need to be able to pull files from multiple S3
>>>> directories with the same source?
Hi Roshan,
I tried searching around, but could not find the ticket.
Can you point me to the ticket which shows the details of the Hive Sink?
Is it coming out in the next release?
Thanks,
Viral
On Tue, Aug 12, 2014 at 1:43 PM, Roshan Naik wrote:
> I could also do a talk/demo on the new Hive …
All,
I had a question about using Flume with columnar file formats like Parquet
or ORC.
Has anyone tried writing to HDFS by creating ORC files instead of TEXT
or SEQUENCEFILE?
Thanks,
Viral
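For context: in 1.5.0 the hdfs sink's hdfs.fileType only offers SequenceFile, DataStream, and CompressedStream, so one common workaround is to land plain text and have Hive rewrite it into ORC afterwards. A minimal sketch of that conversion (table and column names are hypothetical):

    #!/usr/bin/env python
    # Hedged sketch: rewrite text data landed by the hdfs sink into an
    # ORC table via Hive. Table and column names are hypothetical.
    import subprocess

    HQL = """
    CREATE TABLE IF NOT EXISTS events_orc (ts STRING, payload STRING)
    STORED AS ORC;
    INSERT INTO TABLE events_orc
    SELECT ts, payload FROM events_text;
    """

    subprocess.check_call(["hive", "-e", HQL])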
I have observed this again. I think this is definitely an issue, because I
can repro it when running the Flume agent over a few hundred files with the
above settings.
Has anyone else seen this?
Thanks,
Viral
On Mon, Jul 28, 2014 at 3:03 PM, Viral Bajaria wrote:
> I found the issue.
>
> The …
On Mon, Jul 28, 2014 at 3:04 PM, Viral Bajaria wrote:
>
>> I found the issue.
>>
>> The deadlock happens since I have 3 events that can close a file:
>>
>> 1) maxOpenFiles: set to 500
>> 2) maxFileSize: set to 128MB
>> 3) idleTimeout : set to 30 seconds
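For reference, those three triggers as hdfs sink properties — a hedged sketch in which the agent and sink names are hypothetical, and "maxFileSize" is taken to mean the roll-by-size trigger, which the stock sink spells hdfs.rollSize (in bytes):

    agent.sinks.s3sink.type = hdfs
    agent.sinks.s3sink.hdfs.maxOpenFiles = 500
    agent.sinks.s3sink.hdfs.rollSize = 134217728
    agent.sinks.s3sink.hdfs.idleTimeout = 30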