On Sunday, January 31, 2016 at 12:13:31 AM UTC-6, Scotty C wrote:
> that's what i did. so new performance data. this is with bytes instead of
> strings for data on the hard drive but bignums in the hash still.
>
> as a single large file and a hash with 203 buckets for 26.6 million
> records the data rate is 98408/sec.
>
> when i split and go with 11 small
Fixed, thanks for the report!
On Sat, Jan 30, 2016 at 8:31 PM, Scotty C wrote:
> just found a small mistake in the documentation: can you find it?
>
> (numerator q) → integer?
>
> q : rational?
>
> Coerces q to an exact number, finds the numerator of the number expressed
> in its s
just found a small mistake in the documentation: can you find it?
(numerator q) → integer?
q : rational?
Coerces q to an exact number, finds the numerator of the number expressed in
its simplest fractional form, and returns this number coerced to the exactness
of q.
(den
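For reference, here is how numerator and denominator behave at the REPL (illustrative only, not a reconstruction of the doc fix):
> (numerator 6/4)
3
> (numerator 0.125)
1.0
> (denominator 0.125)
8.0
Both results for the inexact 0.125 come back inexact, matching the "coerced to the exactness of q" wording above.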
> Yes. You probably do need to convert the files. Your original
> coding likely is not [easily] compatible with binary I/O.
that's what i did. so new performance data. this is with bytes instead of
strings for data on the hard drive but bignums in the hash still.
as a single large file and
On Thu, 28 Jan 2016 20:32:08 -0800 (PST), Scotty C
wrote:
>> (current-memory-use)
>yup, tried that a while back didn't like what i saw. check this out:
>
>> (current-memory-use)
>581753864
>> (current-memory-use)
>586242568
>> (current-memory-use)
>591181736
>> (current-memory-use)
>595527064
>
>
On Fri, 29 Jan 2016 16:45:40 -0800 (PST), Scotty C
wrote:
>i have a new issue. i wrote my data as char and end records with 'return. i
>use (read-line x 'return) and the first record is 15 char. when i use
> (read-bytes-line x 'return) i get 23 byte. i have to assume that my old
>assumption tha
> i get the feeling that i will need to read the entire file as i used to read
> it taking each record and doing the following:
> convert the string record to a bignum record
> convert the bignum record into a byte string
> write the byte string to a new data file
>
> does that seem right?
never
> However, if you have implemented your own, you can still call
> `equal-hash-code`
yes, my own hash.
i think the equal-hash-code will work.
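A minimal sketch of that idea, assuming the key is a byte string and nbuckets is the table size (both names are placeholders, not from the thread):
;; bucket index for a byte-string key, reusing Racket's equal-hash-code
(define (bucket-index key nbuckets)
  (modulo (equal-hash-code key) nbuckets))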
On Fri, Jan 29, 2016 at 7:45 PM, Scotty C wrote:
> > my plan right now is to rework my current hash so that it runs byte
> strings instead of bignums.
>
> i have a new issue. i wrote my data as char and end records with 'return.
> i use (read-line x 'return) and the first record is 15 char. when
> my plan right now is to rework my current hash so that it runs byte strings
> instead of bignums.
i have a new issue. i wrote my data as char and end records with 'return. i use
(read-line x 'return) and the first record is 15 char. when i use
(read-bytes-line x 'return) i get 23 byte. i have
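That 15-char vs 23-byte gap is what one would expect if the old file was written as characters: any character code above 127 is emitted as a multi-byte UTF-8 sequence, so the byte count exceeds the character count. A quick illustration:
> (string-length (string (integer->char 200)))
1
> (bytes-length (string->bytes/utf-8 (string (integer->char 200))))
2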
On Fri, Jan 29, 2016 at 7:04 PM, Scotty C wrote:
> > question for you all. right now i use modulo on my bignums. i know i
> > can't do that to a byte string. i'll figure something out. if any of
> > you know how to do this, can you post a method?
> >
>
> I'm not sure what you're asking exactly.
i'm talking about getting the hash index of a key.
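One way to get that index without rebuilding a bignum is to treat the byte string as a base-256 number and reduce modulo the bucket count while folding over the bytes; the result equals converting the whole key to an integer and then taking modulo. A sketch (nbuckets is a placeholder):
(define (key-index key nbuckets)
  ;; fold over the bytes, reducing mod nbuckets at every step
  (for/fold ([h 0]) ([b (in-bytes key)])
    (modulo (+ (* h 256) b) nbuckets)))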
On Fri, 2016-01-29 at 13:00 -0800, Scotty C wrote:
> ok, had time to run my hash on my one test file
> '(611 1 1 19 24783208 4.19)
> this means
> 611      = # buckets
> 1        = % of buckets empty
> 1        = fewest keys in a non-empty bucket
> 19       = most keys in a non-empty bucket
> 24783208 = total number of keys
> 4.19     = average number of keys per non em
ok, had time to run my hash on my one test file
'(611 1 1 19 24783208 4.19)
this means
611      = # buckets
1        = % of buckets empty
1        = fewest keys in a non-empty bucket
19       = most keys in a non-empty bucket
24783208 = total number of keys
4.19     = average number of keys per non-empty bucket
it took 377 sec.
original # records is 26570359 so 6.7% d
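(That figure checks out: (26570359 - 24783208) / 26570359 = 1787151 / 26570359 ≈ 0.067, i.e. about 6.7% duplicates.)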
On Thursday, January 28, 2016 at 11:36:50 PM UTC-6, Brandon Thomas wrote:
> On Thu, 2016-01-28 at 20:32 -0800, Scotty C wrote:
> > > I think you understand perfectly.
> > i'm coming around
> >
> > > You said the keys are 128-bit (16 byte) values. You can store one
> > > key
> > > directly in a by
> Way back in this thread you implied that you had extremely large FILES
> containing FIXED SIZE RECORDS, from which you needed
> to FILTER DUPLICATE records based on the value of a FIXED SIZE KEY
> field.
this is mostly correct. the data is state and state associated data on the
fringe. hence th
On Thu, 2016-01-28 at 20:32 -0800, Scotty C wrote:
> > I think you understand perfectly.
> i'm coming around
>
> > You said the keys are 128-bit (16 byte) values. You can store one
> > key
> > directly in a byte string of length 16.
> yup
>
> > So instead of using a vector of pointers to individ
> I think you understand perfectly.
i'm coming around
> You said the keys are 128-bit (16 byte) values. You can store one key
> directly in a byte string of length 16.
yup
> So instead of using a vector of pointers to individual byte strings,
> you would allocate a single byte string of length
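A sketch of that single-allocation layout, assuming 16-byte keys stored back to back (the helper names are made up for illustration):
(define KEY-SIZE 16)
(define (make-key-store n) (make-bytes (* n KEY-SIZE)))
(define (key-ref store i) (subbytes store (* i KEY-SIZE) (* (add1 i) KEY-SIZE)))
(define (key-set! store i key) (bytes-copy! store (* i KEY-SIZE) key 0 KEY-SIZE))
Lookup then compares against (key-ref store i); a loop comparing bytes at offsets would avoid even that small subbytes allocation.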
On Thu, 28 Jan 2016 11:49:09 -0800 (PST), Scotty C
wrote:
>what's been bothering me was trying to get the data into 16 bytes in
>a byte string of that length. i couldn't get that to work so gave up and
>just shoved the data into 25 bytes. here's a bit of code. i think it's
>faster than my bignum
On Thu, 28 Jan 2016 07:56:05 -0800 (PST), Scotty C
wrote:
>you knew this was coming, right? put this into your data structure of choice:
>
>16 5 1 12 6 24 17 9 2 22 4 10 13 18 19 20 0 23 7 21 15 11 8 3 14
>
>this is a particular 5x5 tile puzzle
>(#6 in www.aaai.org/Papers/AAAI/1996/AAAI96-178.pd
On Thu, 28 Jan 2016 07:56:05 -0800 (PST), Scotty C
wrote:
>> You claim you want filtering to be as fast as possible. If that were
>> so, you would not pack multiple keys (or features thereof) into a
>> bignum but rather would store the keys individually.
>
>chasing pointers? no, you're thinking
what's been bothering me was trying to get the data into 16 bytes in a byte
string of that length. i couldn't get that to work so gave up and just shoved
the data into 25 bytes. here's a bit of code. i think it's faster than my
bignum stuff.
(define p (bytes 16 5 1 12 6 24 17 9 2 22 4 10 13 18 19 20 0 23 7 21 15 11 8 3 14))
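For what it's worth, one way to squeeze the 25 tile values (0-24, so 5 bits each, 125 bits total) into a 16-byte string is to build the number first and then peel off bytes. A sketch only, not necessarily faster than the 25-byte version:
(define (pack-tiles tiles) ; tiles: a byte string of 25 values in 0-24, like p above
  (define n
    (for/fold ([acc 0]) ([t (in-bytes tiles)])
      (bitwise-ior (arithmetic-shift acc 5) t)))
  (define out (make-bytes 16 0))
  (for ([i (in-range 16)])
    (bytes-set! out i (bitwise-bit-field n (* 8 i) (* 8 (add1 i)))))
  out)
(pack-tiles p) gives a 16-byte key; unpacking reverses the shifts.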
> You claim you want filtering to be as fast as possible. If that were
> so, you would not pack multiple keys (or features thereof) into a
> bignum but rather would store the keys individually.
chasing pointers? no, you're thinking about doing some sort of byte-append and
subbytes type of thing.
On Wed, 27 Jan 2016 19:43:49 -0800 (PST), Scotty C
wrote:
>> Then you're not using the hash in a conventional manner ... else the
>> filter entries would be unique
>
>using it conventionally? absolutely. it is a hash with separate chaining.
You snipped the part I was responding to, which was:
> Is it important to retain that sorting? Or is it just informational?
it's important
> Then you're not using the hash in a conventional manner ... else the
> filter entries would be unique ... and we really have no clue what
> you're actually doing. So any suggestions we give you are shots in
>
On Wed, 27 Jan 2016 11:17:04 -0800 (PST), Scotty C
wrote:
>On Wednesday, January 27, 2016 at 2:57:42 AM UTC-6, gneuner2 wrote:
>
>> What is this other field on which the file is sorted?
>this field is the cost in operators to arrive at the key value
Is it important to retain that sorting? Or is
On Wed, 2016-01-27 at 17:49 -0500, George Neuner wrote:
> On 1/27/2016 10:50 AM, Brandon Thomas wrote:
> > On Wed, 2016-01-27 at 04:01 -0500, George Neuner wrote:
> > > On Tue, 26 Jan 2016 23:00:01 -0500, Brandon Thomas
> > > wrote:
> > >
> > > > Is there anything stopping you from restructuring
On 1/27/2016 10:50 AM, Brandon Thomas wrote:
On Wed, 2016-01-27 at 04:01 -0500, George Neuner wrote:
> On Tue, 26 Jan 2016 23:00:01 -0500, Brandon Thomas
> wrote:
>
> > Is there anything stopping you from restructuring
> > the data on disk and using the hash directly from there
>
> Scotty's hash
From: racket-users@googlegroups.com on behalf of George Neuner
Sent: Wednesday, January 27, 2016 4:28 AM
To: racket-users@googlegroups.com
Subject: Re: [racket-users] Re: appending files
Sorry. I shouldn't do math at 4am. Ignore the numbers. However, it
is still correct that the byte array will use less space th
On Wednesday, January 27, 2016 at 2:57:42 AM UTC-6, gneuner2 wrote:
> What is this other field on which the file is sorted?
this field is the cost in operators to arrive at the key value
> WRT a set of duplicates: are you throwing away all duplicates? Keeping
> the 1st one encountered? Something
On Wed, 2016-01-27 at 04:01 -0500, George Neuner wrote:
> On Tue, 26 Jan 2016 23:00:01 -0500, Brandon Thomas
> wrote:
>
> > Is there anything stopping you from restructuring
> > the data on disk and using the hash directly from there
>
> Scotty's hash table is much larger than he thinks it is an
On Tue, 2016-01-26 at 22:48 -0800, Scotty C wrote:
> ok brandon, that's a thought. build the hash on the hard drive at the
> time of data creation. you mention collision resolution. so let me
> build my hash on the hard drive using my 6 million buckets but
> increase the size of each bucket from 5
Sorry. I shouldn't do math at 4am. Ignore the numbers. However, it
is still correct that the byte array will use less space than an array
of bignums.
George
On 1/27/2016 3:54 AM, George Neuner wrote:
i run a custom built hash. i use separate chaining with a vector of
bignums. i am willing
On Tue, 26 Jan 2016 23:00:01 -0500, Brandon Thomas
wrote:
>Is there anything stopping you from restructuring
>the data on disk and using the hash directly from there
Scotty's hash table is much larger than he thinks it is and very
likely is being paged to disk already. Deliberately implementing
Hi Scotty,
I rearranged your message a bit for (my own) clarity.
On Tue, 26 Jan 2016 18:40:28 -0800 (PST), Scotty C
wrote:
>running 64 bit linux mint OS on a 2 core laptop with 2 gb of ram.
>the generated keys are random but i use one of the associated
>fields for sorting during the initia
ok brandon, that's a thought. build the hash on the hard drive at the time of
data creation. you mention collision resolution. so let me build my hash on the
hard drive using my 6 million buckets but increase the size of each bucket from
5 slots to 20. right? i can't exactly recreate my vector/b
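Roughly what probing such an on-disk table could look like; a sketch under assumptions not spelled out in the thread (16-byte keys, fixed-width buckets of SLOTS slots, a slot of all zero bytes meaning empty):
(define KEY-SIZE 16)
(define SLOTS 20)
(define BUCKET-SIZE (* SLOTS KEY-SIZE))
(define (bucket-contains? in bucket-index key) ; in: a binary input port on the table file
  (file-position in (* bucket-index BUCKET-SIZE))
  (define bucket (read-bytes BUCKET-SIZE in))
  (and (bytes? bucket)
       (for/or ([i (in-range SLOTS)])
         (equal? key (subbytes bucket (* i KEY-SIZE) (* (add1 i) KEY-SIZE))))))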
On Tue, 2016-01-26 at 18:40 -0800, Scotty C wrote:
> alright george, i'm open to new ideas. here's what i've got going.
> running 64 bit linux mint OS on a 2 core laptop with 2 gb of ram. my
> key is 128 bits with ~256 bits per record. so my 1 gb file contains
> ~63 million records and ~32 million
alright george, i'm open to new ideas. here's what i've got going. running 64
bit linux mint OS on a 2 core laptop with 2 gb of ram. my key is 128 bits with
~256 bits per record. so my 1 gb file contains ~63 million records and ~32
million keys. about 8% will be dupes leaving me with ~30 million
+1 on George Neuner's comments about how one can do smart processing of
huge files in small space. (I almost said something about that myself,
but didn't have time to get into that kind of discussion, so I stuck to
only the simpler file concatenation question.)
BTW, students who have 8GB RAM
On 1/26/2016 2:51 PM, Scotty C wrote:
gneuner2 (george), you are overthinking this thing. my test data of 1 gb is
but a small sample file. i can't even hash that small 1 gb at the time of data
creation. the hashed data won't fit in ram. at the time i put the redundant
data on the hard drive,
gneuner2 (george), you are overthinking this thing. my test data of 1 gb is
but a small sample file. i can't even hash that small 1 gb at the time of data
creation. the hashed data won't fit in ram. at the time i put the redundant
data on the hard drive, i do some constant time sorting so that
neil van dyke, i have used the system function before but had forgotten what it
was called and couldn't find it as a result in the documentation. my problem
with using the system function is that i need 2 versions of it: windoz and
linux. the copy-port function is a write once use across multipl
robby findler, you the man. i like the copy-port idea. i incorporated it and it
is nice and fast and easily fit into the existing code.
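For anyone finding this thread later, the copy-port approach to appending one file onto another looks roughly like this (file names are placeholders):
(require racket/port)
(call-with-output-file "combined.dat" #:exists 'append
  (lambda (out)
    (call-with-input-file "part.dat"
      (lambda (in) (copy-port in out)))))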