[racket-users] Re: appending files

2016-01-31 Thread Scotty C
On Sunday, January 31, 2016 at 12:13:31 AM UTC-6, Scotty C wrote:
> > that's what i did. so new performance data. this is with bytes instead of
> > strings for data on the hard drive but bignums in the hash still.
> >
> > as a single large file and a hash with 200

[racket-users] Re: appending files

2016-01-30 Thread Scotty C
> that's what i did. so new performance data. this is with bytes instead of
> strings for data on the hard drive but bignums in the hash still.
>
> as a single large file and a hash with 203 buckets for 26.6 million
> records the data rate is 98408/sec.
>
> when i split and go with 11 small
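
a minimal sketch of the kind of fixed-size binary record loop this timing is about, assuming 32-byte records (the thread doesn't give the exact layout):

  ;; count fixed-size byte records in a binary file.
  ;; RECORD-SIZE is an assumption, not a number from the thread.
  (define RECORD-SIZE 32)

  (define (count-records path)
    (call-with-input-file path
      (lambda (in)
        (let loop ([n 0])
          (if (eof-object? (read-bytes RECORD-SIZE in))
              n
              (loop (add1 n)))))))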

[racket-users] Re: appending files

2016-01-30 Thread Scotty C
just found a small mistake in the documentation: can you find it?

  (numerator q) → integer?
    q : rational?
  Coerces q to an exact number, finds the numerator of the number expressed
  in its simplest fractional form, and returns this number coerced to the
  exactness of q.

  (den
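
for reference, a quick interaction showing the exactness coercion the doc entry describes (the truncated `(den` is presumably the matching `denominator` entry):

  > (numerator 6/4)    ; simplest fractional form is 3/2
  3
  > (numerator 0.5)    ; inexact input, so the exact numerator 1 is coerced
  1.0
  > (denominator 0.5)
  2.0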

[racket-users] Re: appending files

2016-01-30 Thread Scotty C
> Yes. You probably do need to convert the files. Your original
> coding likely is not [easily] compatible with binary I/O.

that's what i did. so new performance data. this is with bytes instead of strings for data on the hard drive but bignums in the hash still. as a single large file and

[racket-users] Re: appending files

2016-01-29 Thread Scotty C
> i get the feeling that i will need to read the entire file as i used to read
> it taking each record and doing the following:
> convert the string record to a bignum record
> convert the bignum record into a byte string
> write the byte string to a new data file
>
> does that seem right?

never
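
a sketch of those three steps, assuming each text record parses as a hex integer and KEY-BYTES is the fixed binary width; both are assumptions, not details from the thread:

  (define KEY-BYTES 16)

  ;; spill a nonnegative (big) integer into a fixed-width big-endian byte string
  (define (bignum->bytes n len)
    (define bs (make-bytes len 0))
    (let loop ([n n] [i (sub1 len)])
      (unless (< i 0)
        (bytes-set! bs i (bitwise-and n 255))
        (loop (arithmetic-shift n -8) (sub1 i))))
    bs)

  (define (convert-file in-path out-path)
    (call-with-output-file out-path #:exists 'truncate
      (lambda (out)
        (call-with-input-file in-path
          (lambda (in)
            (let loop ()
              (define line (read-line in 'return))
              (unless (eof-object? line)
                (write-bytes (bignum->bytes (string->number line 16) KEY-BYTES) out)
                (loop))))))))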

Re: [racket-users] Re: appending files

2016-01-29 Thread Scotty C
> However, if you have implemented your own, you can still call
> `equal-hash-code`

yes, my own hash. i think the equal-hash-code will work.
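
`equal-hash-code` (from racket/base) returns a fixnum for any value, so a bucket index for a custom table can be computed like this (bucket count taken from the thread's ~6 million figure):

  (define (bucket-index key n-buckets)
    ;; modulo (not remainder) so a negative hash code still maps to 0..n-1
    (modulo (equal-hash-code key) n-buckets))

  (bucket-index #"some 16-byte key" 6000000)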

[racket-users] Re: appending files

2016-01-29 Thread Scotty C
> my plan right now is to rework my current hash so that it runs byte strings
> instead of bignums.

i have a new issue. i wrote my data as char and end records with 'return. i use (read-line x 'return) and the first record is 15 char. when i use (read-bytes-line x 'return) i get 23 byte. i have
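
the 15-char/23-byte gap is UTF-8 at work: chars above 127 are written as two (or more) bytes, so presumably 8 of those 15 chars fall in the 128..2047 range, each costing one extra byte. for example:

  > (string-length (string (integer->char 65) (integer->char 200)))
  2
  > (bytes-length (string->bytes/utf-8 (string (integer->char 65) (integer->char 200))))
  3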

Re: [racket-users] Re: appending files

2016-01-29 Thread Scotty C
> > question for you all. right now i use modulo on my bignums. i know i
> > can't do that to a byte string. i'll figure something out. if any of
> > you know how to do this, can you post a method?
> >
> I'm not sure what you're asking exactly.

i'm talking about getting the hash index of a key.
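
one way to reduce a byte-string key mod the bucket count without building a bignum first is Horner's rule with a modulo at each step, so the intermediate values stay fixnums (a sketch, not the poster's code):

  (define (bytes-mod key m)
    (for/fold ([acc 0]) ([b (in-bytes key)])
      (modulo (+ (* acc 256) b) m)))

  ;; same result as converting key to a big-endian bignum and taking modulo m
  (bytes-mod #"some 16-byte key" 6000000)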

[racket-users] Re: appending files

2016-01-29 Thread Scotty C
ok, had time to run my hash on my one test file

  '(611 1 1 19 24783208 4.19)

this means: # buckets, % buckets empty, # keys in the least-full non-empty bucket, # keys in the most-full non-empty bucket, total number of keys, average number of keys per non-empty bucket. it took 377 sec. original # records is 26570359 so 6.7% d

Re: [racket-users] Re: appending files

2016-01-28 Thread Scotty C
On Thursday, January 28, 2016 at 11:36:50 PM UTC-6, Brandon Thomas wrote:
> On Thu, 2016-01-28 at 20:32 -0800, Scotty C wrote:
> > > I think you understand perfectly.
> > i'm coming around
> >
> > > You said the keys are 128-bit (16 byte) values.  You can s

[racket-users] Re: appending files

2016-01-28 Thread Scotty C
> Way back in this thread you implied that you had extremely large FILES
> containing FIXED SIZE RECORDS, from which you needed
> to FILTER DUPLICATE records based on the value of a FIXED SIZE KEY
> field.

this is mostly correct. the data is state and state associated data on the fringe. hence th

[racket-users] Re: appending files

2016-01-28 Thread Scotty C
> I think you understand perfectly.

i'm coming around

> You said the keys are 128-bit (16 byte) values. You can store one key
> directly in a byte string of length 16.

yup

> So instead of using a vector of pointers to individual byte strings,
> you would allocate a single byte string of length
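
a sketch of that flat layout: one byte string holding all the slots, indexed by arithmetic instead of pointer chasing (names and the slot count are illustrative):

  (define KEY-SIZE 16)
  (define N-SLOTS 1000000)

  (define table (make-bytes (* N-SLOTS KEY-SIZE) 0))

  (define (put-key! i key)  ; key: a 16-byte byte string
    (bytes-copy! table (* i KEY-SIZE) key))

  (define (key-at i)        ; fresh copy of slot i
    (subbytes table (* i KEY-SIZE) (* (add1 i) KEY-SIZE)))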

[racket-users] Re: appending files

2016-01-28 Thread Scotty C
what's been bothering me was trying to get the data into 16 bytes in a byte string of that length. i couldn't get that to work so gave up and just shoved the data into 25 bytes. here's a bit of code. i think it's faster than my bignum stuff.

  (define p (bytes 16 5 1 12 6 24 17 9 2 22 4 10 13 18
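
for the record, 25 values each below 32 do fit in 16 bytes: 5 bits apiece is 125 bits. a sketch that packs by way of one bignum, assuming p is the 25-element byte string defined above (illustrative, not the poster's code):

  (define (pack-state p)    ; 25 values (each < 32) -> 16-byte byte string
    (define n (for/fold ([n 0]) ([b (in-bytes p)]) (+ (* n 32) b)))
    (define out (make-bytes 16 0))
    (let loop ([n n] [i 15])
      (unless (< i 0)
        (bytes-set! out i (bitwise-and n 255))
        (loop (arithmetic-shift n -8) (sub1 i))))
    out)

  (define (unpack-state bs) ; inverse: 16 bytes -> 25 values
    (define n (for/fold ([n 0]) ([b (in-bytes bs)]) (+ (* n 256) b)))
    (define p (make-bytes 25 0))
    (let loop ([n n] [i 24])
      (unless (< i 0)
        (bytes-set! p i (bitwise-and n 31))
        (loop (arithmetic-shift n -5) (sub1 i))))
    p)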

[racket-users] Re: appending files

2016-01-28 Thread Scotty C
> You claim you want filtering to be as fast as possible. If that were
> so, you would not pack multiple keys (or features thereof) into a
> bignum but rather would store the keys individually.

chasing pointers? no, you're thinking about doing some sort of byte-append and subbytes type of thing.

[racket-users] Re: appending files

2016-01-27 Thread Scotty C
> Is it important to retain that sorting? Or is it just informational?

it's important

> Then you're not using the hash in a conventional manner ... else the
> filter entries would be unique ... and we really have no clue what
> you're actually doing. So any suggestions we give you are shots in
>

[racket-users] Re: appending files

2016-01-27 Thread Scotty C
On Wednesday, January 27, 2016 at 2:57:42 AM UTC-6, gneuner2 wrote:
> What is this other field on which the file is sorted?

this field is the cost in operators to arrive at the key value

> WRT a set of duplicates: are you throwing away all duplicates? Keeping
> the 1st one encountered? Something

[racket-users] Re: appending files

2016-01-26 Thread Scotty C
ok brandon, that's a thought. build the hash on the hard drive at the time of data creation. you mention collision resolution. so let me build my hash on the hard drive using my 6 million buckets but increase the size of each bucket from 5 slots to 20. right? i can't exactly recreate my vector/b
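
the disk addressing for that is plain arithmetic; a sketch using the numbers from this post (6 million buckets, 20 slots) and an assumed 32-byte record:

  (define N-BUCKETS 6000000)
  (define SLOTS 20)
  (define RECORD-SIZE 32)   ; assumption, not from the post

  ;; whole table: 6000000 * 20 * 32 bytes = ~3.8 gb on disk
  (define (read-bucket in b)  ; the 20 raw slots of bucket b
    (file-position in (* b SLOTS RECORD-SIZE))
    (read-bytes (* SLOTS RECORD-SIZE) in))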

[racket-users] Re: appending files

2016-01-26 Thread Scotty C
alright george, i'm open to new ideas. here's what i've got going. running 64 bit linux mint OS on a 2 core laptop with 2 gb of ram. my key is 128 bits with ~256 bits per record. so my 1 gb file contains ~63 million records and ~32 million keys. about 8% will be dupes leaving me with ~30 million

[racket-users] Re: appending files

2016-01-26 Thread Scotty C
gneuner2 (george), you are over thinking this thing. my test data of 1 gb is but a small sample file. i can't even hash that small 1 gb at the time of data creation. the hashed data won't fit in ram. at the time i put the redundant data on the hard drive, i do some constant time sorting so that

[racket-users] Re: appending files

2016-01-26 Thread Scotty C
neil van dyke, i have used the system function before but had forgotten what it was called and as a result couldn't find it in the documentation. my problem with using the system function is that i need 2 versions of it: windoz and linux. the copy-port function is a write once, use across multipl

[racket-users] Re: appending files

2016-01-26 Thread Scotty C
robby findler, you the man. i like the copy-port idea. i incorporated it and it is nice and fast and easily fit into the existing code.
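
the copy-port version of appending one file to another looks roughly like this (file names are placeholders):

  (require racket/port) ; for copy-port

  (define (append-file! dest src)
    (call-with-output-file dest #:exists 'append
      (lambda (out)
        (call-with-input-file src
          (lambda (in)
            (copy-port in out))))))

  ;; e.g. (append-file! "big.dat" "part1.dat")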

[racket-users] appending files

2016-01-25 Thread Scotty C
here's what i'm doing. i make a large, say 1 gb file with small records and there is some redundancy in the records. i will use a hash to identify duplicates by reading the file back in a record at a time but the file is too large to hash so i split it. the resultant files (10) are about 100 mb
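
a sketch of that split step: route each record to one of 10 part files by a hash of its key, so all copies of a duplicate land in the same small, hashable file. record and key sizes here are assumptions, not from the post:

  (define N-FILES 10)
  (define RECORD-SIZE 32)  ; assumption
  (define KEY-SIZE 16)     ; assumption: key is the record's first 16 bytes

  (define (split-file path)
    (define outs
      (for/vector ([i (in-range N-FILES)])
        (open-output-file (format "~a.part~a" path i) #:exists 'truncate)))
    (call-with-input-file path
      (lambda (in)
        (let loop ()
          (define rec (read-bytes RECORD-SIZE in))
          (unless (eof-object? rec)
            (define idx
              (for/fold ([n 0]) ([b (in-bytes rec)] [_ (in-range KEY-SIZE)])
                (modulo (+ (* n 256) b) N-FILES)))
            (write-bytes rec (vector-ref outs idx))
            (loop)))))
    (for ([o (in-vector outs)]) (close-output-port o)))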